Why Uber migrated its databases from Postgres to MySQL?


Uber has been in the news for numerous reasons in the past few days. Be it expansion to new countries or selling its China operations to Didi chuxing, Uber is growing exponentially and expanding into newer markets. Recently, Uber also announced a major change – Changing their databases from Postgres to MySQL. While enterprises are constantly checking and trying to find the right fit for their databases, it takes immense research and analysis to decide on THE one.

Each enterprise has their own requirements and it is imperative for the company to decide on the database that suits their needs. This is exactly what Uber did. Uber recently shared a blog detailing various issues they faced using Postgres and what prompted them to make the change to MySQL.

Limitations with Postgres

One of the main limitations faced by Uber was its inability to optimize their upgrading process which resulted in downtime. And we all agree to the fact that, downtime is the biggest enemy of a growing enterprise. Some of the key issues faced by Uber include:

  • Inefficient architecture for writes: A relational database must perform certain key tasks that Postgres were not suited for Uber’s needs.
    • Providing efficient insert/update/delete capabilities
    • Providing capabilities for making schema changes
    • Implementing a MVCC mechanism so that different connections have a transactional view of the data they work with.
  • Inefficient data replication: Postgres’ replication required higher bandwidth as compared to MySQL but Postgres replication may not be a problem within a single data center. The real problem with Postgres arises when replication is done between multiple data centers.
  • Issues with table corruption: During one of Uber’s routine master database promotion to increase database capacity, they encountered a Postgres 9.2 bug. This resulted in numerous incorrect timeline switches on replicas that were being undertaken.
  • Poor replica MVCC support: Postgres does not have true replica MVCC support. The fact that replicas apply WAL updates results in having a copy of on-disk data identical to the master at any given point in time.
  • Difficulty upgrading to newer releases: With different general available releases of Postgres, it was not possible to replicate the data. Replication records work at the physical level which made it difficult for Uber.

All these factors affected Uber in a number of ways, both from the enterprise as well as engineering view-point. Postgres required high bandwidth that resulted in higher incurred cost and time to the company. Data corruption can have a critical impact on the enterprise and one cannot afford such an issue with data. Moreover, upgrading to newer releases meant long downtime and time consumption which affects the timeframe for scaling up for a growing enterprise like Uber.

Postgres worked for Uber during the early days. However, it was not the most appropriate match while scaling up their operations. MySQL, on the other hand, is more sustainable and helps in better management of large databases.

What does MySQL bring to the table?

MySQL helps overcome these critical limitations that have triggered Uber’s decision to shift its database.

InnoDB On-Disk Representation

There are a number of architectural differences but the most important of them all is that Postgres directly maps index records to on-disk locations but InnoDB maintains a secondary structure. InnoDB secondary index records hold a pointer to the primary key value and a secondary index in MySQL associates index keys with primary keys. All in all, this requires less number of updates for indexes. Moreover, the data is normalized which optimizes the key lookup and the row updates only need to update index records that are actually changed by the row update.

The Undo Log was used which helped Uber engineering when old transactions were needed to reference a row for the purposes of MVCC. The architecture helped them make vacuuming and compaction more efficient for their databases.

Replication

MySQL supports multiple replication modes:

  • Statement-based replication replicates logical SQL statements
  • Row-based replication replicates altered row records
  • Mixed replication mixes these two modes mentioned above.

The biggest advantage it provided was almost zero downtime while updating replicas to newer versions.

Caching (The Buffer Pool)

Caching is done differently in both the databases. Postgres accesses data via the page cache which is expensive to certain degree as compared to accessing RSS memory. InnoDB storage engine implements it’s own LRU in InnoDB buffer pool, which has massive upsides:

  • Possibility of implementing a custom LRU.
  • It will result in fewer context switches.

Conclusion:

We feel there is always a trade-off between different databases. Primarily, it depends on the company’s specific requirements and the scale at which their business operates. For most cases, we feel that MySQL is the most scalable and secure database.

Over the years, companies have tried their hands on various databases for their business operations. However, many come back to MySQL due to its core advantages as mentioned above. Hence, we do believe that MySQL is here to stay, making it a perfect choice for emerging enterprises.

What do you think? Let us know your thoughts in the comments section below.

MONyog is a leading monitoring tool that helps you secure your servers and scale up. You can download our 14-day free trial here.

+ Leave a Comment