How Olery Replaced MongoDB with PostgreSQL: Lessons in Database Migration
Olery’s journey from a mixed MySQL‑MongoDB setup to a PostgreSQL‑centric architecture reveals the pitfalls of schema‑less storage, the criteria for a robust database, a three‑stage migration process, and measurable performance gains across its services.
Olery, founded about five years ago, initially ran two databases: MySQL for core data such as users and address books, and MongoDB for reviews and similar content. As the company grew, the MongoDB layer caused severe operational problems, prompting a complete migration to PostgreSQL.
Problems with the Original Setup
MongoDB was used to store millions of review documents. Operations like deleting and re‑inserting a million documents locked the database for hours, and repairing the database with repairDatabase also took many hours. Performance issues were hard to diagnose; even extensive monitoring failed to pinpoint the cause until the primary node was replaced.
Schema‑less Pitfalls
MongoDB’s schemaless nature led to implicit schema problems. For example, Ruby code assumed every document had a title field:
post_slug = post.title.downcase.gsub(/\W+/, '-')
When documents used a different field name or lacked title, the code crashed. The fix required explicit existence checks or defining a schema in the model (e.g., using Mongoid), which also improved reusability across dozens of applications.
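A minimal sketch of the explicit existence check described above, using plain Ruby hashes to stand in for review documents. The `slug_for` helper and the `'untitled'` fallback are assumptions made for illustration; they are not taken from Olery's code.

```ruby
# Hypothetical helper: build a slug without assuming every document has a title.
def slug_for(post)
  title = post[:title] || post['title']
  # Fall back to a placeholder instead of calling downcase on nil.
  return 'untitled' if title.nil? || title.strip.empty?

  title.downcase.gsub(/\W+/, '-')
end

with_title    = { title: 'Hello World' }
without_title = { name: 'Renamed field' } # document that uses a different field name

puts slug_for(with_title)
puts slug_for(without_title)
```

Every caller has to remember a guard like this when the schema is only implicit, which is why declaring the fields once in the model scales better across many applications.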
Criteria for a Good Database
Consistency – guarantees predictable behavior and simplifies application logic.
Visibility – makes system state and data retrieval easy for debugging and monitoring.
Explicitness – enforces correct data types and constraints, preventing invalid data.
Scalability – covers performance, cost, and the ability to adapt to evolving requirements.
Choosing PostgreSQL Over MongoDB
Olery evaluated MySQL and PostgreSQL against the four criteria. MySQL allowed inserting text into integer columns (only issuing a warning) and locked tables during schema changes, which could halt services for hours. PostgreSQL rejected invalid type inserts outright and supported online schema changes without full table locks. Additional PostgreSQL features such as trigram indexes, full‑text search, JSON support, key‑value storage, and pub/sub further satisfied Olery’s needs.
Migration Process
The migration was split into three major steps:
1. Create a PostgreSQL database and migrate a small data subset.
2. Update all MongoDB‑dependent applications to use PostgreSQL, performing necessary refactoring.
3. Migrate production data to the new database and deploy the new platform.
Subset Migration
Custom Ruby scripts handled data transformation, such as renaming fields, fixing encoding issues, and normalising language codes for sentiment analysis. These one‑off scripts moved comments, cleaned data, and corrected primary‑key sequences.
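The kind of transformation such a script performs can be sketched roughly as follows. The field names, the rename table, and the language mapping are all hypothetical; the source does not show the actual scripts, and this version works on in-memory hashes rather than a live database.

```ruby
# Hypothetical mapping from legacy language labels to two-letter codes.
LANGUAGE_CODES = { 'english' => 'en', 'dutch' => 'nl', 'german' => 'de' }.freeze

# Hypothetical renames between old document fields and new column names.
FIELD_RENAMES = { 'body' => 'content', 'lang' => 'language' }.freeze

# Replace invalid byte sequences so the text is valid UTF-8.
def clean_string(value)
  value.dup.force_encoding(Encoding::UTF_8).scrub('?')
end

# Lower-case and trim the label, then map it to a code when one is known.
def normalise_language(value)
  code = value.to_s.strip.downcase
  LANGUAGE_CODES.fetch(code, code)
end

# Turn one schemaless document (a Hash) into a row for the new table.
def transform(document)
  row = {}
  document.each do |key, value|
    value = clean_string(value) if value.is_a?(String)
    row[FIELD_RENAMES.fetch(key, key)] = value
  end
  row['language'] = normalise_language(row['language']) if row.key?('language')
  row
end

doc = { 'body' => "Great stay\xFF", 'lang' => 'English' }
row = transform(doc)
puts row.inspect
```

Keeping each concern (encoding, renames, language codes) in its own small method makes a one-off script easier to rerun on a subset when a new data problem turns up.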
Updating Applications
The bulk of effort involved refactoring code that relied on MongoDB’s aggregation framework. The process included:
Replace MongoDB driver/model code with PostgreSQL equivalents.
Run the test suite.
Fix failing tests.
Iterate until all tests pass.
Non‑Rails services used the Sequel library, while Rails services continued with ActiveRecord. An example SQL query to compute user counts per locale and percentages:
SELECT locale,
       count(*) AS amount,
       (count(*) / sum(count(*)) OVER ()) * 100.0 AS percentage
FROM users
GROUP BY locale
ORDER BY percentage DESC;
The same query expressed with Sequel’s DSL:
star = Sequel.lit('*')
User.select(:locale)
  .select_append { count(star).as(:amount) }
  .select_append { ((count(star) / sum(count(star)).over) * 100.0).as(:percentage) }
  .group(:locale)
  .order(Sequel.desc(:percentage))
Production Data Migration
Two strategies were considered: a full shutdown for a single cut‑over, or a live migration that kept the service running. Olery chose the live approach because the data that changes most often (users, address books) was a small, predictable fraction of total activity, so it could be migrated again quickly after the switch.
The live migration followed these steps:
1. Migrate critical data that must never be lost (users, address books).
2. Migrate less critical data that can be recomputed.
3. Validate everything on a separate set of servers.
4. Switch production traffic to the new servers.
5. Re‑migrate the critical data to capture any changes that occurred during the cut‑over.
Step 2 took the longest (about 24 hours); steps 1 and 5 each required roughly 45 minutes.
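The first and last steps can be illustrated with a small in-memory sketch. The `Record` struct, the `sync` helper, and the timestamps are hypothetical. The source does not say whether the second pass copied all critical data again or only the rows that had changed; the incremental variant shown here is an assumption.

```ruby
# Hypothetical in-memory stand-in for a row of critical data.
Record = Struct.new(:id, :email, :updated_at)

# Copy every record updated at or after `since` into the target hash,
# overwriting rows with the same id. Returns the number of records copied.
def sync(source, target, since: Time.at(0))
  changed = source.select { |record| record.updated_at >= since }
  changed.each { |record| target[record.id] = record.dup }
  changed.size
end

old_db = [
  Record.new(1, 'a@example.com', Time.at(100)),
  Record.new(2, 'b@example.com', Time.at(100))
]
new_db = {}

# First step: initial copy of the critical data.
first_pass_started = Time.at(200)
initial = sync(old_db, new_db)

# A user changes their email while validation and cut-over are in progress.
old_db[1].email = 'b.new@example.com'
old_db[1].updated_at = Time.at(300)

# Last step: copy again, limited to what changed since the first pass began.
resynced = sync(old_db, new_db, since: first_pass_started)

puts "initial: #{initial}, resynced: #{resynced}"
```

Overwriting by id makes the second pass idempotent, so it can be repeated safely if the switch has to be retried.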
Results
Olery completed the migration a month ago. Performance improvements were significant. The Hotel Review Data API’s response time roughly halved after the cut‑over, as shown in the first chart.
Comment persistence also saw dramatic speed gains, illustrated in the second chart.
The Scraper component became faster as well, as depicted in the third chart.
The Scheduler’s average processing time also dropped after migration, as shown in the final chart.
Overall, Olery is very satisfied with the migration: query performance, reliability, and developer experience have all improved, and the remaining MongoDB‑based service is expected to move to PostgreSQL soon.