From MongoDB to ClickHouse: Lessons Learned and Performance Gains
This article recounts the author's journey from using MongoDB for front‑end monitoring logs to migrating to ClickHouse, detailing the challenges with large‑scale data, optimization attempts, the fundamental differences between row‑ and column‑oriented databases, and the resulting performance and storage improvements.
Preface
When the front‑end monitoring system was first built, MongoDB was chosen as the log storage because its document model fits well with JSON and Node.js, and it makes schema changes easy.
After a few months the single collection grew to billions of records, making queries—especially aggregations—slow. Various optimizations such as compound indexes, time‑based constraints, and periodic data cleanup were tried but did not solve the problem.
Through conversations with colleagues the author learned about ClickHouse and decided to evaluate it.
The Joys and Pains of MongoDB
Initially MongoDB felt very convenient, but several core design issues emerged.
Sharding vs. Partitioning: Front‑end logs are stored in separate collections per application and data type. MongoDB has no native time‑based partitioning within a collection; it only offers sharding, which distributes data across a cluster. Splitting the data by month means queries cannot span months efficiently.
Indexes: Single‑field indexes on time were used first, and compound indexes were added later for multi‑field aggregations, but they consumed storage and offered limited speed gains.
Query Constraints: Queries without a time limit scanned the entire collection, causing long runtimes on large collections. Adding a default time filter helped but did not cover all use cases.
Data Cleanup: Deleting data older than six months required a scheduled job. Deleting millions of documents at once saturated the CPU and slowed both reads and writes. Storage also did not shrink after deletions: the freed space is kept for reuse rather than returned to the operating system, and is only released when a collection is dropped (or explicitly compacted).
Explain Analysis: Running explain showed that the queries used the expected plans but were still slow, which pointed to a limit of the storage model rather than a planning mistake.
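The default time filter mentioned under Query Constraints can be sketched as a small helper that builds a MongoDB-style aggregation pipeline. This is only an illustration, not the author's code: the helper names, the `app` field, and the seven-day window are assumptions, and the pipeline is built as plain dictionaries without contacting a database.

```python
from datetime import datetime, timedelta, timezone


def with_default_time_window(match, now, days=7, time_field="time"):
    """Return a copy of a MongoDB-style filter that always bounds time_field.

    The 7-day default and the field name are assumptions for illustration.
    """
    bounded = dict(match)
    if time_field not in bounded:
        # Caller gave no time constraint, so add one to avoid a full scan.
        bounded[time_field] = {"$gte": now - timedelta(days=days), "$lt": now}
    return bounded


def count_by_status_pipeline(match, now):
    """Build an aggregation pipeline: filter first, then group and count."""
    return [
        {"$match": with_default_time_window(match, now)},
        {"$group": {"_id": "$status", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]


now = datetime(2021, 4, 15, tzinfo=timezone.utc)
pipeline = count_by_status_pipeline({"app": "shop"}, now)  # "app" is hypothetical
print(pipeline[0])
```

Putting the `$match` stage first lets an index on the time field narrow the input before grouping, which is the effect the default filter was meant to achieve.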
Summary
As slow queries increased, the shared cloud database became a risk. The front‑end monitoring workload caused high CPU usage and interfered with other services. Eventually a dedicated database instance was provisioned for monitoring, reducing load on the main system.
New Insights into ClickHouse
Initially the author assumed MongoDB could still be optimized, but later realized that ClickHouse’s columnar architecture offered a fundamentally different approach.
Row Store vs. Column Store
Row‑oriented databases (e.g., MySQL, MongoDB) store complete records together, leading to unnecessary I/O when only a few columns are needed. Column‑oriented databases (e.g., ClickHouse) store each column contiguously, allowing queries to read only the required columns.
Row example: id:1, name:'A', year:21 → stored together
Column example: id:1,2,3 → stored together; name:'A','B','C' → stored together
Because ClickHouse reads only the needed columns, I/O is reduced and query speed improves dramatically.
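The difference in I/O can be made concrete with a toy model in Python. It reuses the three example records above (the extra `year` values are made up) and counts how many stored values each layout has to touch to sum a single column. It is a simplification that ignores pages, compression, and indexes.

```python
# Row layout: each record is stored as one unit.
rows = [
    {"id": 1, "name": "A", "year": 21},
    {"id": 2, "name": "B", "year": 35},
    {"id": 3, "name": "C", "year": 28},
]

# Column layout: one contiguous list per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}


def sum_year_row_store(rows):
    """Sum one field from the row layout; whole records are loaded."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # every field of the record is touched
        total += row["year"]
    return total, values_read


def sum_year_column_store(columns):
    """Sum one field from the column layout; only that column is read."""
    values = columns["year"]
    return sum(values), len(values)


print(sum_year_row_store(rows))        # (84, 9)
print(sum_year_column_store(columns))  # (84, 3)
```

Both layouts return the same total, but the row layout touches nine values and the column layout three. The gap widens as tables gain more columns that a query does not need.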
Composite Indexes in Row Databases
MySQL and MongoDB support multi‑field composite indexes, but they consume extra storage and are limited by left‑most prefix rules, making them less flexible than columnar scans.
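The left‑most prefix rule can be illustrated with a deliberately simplified Python model. It considers equality filters only, and the index fields (`app`, `type`, `time`) are hypothetical; real query planners apply additional rules for ranges and sorting.

```python
def usable_prefix(index_fields, filter_fields):
    """Return the leading index fields that an equality filter can use.

    Simplified model: matching stops at the first index field that the
    filter does not constrain.
    """
    prefix = []
    for field in index_fields:
        if field not in filter_fields:
            break
        prefix.append(field)
    return prefix


index = ["app", "type", "time"]  # hypothetical composite index

print(usable_prefix(index, {"app", "type"}))   # both leading fields usable
print(usable_prefix(index, {"app", "time"}))   # only "app"; "type" is skipped
print(usable_prefix(index, {"type", "time"}))  # nothing; "app" is missing
```

A filter that omits the first indexed field gets no help from the index in this model, which is why each new query shape tends to require yet another composite index.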
Summary
After exhausting MongoDB optimizations, the author switched to ClickHouse, learning the core differences between row and column stores and recognizing that each has its own strengths for OLTP vs. OLAP workloads.
OLTP and OLAP
OLTP focuses on transactional write operations, while OLAP emphasizes analytical read queries. Traditional databases like MySQL excel at OLTP on modest data sizes, but large‑scale analytical queries benefit from columnar OLAP systems such as ClickHouse.
Data pipelines often extract from an OLTP source, transform, and load into an OLAP warehouse for analysis. The author’s front‑end monitoring now writes logs directly to ClickHouse, bypassing the need for a separate OLTP store for that data.
Pros and Cons of ClickHouse
ClickHouse offers excellent query performance and compression, and it often ranks near the top of analytical database benchmarks. Its main drawback is an ecosystem less mature than MySQL’s, with fewer third‑party tools and libraries.
Performance Comparison
Side‑by‑side tests showed ClickHouse delivering faster queries and lower storage usage than MongoDB for the same schema. ClickHouse also supports partition‑by‑month, allowing easy deletion of old partitions and automatic space reclamation.
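Why dropping a monthly partition is cheap can be pictured with a toy in‑memory model in Python. This is not how ClickHouse stores data on disk; the class and method names are hypothetical, and `to_yyyymm` only mimics the value that `toYYYYMM` produces for a timestamp.

```python
from collections import defaultdict
from datetime import datetime


def to_yyyymm(ts):
    """Mimic the month key of toYYYYMM, e.g. 2021-04-15 -> 202104."""
    return ts.year * 100 + ts.month


class PartitionedLog:
    """Toy log table that groups rows by month key."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, ts, status):
        self.partitions[to_yyyymm(ts)].append((ts, status))

    def drop_partition(self, yyyymm):
        """Discard a whole month at once; return how many rows went away."""
        return len(self.partitions.pop(yyyymm, []))

    def row_count(self):
        return sum(len(rows) for rows in self.partitions.values())


log = PartitionedLog()
log.insert(datetime(2021, 4, 1, 9, 0), 200)
log.insert(datetime(2021, 4, 20, 18, 30), 500)
log.insert(datetime(2021, 5, 2, 12, 0), 200)

removed = log.drop_partition(202104)
print(removed, log.row_count())
```

The month is removed as a unit instead of row by row, which is the property that made the scheduled cleanup job unnecessary.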
Creating a table partitioned by month:
CREATE TABLE xxx (
    time DateTime,
    status Int32
) ENGINE = MergeTree
PARTITION BY toYYYYMM(time)
ORDER BY time;
Deleting a month’s data:
ALTER TABLE xxx DROP PARTITION '202104';
Some Practical Notes
ClickHouse evolves quickly and newer APIs change between releases; ensure production, testing, and local environments run the same version.
ClickHouse supports the MySQL protocol but lacks features such as prepared statements and certain data type mappings.
ORMs designed for MySQL (e.g., Sequelize) cannot fully manage ClickHouse schemas; direct SQL or HTTP APIs are preferred.
ClickHouse’s HTTP interface can be used with third‑party client libraries; the author built a lightweight ClickHouse ORM for Node.js.
GUI tools like LightHouse provide simple query interfaces via HTTP.
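As a rough illustration of the HTTP interface mentioned in these notes, the Python sketch below prepares a request without sending it. It assumes a server on the default HTTP port 8123 at localhost and reuses the `xxx` table from the earlier example; the helper name is hypothetical. Query parameters are passed as `param_<name>` URL arguments and referenced in the SQL as `{name:Type}`.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_clickhouse_request(sql, params=None, base_url="http://localhost:8123/"):
    """Prepare (but do not send) a POST request for the ClickHouse HTTP interface.

    base_url is an assumption; adjust host, port, and credentials as needed.
    """
    query_args = {f"param_{name}": str(value) for name, value in (params or {}).items()}
    url = base_url
    if query_args:
        # quote_via=quote encodes spaces as %20 instead of "+".
        url += "?" + urlencode(query_args, quote_via=quote)
    # The SQL text itself travels in the request body.
    return Request(url, data=sql.encode("utf-8"), method="POST")


request = build_clickhouse_request(
    "SELECT status, count() AS hits FROM xxx "
    "WHERE time >= {since:DateTime} GROUP BY status FORMAT JSONEachRow",
    params={"since": "2021-04-01 00:00:00"},
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would execute the query against a running server. With `FORMAT JSONEachRow`, each line of the response body is one JSON object.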
Conclusion
Introducing ClickHouse dramatically improved the system’s performance, though MongoDB remains in use for metadata and frequently updated data. The migration journey involved extensive learning and experimentation, ultimately yielding valuable insights into database selection for large‑scale log storage.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
