ClickHouse Deployment in Lenovo Manufacturing: Architecture, Data Integration, and Performance Optimization
This article details Lenovo's implementation of ClickHouse in a manufacturing environment, covering the current data landscape, cluster architecture, integration challenges, and performance optimizations, including Seatunnel-based ingestion and query pre-aggregation. It illustrates how an OLAP engine can address real-time analytics and concurrency problems in production data pipelines.
Lenovo, a traditional manufacturing enterprise, faces a fragmented data environment with numerous business systems using heterogeneous databases such as MySQL, PostgreSQL, Hive, MongoDB, Oracle, and SQL Server, leading to slow T+1/T+2 reporting cycles.
The data flow moves from business systems to an integration platform, then to a data center containing various storage solutions, and finally to MySQL for front‑end consumption, resulting in latency of one to several weeks for metric delivery.
Key pain points include massive UPDATE operations on order records that cause deadlocks and extremely long SQL queries with dozens of LEFT JOINs, sometimes taking hours to complete.
The proposed solution records data as immutable events, leveraging an OLAP engine (ClickHouse) to perform direct, detailed queries without complex joins, and builds wide tables to simplify query logic.
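The append-only pattern can be sketched in ClickHouse SQL. The table and column names below are illustrative, not Lenovo's actual schema: each order change is inserted as a new event row, and the latest state is recovered at query time with `argMax` instead of updating rows in place.

```sql
-- Hypothetical order-event table: every change is appended as a new row.
CREATE TABLE order_events
(
    order_id   UInt64,
    status     LowCardinality(String),
    qty        UInt32,
    event_time DateTime
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(event_time)
ORDER BY (order_id, event_time);

-- Latest state per order, derived at read time -- no UPDATEs, no deadlocks.
SELECT
    order_id,
    argMax(status, event_time) AS latest_status,
    argMax(qty, event_time)    AS latest_qty
FROM order_events
GROUP BY order_id;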
ClickHouse is deployed in a two‑shard, two‑replica cluster managed by ZooKeeper for replication, with Nginx load balancing to distribute query traffic; data ingestion occurs via Kafka, separating write and read paths to improve concurrency.
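A two-shard, two-replica layout of this kind is typically expressed with a `ReplicatedMergeTree` local table, a `Distributed` table for fan-out queries, and a Kafka engine table feeding a materialized view. The sketch below assumes a cluster named `lenovo_cluster` and placeholder broker/topic names; only the overall wiring reflects the article.

```sql
-- Replicated local table on each of the 2x2 nodes; {shard} and {replica}
-- are filled in from each server's macros configuration.
CREATE TABLE orders_local ON CLUSTER lenovo_cluster
(
    order_id   UInt64,
    status     String,
    event_time DateTime
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/orders', '{replica}')
ORDER BY (order_id, event_time);

-- Distributed table that fans queries out across both shards.
CREATE TABLE orders_all ON CLUSTER lenovo_cluster AS orders_local
ENGINE = Distributed(lenovo_cluster, currentDatabase(), orders_local, rand());

-- Kafka engine table plus a materialized view keeps ingestion off the
-- query path (broker, topic, and consumer-group names are placeholders).
CREATE TABLE orders_queue
(
    order_id   UInt64,
    status     String,
    event_time DateTime
)
ENGINE = Kafka SETTINGS
    kafka_broker_list = 'kafka:9092',
    kafka_topic_list  = 'orders',
    kafka_group_name  = 'ch_orders',
    kafka_format      = 'JSONEachRow';

CREATE MATERIALIZED VIEW orders_mv TO orders_local AS
SELECT * FROM orders_queue;
```

Reads go through `orders_all` (behind Nginx), while writes arrive via Kafka, which is how the write and read paths stay separated.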
Initial JDBC‑based ingestion became a bottleneck at billions of rows due to heavy merge operations. Introducing Seatunnel allowed ClickHouse to receive pre‑merged data files directly, bypassing the merge step and dramatically increasing write throughput, especially for bulk historical loads.
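The merge pressure that made JDBC ingestion a bottleneck is visible in ClickHouse's system tables. These queries (thresholds left to the operator) show the symptom that a file-based, pre-merged load path avoids:

```sql
-- Active part counts per table: a steadily growing number under heavy
-- row-by-row inserts signals that background merges cannot keep up.
SELECT table, count() AS active_parts
FROM system.parts
WHERE active
GROUP BY table
ORDER BY active_parts DESC;

-- Currently running merges and their progress.
SELECT database, table, elapsed, progress
FROM system.merges;
```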
Query concurrency was further enhanced by separating read/write workloads, scaling ZooKeeper memory, and applying pre‑aggregation (Projection) techniques that reduced typical query times from seconds to milliseconds, at the cost of increased storage.
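A projection of the kind described might look like the following, assuming a hypothetical `order_events` detail table; the projection name and aggregation are illustrative. The projection stores per-day, per-status counts alongside the detail rows, trading extra storage for millisecond lookups:

```sql
-- Pre-aggregated projection maintained automatically on insert.
ALTER TABLE order_events
    ADD PROJECTION daily_status_counts
    (
        SELECT
            toDate(event_time) AS day,
            status,
            count()
        GROUP BY day, status
    );

-- Build the projection for parts that existed before it was added.
ALTER TABLE order_events MATERIALIZE PROJECTION daily_status_counts;
```

After materialization, aggregate queries that match the projection's `GROUP BY` are answered from the pre-computed data rather than the raw rows.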
The Q&A section addresses incremental synchronization via date partitions, Seatunnel's dual modes (JDBC and file‑based), replica recovery procedures, and methods to boost concurrency by adding shards or replicas.
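Incremental synchronization by date partition is commonly implemented as an idempotent drop-and-reload. The sketch below assumes a table partitioned by `toYYYYMMDD(event_time)` and a hypothetical staging table fed by Seatunnel; the date is a placeholder:

```sql
-- Re-sync one day: drop its partition, then re-insert the full day
-- from the staging area. Safe to re-run if the load fails midway.
ALTER TABLE order_events DROP PARTITION 20240101;

INSERT INTO order_events
SELECT order_id, status, qty, event_time
FROM source_staging
WHERE toYYYYMMDD(event_time) = 20240101;
```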
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.