Design and Optimization of Zhihu's Bridge Platform for DMP/CDP: Architecture, Challenges, and Solutions
This article presents a comprehensive case study of Zhihu's Bridge platform, detailing its background, five core modules, unified architecture built on Spark and Flink, bitmap‑based tagging, and performance optimizations that address query speed, write latency, and high‑QPS online checks while outlining future directions with Doris 2.0 and large language models.
Introduction: The article shares the research background, components, and problem‑solving approach of Zhihu’s Bridge platform, focusing on DMP/CDP implementation.
Background: Zhihu’s community product combines recommendation and search to match content and users; the Bridge platform serves as the operational terminal linking product and operation systems.
Platform capabilities: The Bridge platform integrates five core modules—Content Operations, Internal Marketing, Creator Operations, Data Center, and Tiered Operations—each described with functions such as content pool, ranking, CRM, DMP, and feature management.
Architecture and design: Business requirements are analyzed, leading to a unified middle‑system architecture that decouples services, shares resources, and uses bitmap‑based tagging, ID mapping, and feature pipelines built on Spark (offline) and Flink (real‑time).
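To make the bitmap-based tagging and ID mapping idea concrete, here is a minimal sketch, not the Bridge implementation. It assumes each user ID is mapped to a dense integer offset and each tag is stored as a bitmap over those offsets, so audience definitions reduce to bitwise set operations. The names `IdMapper`, `TagBitmaps`, and the sample tags are hypothetical, and a plain Python `int` stands in for a compressed bitmap structure.

```python
class IdMapper:
    """Hypothetical ID mapping: arbitrary user IDs -> dense integer offsets."""

    def __init__(self):
        self._offsets = {}

    def offset(self, user_id):
        # The first time an ID is seen it receives the next free offset.
        return self._offsets.setdefault(user_id, len(self._offsets))


class TagBitmaps:
    """Hypothetical tag store: one bitmap per tag, bit i set if user i has the tag."""

    def __init__(self, mapper):
        self.mapper = mapper
        self.bitmaps = {}  # tag name -> int used as a bitset (assumption)

    def add(self, tag, user_id):
        bit = 1 << self.mapper.offset(user_id)
        self.bitmaps[tag] = self.bitmaps.get(tag, 0) | bit

    def get(self, tag):
        return self.bitmaps.get(tag, 0)


def cardinality(bitmap):
    """Number of users in a bitmap (count of set bits)."""
    return bin(bitmap).count("1")


# Made-up sample data purely for illustration.
mapper = IdMapper()
tags = TagBitmaps(mapper)
sample = {"u1": ["creator", "active_7d"], "u2": ["active_7d"], "u3": ["creator"]}
for user_id, user_tags in sample.items():
    for tag in user_tags:
        tags.add(tag, user_id)

# Audience definitions become bitwise operations on tag bitmaps.
active_creators = tags.get("creator") & tags.get("active_7d")     # AND
inactive_creators = tags.get("creator") & ~tags.get("active_7d")  # AND NOT
anyone_tagged = tags.get("creator") | tags.get("active_7d")       # OR
```

In this sketch the offline (Spark) and real-time (Flink) pipelines mentioned above would be the producers that call something like `add`; how Bridge actually writes tags is not described in the summary.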
Key challenges and solutions: (1) Query speed: the team adopted bitmap inverted indexes, intelligent sharding, and Doris colocation groups to return audience estimates in under a second; (2) Write latency: heavy CPU-bound loading was moved to Spark and scheduled in off-peak windows, with Spark Load used to reduce the impact on online services; (3) High-QPS online checks: bitmap objects are cached in memory, serialization is bypassed, and GC is tuned so that thousands of audience-package checks complete within a few milliseconds.
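The query-speed and online-check ideas can be sketched roughly as follows. This is an illustration under stated assumptions, not the platform's code: users are sharded by offset modulo an assumed `NUM_SHARDS`, each shard holds its own tag bitmaps so intersections stay local to the shard (loosely analogous to what colocation achieves), and an estimate is the sum of per-shard counts. `ShardedTagIndex` and `AudienceChecker` are hypothetical names, and Python `int` values again stand in for real bitmap objects.

```python
from collections import defaultdict

NUM_SHARDS = 4  # assumed shard count, chosen only for the example


class ShardedTagIndex:
    """Hypothetical inverted index: shard -> tag -> bitmap of local user positions."""

    def __init__(self):
        self.shards = [defaultdict(int) for _ in range(NUM_SHARDS)]

    def add(self, tag, offset):
        shard = self.shards[offset % NUM_SHARDS]
        # Within a shard, the user's position is offset // NUM_SHARDS.
        shard[tag] |= 1 << (offset // NUM_SHARDS)

    def estimate_intersection(self, tag_a, tag_b):
        # Each shard intersects locally; only the counts are aggregated.
        total = 0
        for shard in self.shards:
            local = shard.get(tag_a, 0) & shard.get(tag_b, 0)
            total += bin(local).count("1")
        return total


class AudienceChecker:
    """Hypothetical online checker: audience packages held in memory as bitmaps."""

    def __init__(self):
        self._packages = {}  # package name -> bitmap, kept as a live object

    def load(self, name, offsets):
        bitmap = 0
        for offset in offsets:
            bitmap |= 1 << offset
        self._packages[name] = bitmap

    def matching_packages(self, offset):
        # A single bit test per package; nothing is deserialized per request.
        return [name for name, bm in self._packages.items() if (bm >> offset) & 1]


# Made-up data: offsets 0..9, one tag on even offsets, one on multiples of 3.
index = ShardedTagIndex()
for offset in range(10):
    if offset % 2 == 0:
        index.add("even", offset)
    if offset % 3 == 0:
        index.add("triple", offset)
estimated = index.estimate_intersection("even", "triple")  # users 0 and 6

checker = AudienceChecker()
checker.load("p1", [1, 5])
checker.load("p2", [5, 9])
hits = checker.matching_packages(5)
```

Keeping the bitmaps as in-memory objects means a check costs one bit test per package, which is the property the caching approach in point (3) relies on; the GC tuning mentioned there has no direct counterpart in this sketch.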
Future outlook: Plans include migrating to Doris 2.0 with native inverted indexes, integrating large language models for automated audience definition and workflow orchestration, and further scaling the system through partitioned proxy aggregation.
Conclusion: The talk closes with thanks to the audience.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.