Interview with Douban Chief Architect Hong QN: System Architecture, BeansDB, DAE, DPark and Team Practices
The interview with Douban's chief architect Hong QN details the platform's online and offline architecture, including load balancing, the DAE PaaS, the BeansDB key‑value store, the DPark big‑data processing engine, and the team organization and operational practices that support these systems.
Guest Introduction
Hong QN is Douban's chief architect and the company's first full‑time employee. A graduate of Tsinghua University, he started his career in embedded systems and began using Python in 2002, and he has a deep understanding of how a language works at the machine level.
Architecture
Douban's overall infrastructure is roughly divided into online and offline parts. The online side uses LVS for high availability and Nginx as a reverse proxy for load balancing. Application services run on the DAE platform, which hosts most of Douban's services today. Supporting services include MySQL, memcached, redis, beanstalkd, and the home‑grown KV store BeansDB.
BeansDB is an open‑source project started in 2008 and released in 2009. It began with Tokyo Cabinet as its storage engine and switched to a Bitcask‑style engine in 2010 for better performance. Keys are hashed to determine node placement, and each value is written to multiple nodes (currently three copies are written and one is read). BeansDB can store hundreds of terabytes; it keeps value types simple and operations easy while providing high availability and eventual consistency.
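To make "Bitcask‑style" concrete, here is a minimal sketch of the general idea: every write is appended to a log file, and an in‑memory index maps each key to the position of its latest value. The class name TinyBitcask and the record layout are hypothetical, chosen for illustration; the interview does not describe BeansDB's actual on‑disk format, and this toy omits compaction, checksums, and deletes.

```python
import os
import struct
import tempfile


class TinyBitcask:
    """Toy append-only store with an in-memory index (not BeansDB's real format)."""

    HEADER = struct.Struct(">II")  # key length, value length (assumed layout)

    def __init__(self, path):
        self.index = {}  # key -> (value offset, value length)
        # "a+b": all writes are appended, reads may seek anywhere.
        self.f = open(path, "a+b")
        self._rebuild_index()

    def _rebuild_index(self):
        # Scan the log once at startup; later records override earlier ones.
        self.f.seek(0)
        pos = 0
        while True:
            header = self.f.read(self.HEADER.size)
            if len(header) < self.HEADER.size:
                break
            klen, vlen = self.HEADER.unpack(header)
            key = self.f.read(klen)
            value_offset = pos + self.HEADER.size + klen
            self.f.seek(vlen, os.SEEK_CUR)  # skip the value itself
            self.index[key] = (value_offset, vlen)
            pos = value_offset + vlen

    def put(self, key: bytes, value: bytes):
        self.f.seek(0, os.SEEK_END)
        pos = self.f.tell()
        self.f.write(self.HEADER.pack(len(key), len(value)) + key + value)
        self.f.flush()
        self.index[key] = (pos + self.HEADER.size + len(key), len(value))

    def get(self, key: bytes):
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, length = entry
        self.f.seek(offset)
        return self.f.read(length)

    def close(self):
        self.f.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data.log")
    store = TinyBitcask(path)
    store.put(b"review:1", b"first draft")
    store.put(b"review:1", b"final text")  # old record stays in the log
    print(store.get(b"review:1"))
    store.close()
```

A read costs one index lookup plus one seek, and a write is a single sequential append, which is the usual argument for this design when values are small and numerous.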
Two BeansDB clusters are deployed internally: doubandb stores small textual data such as reviews and user profiles, reducing MySQL load, while doubanfs handles medium‑sized media like images and audio.
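The placement scheme described for BeansDB (hash the key to pick nodes, write three copies, read one) can be sketched as follows. Everything specific here is an assumption made for illustration: the class name ReplicatedKV, the use of MD5, the "next N nodes in the list" rule, and the way failed nodes are skipped. The interview gives only the high‑level behavior.

```python
import hashlib


class ReplicatedKV:
    """Toy hash-placed store: write to N nodes, read from the first one that answers."""

    def __init__(self, node_names, replicas=3):
        if replicas > len(node_names):
            raise ValueError("need at least as many nodes as replicas")
        self.names = list(node_names)
        # A plain dict stands in for each storage node.
        self.nodes = {name: {} for name in self.names}
        self.replicas = replicas
        self.down = set()  # names of nodes treated as unreachable

    def owners(self, key):
        # Hash the key to a starting node, then take the next N nodes in order.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.names)
        return [self.names[(start + i) % len(self.names)]
                for i in range(self.replicas)]

    def set(self, key, value):
        # Write every reachable replica; return how many copies were stored.
        written = 0
        for name in self.owners(key):
            if name not in self.down:
                self.nodes[name][key] = value
                written += 1
        return written

    def get(self, key):
        # One successful read is enough; fall through to the next replica on failure.
        for name in self.owners(key):
            if name not in self.down and key in self.nodes[name]:
                return self.nodes[name][key]
        return None


if __name__ == "__main__":
    kv = ReplicatedKV(["node%d" % i for i in range(5)])
    print(kv.set("user:42", "profile data"))
    kv.down.add(kv.owners("user:42")[0])  # simulate losing the first replica
    print(kv.get("user:42"))
```

Because a read succeeds as soon as any one replica answers, the store stays available when a node is lost. The cost is that a replica that missed a write may serve stale data until it is repaired, which is what eventual consistency means in practice.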
DAE is an internal PaaS built on many existing components. Because it serves only internal applications, its security and isolation requirements are simpler than those of a public cloud. It currently supports Python applications, with plans to add Go support.
The offline side focuses on data mining and analysis, using the MooseFS distributed file system (a C‑based HDFS‑like system with a well‑implemented FUSE module) and the custom distributed computing platform DPark.
DPark, originally a Python implementation of Spark, has diverged from Spark and leverages in‑memory caching to accelerate iterative algorithms, which is crucial for Douban's recommendation workloads. It processes 60–100 TB of data per day and benefits from functional‑style programming for concise code.
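The benefit of in‑memory caching for iterative algorithms can be seen in a small pure‑Python sketch. The LazyDataset class below is hypothetical and is not the DPark or Spark API; it only imitates the pattern of lazily evaluated, functional‑style transformations with an optional cache. The iterative loop is a stand‑in for a real workload: it repeatedly scans the same data to refine an estimate of the mean.

```python
import functools


class LazyDataset:
    """Toy lazily evaluated dataset (illustrative only, not the DPark API)."""

    def __init__(self, compute):
        self._compute = compute  # zero-argument function returning an iterable
        self._keep = False
        self._data = None

    def map(self, fn):
        return LazyDataset(lambda: [fn(x) for x in self.collect()])

    def filter(self, pred):
        return LazyDataset(lambda: [x for x in self.collect() if pred(x)])

    def cache(self):
        # Ask for the result to be kept in memory after the first evaluation.
        self._keep = True
        return self

    def collect(self):
        if self._data is not None:
            return self._data
        data = list(self._compute())
        if self._keep:
            self._data = data
        return data

    def reduce(self, fn):
        return functools.reduce(fn, self.collect())


def run_demo(use_cache=True, steps=50):
    loads = {"count": 0}

    def expensive_load():
        # Stands in for reading and parsing a large input from disk.
        loads["count"] += 1
        return range(1, 6)

    ratings = LazyDataset(expensive_load).map(float)
    if use_cache:
        ratings = ratings.cache()

    n = len(ratings.collect())
    estimate = 0.0
    for _ in range(steps):
        # Each iteration scans the whole dataset again.
        gradient = ratings.map(lambda r: estimate - r).reduce(lambda a, b: a + b) / n
        estimate -= 0.5 * gradient
    return estimate, loads["count"]


if __name__ == "__main__":
    print(run_demo(use_cache=True))
    print(run_demo(use_cache=False))
```

With the cache enabled the source is loaded once, however many iterations run; without it, every pass reloads and re‑parses the input. On a real cluster that difference is repeated disk and network I/O, which is why caching matters for iterative workloads such as recommendation.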
Team
The Douban Platform Department consists of four groups: Core System (6 engineers, led by Hong QN), DAE (4 engineers, led by Peng Yu), DBA (2 engineers), and SA (2 engineers). The department focuses on infrastructure that is not directly visible to users, allowing product teams to concentrate on user‑facing features.
Where a project starts depends on whether it is shared infrastructure or specific to one business line. For example, the SMS service started as a product‑line need, later became a shared service, and was transferred to the SA team for maintenance.
Core System projects include DPark, BeansDB, MooseFS, search services, and long‑connection push services. Code review is mandatory to promote knowledge sharing, and the primary owner of a project is responsible for its operation, including fault response and gray‑release deployments.
The Platform Department operates without dedicated product managers; engineers identify problems and drive solutions themselves. New technology adoption follows a strict vetting process: successful, comparable‑scale case studies are required, and the team must fully understand and be able to modify the technology, which is why Java is rarely introduced despite its popularity.
Interviewer Introduction
Zhuang Biaowei, currently at Huawei's 2012 Lab R&D Capability Center, has been involved with computers and programming since the mid‑80s and focuses on applying open‑source community practices within enterprises.
