Why We Chose Kafka for Our Open‑Source Real‑Time Streaming Platform
The article explains how market trends, data‑driven enterprise needs, and internal platform experience led Didi to build Know Streaming—a zero‑intrusion, plugin‑based real‑time streaming solution built on Kafka—to address scalability, operability, and community adoption challenges.
In the previous post we described the thinking behind the Nightingale product; this article details the motivations and decisions behind the open‑source Know Streaming project (https://github.com/didi/KnowStreaming) and aims to help other open‑source enthusiasts join the effort.
Market drivers and strategic context
Two main factors shape our direction. First, our company is a data‑driven enterprise that experiences data’s value in production every day. Second, the Chinese government now treats data as a production factor alongside traditional resources, as outlined in the central government’s policy on improving factor market allocation. Combined with rapid internet adoption and the rise of the digital and intelligent economies, the commercial scale of the data market keeps expanding.
Why real‑time stream processing still has room
Our analysis identified three reasons the real‑time streaming niche remains promising:
Speed is paramount: data delivers the most value when it is perceived the instant it is produced. This is especially true in domains such as social networks, news, financial risk control, supply‑demand matching, and instant decision‑making, where latency must approach zero.
Existing open‑source solutions (e.g., Kafka, Flink) are easy to adopt but become costly to operate at large scale; enterprise‑grade stability and availability are often missing, making it hard to find reliable service providers or achieve full self‑control.
Our internal platform has handled peaks over 80 GB/s, served more than 1,500 data‑driven applications, and maintained an SLA above 99.95 %, evolving through componentization, service‑orientation, productization, and intelligence, each stage bringing stability improvements.
Choosing a Kafka‑based, zero‑intrusion, plugin architecture
Given our cloud‑native capabilities and experience, we decided to build a Kafka‑centric platform that requires no invasive changes to Apache Kafka, aiming to capture the “first‑kilometer” of data consumption in the ecosystem.
Why base the platform on Kafka?
Kafka is the de facto standard for data transport; most enterprises already run one or more Kafka clusters, so a compatible processing layer is easy for users to adopt. Moreover, mainstream streaming stacks (Kafka + Storm, Kafka + Flink) require additional components, whereas Kafka + Kafka Streams forms a closed‑loop system that reduces development and operational complexity and saves resources.
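The closed‑loop point can be illustrated with a minimal Kafka Streams sketch: the application reads from one topic, transforms records in‑process, and writes back to another topic, with no separate Storm or Flink cluster involved. The topic names (`raw-events`, `clean-events`), the broker address, and the uppercase transform below are illustrative assumptions, not part of the Know Streaming project itself.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import java.util.Properties;

public class ClosedLoopSketch {
    // Read from one Kafka topic, transform in-process, and write back to
    // another topic -- no external compute engine is involved.
    public static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("raw-events")                       // hypothetical input topic
               .mapValues(v -> v.toString().toUpperCase()) // stateless transform
               .to("clean-events");                        // hypothetical output topic
        return builder.build();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "closed-loop-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        KafkaStreams streams = new KafkaStreams(buildTopology(), props);
        streams.start(); // the entire pipeline runs inside this single JVM process
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Because both the source and the sink are Kafka topics, the only moving parts are the Kafka cluster already in place and this one JVM process, which is the resource‑saving argument the article makes.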
Addressing Kafka’s operational challenges
Although Kafka is widely deployed, mastering it or building an enterprise‑grade platform remains difficult because clusters vary in version and age, creating isolated data islands. Our solution provides a zero‑intrusion, plugin‑based control plane that unifies management, lowers the operational barrier, and improves resource efficiency while preserving existing investments.
Feature pillars for three user roles
Developers: complex CLI commands are replaced by a user‑friendly GUI backed by a full multi‑tenant system, delivering what the team describes as a 10× usability improvement and zero‑threshold access to streaming capabilities.
Operations: Compatibility spans Kafka 0.10 to 3.x, allowing seamless, non‑intrusive cluster onboarding, rich metrics, and diagnostic tools that codify expert knowledge into standard utilities, reducing operational cost.
Architects: High‑level functions are modularized as plugins, decoupling from native Kafka, giving enterprises control over real‑time data integration while mitigating vendor lock‑in.
Community feedback and the data‑integration flywheel
Based on community input, the platform addresses most real‑time transmission pain points, encouraging more data to be integrated, which in turn creates a virtuous cycle between data volume and platform stability.
Connectors Hub for zero‑code integration
To accelerate the flywheel, we built a Connectors Hub that lets users integrate upstream and downstream systems without writing code, supports drag‑and‑drop data processing, and provides visual monitoring of processing progress and data quality.
Open‑source traction and recognition
Since its launch, Know Streaming has attracted over ten WeChat groups, nearly 5,000 users on Knowledge Planet, and more than 6,000 GitHub stars (https://github.com/didi/KnowStreaming). It has also been selected for several national open‑source accolades, including the 2021 Trusted Open‑Source Project list and the 2022 Trusted Open‑Source Community Galaxy Plan.
Future outlook
While the project has solved many practical problems, sustainability remains a concern because most contributors are “powered by love.” We believe that commercializing the open‑source effort will be essential for broader adoption and long‑term success.
ShiZhen AI
Tech blogger with over 10 years of experience at leading tech firms; AI efficiency and delivery expert focusing on AI productivity. Covers tech gadgets, AI‑driven efficiency, and leisure. Contact: szzdzhp001