Applying Doris OLAP Data Warehouse in NIO Automotive: Architecture, Evaluation, and Practices

This technical presentation traces how NIO's OLAP stack evolved from Druid to TiDB to Doris. It covers the selection criteria, Doris's advantages as a unified OLAP warehouse, its role in the CDP platform, practical deployment experience, and lessons learned from real‑world usage.

DataFunTalk

Speaker: Tang Huaidong, Head of Data Team, NIO

Overview: The talk introduces the evolution of OLAP technologies at NIO, covering the adoption of Druid (2017), TiDB (2019), and Doris (2021), and explains why Doris was ultimately chosen as the unified OLAP data warehouse.

1. OLAP Development at NIO

2017: Introduced Druid for real‑time and offline analytics. Its strengths were columnar storage and high concurrency; its drawbacks were the lack of standard protocol support, weak join support, high operational cost, and dimension explosion.

2019: Adopted TiDB, a distributed OLTP database, with TiFlash adding OLAP capability. It offered MySQL compatibility and low update cost, but OLAP performance was limited and storage use was higher because OLTP and OLAP were coupled.

2021: Adopted Doris, benefiting from the recent maturing of data infrastructure software developed in China. It was chosen for its high concurrency, support for both real‑time and offline data, support for both detail and aggregation queries, update capability, MySQL compatibility, performance, and low operational cost.

2. Doris as a Unified OLAP Warehouse

Doris sits at the center of NIO's data pipeline, which has six layers: data sources (business, telemetry, and vehicle data), ingestion (CDC to Kafka, plus batch loads), computation (a Lambda architecture with separate batch and stream paths), storage (Doris, loaded through Routine Load for streams and Broker Load for batches), service (API generation, traffic control, and permission control), and application (dashboards and reporting).

3. CDP Platform Architecture

The CDP (customer data platform) consists of four modules: tags (basic and behavior), audience segmentation, insights, and outreach. Doris stores the tag data, audience definitions, and effect‑analysis results, and serves both real‑time and offline queries over them.

4. Storage Selection Considerations

Unified offline & real‑time access via Doris Routine Load and Broker Load.

High‑efficiency audience selection using vectorized query acceleration.

Efficient aggregation through data sharding, node‑level aggregation, and vectorized execution.

Multi‑table joins supported by Doris, crucial for custom tag calculations.

Federated queries combining Doris with TiDB for OLTP‑OLAP scenarios.
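
The sharding and node‑level aggregation mentioned above can be illustrated with a small two‑phase sum. This is a minimal sketch in plain Python, not Doris code: the shard function, the function names, and the sample rows are all hypothetical, and vectorized execution is not modeled. It only shows the general idea that each shard pre‑aggregates its own rows, so the final merge step handles a few partial results instead of every raw row.

```python
from collections import defaultdict


def shard_of(key: str, num_shards: int) -> int:
    """Toy deterministic shard function (assumption; not Doris's hash)."""
    return sum(key.encode("utf-8")) % num_shards


def partial_aggregate(rows):
    """Node-level step: one shard sums its own rows per group key."""
    acc = defaultdict(int)
    for group, value in rows:
        acc[group] += value
    return dict(acc)


def merge_partials(partials):
    """Final step: combine the small per-shard results into one total."""
    totals = defaultdict(int)
    for partial in partials:
        for group, subtotal in partial.items():
            totals[group] += subtotal
    return dict(totals)


def sharded_sum(rows, num_shards=3):
    """Distribute rows by user_id, aggregate per shard, then merge."""
    shards = [[] for _ in range(num_shards)]
    for user_id, group, value in rows:
        shards[shard_of(user_id, num_shards)].append((group, value))
    return merge_partials(partial_aggregate(shard) for shard in shards)


# Made-up sample rows: (user_id, group key, metric value).
rows = [
    ("u1", "model_a", 2),
    ("u2", "model_b", 1),
    ("u3", "model_a", 5),
    ("u4", "model_b", 4),
    ("u5", "model_c", 3),
]
result = sharded_sum(rows)
print(result)
```

The result does not depend on how rows are spread across shards, which is what makes it safe to push the first aggregation step down to each node.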

5. Practical Experience and Lessons Learned

Bitmap aggregation pays off only at very large cardinalities (for example, more than 50 million unique IDs).

External tables on Elasticsearch provide fast point queries but poor aggregation and join performance.

Column‑wise batch updates using REPLACE_IF_NOT_NULL allow different update frequencies without conflicts.

Separate online services are preferred to avoid interference between real‑time and offline workloads.

Future work includes better Doris management via Manager and resource isolation.
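
To make the bitmap point above concrete, here is a minimal sketch of audience selection over per‑tag bitmaps. It is a pure‑Python stand‑in that uses integers as bitmaps; it does not call Doris's bitmap functions, and the tag names and user IDs are invented for illustration. It shows only the mechanics (one bit per user ID, set operations as bitwise operations), not the performance behavior the talk describes.

```python
def to_bitmap(user_ids):
    """Encode a collection of non-negative integer user IDs as one int bitmap."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def cardinality(bitmap):
    """Count the users in a bitmap (number of set bits)."""
    return bin(bitmap).count("1")


def members(bitmap):
    """Decode a bitmap back into a sorted list of user IDs."""
    ids = []
    uid = 0
    while bitmap:
        if bitmap & 1:
            ids.append(uid)
        bitmap >>= 1
        uid += 1
    return ids


# Hypothetical tags, each holding the set of users that carry the tag.
tag_bitmaps = {
    "owns_vehicle": to_bitmap([1, 2, 3, 5, 8]),
    "active_30d": to_bitmap([2, 3, 4, 8, 9]),
    "opted_out": to_bitmap([3]),
}

# Audience = owners AND recently active AND NOT opted out.
audience = (
    tag_bitmaps["owns_vehicle"]
    & tag_bitmaps["active_30d"]
    & ~tag_bitmaps["opted_out"]
)

print(cardinality(audience), members(audience))
```

With only a handful of IDs, as here, a plain Python set would do the same job; per the lesson above, the bitmap representation is worth its overhead only once the number of distinct IDs becomes very large.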

6. Q&A Highlights

Conflict‑free simultaneous updates are achieved with REPLACE_IF_NOT_NULL.

Tags are stored in Doris as either atomic base tags or aggregated behavior tags, with custom tags added as new columns in wide tables.

Multi‑table joins are a key reason for choosing Doris, enabling flexible custom tag logic.

Not all warehouse data lives in Doris: large vehicle‑level detail data remains in Hive, while Doris handles the real‑time OLAP workloads.
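
The column‑wise update behavior attributed to REPLACE_IF_NOT_NULL above can be simulated in a few lines. This is a sketch of the merge semantics only, written in plain Python under the assumption that each load job writes the key plus its own columns and leaves every other column null; the table layout, column names (`city`, `visits_7d`), and batch contents are hypothetical and are not NIO's schema.

```python
def replace_if_not_null(current, incoming):
    """Merge one incoming row into the stored row for the same key.

    A non-null incoming value replaces the stored value; a null (None)
    leaves the stored value untouched. Columns seen for the first time
    are created even if null, so the row keeps a stable shape.
    """
    merged = dict(current)
    for column, value in incoming.items():
        if value is not None or column not in merged:
            merged[column] = value
    return merged


def apply_batches(table, batches):
    """Apply load batches in order; each row is keyed by user_id."""
    for batch in batches:
        for row in batch:
            key = row["user_id"]
            table[key] = replace_if_not_null(table.get(key, {}), row)
    return table


# Hypothetical jobs on different schedules, each filling only its own column.
daily_base_tags = [{"user_id": 1, "city": "A", "visits_7d": None}]
hourly_behavior_tags = [{"user_id": 1, "city": None, "visits_7d": 4}]
next_daily_base_tags = [{"user_id": 1, "city": "B", "visits_7d": None}]

wide_table = apply_batches(
    {}, [daily_base_tags, hourly_behavior_tags, next_daily_base_tags]
)
print(wide_table[1])
```

In this simulation the second daily batch updates `city` without wiping the hourly `visits_7d` value, which is the property that lets columns of one wide tag table be refreshed at different frequencies.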

Thank you for attending.
