
Evaluation and Deployment of DorisDB for Analytical Workloads at 58 Group

This article details 58 Group's comprehensive evaluation of DorisDB, TiFlash, and ClickHouse for large‑scale analytical workloads, covering functional and performance benchmarks, real‑world use cases such as security analysis and DBA operations, data ingestion methods, cluster architecture, automation practices, and lessons learned.

DataFunTalk

58 Group, a leader in China's internet life‑service sector, needed a high‑performance analytical platform to support security analysis, business intelligence, and data‑warehouse reporting across massive data volumes.

Starting in early 2021, the DBA team evaluated three analytical databases: DorisDB, TiFlash (4.0.10), and ClickHouse (20.3.8.53). The evaluation covered both functionality and performance, with the Star Schema Benchmark as the performance workload.

Functional Evaluation

DorisDB met all functional requirements, offering comprehensive SQL support, materialized views, and multi‑table query capabilities suitable for the group's business scenarios.

Performance Evaluation

Overall query time (single‑table and multi‑table) was shortest with DorisDB.

For single‑table queries, DorisDB was fastest, followed by ClickHouse.

For multi‑table joins, DorisDB consistently outperformed the others.

TiFlash and TiDB showed the longest execution times due to TiKV‑based plans and limited MPP support in the tested version.

ClickHouse required SQL adjustments for multi‑table joins and was case‑sensitive, but delivered strong single‑node performance.

Security Analysis Use Case

To handle billions of security log entries per day, the team initially stored raw detail data in DorisDB. The table reached 800 billion rows (≈8 TB) in 20 days, and query speed degraded. The team then switched to an aggregated model keyed on time‑bucket dimensions (day, hour, 15 min), which cut data volume by 75 % and substantially improved query performance. Ingestion uses Kafka with routine load.
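The time‑bucket idea can be illustrated with a minimal Python sketch. It is not the team's implementation: in DorisDB the aggregation happens inside the table model, and the columns `source_ip` and `event_type` are hypothetical stand-ins for whatever dimensions the security logs carry. The sketch only shows why rounding timestamps down to a bucket collapses many detail rows into a single counted row.

```python
from collections import Counter
from datetime import datetime, timezone

BUCKET_SECONDS = 15 * 60  # 15-minute bucket; hour and day buckets follow the same pattern


def bucket_start(ts: datetime, bucket_seconds: int = BUCKET_SECONDS) -> datetime:
    """Round a timezone-aware timestamp down to the start of its bucket (UTC-aligned)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % bucket_seconds, tz=timezone.utc)


def aggregate(events):
    """Collapse (timestamp, source_ip, event_type) detail rows into per-bucket counts.

    The dimension names are assumptions made for this example only.
    """
    counts = Counter()
    for ts, source_ip, event_type in events:
        counts[(bucket_start(ts), source_ip, event_type)] += 1
    return counts


# Three detail rows: two fall in the 10:00 bucket, one in the 10:15 bucket.
events = [
    (datetime(2021, 3, 1, 10, 7, 0, tzinfo=timezone.utc), "10.0.0.1", "login_fail"),
    (datetime(2021, 3, 1, 10, 14, 59, tzinfo=timezone.utc), "10.0.0.1", "login_fail"),
    (datetime(2021, 3, 1, 10, 15, 0, tzinfo=timezone.utc), "10.0.0.1", "login_fail"),
]
aggregated = aggregate(events)
```

Here three detail rows become two aggregated rows. The actual reduction depends on how often the same dimension values repeat within a bucket, so the 75 % figure reported above is specific to the team's data.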

DBA Internal Business

ProxySQL logs were first routed to Elasticsearch via Filebeat, Kafka, and Logstash, but SQL‑based analysis proved more convenient. The pipeline was later changed to ProxySQL → Filebeat → Kafka → DorisDB, enabling fast SQL queries on log data. A selective logging strategy reduced storage pressure.

Data Ingestion

DorisDB supports several import methods: local files, HDFS, Kafka (CSV or JSON), external tables, and batch SQL. Points to watch include supplying the HDFS NameNode information, defining table schemas up front, and giving proper field definitions when loading JSON from Kafka. Routine load status can be checked with SHOW ROUTINE LOAD\G in the MySQL client.
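As a rough illustration of what "proper field definitions" means for Kafka JSON, the sketch below assembles a routine load statement from a column list and a matching list of JSON paths. The statement shape follows the Apache Doris / StarRocks style of `CREATE ROUTINE LOAD` and may differ in the DorisDB version the team ran, so check it against that version's documentation. The database, job, table, column, broker, and topic names are all hypothetical.

```python
import json


def build_routine_load(db, job, table, columns, json_paths, brokers, topic):
    """Return a CREATE ROUTINE LOAD statement for a JSON Kafka topic.

    Assumes Doris/StarRocks-style syntax; every identifier passed in is illustrative.
    """
    if len(columns) != len(json_paths):
        raise ValueError("each column needs exactly one JSON path")
    # jsonpaths is itself a JSON array embedded in a quoted property value,
    # so its inner double quotes are backslash-escaped.
    paths = json.dumps(json_paths, separators=(",", ":")).replace('"', '\\"')
    return (
        f"CREATE ROUTINE LOAD {db}.{job} ON {table}\n"
        f"COLUMNS({', '.join(columns)})\n"
        "PROPERTIES (\n"
        '    "format" = "json",\n'
        f'    "jsonpaths" = "{paths}"\n'
        ")\n"
        "FROM KAFKA (\n"
        f'    "kafka_broker_list" = "{brokers}",\n'
        f'    "kafka_topic" = "{topic}"\n'
        ");"
    )


statement = build_routine_load(
    db="sec",
    job="load_events",
    table="events",
    columns=["ts", "ip"],
    json_paths=["$.ts", "$.ip"],
    brokers="kafka1:9092",
    topic="sec_events",
)
```

Keeping the column list and the JSON paths in one place makes a mismatch fail early, before the job is submitted, which is where field-mapping mistakes are cheapest to catch.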

Cluster Architecture

The current deployment consists of a single cluster with 3 Front‑End (FE) nodes, 3 Back‑End (BE) nodes, and on‑demand brokers, monitored by Prometheus + Grafana. Kafka is recommended for data ingestion.

Operations & Automation

Since DorisDB Standard Edition lacks built‑in management, the DBA team implemented standards for operation, deployment, scaling, upgrades, topology visualization, alerts, and reporting (e.g., table size, disk usage, SQL statistics). Tools like qdorisdb provide cluster overview and login capabilities.
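The article does not describe how the team's alerts or reports are built. As a hedged sketch of the disk-usage part, the function below takes per-node figures that some collector has already gathered and lists the nodes at or above a threshold. The input shape, the 80 % threshold, and the host names are assumptions for illustration.

```python
def disk_alerts(backends, threshold=0.80):
    """Return [(host, usage_ratio)] for nodes at or above the threshold, fullest first.

    backends: iterable of (host, used_bytes, total_bytes); a hypothetical input shape.
    """
    alerts = []
    for host, used, total in backends:
        if total <= 0:
            raise ValueError(f"total capacity for {host} must be positive")
        ratio = used / total
        if ratio >= threshold:
            alerts.append((host, round(ratio, 3)))
    return sorted(alerts, key=lambda item: item[1], reverse=True)


# Hypothetical readings for three BE nodes (values in arbitrary units).
report = disk_alerts([("be1", 900, 1000), ("be2", 500, 1000), ("be3", 850, 1000)])
```

With these sample readings, `be1` and `be3` are reported and `be2` is not.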

Issues & Recommendations

Plan ports carefully for mixed deployments.

Upgrade promptly to pick up bug fixes (e.g., an issue involving max_allowed_packet).

Account permissions differ from MySQL; adjust accordingly.

JSON import lacks field reuse; coordination with developers is needed.

Kafka data debugging requires custom tooling.
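For the port-planning recommendation above, a small pre-deployment check can catch collisions when FE, BE, and other services share a host. The sketch below is one possible approach, not the team's tooling. The host name, service names, and port numbers in the sample plan are illustrative and should be replaced with the values from your own configuration files.

```python
from collections import defaultdict


def find_port_conflicts(assignments):
    """Return {(host, port): [service.port_name, ...]} for ports claimed more than once.

    assignments: iterable of (host, service, port_name, port).
    """
    claimed = defaultdict(list)
    for host, service, port_name, port in assignments:
        claimed[(host, port)].append(f"{service}.{port_name}")
    return {key: users for key, users in claimed.items() if len(users) > 1}


# Illustrative plan for one mixed-deployment host; all values are assumptions.
plan = [
    ("node1", "fe", "http_port", 8030),
    ("node1", "fe", "query_port", 9030),
    ("node1", "be", "webserver_port", 8040),
    ("node1", "other_app", "listen_port", 8030),
]
conflicts = find_port_conflicts(plan)
```

In this plan the FE HTTP port and the other application's listen port collide on `node1`, so one of them has to move before deployment.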

Conclusion

Two DorisDB clusters are already in production, with additional clusters in testing. The evaluation demonstrated DorisDB's superior performance and suitability for diverse analytical scenarios, and the team expressed gratitude to the DorisDB vendor for their support.
