Tagged articles
5 articles
ITPUB
Apr 8, 2022 · Big Data

How to Build a Billion-Scale Real-Time Data Warehouse with ClickHouse

This article explains how a large‑scale advertising platform replaced its slow offline data warehouse with a ClickHouse‑based real‑time warehouse. It covers data source integration, performance comparison, materialized views, projections, schema management, and cost‑effective hot‑cold storage strategies.

ClickHouse · Kafka Integration · Materialized Views
19 min read
360 Tech Engineering
Jul 18, 2019 · Databases

Principles and Practices of Apache Doris: Architecture, Key Technologies, and Real‑World Use Cases

This article gives an overview of Apache Doris: its positioning as a distributed MPP analytical database, its core architecture of FE and BE nodes, key technologies such as vectorized execution and materialized views, integration with Kafka and Elasticsearch, additional features, and the roadmap. Detailed case studies from Baidu Statistics and Meituan illustrate its practical deployment and performance characteristics.

Apache Doris · Columnar Storage · Data Warehouse
25 min read
Meitu Technology
Aug 2, 2018 · Big Data

Spark Streaming vs Flink – Architecture, Scheduling & Fault Tolerance

This article compares Spark Streaming and Flink across runtime models, component roles, programming APIs, task scheduling, time semantics, dynamic Kafka partition detection, fault‑tolerance mechanisms, exactly‑once guarantees, and back‑pressure handling. It includes code examples and practical insights for real‑time data processing.

Dynamic Partition Detection · Exactly-Once · Flink
23 min read
dbaplus Community
Sep 12, 2016 · Big Data

Apache Flume Quickstart: Log Collection and Kafka Integration

This article introduces Apache Flume, explains its design goals of reliability, scalability, manageability and extensibility, outlines core concepts and architecture, provides step‑by‑step configuration using the first mode, demonstrates integration with Zookeeper, Kafka and a shell script, and shows how to launch and verify the agent.

Apache Flume · Big Data · Kafka Integration
7 min read