Tagged articles
278 articles
ITPUB
Mar 13, 2023 · Databases

10 Years of Amazon Redshift: From MPP to Serverless and Real‑Time Data Warehousing

This article traces a decade of Amazon Redshift’s evolution from a traditional MPP warehouse to a fully cloud‑native Serverless architecture, covering its underlying innovations and key features such as Concurrency Scaling, built‑in ML, and Data Sharing, and offering best‑practice guidance for real‑time analytics across diverse industry scenarios.

Amazon Redshift, Concurrency Scaling, Data Warehouse
0 likes · 17 min read
DataFunSummit
Mar 9, 2023 · Big Data

Designing Efficient and Agile Real-Time Big Data Analytics Platforms for Enterprises

The article explains how enterprises can build a comprehensive big data analytics platform—covering data collection, storage, computation, and decision layers—by clarifying business scenarios, choosing appropriate on‑premise or cloud deployment, selecting suitable architectures such as Lambda/Kappa, and addressing component choices and emerging technical trends.

Big Data, Data Architecture, Real-time analytics
0 likes · 9 min read
StarRocks
Feb 21, 2023 · Databases

How Yidian Tianxia Built a Unified Real‑Time & Offline Data Warehouse with StarRocks

Yidian Tianxia tackled massive daily data volumes and complex analytics by defining a five‑layer data‑warehouse standard, comparing ClickHouse and StarRocks performance, and implementing a unified real‑time/offline architecture with StarRocks, DataPlus, and EasyJob, achieving multi‑fold query speedups and lower operational costs.

ClickHouse, Data Governance, Data Warehouse
0 likes · 14 min read
ITPUB
Feb 13, 2023 · Databases

How Apache Doris Enables Cloud‑Native Real‑Time Data Warehousing for Log Analytics

Based on a DTCC2022 presentation, this article explains Apache Doris's high‑performance MPP architecture, its cloud‑native extensions in SelectDB, and how they solve large‑scale log storage and analysis with superior write throughput, storage efficiency, and interactive query speed.

Apache Doris, MPP, Real-time analytics
0 likes · 11 min read
Alibaba Cloud Big Data AI Platform
Jan 10, 2023 · Big Data

How Alibaba’s Dolphin Engine Uses Flink + Hologres for Real‑Time Big Data

The Dolphin engine, built by Alibaba’s Data Engine team, combines Flink and Hologres to deliver ultra‑large‑scale OLAP, streaming, batch, and AI capabilities for real‑time advertising analytics, offering smart materialization, intelligent indexing, and vector recall while supporting millions of advertisers and petabyte‑level data.

AI, Big Data, Flink
0 likes · 13 min read
DataFunSummit
Jan 8, 2023 · Big Data

Streaming‑Batch Integrated Real‑time Multi‑dimensional Analysis

This article presents a comprehensive overview of evolving big‑data architectures—from classic offline warehouses to Lambda and Kappa models—and details a streaming‑batch integrated solution that addresses latency, data freshness, and multi‑table join challenges to achieve minute‑level real‑time multi‑dimensional analytics.

Batch Processing, Data Warehouse, Kappa architecture
0 likes · 18 min read
ITPUB
Dec 21, 2022 · Databases

How OpenMLDB Guarantees Real‑Time, Consistent Features for Machine Learning at Scale

This article explains the data and feature engineering challenges of deploying machine learning, introduces OpenMLDB’s open‑source architecture—including offline Spark‑based processing, a high‑availability online engine with dual‑layer memory indexes, snapshot/binlog persistence, and pre‑aggregation techniques—then showcases real‑world case studies and the project’s roadmap.

Feature Store, OpenMLDB, Real-time analytics
0 likes · 15 min read
Alipay Experience Technology
Nov 29, 2022 · Mobile Development

How Ant's Mobile Edge Computing Container Powers Real‑Time AI on Devices

This article explains the challenges of deploying intelligent features on mobile clients and describes Ant Group's edge computing container, its three‑layer architecture, real‑time compute, feature, and decision engines, and the low‑code platform that enables fast, stable, and scalable AI solutions on devices.

Real-time analytics, decision engine, feature engineering
0 likes · 15 min read
Alibaba Cloud Big Data AI Platform
Nov 29, 2022 · Big Data

How Flink’s Stream‑Batch Fusion Is Transforming Real‑Time Big Data

The article explores Apache Flink’s eight‑year journey to becoming a top‑level Apache project, Alibaba’s extensive contributions, the rise of stream‑batch unified computing, its impact on real‑time data integration, cloud‑native deployment, and the emerging Flink‑based data‑warehouse and serverless solutions.

Apache Flink, Big Data, Cloud Native
0 likes · 15 min read
TAL Education Technology
Nov 17, 2022 · Big Data

Real-Time Data Warehouse: Background, Value Assessment, and Half-Year Progress

This article outlines the background and terminology of data warehousing, presents a formula for evaluating warehouse value, and details the team's half‑year efforts—including architecture selection, quality assurance, stability governance, and data‑value externalization—to improve efficiency, quality, stability, and cost in real‑time data services.

Data Governance, Real-time analytics, data operations
0 likes · 10 min read
StarRocks
Nov 8, 2022 · Databases

How StarRocks’ Real‑Time Storage Engine Evolves to Meet Modern Analytics Demands

This article outlines the evolution of StarRocks’ storage engine—from its real‑time update capabilities and primary‑key model challenges to recent optimizations like persistent indexes, partial column updates, conditional updates, high‑frequency import improvements, DML support, and future plans for separating primary and sort keys, introducing row‑store, and enhancing materialized view support.

DML, Real-time analytics, Row Store
0 likes · 18 min read
Alibaba Cloud Big Data AI Platform
Nov 3, 2022 · Big Data

How Alibaba Cloud’s ODPS Upgrade Redefines Big Data Processing and AI Integration

Alibaba Cloud announced that its ODPS platform has been upgraded into an integrated big‑data solution that supports massive batch jobs, real‑time analytics, and AI workloads, delivering record‑breaking performance and enabling use cases from smart city traffic optimization to accelerated autonomous‑driving model training.

AI, Big Data, Real-time analytics
0 likes · 5 min read
DataFunSummit
Oct 15, 2022 · Cloud Computing

Design and Evolution of Tencent Cloud Product Metering and Billing System

The article presents a comprehensive overview of Tencent Cloud's metering and billing system, detailing the billing models, multi‑dimensional data analysis, real‑time data‑warehouse construction, operator orchestration, hot‑key handling, smooth upgrade strategies, and future evolution directions for large‑scale cloud services.

Data Warehouse, Multi-dimensional Analysis, Real-time analytics
0 likes · 16 min read
dbaplus Community
Sep 14, 2022 · Databases

How Apache Doris Enables Real‑Time Analysis of Hudi Data Lakes

This article explains the architecture of Apache Doris, introduces Apache Hudi as a data‑lake format, compares Lambda and Kappa approaches, and details the design, implementation steps, and future roadmap for querying Hudi tables directly from Doris.

Apache Doris, Apache Hudi, Big Data
0 likes · 10 min read
DeWu Technology
Sep 14, 2022 · Databases

Introduction to StarRocks: Architecture, Storage, Use Cases, and Troubleshooting

StarRocks is a high‑performance MPP database whose simplified FE/BE architecture, fully vectorized engine, and CBO optimizer enable fast multi‑table joins, while its partition‑bucket‑tablet storage model supports real‑time metric services and dashboard migrations, accompanied by practical troubleshooting guidance and upcoming enhancements.

Data Warehouse, MPP database, Real-time analytics
0 likes · 15 min read
Shopee Tech Team
Sep 2, 2022 · Big Data

Shopee Data System Challenges and Apache Hudi Practices

Shopee tackled its data‑system bottlenecks by customizing Apache Hudi to provide unified stream‑batch integration, efficient state‑detail snapshots, and low‑latency wide‑table generation, using CDC‑based bootstrapping, COW/MOR tables, savepoints and partial updates, which cut latency to ten minutes, lowered resource use, and yielded several community‑backed enhancements.

Apache Hudi, Big Data, Data Integration
0 likes · 18 min read
DataFunTalk
Jul 30, 2022 · Databases

StarRocks-Based Unified Data Service and Analytics Platform at JD Logistics

JD Logistics leverages StarRocks to create the Udata unified query engine, addressing data silos, low performance, and high maintenance costs by integrating data services and analytics, enabling low‑code data service generation, high‑speed federated queries, real‑time updates, and future data‑lake and resource isolation capabilities.

Data Integration, Real-time analytics, StarRocks
0 likes · 14 min read
DataFunSummit
Jul 28, 2022 · Databases

TiDB HTAP for Financial Intelligent Risk Control: Architecture, Challenges, and Solutions

This article presents a comprehensive overview of how TiDB's HTAP capabilities enable real‑time multi‑source data processing for financial intelligent risk control, detailing the overall risk‑control architecture, digital transformation challenges, TiDB‑based solutions, practical implementations, and future outlooks.

Database Architecture, HTAP, Intelligent Risk Control
0 likes · 12 min read
Efficient Ops
Jul 19, 2022 · Databases

How CDC Powers Real-Time Analytics Without Overloading Your Database

This article introduces the practice of Change Data Capture (CDC), explaining how capturing only data changes can feed downstream systems and data warehouses in near real‑time, reducing load on the source database, improving reporting latency, and supporting scalable, reliable analytics pipelines.

CDC, Change Data Capture, Real-time analytics
0 likes · 9 min read
DataFunTalk
Jul 18, 2022 · Big Data

Integrating Apache Doris with Hudi: Design, Implementation, and Future Plans

This article introduces Apache Doris, an MPP analytical database, and explains how it integrates with the Hudi data lake format, covering architectural features, design choices, implementation steps including external table creation and query processing, and outlines future enhancements for supporting MOR snapshots and incremental queries.

Apache Doris, Data Lake, Hudi
0 likes · 12 min read
DataFunSummit
Jul 17, 2022 · Big Data

Elasticsearch and Big Data: Architecture, Use Cases, and Advantages

This article explains what Elasticsearch is, how it solves database acceleration, log observability, and data analysis problems, details its core components and underlying engine features, compares its strengths and weaknesses, and presents classic application scenarios and a real‑world case study integrating Elasticsearch with Flink for large‑scale log analytics.

Big Data, Case Study, Elasticsearch
0 likes · 13 min read

Mastering Apache Druid: Architecture, Real-Time Ingestion, and Query Optimization

Apache Druid is a distributed, column‑store OLAP engine designed for massive real‑time data ingestion and sub‑second queries; this article explains its LSM‑tree‑inspired architecture, DataSource and Segment structures, memory‑based querying, practical deployment steps, common pitfalls, and optimization techniques for high‑throughput analytics.

Apache Druid, OLAP, Real-time analytics
0 likes · 20 min read
dbaplus Community
Jun 14, 2022 · Big Data

How Qunar Built a Scalable BI Platform for Real‑Time Analytics and Self‑Service Reporting

This article details Qunar's multi‑year journey of designing and evolving a full‑stack BI platform—covering data ingestion, storage, query engines, self‑service analytics, and real‑time OLAP—by iterating through three development phases, selecting technologies such as Impala, Kudu, ClickHouse and Apache Druid, and addressing performance, usability and governance challenges to empower business users with fast, reliable data insights.

Apache Druid, BI, Big Data
0 likes · 24 min read
Alibaba Cloud Developer
Jun 14, 2022 · Big Data

Can a Streaming Data Warehouse Balance Freshness, Latency, and Cost?

This article examines the core trade‑offs of data warehouses—freshness, query latency, and cost—compares offline and real‑time architectures, introduces the concept of a streaming data warehouse, and details how Apache Flink Table Store aims to provide a unified, low‑cost solution.

Big Data, Data Warehouse, Flink
0 likes · 19 min read
IT Architects Alliance
Jun 7, 2022 · Databases

Introduction to Change Data Capture (CDC) Practices

This article introduces the concept and practice of Change Data Capture (CDC), explaining how it captures database changes to provide real‑time incremental data for analytics and reporting without impacting source performance, and outlines modern CDC methods, challenges, and production‑ready system requirements.

CDC, Change Data Capture, Data Integration
0 likes · 8 min read
Top Architect
Jun 7, 2022 · Databases

An Introduction to Change Data Capture (CDC) Practices and Modern Approaches

This article introduces the concept of Change Data Capture (CDC), explains why traditional batch reporting strains resources, describes how CDC captures only data changes to keep source databases performant, and outlines modern CDC architectures, production‑ready considerations, and best‑practice guidelines for building reliable data pipelines.

CDC, Change Data Capture, Data Integration
0 likes · 16 min read
StarRocks
Jun 2, 2022 · Big Data

Simplify Real‑Time Data Warehousing with Flink CDC and StarRocks

This article explores how combining Flink CDC with StarRocks can streamline real‑time data pipelines, reduce component complexity, support both full and incremental synchronization, and enable efficient OLAP queries and updates for fast, scalable analytics across diverse business scenarios.

Data Warehouse, Flink CDC, OLAP
0 likes · 18 min read
IT Architects Alliance
May 11, 2022 · Databases

How Change Data Capture Enables Real‑Time Analytics Without Overloading Your Database

The article explains the fundamentals of Change Data Capture (CDC), describing how capturing DML changes from relational databases like MySQL or PostgreSQL can provide incremental, near‑real‑time data for analytics and reporting while preserving source performance, and outlines modern CDC architectures, transaction‑log based extraction, and production‑ready design considerations.

CDC, Change Data Capture, Database Replication
0 likes · 9 min read
DataFunTalk
May 3, 2022 · Big Data

Lianyou Technology Car IoT Platform: Architecture, Data Ingestion, Storage, and Application Overview

The article presents a comprehensive overview of Lianyou Technology's car IoT platform, detailing its four‑layer architecture, configurable data ingestion, hybrid cloud storage solutions, real‑time and offline data warehouses, and downstream data applications such as user operation, intelligent recommendation, and data security practices.

Real-time analytics, car IoT, cloud
0 likes · 12 min read
DataFunSummit
Apr 9, 2022 · Big Data

Impala Deployment and Optimization: Practical Experience with Sensor Data Multi‑dimensional Analysis Platform

This article presents a comprehensive technical walkthrough of Sensor Data's multi‑dimensional analysis platform, covering product architecture, an Impala‑based real‑time query engine, query performance tuning, resource‑estimation strategies, and future plans, with concrete diagrams, test results, and community contributions.

Big Data, Data Architecture, Impala
0 likes · 19 min read
StarRocks
Mar 29, 2022 · Big Data

How StarRocks Handles PB‑Scale Real‑Time Analytics with High Availability

This article explains how StarRocks manages petabyte‑level user behavior logs, ads and orders through a shared‑nothing architecture, tablet‑based data distribution, MPP compute, high‑availability metadata, real‑time mini‑batch ingestion, and online schema changes, enabling 24/7 analytical services for diverse internet companies.

MPP, Online Schema Change, Real-time analytics
0 likes · 11 min read
21CTO
Feb 24, 2022 · Big Data

5 Data Trends for 2022: Analytics Engineers, Lakehouse Wars, Real‑Time

In 2022 the modern data stack will be driven by the rise of analytics engineers, intensified competition between lakehouse and warehouse solutions, growing demand for real‑time analytics, the explosive growth of cloud marketplaces, and the emergence of unified data‑quality terminology, all reshaping data infrastructure and operational practices.

Data Quality, Lakehouse, Real-time analytics
0 likes · 17 min read
DataFunTalk
Feb 22, 2022 · Artificial Intelligence

Real‑Time Graph Neural Network for Payment Fraud Detection at eBay

This article describes how eBay applies graph neural networks to real‑time payment fraud detection, covering the anti‑fraud scenario, limitations of traditional GBDT pipelines, challenges of constructing and serving dynamic heterogeneous graphs, the end‑to‑end solution with directed slice graphs and a Lambda‑style architecture, and experimental results comparing GNN with LightGBM.

Real-time analytics, fraud detection, machine learning
0 likes · 15 min read
DataFunSummit
Feb 20, 2022 · Databases

Understanding TiDB Architecture and Real‑Time Application Scenarios

This article explains TiDB's HTAP architecture, covering industry challenges, the row‑store TiKV and column‑store TiFlash design, MPP integration in TiDB 5.0, and a range of real‑time use cases such as dashboards, reporting, and data‑warehouse pipelines.

Database Architecture, HTAP, MPP
0 likes · 16 min read
ByteDance Data Platform
Jan 17, 2022 · Big Data

How ByteHouse Scales Real‑Time Analytics on ClickHouse: Challenges & Solutions

This article details ByteHouse’s evolution from ClickHouse, presenting two real‑time analytics use cases, the technical selection process, performance bottlenecks such as write throughput and Kafka consumption, and the engineered solutions—including asynchronous indexing, multi‑threaded Kafka engines, and enhanced Buffer engines—that enable reliable, high‑throughput data processing at massive scale.

ByteHouse, ClickHouse, Kafka
0 likes · 11 min read
Shopee Tech Team
Jan 13, 2022 · Big Data

Engineering Practices and Performance Optimizations of Apache Druid for Real‑Time OLAP at Shopee

Shopee’s engineering team scaled a 100‑node Apache Druid cluster for real‑time OLAP by redesigning the Coordinator load‑balancing algorithm, adding incremental metadata pulls, introducing a segment‑merged result cache, and building exact‑count and flexible sliding‑window operators, while planning cloud‑native deployment.

Apache Druid, Big Data, Bitmap Index
0 likes · 17 min read
Programmer DD
Jan 8, 2022 · Big Data

How Flink’s Streaming Warehouse Is Redefining Real‑Time Data Lakes

This interview explores Apache Flink’s evolution toward a Streaming Warehouse, detailing its stream‑batch integration, new CDC‑based data integration, the Dynamic Table storage architecture, and how these innovations aim to simplify and accelerate real‑time big‑data analytics.

Apache Flink, Big Data, Dynamic Table
0 likes · 17 min read
Volcano Engine Developer Services
Jan 4, 2022 · Big Data

How ByteDance Scales EB-Level Data: Architecture, BP Model & Real-Time Insights

ByteDance’s data platform, built over seven years, now handles exabyte-scale data and over 100 million TPS, using a hybrid “middle‑platform + Business Partner” model, custom engines like ClickHouse/ByteHouse, agile governance, and a suite of products to support internal and external businesses, illustrating large-scale big-data engineering practices.

Big Data, ByteDance, ClickHouse
0 likes · 22 min read
Qunar Tech Salon
Nov 29, 2021 · Big Data

Construction and Practice of Qunar's Business Intelligence Platform

This article details the evolution, architecture, and technical choices of Qunar's BI platform—from early one‑stop reporting to a modular, self‑service system supporting real‑time analytics, multi‑metric calculations, and unified data governance—highlighting challenges, solutions, and performance benchmarks across big‑data technologies.

BI, Big Data, ClickHouse
0 likes · 23 min read
Tencent Cloud Developer
Nov 26, 2021 · Big Data

WeChat's ClickHouse Real‑Time Data Warehouse: Challenges, Co‑Construction, and Performance Gains

Facing Hadoop’s minute‑to‑hour query latency on petabyte‑scale data, WeChat partnered with Tencent Cloud to build a ClickHouse‑based real‑time warehouse, adding custom ingestion, query‑optimisation and management tools that deliver billion‑row throughput, sub‑5‑second queries and over ten‑fold performance gains across millions of daily queries.

Big Data, ClickHouse, Cloud Native
0 likes · 9 min read
HomeTech
Nov 24, 2021 · Databases

Real‑Time Data Analysis at AutoHome: Evaluation and Adoption of StarRocks

This article describes AutoHome's real‑time data analysis architecture, the challenges of existing OLAP solutions, the reasons for choosing StarRocks, detailed performance comparisons with Kylin, ClickHouse, Doris, Presto and Spark, and the practical integration of StarRocks with Flink, broker‑load scripts, and monitoring tools.

Flink, OLAP, Real-time analytics
0 likes · 9 min read
DataFunTalk
Nov 24, 2021 · Big Data

Tencent Game Big Data Analysis Engine: Architecture, Practices, and Future Plans

This article presents Tencent's game big‑data analysis platform, detailing its background, the architecture of the iData engine—including offline multi‑dimensional analysis (TGMars), online portrait analysis (TGFace), and real‑time multi‑dimensional analysis (TGDruid)—application scenarios, performance insights, and future ecosystem and open‑source plans.

Big Data, Game Analytics, OLAP
0 likes · 15 min read
DataFunTalk
Nov 23, 2021 · Big Data

ClickHouse Practice at Youzan: Architecture, Deployment, and Future Plans

This article details Youzan's adoption of ClickHouse for real-time analytics, covering its evolution from Presto, Druid, and Kylin, the system's architecture, deployment strategies, use cases, performance characteristics, limitations, and future roadmap, including integration with Apache Doris and emerging big‑data trends.

Big Data, ClickHouse, OLAP
0 likes · 23 min read
DataFunSummit
Nov 8, 2021 · Big Data

Building JD's OLAP System: From Data Ingestion to Management and Future Plans

This article explains how JD.com designs and evolves its OLAP platform, covering data sources, ingestion, storage, real‑time and offline processing, key challenges such as timeliness, high throughput, consistency, and the solutions implemented to support massive e‑commerce analytics.

Big Data, Data Warehouse, Distributed Systems
0 likes · 13 min read
Big Data Technology & Architecture
Nov 8, 2021 · Big Data

Why Choose Apache Iceberg? Tencent’s Optimizations and Real‑World Practices

This article examines the strengths and weaknesses of Apache Iceberg, explains why Tencent selected it over alternatives, details Tencent’s own enhancements and integration with Flink, Spark, and other engines, and shares multiple real‑world implementations for building enterprise‑grade real‑time data lakes.

Apache Iceberg, Data Lake, Flink
0 likes · 17 min read
dbaplus Community
Oct 26, 2021 · Databases

Scaling JD.com Customer Service with Doris OLAP: Architecture & Caching

JD.com’s customer service team leverages the open‑source MPP database Doris to power real‑time and offline OLAP dashboards, detailing data ingestion pipelines, full‑link monitoring, dual‑stream high‑availability design, dynamic partition management, multi‑layer caching strategies, and performance optimizations applied during the 2020 11.11 shopping festival.

Big Data, OLAP, Real-time analytics
0 likes · 15 min read
Big Data Technology & Architecture
Oct 26, 2021 · Big Data

Practical Experience Building a Real‑Time Clickstream Data Warehouse with Flink and ClickHouse

This article shares practical insights on designing and operating a real‑time clickstream data warehouse using Flink for streaming processing and ClickHouse for near‑real‑time OLAP, covering dimensional modeling, layered architecture, Flink‑ClickHouse sink implementation, and data rebalancing strategies.

ClickHouse, Data Warehouse, Flink
0 likes · 10 min read
NetEase Smart Enterprise Tech+
Sep 26, 2021 · Databases

How ClickHouse Powers a Billion‑User Profiling Platform at Sub‑5‑Second Latency

This article shares NetEase’s experience building a user‑profile platform with ClickHouse, detailing the business background, challenges of massive data and complex queries, core table designs, data ingestion, bitmap techniques, performance gains, and future plans for scaling and optimization.

Bitmap Index, ClickHouse, Real-time analytics
0 likes · 13 min read
NetEase Game Operations Platform
Sep 18, 2021 · Big Data

StreamflySQL: NetEase Games’ Journey from Template JAR to SQL Gateway for Flink SQL Platformization

This article details NetEase Games’ evolution of its Flink SQL platform, from the early StreamflySQL v1 template‑JAR approach to the v2 SQL‑Gateway architecture, discussing design decisions, challenges such as metadata persistence, multi‑tenant security, horizontal scaling, and job state management.

Flink, Real-time analytics, SQL
0 likes · 17 min read
Volcano Engine Developer Services
Sep 6, 2021 · Databases

How ByteDance Optimized ClickHouse for Real‑Time Recommendation and Ad Analytics

ByteDance’s ByteHouse, an enterprise‑grade ClickHouse, powers real‑time recommendation and ad‑delivery analytics at massive scale, detailing two case studies, technical selections, architectural designs, and performance optimizations such as asynchronous indexing, multi‑threaded Kafka consumption, and enhanced buffer engines to ensure data integrity.

Big Data, ByteHouse, ClickHouse
0 likes · 10 min read
dbaplus Community
Aug 17, 2021 · Big Data

How JD Transformed Its Data Warehouse with Delta Lake for Real‑Time Analytics

This article examines JD's shift from a traditional Lambda‑based data warehouse to a Delta Lake‑powered real‑time data lake, detailing the challenges of legacy architectures, the evaluation of open‑source table formats, Delta Lake's core mechanisms, and the resulting simplified batch‑stream development workflow.

Batch-Stream, Big Data, Data Lake
0 likes · 11 min read
Big Data Technology & Architecture
Aug 10, 2021 · Databases

Kudu Overview: Architecture, Features, and Use Cases

Kudu is an open‑source columnar storage engine from Cloudera that combines high‑throughput batch processing with low‑latency random reads, offering features such as C++/Java APIs, Raft‑based replication, flexible consistency, partitioning, and integration with Hadoop, Spark, Impala, and other ecosystem components.

Columnar Storage, Hadoop, Kudu
0 likes · 64 min read
DataFunTalk
Jul 27, 2021 · Big Data

Building a Real‑Time Data Warehouse with Apache Doris at Shuhai Supply Chain

This article describes how Shuhai Supply Chain upgraded its data warehouse from a complex, high‑cost 1.0 architecture to a streamlined, real‑time solution built around Apache Doris, detailing the motivations, design choices, zero‑code ingestion, metadata management, Flink connector, and the resulting performance gains.

Apache Doris · Big Data · Flink
0 likes · 13 min read
dbaplus Community
Jul 11, 2021 · Big Data

Scaling Real‑Time & Offline Analytics with Druid: Architecture, Optimizations, and Lessons

This article explains how Beike adopted the Druid OLAP engine to handle massive real‑time and offline query workloads, detailing its four‑component architecture, key technologies such as deep storage and metadata storage, practical optimizations for data ingestion, query caching, dynamic throttling, timeout control, and a roadmap for future enhancements.

Big Data · Druid · OLAP
0 likes · 19 min read
Xianyu Technology
Jun 8, 2021 · Big Data

Longgong Data Analysis Platform: Architecture and Solutions for Large‑Scale Structured Data

The Longgong Data Analysis Platform enables Xianyu (Idle Fish) to capture, store, and analyze billions of structured product attributes in real time across more than 8,000 categories. It uses TableStore, MySQL, ODPS, and a distributed scheduler to achieve a query speedup of over 50%, 80% category coverage, and rapid support for search and recommendation teams.

Alibaba · Big Data · Data Platform
0 likes · 9 min read
IT Architects Alliance
May 25, 2021 · Big Data

How Modern Data Middle Platforms Power Real‑Time and Offline Analytics

This article provides a comprehensive technical overview of data middle platforms, covering data aggregation, offline and real‑time development, smart operations, data asset management, governance, service layers, platform implementations, warehouse layering, and key differences between offline and real‑time data warehouses.

Big Data · Data Governance · Data Platform
0 likes · 26 min read
Architects Research Society
May 24, 2021 · Big Data

Understanding the Differences Between Event Stream Processing (ESP) and Complex Event Processing (CEP)

This article explains the origins, concepts, and use‑cases of Event Stream Processing (ESP) and Complex Event Processing (CEP), contrasting their handling of ordered event streams versus unordered event clouds, and discusses how both technologies have evolved and are applied in modern real‑time analytics.

CEP · ESP · Event Processing
0 likes · 17 min read
Tencent Cloud Developer
May 18, 2021 · Big Data

Latest ClickHouse Technologies and Practical Applications

ClickHouse, born from Yandex’s Metrica and now a top‑50 open‑source analytics engine, achieves exceptional speed through a vectorized compute engine, column‑store architecture, and an active community, powering real‑time workloads at companies like Tencent Music, Sina, Bilibili, and Suning while introducing features such as column merging, projections, and storage‑compute separation for future scalability.

ClickHouse · Columnar Database · OLAP
0 likes · 17 min read
DataFunTalk
May 12, 2021 · Big Data

Building a Unified Real‑Time and Offline OLAP Platform with DorisDB at Yuanfudao

The article describes how Yuanfudao's data middle platform built a high‑performance OLAP service using the MPP HOLAP engine DorisDB to unify real‑time and batch analytics, meet low‑latency and high‑concurrency requirements, and support diverse education‑industry use cases such as live‑stream monitoring, advertising, and order analytics.

Big Data · DorisDB · Education Technology
0 likes · 13 min read
JD Retail Technology
Apr 28, 2021 · Databases

Real‑Time Analytics with Doris for JD Customer Service: Architecture, Caching, and Optimization

This article describes how JD.com leverages the open‑source MPP analytical database Doris for real‑time and offline OLAP on customer‑service data, covering data ingestion pipelines, dual‑stream high‑availability design, dynamic partition management, multi‑level caching, monitoring with Prometheus‑Grafana, and performance optimizations applied during major sales events.

JD.com · OLAP · Real-time analytics
0 likes · 13 min read
58 Tech
Mar 31, 2021 · Big Data

Design and Implementation of an Intelligent Security Monitoring and Alert System

This article presents a comprehensive design of a real‑time security monitoring and alert platform, detailing challenges in high‑concurrency risk control, an architecture that replaces OLAP polling with scalable compute services, event‑time processing, dynamic thresholding using fbprophet, and practical optimizations with Redis and ClickHouse.

ClickHouse · Real-time analytics · dynamic thresholds
0 likes · 13 min read
DataFunTalk
Mar 29, 2021 · Big Data

Beike's OLAP Platform: Druid Adoption, Architecture, Performance Comparison, and Operational Optimizations

This article details Beike's large‑scale OLAP platform, explaining why Druid was chosen over Kylin, describing the platform's four‑layer architecture, presenting performance and storage benchmarks, and outlining practical improvements to data ingestion, real‑time distinct counting, and cluster stability for high‑concurrency business scenarios.

Big Data · Druid · OLAP
0 likes · 19 min read
Ctrip Technology
Mar 25, 2021 · Big Data

Challenges and Approaches for Real‑Time Data Aggregation Analysis

The article examines the key challenges of real‑time data aggregation—data freshness, timely processing, and result visibility—and surveys common solutions such as timestamp‑based sync, CDC, full and incremental computation, storage formats, and trigger mechanisms.

Big Data · CDC · Incremental Computation
0 likes · 11 min read
DataFunTalk
Mar 24, 2021 · Big Data

Practical Experience of Using DorisDB for Real-Time and Offline Analytics in KuJiaLe's Big Data Platform

This article details how KuJiaLe's big data team replaced their legacy ADB and Presto clusters with a DorisDB MPP database, achieving sub‑second query latency, unified real‑time and offline analytics, simplified ETL pipelines, and significant cost savings while supporting billion‑row tables and high‑QPS workloads.

Big Data · DorisDB · ETL
0 likes · 9 min read
dbaplus Community
Mar 2, 2021 · Databases

How ByteDance Scaled Real‑Time Analytics with ClickHouse and Kafka Engine

This article details ByteDance's evolution from offline ClickHouse ingestion to a robust real‑time analytics pipeline, covering external transaction handling, risks of direct INSERTs, recommendation and ad‑delivery use cases, Kafka Engine design, multi‑threaded consumption, fault‑tolerance improvements, platform tooling, and future roadmap.

Backend Development · ClickHouse · Kafka
0 likes · 22 min read
Big Data Technology & Architecture
Feb 19, 2021 · Big Data

A Comprehensive Guide to Learning Apache Flink: Background, Core Concepts, Modules, Source Code, and Industry Applications

This article provides a detailed learning roadmap for Apache Flink, covering its theoretical background, key research papers, fundamental concepts, core modules, source‑code exploration, real‑time data‑warehouse use cases, event‑driven applications, and emerging trends in the big‑data ecosystem.

Apache Flink · Event-driven · Real-time analytics
0 likes · 9 min read
dbaplus Community
Feb 18, 2021 · Big Data

How JD Search Scaled Real‑Time Analytics with Flink and Doris

This article details JD Search's journey from a Storm‑based pipeline to a Flink‑driven architecture backed by Apache Doris, covering business requirements, technical challenges, design trade‑offs, performance optimizations for massive traffic spikes, and future plans for their real‑time OLAP data warehouse.

Big Data · Flink · OLAP
0 likes · 12 min read
Sohu Tech Products
Feb 17, 2021 · Big Data

Dynamic Data Partitioning in Apache Flink: A Fraud Detection Demo

This article explains how to implement dynamic data partitioning in Apache Flink using a fraud‑detection demo, covering the system architecture, rule‑driven runtime reconfiguration, custom ProcessFunction code, and the underlying key‑by logic that enables flexible, real‑time stream processing.

Apache Flink · Dynamic Partitioning · KeyBy
0 likes · 11 min read
DataFunTalk
Feb 5, 2021 · Big Data

Design and Implementation of Beike's Data Management Platform (DMP)

This article details how Beike built a comprehensive Data Management Platform (DMP) that integrates user behavior and business data across multiple apps, outlines its five‑layer architecture, discusses data collection, processing, storage, real‑time profiling, and presents performance results and future optimization directions.

Big Data · DMP · Hive
0 likes · 20 min read
Youzan Coder
Jan 25, 2021 · Big Data

ClickHouse: Principles, Architecture, and Deployment at Youzan

The article explains ClickHouse’s high‑performance columnar OLAP design, its vectorized execution, sparse primary‑key indexes and MergeTree engines, contrasts it with ROLAP/MOLAP approaches, and details Youzan’s large‑scale deployment—including dual‑replica clusters, ingestion pipelines, routing architecture, current challenges, and future container‑based expansion plans.

ClickHouse · Data Platform · MergeTree
0 likes · 22 min read
Alibaba Cloud Developer
Jan 25, 2021 · Big Data

Why 2020 Was the Breakthrough Year for Apache Flink’s Ecosystem

In 2020, Apache Flink surged to become the most active Apache project, releasing three major versions that advanced its unified stream‑batch engine, introduced cloud‑native K8s support, expanded AI capabilities with PyFlink, and fostered a thriving Chinese community, solidifying its role as the de‑facto standard for real‑time computing.

AI integration · Apache Flink · Big Data
0 likes · 21 min read
Byte Quality Assurance Team
Jan 6, 2021 · Big Data

Fundamentals of Stream Processing: Bounded vs. Unbounded Data, Time Domains, and Windowing Strategies

This article provides a comprehensive introduction to stream processing fundamentals by distinguishing between bounded and unbounded datasets, clarifying the critical differences between event time and processing time, and exploring various windowing strategies to demonstrate how modern distributed systems efficiently handle continuous data flows.

Apache Flink · Data Windowing · Event Time
0 likes · 13 min read
dbaplus Community
Dec 27, 2020 · Big Data

How ClickHouse Powers a 700 B‑Row Real‑Time Data Platform at Ctrip

This article details how Ctrip's senior engineering manager leveraged ClickHouse to build a high‑availability, sub‑second response data platform handling nearly 700 billion rows, describing the motivations, architecture, data synchronization processes, performance gains, challenges, and practical recommendations for large‑scale analytics.

Big Data · ClickHouse · Data Architecture
0 likes · 28 min read
ITFLY8 Architecture Home
Dec 18, 2020 · Big Data

Unlocking the Data Middle Platform: From Ingestion to Real‑Time Analytics

This article provides a comprehensive overview of data middle platform concepts, covering data aggregation, collection tools, development modules, job scheduling, baseline control, heterogeneous storage, permission management, real‑time and offline processing, governance, services, and implementation details for building robust big‑data solutions.

Data Governance · Data Platform · ETL
0 likes · 25 min read
JD Cloud Developers
Dec 4, 2020 · Databases

How TiDB Solves High‑Traffic Database Challenges: Architecture, Features, and Real‑World Cases

This article summarizes a JD Cloud‑TiDB live webcast, explaining why JD Cloud chose TiDB, detailing its distributed architecture (TiDB Server, TiKV storage, PD), highlighting features such as horizontal scaling, strong consistency, real‑time analytics, backup, migration, and showcasing multiple enterprise migration case studies that demonstrate TiDB’s ability to overcome traditional database bottlenecks.

Data Migration · Real-time analytics · Scalability
0 likes · 13 min read
Architect
Nov 11, 2020 · Big Data

Real-time Click Stream Data Warehouse with Flink and ClickHouse: Architecture, Layered Design, and Practical Tips

This article explains how to build a real‑time click‑stream data warehouse using Flink for stream processing and ClickHouse for near‑real‑time OLAP, covering click‑stream characteristics, dimensional modeling, layered warehouse design, async dimension joins, sink implementation, and data rebalancing strategies.

Big Data · Click Stream · ClickHouse
0 likes · 7 min read
DataFunTalk
Oct 29, 2020 · Big Data

Building a Large-Scale Near Real-Time Data Analytics Platform at Lyft Using Apache Flink

Lyft transformed its legacy data pipeline by designing a cloud‑native, Flink‑based near real‑time analytics platform that ingests billions of events, writes Parquet files to S3, leverages Presto for interactive queries, and implements multi‑stage non‑blocking ETL, fault‑tolerant back‑fill, and extensive performance optimizations.

AWS · Data Lake · ETL
0 likes · 12 min read
Big Data Technology & Architecture
Oct 23, 2020 · Big Data

Overview of Real-Time Big Data Processing: Spark Structured Streaming, CarbonData, Flink, and Cloud Stream

This article provides a comprehensive overview of modern real‑time big‑data solutions, detailing Spark Structured Streaming capabilities, CarbonData’s storage architecture, Meituan’s Flink deployments, and Huawei Cloud Stream’s unified streaming service, highlighting their features, challenges, and future directions.

CarbonData · Flink · Real-time analytics
0 likes · 17 min read