Tagged articles
181 articles
Lao Guo's Learning Space
Apr 29, 2026 · Big Data

Designing a Full-Stack Credit Data System: From Ingestion to Real-Time Decision

The article dissects a credit data system architecture, detailing six logical layers—from multi-source data collection and feature engineering (including graph features and feature stores) to model training, real‑time stream processing, decision engine integration, and privacy‑preserving computation—while explaining the trade‑offs, tools, and performance targets needed for accurate, low‑latency risk assessment.

Credit Scoring · Feature Store · Flink
0 likes · 16 min read
Alibaba Cloud Observability
Dec 29, 2025 · Cloud Native

How to Seamlessly Import Massive S3 Logs into Alibaba Cloud SLS with Real‑Time Analysis

This article explains how to centralize and analyze massive multi‑cloud log data stored in object storage by moving AWS S3 logs into Alibaba Cloud Log Service (SLS) using dual‑mode file discovery, SQS event‑driven import, elastic scaling, and pre‑ingestion processing to achieve low latency, high reliability, and cost efficiency.

AWS S3 · Real-time Processing · alibaba-sls
0 likes · 12 min read
Instant Consumer Technology Team
Oct 29, 2025 · Big Data

Revolutionizing Feature Engineering with Distributed Tech & Configurable Services

Facing PB‑scale user behavior data and millions of feature dimensions, the platform transformed its search, advertising, and recommendation pipelines by adopting a distributed, configurable‑service architecture that delivers high‑throughput streaming, elastic storage, rapid feature iteration, and robust fault‑tolerance for AI‑driven personalization.

Big Data · Data Architecture · Distributed Systems
0 likes · 17 min read
Huolala Tech
Oct 22, 2025 · Backend Development

Scaling Real‑Time Reconciliation with Dynamic Kafka Consumer Clusters

To ensure fund safety and robust operations, the team built a real‑time reconciliation platform on Kafka. After hitting scaling bottlenecks with a static consumer model, they implemented a dynamic consumer cluster with partition‑level, weighted load balancing that supports automatic scaling and high‑throughput processing.

Backend Architecture · Distributed Systems · Dynamic Scaling
0 likes · 15 min read
58 Tech
Aug 7, 2025 · Big Data

Transform Real‑Time Data Warehousing with Paimon: From Flink ROW_NUMBER to Streaming Lakehouse

This article details how a real‑time data warehouse built on Flink, Kafka, HBase and MySQL was redesigned using Paimon to eliminate costly deduplication, handle out‑of‑order events, enable streaming reads, simplify aggregation, replace multiple lookup sources, and achieve faster, more reliable batch repairs, resulting in major resource and operational gains.

Data Warehouse · Flink · Lakehouse
0 likes · 24 min read
Alibaba Cloud Big Data AI Platform
Aug 5, 2025 · Big Data

How Alibaba Built a World‑Class Big Data Platform Over a Decade

Over ten years, Alibaba’s data engineers transformed a modest Hadoop‑based system into a globally‑scalable, high‑performance big data platform—ODPS/MaxCompute—supporting massive offline and real‑time workloads, pioneering innovations like the 5K cluster expansion, Blink streaming, and the unified ‘Moon’ migration.

Alibaba · Big Data · Data Platform
0 likes · 25 min read
Linux Cloud Computing Practice
May 29, 2025 · Big Data

Why Learn Kafka? Core Benefits, Use Cases, and Key Interview Topics

This article explains why Kafka is essential for modern data engineering, highlighting its widespread adoption, high throughput, scalability, durability, integration with streaming ecosystems, and common real‑time use cases, while also providing a concise list of interview topics for aspiring engineers.

Real-time Processing · Streaming · data pipelines
0 likes · 6 min read
Full-Stack Internet Architecture
May 27, 2025 · Big Data

Understanding Event Streaming in Kafka: Core Concepts, Architecture, and Use Cases

This article explains Kafka's event streaming concept, detailing events and streams, core components such as producers, topics, partitions, consumers, persistence, and typical real‑time data pipeline, event‑driven architecture, stream processing, and log aggregation use cases, highlighting its role as a foundational big‑data infrastructure.

Event Streaming · Kafka · Real-time Processing
0 likes · 7 min read
Full-Stack Internet Architecture
May 20, 2025 · Big Data

Why Learn Kafka? Core Benefits, Use Cases, and a Summary

This article explains why Kafka is widely adopted by top companies, outlines its high throughput, scalability, and durability, and describes key real‑time data pipeline, stream processing, and big‑data integration scenarios, concluding that mastering Kafka is essential for modern backend and data engineering roles.

Kafka · Real-time Processing · data engineering
0 likes · 4 min read
php Courses
Apr 7, 2025 · Backend Development

Implementing Sliding Window Algorithms in PHP for Real-Time Data Processing

This article introduces the sliding window technique, demonstrates efficient PHP implementations for computing averages and handling real-time streams, provides optimization strategies, and outlines practical applications such as financial analysis, network monitoring, and recommendation systems, highlighting performance considerations for backend development.

PHP · Real-time Processing · Sliding Window
0 likes · 6 min read
Alibaba Cloud Observability
Apr 1, 2025 · Cloud Native

Shift Data Processing Left with SPL: Low‑Code, High‑Performance Cloud‑Native Solutions

This article explains how SPL rule consumption moves data cleaning and preprocessing to the server side, enabling low‑code, high‑performance, cloud‑native real‑time processing that reduces client complexity, latency, and bandwidth costs while integrating with services like Flink and DataWorks.

Log Service · Real-time Processing · SPL
0 likes · 10 min read
Alibaba Cloud Native
Mar 25, 2025 · Cloud Native

Shift Data Cleaning Server‑Side with SPL: Boost Real‑Time Log Processing

Alibaba Cloud Log Service’s new SPL‑based rule consumption lets users move complex data‑cleaning logic from client code to the server, offering low‑code configuration, high performance, precise filtering, and significant reductions in latency, bandwidth, and compute resources across typical scenarios such as Python SDK processing and Flink integration.

Log Service · Real-time Processing · SPL
0 likes · 11 min read
AntData
Mar 20, 2025 · Big Data

Design and Optimization of Real‑time Data Lake Tables with Paimon and Flink for Advertising Diagnostics

This article presents a comprehensive exploration of using Apache Paimon and Flink to design lake tables that support minute‑level latency, low cost, and unified batch‑stream processing for advertising data, covering schema design, partitioning strategies, performance trade‑offs, cost analysis, and operational best practices.

Big Data · Data Lake · Flink
0 likes · 34 min read
Zhuanzhuan Tech
Mar 13, 2025 · Backend Development

Design and Implementation of a Real-Time Product Tagging Platform for a Second‑Hand E‑Commerce System

This article presents a comprehensive technical case study of a three‑layer product‑tagging platform that addresses the challenges of fine‑grained operations, ensures real‑time tag updates, guarantees data consistency, and eliminates read bottlenecks through traffic separation, event‑driven processing, deduplication MQ, and multi‑level caching.

Backend Architecture · Data Consistency · Real-time Processing
0 likes · 13 min read
Huolala Safety Emergency Response Center
Jan 9, 2025 · Information Security

Detecting API Anomalous Traffic with Big Data and Machine Learning

This article outlines a comprehensive approach to API anomaly detection, covering background, objectives, a two‑layer framework with offline and real‑time feature pipelines, threshold profiling, detection methods, strategy types, and operational practices to mitigate data leakage and abuse.

Big Data · Real-time Processing · Threshold Modeling
0 likes · 10 min read
Alibaba Cloud Big Data AI Platform
Oct 25, 2024 · Big Data

How Real-Time Flink Powers Automotive Big Data: Architecture & Case Studies

This article, based on Alibaba Cloud expert Li Lubing’s presentation, examines the rapid growth of China’s new energy vehicle market, outlines typical automotive big‑data architectures, compares Lambda and real‑time lakehouse solutions built with Flink and Apache Paimon, and showcases real‑world customer deployments.

Big Data · Flink · Lakehouse
0 likes · 18 min read
21CTO
Jul 15, 2024 · Big Data

Twitter’s Kappa Architecture: Scaling Real-Time Processing of Billions of Events

Twitter migrated from a Lambda-based dual‑pipeline system to a Kappa architecture that relies on a single real‑time stream using Kafka, Google Pub/Sub, Dataflow, and BigTable, dramatically reducing latency, increasing throughput, and improving data accuracy for processing billions of daily events.

Big Data · Dataflow · Kappa architecture
0 likes · 9 min read
DataFunSummit
Jul 1, 2024 · Big Data

Optimizing JD Retail Data Architecture: From Lambda to Real‑time Unified Processing with Flink, Hudi, and StarRocks

This article details JD Retail's transition from a complex Lambda architecture to a unified real‑time data pipeline using Flink, Hudi, and StarRocks, addressing data completeness versus latency, reducing maintenance costs, improving storage efficiency, and delivering faster, more consistent analytics for business users.

Data Warehouse · Flink · Hudi
0 likes · 13 min read
DataFunTalk
May 13, 2024 · Big Data

Data Integration Maturity Model: From ETL to EtLT

The article examines the evolution of data integration architectures—from traditional ETL through ELT to the emerging EtLT model—highlighting their advantages, disadvantages, industry trends, maturity stages, and practical guidance for enterprises and professionals navigating modern big‑data pipelines.

Big Data · Data Integration · DataOps
0 likes · 31 min read
iQIYI Technical Product Team
Apr 26, 2024 · Big Data

iQIYI Real-time Lakehouse: Stream‑Batch Unified Architecture

iQIYI replaced its costly Lambda architecture with a unified Iceberg‑based lakehouse that combines Flink streaming and batch processing, cutting data latency from hours to minutes, supporting thousands of tables via a multi‑table sink, guaranteeing completeness, and saving millions of RMB in operational costs.

Data Lake · Flink · Iceberg
0 likes · 18 min read
Xiaohongshu Tech REDtech
Mar 4, 2024 · Big Data

Integrating Data Lake Technologies with Data Warehouse Architecture at Xiaohongshu: Practices and Performance Optimizations

Xiaohongshu’s data‑warehouse team integrated Apache Iceberg‑based data‑lake techniques into its existing warehouse, replacing the legacy Hive/Spark stack with global sorting, Z‑order, and upsert‑enabled tables, which cut query latency by up to 90 %, boosted data freshness by 50 %, slashed storage costs by 83 % and saved tens of thousands of GB‑hours of compute daily.

Apache Iceberg · Data Lake · Data Warehouse
0 likes · 19 min read
DataFunTalk
Feb 27, 2024 · Big Data

Best Practices of Cloud‑Native OLAP Architecture and Logistics Warning at Jushuitan

This article presents Jushuitan's cloud‑native OLAP architecture, detailing its evolution, current big‑data stack—including DataWorks, MaxCompute, Flink, Hologres, and Aerospike—along with logistics warning workflows, rule‑matching mechanisms, real‑time processing challenges, and future scalability plans.

Big Data · Cloud Native · Data Warehouse
0 likes · 20 min read
DataFunTalk
Feb 25, 2024 · Big Data

Implementation Practice of Bilibili's Tag System: Evolution, Architecture, and Future Plans

This article details Bilibili's tag system from its 2021 inception through successive redesigns, describing the three‑layer architecture, data flow pipelines using Hive, Iceberg, Spark and ClickHouse, crowd selection DSL, online services with Redis, performance optimizations, and upcoming governance and quality initiatives.

Big Data · ClickHouse · Real-time Processing
0 likes · 12 min read
DataFunSummit
Jan 25, 2024 · Big Data

Best Practices of Jushuitan Cloud‑Native OLAP Architecture and Logistics Warning

This article presents Jushuitan's cloud‑native OLAP architecture, covering business background, data‑warehouse evolution, real‑time processing with Flink, Hologres, and Aerospike, and detailed logistics‑warning use cases, followed by technical challenges, future outlook, and a Q&A on implementation details.

Big Data · Data Warehouse · Flink
0 likes · 20 min read
vivo Internet Technology
Jan 24, 2024 · Big Data

Evolution of Vivo's Trillions-Scale Data Architecture: Dual-Active Real-Time and Offline Computing

Vivo’s trillion‑scale data platform evolved into a dual‑active real‑time and offline architecture that leverages multi‑datacenter clusters, Kafka/Pulsar caching, a unified sorting layer, HBase‑backed dimension tables, and micro‑batch Spark jobs to deliver low‑cost, high‑performance processing, 99.9% availability, and 99.9995% data‑integrity.

Data Architecture · HBase · Offline Computing
0 likes · 16 min read
Zhuanzhuan Tech
Dec 14, 2023 · Big Data

Design and Implementation of a Data Service Platform for New Media Business

This article details the background, challenges, design principles, and implementation of a unified data service platform—including data modeling, multi-source governance, real-time processing, and a Doris-based storage solution—to support large‑scale video data for a new media operation.

Apache Doris · Data Governance · Data Platform
0 likes · 7 min read
DataFunSummit
Oct 18, 2023 · Big Data

Kuaishou Data Lake Construction with Apache Hudi: Architecture, Challenges, and Solutions

This article explains why Kuaishou built a data lake, outlines the shortcomings of its previous Lambda architecture, describes the adoption of Apache Hudi for unified batch‑stream processing, and details the five major technical challenges and the corresponding solutions implemented to improve performance, consistency, and operational reliability.

Apache Hudi · Big Data · Data Architecture
0 likes · 17 min read
DataFunSummit
Oct 1, 2023 · Big Data

Iceberg Data Lake: Core Features, Xiaomi Use Cases, and Future Plans

This presentation introduces Iceberg's core capabilities, details Xiaomi's practical applications—including log ingestion, near‑real‑time warehousing, offline challenges, column‑level encryption, and Hive migration—and outlines future development directions such as materialized views and cloud migration, providing a comprehensive view of modern data‑lake engineering.

Big Data · Data Lake · Flink
0 likes · 22 min read
Big Data Technology & Architecture
Sep 18, 2023 · Big Data

Unified Real‑Time and Batch Data Warehouse Architecture with Hudi Lakehouse

The article explains the mainstream Lambda data‑warehouse architecture, its benefits and challenges, then introduces Hudi as a lake‑house solution that unifies real‑time and offline storage, describes the multi‑layer service design, and showcases three practical scenarios—stream processing, real‑time multidimensional analysis, and stream‑batch data reuse—demonstrating how the integrated architecture improves latency, cost, and operational complexity.

Batch Processing · Data Warehouse · Hudi
0 likes · 13 min read
21CTO
Sep 8, 2023 · Big Data

Why Real-Time Data Processing Is the Next Frontier for Data Engineers

Real-time data processing transforms traditional batch pipelines by delivering fresh, low‑latency data to millions of concurrent users, leveraging event‑driven architectures, streaming engines, and real‑time databases, with use cases ranging from fraud detection to personalized e‑commerce and operational dashboards, and includes reference architectures and tool recommendations.

Big Data · Real-time Processing · Streaming
0 likes · 16 min read
Qunar Tech Salon
Aug 25, 2023 · Big Data

Customer Data Platform (CDP) at Qunar Travel: Architecture, Construction Practices, and Business Value

This article presents a comprehensive case study of Qunar Travel's Customer Data Platform (CDP), detailing its business background, operational pain points, architectural design, tag production and quality processes, real‑time labeling, crowd selection techniques, deployment safeguards, measurable business impact, and future development directions.

CDP · Customer Data · Qunar Travel
0 likes · 20 min read
dbaplus Community
Aug 2, 2023 · Backend Development

How WeChat Built a Scalable Security Data Warehouse for Billions of Requests

This article explains the evolution of WeChat's security data warehouse—from its business background and the need for unified feature storage to the architectural designs, multi‑IDC synchronization, operation system, and data‑quality safeguards that enable reliable, high‑performance security policy development for over a trillion daily feature reads and writes.

Data Quality · Feature Management · Real-time Processing
0 likes · 12 min read
Top Architect
Jul 14, 2023 · Big Data

Lambda Architecture: Real-Time Big Data Processing and Practical Use Cases

This article introduces the Lambda Architecture for billion‑scale real‑time data analysis, explains its three layers—Batch, Speed, and Serving—covers its flexibility, fault tolerance, and scalability, and demonstrates concrete applications such as Twitter hashtag analysis and a smart‑parking recommendation system.

Batch Layer · Big Data · Lambda architecture
0 likes · 11 min read
DataFunTalk
Jul 10, 2023 · Big Data

Practical Experience of In‑Lake Warehouse Implementation Based on Lakehouse Architecture

This article presents a comprehensive overview of Lakehouse‑based in‑lake warehousing, covering common data‑lake misconceptions, the evolution from databases to data warehouses and lakes, the advantages of Lakehouse over traditional architectures, a reference multi‑layer architecture, typical use cases, challenges, future plans, and a brief Q&A.

Big Data Architecture · Data Lake · Data Warehouse
0 likes · 20 min read
MaGe Linux Operations
Jun 20, 2023 · Big Data

What Is Kafka? A Beginner’s Guide to Distributed Streaming and Messaging

Kafka is an open‑source, distributed streaming platform that uses a publish/subscribe message queue architecture to provide high‑throughput, fault‑tolerant real‑time data processing, featuring topics, partitions, replicas, consumer groups, and multiple APIs for producers, consumers, streams, connectors, and administration.

Big Data · Distributed Streaming · Kafka
0 likes · 20 min read
ITPUB
Apr 8, 2023 · Big Data

How Bilibili Cut Data Pipeline Costs by 20% with Flink Real‑Time Incremental Computing

Facing daily terabyte‑scale data ingestion and costly duplicate reads in its ODS‑to‑DWD pipeline, Bilibili introduced a Flink‑based real‑time incremental computation and multi‑level partition shuffling, dramatically reducing read amplification, cutting resource usage by ~20%, improving latency to minutes, and enhancing scalability.

Big Data · Flink · Real-time Processing
0 likes · 19 min read
Tencent Cloud Developer
Mar 8, 2023 · Artificial Intelligence

Building a Scalable Recommendation System for WeChat Games: Architecture and Implementation

The article describes WeChat Games’ scalable recommendation system, detailing its four‑component architecture—offline ML platform, unified management, online DAG‑based engine, and peripheral services—along with a hybrid algorithm library, feature engineering, real‑time monitoring, and solutions that boost engagement across diverse game recommendation scenarios.

Data Management · Deep Learning · Real-time Processing
0 likes · 28 min read
Baidu Geek Talk
Mar 6, 2023 · Big Data

Accelerating Data Production and Consumption in Baidu's Performance Platform

Baidu's Performance Platform speeds data production and consumption by adopting a unified stream‑batch architecture with TM and Spark, leveraging the Turing warehouse, introducing tiered service grading, robust governance and compliance measures, and offering self‑service analytics, cutting latency from minutes or days to milliseconds while handling billions of daily records and boosting SLA adherence, data accuracy, and user satisfaction.

Big Data · Data Governance · Real-time Processing
0 likes · 12 min read
Alibaba Cloud Developer
Feb 28, 2023 · Artificial Intelligence

How a Dual‑Way Sign Language Digital Human Transforms Communication for the Deaf

This article describes the severe shortage of sign‑language teachers worldwide, presents user demographics, outlines the challenges of bidirectional sign‑language translation, and details the cloud‑native AI architecture, data pipeline, and real‑time recognition and synthesis techniques behind the virtual digital human "Sign Language Translator".

AI · Digital Human · Real-time Processing
0 likes · 17 min read
StarRing Big Data Open Lab
Feb 17, 2023 · Big Data

Inside Xinghuan Tech’s Next‑Gen Big Data 3.0 Architecture: Unified, Cloud‑Native, Real‑Time

This article details Xinghuan Technology’s evolution from 2013 to the present, describing its self‑developed Big Data 3.0 stack—including a unified data platform, SQL‑centric development, cloud‑native resource scheduling, distributed storage managed by Raft, DAG‑based compute engines, and real‑time stream processing—while highlighting key milestones and design principles that differentiate it from traditional Hadoop‑based solutions.

Data Platform · Real-time Processing · SQL Optimizer
0 likes · 19 min read
dbaplus Community
Feb 15, 2023 · Big Data

How Bilibili Scaled User Behavior Analytics with ClickHouse, Flink, and Iceberg

This article details Bilibili's Polaris (北极星) user behavior analysis platform, tracing its evolution from early Spark‑Jar models to Flink‑ClickHouse pipelines and Iceberg‑based full aggregation, and explains the technical solutions for event, retention, funnel, and path analysis, data ingestion, cluster rebalancing, and performance optimizations that enable massive real‑time analytics on billions of daily events.

ClickHouse · Flink · Iceberg
0 likes · 32 min read
DataFunSummit
Jan 10, 2023 · Big Data

Exploring Iceberg in Huawei Terminal Cloud: Architecture, Features, and Future Plans

This article presents a comprehensive overview of Iceberg's adoption in Huawei Terminal Cloud, covering its architectural overview, key features such as Git‑style data management, real‑time processing, acceleration layers, and future development directions, along with a Q&A session addressing performance and implementation details.

Big Data · Data Lake · Flink
0 likes · 15 min read
DataFunTalk
Jan 6, 2023 · Big Data

ZhongAn's Hundred‑Billion‑Scale Data Integration Service: Architecture, Business Support, and Evolution

This article presents the architecture and practical experience of ZhongAn's hundred‑billion‑scale data integration service, covering common integration technologies, business support scenarios for offline and real‑time data, technical challenges, evolution from single‑machine to service‑oriented designs, and future directions using Flink and DataX.

Data Integration · Data Platform · DataX
0 likes · 31 min read
DataFunTalk
Dec 10, 2022 · Big Data

Key Development Trends of Data Warehouses: Standardization, Real‑time Processing, Modularity, and Holistic Evaluation

Based on expert interviews, the article outlines the current development traits of data warehouses—standardization through data governance, real‑time processing, modular architecture, and holistic evaluation—while linking these trends to emerging concepts such as data middle platforms, data lakes, and DataOps.

Real-time Processing · modular architecture
0 likes · 13 min read
21CTO
Nov 9, 2022 · Operations

How Ctrip Handles Billions of Logs Daily: Real‑Time Monitoring, Clog, CAT & TSDB

This article details Ctrip’s large‑scale log monitoring system, covering the overall architecture, the Clog log system, the CAT tracing platform, and the internal TSDB solution, and explains how billions of logs are processed in real time with low latency, high reliability, and efficient querying.

Big Data · Distributed Systems · Log Monitoring
0 likes · 12 min read
DaTaobao Tech
Oct 17, 2022 · Artificial Intelligence

AI Live Stream: Causal Representation Learning and Real-time Color Enhancement

In this AI Live Stream, two Taobao Technology engineers present how causal representation learning enables unbiased data augmentation and factor‑controllable generation to boost fine‑grained image classification, while also unveiling a real‑time color‑enhancement technique that merges cascaded lookup tables with dynamic neural networks, illustrating modern AI trends and practical deployment strategies.

AI Algorithms · Fine-Grained Classification · Real-time Processing
0 likes · 4 min read
Big Data Technology & Architecture
Aug 23, 2022 · Big Data

Using Flink Broadcast State for Dynamic Configuration Updates and Real‑Time Data Enrichment

This article explains how Flink's Broadcast State feature can be used to dynamically update processing rules and enrich streaming events with user information from MySQL, showing configuration, code examples, key considerations, and runtime results that demonstrate real‑time adaptability without restarting the job.

Broadcast State · Dynamic Configuration · Flink
0 likes · 15 min read

Understanding Spark Streaming Checkpoint Mechanism for Real‑Time Feature Computation

The article explains how Spark Streaming's checkpoint mechanism works, detailing the four-step process—from setting the checkpoint directory to writing RDD data and finalizing the checkpoint—highlighting its role in ensuring fault‑tolerant, fast recovery for real‑time recommendation feature pipelines.

Big Data · Checkpoint · Real-time Processing
0 likes · 7 min read
Laravel Tech Community
Jul 19, 2022 · Backend Development

The Evolution and Architecture of China’s 12306 Railway Ticketing System

This article examines the historical development, distributed architecture, and high‑concurrency challenges of China’s 12306 railway ticketing platform, tracing its origins from early Unix‑based systems to modern multi‑layered backend solutions that support hundreds of millions of users during peak travel periods.

Backend Architecture · Distributed Systems · Railway
0 likes · 8 min read
DataFunSummit
Jul 15, 2022 · Big Data

Apache DolphinScheduler Practice at Xinwang Bank

Xinwang Bank leverages Apache DolphinScheduler to handle over 9,000 daily task instances across real‑time, near‑real‑time, and offline batch scenarios, detailing background, application scenarios, optimizations, workflow improvements, import/export enhancements, alert system upgrades, and future plans to expand data‑ops capabilities.

Apache DolphinScheduler · Big Data · DataOps
0 likes · 13 min read
dbaplus Community
Jul 13, 2022 · Big Data

Unpacking the Core Technologies Behind Modern Big Data Platforms

From data ingestion to real‑time analytics, this guide breaks down the essential layers of a typical big‑data platform—covering collection methods, HDFS storage, Hive/Spark analysis, data sharing mechanisms, application use‑cases, streaming with Spark Streaming, and the need for robust scheduling and monitoring.

Big Data · Data Integration · Data Warehouse
0 likes · 9 min read
DataFunTalk
Jun 23, 2022 · Big Data

Real‑Time Low‑Latency Log Monitoring and Storage at Ctrip: Architecture, Clog System, CAT Tracing, and TSDB

This article details Ctrip's large‑scale, real‑time log monitoring solution, covering the overall monitoring architecture, the Clog log system, the CAT tracing platform, and the TSDB metric store, and explains design choices such as write‑heavy indexing, segment‑based storage, and migration to ClickHouse for high‑cardinality data.

Distributed Systems · Log Monitoring · Real-time Processing
0 likes · 11 min read
DataFunSummit
Jun 6, 2022 · Artificial Intelligence

Event Graphs in Intelligent Customer Service: Concepts, Applications, and System Architecture

This article introduces event graphs as a knowledge‑centric representation of dynamic events, explains their construction and real‑time processing in Meituan's intelligent customer service, and demonstrates applications such as event timeline extraction, hotspot detection, event prediction, multi‑turn dialogue guidance, and business decision support.

AI · Event Schema · Intelligent Customer Service
0 likes · 16 min read
Architecture Digest
May 23, 2022 · Big Data

Overview of Core Technologies in a Big Data Platform Architecture

This article explains the main layers of a typical big data platform—data collection, storage and analysis, sharing, and application—detailing common tools such as Flume, DataX, Hive, Spark, SparkSQL, Impala, and Spark Streaming, and discusses task scheduling and monitoring in the ecosystem.

Data Platform · DataX · Hadoop
0 likes · 10 min read
58 Tech
Apr 26, 2022 · Information Security

Design and Architecture of a Full‑Chain Data Warehouse for Information Security

The article presents a comprehensive design of an end‑to‑end data warehouse for information‑security governance, detailing background motivations, multi‑layer data architecture, dimension modeling, bus‑matrix mapping, real‑time (lambda/kappa) processing, data‑dictionary integration, and future directions toward unified streaming‑batch solutions.

Data Warehouse · Real-time Processing · dimension modeling
0 likes · 16 min read
Xianyu Technology
Apr 13, 2022 · Big Data

Real-time Multi-system Data Aggregation for Fan Tag System

The Xianyu fan‑tag system must display full‑history purchase counts with real‑time updates while serving low‑latency, high‑throughput queries. It exports multi‑system data daily to a LevelDB‑based KV store, converts schemas, and applies real‑time compensation from transaction and follow‑change messages, merging offline and live data to produce sorted fan lists at about 10k QPS.

KV storage · Real-time Processing · data aggregation
0 likes · 6 min read
Kuaishou Big Data
Feb 25, 2022 · Big Data

How Kuaishou Scales Data Sync: Architecture, Challenges, and Future Plans

This article details the design, evolution, and optimization of Kuaishou's data synchronization platform, covering business overview, architecture, key technologies, performance tuning, data source protection, incremental data lake integration, and future roadmap for a unified data fabric.

Big Data · Real-time Processing · architecture
0 likes · 15 min read
dbaplus Community
Feb 15, 2022 · Big Data

Mastering Data Warehouse Architecture: Concepts, Modeling Techniques, and Real‑Time Strategies

This comprehensive guide explains data warehouse fundamentals, architecture layers, modeling methods such as dimensional and entity modeling, metadata management, and the transition from offline to real‑time processing with Lambda and Kappa architectures, providing practical steps, best practices, and key terminology for building robust analytical platforms.

Big Data · Data Warehouse · ETL
0 likes · 63 min read
Kuaishou Tech
Jan 27, 2022 · Artificial Intelligence

Kuaishou’s Self‑Developed Green‑Screen Matting Algorithm and Its Deployment in Kuaiying, Live Companion, and Cloud Editing

This article explains the principles, challenges, and implementation details of Kuaishou’s proprietary green‑screen matting algorithm, covering fine‑detail handling, color‑spill reduction, green‑reflection removal, and its real‑time deployment across mobile video‑editing and live‑streaming products.

Computer Vision · Kuaishou · Real-time Processing
0 likes · 13 min read
DataFunSummit
Dec 6, 2021 · Big Data

Design and Performance Optimization of a Real‑Time Billion‑Scale Data Processing Pipeline

This article reviews the background, architecture, and a series of performance‑optimizing techniques—including consumption, batch, storage, and execution‑engine tweaks—applied to a real‑time pipeline that processes hundreds of billions of records daily, and presents the resulting resource savings and latency improvements.

Kafka · Performance Optimization · Real-time Processing
0 likes · 9 min read
Baidu Geek Talk
Nov 24, 2021 · Big Data

Building Big Data Infrastructure at Baidu Aifanfan: Architecture Practices and Lessons Learned

At Baidu Aifanfan, the data team built a unified real‑time and offline big‑data platform—leveraging Watt, Bigpipe, Fengge, AFS and Palo within Lambda/Kappa patterns and a fast‑slow parallel rollout—that cut OLAP query latency from 18 minutes to under 15 seconds, enabled self‑service analytics, and standardized metrics across 15 agile teams.

Apache Doris · Big Data Architecture · Data Governance
0 likes · 23 min read
21CTO
Nov 8, 2021 · Big Data

How Baidu iFanFan Built a Real-Time Big Data Platform: Challenges & Lessons

Facing rapid business iteration, Baidu’s iFanFan data team designed a unified real‑time and offline big‑data platform, tackling business, technical, and organizational challenges through Lambda/Kappa architectures, data integration, storage, computation, governance, and scalable analytics to deliver timely, accurate, and valuable data products.

Big Data · Data Architecture · Data Warehouse
0 likes · 33 min read
Java High-Performance Architecture
Oct 12, 2021 · Big Data

Unpacking the Core Technologies Behind Modern Big Data Platforms

This article breaks down a typical big data platform architecture into its four layers—data collection, storage and analysis, sharing, and real‑time computation—detailing the essential tools such as Flume, HDFS, Hive, Spark, DataX, and task scheduling systems that enable scalable, low‑latency data processing and delivery.

Big Data · Data Architecture · DataX
0 likes · 8 min read
IT Architects Alliance
Sep 5, 2021 · Big Data

Big Data Platform Architecture: Core Layers, Technologies, and Practices

This article outlines a typical big data platform architecture, detailing its core layers—data acquisition, storage and analysis, sharing, application, real‑time computation, and task scheduling—while introducing key technologies such as Flume, HDFS, Hive, Spark, DataX, and monitoring considerations.

Big Data · Data Platform · Hadoop
0 likes · 9 min read
Architects' Tech Alliance
Sep 2, 2021 · Big Data

Core Technologies and Architecture of a Big Data Platform

The article outlines a typical big data platform architecture, detailing its core layers—data collection, storage and analysis, sharing, application, real-time computation, and task scheduling—while describing key technologies such as Flume, DataX, HDFS, Hive, Spark, Spark Streaming, and Redis.

Data Architecture · Data Integration · Hadoop
0 likes · 9 min read
NetEase Smart Enterprise Tech+
Aug 23, 2021 · Artificial Intelligence

How a Lightweight Neural Network Cuts Transient Noise in Real‑Time Audio

NetEase Cloud Communication’s Audio Lab presents a low‑complexity neural‑network denoising algorithm that effectively suppresses both stationary and transient noises while preserving speech quality, detailing its mathematical model, feature design, loss function, GRU‑based architecture, real‑time performance, and comparative evaluation against state‑of‑the‑art methods.

Neural Network · Real-time Processing · audio denoising
0 likes · 13 min read
JD Retail Technology
Aug 12, 2021 · Big Data

Design and Implementation of JD Mini‑Program Custom Data Analysis Service

This article presents the technical solution and key processes of JD's mini‑program custom data analysis service, covering business background, ClickHouse‑based storage design, real‑time processing pipelines, dynamic rule parsing, table architecture, monitoring mechanisms, and future outlook for large‑scale data analytics.

ClickHouse · Custom Data Analysis · Data Architecture
0 likes · 13 min read
Volcano Engine Developer Services
Aug 3, 2021 · Big Data

Inside ByteDance’s Traffic Platform: Powering Trillions of Real‑Time Events

This article, compiled from a Volcano Engine meetup, explains how ByteDance’s unified traffic platform designs, governs, and processes massive event‑tracking data in real time, covering embedding content solutions, link architecture, dynamic processing engines, and data‑governance practices that support trillions of daily events.

Big Data · Data Governance · Real-time Processing
0 likes · 16 min read
ITPUB
Jul 7, 2021 · Big Data

How NetEase Cloud Music Scaled Its Data Warehouse for Billion‑User Traffic

This article details NetEase Cloud Music's journey of redesigning its data warehouse and governance processes to support over a billion monthly active users, covering pain points, standardization, shared services, self‑service tools, and the resulting improvements in data quality, latency, and operational efficiency.

Analytics · Data Governance · Data Platform
0 likes · 19 min read
Youzan Coder
Jun 30, 2021 · Big Data

Online Monitoring Practices for Offline and Real-Time Data at Youzan

Youzan Data Report Center monitors offline batch and real‑time data pipelines using accuracy and timeliness rules, cross‑table checks, upstream‑downstream comparisons, and scheduled alerts to detect anomalies early; since 2021 it has generated over 25 alerts, and plans a unified data‑quality dashboard.

Big Data · Data Quality · Flink
0 likes · 12 min read
Yuewen Technology
Jun 25, 2021 · Big Data

Building Yuedu Group’s Overseas Big Data Platform: Architecture, Offline & Real‑Time Processing

This article details how Yuedu Group designed and implemented an overseas big data platform, covering overall system architecture, offline data‑warehouse construction with dimensional modeling, real‑time streaming using Oceanus and ClickHouse, and future plans for cost reduction and data quality assurance.

Big Data · Real-time Processing · architecture
0 likes · 12 min read
DataFunTalk
Jun 21, 2021 · Big Data

Flink + Iceberg 0.11 Practices in Qunar Data Platform

This article shares Qunar's experience using Flink together with Apache Iceberg 0.11 to address real‑time data warehouse challenges, covering background pain points, Iceberg architecture, solutions for Kafka data loss and Hive latency, and optimization practices such as small‑file handling, sorting, and checkpoint management.

Big Data · Data Lake · Flink
0 likes · 13 min read
IT Architects Alliance
Jun 5, 2021 · Big Data

How to Build a Real‑Time Recommendation System with Flink, HBase, and Docker

This article walks through a complete real‑time recommendation system built on Apache Flink, detailing its v2.0 architecture, modules for user behavior, interest, and product profiling, the recommendation algorithms (hot‑list, collaborative filtering, item similarity), and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka.

Docker · Flink · HBase
0 likes · 11 min read
Big Data Technology Architecture
May 31, 2021 · Big Data

Practical Experience of Using Flink + Iceberg 0.11 on Qunar Data Platform

This article presents Qunar's practical experience with Flink and Iceberg 0.11, covering background challenges such as Kafka data loss and Hive metadata pressure, explaining Iceberg architecture, query planning, and detailed solutions including real‑time ingestion, small‑file handling, sorting, and code examples for seamless migration.

Flink · Iceberg · Real-time Processing
0 likes · 12 min read
IT Architects Alliance
May 22, 2021 · Big Data

Flink-Based Real‑Time Recommendation System: Architecture, Logic, and Docker Deployment Guide

This article presents a comprehensive walkthrough of a Flink‑powered recommendation system, detailing its v2.0 architecture, module functions, recommendation algorithms (hotness, product similarity, collaborative filtering), front‑end and back‑end UI, and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka services.

Big Data · Docker · Flink
0 likes · 11 min read
Tencent Cloud Developer
May 21, 2021 · Big Data

Tencent Cloud Oceanus: Flink SQL Optimization and Extension Practices

Tencent Cloud Oceanus is a computing service that powers internal apps such as WeChat and external partners such as Bilibili, scaling to over 30,000 cores, 5 PB of data per day, and 500,000 jobs. It addresses Flink SQL’s syntax, function, and operational limits with table‑valued functions, incremental and enhanced tumble windows, and a caching‑based retraction optimization that cuts downstream data volume by up to 30× and improves join performance by about 20%.

Big Data · Flink SQL · Oceanus
0 likes · 19 min read
Architecture Digest
May 7, 2021 · Big Data

Comprehensive Overview of Data Middle Platform Architecture and Practices

This article provides a detailed introduction to data middle platform concepts, covering data aggregation, ingestion tools, offline and real‑time development, data governance, service layers, monitoring, and deployment patterns, illustrating how enterprises build unified data ecosystems across various industries.

Big Data · Data Governance · Data Platform
0 likes · 25 min read
DataFunTalk
May 5, 2021 · Big Data

JD's OLAP Architecture: Design, Challenges, and Solutions

This article explains how JD constructs its OLAP platform from data ingestion to storage, querying, and management, describing the diverse data sources, real‑time and offline processing, scalability, consistency, fault tolerance, and future optimization plans, while addressing key technical challenges and solutions.

Big Data · Distributed Systems · JD.com
0 likes · 15 min read