Tagged articles
38 articles
Tencent Cloud Developer
May 8, 2025 · Big Data

How Setats Unifies Stream, Batch, and Incremental Processing for Real‑Time Data Lakes

At the 2025 DA Data+AI Conference in Shanghai, Tencent Cloud unveiled Setats, a unified stream, batch, and incremental processing engine. Setats lowers system costs, provides second-level data visibility and real-time changelog generation, and integrates tightly with the WeData platform; the talk also reports measurable performance gains in automotive IoT analytics.

Batch Processing · Big Data Architecture · Data Lake
0 likes · 5 min read
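The Setats summary mentions real-time changelog generation but does not describe how it works. As a hypothetical illustration of what a changelog contains, the sketch below diffs two keyed snapshots of a table and emits row-kind events. The function name, the vehicle-telemetry rows, and the use of Flink's `+I/-U/+U/-D` notation are all assumptions for illustration, not Setats's API or algorithm.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Change = Tuple[str, Any, Row]  # (row kind, primary key, row image)


def diff_snapshots(old: Dict[Any, Row], new: Dict[Any, Row]) -> List[Change]:
    """Emit the changelog that turns snapshot `old` into snapshot `new`.

    Row kinds borrow Flink's notation: +I insert, -U update-before,
    +U update-after, -D delete.
    """
    changes: List[Change] = []
    for key, row in new.items():
        if key not in old:
            changes.append(("+I", key, row))
        elif old[key] != row:
            # An update is emitted as a retraction of the old image plus the new image.
            changes.append(("-U", key, old[key]))
            changes.append(("+U", key, row))
    for key, row in old.items():
        if key not in new:
            changes.append(("-D", key, row))
    return changes


# Hypothetical vehicle-telemetry snapshots keyed by vehicle id.
v1 = {1: {"vin": "A", "speed": 60}, 2: {"vin": "B", "speed": 40}}
v2 = {1: {"vin": "A", "speed": 65}, 3: {"vin": "C", "speed": 30}}
for change in diff_snapshots(v1, v2):
    print(change)
```

Diffing whole snapshots is only the simplest way to show the output format; an engine targeting second-level visibility would more plausibly derive changes incrementally on the write path.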
Bilibili Tech
Apr 8, 2025 · Big Data

Building a Real-Time Data Warehouse for Bilibili’s Game Business

To support Bilibili’s rapidly expanding game business, the team built a unified real-time data warehouse on Hologres and Flink that replaces the traditional Lambda stack. It delivers high-throughput writes, low-latency processing, seamless offline-online integration, and global deployment, and serves real-time operations, advertising, and risk analytics.

Big Data Architecture · Data architecture case study · Flink
0 likes · 17 min read
ITPUB
Sep 11, 2024 · Big Data

Is Storage‑Compute Separation the Future? Unpacking the Lakehouse Debate

The article examines the concepts of storage‑compute separation and the lake‑warehouse (lakehouse) model, tracing their evolution from physical Hadoop clusters to containerized compute and object storage, and argues that true separation requires MPP systems to adopt open standards, effectively merging lake and warehouse architectures.

Big Data Architecture · Hadoop · Lakehouse
0 likes · 7 min read
DataFunSummit
Jul 12, 2024 · Big Data

Data Lake Development Trends, Architecture, Integration, Lakehouse Core Capabilities, and Open Design

This article examines the current evolution of data lakes, detailing their overall architecture, batch and real‑time integration methods, Lakehouse core functionalities such as enhanced DML, schema evolution, ACID support, and open‑design principles that enable multi‑cloud deployment and seamless interaction with diverse compute engines.

Batch Processing · Big Data Architecture · Data Lake
0 likes · 12 min read
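Schema evolution is one of the Lakehouse core capabilities listed in the DataFunSummit article. A minimal sketch, assuming a simplified model in which a schema is an ordered mapping of column names to default values: records written under an older schema are projected onto the current schema at read time, with added columns filled from defaults and dropped columns ignored. Real table formats such as Iceberg track columns by field ID rather than by name, so this name-based version is a simplification, and all names here are hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]
# Simplified schema model (an assumption): column name -> default value used
# for records written before that column existed.
Schema = Dict[str, Any]


def project(record: Record, current: Schema) -> Record:
    """Read one stored record through the current schema.

    Added columns are filled from their defaults; columns that were dropped
    from the schema are silently ignored.
    """
    return {col: record.get(col, default) for col, default in current.items()}


def read_table(files: List[List[Record]], current: Schema) -> List[Record]:
    """Each inner list stands for one data file, possibly written under an older schema."""
    return [project(rec, current) for data_file in files for rec in data_file]


old_file = [{"id": 1, "name": "a", "legacy_flag": True}]   # written before `region` existed
new_file = [{"id": 2, "name": "b", "region": "shanghai"}]  # written after `legacy_flag` was dropped
schema_v3 = {"id": None, "name": None, "region": "unknown"}
print(read_table([old_file, new_file], schema_v3))
```

In this sketch the old data file is never rewritten; only the read path changes, which is what keeps a column addition cheap.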
Big Data Technology & Architecture
Jul 1, 2024 · Big Data

Applying Data Lake (Hudi) at Kuaishou: Architecture Evolution, Use Cases, and Practice

This article details Kuaishou's journey of adopting the Hudi data lake, covering business challenges, migration from Hive to Hudi, architectural redesign, promotion strategy, real‑world use cases such as CDC sync and batch‑stream integration, and key lessons learned for large‑scale data engineering.

Big Data Architecture · Data Warehouse · Hudi
0 likes · 11 min read
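For the CDC-sync use case in the Kuaishou article, Hudi's upserts resolve several changes to the same record key using an ordering field (Hudi calls it the precombine field). The sketch below imitates that idea in plain Python, assuming each CDC event carries a primary key `id`, an ordering field `ts`, and an `op` of `upsert` or `delete`. It illustrates the merge semantics only and is not Hudi's implementation or Kuaishou's pipeline.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_cdc(table: Dict[Any, Row], events: Iterable[Row]) -> Dict[Any, Row]:
    """Merge CDC events into a keyed table, keeping the version with the largest ts.

    Within one call, deletes are held as tombstones so that an older,
    late-arriving upsert cannot resurrect a deleted row.
    """
    latest: Dict[Any, Row] = dict(table)
    for ev in events:
        key = ev["id"]
        current = latest.get(key)
        # Ignore events older than the version already held (out-of-order arrival).
        if current is not None and current["ts"] > ev["ts"]:
            continue
        latest[key] = ev
    # Drop tombstones and strip the op marker from surviving rows.
    return {
        k: {f: v for f, v in row.items() if f != "op"}
        for k, row in latest.items()
        if row.get("op") != "delete"
    }


table = {1: {"id": 1, "ts": 1, "name": "a"}}
events = [
    {"id": 1, "ts": 3, "name": "b", "op": "upsert"},
    {"id": 1, "ts": 2, "name": "stale", "op": "upsert"},  # arrives late, ignored
    {"id": 2, "ts": 1, "name": "c", "op": "upsert"},
    {"id": 2, "ts": 5, "op": "delete"},
]
print(apply_cdc(table, events))
```

Ordering by an event-time field rather than by arrival order is what lets a sync job tolerate out-of-order CDC delivery.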
DataFunTalk
Jun 10, 2024 · Big Data

Data Lake Development Trends, Architecture, Integration, and Lakehouse Core Capabilities

This article reviews the latest developments in data lakes, including trend analysis, overall architecture, data integration methods, Lakehouse core capabilities, open design principles, stream‑batch unified processing, real‑time OLAP, and lake‑internal warehousing, highlighting how these advances reduce complexity and cost while improving data sharing and performance.

Big Data Architecture · Lakehouse · Real-time OLAP
0 likes · 14 min read
DataFunTalk
Jul 10, 2023 · Big Data

Practical Experience of In‑Lake Warehouse Implementation Based on Lakehouse Architecture

This article presents a comprehensive overview of Lakehouse‑based in‑lake warehousing, covering common data‑lake misconceptions, the evolution from databases to data warehouses and lakes, the advantages of Lakehouse over traditional architectures, a reference multi‑layer architecture, typical use cases, challenges, future plans, and a brief Q&A.

Big Data Architecture · Data Lake · Data Warehouse
0 likes · 20 min read
dbaplus Community
May 9, 2023 · Big Data

How a Bank Built a Near‑Real‑Time Data Platform with Kafka, Flink & Hudi

An in‑depth case study of a Chinese bank’s near‑real‑time data platform reveals its evolution from a monolithic CDC pipeline to a split architecture featuring a real‑time data lake and a data‑service bus, detailing component choices, schema‑registry integration, SDK development, observability, and future roadmap.

Big Data Architecture · Data Lake · Flink
0 likes · 18 min read
Data Thinking Notes
Dec 23, 2022 · Big Data

How Real-Time Data Warehouses Power Modern Business: Architecture, Cases, and Best Practices

This article explains why real‑time data warehouses are becoming essential, outlines their goals, compares them with traditional offline warehouses, and presents detailed design patterns, naming conventions, and case studies from Didi, Kuaishou, Tencent, Youzan and other enterprises, highlighting challenges and solutions for streaming, storage, and query layers.

Big Data Architecture · Data Lake · ETL
0 likes · 49 min read
ByteDance Data Platform
Nov 16, 2022 · Big Data

How ByteDance’s Data Lake Powers Near‑Real‑Time E‑Commerce Analytics

This article explains ByteDance’s data lake technology, its Apache Hudi‑based features, near‑real‑time architecture, and practical e‑commerce use cases such as marketing promotion, traffic diagnosis, logistics monitoring, risk governance, and operational monitoring, while outlining future challenges and plans.

Apache Hudi · Big Data Architecture · Data Lake
0 likes · 15 min read
AsiaInfo Technology: New Tech Exploration
Nov 11, 2022 · Industry Insights

How Real-Time Data Middle Platforms are Transforming the Telecom Industry

This article analyzes why telecom operators need a real‑time data middle platform, outlines its layered architecture and model design, examines the shift from Lambda to Kappa and lakehouse approaches, and highlights how these innovations enable faster, scenario‑driven insights and competitive advantage.

Big Data Architecture · Data Middle Platform · Flink
0 likes · 15 min read
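The telecom article describes a shift from Lambda to Kappa. In a Kappa design there is no separate batch layer: when processing logic changes, a new version of the streaming job re-reads the retained log from the beginning and its output replaces the old table. A toy sketch, assuming an in-memory list stands in for a retained Kafka topic and a dictionary stands in for the serving table; the event schema and both job versions are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, int]  # (subscriber id, data usage in MB); hypothetical schema
State = Dict[str, int]
Logic = Callable[[State, Event], None]


def run_job(log: List[Event], logic: Logic, start_offset: int = 0) -> State:
    """Consume the retained log from start_offset and build a fresh result table."""
    state: State = {}
    for event in log[start_offset:]:
        logic(state, event)
    return state


def total_usage(state: State, event: Event) -> None:
    """v1 logic: total MB per subscriber."""
    user, mb = event
    state[user] = state.get(user, 0) + mb


def large_sessions(state: State, event: Event) -> None:
    """v2 logic: number of sessions above 100 MB per subscriber."""
    user, mb = event
    if mb > 100:
        state[user] = state.get(user, 0) + 1


log = [("u1", 50), ("u2", 150), ("u1", 200)]  # stands in for a retained topic
serving = {"usage_view": run_job(log, total_usage)}
# Logic changed: replay the same log from offset 0 with v2, then swap the table.
serving["usage_view"] = run_job(log, large_sessions)
print(serving)
```

Under Lambda the same change would be coded twice, once for the batch path and once for the streaming path; Kappa trades that duplication for log retention and replay time.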
NetEase Cloud Music Tech Team
Oct 26, 2022 · Big Data

Arctic: NetEase's Streaming Lakehouse Service and Hive-Based Stream-Batch Integration Practice

Arctic, NetEase’s streaming lakehouse built on Apache Iceberg, unifies streaming and batch workloads with millisecond‑level latency, Hive compatibility, and built‑in message‑queue support, delivering CDC, upserts and OLAP without a Lambda architecture, as demonstrated by real‑time processing of 2 PB of Hive data for Cloud Music.

Apache Iceberg · Arctic · Big Data Architecture
0 likes · 15 min read
Bilibili Tech
Jul 15, 2022 · Big Data

Lakehouse Architecture Practice at Bilibili: Query Acceleration and Index Enhancement

Bilibili’s lakehouse architecture merges Iceberg‑based data lake flexibility with data‑warehouse efficiency, using Kafka‑Flink real‑time ingestion, Spark offline loads, Trino queries, Alluxio caching, Z‑Order/Hilbert sorting, and enhanced BloomFilter and bitmap indexes to boost query speed up to tenfold while drastically cutting file reads.

Big Data Architecture · Bitmap Index · Data Lake
0 likes · 17 min read
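Z-Order sorting, one of the techniques in the Bilibili article, clusters rows so that per-file min/max statistics stay selective on more than one column. A minimal sketch, assuming two small non-negative integer columns: the Z-value interleaves their bits, rows are sorted by it and cut into fixed-size "files", and a filter is answered by skipping files whose min/max range cannot match. This shows the general technique only, not Bilibili's Iceberg/Trino implementation; Hilbert curves and the BloomFilter/bitmap indexes are not covered.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits: bit i of x goes to position 2i, bit i of y to 2i+1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def sort_and_split(rows: List[Point], rows_per_file: int) -> List[List[Point]]:
    """Sort rows by Z-value and cut them into files of rows_per_file rows."""
    ordered = sorted(rows, key=lambda r: z_value(r[0], r[1]))
    return [ordered[i:i + rows_per_file] for i in range(0, len(ordered), rows_per_file)]


def files_to_read(files: List[List[Point]], col: int, lo: int, hi: int) -> int:
    """Count files whose [min, max] range on column `col` overlaps [lo, hi]."""
    count = 0
    for f in files:
        values = [r[col] for r in f]
        if max(values) >= lo and min(values) <= hi:
            count += 1
    return count


grid = [(x, y) for x in range(4) for y in range(4)]
z_files = sort_and_split(grid, 4)
linear_files = [sorted(grid)[i:i + 4] for i in range(0, 16, 4)]  # sorted by x, then y
print("x == 0:", files_to_read(z_files, 0, 0, 0), "vs", files_to_read(linear_files, 0, 0, 0))
print("y == 3:", files_to_read(z_files, 1, 3, 3), "vs", files_to_read(linear_files, 1, 3, 3))
```

On this 4×4 grid, linear sorting reads 1 file for a filter on x but all 4 for a filter on y, while Z-ordering reads 2 for either, which is the balance that makes multi-column skipping work.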
Shopee Tech Team
Apr 28, 2022 · Big Data

Building Real-Time Data Warehouse with Flink + Hudi at Shopee

Shopee replaced its hourly Hive pipeline with a hybrid Flink‑Hudi real‑time data warehouse that groups Kafka topics, applies lightweight stream ETL, uses partial‑update MOR tables for multi‑stream joins and COW tables for versioned batches, cutting latency from about 90 minutes to 2–30 minutes and halving resource usage.

Apache Flink · Apache Hudi · Batch Processing
0 likes · 20 min read
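The Shopee article uses partial-update MOR tables for multi-stream joins: several streams each write only their own columns for a shared key, and the table merges them instead of a stateful streaming join. A minimal sketch of that merge rule, assuming every event carries the key plus a subset of columns and that `None` means "not provided"; the order/payment/shipping columns are hypothetical, and arrival order stands in for the ordering logic a real table would apply.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def partial_update(table: Dict[Any, Row], events: Iterable[Row], key: str = "order_id") -> Dict[Any, Row]:
    """Merge column subsets from several streams into one wide row per key.

    Non-null fields in an event overwrite the stored value; missing or None
    fields leave the stored value untouched. Later arrivals win.
    """
    merged = {k: dict(v) for k, v in table.items()}
    for ev in events:
        row = merged.setdefault(ev[key], {key: ev[key]})
        for col, val in ev.items():
            if val is not None:
                row[col] = val
    return merged


events = [
    {"order_id": 7, "amount": 30.0},                            # order stream
    {"order_id": 7, "pay_status": "paid"},                      # payment stream
    {"order_id": 7, "amount": None, "ship_status": "shipped"},  # logistics stream
]
print(partial_update({}, events))
```

Because each stream only touches its own columns, the streams can be written independently and at different rates, and readers see one wide row per key.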
dbaplus Community
Mar 2, 2022 · Big Data

How Real‑Time Data Warehouses Power Modern Business: Architecture, Cases, and Best Practices

This article explores the growing demand for real‑time data warehouses, compares them with traditional offline warehouses, and presents detailed architectures, layer designs, naming conventions, and case studies from companies like Didi, Kuaishou, Tencent, and Youzan, highlighting challenges, solutions, and performance optimizations.

Big Data Architecture · Flink · Iceberg
0 likes · 47 min read
Tencent Cloud Developer
Feb 28, 2022 · Big Data

GooseFS: Distributed Caching System for Storage-Compute Separation Architecture

GooseFS, Tencent Cloud’s distributed caching system for storage-compute separation, sits between compute frameworks and the underlying storage (COS, CHDFS, COSN). Through transparent acceleration, a robust master-worker architecture, Raft-based HA, tiered caching, and metadata optimizations, it speeds up big-data and AI workloads by 2-10×, with reported cost savings of up to 50% and compute jobs running 29% faster.

Big Data Architecture · GooseFS · Raft consensus
0 likes · 18 min read
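Tiered caching, one of the GooseFS features listed, keeps hot blocks on fast media and demotes colder blocks to slower tiers before evicting them. A toy sketch, assuming per-tier LRU ordering, capacities counted in blocks, and a promote-on-read policy; the tier layout (e.g. MEM, SSD, HDD) and every name below are assumptions, and GooseFS's actual allocation and eviction policies may differ.

```python
from collections import OrderedDict
from typing import List, Optional


class TieredCache:
    """Block cache with ordered tiers; each tier is an LRU-ordered OrderedDict."""

    def __init__(self, capacities: List[int]) -> None:
        # tiers[0] is the fastest tier (e.g. MEM), tiers[-1] the slowest (e.g. HDD).
        self.capacities = capacities
        self.tiers: List["OrderedDict[str, bytes]"] = [OrderedDict() for _ in capacities]

    def get(self, block_id: str) -> Optional[bytes]:
        for tier in self.tiers:
            if block_id in tier:
                data = tier.pop(block_id)
                self._put(0, block_id, data)  # promote the hot block to the top tier
                return data
        return None  # miss: the caller would read from the under storage (e.g. COS)

    def put(self, block_id: str, data: bytes) -> None:
        for tier in self.tiers:
            tier.pop(block_id, None)  # keep a single copy across tiers
        self._put(0, block_id, data)

    def where(self, block_id: str) -> Optional[int]:
        for level, tier in enumerate(self.tiers):
            if block_id in tier:
                return level
        return None

    def _put(self, level: int, block_id: str, data: bytes) -> None:
        if level == len(self.tiers):
            return  # demoted past the last tier: evicted
        tier = self.tiers[level]
        tier[block_id] = data
        tier.move_to_end(block_id)
        if len(tier) > self.capacities[level]:
            victim, victim_data = tier.popitem(last=False)  # least recently used
            self._put(level + 1, victim, victim_data)


cache = TieredCache([1, 1])
cache.put("a", b"1")
cache.put("b", b"2")  # "a" is demoted to tier 1
cache.put("c", b"3")  # "b" is demoted, "a" falls out of the cache
print(cache.where("a"), cache.where("b"), cache.where("c"))
```

Demoting instead of evicting outright means a block that cools briefly can still be served from SSD or disk rather than refetched from object storage.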
Baidu Geek Talk
Nov 24, 2021 · Big Data

Building Big Data Infrastructure at Baidu Aifanfan: Architecture Practices and Lessons Learned

At Baidu Aifanfan, the data team built a unified real-time and offline big-data platform on Watt, Bigpipe, Fengge, AFS, and Palo, following Lambda/Kappa patterns and a fast-slow parallel rollout. The platform cut OLAP query latency from 18 minutes to under 15 seconds, enabled self-service analytics, and standardized metrics across 15 agile teams.

Apache Doris · Big Data Architecture · Data Governance
0 likes · 23 min read
Alibaba Cloud Developer
Dec 7, 2020 · Big Data

How to Build a New‑Retail Data Middle Platform with DataWorks

This article explains how new‑retail companies can design and implement a data middle platform using Alibaba Cloud's DataWorks, covering business model analysis, technical architecture, layer‑by‑layer data modeling, governance, security, and the concrete benefits of turning raw data into actionable business insights.

Big Data Architecture · Data Governance · Data Middle Platform
0 likes · 28 min read
Big Data Technology & Architecture
Sep 12, 2020 · Big Data

Technical Architecture and Component Selection of a Real‑time Data Platform (RTDP)

This article details the technical architecture of a Real‑time Data Platform (RTDP), covering component selection such as DBus, Kafka, Wormhole, Moonbox and Davinci, and discusses design considerations, data management, security, operational practices, and various deployment modes for big‑data applications.

Big Data Architecture · RTDP · data security
0 likes · 22 min read
Yanxuan Tech Team
Sep 1, 2020 · Big Data

Yanxuan’s Data Warehouse Blueprint: Architecture, Standards, and Evaluation

This article introduces Yanxuan’s data warehouse concept, platform layers, development standards, and a comprehensive evaluation framework, detailing its multi‑layer architecture (ODS, DWD, DWS, DIM, DM), supporting offline and real‑time platforms, and six key assessment dimensions such as data quality, security, and development efficiency.

Big Data Architecture · Evaluation Metrics
0 likes · 12 min read
Didi Tech
Aug 26, 2020 · Big Data

Real-time Data Warehouse Construction at Didi: Architecture, Practices, and Lessons

To support Didi’s fast‑growing car‑pool service, a real‑time data warehouse was built using a streamlined layered architecture—ODS, DWD, DIM, DWM, and APP—leveraging Flink‑based StreamSQL, Kafka, Druid and ClickHouse to deliver minute‑level analytics, dashboards, monitoring, and cross‑business interfaces while planning unified meta‑store integration.

Big Data Architecture · Data Platform · Flink
0 likes · 20 min read
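Minute-level analytics like those in the Didi article typically come from tumbling-window aggregations in the streaming layer, which Didi writes in Flink-based StreamSQL. The plain-Python sketch below shows only the windowing arithmetic, assuming events are (epoch-second timestamp, city) pairs and ignoring watermarks and late data; it is not StreamSQL code, and the event schema is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 60


def window_start(ts: int, size: int = WINDOW_SECONDS) -> int:
    """Align an epoch-second timestamp to the start of its tumbling window."""
    return ts - ts % size


def count_per_minute(events: Iterable[Tuple[int, str]]) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, city).

    Roughly what Flink SQL expresses as
    GROUP BY TUMBLE(ts, INTERVAL '1' MINUTE), city.
    """
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, city in events:
        counts[(window_start(ts), city)] += 1
    return dict(counts)


events = [(0, "bj"), (59, "bj"), (60, "bj"), (61, "sh"), (125, "bj")]
print(count_per_minute(events))
```

Each event falls into exactly one non-overlapping window, so a window's result can be emitted once its minute has passed, which is what gives minute-level freshness on a dashboard.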
Big Data Technology & Architecture
Jun 16, 2020 · Big Data

Hot and Cold Data Separation in Big Data Systems

The article explains the concept of hot and cold data, why separating them reduces cost, and presents heterogeneous and homogeneous architectural solutions—including Elasticsearch, HBase, AWS S3, and cloud‑based UltraWarm—illustrated with network‑behavior and e‑commerce order system case studies.

AWS S3 · Big Data Architecture · Data Lifecycle
0 likes · 11 min read
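Whatever storage the hot/cold article's heterogeneous or homogeneous designs use (Elasticsearch, HBase, S3, UltraWarm), each needs a policy that decides which tier a record belongs to and which tiers a query must touch. A minimal sketch, assuming an age-based rule with a hypothetical 30-day threshold applied to e-commerce orders; the actual "move" step would use each system's own tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

HOT_WINDOW = timedelta(days=30)  # hypothetical threshold


@dataclass
class Order:
    order_id: int
    created_at: datetime


def split_hot_cold(orders: List[Order], now: datetime,
                   hot_window: timedelta = HOT_WINDOW) -> Tuple[List[Order], List[Order]]:
    """Partition orders into (hot, cold) by age relative to `now`."""
    hot = [o for o in orders if now - o.created_at < hot_window]
    cold = [o for o in orders if now - o.created_at >= hot_window]
    return hot, cold


def stores_for_query(start: datetime, now: datetime,
                     hot_window: timedelta = HOT_WINDOW) -> List[str]:
    """Pick the stores a time-range query over [start, now] must read."""
    return ["hot"] if now - start < hot_window else ["hot", "cold"]


now = datetime(2024, 6, 30)
orders = [Order(1, datetime(2024, 6, 29)), Order(2, datetime(2024, 5, 1))]
hot, cold = split_hot_cold(orders, now)
print([o.order_id for o in hot], [o.order_id for o in cold])
print(stores_for_query(datetime(2024, 6, 25), now))
```

Using the same threshold for placement and for query routing keeps the two consistent: a query that starts inside the hot window never needs the cheaper, slower cold store.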
Huolala Tech
May 28, 2020 · Big Data

How Flink Powers Real‑Time Risk Control at HuoLaLa: Architecture and Insights

This article explains Flink's role in HuoLaLa's risk‑control system, covering its background, the Lambda‑style architecture that combines batch and streaming, the real‑time data pipeline, machine‑learning models, and operational safeguards that together enable proactive fraud detection.

Big Data Architecture · Flink · Lambda architecture
0 likes · 16 min read
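In a Lambda-style design like the one the HuoLaLa article describes, queries combine a batch view that is complete up to the last batch run with a real-time view covering only events since then. A minimal sketch of that serving-layer merge, assuming per-user event counters (for example, orders placed as a hypothetical risk feature) and a single batch cutoff timestamp; it illustrates the pattern, not HuoLaLa's system.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (epoch seconds, user id); hypothetical schema


def batch_view(events: List[Event], cutoff: int) -> Dict[str, int]:
    """Recomputed periodically over all history before the cutoff."""
    view: Dict[str, int] = {}
    for ts, user in events:
        if ts < cutoff:
            view[user] = view.get(user, 0) + 1
    return view


def speed_view(events: List[Event], cutoff: int) -> Dict[str, int]:
    """Maintained by the streaming job for events at or after the cutoff."""
    view: Dict[str, int] = {}
    for ts, user in events:
        if ts >= cutoff:
            view[user] = view.get(user, 0) + 1
    return view


def query(user: str, batch: Dict[str, int], speed: Dict[str, int]) -> int:
    """Serving layer: the two views cover disjoint time ranges, so they add up."""
    return batch.get(user, 0) + speed.get(user, 0)


events = [(10, "u1"), (20, "u1"), (30, "u2"), (105, "u1")]
batch, speed = batch_view(events, 100), speed_view(events, 100)
print(query("u1", batch, speed))
```

The cutoff must be shared exactly by both views; any gap or overlap between them would undercount or double count recent events.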
Big Data Technology & Architecture
Dec 7, 2019 · Big Data

Understanding Data Middle Platform: Definition, Construction, Product Selection, and Case Studies

This article explains what a data middle platform is, outlines its construction process, discusses how to choose suitable products, and presents enterprise case studies, offering a comprehensive guide to building and leveraging a data middle platform for big‑data initiatives.

Big Data Architecture · Data Governance · Data Middle Platform
0 likes · 5 min read
Mafengwo Technology
Sep 26, 2019 · Big Data

Mafengwo’s Data Warehouse & Middle Platform: Architecture, Modeling, Toolchain

This article details Mafengwo’s journey in constructing a data warehouse and data middle platform, covering the core three‑layer architecture, hybrid modeling approaches, the supporting toolchain for data synchronization, scheduling, and metadata management, and the design of an indicator platform for business analytics.

Big Data Architecture · Data Middle Platform · Data Warehouse
0 likes · 18 min read
Xueersi Online School Tech Team
Sep 6, 2019 · Big Data

Real-Time Data Architecture, Evolution, and Applications at an Online School

The article details the six‑layer big‑data architecture of an online school, chronicles its migration from Storm to Spark Streaming and finally to Flink, and showcases concrete real‑time applications such as gateway monitoring, user‑profile tagging, renewal reporting, and advertising analysis, while outlining future development directions.

Analytics · Big Data Architecture · Flink
0 likes · 14 min read
dbaplus Community
Aug 8, 2018 · Big Data

How to Build a Real‑Time Data Platform: Tech Stack & Design Patterns

This article explains the architecture of a Real‑Time Data Platform (RTDP), details the technical selection of core components such as DBus, Kafka, Wormhole, Moonbox and Davinci, and discusses data management, security, operations, and four deployment modes—synchronization, flow, rotation and intelligent—illustrating how each fits different business scenarios.

Big Data Architecture · Data Integration · Kafka
0 likes · 24 min read
Alibaba Cloud Developer
Jan 4, 2017 · Big Data

How Alibaba Powered Double 11 with Real‑Time Big Data Processing

Alibaba’s Double 11 live dashboards required ultra-high-precision, low-latency processing of billions of events in real time. The article explains the end-to-end architecture, including DRC, TimeTunnel, Galaxy, OTS, XTool, and OneService, used to reach million-plus QPS with fault tolerance and flexible data collection.

Alibaba · Big Data Architecture · Real-time Streaming
0 likes · 14 min read
Liulishuo Tech Team
Jun 17, 2016 · Big Data

Building a Scalable Big Data Platform on AWS: Architecture and Execution Service Design

This article details the architectural design and implementation of a scalable big data platform built on AWS services, highlighting the transition from HDFS to S3 for storage, the use of EMR for elastic compute, and a custom Execution Service integrated with Consul and Airflow for automated cluster management and task scheduling.

AWS EMR · Airflow · Big Data Architecture
0 likes · 11 min read
21CTO
Nov 4, 2015 · Big Data

Evolution of Dazhong Dianping’s Data Platform (2012‑2014): Key Lessons for Growing Big Data Teams

This article chronicles the step‑by‑step evolution of Dazhong Dianping’s data platform from 2012 to 2014, detailing changes in data models, storage and compute architecture, scheduling, monitoring, and data‑driven applications, offering practical insights for teams building early‑stage big‑data infrastructures.

Big Data Architecture · Data Platform · Data Warehouse
0 likes · 7 min read