Tagged articles
558 articles
Architecture & Thinking
Nov 15, 2024 · Databases

How Baidu’s TDE‑ClickHouse Delivers Sub‑Second Analytics on Billion‑Row Datasets

This article explains how Baidu’s TDE‑ClickHouse, as a core engine of the Turing 3.0 ecosystem, overcomes platform fragmentation, quality issues, and usability challenges through the OneData+ development paradigm, multi‑level aggregation, projection, query‑caching, bulk‑load ingestion, and a cloud‑native architecture to achieve sub‑second query response for massive data volumes.

Big Data · ClickHouse · Cloud Native
0 likes · 22 min read
Data Thinking Notes
Nov 5, 2024 · Big Data

How a Next‑Gen Data Management Platform Boosts Efficiency and Innovation

This article outlines the motivations, objectives, and architectural design of a next‑generation data management platform, detailing its four‑layer “four‑ization” approach; core services such as data integration, modeling, API provisioning, and componentization; and governance, security, and operational best practices.

Big Data · Data Governance · Data Integration
0 likes · 20 min read
37 Interactive Technology Team
Nov 4, 2024 · Artificial Intelligence

Developing RAG and Agent Applications with LangChain: A Case Study of an AI Assistant for Activity Components

The article outlines a step‑by‑step methodology for creating Retrieval‑Augmented Generation and custom Agent applications with LangChain, illustrated by an AI assistant for activity components that evolves from a rapid Dify prototype to a LangChain‑based RAG system and finally a hand‑crafted ReAct‑style agent, detailing LCEL chain composition, vector‑search integration, model performance trade‑offs, and a unified routing layer.

AI Assistant · Agent · Cloud-native
0 likes · 6 min read
Baidu Geek Talk
Oct 21, 2024 · Databases

TDE-ClickHouse Optimization Practice at Baidu MEG: Query Performance, Data Import, and Distributed Architecture

Baidu MEG’s TDE‑ClickHouse optimization in the Turing 3.0 ecosystem boosts query speed up to 10×, halves latency, enables billion‑row bulk imports in under two hours, and migrates to a cloud‑native, ZooKeeper‑free architecture supporting 350 k CPU cores, 10 PB storage, and sub‑3‑second responses for 150 k daily BI queries.

Baidu MEG · ClickHouse · Cloud Native
0 likes · 19 min read
Big Data Technology & Architecture
Oct 21, 2024 · Big Data

Key New Features of Apache Doris 3.0: Storage‑Compute Separation, Lakehouse Integration, Semi‑Structured Data, ETL Enhancements, Materialized Views, and Java UDTF

Apache Doris 3.0 introduces storage‑compute separation, native lakehouse write‑back, optimized Variant handling for semi‑structured data, stronger ETL transaction support, enhanced multi‑table materialized views, and Java UDTF capabilities, providing developers with more flexible, cost‑effective, and high‑performance analytics solutions.

Apache Doris · Data Warehouse · ETL
0 likes · 7 min read
360 Tech Engineering
Oct 17, 2024 · Databases

Introducing DataFusion: A High‑Performance Rust‑Based Query Engine Powered by Apache Arrow

This article explains DataFusion, a Rust‑written, Arrow‑based query engine that offers high performance, extensibility, and seamless integration with various data sources, detailing its architecture, execution model, Rust advantages, and practical usage examples for building modern data‑warehouse solutions.

Apache Arrow · Data Warehouse · DataFusion
0 likes · 15 min read
Baidu Tech Salon
Oct 16, 2024 · Big Data

Design and Implementation of an Online/Offline Integrated Task Scheduling System for Baidu's Mobile Operations Promotion Platform (OPS)

The paper presents Baidu’s Mobile Operations Promotion Platform redesign, introducing an online‑offline integrated task‑scheduling architecture that partitions settlement fields to the data‑warehouse, records all jobs in a unified MySQL operation table, orchestrates them via Turing Data Studio, and manages dependencies to achieve consistent, auditable, billion‑scale settlement processing.

Baidu · Data Warehouse · Ops
0 likes · 14 min read
DataFunTalk
Sep 28, 2024 · Big Data

Metric Management and Standardization in Didi's Data Platform

This article outlines Didi's approach to metric management, covering background, data product overview, and challenges in traditional and agile BI models, and presents a comprehensive solution for metric standardization, logical modeling, quality assurance, unified consumption, and future roadmap to improve data warehouse efficiency and consistency.

BI · Data Warehouse · data modeling
0 likes · 21 min read
DataFunTalk
Sep 20, 2024 · Databases

Technical Paper Summaries on Graph Databases, Vector Databases, and Real-Time Data Warehousing

This article compiles concise English summaries of several technical papers covering Xiaohongshu's REDgraph graph database, DingoDB vector database, Tianqiong autonomous data platform, Douyin's real‑time data warehouse, financial‑grade data warehousing, Alibaba Cloud ClickHouse Serverless offering, best practices in financial data governance, and 58.com user‑profile data warehouse construction.

Big Data · Data Warehouse · graph database
0 likes · 5 min read
DataFunTalk
Sep 17, 2024 · Databases

Overview of Recent Advances in Graph, Vector, and Real-Time Data Warehouse Technologies

This article presents a collection of technical abstracts covering graph database parallel query optimization, next‑generation vector databases, real‑time data warehouse architectures, and cloud‑native analytics solutions, while also providing instructions for obtaining the full e‑book via a WeChat public account.

Big Data · Cloud Native · Data Warehouse
0 likes · 5 min read
DataFunSummit
Sep 16, 2024 · Databases

DataFun Summit: Technical Papers on Graph Databases, Vector Databases, Real‑Time Data Warehouses and Industry Data Practices

The DataFun Summit page presents a collection of technical papers covering graph database parallel queries, next‑generation vector databases, real‑time data warehouse architectures, and best practices in finance and e‑commerce, while also providing instructions for obtaining the e‑book via a public account.

Big Data · Data Warehouse · Real-time analytics
0 likes · 5 min read
Alibaba Cloud Big Data AI Platform
Sep 13, 2024 · Big Data

How Qimao Scales 20PB Data with StarRocks, Flink, and Real‑Time Analytics

Qimao, a Shanghai‑based cultural entertainment internet firm, details its 20 PB big‑data architecture built on StarRocks, Flink, Hive, and Redis, covering data ingestion, real‑time processing, audience selection, metric anomaly drill‑down, 730‑day aggregation, and future plans for metric acceleration and full‑link data governance.

Big Data · Data Governance · Data Warehouse
0 likes · 13 min read
DevOps
Sep 12, 2024 · Fundamentals

Advantages, Disadvantages, and Principles of Layered Architecture

This article examines the common benefits, drawbacks, and design principles of layered architecture across micro‑service, data‑warehouse, and protocol designs, illustrating each point with real‑world examples and offering practical guidance on when and how to apply layering effectively.

Data Warehouse · design principles · layered architecture
0 likes · 11 min read
Tencent Cloud Developer
Sep 11, 2024 · Fundamentals

Advantages, Disadvantages, and Principles of Layered Architecture in Software Systems

Layered architecture offers abstract stability, functional reuse, cohesion, hidden complexity, and scalability, but can introduce extra complexity, performance overhead, and dependency risk, so designers should retain essential layers, enforce one‑way cross‑layer calls, depend only on lower layers, keep lower layers stable, and ensure each layer has a clear purpose.

DDD · Data Warehouse · Microservices
0 likes · 11 min read
DataFunSummit
Sep 8, 2024 · Big Data

Building and Optimizing a Cross‑Border E‑Commerce Data Platform: Architecture, Challenges, and Protonbase‑Based Solutions

This article presents Xide International's cross‑border e‑commerce data platform, detailing its multi‑layer business architecture, the scalability and data‑access problems encountered, and how a Protonbase‑driven data‑warehouse and micro‑service redesign dramatically improved query speed, operational efficiency, and cost.

Big Data · Data Platform · Data Warehouse
0 likes · 11 min read
Alibaba Cloud Developer
Sep 3, 2024 · Big Data

Mastering Data Modeling: From Raw Data to Insightful Warehouses

This article walks through the fundamentals of data modeling, explaining what data is, the DIKW framework, why modeling matters, and detailing the end‑to‑end process from conceptual design through logical and physical layers, including DIM, DWD, DWS, and ADM tables with practical tips and naming conventions.

Data Warehouse · ETL · data modeling
0 likes · 11 min read
DataFunTalk
Aug 27, 2024 · Big Data

Kuaishou's Year-Long White‑Box Cost Governance in Big Data: Engine, Data‑Warehouse, and Tool Optimizations

This article presents Kuaishou's comprehensive white‑box cost governance practice over the past year, detailing the data‑governance framework, engine and data‑warehouse white‑boxing techniques, compression algorithm replacement, HBO automatic tuning, operator analysis, and the resulting performance and cost benefits, as well as future plans.

Big Data · Cost Optimization · Data Warehouse
0 likes · 29 min read
DataFunSummit
Aug 26, 2024 · Big Data

Building a Doris‑Based Lakehouse Integrated Analytics System at Kuaishou

This article presents Kuaishou's experience of designing and implementing a Doris‑driven lakehouse integrated analytics system, covering the current OLAP landscape, challenges of data duplication and governance, the new architecture with caching and auto‑materialization, implementation details, performance impact, and future work.

Auto Materialization · Big Data · Data Warehouse
0 likes · 24 min read
Bilibili Tech
Aug 23, 2024 · Big Data

Accelerating Multi‑Dimensional OLAP Queries in ClickHouse with Grouping Sets, RBM, and Dense Dictionary Encoding

To achieve sub‑second, multi‑dimensional analytics on Bilibili’s hundred‑million‑row datasets, the team built a ClickHouse‑based acceleration layer that combines grouping‑set pre‑aggregation, bitmap (RBM) distinct handling, and a dense dictionary encoding service, dramatically cutting CPU, memory and query latency versus traditional OLAP pipelines.

Big Data · Bitmap · ClickHouse
0 likes · 28 min read
Data Thinking Notes
Aug 15, 2024 · Big Data

How to Build a Scalable Data Warehouse: Theory, Architecture, and Best Practices

This article outlines practical approaches to data warehouse construction, covering dimensional modeling, layered architecture, capability development, real‑time and batch processing with technologies like Hive, Spark, Flink, Iceberg, and discusses governance, security, and future trends toward data value and real‑time metrics.

Data Governance · Data Warehouse · Iceberg
0 likes · 13 min read
21CTO
Aug 13, 2024 · Databases

How PostgreSQL Can Replace Kafka, Redis, MongoDB and More in Your Stack

This article explores how PostgreSQL’s advanced features—UNLOGGED tables, JSONB, SKIP LOCKED, TimescaleDB, pg_cron, PostGIS, full‑text search, JSON generation, pgaudit, and GraphQL adapters—can replace specialized tools like Kafka, Redis, MongoDB, and others, simplifying the tech stack while boosting performance and maintainability.

Backend Development · Data Warehouse · Full‑Text Search
0 likes · 23 min read
DataFunSummit
Aug 13, 2024 · Big Data

Data Cost Reduction and Efficiency: Qichacha's Data Architecture and Multi‑Cloud Unified Design

This article presents Qichacha's comprehensive data‑cost‑reduction strategy, detailing its Hadoop‑based three‑pillar architecture, layered data warehouse, Hive upgrades, unified metadata across multi‑cloud clusters, middleware choices such as Alluxio and JuiceFS, version‑compatible hybrid clouds, and Kubernetes‑driven resource orchestration to achieve scalable, low‑cost data processing.

Big Data · Data Warehouse · Hadoop
0 likes · 16 min read
DataFunSummit
Jul 20, 2024 · Databases

Real-time Data Update Solutions in TCHouse‑C: Architecture, Schema‑less Design, and Performance Evaluation

This article presents TCHouse‑C, a cloud‑native ClickHouse service, detailing its real‑time data update architecture, schema‑less ingestion, various update strategies such as Delete‑Insert and lightweight‑update/delete, and comprehensive performance tests comparing UniqueMergeTree with standard ClickHouse engines across import, query, and update workloads.

ClickHouse · Data Warehouse · Delete-Insert
0 likes · 32 min read
Data Thinking Notes
Jul 11, 2024 · Big Data

How to Build a Robust Data Lineage Foundation for Scalable Business Insights

This article explains how to construct a full‑chain data lineage system, covering its overall architecture, quality measurement framework, and application layer, and demonstrates practical use cases such as handling data growth, monitoring warehouse changes, accelerating development, ensuring consistency, and automating metric decomposition in real‑world business scenarios.

Big Data · Data Governance · Data Lineage
0 likes · 14 min read
Baidu Tech Salon
Jul 11, 2024 · Industry Insights

How Baidu Feed Evolved Its Data Warehouse with Multi‑Version Wide Tables

This article outlines the step‑by‑step evolution of Baidu's Feed data warehouse—from traditional layered modeling to hour‑level core tables, then real‑time wide tables, and finally a flow‑batch integrated multi‑version wide‑table architecture—highlighting the motivations, design choices, challenges, and resulting benefits.

Big Data · Data Warehouse · Real-time analytics
0 likes · 10 min read
DataFunSummit
Jul 1, 2024 · Big Data

Optimizing JD Retail Data Architecture: From Lambda to Real‑time Unified Processing with Flink, Hudi, and StarRocks

This article details JD Retail's transition from a complex Lambda architecture to a unified real‑time data pipeline using Flink, Hudi, and StarRocks, addressing data completeness versus latency, reducing maintenance costs, improving storage efficiency, and delivering faster, more consistent analytics for business users.

Data Warehouse · Flink · Hudi
0 likes · 13 min read
Big Data Technology & Architecture
Jul 1, 2024 · Big Data

Applying Data Lake (Hudi) at Kuaishou: Architecture Evolution, Use Cases, and Practice

This article details Kuaishou's journey of adopting the Hudi data lake, covering business challenges, migration from Hive to Hudi, architectural redesign, promotion strategy, real‑world use cases such as CDC sync and batch‑stream integration, and key lessons learned for large‑scale data engineering.

Big Data Architecture · Data Warehouse · Hudi
0 likes · 11 min read
DataFunTalk
Jun 27, 2024 · Big Data

Data Warehouse Construction and Data Governance Practices at Wing Payment

This presentation by senior data warehouse engineer Huang Luo details Wing Payment’s end‑to‑end data warehouse build, covering background challenges, governance framework, platform architecture, layered modeling, naming standards, asset management, monitoring, and future plans, illustrating how systematic data governance drives cost reduction, efficiency, and security.

Analytics · Big Data · Data Governance
0 likes · 14 min read
DataFunTalk
Jun 19, 2024 · Big Data

Evolution and Practices of E‑commerce Data Warehouse Governance

This article analyzes the current state, development stages, and comprehensive solutions of e‑commerce data‑warehouse governance, covering data quality, cost, security, and efficiency requirements, and presents a roadmap from early‑stage standardization to mature tool‑driven governance with future outlooks.

Big Data · Cost Management · Data Governance
0 likes · 13 min read
Data Thinking Notes
Jun 18, 2024 · Big Data

How to Build a Robust Data Metric System: From Atomic to Derived Indicators

This article explains what a metric (indicator) is, distinguishes atomic, derived and composite indicators, outlines the OSM and UJM modeling methods, describes the steps for constructing a metric system, its integration with data warehouses, and details the design and technical implementation of a metric management platform.

Business Intelligence · Data Warehouse · OSM model
0 likes · 13 min read
ITPUB
Jun 9, 2024 · Databases

Doris vs ClickHouse: Which Database Fits Your Workload?

This article compares Doris and ClickHouse across architecture, table creation, ecosystem integration, management tools, query performance, and join capabilities, offering practical guidance on how to choose the right database based on your specific data processing and operational requirements.

ClickHouse · Data Warehouse · SQL
0 likes · 10 min read
DataFunTalk
Jun 4, 2024 · Databases

From Lambda Architecture to an All‑in‑One Apache Doris Real‑Time/Offline Data Platform for 5G Connected Factories

The article explains how China Unicom transformed its 5G fully‑connected factory data pipeline from a complex Lambda architecture into a streamlined, real‑time and offline‑integrated solution built on Apache Doris, detailing system requirements, architectural redesign, performance gains, and future plans.

5G · Apache Doris · Big Data
0 likes · 15 min read
DataFunTalk
Jun 1, 2024 · Big Data

Ant Group's Real-Time Data Warehouse Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent explorations and practices in real-time data warehousing, covering the system architecture, streaming data quality assurance, flow‑batch integrated applications, and future data lake integration, while sharing technical details and operational insights for large‑scale data processing.

Data Warehouse · Flink · real-time data
0 likes · 16 min read
Data Thinking Notes
May 30, 2024 · Databases

Why Your Data Team Is Drowning in Requests—and How OLAP Can Save You

This article examines why data departments get overwhelmed by massive data‑retrieval requests, identifies root causes such as mindset, requirement handling, and lack of tools, and presents a technical solution centered on dimensional modeling and OLAP multi‑dimensional reporting to streamline data access and empower teams.

Big Data · Data Warehouse · OLAP
0 likes · 12 min read
DataFunTalk
May 28, 2024 · Big Data

Building and Managing a Metric System in Data Warehouse: Practices from Dongchedi

This article details how the Dongchedi business team designs, implements, and monitors a comprehensive metric system within its data warehouse, covering metric standards, model construction, metadata management, quality monitoring, application scenarios, and future directions using the DataLeap platform.

Big Data · Data Governance · Data Warehouse
0 likes · 18 min read
Alibaba Cloud Developer
May 27, 2024 · Big Data

How MaxCompute’s New Offline‑Near‑Real‑Time Architecture Revolutionizes Big Data Workloads

This article explains how MaxCompute’s integrated offline‑and‑near‑real‑time architecture, built on Delta Table, solves complex big‑data scenarios by providing unified storage, ACID transactions, upsert, time‑travel, automatic data‑file governance and low‑latency query capabilities while reducing cost and operational complexity.

Data Warehouse · Delta Table · MaxCompute
0 likes · 27 min read
DataFunSummit
May 2, 2024 · Big Data

Building an Attribution System for NetEase Cloud Music Data Warehouse: Challenges and Solutions

This article presents the problems faced by NetEase Cloud Music's data warehouse attribution system and details a comprehensive solution that includes upgrading the event‑tracking framework, redesigning the attribution model, and launching a unified management platform to improve stability, accuracy, and scalability.

Analytics · Big Data · Data Warehouse
0 likes · 13 min read
DataFunTalk
Apr 15, 2024 · Databases

ByteHouse Cloud‑Native Data Warehouse Performance Whitepaper: Architecture, Optimizations, and Benchmark Results

The ByteHouse performance whitepaper details the cloud‑native data warehouse’s architecture, rule‑based and cost‑based optimizer enhancements, exchange runtime, runtime filters, parallelism and wide‑table optimizations, and presents benchmark comparisons on TPC‑DS, TPC‑H and SSB datasets demonstrating orders‑of‑magnitude query speed improvements.

Benchmark · ByteHouse · Cloud Native
0 likes · 17 min read
DataFunTalk
Apr 14, 2024 · Big Data

Third‑Generation Metric Platform: Enabling a Light Data Warehouse with NoETL

This article explains how a third‑generation metric platform replaces traditional ETL‑heavy data‑warehouse pipelines with a semantic‑driven NoETL approach, reducing cost, improving quality and efficiency, and delivering automated, self‑service analytics for both IT and business users.

Big Data · Data Warehouse · NoETL
0 likes · 16 min read
DataFunTalk
Apr 12, 2024 · Big Data

Building and Managing an Indicator System in a Data Warehouse: Practices from the Dongchedi Business

This article explains how the Dongchedi team designed, implemented, and monitored a comprehensive indicator system within a petabyte‑scale data warehouse, covering standards, metadata management, model construction, quality monitoring, and diverse application scenarios to improve data reliability and business insight.

Big Data · Data Governance · Data Warehouse
0 likes · 18 min read
Didi Tech
Mar 28, 2024 · Big Data

How We Unified Real‑Time and Batch Features with StarRocks in Financial Risk Control

This article analyzes the challenges of building real‑time and batch risk‑control features, compares Lambda and Kappa architectures, evaluates storage‑unified and compute‑unified solutions, and details how StarRocks was selected, validated, and deployed to achieve high‑performance, low‑latency feature serving in a financial context.

Big Data · Data Warehouse · Real-time analytics
0 likes · 19 min read
DataFunSummit
Mar 14, 2024 · Big Data

Tencent Game Data Analysis: Lakehouse Integration Practice

This article presents Tencent Game's comprehensive lakehouse integration practice, detailing the project background, storage‑compute separation, data layering, unified DDL/DML operations, performance optimizations, and future plans, illustrating how StarRocks, Iceberg, and Spark are combined to achieve scalable, cost‑effective analytics for massive game data.

Compute-Storage Separation · Data Warehouse · Iceberg
0 likes · 16 min read
Xiaohongshu Tech REDtech
Mar 4, 2024 · Big Data

Integrating Data Lake Technologies with Data Warehouse Architecture at Xiaohongshu: Practices and Performance Optimizations

Xiaohongshu’s data‑warehouse team integrated Apache Iceberg‑based data‑lake techniques into its existing warehouse, replacing the legacy Hive/Spark stack with global sorting, Z‑order, and upsert‑enabled tables, which cut query latency by up to 90 %, boosted data freshness by 50 %, slashed storage costs by 83 % and saved tens of thousands of GB‑hours of compute daily.

Apache Iceberg · Data Lake · Data Warehouse
0 likes · 19 min read
Volcano Engine Developer Services
Feb 29, 2024 · Big Data

How MetaApp Cut Data Warehouse Costs by 50% with ByConity

MetaApp replaced ClickHouse with the open‑source cloud‑native data warehouse ByConity, achieving over 50% cost reduction and faster, more stable OLAP queries by separating storage and compute, simplifying scaling, and improving resource utilization across a range of analytics workloads such as deduplication, retention, conversion and point‑lookup.

ByConity · ClickHouse · Cost reduction
0 likes · 13 min read
Baidu Geek Talk
Feb 28, 2024 · Big Data

How Baidu’s Fusion Compute Engine Cuts Query Time to Seconds on Petabyte‑Scale Data

This article analyzes Baidu's fusion compute engine for its data warehouse, detailing its architecture, optimization techniques such as data skipping, Parquet column indexing, ProjectLimit and CodeGen, and demonstrates how these innovations reduce query latency to seconds while cutting storage costs by about 30% on multi‑petabyte workloads.

Baidu · Big Data · Data Warehouse
0 likes · 12 min read
DataFunTalk
Feb 27, 2024 · Big Data

Best Practices of Cloud‑Native OLAP Architecture and Logistics Warning at Jushuitan

This article presents Jushuitan's cloud‑native OLAP architecture, detailing its evolution, current big‑data stack—including DataWorks, MaxCompute, Flink, Hologres, and Aerospike—along with logistics warning workflows, rule‑matching mechanisms, real‑time processing challenges, and future scalability plans.

Big Data · Cloud Native · Data Warehouse
0 likes · 20 min read
DataFunSummit
Feb 19, 2024 · Big Data

Yipay Data Warehouse Construction and Data Governance Practices

This presentation by senior data warehouse engineer Huang Luo details Yipay's end‑to‑end data warehouse build, covering background challenges, governance framework, platform development, layered architecture, naming standards, monitoring, and future plans, offering practical insights for data engineers, architects, and business stakeholders.

Big Data · Data Architecture · Data Quality
0 likes · 14 min read
DataFunSummit
Feb 7, 2024 · Big Data

Evolution of OLAP with Apache Doris at Xingyun Retail Credit

Facing rapid data growth, Xingyun Retail Credit transitioned from traditional OLTP systems to an Apache Doris‑based OLAP solution, detailing the data demand generation, OLAP engine selection challenges, multi‑stage implementation, performance gains, data‑warehouse construction, and future roadmap for scalable analytics.

Apache Doris · Big Data · Data Warehouse
0 likes · 17 min read
DataFunSummit
Jan 25, 2024 · Big Data

Best Practices of Jushuitan Cloud‑Native OLAP Architecture and Logistics Warning

This article presents Jushuitan's cloud‑native OLAP architecture, covering business background, data‑warehouse evolution, real‑time processing with Flink, Hologres, and Aerospike, and detailed logistics‑warning use cases, followed by technical challenges, future outlook, and a Q&A on implementation details.

Big Data · Data Warehouse · Flink
0 likes · 20 min read
DataFunTalk
Jan 3, 2024 · Databases

ClickHouse 2024 Core New Features and Product Development Directions

This article introduces ClickHouse, an open‑source columnar OLAP database, outlines its architecture, advantages, self‑hosted and cloud deployment models, highlights recent product features such as async inserts, JSON support, Parquet acceleration, query caching, and summarizes a Q&A covering semi‑structured data, MPP, virtual columns, and future roadmap.

ClickHouse · Columnar Database · Data Warehouse
0 likes · 12 min read
Architects Research Society
Jan 2, 2024 · Big Data

Understanding Data Lakes: Concepts, Benefits, Challenges, and Comparison with Data Warehouses

This article explains what a data lake is, its origins, key characteristics such as collecting all data, enabling diverse user access, and flexible processing, compares it with traditional data warehouses, discusses cost advantages, potential pitfalls like data swamps, and outlines best‑practice considerations for enterprise adoption.

Analytics · Data Architecture · Data Lake
0 likes · 10 min read
Weimob Technology Center
Jan 2, 2024 · Big Data

How to Efficiently Test BI Reports in a Hive‑StarRocks Data Warehouse

This article details practical methods for testing BI reports built on Hive and StarRocks, covering the report creation workflow, testing characteristics, SQL writing techniques, impact analysis, data warehouse simplification, and the application of data quality tools to ensure accurate and efficient reporting.

BI testing · Data Quality · Data Warehouse
0 likes · 9 min read
DataFunTalk
Jan 1, 2024 · Big Data

MaxCompute Semi-Structured Data: Concepts, Solutions, and Benefits

This article explains the nature of semi‑structured data, compares traditional schema‑on‑read and schema‑on‑write approaches, and details MaxCompute's columnar storage solution that balances flexibility, performance, and cost for large‑scale data warehouses.

Big Data · Columnar Storage · Data Warehouse
0 likes · 19 min read
Data Thinking Notes
Dec 28, 2023 · Big Data

How Xiaomi Built a Scalable Metric System: Best Practices and Methodology

This article explains Xiaomi's end‑to‑end metric system construction, covering the definition of metrics, business pain points, the OSM (Objective‑Strategy‑Measurement) model, the MECE principle, model design guidelines, data‑warehouse implementation, metric management, and the resulting data‑driven workflow across the company.

Big Data · Data Governance · Data Warehouse
0 likes · 10 min read
Zhuanzhuan Tech
Dec 27, 2023 · Big Data

Implementing Self-Service OLAP Analytics with Quick BI and StarRocks: Architecture, Optimizations, and Lessons Learned

This article presents a comprehensive case study of building a self‑service OLAP analytics platform at Zhuanzhuan using Quick BI and StarRocks, covering background and motivation, technical architecture, implementation details, performance‑optimization examples, and the resulting business impact.

Data Warehouse · OLAP · Performance Optimization
0 likes · 16 min read
DataFunSummit
Dec 7, 2023 · Databases

Apache Doris: A High‑Performance Real‑Time Analytical Database for Online High‑Concurrency Reporting

This article introduces Apache Doris, a real‑time analytical database built on an MPP architecture, and explains its suitability for massive data workloads and online high‑concurrency reporting. It details the core technologies (storage models, vectorized query engine, materialized views, partitioning, indexing, row store, and prepared statements) that enable sub‑second query latency and high QPS, presents a real‑world case study, and shows how to join the Doris community.

Apache Doris · Data Warehouse · Materialized Views
0 likes · 13 min read
DataFunTalk
Nov 28, 2023 · Big Data

Xiaomi Metric System Construction and Management Best Practices

This article presents Xiaomi's comprehensive metric system framework, covering its definition, business pain points, the OSM and MECE methodologies, model design principles, data warehouse construction, metric management, and future outlook, illustrating how a unified data platform drives efficient business decision‑making.

Business Intelligence · Data Governance · Data Warehouse
0 likes · 10 min read
Architects Research Society
Nov 26, 2023 · Big Data

Data Lake vs Data Warehouse: Key Differences and How to Choose

Data lakes and data warehouses serve different purposes in big‑data architectures; this article explains their definitions, core attributes, five major distinctions—including data retention, type support, user coverage, adaptability, and insight speed—and offers guidance on selecting or combining the two approaches.

Analytics · Data Architecture · Data Lake
0 likes · 12 min read
Ctrip Technology
Nov 23, 2023 · Big Data

Optimizing Data Warehouse Timeliness Using Metadata Lineage

This article presents a metadata‑driven approach to improve data warehouse timeliness by extracting upstream lineage, identifying over‑layered, duplicate, and critical‑path tasks, and applying targeted scheduling and code‑level optimizations, demonstrated with a hotel order wide‑table case study.

DAG · Data Warehouse · Lineage
0 likes · 7 min read
DataFunSummit
Nov 22, 2023 · Big Data

Bilibili Data Quality Assurance System: Architecture, Practices, and Case Study

This article presents Bilibili's data quality assurance system, detailing its evolution across four stages, the architectural framework, core capabilities such as a quality data warehouse, monitoring, collaborative safeguards, digital-driven optimization, and efficient incident handling, along with practical case studies and future outlooks.

Big Data · Data Quality · Data Warehouse
0 likes · 22 min read
HomeTech
Nov 15, 2023 · Industry Insights

How to Build Accurate Data Asset Lineage for Data Warehouse Governance

This article explains the challenges of data asset lineage in large data warehouses, presents a comprehensive approach using business‑level instrumentation, SQL interceptor plugins, and ETL script parsing to generate fine‑grained lineage graphs, and demonstrates measurable improvements in coverage and zombie‑table cleanup.

Data Governance · Data Lineage · Data Quality
0 likes · 18 min read
Data Thinking Notes
Nov 9, 2023 · Big Data

How to Build a Scalable Data Governance System for Massive E‑Commerce Warehouses

This article outlines the challenges of ultra‑large e‑commerce data warehouses—such as SLA pressure, model instability, soaring resource costs, low governance efficiency, and fragmented processes—and presents a one‑stop, tiered data‑governance framework with stability, cost, and efficiency subsystems that drives distributed autonomous governance and measurable business value.

Automation · Big Data · Cost Optimization
0 likes · 11 min read
Volcano Engine Developer Services
Nov 9, 2023 · Databases

How ByteHouse Redefines ELT for Cloud‑Native Data Warehousing

This article explains how ByteHouse, a cloud‑native data warehouse, shifts traditional ETL to ELT, simplifies data pipelines, enhances scalability, and introduces advanced features such as stage‑by‑stage scheduling, adaptive resource management, async execution, and future roadmap for big‑data workloads.

ByteHouse · Data Warehouse · ELT
0 likes · 16 min read
dbaplus Community
Nov 8, 2023 · Big Data

Choosing Between Data Warehouse, Data Lake, and Lakehouse: When to Use Each

This article compares traditional data warehouses, modern data lakes, and emerging lakehouse architectures, explaining their design patterns, advantages, disadvantages, and suitable use cases, while detailing implementation considerations such as schema design, ETL/ELT processes, file formats like Delta, Iceberg, and Hudi, and factors influencing platform selection.

Apache Spark · Data Lake · Data Warehouse
0 likes · 20 min read
Data Thinking Notes
Nov 2, 2023 · Operations

How Bilibili Built a Scalable Data Quality Assurance System for Its Data Warehouse

This article details Bilibili's data quality assurance framework, covering its evolution across four data platform stages, the architecture of its quality data warehouse, core capabilities such as a complete assurance system, digital‑driven continuous optimization, and efficient incident handling, plus case studies, future plans, and a Q&A session.

Big Data · Bilibili · Data Platform
0 likes · 27 min read
DataFunTalk
Oct 25, 2023 · Databases

Apache Doris Summit Asia 2023: Highlights, Innovations, and Industry Use Cases

The Apache Doris Summit Asia 2023 showcased the milestone 2.0 release, impressive performance gains, rapid community growth, and diverse industry deployments, while outlining future cloud‑native and unified analytics directions that position Doris as a leading real‑time data warehouse solution.

Apache Doris · Big Data · Cloud Native
0 likes · 13 min read
DataFunTalk
Oct 23, 2023 · Big Data

Alibaba Cloud DataWorks Intelligent Data Modeling: Practices, Challenges, and Solutions

This article introduces Alibaba Cloud DataWorks' intelligent data modeling tool, outlines the data demand flow, shares best practices and hands‑on demonstrations for data warehouse modeling, discusses common challenges and their solutions, and provides Q&A and product details for developers and data engineers.

Alibaba Cloud · Big Data · Data Warehouse
0 likes · 12 min read
Big Data Technology & Architecture
Oct 23, 2023 · Big Data

Bilibili Data Quality Assurance: Architecture, Goals, Core Capabilities, and Future Outlook

This article outlines Bilibili's data quality assurance framework, detailing its evolution across four development stages, the current data platform architecture, identified pain points, four key quality objectives, core capabilities such as a quality data warehouse, comprehensive monitoring, digital optimization, fault handling, and future directions.

Big Data · Data Governance · Data Platform
0 likes · 22 min read
DataFunSummit
Oct 22, 2023 · Big Data

How Kuaishou E‑commerce Leverages OLAP and a Unified Data Architecture to Solve Business Data Challenges

This article explains how Kuaishou's e‑commerce team built a unified OLAP‑based data platform—covering data ingestion, consistent dimensional and fact layers, metric management, and real‑time services—to address rapid growth, metric inconsistency, and operational inefficiencies across multiple business scenarios.

Big Data · Data Architecture · Data Warehouse
0 likes · 20 min read
DataFunTalk
Oct 22, 2023 · Operations

Bilibili Data Quality Assurance System: Architecture, Practices, and Case Study

This article presents Bilibili's data quality assurance system, detailing its evolution across four data platform stages, the multi‑layer architecture, core capabilities such as a quality data warehouse, digital‑driven continuous optimization, and efficient incident handling, and concludes with a real‑world case study and future outlook.

Big Data · Data Warehouse · Quality Assurance
0 likes · 21 min read
DataFunSummit
Oct 16, 2023 · Big Data

Elegant Dimensional Modeling and Multi‑Dimensional Analysis Design Practice

In this presentation, Qiu Shengchang draws on 13 years of experience designing elegant data‑warehouse architectures, detailing a highly generic dimensional model, extreme partitioned tables, and a universal multi‑dimensional analysis framework that enables rapid, comprehensive reporting on massive datasets.

Big Data · Data Warehouse · Multi-dimensional Analysis
0 likes · 3 min read
Big Data Technology & Architecture
Sep 18, 2023 · Big Data

Unified Real‑Time and Batch Data Warehouse Architecture with Hudi Lakehouse

The article explains the mainstream Lambda data‑warehouse architecture, its benefits and challenges, then introduces Hudi as a lake‑house solution that unifies real‑time and offline storage, describes the multi‑layer service design, and showcases three practical scenarios—stream processing, real‑time multidimensional analysis, and stream‑batch data reuse—demonstrating how the integrated architecture improves latency, cost, and operational complexity.

Batch Processing · Data Warehouse · Hudi
0 likes · 13 min read
dbaplus Community
Sep 10, 2023 · Big Data

Master Data Warehouse Architecture: Layers, Naming Rules, and Lifecycle Tips

This guide outlines a comprehensive data model architecture for data warehouses, detailing layer definitions (ODS, CDM, DWD, DWS, ADS), naming conventions, data type standards, partitioning strategies, redundancy rules, and lifecycle management policies to ensure consistency, performance, and maintainability across big‑data environments.

Data Warehouse · Partitioning · Naming Conventions
0 likes · 28 min read
DataFunSummit
Sep 8, 2023 · Big Data

Tianqiong OLAP Real‑time Lakehouse Fusion Platform Architecture Practice

This article explains why lake‑warehouse fusion is needed, describes the challenges of integrating real‑time data warehouses with data lakes, introduces a new StarRocks‑based architecture that supports real‑time ingestion, cooling, offline loading, and adaptive hot‑cold query rewriting, and outlines future plans and Q&A.

Big Data · Data Integration · Data Warehouse
0 likes · 21 min read
DataFunTalk
Sep 6, 2023 · Databases

Large Model + OLAP: Enabling a New Data Service Platform

This article details how Tencent Music combines large language models with an Apache Doris‑based OLAP engine, introduces a semantic layer, manual‑experience routing, schema mapping and plugin integration, and outlines the evolution of its data architecture through four versions to achieve real‑time, cost‑effective, and scalable intelligent data services.

Apache Doris · Data Warehouse · OLAP
0 likes · 24 min read
JD Retail Technology
Sep 4, 2023 · Big Data

JD Mini Program Data Center: Architecture, Milestones, and Real‑time Analytics Solutions

The article details the JD Mini Program platform, its data‑center development milestones, comprehensive business panorama, technical architecture, data collection, storage, and analysis pipelines—including Flink‑based real‑time monitoring, ClickHouse custom analytics, and Elasticsearch user‑behavior insights—while outlining current challenges and future AI‑driven enhancements.

Big Data · ClickHouse · Data Warehouse
0 likes · 16 min read
DataFunTalk
Sep 4, 2023 · Big Data

Unified Batch‑Stream Storage with Hudi and LAS: Architecture, Design, and Deployment

This article presents a comprehensive overview of a batch‑stream unified storage solution built on Hudi and the Lakehouse Analysis Service (LAS), covering background challenges, architectural design, data organization, read/write mechanisms, BTS architecture, real‑world deployment scenarios, and future development plans.

Batch‑Stream · Data Warehouse · Hudi
0 likes · 22 min read
DataFunTalk
Sep 3, 2023 · Big Data

Evolution of OLAP at Xingyun Retail Credit Using Apache Doris

This article details how Xingyun Retail Credit transitioned from traditional data warehouses to an Apache Doris‑based OLAP solution, covering data demand generation, OLAP engine selection challenges, multi‑stage implementation, performance optimizations, data‑warehouse construction, real‑world use cases, and future roadmap.

Apache Doris · Big Data · Data Warehouse
0 likes · 16 min read