Tagged articles
558 articles
Didi Tech
Aug 31, 2023 · Big Data

Data Stability Construction and Fault Governance Practices at Didi Customer Service

Didi’s multi‑year data‑stability program for its customer‑service platform progressed through fault‑centered engineering, business‑aligned cross‑team work, and capability normalization. It instituted pre‑, mid‑ and post‑fault safeguards, clear ownership, automated alerts and repair tools, which cut the fault count by 42 % and improved mean‑time‑to‑repair by more than a factor of two, while boosting team communication and satisfaction.

Automation · Data Reliability · Data Warehouse
0 likes · 16 min read
ByteDance Data Platform
Aug 30, 2023 · Big Data

How We Cut Offline Data Warehouse SLA Delay from 13 Days to Zero with DataLeap

The article details how the "Xingfu Li" real‑estate platform tackled a 13‑day offline data‑warehouse SLA delay by adopting Volcano Engine's DataLeap suite, outlining the challenges, the three‑step governance process, and the measurable improvements achieved across task coverage, alert reduction, and data stability.

Big Data · Data Governance · Data Warehouse
0 likes · 10 min read
DataFunTalk
Aug 29, 2023 · Big Data

MaxCompute Incremental Update, Processing Architecture, and Intelligent Data Warehouse Optimizations

This article presents a comprehensive overview of MaxCompute's incremental update and processing architecture, the design of intelligent materialized views, and the engine's adaptive execution optimizations, detailing the integrated near‑real‑time and batch pipelines, transactional table 2.0, and practical Q&A.

Big Data · Data Warehouse · MaxCompute
0 likes · 21 min read
DataFunTalk
Aug 28, 2023 · Big Data

Practical Experience of an E‑commerce Platform’s Offline and Real‑time Data Warehouse

This article shares the practical architecture, technology selection, implementation details, and evolution of an e‑commerce platform’s offline and real‑time data warehouses, covering data modeling, processing pipelines, system components such as Hive, Spark, Flink, ClickHouse, Doris, and Hudi, and the lessons learned from multiple production deployments.

Big Data · ClickHouse · Data Warehouse
0 likes · 18 min read
Huawei Cloud Developer Alliance
Aug 25, 2023 · Databases

Unlock GaussDB(DWS) Performance: Expert Tips for Resource Management

This announcement introduces Huawei Cloud’s DTT live session on August 29, where expert Lv Pengbo will explain GaussDB(DWS) resource control principles and demonstrate practical techniques—such as CPU usage analysis, memory tuning, and queue issue resolution—to help developers efficiently manage data‑warehouse resources and boost performance.

Data Warehouse · Database Tuning · GaussDB
0 likes · 3 min read
Tencent Cloud Developer
Aug 23, 2023 · Big Data

WeChat Experiment Platform: Architecture Design and Iceberg Lakehouse Optimization

The WeChat Experiment Platform migrated its data pipeline (60,000 metrics, 200,000 cores, 30+ PB) to an Iceberg‑based lakehouse, leveraging three‑layer metadata, fine‑grained partitioning, MERGE INTO writes, time‑travel snapshots and skew‑handling UDFs, which cut core time by 69%, saved ~100 PB of storage, and reduced latency by up to 70%.

Big Data · Data Warehouse · Iceberg
0 likes · 18 min read
DataFunTalk
Aug 14, 2023 · Big Data

Data Warehouse Modeling Platform: Exploration and Practice at NetEase Yanxuan

This article details NetEase Yanxuan’s exploration and practice of a data warehouse modeling platform, covering background, current challenges, a comprehensive solution, step‑by‑step implementation, and the resulting improvements in model standardization, automation, and business value.

Automation · Big Data · Data Warehouse
0 likes · 18 min read
Big Data Technology & Architecture
Aug 7, 2023 · Big Data

Using Doris for Real‑Time Data Warehousing: Benefits, Drawbacks, and Comparison with Flink

The article examines Doris‑based real‑time data warehousing, outlining why teams choose this approach, contrasting its low barrier to entry and operational simplicity with Flink’s costlier streaming approach, and highlighting the latency and scale limits and the strict monitoring required for production use.

Big Data · Data Warehouse · Flink
0 likes · 5 min read
DataFunTalk
Jul 25, 2023 · Databases

Building an Integrated Metric Data Service Platform with Apache Doris: Architecture Evolution and Millisecond‑Level Query Performance

This article describes how Financial One Account, a technology service arm of Ping An, migrated from a Hadoop‑Presto‑Kylin stack to an Apache Doris‑based data platform, detailing the architectural evolution, OLAP engine selection, metric system design, performance optimizations, and future roadmap for real‑time analytics.

Apache Doris · Big Data · Data Warehouse
0 likes · 15 min read
DataFunSummit
Jul 20, 2023 · Big Data

Cloud‑Native OLAP on Volcano EMR: Architecture, Capabilities, and Customer Cases

This article introduces Volcano EMR's cloud‑native OLAP solution, detailing its product overview, storage‑compute separation, elastic scaling, cost and hot‑cold data management, intelligent query analysis, multiple customer case studies, and future roadmap for real‑time and offline data warehousing.

Cost Management · Data Warehouse · EMR
0 likes · 11 min read
JD Cloud Developers
Jul 19, 2023 · Databases

Why ClickHouse Is the Ideal Choice for Massive Data Storage and Real‑Time Analytics

This article examines the massive‑scale data requirements of an activity‑tracking platform, compares MySQL, Elasticsearch and HBase, and explains why ClickHouse—with its columnar storage, MergeTree engine, vectorized execution, and distributed architecture—offers the best combination of storage capacity, write performance, real‑time analysis, and query speed for billions of records.

ClickHouse · Columnar Database · Data Warehouse
0 likes · 31 min read
DataFunTalk
Jul 10, 2023 · Big Data

Practical Experience of In‑Lake Warehouse Implementation Based on Lakehouse Architecture

This article presents a comprehensive overview of Lakehouse‑based in‑lake warehousing, covering common data‑lake misconceptions, the evolution from databases to data warehouses and lakes, the advantages of Lakehouse over traditional architectures, a reference multi‑layer architecture, typical use cases, challenges, future plans, and a brief Q&A.

Big Data Architecture · Data Lake · Data Warehouse
0 likes · 20 min read
MaGe Linux Operations
Jul 8, 2023 · Databases

Why Columnstore Indexes Supercharge SQL Server Queries

This article explains how columnstore indexes differ from traditional row stores, detailing their batch processing, compression, and storage mechanisms that can boost data‑warehouse query performance by up to ten times while reducing storage size dramatically.

Columnstore Index · Data Warehouse · SQL Server
0 likes · 11 min read
Big Data Technology & Architecture
Jul 4, 2023 · Big Data

Building a Real‑Time Streaming Data Warehouse with Paimon on Kubernetes for Supply‑Chain Logistics

This article presents a step‑by‑step guide on how the logistics provider Haicheng Bangda implemented a streaming data warehouse using Paimon, Flink CDC, and Kubernetes, covering business background, architecture choices, environment setup, SQL examples, troubleshooting tips, and future roadmap for their digital transformation.

Big Data · CDC · Data Warehouse
0 likes · 27 min read
AI Cyberspace
Jul 4, 2023 · Databases

Benchmarking Cloud‑Native Data Warehouses: Cloudwave vs StarRocks Performance Test

This article compares traditional databases with modern cloud‑native data warehouses, outlines a detailed performance testing methodology using the SSB1000 benchmark, presents test scripts and environment setup for Cloudwave and StarRocks, and analyzes the results to highlight strengths and optimization opportunities.

Data Warehouse · Performance Testing · SQL
0 likes · 21 min read
21CTO
Jun 30, 2023 · Information Security

How WeChat’s Security Data Warehouse Powers Billions of Daily Feature Reads

This article explains the origins, evolution, and current architecture of WeChat’s security data warehouse, detailing its unified feature storage, data quality guarantees, multi‑IDC synchronization, and operational system that streamlines feature management, analysis, and deployment to support the platform’s massive security strategy.

Big Data · Data Warehouse · Feature Management
0 likes · 15 min read
Alibaba Cloud Big Data AI Platform
Jun 27, 2023 · Big Data

How MaxCompute’s Lakehouse Architecture Enables Near‑Real‑Time Incremental Processing

This article details Alibaba Cloud MaxCompute’s lakehouse evolution, describing its unified storage‑metadata‑compute design, the Transactional Table 2.0 format, near‑real‑time incremental ingestion, clustering and compaction services, transaction handling, TimeTravel and incremental queries, and future roadmap for big‑data workloads.

Big Data · Data Warehouse · Incremental Processing
0 likes · 23 min read
Data Thinking Notes
Jun 18, 2023 · Big Data

Data Lake vs Data Warehouse: Uncover the Real Differences

This article explores the evolving concept of data lakes, compares them with traditional data warehouses across storage, modeling, tooling, and user roles, and examines the emerging lake‑warehouse integration, highlighting why both remain essential in modern big‑data architectures.

Big Data · Data Architecture · Data Lake
0 likes · 12 min read
Data Thinking Notes
Jun 14, 2023 · Big Data

Why Data Warehouse Standards Matter and How to Implement Them Effectively

This article explains why data‑warehouse standards are essential for improving team efficiency, product quality, and maintenance costs, and provides a step‑by‑step guide covering standard creation, discussion, rollout, supervision, continuous improvement, as well as detailed design, process, quality, and security specifications.

Big Data · Data Warehouse · Security
0 likes · 18 min read
DataFunTalk
Jun 14, 2023 · Big Data

Active Data Governance with Operator-Level Lineage: Practices and Exploration

This article presents a big‑data company's active data governance practice using operator‑level lineage, detailing the shortcomings of traditional lineage, the implementation of indicator‑chain governance, and the exploration of proactive model governance to achieve smarter, more precise data management.

Big Data · Data Governance · Data Warehouse
0 likes · 14 min read
DataFunSummit
Jun 8, 2023 · Big Data

Methodology and Practice of Onedata Data Warehouse Construction

This article presents a comprehensive methodology for building an Onedata data warehouse, covering the conceptual framework, data modeling processes, the Inmon and Kimball approaches, practical case studies from Baidu, Huawei, and banking, and key takeaways for enterprise data architecture.

Data Warehouse · Onedata · data modeling
0 likes · 12 min read
Data Thinking Notes
Jun 4, 2023 · Big Data

How Distributed Lakehouse Architecture Solves Data Swamp Challenges

This article examines the explosion of heterogeneous data sources, the limitations of traditional data lakes and warehouses, and proposes a distributed lakehouse architecture that integrates advanced management layers to improve data governance, reliability, and support both SQL and advanced analytics workloads.

Data Governance · Data Lake · Data Warehouse
0 likes · 29 min read
DataFunSummit
Jun 4, 2023 · Databases

From Apache Doris to SelectDB: Evolution Towards the Next‑Generation Cloud‑Native Data Warehouse

This presentation introduces Apache Doris, examines changing data analysis demands in the cloud era, explains why SelectDB was created, and details SelectDB’s cloud‑native architecture, performance, unified capabilities, ease of use, cost efficiency, open‑source nature, and its application scenarios for modern data warehousing and log analytics.

Analytics · Apache Doris · Cloud-native
0 likes · 15 min read
Tongcheng Travel Technology Center
May 17, 2023 · Databases

StarRocks Production Practice at Tongcheng Travel: Architecture, Use Cases, and Technical Evaluation

This article details Tongcheng Travel’s production deployment of the StarRocks OLAP database, covering background, business scenarios, technical evaluation against ClickHouse and Greenplum, implementation with Flink SQL, real‑time analytics, offline reporting, CDP use cases, performance optimizations, and future cloud‑native plans.

Big Data · Data Warehouse · Flink
0 likes · 12 min read
DataFunSummit
May 16, 2023 · Big Data

LakeSoul Joins LF AI & Data Foundation as an Open‑Source Cloud‑Native Lakehouse Framework

LakeSoul, China's only open‑source lakehouse project, has been donated to the LF AI & Data Foundation, becoming its first lake‑warehouse framework and offering ACID‑guaranteed high‑concurrency upserts, a high‑performance Rust‑based I/O layer, real‑time data‑warehouse capabilities, and seamless AI/BI integration for modern big‑data applications.

AI · Data Warehouse · LakeSoul
0 likes · 7 min read
DataFunTalk
Apr 18, 2023 · Big Data

Real-time OLAP with Apache Doris: Architecture, Use Cases, and Optimization at Dingdong Maicai

This article details Dingdong Maicai's adoption of Apache Doris as a real‑time OLAP engine, covering business requirements, comparative evaluation with ClickHouse, system architecture, practical applications such as real‑time analytics, B‑end queries, tag systems, and performance‑boosting techniques like Colocate Join, bitmap, prefix and Bloom‑filter indexes, materialized views, and streamlined Broker Load workflows.

Apache Doris · Big Data · Data Warehouse
0 likes · 19 min read
DataFunTalk
Apr 13, 2023 · Big Data

Four Paradigms of StarRocks Lakehouse Integration and an Overview of StarRocks 3.0

This article explains why lake‑warehouse integration is needed, outlines its challenges, describes StarRocks' four integration paradigms—including query acceleration, layered modeling, real‑time warehouse‑lake fusion, and the cloud‑native 3.0 solution—and previews the upcoming StarRocks 3.0 release.

Big Data · Cloud Native · Data Lake
0 likes · 18 min read
DataFunTalk
Apr 4, 2023 · Big Data

Upgrading Hangzhou Bank Consumer Finance Big Data Platform with Apache Doris 1.2: Architecture, Performance Gains, and Integration

This article details how Hangzhou Bank Consumer Finance modernized its big‑data platform by introducing Apache Doris 1.2, replacing the original Greenplum + CDH architecture, unifying data sources via Multi‑Catalog, achieving second‑level query latency, reducing storage and compute costs, and outlining the integration workflow with DolphinScheduler, SeaTunnel, and Spark.

Apache Doris · Big Data · Data Integration
0 likes · 20 min read
Bilibili Tech
Apr 4, 2023 · Big Data

How Bilibili’s Flink‑Based Real‑Time Incremental Pipeline Cuts Costs and Boosts Latency

This article details Bilibili’s migration from a Spark‑based offline ODS‑to‑DWD sharding process to a Flink real‑time incremental pipeline, explaining the background challenges, the design of multi‑level partitioning, small‑file optimizations, stability enhancements, and the measurable performance gains achieved.

Big Data · Data Warehouse · Flink
0 likes · 19 min read
ITPUB
Mar 28, 2023 · Big Data

How We Turned a Hive Data Warehouse into a Real‑Time Lakehouse with Apache Hudi

This article details the migration from a traditional Hive‑based data warehouse to a lakehouse architecture using Apache Hudi, covering the original Lambda setup, its pain points, lake‑vs‑warehouse differences, Hudi features, integration challenges, practical solutions, and future roadmap.

Apache Hudi · Big Data · Data Warehouse
0 likes · 11 min read
Baidu Geek Talk
Mar 27, 2023 · Big Data

Precise Watermark Design and Implementation in Baidu's Unified Streaming-Batch Data Warehouse

The article details Baidu's precise watermark design for its unified streaming‑batch data warehouse, describing how a centralized watermark server and client ensure end‑to‑end data completeness, align real‑time and batch windows with 99.9‑99.99% precision, and support accurate anti‑fraud calculations within the broader big‑data ecosystem.

Apache Flink · Baidu · Big Data
0 likes · 14 min read
ITPUB
Mar 24, 2023 · Big Data

What’s New in Apache Flink 1.17? Key Features, Performance Gains, and Streaming Warehouse Advances

Apache Flink 1.17 introduces a suite of batch and streaming enhancements—including a new Streaming Warehouse API, significant TPC‑DS performance boosts, adaptive batch scheduling, improved checkpointing, expanded SQL capabilities, Hive connector upgrades, and broader filesystem support—while also delivering upgrades to FRocksDB, Calcite, and the token framework to strengthen its position as a leading unified data‑processing engine.

Apache Flink · Batch Processing · Checkpoint
0 likes · 23 min read
DataFunTalk
Mar 21, 2023 · Databases

Design and Technical Details of Apache Doris for Lakehouse Architecture

This article explains how Apache Doris extends its real‑time OLAP capabilities to support Lakehouse architectures, covering unified metadata, query acceleration, elastic compute, performance benchmarks, and future roadmap for richer data‑source integration and resource isolation.

Apache Doris · Big Data · Data Warehouse
0 likes · 20 min read
ITPUB
Mar 13, 2023 · Databases

10 Years of Amazon Redshift: From MPP to Serverless and Real‑Time Data Warehousing

This article traces a decade of Amazon Redshift’s evolution, detailing its shift from a traditional MPP warehouse to a fully cloud‑native Serverless architecture, exploring its underlying innovations, key features such as Concurrency Scaling, built‑in ML, Data Sharing, and offering practical best‑practice guidance for real‑time analytics across diverse industry scenarios.

Amazon Redshift · Concurrency Scaling · Data Warehouse
0 likes · 17 min read
Zhengcaiyun Technology (政采云技术)
Mar 9, 2023 · Fundamentals

Redesigning Data Warehouse Models: When and How to Use Dimensional Modeling

This article explains the concept of data models, why warehouse models need reconstruction, compares normalized and dimensional modeling approaches, and provides a step‑by‑step guide (information gathering, design, and implementation) to building efficient, maintainable data warehouse architectures.

Big Data · Data Warehouse · Database design
0 likes · 12 min read
Architect's Tech Stack
Mar 9, 2023 · Big Data

Improving Data Warehouse Performance: From Clusters and Pre‑Computation to esProc SPL

The article analyzes the growing performance challenges of data warehouses, evaluates traditional solutions such as clustering, pre‑computation and optimization engines, and presents esProc SPL as a non‑SQL, low‑complexity alternative that delivers orders‑of‑magnitude speedups on modest hardware.

Big Data · Data Warehouse · Performance Optimization
0 likes · 16 min read
Zhengcaiyun Technology (政采云技术)
Mar 7, 2023 · Databases

Data Warehouse Modeling: Concepts, Methods, and Implementation

This article explains what data models are, why model refactoring is necessary, compares normalized and dimensional data warehouse modeling approaches, and details a three‑step implementation process—including information research, model design, and model deployment—while highlighting best‑practice naming conventions and practical examples.

Big Data · Data Warehouse · Database design
0 likes · 14 min read
DeWu Technology
Mar 6, 2023 · Backend Development

Warehouse Inventory System Model Upgrade and Performance Optimization

To handle exploding product inventory data, the company overhauled its warehouse inventory model by eliminating risky document‑hand‑offs, storing only changed rows instead of daily snapshots, and syncing transformed data to a data‑warehouse for reporting, which cut monthly accounting time by 30 hours (≈30 %), improved accuracy, enabled new analytics, and introduced TiDB migration and team upskilling.

Data Warehouse · data modeling · inventory
0 likes · 7 min read
Architects Research Society
Mar 5, 2023 · Big Data

Best Open‑Source and Commercial ETL Tools: Detailed Comparison

This article introduces the concept of ETL, explains its importance for modern data‑driven applications, and provides a comprehensive comparison of the most popular open‑source and commercial ETL platforms—including their key features, supported data sources, and deployment options—helping readers choose the right tool for their data integration needs.

Big Data · Data Integration · Data Warehouse
0 likes · 19 min read
DataFunTalk
Mar 1, 2023 · Databases

Evolution and Optimization of Tencent Music Content Library Data Platform: From Architecture 1.0 to 4.0

This article details the evolution of Tencent Music's content library data platform from version 1.0 to 4.0, describing business requirements, architectural redesigns—including migration from ClickHouse to Apache Doris, introduction of a semantic layer, and extensive write, query, and cost optimizations—while sharing practical lessons and future directions.

Apache Doris · Big Data · Data Warehouse
0 likes · 21 min read
ITPUB
Feb 22, 2023 · Databases

How Huawei’s GaussDB(DWS) 3.0 Redefines Cloud‑Native Data Warehousing

This article summarizes Wang Chuanting’s DTCC2022 talk on Huawei Cloud GaussDB(DWS) 3.0, detailing its cloud‑native architecture, layered elasticity, lake‑warehouse integration, performance acceleration techniques, and how it seamlessly couples data‑processing pipelines with AI workloads for modern, real‑time analytics.

AI integration · Cloud Native · Data Warehouse
0 likes · 16 min read
StarRocks
Feb 21, 2023 · Databases

How Yidian Tianxia Built a Unified Real‑Time & Offline Data Warehouse with StarRocks

Yidian Tianxia tackled massive daily data volumes and complex analytics by defining a five‑layer data‑warehouse standard, comparing ClickHouse and StarRocks performance, and implementing a unified real‑time/offline architecture with StarRocks, DataPlus, and EasyJob, achieving multi‑fold query speedups and lower operational costs.

ClickHouse · Data Governance · Data Warehouse
0 likes · 14 min read
DataFunTalk
Feb 21, 2023 · Databases

Building a Stream‑Batch Integrated Data Architecture with Apache Doris at SelectDB

This article details how SelectDB’s data technology architect designed and implemented a new stream‑batch unified data platform using Apache Doris, covering the shortcomings of the early CDH‑based architecture, the selection process, data modeling, ingestion pipelines, performance testing, operational optimizations, and future plans.

Apache Doris · Batch Processing · Big Data
0 likes · 17 min read
ITPUB
Feb 20, 2023 · Databases

Why Teradata Is Leaving China and What It Means for the Domestic Data Warehouse Market

Teradata's withdrawal from China, driven by geopolitical tensions and the rise of mature domestic data‑warehouse solutions, prompts a detailed look at its MPP architecture, the three main Chinese warehouse designs, Gartner market positioning, and migration tools for alternatives like GBase 8a and GaussDB DWS.

Big Data · Data Warehouse · GBase
0 likes · 9 min read
Data Thinking Notes
Feb 14, 2023 · Big Data

How Cloud Music Turned 60k Tables into Valuable Data Assets

This article details Cloud Music's year‑long data assetization journey, covering the background, practical achievements, governance methods, and future roadmap for turning massive data warehouses into high‑value, well‑governed assets that drive cost reduction and business insight.

Big Data · Data Governance · Data Platform
0 likes · 10 min read
Sohu Tech Products
Feb 8, 2023 · Big Data

Design and Implementation of a General H5 User Behavior Tracking and Data Warehouse Model

This article presents a comprehensive H5 (HTML5) tracking solution that details the planning of event‑collection points, the full data‑warehouse modeling process—including schema design, retention calculations, and SQL implementations—and the automatic data‑capture mechanisms needed to improve user‑behavior analysis efficiency across the product lifecycle.

Big Data · Data Warehouse · H5 analytics
0 likes · 17 min read
vivo Internet Technology
Feb 1, 2023 · Big Data

H5 Tracking Solution and Data Warehouse Design for User Behavior Analysis

The vivo Internet Big Data team presents a standardized, extensible H5 tracking solution that automates data collection via a JavaScript SDK for navigation, focus/blur, and visibility events, incorporates privacy safeguards, and feeds a multi‑layer data‑warehouse architecture with unified ID mapping and bitmap‑based retention modeling to support comprehensive user‑behavior dashboards and future advanced analyses.

Data Warehouse · H5 tracking · automatic collection
0 likes · 19 min read
DataFunTalk
Jan 28, 2023 · Big Data

Data Lake vs Data Warehouse: Differences, Evolution, and Integrated Lakehouse Design

This article explores the ongoing debate between data lakes and data warehouses, clarifies their distinct purposes and technologies, discusses how they can coexist or complement each other, and introduces the concept of an integrated lakehouse architecture while promoting a comprehensive data intelligence knowledge map.

Big Data · Data Lake · Data Warehouse
0 likes · 5 min read
dbaplus Community
Jan 10, 2023 · Big Data

Choosing the Right OLAP Engine: Druid vs ClickHouse and Optimization Tips

This article introduces OLAP concepts, compares major OLAP solutions such as Druid, Kylin, Doris, and ClickHouse, outlines their features and suitable scenarios, and shares practical optimization techniques—including materialized views, caching, node tiering, and query tuning—to improve performance for high‑concurrency analytical workloads.

Big Data · ClickHouse · Data Warehouse
0 likes · 16 min read
DataFunSummit
Jan 8, 2023 · Big Data

Streaming‑Batch Integrated Real‑time Multi‑dimensional Analysis

This article presents a comprehensive overview of evolving big‑data architectures—from classic offline warehouses to Lambda and Kappa models—and details a streaming‑batch integrated solution that addresses latency, data freshness, and multi‑table join challenges to achieve minute‑level real‑time multi‑dimensional analytics.

Batch Processing · Data Warehouse · Kappa architecture
0 likes · 18 min read
Data Thinking Notes
Jan 5, 2023 · Big Data

Why Data Lakes Are Outshining Traditional Data Warehouses: A Deep Dive

This comprehensive guide explains the evolution from traditional data warehouses to modern data lakes, detailing concepts, architectures, differences, implementation steps, and real‑world case studies, while also comparing major cloud providers' solutions and highlighting how data platforms support digital transformation and analytics.

Analytics · Big Data · Data Lake
0 likes · 97 min read
Alibaba Cloud Big Data AI Platform
Jan 4, 2023 · Big Data

How Hologres + FBI Powers Real‑Time Experience Insight at Alibaba: A Deep Dive

This article explains how Alibaba's CCO team built a scalable, real‑time experience‑insight platform using Hologres and FBI, detailing the evolution from pre‑aggregated cubes to lightweight summary tables and finally to ad‑hoc detail‑wide queries, along with practical schema, partition, and deduplication techniques.

Data Warehouse · FBI · Hologres
0 likes · 22 min read
ITPUB
Jan 3, 2023 · Databases

How DragonF MPP DB Redefines Cloud‑Native Data Warehousing at Massive Scale

The article details the design, core features, and real‑world performance of the DragonF MPP DB, a cloud‑native, compute‑storage‑separated database that overcomes traditional MPP limitations, supports millions of daily jobs, and outlines its future roadmap for ultra‑large‑scale data platforms.

Big Data · Cloud Native · Data Warehouse
0 likes · 11 min read
ITPUB
Jan 2, 2023 · Databases

Choosing the Right OLAP Engine: Druid vs ClickHouse and Optimization Tips

This article introduces OLAP concepts, compares major OLAP engines such as Druid, Kylin, Doris, and ClickHouse, outlines real‑world application scenarios, and provides detailed optimization techniques—including materialized views, caching, tiered storage, and skip‑index configurations—to improve query performance.

Analytics · ClickHouse · Data Warehouse
0 likes · 16 min read
JD Tech
Dec 29, 2022 · Big Data

Financial Enterprise Big Data Platform Construction Plan: Architecture, Design, and Implementation

This document outlines a comprehensive big‑data platform construction plan for a financial enterprise, describing the current data challenges, objectives, three‑layer architecture, recommended commercial Hadoop solution (TDH), detailed model‑design steps, implementation schedule, hardware/software specifications, and key success factors.

Data Warehouse · Financial Services · Hadoop
0 likes · 15 min read
DataFunTalk
Dec 24, 2022 · Big Data

Evolution of Data Platforms: From Early Computers to the Modern Data Stack

This article traces the history of data platforms—from the first general‑purpose computers and traditional BI, through the rise of data warehouses, big‑data frameworks like Hadoop, Spark and Flink, to the modern data‑stack era with cloud‑native architectures, Lambda/Kappa models, and emerging tools—highlighting key technologies, architectural shifts, and future prospects.

Big Data · Data Warehouse · ETL
0 likes · 26 min read
Zhuanzhuan Tech
Dec 21, 2022 · Big Data

OLAP Technology Overview, Selection, and Optimization Practices

This article introduces OLAP concepts, compares ROLAP, MOLAP, and HOLAP, evaluates mainstream OLAP engines such as Druid, Kylin, Doris, and ClickHouse, and presents practical optimization techniques including materialized views, caching, tiered storage, and query tuning for large‑scale analytical workloads.

ClickHouse · Data Warehouse · Druid
0 likes · 17 min read
Ziru Technology
Dec 16, 2022 · Big Data

How to Effectively Test Offline Data Metrics and Data Warehouse Pipelines

This article explains what data metrics are, compares offline metric testing with traditional testing, and provides a comprehensive step‑by‑step guide for testing data collection, ETL, warehouse models, metric calculations, scheduling, security, and API outputs in a Hive‑based data warehouse.

Data Warehouse · ETL · Hive
0 likes · 9 min read
DataFunSummit
Dec 14, 2022 · Big Data

Key Development Trends of Data Warehouses: Standardization, Real‑time Processing, Modularity, and Holistic Evaluation

The article analyzes four current data‑warehouse development trends: standardization through data governance, real‑time processing via stream‑batch integration, modular architecture, and holistic performance evaluation. It also links these trends to emerging concepts such as data middle‑platforms, data lakes, and DataOps.

Data Warehouse · modularity
0 likes · 12 min read
Architecture Digest
Dec 1, 2022 · Big Data

Understanding Data Warehouse Architecture and Layered Design

This article explains the concepts, architecture, and layered design of data warehouses, covering data flow, ETL processes, ODS, DWD, DWM, DWS, ADS layers, their characteristics, differences from databases, and the role of data marts in supporting OLAP and decision‑making.

Analytics · Big Data · Data Layers
0 likes · 13 min read
Data Thinking Notes
Nov 28, 2022 · Big Data

Unlocking Data Value: How Metadata Drives Efficient Data Management and Quality

This comprehensive guide explains how metadata connects source data, warehouses, and applications, outlines its technical and business classifications, demonstrates its value for data management, profiling, portals, and ETL development, and details optimization, storage, lifecycle, and quality practices essential for robust big‑data operations.

Big Data · Data Quality · Data Warehouse
0 likes · 35 min read
Data Thinking Notes
Nov 23, 2022 · Big Data

Mastering Fact Table Design: From Basics to Advanced Strategies

This comprehensive guide explains the fundamentals, design rules, and various types of fact tables—including transaction, snapshot, and aggregate tables—while detailing Kimball's four-step modeling process, grain declaration, handling of additive measures, and practical examples for effective data warehouse implementation.

Big Data · Data Warehouse · Fact Table
0 likes · 16 min read
Data Thinking Notes
Nov 21, 2022 · Big Data

Mastering Big Data Modeling: From ER and Dimensional to Data Vault and Alibaba’s OneData

This comprehensive guide explains why data modeling is essential for big‑data systems, compares relational and OLAP approaches, details ER, dimensional, Data Vault and Anchor methodologies, and walks through Alibaba’s multi‑stage data‑model practice, integration framework, dimension design, fact‑table strategies and aggregation techniques.

Alibaba · Data Warehouse · data modeling
0 likes · 57 min read
360 Smart Cloud
Nov 17, 2022 · Databases

Exploring StarRocks Applications, Performance Tests, and Cloud‑Native Integration at 360

This article reviews the practical applications and experimental explorations of StarRocks at 360, describing the cloud‑native lake‑warehouse product Yunzhou, its three‑tier architecture, performance comparisons with Trino on the TPC‑H 100 GB benchmark, challenges of Kubernetes integration, and future directions for storage‑compute separation.

Big Data · Cloud Native · Data Warehouse
0 likes · 7 min read
Data Thinking Notes
Nov 16, 2022 · Big Data

Why Metadata Management Is Essential for Data Warehouses

This article explains the concept of metadata, its role in data warehouses, why managing metadata is critical for building, maintaining, and scaling data warehouse systems, and outlines practical steps, use cases, and tools for effective metadata management.

Data Governance · Data Warehouse · ETL
0 likes · 15 min read
Tencent Cloud Developer
Nov 7, 2022 · Big Data

Data Engineering and Data Warehouse Design: Principles, Practices, and Governance

The article shares data‑engineering and warehouse‑design practices intended to be applicable to any organization. It covers data collection (the four Ws and methods such as SDK, point‑code, and binlog), reporting strategies, source selection, modeling with fact, aggregation, dimension, and model tables, quality checks, and governance practices such as standardized SDKs, metric libraries, automated lineage, and cost optimization.

Big Data · Data Governance · Data Warehouse
0 likes · 32 min read
Alibaba Cloud Big Data AI Platform
Oct 31, 2022 · Big Data

Noah Wealth’s CDH-to-Cloud Migration: Boosting OLAP with Hologres

Facing soaring data volumes and performance bottlenecks, Noah Wealth replaced its self‑built CDH cluster with Alibaba Cloud’s unified big‑data platform and Hologres, streamlining OLAP analysis, cutting costs, accelerating queries to sub‑second response times, and enabling real‑time, multi‑dimensional analytics for its financial services.

Alibaba Cloud · Data Warehouse · Hologres
0 likes · 13 min read
dbaplus Community
Oct 24, 2022 · Big Data

Mastering Data Warehouse Modeling: From ER to Data Vault

This article explains what a data warehouse is, why modeling it matters, and compares four major modeling approaches—ER, dimensional, Data Vault, and Anchor—detailing their structures, steps, advantages, and typical use cases, while also offering guidance on selecting tools and designing models.

Big Data · Data Vault · Data Warehouse
0 likes · 15 min read
DataFunSummit
Oct 15, 2022 · Cloud Computing

Design and Evolution of Tencent Cloud Product Metering and Billing System

The article presents a comprehensive overview of Tencent Cloud's metering and billing system, detailing the billing models, multi‑dimensional data analysis, real‑time data‑warehouse construction, operator orchestration, hot‑key handling, smooth upgrade strategies, and future evolution directions for large‑scale cloud services.

Data Warehouse · Multi-dimensional Analysis · Real-time analytics
0 likes · 16 min read

How a Leading E‑commerce Platform Built a Scalable Data Warehouse with Lambda & Hudi

This article explains how an e‑commerce company designed and implemented a modern data warehouse—combining batch Spark jobs, real‑time Flink streams, and Hudi data‑lake storage—to handle terabytes of daily logs, ensure data quality, and provide fast, reliable analytics for business decision‑making.

Data Lake · Data Warehouse · ETL
0 likes · 16 min read
DataFunSummit
Sep 24, 2022 · Big Data

Evolution of 37 Mobile Games' Multi-Dimensional Analysis Platform: From MySQL to StarRocks

The article details how 37 Mobile Games built and continuously evolved a multi-dimensional analytics platform—covering business background, data challenges, the migration from MySQL through Druid, Impala, ClickHouse to StarRocks, self‑service data tools, monitoring, and future roadmap—highlighting technical decisions and lessons learned.

ClickHouse · Data Warehouse · Impala
0 likes · 20 min read
DeWu Technology
Sep 14, 2022 · Databases

Introduction to StarRocks: Architecture, Storage, Use Cases, and Troubleshooting

StarRocks is a high‑performance MPP database whose simplified FE/BE architecture, fully vectorized engine, and CBO optimizer enable fast multi‑table joins, while its partition‑bucket‑tablet storage model supports real‑time metric services and dashboard migrations, accompanied by practical troubleshooting guidance and upcoming enhancements.

Data Warehouse · MPP database · Real-time analytics
0 likes · 15 min read