Tagged articles
558 articles
Python Crawling & Data Mining
Sep 11, 2022 · Big Data

How Tencent Built Its Massive Big Data Platform Over a Decade

Over more than ten years, Tencent evolved its big data infrastructure through three generations—from early Hadoop-based offline processing, to a hybrid real‑time Spark/Storm system, and finally to a self‑developed, open‑source machine‑learning platform—highlighting the shift from “borrowed” solutions to fully proprietary, AI‑ready architectures.

Data Warehouse · architecture · machine learning
0 likes · 10 min read
Tencent Cloud Developer
Sep 9, 2022 · Big Data

Data Lake, Data Warehouse, and Lakehouse: Concepts, Architectures, and Industry Practices

The article explains how data lakes excel at ingesting massive, varied data, data warehouses optimize storage and query performance, and lake‑house architectures combine both strengths—offering scalable, low‑cost storage with high‑speed analytics—highlighting industry solutions from Snowflake, Databricks, and major cloud providers.

Analytics · Big Data · Data Lake
0 likes · 8 min read
Alibaba Cloud Big Data AI Platform
Aug 16, 2022 · Big Data

How a Young B2B Startup Built Its Big Data Platform from Scratch

This article shares Fenbeitong’s practical experience building a big‑data platform for a young B2B company, covering company background, data‑team formation, technology selection, architecture design, governance processes, modeling tools, batch and real‑time modeling, and insights on B2B (ToB) versus B2C (ToC) technical choices.

Data Warehouse · ToB · cloud computing
0 likes · 15 min read
Big Data Technology Architecture
Aug 13, 2022 · Big Data

Apache Doris at Xiaomi: Architecture Evolution, Performance Optimizations, and Production Practices

This article details Xiaomi's three‑year journey of adopting Apache Doris across dozens of internal services, describing the transition from a Spark‑SQL‑based Lambda architecture to a unified MPP database, performance benchmarks, data ingestion pipelines, compaction tuning, two‑phase commit, single‑replica writes, monitoring, and community contributions.

Apache Doris · Data Warehouse · MPP
0 likes · 19 min read
DaTaobao Tech
Aug 11, 2022 · Big Data

Unify SQL Engine: Integrating Stream, Batch, and Online Computing for Data Warehousing

The article describes how fragmented real‑time, batch, and online data‑warehouse pipelines suffer from low productivity and inconsistent data quality, and introduces a unified SQL engine built on Apache Calcite that parses, optimizes, and compiles a single SQL statement into executable plans for ODPS, Flink, or Java, leveraging Janino code generation, multi‑backend state storage, and snapshot‑join semantics to boost performance and simplify development.

Batch Processing · Calcite · Code Generation
0 likes · 16 min read
Snowball Engineer Team
Aug 5, 2022 · Big Data

Snowball Data Warehouse Modeling and OneData System Implementation

This article outlines Snowball's data warehouse background, compares major modeling approaches such as ER, dimensional, DataVault and Anchor models, describes the current challenges of their dimensional model, and details the OneData methodology—including OneModel, OneID, and OneService—along with its practical implementation, results, and future plans.

Big Data · Data Governance · Data Warehouse
0 likes · 23 min read
Bilibili Tech
Jul 15, 2022 · Big Data

Lakehouse Architecture Practice at Bilibili: Query Acceleration and Index Enhancement

Bilibili’s lakehouse architecture merges Iceberg‑based data lake flexibility with data‑warehouse efficiency, using Kafka‑Flink real‑time ingestion, Spark offline loads, Trino queries, Alluxio caching, Z‑Order/Hilbert sorting, and enhanced BloomFilter and bitmap indexes to boost query speed up to tenfold while drastically cutting file reads.

Big Data Architecture · Bitmap Index · Data Lake
0 likes · 17 min read
dbaplus Community
Jul 13, 2022 · Big Data

Unpacking the Core Technologies Behind Modern Big Data Platforms

From data ingestion to real‑time analytics, this guide breaks down the essential layers of a typical big‑data platform—covering collection methods, HDFS storage, Hive/Spark analysis, data sharing mechanisms, application use‑cases, streaming with Spark Streaming, and the need for robust scheduling and monitoring.

Big Data · Data Integration · Data Warehouse
0 likes · 9 min read

Comprehensive Overview of Tracking System, Data Warehouse Construction, and Attribution in an E‑commerce Platform

The article presents a comprehensive end‑to‑end traffic data architecture for an e‑commerce platform, detailing hybrid frontend/backend tracking with SPM/SCM/action standards, data‑warehouse construction of fact and dimension tables, UUID i_code unification, real‑time attribution methods, and future automation of warehouse and model layers.

Analytics · Data Tracking · Data Warehouse
0 likes · 13 min read
Baidu Intelligent Cloud Tech Hub
Jun 30, 2022 · Big Data

Why Data Lakes Need Data Warehouses: Evolution of Modern Data Platforms

This article traces the evolution of enterprise data platforms—from early data warehouses to modern data lakes and the emerging lakehouse—detailing key technologies, challenges, and best practices for storage, compute engines, metadata, and integration, while highlighting how cloud-native object storage reshapes scalability and cost.

Big Data · Data Lake · Data Warehouse
0 likes · 27 min read
Alibaba Cloud Developer
Jun 28, 2022 · Big Data

How Kuaishou Guarantees Real‑Time Data Warehouse Reliability During Billion‑Scale Events

This article details Kuaishou’s real‑time data warehouse architecture and its comprehensive assurance framework—including forward lifecycle standards, reverse fault‑injection testing, and Spring Festival event practices—highlighting challenges of massive traffic, high timeliness, accuracy, and stability, and outlining future plans for automation, batch‑stream integration, and cost reduction.

Data Warehouse · Flink · Real-time Streaming
0 likes · 23 min read

Building a Scalable Data Masking and Mock Service for Warehouse Testing

This article explains how to design and implement a data‑masking service that also provides mock data generation for data‑warehouse testing, covering the architecture, pain points, masking principles, workflow, evolution into a warehouse mock service, practical scenarios, and the significant efficiency and cost benefits achieved.

Big Data · Data Warehouse · data masking
0 likes · 12 min read
JavaEdge
Jun 21, 2022 · Databases

Why OLTP and OLAP Differ: Understanding Data Warehouses and Star Schemas

This article explains the fundamental differences between transactional (OLTP) and analytical (OLAP) database workloads, describes how data warehouses isolate analytical queries, and introduces star and snowflake schema designs for efficient reporting and business intelligence.

Data Warehouse · OLAP · OLTP
0 likes · 9 min read
政采云技术
Jun 21, 2022 · Big Data

Overview of the Traffic Domain and Its Data Governance Architecture

This document presents a comprehensive overview of the traffic domain in a data warehouse, covering its concepts, objectives, guiding principles, core and extension models, data quality, monitoring, scheduling, and operational practices to achieve a complete, accurate, efficient, low‑cost, and high‑value traffic data system while addressing massive data volume, consistency, and SLA challenges.

Big Data · Data Governance · Data Warehouse
0 likes · 15 min read
Baidu Geek Talk
Jun 15, 2022 · Big Data

Replacing Classic Data Warehouse with a One‑Layer Wide Table Model: Architecture, Benefits, and Challenges

The article proposes replacing the traditional multi‑layered data‑warehouse architecture (ODS‑DWD‑DWS‑ADS) with a single, column‑store wide‑table per business theme, achieving roughly 30 % storage savings and faster queries, while acknowledging higher ETL complexity, back‑tracking costs, and production timing challenges.

Big Data · Data Warehouse · ETL
0 likes · 11 min read
Alibaba Cloud Developer
Jun 14, 2022 · Big Data

Can a Streaming Data Warehouse Balance Freshness, Latency, and Cost?

This article examines the core trade‑offs of data warehouses—freshness, query latency, and cost—compares offline and real‑time architectures, introduces the concept of a streaming data warehouse, and details how Apache Flink Table Store aims to provide a unified, low‑cost solution.

Big Data · Data Warehouse · Flink
0 likes · 19 min read
StarRocks
Jun 2, 2022 · Big Data

Simplify Real‑Time Data Warehousing with Flink CDC and StarRocks

This article explores how combining Flink CDC with StarRocks can streamline real‑time data pipelines, reduce component complexity, support both full and incremental synchronization, and enable efficient OLAP queries and updates for fast, scalable analytics across diverse business scenarios.

Data Warehouse · Flink CDC · OLAP
0 likes · 18 min read
IT Architects Alliance
May 19, 2022 · Big Data

How Apache Kylin Enables Sub‑Second OLAP on Massive Data Sets

This article explains how Apache Kylin uses pre‑computed OLAP cubes on Hadoop/Spark/Flink to deliver sub‑second query responses on massive datasets, covering its architecture, integration with BI platforms, user security, cube building, monitoring, and HBase‑based storage.

Apache Kylin · Big Data · Data Warehouse
0 likes · 12 min read
DataFunTalk
May 18, 2022 · Big Data

Building and Optimizing JD Retail OLAP Platform: Architecture, Real‑time Updates, Materialized Views, and Join Optimization

This article presents JD Retail's OLAP platform construction and practical scenarios, covering control‑plane design, architecture, business management, operational safeguards, real‑time data updates, materialized view acceleration, join optimization techniques, high‑concurrency queries, and large‑scale write throughput for e‑commerce peak periods.

Big Data · ClickHouse · Data Warehouse
0 likes · 21 min read
DataFunSummit
May 14, 2022 · Databases

Design of Cloud‑Native ClickHouse: Architecture, Storage‑Compute Separation, and MPP Query Layer

This article presents the cloud‑native redesign of ClickHouse, covering its current technical limitations in storage and computation, the proposed storage‑compute separation with DDL task management, multi‑replica and CommitLog mechanisms, and a new MPP query layer to meet future data‑warehouse demands such as real‑time analytics, flexibility, high throughput, low cost, and support for semi‑structured data.

Big Data · ClickHouse · Cloud Native
0 likes · 15 min read
JD Cloud Developers
May 13, 2022 · Databases

JD’s Color Gateway: Tens of Millions QPS with Cloud‑Native Data Warehouse

During the 2022 China Internet Industry Application Salon, JD Cloud’s product manager explained how the Color gateway, an API gateway handling billions of daily requests, overcomes stability, high‑availability, reliability, and performance challenges during peak sales by adopting a cloud‑native ClickHouse data warehouse that boosts processing speed, reduces costs, and provides real‑time analytics.

Cloud Native · Cost Optimization · Data Warehouse
0 likes · 13 min read
dbaplus Community
May 11, 2022 · Big Data

How JD Logistics Tackled Billion-Scale Data Challenges with Doris

This article details JD Logistics' journey from fragmented, massive‑scale data to a unified, real‑time analytics platform, covering business needs, pain points, tool evaluation, a new Doris‑based architecture, table management, data import procedures, automation scripts, and future roadmap for data engineering.

BI Tools · Big Data · Data Warehouse
0 likes · 16 min read
ITPUB
Apr 27, 2022 · Databases

Mastering Data Warehouse Standards: Architecture, Layer Design, and Naming Conventions

This comprehensive guide explains data‑warehouse construction standards, covering model architecture principles, public development rules, layer‑by‑layer design specifications, and systematic naming conventions for tables, dimensions, and metrics to ensure consistency, scalability, and reliable data governance.

Big Data · Data Warehouse · Database Standards
0 likes · 26 min read
58 Tech
Apr 26, 2022 · Information Security

Design and Architecture of a Full‑Chain Data Warehouse for Information Security

The article presents a comprehensive design of an end‑to‑end data warehouse for information‑security governance, detailing background motivations, multi‑layer data architecture, dimension modeling, bus‑matrix mapping, real‑time (lambda/kappa) processing, data‑dictionary integration, and future directions toward unified streaming‑batch solutions.

Data Warehouse · Real-time Processing · dimension modeling
0 likes · 16 min read
ByteDance Data Platform
Apr 15, 2022 · Cloud Native

How ByteHouse Evolved From ClickHouse Into a Next‑Gen Cloud‑Native Data Warehouse

ByteHouse, born from ByteDance’s extensive use of ClickHouse, transformed a high‑performance OLAP engine into a cloud‑native, scalable data warehouse by addressing scalability, elasticity, high availability, and multi‑tenant challenges through architectural redesign, custom storage layers, and advanced metadata management.

Big Data · ByteHouse · ClickHouse
0 likes · 19 min read
Volcano Engine Developer Services
Apr 14, 2022 · Databases

How ByteHouse Transformed ClickHouse into a Cloud‑Native Data Warehouse

This article explores ByteHouse’s evolution from ClickHouse within ByteDance, detailing the challenges of scaling to over 18,000 nodes, the architectural redesign for cloud‑native elasticity, high‑availability innovations, and the product’s roadmap toward a Snowflake‑like, multi‑tenant data warehouse solution.

ByteHouse · ClickHouse · Data Warehouse
0 likes · 18 min read
Big Data Technology & Architecture
Mar 31, 2022 · Big Data

Bilibili’s Lakehouse Architecture: Integrating Data Lake and Warehouse with Apache Iceberg

To address the high cost and low efficiency of traditional Hadoop‑based data pipelines, Bilibili designed a lakehouse solution using Apache Iceberg, integrating Spark, Flink, Trino, and Alluxio to unify flexible data lake storage with warehouse‑level query performance, reducing data duplication and improving interactive analytics.

Big Data · Data Warehouse · Iceberg
0 likes · 17 min read
Meituan Technology Team
Feb 24, 2022 · Big Data

Systematic Modeling for Delivery Data Governance

Meituan Delivery’s systematic modeling approach unifies data demand, model design, and production through metadata‑driven dimensional modeling, eliminating siloed development, standardizing definitions, and automating implementation to boost data quality, trust, and efficiency for enterprise delivery data governance.

Data Warehouse · systematic modeling
0 likes · 19 min read
ByteDance Data Platform
Feb 21, 2022 · Big Data

Choosing the Right Components for Enterprise Data Warehouses: Hive vs SparkSQL

This article examines how to design enterprise‑grade data warehouses by evaluating development convenience, ecosystem, decoupling, performance and security, compares Hive and SparkSQL along with other engines such as Presto, Doris and ClickHouse, and outlines best‑practice component selections for long‑running batch and interactive analytics.

Big Data · Data Warehouse · ETL
0 likes · 19 min read
Bilibili Tech
Feb 17, 2022 · Big Data

Bilibili's Lakehouse Architecture: Building a Unified Data Lake and Data Warehouse

Bilibili replaced its Hive‑Spark‑Presto ETL pipeline with a lakehouse built on Iceberg, using Magnus, Trino and Alluxio to unify a PB‑scale data lake and warehouse, adding Z‑Order sorting and indexing for fast multi‑dimensional queries while planning further schema and pre‑computation optimizations.

Data Lake · Data Warehouse · Iceberg
0 likes · 14 min read
Alimama Tech
Feb 16, 2022 · Databases

Optimizing Hologres Data Tables and Queries for Alibaba Advertising Inventory Management

By redesigning Hologres tables with column orientation, shard‑controlled Table Groups, distribution and clustering keys, adding bitmap indexes, refreshing statistics, caching external data, and tuning optimizer join order and resource scaling, Alibaba’s Mom advertising inventory system cut query latency by up to 35 % and memory use by 98 %, achieving a 5‑10× performance boost.

Advertising · Data Warehouse · Database Performance
0 likes · 21 min read
dbaplus Community
Feb 15, 2022 · Big Data

Mastering Data Warehouse Architecture: Concepts, Modeling Techniques, and Real‑Time Strategies

This comprehensive guide explains data warehouse fundamentals, architecture layers, modeling methods such as dimensional and entity modeling, metadata management, and the transition from offline to real‑time processing with Lambda and Kappa architectures, providing practical steps, best practices, and key terminology for building robust analytical platforms.

Big Data · Data Warehouse · ETL
0 likes · 63 min read
Youzan Coder
Jan 26, 2022 · Big Data

How to Build a Robust Data Quality Assurance Strategy for Large-Scale Data Platforms

This article outlines a comprehensive data quality assurance framework for a massive reporting platform, covering the data pipeline architecture, detailed testing methods for timeliness, completeness, and accuracy, as well as application‑level checks, downgrade and backup strategies, and future automation plans.

Automation · Data Quality · Data Warehouse
0 likes · 14 min read
Big Data Technology & Architecture
Jan 18, 2022 · Big Data

Data Warehouse Data Quality Measurement Standards

The article outlines four key dimensions for evaluating data warehouse data quality—correctness, completeness, timeliness, and consistency—explains common consistency issues such as differing metric values across models, cross‑dimensional aggregations, and real‑time versus batch calculations, and proposes organizational and review mechanisms to mitigate these problems.

Big Data · Consistency · Data Governance
0 likes · 9 min read
Alibaba Cloud Developer
Jan 12, 2022 · Databases

How to Slash Cloud Data Warehouse Costs with ADB PG Disk Optimization

This article explains how enterprises can dramatically reduce cloud‑native data‑warehouse expenses by understanding ADB PG/Greenplum architecture, applying disk‑reservation and lock‑write safeguards, and implementing practical optimizations such as table compression, hot‑cold tiering, vacuuming, redundant‑index cleanup, replication conversion, and isolated temporary‑table spaces.

ADB PG · Cost reduction · Data Warehouse
0 likes · 25 min read
DataFunTalk
Jan 10, 2022 · Big Data

Real‑Time Data Warehouse at Meituan: Architecture, Challenges, and Solutions

The talk by Tang Chuxi of Meituan explains typical real‑time data scenarios, the challenges faced when building a streaming data warehouse, and the design, development, operation, and performance‑optimisation solutions implemented on a Flink‑based platform to support massive, low‑latency business applications.

Data Warehouse · Flink · Meituan
0 likes · 17 min read
21CTO
Jan 8, 2022 · Big Data

How Amazon’s Intelligent Lakehouse Redefines Big Data Architecture

The article examines Amazon’s Intelligent Lakehouse architecture, tracing its evolution from early data‑lake‑warehouse integrations to a modern, serverless, secure, and AI‑enhanced platform that unifies data storage, governance, and analytics to lower big‑data costs and boost agility.

Big Data · Data Governance · Data Lake
0 likes · 12 min read
dbaplus Community
Dec 23, 2021 · Databases

Building a 16,000‑Node Cloud‑Native MPP Data Warehouse: Lessons from CCB

The article details how China Construction Bank's fintech arm designed, deployed, and operated a cloud‑native, three‑layer MPP data warehouse spanning 16,000 servers, covering architectural choices, performance gains, operational automation, and high‑availability strategies for ultra‑large scale workloads.

Cloud Native · Data Warehouse · Database Architecture
0 likes · 10 min read
DataFunSummit
Dec 22, 2021 · Big Data

Data Governance Practices and Experiences at NetEase Cloud Music

This article details NetEase Cloud Music's comprehensive data governance journey, covering data warehouse architecture, data standards, event tracking governance, asset lifecycle management, and future automation plans, illustrating how systematic governance improves data quality, cost efficiency, and business insight.

Big Data · Data Governance · Data Warehouse
0 likes · 21 min read
DataFunSummit
Dec 18, 2021 · Big Data

Fast OLAP Forum – Latest Practices and Innovations in Real‑Time OLAP

The Fast OLAP Forum held on December 19 at DataFunCon gathers leading experts from Baidu, Tencent, JD, and FreeWheel to share cutting‑edge techniques in vectorized execution, cloud‑native ClickHouse, large‑scale OLAP architectures, and Presto optimizations, offering deep insights for practitioners dealing with massive real‑time data workloads.

Apache Doris · Big Data · ClickHouse
0 likes · 7 min read
Big Data Technology & Architecture
Dec 18, 2021 · Big Data

Slowly Changing Dimensions (SCD) – Design Principles, Challenges, and Hive Implementation

This article explains the concept of Slowly Changing Dimensions (SCD), discusses practical design questions, compares three change‑tracking requirements, presents three implementation patterns, and provides detailed Hive/SQL examples for historical data initialization and incremental updates in large‑scale data warehouses.

Big Data · Data Warehouse · Hive
0 likes · 20 min read
Ctrip Technology
Dec 16, 2021 · Big Data

Data Standard Management Practices in Ctrip Vacation Data Governance

This article outlines Ctrip Vacation's data standard management approach, covering why standards are needed, the three‑element framework of scope, tools, and policies, and detailed practices for data integration, production change handling, metadata governance, portal dashboard standardization, and self‑service query templating.

Big Data · Data Governance · Data Integration
0 likes · 12 min read
IT Architects Alliance
Dec 8, 2021 · Industry Insights

6 Proven Strategies to Modernize Your Cloud Data Warehouse

This article outlines six practical strategies—identifying bottlenecks, empowering data engineers, adopting distributed management, creating data contracts, embracing diverse perspectives, and streamlining workflows—to help organizations leverage cloud data warehouses more efficiently and drive better business intelligence outcomes.

Business Intelligence · Data Governance · Data Warehouse
0 likes · 8 min read
Big Data Technology & Architecture
Nov 28, 2021 · Big Data

OneData Methodology: Building a Unified Data Warehouse Architecture and Governance Framework

This article presents the OneData methodology for designing, standardizing, and governing a data warehouse, detailing background challenges, goals, industry references, core concepts, unified business and design consolidation, data modeling layers, naming conventions, data quality controls, and the resulting operational improvements and business value.

Big Data · Data Governance · Data Warehouse
0 likes · 20 min read
Tencent Cloud Developer
Nov 26, 2021 · Big Data

WeChat's ClickHouse Real‑Time Data Warehouse: Challenges, Co‑Construction, and Performance Gains

Facing Hadoop’s minute‑to‑hour query latency on petabyte‑scale data, WeChat partnered with Tencent Cloud to build a ClickHouse‑based real‑time warehouse, adding custom ingestion, query‑optimisation and management tools that deliver billion‑row throughput, sub‑5‑second queries and over ten‑fold performance gains across millions of daily queries.

Big Data · ClickHouse · Cloud Native
0 likes · 9 min read
Baidu Geek Talk
Nov 24, 2021 · Big Data

Building Big Data Infrastructure at Baidu Aifanfan: Architecture Practices and Lessons Learned

At Baidu Aifanfan, the data team built a unified real‑time and offline big‑data platform—leveraging Watt, Bigpipe, Fengge, AFS and Palo within Lambda/Kappa patterns and a fast‑slow parallel rollout—that cut OLAP query latency from 18 minutes to under 15 seconds, enabled self‑service analytics, and standardized metrics across 15 agile teams.

Apache Doris · Big Data Architecture · Data Governance
0 likes · 23 min read
DataFunTalk
Nov 9, 2021 · Big Data

Data Governance Practices at NetEase Cloud Music: Warehouse Overview, Data Standards, Event Tracking, and Asset Management

This article details NetEase Cloud Music's data governance journey, covering the challenges of massive and complex data, the design of a multi‑layered data warehouse, the establishment of data and event‑tracking standards, asset lifecycle management, and future automation plans.

Data Warehouse · asset management · cloud music
0 likes · 20 min read
21CTO
Nov 8, 2021 · Big Data

How Baidu iFanFan Built a Real-Time Big Data Platform: Challenges & Lessons

Facing rapid business iteration, Baidu’s iFanFan data team designed a unified real‑time and offline big‑data platform, tackling business, technical, and organizational challenges through Lambda/Kappa architectures, data integration, storage, computation, governance, and scalable analytics to deliver timely, accurate, and valuable data products.

Big Data · Data Architecture · Data Warehouse
0 likes · 33 min read
DataFunSummit
Nov 8, 2021 · Big Data

Building JD's OLAP System: From Data Ingestion to Management and Future Plans

This article explains how JD.com designs and evolves its OLAP platform, covering data sources, ingestion, storage, real‑time and offline processing, key challenges such as timeliness, high throughput, consistency, and the solutions implemented to support massive e‑commerce analytics.

Big Data · Data Warehouse · Distributed Systems
0 likes · 13 min read
Big Data Technology & Architecture
Oct 26, 2021 · Big Data

Practical Experience Building a Real‑Time Clickstream Data Warehouse with Flink and ClickHouse

This article shares practical insights on designing and operating a real‑time clickstream data warehouse using Flink for streaming processing and ClickHouse for near‑real‑time OLAP, covering dimensional modeling, layered architecture, Flink‑ClickHouse sink implementation, and data rebalancing strategies.

ClickHouse · Data Warehouse · Flink
0 likes · 10 min read
DataFunTalk
Oct 17, 2021 · Databases

Databend: A Cloud‑Native Modern Data Warehouse Architecture

This article explains how Databend, a cloud‑native OLAP data warehouse, addresses modern data‑warehouse challenges by separating storage and compute, providing elastic scaling, multi‑cloud support, and efficient query planning and execution to deliver low‑cost, on‑demand analytics.

Data Warehouse · Databend · OLAP
0 likes · 12 min read
Architect
Oct 6, 2021 · Big Data

Design and Implementation of a Real-time and Offline Integrated Query System

This article details the requirements, architecture, and implementation of a real-time and offline integrated query system, covering data ingestion via Debezium and Confluent Platform, storage in Kudu and HDFS, query engines Presto and Kylin, and strategies for data synchronization, partitioning, and scaling.

Big Data · Data Warehouse · Debezium
0 likes · 19 min read
Airbnb Technology Team
Sep 27, 2021 · Big Data

Midas Certification: Airbnb’s End-to-End Data Quality Framework

Airbnb’s Midas certification establishes a company‑wide, multi‑dimensional golden‑standard for data quality—covering accuracy, consistency, timeliness, cost, and completeness—by requiring collaborative design, automated health checks, and four review stages, ensuring certified data is reliable, well‑documented, and ready for reporting, experimentation, and machine‑learning.

Airbnb, Big Data, Data Quality
0 likes · 12 min read
IT Architects Alliance
Sep 12, 2021 · Industry Insights

Data Warehouse vs. Database: Core Differences and Building a Data Platform

This article explains what a data warehouse is, contrasts it with traditional databases, outlines how to design and build a data warehouse—including model selection, topic domain division, bus matrix, layered architecture, and data governance—then expands to the concept of a data middle platform and its distinction from data lakes and big‑data platforms.

Big Data, Data Governance, Data Platform
0 likes · 18 min read
Architects' Tech Alliance
Sep 11, 2021 · Big Data

Understanding Data Warehouses: Definitions, Differences, Architecture, Modeling, and Best Practices

This article explains what a data warehouse is, contrasts it with traditional databases, outlines how to design and build a warehouse—including model selection, subject‑area definition, bus matrix, layering, and data quality—while also covering related concepts such as data middle platforms, data lakes, metadata, and modeling techniques.

Big Data, Data Quality, Data Warehouse
0 likes · 16 min read
Laravel Tech Community
Aug 22, 2021 · Fundamentals

Fundamentals of Information Systems, Service Management, and Software Engineering

This article provides a comprehensive overview of information system fundamentals, covering national information‑technology elements, emerging concepts such as cloud computing and IoT, e‑government models, ERP and CRM basics, service‑management challenges, qualification systems, software development lifecycles, testing methods, architecture patterns, and data‑warehouse concepts.

CRM, Data Warehouse, ERP
0 likes · 16 min read
DataFunSummit
Aug 22, 2021 · Big Data

Evolution and Optimization of Meituan Waimai Offline Data Warehouse: Architecture, ETL, Modeling, Governance, and Future Plans

This article details the historical development, architectural layers, ETL migration to Spark, data modeling standards, governance processes, resource optimization, security measures, and future roadmap of Meituan Waimai's offline data warehouse, illustrating how the team addressed scalability and efficiency challenges.

Big Data, Data Governance, Data Warehouse
0 likes · 21 min read
IT Architects Alliance
Aug 22, 2021 · Big Data

Understanding ETL and Building Enterprise Data Warehouses: Concepts, Architecture, and Step‑by‑Step Techniques

This article explains the fundamentals of ETL, describes data warehouse architectures such as star and snowflake schemas, outlines a five‑step methodology for constructing enterprise‑level data warehouses, and discusses advanced ETL techniques, tools, and algorithm choices for effective data integration and management.

DW Architecture, Data Warehouse, ETL
0 likes · 24 min read
IT Architects Alliance
Aug 9, 2021 · Big Data

Data Warehouse Architecture Overview: Layers, Sources, Modeling, Storage, and Management

This article explains the logical layered architecture of modern data warehouses, covering data sources, ODS, DW/DWS layers, collection, storage on HDFS, synchronization tools, dimensional modeling (star, snowflake, constellation), metadata management, and task scheduling and monitoring, highlighting best practices for scalable big‑data solutions.

Data Warehouse, ETL, metadata
0 likes · 12 min read
dbaplus Community
Aug 1, 2021 · Databases

Scaling and Optimizing a Greenplum Data Warehouse Cluster: Key Lessons

This article details the background, goals, design decisions, deployment steps, kernel tuning, fault‑recovery testing, performance optimization, and TPC-H benchmark results of a Greenplum data‑warehouse cluster expansion, highlighting practical challenges and concrete solutions for large‑scale database environments.

Cluster Deployment, Data Warehouse, Greenplum
0 likes · 20 min read
DataFunTalk
Jul 31, 2021 · Big Data

Building a Data Metric System for NetEase Media Using OSM and AARRR Models

This article explains the concept of a metric system, why it is essential for fine‑grained product operations, and demonstrates how NetEase Media built a comprehensive data metric system using the North Star metric, OSM, and AARRR models within a layered big‑data warehouse architecture.

AARRR, Data Warehouse, OSM model
0 likes · 11 min read
Big Data Technology Architecture
Jun 4, 2021 · Big Data

Types of OLAP Data Warehouses and Performance Optimization Techniques

This article explains the various classifications of OLAP data warehouses—including MOLAP, ROLAP, HOLAP, and HTAP—based on data volume and modeling, reviews common open‑source ROLAP products, and details performance‑boosting techniques such as MPP architecture, cost‑based optimization, vectorized execution, and storage optimizations.

Data Warehouse, MPP, OLAP
0 likes · 27 min read
dbaplus Community
Jun 2, 2021 · Databases

How to Build a Mature Data Warehouse: 7 Essential Steps and Best Practices

This article explains why data warehouses are critical for decision‑making, outlines the challenges of immature warehouses, and provides a step‑by‑step framework—including goal setting, technology selection, problem identification, domain modeling, layer design, modeling principles, and governance standards—to help teams build a robust, maintainable data warehouse.

Big Data, Data Architecture, Data Warehouse
0 likes · 22 min read
dbaplus Community
May 27, 2021 · Big Data

How Vipshop Scales Billion‑Row OLAP with ClickHouse, Presto, and Flink

This article details Vipshop's OLAP evolution, describing how Presto, Kylin, and ClickHouse are integrated, the deployment architecture with HAProxy and chproxy, containerization on Kubernetes, and the Flink‑ClickHouse pipeline that enables self‑service analysis of hundred‑billion‑row datasets, while also addressing performance challenges and the future roadmap.

Big Data, ClickHouse, Data Warehouse
0 likes · 28 min read
Big Data Technology & Architecture
May 26, 2021 · Big Data

Comprehensive Guide to Data Warehouse Concepts, Modeling, and Data Governance

This article provides an extensive overview of data warehouse fundamentals, including its purpose, core characteristics, layered architecture, and modeling methods such as dimensional and normalized modeling, as well as detailed discussions of data governance, metric systems, security standards, and practical implementation strategies for enterprise data management.

Data Warehouse, Metrics
0 likes · 70 min read
IT Architects Alliance
May 25, 2021 · Big Data

How Modern Data Middle Platforms Power Real‑Time and Offline Analytics

This article provides a comprehensive technical overview of data middle platforms, covering data aggregation, offline and real‑time development, smart operations, data asset management, governance, service layers, platform implementations, warehouse layering, and key differences between offline and real‑time data warehouses.

Big Data, Data Governance, Data Platform
0 likes · 26 min read
Tencent Cloud Developer
May 24, 2021 · Cloud Native

Next‑Generation Cloud‑Native Data Warehouse: Architecture, Principles and Implementation

The article defines cloud‑native data warehouses as storage‑compute separated systems that elastically scale across clouds, outlines their key traits, describes a three‑layer architecture, compares Snowflake and OushuDB implementations, and illustrates a large bank’s migration to such a platform.

Data Warehouse, Distributed Systems, cloud-native
0 likes · 16 min read
DataFunTalk
May 18, 2021 · Big Data

Evolution and Architecture of Beike Real-Time Computing Platform

Beike's real-time computing platform, led by Liu Liyun, has evolved from early Spark Streaming to a Flink-based system with SQL versions 1.0, 2.0, and an upcoming 3.0, supporting a large-scale data warehouse, event-driven processing, extensive monitoring, and diverse business scenarios across the company's operations.

Data Warehouse, Event-driven, Flink
0 likes · 14 min read
ITPUB
May 14, 2021 · Big Data

How AnalyticDB Powers Petabyte-Scale Consumer Analytics in Alibaba’s Data Bank

The article details how Alibaba’s Data Bank leverages AnalyticDB’s cold‑hot tiered storage, high‑throughput real‑time writes, and low‑latency OLAP capabilities to handle petabyte‑scale consumer data, support flexible AIPL analysis, crowd profiling, and rapid audience selection while cutting costs and ensuring elasticity during peak events.

AnalyticDB, Big Data, Cold-Hot Storage
0 likes · 14 min read
Architecture Digest
May 7, 2021 · Big Data

Comprehensive Overview of Data Middle Platform Architecture and Practices

This article provides a detailed introduction to data middle platform concepts, covering data aggregation, ingestion tools, offline and real‑time development, data governance, service layers, monitoring, and deployment patterns, illustrating how enterprises build unified data ecosystems across various industries.

Big Data, Data Governance, Data Platform
0 likes · 25 min read
ITFLY8 Architecture Home
May 3, 2021 · Big Data

Unlocking the Power of Data Middle Platforms: Key Concepts and Best Practices

This article provides a comprehensive overview of data middle platforms, covering data aggregation, collection tools, offline and real‑time development, scheduling, baseline control, heterogeneous storage, data governance, service layers, monitoring, and the architectural differences between offline and real‑time data warehouses.

Data Warehouse, ETL, Real-time Processing
0 likes · 26 min read
Big Data Technology & Architecture
Apr 15, 2021 · Big Data

Hive and Hadoop Interview Questions and Answers

This article provides a comprehensive collection of interview-style questions and detailed answers covering Hive concepts, Hadoop architecture, MapReduce mechanics, HDFS operations, and performance optimization techniques for big‑data processing environments.

Data Warehouse, Hadoop, Hive
0 likes · 41 min read
Sohu Tech Products
Apr 7, 2021 · Big Data

Data Warehouse Architecture and Modeling with Alibaba MaxCompute and DataWorks

This tutorial explains how to select a technical architecture, design a three‑layer data warehouse (ODS, CDM, ADS), model tables and dimensions, choose storage strategies, handle slowly changing dimensions, synchronize data with DataWorks, and implement dimensional modeling and fact tables using Alibaba MaxCompute for big‑data analytics.

Big Data, Data Warehouse, DataWorks
0 likes · 32 min read
Big Data Technology & Architecture
Apr 6, 2021 · Big Data

Real-Time Computing and Data Warehouse Solutions with Apache Flink: Architecture, Technology Selection, and Implementation

This article explores the evolution of real-time computing in the big data domain, detailing Apache Flink's capabilities, architectural designs, and technology selections such as Kafka, Canal, HBase, and ClickHouse, and provides practical implementation guides and case studies from Alibaba, Tencent, and other enterprises.

Data Warehouse, Flink, Real‑Time Computing
0 likes · 33 min read
iQIYI Technical Product Team
Mar 26, 2021 · Big Data

Evolution of iQIYI's Real-Time Big Data Ecosystem

iQIYI transformed its data infrastructure from a traditional offline T+1 model to a comprehensive real‑time ecosystem—leveraging Kafka, Flink, a three‑layer Stream Data Service Platform, the Talos drag‑and‑drop pipeline, and a Druid‑based analytics platform—to enable low‑latency monitoring, personalized recommendations, ad targeting, and continuous machine‑learning workflows while planning future stream‑batch integration and lake‑warehouse convergence.

Analytics, Big Data, Data Warehouse
0 likes · 13 min read