Tagged articles
Yanxuan Tech Team
Feb 5, 2021 · Big Data

How NetEase Yanxuan Built a Robust Data Task Governance System in 2020

This article details NetEase Yanxuan's 2020 initiative to improve data task governance, describing the pain points it identified, the before, during, and after (pre‑mid‑post) framework applied to model, baseline, and incident handling, and the resulting products, processes, and future plans for a more reliable data warehouse.

Baseline Management, Data Governance, Data Quality
ITFLY8 Architecture Home
Feb 4, 2021 · Big Data

Unlocking Data Middle Platform: From Ingestion to Real‑Time Analytics

This article provides a comprehensive overview of data middle platform concepts, covering data aggregation, ingestion tools, offline and real‑time development, scheduling, baseline control, heterogeneous storage, recommendation dependencies, data permissions, layered data architecture (ODS, DW, DWD, DWS, TDM, ADS), asset management, governance, service APIs, query and analysis services, as well as monitoring, alerting, and operational best practices for building robust big‑data solutions.

Big Data, Data Warehouse, ETL
DataFunTalk
Feb 2, 2021 · Big Data

Metadata Management: Concepts, Architecture, and Applications in Data Warehousing

This article explains the fundamentals and value of metadata, describes a comprehensive metadata management system and its layered architecture, outlines key technologies such as automatic SQL metadata extraction, and showcases practical applications like metadata query, impact analysis, data lineage, and business‑driven data needs within modern data warehouses.

Data Lineage, Data Warehouse, SQL parsing
21CTO
Jan 25, 2021 · Big Data

Understanding Data Lakes vs. Data Warehouses: A Complete Guide

This article provides a comprehensive overview of data lakes and data warehouses, explaining their definitions, architectures, differences, and practical use cases, while also covering related concepts such as OLTP/OLAP, ETL processes, data governance, and modern lakehouse solutions.

Data Governance, Data Lake, Data Warehouse
Big Data Technology & Architecture
Jan 24, 2021 · Big Data

Design and Implementation of a Big Data OLAP Platform Based on Apache Kylin

This article explains the background, challenges, and architectural design of a big‑data OLAP platform that integrates Apache Kylin with a BI system, detailing pre‑computation strategies, cube construction, user authentication, storage engines, and query mechanisms to achieve sub‑second analytics on massive datasets.

Apache Kylin, Data Warehouse, HBase
Big Data Technology & Architecture
Jan 20, 2021 · Big Data

Understanding Data Warehouse, Data Lake, and Data Middle Platform: Concepts, Differences, and Applications

This article provides a comprehensive overview of data warehouses, data lakes, and data middle platforms, explaining their definitions, architectures, functions, differences, and the value they bring to enterprises, while also addressing common misconceptions and related concepts such as data marts and data swamps.

Data Architecture, Data Lake, Data Warehouse
Big Data Technology & Architecture
Dec 31, 2020 · Big Data

Data Lake vs Data Warehouse: Evolution, Comparison, and Alibaba Cloud Lakehouse Integration

This article examines the 20‑year evolution of big data architectures, contrasts data lakes and data warehouses, explores their respective strengths and challenges, and details Alibaba Cloud’s lake‑warehouse (lakehouse) solution that unifies storage, metadata, and compute for enterprise‑grade analytics and AI workloads.

Data Architecture, Data Lake, Data Warehouse
JD Retail Technology
Dec 24, 2020 · Databases

Applying ClickHouse for Offline and Real‑Time Data Analysis in JD's Golden Eye Business

This article details JD's Golden Eye business's adoption of ClickHouse for offline and real‑time traffic data analysis, covering system architecture, data ingestion pipelines, high‑availability design, monitoring, performance optimizations, and practical trade‑offs, offering insights for large‑scale analytical database deployments.

ClickHouse, Data Warehouse, OLAP
Architect
Dec 22, 2020 · Big Data

Dimensional Modeling in Data Warehousing: Concepts, Theory, and Practical Example

This article explains data warehouse fundamentals, reviews classic warehouse models such as ER, dimensional, Data Vault and Anchor, then dives deep into dimensional modeling concepts, star and snowflake schemas, and demonstrates a practical e‑commerce scenario with SQL examples and trade‑offs.

Big Data, Data Warehouse, ETL
DataFunTalk
Dec 19, 2020 · Big Data

Evolution of iQIYI Data Warehouse from 1.0 to 2.0: Architecture, Modeling Practices, and Future Directions

This article details iQIYI's transition from a fragmented Data Warehouse 1.0 to a unified, standardized Data Warehouse 2.0, covering layered architecture, dimension and metric design, modeling workflows, metadata management, data lineage, and upcoming intelligent and automated data platform initiatives.

Data Lineage, Data Warehouse, data modeling
Big Data Technology & Architecture
Dec 19, 2020 · Big Data

Apache Kylin Principles, Architecture, and Real-World Applications in Baidu Maps, Lianjia, and Didi

This article explains Apache Kylin’s core principles and technical architecture, then details how major Chinese companies such as Baidu Maps, Lianjia, and Didi have deployed Kylin for large‑scale OLAP, describing their system designs, performance results, and the challenges they encountered.

Apache Kylin, Cube, Data Warehouse
Laiye Technology Team
Dec 18, 2020 · Big Data

Comprehensive Overview of Laiye Technology's Business Intelligence Ecosystem

This article provides a detailed, end‑to‑end description of Laiye Technology's BI ecosystem, covering its background, development stages, data acquisition, transmission, transformation, loading, modeling, storage layers, statistical analysis, real‑time metrics, visualization, and future challenges, illustrating how the company builds a scalable, cloud‑native data‑driven platform.

Analytics, BI, Big Data
58 Tech
Dec 16, 2020 · Big Data

Building a High‑Performance ClickHouse Data Analytics Platform: Architecture, Operations, and Optimization

This article describes how 58.com designed and optimized a ClickHouse‑based OLAP platform for massive user‑behavior data, covering the reasons for choosing ClickHouse, its key features, multi‑layer architecture, configuration management, automation scripts, monitoring, performance benchmarks, and future improvement plans.

ClickHouse, Data Warehouse, OLAP
Sohu Tech Products
Dec 2, 2020 · Big Data

Optimizing Hive SQL Lineage Parsing: Techniques, Implementation, and Practical Insights

This article presents a comprehensive overview of Hive SQL lineage parsing, detailing the challenges of data provenance in large‑scale data warehouses, introducing ANTLR‑based parsing techniques, and describing a series of optimizations—including AST pruning, CTE handling, UDF registration, and metadata service integration—to improve both table‑level and column‑level lineage extraction and visualization.

ANTLR, Data Warehouse, Hive
Big Data Technology & Architecture
Nov 28, 2020 · Big Data

ETL Fundamentals and Introduction to Kettle (Pentaho Data Integration)

This article provides an in-depth overview of ETL concepts, including extraction, transformation, loading, data warehouse architecture, and detailed discussion of Kettle (Pentaho Data Integration) features, design principles, components, transformations, jobs, database connections, metadata management, and practical examples for building robust data integration pipelines.

Data Integration, Data Warehouse, ETL
Beike Product & Technology
Nov 18, 2020 · Big Data

Evolution and Practice of BEIKE OLAP Platform Architecture and Engine Selection

This article details the three‑stage evolution of BEIKE's OLAP platform—from the early Hive‑to‑MySQL phase, through a Kylin‑based architecture, to a flexible multi‑engine design—explaining metric modeling, engine selection, performance trade‑offs, and future roadmap for supporting Druid, ClickHouse, Doris and real‑time analytics.

Data Warehouse, Druid, Engine Selection
DataFunSummit
Nov 17, 2020 · Big Data

Sohu Intelligent Media Data Warehouse Architecture and Technical Practices

This article presents Sohu Intelligent Media's data warehouse construction practice, covering fundamental concepts, batch and real‑time processing, OLAP theory, multidimensional modeling, workflow management, data quality, metadata lineage, and security, with a focus on Apache Doris and a Lambda‑style architecture.

Apache Doris, Batch Processing, Data Quality
DataFunSummit
Nov 15, 2020 · Big Data

Evolution of 58.com Commercial Data Warehouse: From 0‑1 to 3.0 Using Hadoop, Flume, Kafka, Spark, and Flink

This article details the three‑stage evolution of 58.com’s commercial data warehouse, describing its massive scale, four‑layer architecture, technical challenges, migrations from MapReduce to Hive and Flink, real‑time streaming upgrades, and the resulting improvements in stability, accuracy, and timeliness.

Big Data, Data Architecture, Data Warehouse
DataFunSummit
Nov 12, 2020 · Big Data

OLAP Engine Selection and Challenges in Large-Scale Data at Youku

This article explores the challenges big data brings to traditional data technologies and reviews various OLAP solutions—including MPP, batch processing, pre‑computation, and Hadoop‑based engines—while detailing Youku’s specific business scenarios and how different OLAP engines are selected to meet performance, scalability, and real‑time analysis requirements.

Analytics, Big Data, Data Warehouse
Architect
Nov 11, 2020 · Big Data

Real-time Click Stream Data Warehouse with Flink and ClickHouse: Architecture, Layered Design, and Practical Tips

This article explains how to build a real‑time click‑stream data warehouse using Flink for stream processing and ClickHouse for near‑real‑time OLAP, covering click‑stream characteristics, dimensional modeling, layered warehouse design, async dimension joins, sink implementation, and data rebalancing strategies.

Big Data, Click Stream, ClickHouse
DataFunTalk
Nov 5, 2020 · Big Data

Applying Apache Doris for JD.com Advertising Report Queries: Architecture, Challenges, and Performance

This article details JD.com's transition from a custom ad‑reporting system to Apache Doris, describing the background, challenges with the legacy platform, selection criteria, implementation of data import, pre‑aggregation, on‑site computation, and the resulting performance and operational benefits during regular operation and major sales events.

Ad Reporting, Apache Doris, Data Warehouse
dbaplus Community
Nov 3, 2020 · Big Data

How Ctrip Boosted Hotel Data Warehouse Performance 400% with ClickHouse

Ctrip’s hotel data team tackled a 3 TB daily data load by building a ClickHouse cluster on VMware, creating custom sync and execution tools, applying query optimizations, and handling merge and memory errors, ultimately achieving over 400% performance gains across multiple reporting themes.

Big Data, ClickHouse, Data Warehouse
DataFunTalk
Nov 1, 2020 · Big Data

Flink 1.11 Integration with Hive: New Features and Real‑time Data Warehouse

The article explains how Flink 1.11 deepens its integration with Hive, covering background, new connector features, simplified dependency management, enhanced Hive dialect, streaming writes and reads, temporal table joins, and how these capabilities enable a unified batch‑streaming data warehouse.

Batch‑Streaming Integration, Data Warehouse, Flink
NetEase Yanxuan Technology Product Team
Oct 23, 2020 · Industry Insights

How NetEase Yanxuan Built a Scalable Data Product System: Lessons & Practices

This article details NetEase Yanxuan's four‑stage journey—from establishing a business‑centric BI platform to ensuring data quality, empowering CXOs with mobile dashboards, and delivering scenario‑specific data products—highlighting the challenges faced, technical solutions implemented, and key takeaways for building enterprise data products.

BI platform, Data Product, Data Quality
Big Data Technology & Architecture
Oct 15, 2020 · Big Data

Meituan's OLAP Requirements and Apache Kylin Deployment: Architecture, Challenges, and Comparative Analysis

This article describes Meituan's massive OLAP workloads, the specific challenges of data scale, complex schemas, and precise counting, explains how Apache Kylin was integrated using wide tables and bitmap deduplication, compares its performance and features with Presto, Druid and other engines, and outlines future improvements.

Apache Kylin, Big Data, Data Warehouse
DataFunTalk
Sep 28, 2020 · Databases

Understanding OLAP Types, Open‑Source Products, and Performance Optimization Techniques

This article explains the classification of OLAP data warehouses by data volume and modeling approach, compares MOLAP, ROLAP, HOLAP and HTAP, reviews popular open‑source ROLAP systems, and details advanced performance‑boosting techniques such as MPP architectures, cost‑based optimization, vectorized execution, dynamic code generation, and runtime filtering.

Data Warehouse, MOLAP, OLAP
DataFunTalk
Sep 24, 2020 · Databases

Understanding OLAP vs. OLTP and the Fundamentals of Data Warehousing

This article explains the core differences between OLTP and OLAP, evaluates whether traditional OLTP databases like MySQL can handle analytical workloads, introduces benchmark queries, and provides a comprehensive overview of data‑warehouse concepts such as data sources, fact and dimension tables, multi‑dimensional modeling, and common cube operations.

Analytics, Data Warehouse, HTAP
Architects Research Society
Sep 15, 2020 · Big Data

Key Factors to Consider When Building Your Own Data Warehouse

This article examines essential considerations for selecting and designing a data warehouse—including data volume, scalability, on‑premises versus cloud options, pricing models, and ETL/ELT approaches—to help organizations choose the most suitable solution for their needs.

Big Data, Data Warehouse, Scalability
Architects' Tech Alliance
Aug 11, 2020 · Big Data

Comprehensive Overview of Data Middle Platform Architecture, Components, and Practices

This article provides an extensive summary of data middle platform concepts, covering data aggregation, collection tools, offline and real‑time development, data governance, service layers, warehouse construction, and operational practices, illustrating how enterprises build and manage a unified data ecosystem.

Big Data, Data Governance, Data Middle Platform
Ctrip Technology
Aug 6, 2020 · Big Data

Data Governance Practices and Model Design in Ctrip Vacation Data Warehouse

This article shares the practical experience and thinking behind Ctrip's vacation data governance project, covering team efficiency optimization, demand sorting, data domain definition, warehouse layering, unified dimension modeling, metric standardization, and the overall benefits of a centralized data governance framework.

Big Data, Ctrip, Data Governance
Big Data Technology & Architecture
Aug 5, 2020 · Big Data

An Introduction to Apache Kylin: Architecture, Core Concepts, Installation, and Enterprise Use Cases

This article provides a comprehensive overview of Apache Kylin, covering its background, core OLAP concepts, technical architecture, installation steps, cube-building methods, real‑world enterprise deployments, and resources for further learning, illustrating how it enables sub‑second query performance on massive datasets.

Apache Kylin, Big Data, Cube
dbaplus Community
Aug 4, 2020 · Databases

How Doris Powers Meituan’s Real‑Time Data Warehouse: ROLAP vs MOLAP Lessons

This article examines Meituan’s data warehouse evolution, detailing the limitations of MOLAP with Kylin, the adoption of Doris‑driven ROLAP using MPP technology, and the practical optimizations—such as join predicate pushdown, concurrent execution, colocate join, and bitmap aggregation—that improve real‑time analytics and reduce costs.

Data Warehouse, MOLAP, MPP
21CTO
Aug 1, 2020 · Big Data

Mastering User Profiling: A Comprehensive Big Data Blueprint

This article explains how enterprises can leverage massive raw and business data to build detailed user profiles, covering tag types, data architecture, development modules, project phases, key deliverables, and a real-world e‑commerce case study.

Big Data, Data Warehouse, ETL
dbaplus Community
Jul 21, 2020 · Databases

What Are the Different Types of OLAP and How Do They Impact Performance?

This article provides a comprehensive overview of OLAP systems, classifying them by data volume and modeling approach, comparing MOLAP, ROLAP, HOLAP and HTAP, reviewing popular open‑source products, and detailing architectural, query‑optimization, vectorization, storage and resource‑management techniques that affect analytical warehouse performance.

Data Warehouse, HTAP, MOLAP
Big Data Technology & Architecture
Jul 19, 2020 · Big Data

An Overview of Hive, HBase Integration, Apache Phoenix, and Lealone in the Big Data Ecosystem

This article explains Hive's role as a Hadoop‑based data warehouse, its integration with HBase, the advantages and drawbacks of that combination, introduces Apache Phoenix as a high‑performance SQL layer on HBase, and describes the open‑source NewSQL database Lealone, providing practical usage scenarios and performance comparisons.

Big Data, Data Warehouse, HBase
58 Tech
Jul 13, 2020 · Big Data

Design and Implementation of a Financial Data Warehouse: Architecture, Modeling, Quality Monitoring, and Metadata Management

This article presents a comprehensive design and implementation guide for a financial data warehouse, covering background needs, modeling methodology choices, a layered architecture, data quality monitoring, metadata management, naming and coding standards, and future development directions.

Big Data, Data Quality, Data Warehouse
DataFunTalk
Jul 10, 2020 · Big Data

Apache Flink Practice at NetEase: Architecture, Scale, and Future Directions

This article details NetEase's evolution from Storm to Flink for real‑time computing, describing the Sloth platform's architecture, large‑scale deployment, diverse business scenarios, monitoring, alerting, and future development plans, illustrating how Flink powers data synchronization, real‑time warehousing, and e‑commerce analytics and recommendation.

Data Warehouse, Flink, NetEase
Big Data Technology Architecture
Jul 8, 2020 · Big Data

Key Interview Questions on Data Warehousing, Data Platforms, and Related Technologies

This article compiles a comprehensive set of 32 interview questions covering data warehouse fundamentals, data platform construction, modeling approaches, real‑time architectures, data quality, governance, Hive optimization, and related analytical techniques to help candidates prepare for data engineering roles.

Data Platform, Data Warehouse, ETL
Youzan Coder
Jul 1, 2020 · Big Data

Mastering HiveCube: Efficient Multi‑Dimensional Aggregation with Grouping Sets

This article explains how HiveCube can replace traditional development for multi‑dimensional aggregation in a data warehouse, covering background, cube theory, the WITH CUBE, WITH ROLLUP, and GROUPING SETS syntax, grouping_id handling, practical implementation tips, performance tuning, and a comparison with conventional methods.

Big Data, Cube, Data Warehouse
DataFunTalk
Jun 30, 2020 · Big Data

Flink Real‑Time Data Warehouse Practices at Shopee Singapore Data Team

This article details Shopee Singapore Data Team’s implementation of a Flink‑based real‑time data warehouse, covering background challenges, layered architecture integrating Kafka, HBase, Druid, Hive, streaming pipelines, job management, monitoring, and future plans to expand Flink SQL support.

Data Warehouse, Flink, Real-Time
Big Data and Microservices
Jun 28, 2020 · Big Data

Data Warehouse vs Data Lake vs Data Platform vs Data Middle Platform: Which Fits Your Business?

This article compares data warehouse, data lake, data platform, and data middle platform, explaining their definitions, architectures, strengths, limitations, and use‑case differences, and provides tables that highlight how each solution handles structured and unstructured data, governance, flexibility, and business value.

Big Data, Data Architecture, Data Lake
Big Data Technology Architecture
Jun 28, 2020 · Databases

Understanding OLAP Data Warehouse Types, Architectures, and Performance Optimizations

This article provides a comprehensive overview of OLAP data warehouses, covering classification by data volume and modeling, detailed explanations of MOLAP, ROLAP, HOLAP and HTAP, common open‑source implementations, and a deep dive into performance‑boosting techniques such as MPP architectures, cost‑based optimization, vectorized execution, dynamic code generation, storage compression, runtime filters and resource management.

Data Warehouse, Dynamic Code Generation, MPP
dbaplus Community
Jun 18, 2020 · Databases

How a Hybrid Data Warehouse Transformed Banking Data Services

This article details the 2015 hybrid data‑warehouse design implemented at Guangdong Huaxing Bank, explaining its real‑time, historical, and archival layers, the data‑bus concept, and how mixing in‑memory, relational, and Hadoop technologies addressed modern banking data‑volume, latency, and unstructured‑data challenges.

Banking, Big Data, Data Warehouse
Big Data Technology Architecture
Jun 18, 2020 · Big Data

Understanding Data Lakes, Data Warehouses, and Real-Time Analytics with Hologres

This article analyzes the challenges of traditional data lake and warehouse architectures, explains why unified storage and compute are needed for real‑time and batch workloads, and introduces Hologres as a cloud‑native, high‑performance engine that combines PostgreSQL compatibility with Flink‑driven analytics to deliver a true real‑time data warehouse solution.

Data Warehouse, Flink, Hologres
TAL Education Technology
Jun 11, 2020 · Big Data

Data Quality Monitoring: Standards, Practices, and Technical Solutions

This article outlines the importance of data quality in the big‑data era, defines evaluation criteria such as integrity, accuracy, consistency and timeliness, describes daily monitoring and reconciliation processes, and proposes technical solutions and challenges for building a comprehensive data‑quality monitoring platform.

Data Governance, Data Quality, Data Warehouse
DataFunTalk
Jun 6, 2020 · Big Data

Optimizing Workflow in Data Warehouse Construction: A Task‑Instance Layered Approach

The article analyzes workflow scenarios in data‑warehouse projects, proposes a two‑level model that abstracts workflow nodes into tasks and instances, defines period and dependency attributes, and presents generation rules that simplify configuration, improve collaboration, and support complex data‑processing schedules in modern big‑data environments.

Data Warehouse, ETL, dependency management
dbaplus Community
Apr 26, 2020 · Big Data

Evolving from Data Warehouses to Data Middle Platforms: Architecture & Practices

This talk reviews China's big‑data evolution from early enterprise data warehouses to modern data middle platforms, outlines core architectural components, technology selections, data development practices, lifecycle and quality management, and shares practical Q&A insights for building scalable, cost‑effective data infrastructures.

Big Data, Data Architecture, Data Governance
Big Data Technology Architecture
Apr 24, 2020 · Big Data

Kyligence Kylin on Parquet: Architecture, Engine Design, and Performance Evaluation

The article introduces Kyligence's Kylin on Parquet solution, explains its plug‑in architecture, reasons for replacing HBase with Parquet, details the new Spark‑based build and query engines, auto‑tuning, global dictionary, fault‑tolerance features, and presents performance comparisons with Kylin 3.0.

Apache Kylin, Data Warehouse, Parquet
Qunar Tech Salon
Apr 24, 2020 · Databases

Applying Apache Doris in Meituan Food Delivery Data Warehouse: Dual Engine Architecture and Performance Optimizations

The article details Meituan's food‑delivery data warehouse transformation from a MOLAP‑centric design to a dual‑engine (MOLAP + ROLAP) architecture powered by Apache Doris, describing the challenges of massive, mutable data, the technical trade‑offs, and the performance gains achieved through MPP, predicate push‑down, multi‑instance concurrency, colocate joins, and bitmap aggregation.

Apache Doris, Big Data, Data Warehouse
Meituan Technology Team
Apr 9, 2020 · Big Data

Dual-Engine MOLAP + ROLAP Architecture with Apache Doris for Meituan Takeaway Data Warehouse

Meituan Takeaway’s data warehouse combines Apache Kylin’s MOLAP cubes for stable dimensions with Apache Doris’s MPP‑driven ROLAP engine to handle changing dimensions, detail queries, and near‑real‑time analytics, achieving millisecond‑level responses, reduced storage/compute costs, and simplifying operations across diverse analytical workloads.

Apache Doris, Big Data, Data Warehouse
ITPUB
Apr 6, 2020 · Big Data

How to Build a Data Lake Quickly: Strategies, Tools, and Real‑World Cases

This article explains the origins and market growth of data lakes, compares them with traditional data warehouses, showcases major implementations like Amazon Galaxy and Club Factory, and provides practical guidance on choosing open‑source or commercial cloud solutions to construct a data lake efficiently while minimizing risk.

AWS, Big Data, Data Architecture
Big Data Technology Architecture
Mar 28, 2020 · Big Data

Apache Kylin: From Extreme OLAP Engine to an Analytical Data Warehouse for Big Data

The article chronicles Apache Kylin's evolution from an Apache incubator OLAP engine to a comprehensive analytical data warehouse, highlighting its five‑year growth, extensive enterprise adoption, core data‑warehouse features, and the community’s rebranding to better reflect its big‑data capabilities.

Analytics · Apache Kylin · Data Warehouse
0 likes · 7 min read
360 Quality & Efficiency
Mar 24, 2020 · Big Data

Understanding Granularity in Data Warehouse Design

This article explains the concept of granularity in data warehouse design, describing data models composed of structures, operations, and constraints, illustrating how granularity affects storage detail, query performance, and resource consumption, and recommending a dual‑granularity approach to balance efficiency and analytical depth.

Analytics · Big Data · Data Warehouse
0 likes · 5 min read
Alibaba Cloud Developer
Mar 19, 2020 · Big Data

Can Flink Unify Real‑Time and Offline Data Warehouses? A Deep Dive

This article examines the challenges of maintaining separate offline and real‑time data warehouses, explains the three‑layer ODS‑DW‑ADS model, evaluates the traditional Lambda architecture, and explores how a unified Flink stack with Kafka, HiveCatalog and streaming sinks can simplify metadata, SQL development, data import/export, and stateful processing for both batch and streaming workloads.

Data Warehouse · Flink · Lambda architecture
0 likes · 12 min read
Youzan Coder
Mar 18, 2020 · Big Data

The Evolution of Youzan’s Data Warehouse in a Big Data Environment

The article traces Youzan’s data warehouse from its chaotic early days lacking structure, through a 2016 Airflow‑driven construction phase that introduced layered ODS/DW/Data Mart architecture and naming standards, to a mature stage focused on efficiency, security, SparkSQL, dimensional modeling, metadata, and ongoing real‑time and governance challenges.

Airflow · Big Data · Data Governance
0 likes · 20 min read
Ctrip Technology
Feb 20, 2020 · Big Data

Ctrip Flight Ticket Data Warehouse: Architecture, Technology Stack, and Practical Practices

This article outlines Ctrip's flight ticket data warehouse evolution, current big‑data technology stack, data synchronization methods, layered architecture, quality monitoring system, and a real‑time price anomaly detection case, providing practical insights for building scalable, reliable data warehousing solutions.

Ctrip · Data Quality · Data Warehouse
0 likes · 20 min read
21CTO
Feb 19, 2020 · Big Data

Building an Open-Source Big Data Analytics Stack: Challenges & Benefits

The article explains why modern companies rely on data‑driven decisions, outlines the two main challenges of tracking data and connecting it to BI, describes the three‑step analytics stack (integration, warehouse, analysis), and highlights the cost, flexibility, and security advantages of open‑source tools.

Big Data · Data Analytics · Data Integration
0 likes · 5 min read
58 Tech
Feb 10, 2020 · Big Data

Construction and Practice of a Site-wide User Behavior Data Warehouse at 58.com

This article systematically describes the challenges, design principles, modeling methods, layered architecture, implementation steps, and standards used in building a comprehensive user behavior data warehouse for 58.com, highlighting practical experiences and future improvement directions.

Big Data · Data Quality · Data Warehouse
0 likes · 11 min read
Big Data Technology Architecture
Feb 4, 2020 · Big Data

What Is a Data Lakehouse? Introduction, Key Features, and Evolution

The article explains the emerging Lakehouse paradigm that combines the low‑cost storage of data lakes with the management and ACID guarantees of data warehouses, detailing its advantages over traditional architectures, core capabilities, early implementations, and its role in supporting modern AI and analytics workloads.

Analytics · Data Warehouse · Lakehouse
0 likes · 9 min read
Tongcheng Travel Technology Center
Dec 31, 2019 · Big Data

Apache Kylin Overview and Model Optimization Practices for Trajectory Analytics

This article introduces Apache Kylin, details its deployment at Tongcheng Yilong, explains the design of a large‑scale trajectory model, and provides step‑by‑step optimization techniques, including cube dimension reduction, HBase rowkey tuning, build parameter tweaks, high‑cardinality handling, and disabling query compression, to achieve sub‑second OLAP queries on multi‑terabyte data.

Apache Kylin · Big Data · Cube
0 likes · 17 min read
Architecture Digest
Dec 26, 2019 · Databases

Data Warehouse Fundamentals, Modeling Techniques, and the Evolution of Maoyan’s Warehouse

This article explains the origins and challenges of scattered enterprise data, defines the data warehouse concept, details its four core characteristics, compares entity, normalization, and dimensional modeling methods, and illustrates Maoyan’s three‑stage data‑warehouse evolution with practical examples and diagrams.

Data Warehouse · ETL · Modeling
0 likes · 17 min read
Product Technology Team
Dec 11, 2019 · Big Data

How a Data Middle Platform Transforms Business: Design, Architecture, and Modeling Insights

This article explains what a data middle platform is, why it matters, its core components—including storage, compute, IDE, workflow, API services, and data asset management—and details the layered architecture of ODS, DWD, DWT, DIM, and DWA, as well as dimensional modeling using Kimball’s methodology.

Big Data · Data Platform · Data Warehouse
0 likes · 6 min read
Yanxuan Tech Team
Dec 2, 2019 · Big Data

Why Modern Enterprises Need a Data Middle Platform: Lessons from NetEase Yanxuan

Drawing on NetEase Yanxuan’s experience, this article explains what a data middle platform is, why companies are building one for digital transformation and fine‑grained operations, and details its core components—including the data warehouse, data services, and BI platform—illustrated with real‑world diagrams.

BI · Big Data · Data Middle Platform
0 likes · 12 min read
DataFunTalk
Nov 19, 2019 · Big Data

Comprehensive Overview of Data Warehouses: Concepts, Evolution, Architecture, and Real‑time vs Offline Practices

This article provides a thorough introduction to data warehouses, traces their evolution, explains construction methodologies, compares offline, Lambda, and Kappa architectures, and presents real‑time warehouse case studies from Alibaba, Meituan, Xiaomi, Netflix, and OPPO, highlighting practical implementation details and challenges.

Data Warehouse · ETL · Flink
0 likes · 14 min read
dbaplus Community
Oct 22, 2019 · Big Data

How Weibo Built a Billion‑Log Real‑Time Data Platform with Flink

This article details how Weibo’s advertising team designed and implemented a real‑time data platform capable of processing over a hundred billion daily logs, covering technology selection, Flink advantages, architecture evolution, data processing pipelines, component libraries, fault‑tolerance strategies, and the construction of a multi‑layer real‑time data warehouse.

Big Data · Checkpoint · Data Architecture
0 likes · 25 min read
Architects' Tech Alliance
Oct 17, 2019 · Big Data

Understanding Alibaba's Data Middle Platform: Concepts, Architecture, and Differences from Data Warehouses and Data Lakes

The article explains Alibaba's data middle platform—its definition, methodology, organizational structure, key tools, and how it differs from traditional data warehouses and data lakes—while highlighting its role in supporting scalable, business‑centric data services and digital transformation.

Alibaba · Big Data · Data Architecture
0 likes · 16 min read
Meituan Technology Team
Oct 17, 2019 · Big Data

OneData Methodology: Building a Unified Data Warehouse Architecture and Governance Framework

By adapting Alibaba’s OneData methodology, the project establishes a unified data‑warehouse architecture, standards, and governance framework—including consolidated business intake, standardized design layers, naming conventions, and delivery metrics—that resolves data‑quality issues, enhances scalability and reusability, and delivers faster, reliable data support for evolving business needs.

Big Data · Data Architecture · Data Governance
0 likes · 15 min read
HomeTech
Oct 9, 2019 · Big Data

Design and Implementation of a Flink‑Based Real‑Time Data Platform at Autohome

This article describes how Autohome migrated its real‑time analytics from Storm to a Flink‑SQL platform, detailing the architectural design, development and operational advantages, practical use cases such as recommendation metrics, and future plans for ecosystem expansion and open‑source release.

Data Warehouse · Flink · Real-time Streaming
0 likes · 12 min read
Mafengwo Technology
Sep 26, 2019 · Big Data

Mafengwo’s Data Warehouse & Middle Platform: Architecture, Modeling, Toolchain

This article details Mafengwo’s journey in constructing a data warehouse and data middle platform, covering the core three‑layer architecture, hybrid modeling approaches, the supporting toolchain for data synchronization, scheduling, and metadata management, and the design of an indicator platform for business analytics.

Big Data Architecture · Data Middle Platform · Data Warehouse
0 likes · 18 min read
dbaplus Community
Sep 24, 2019 · Big Data

How Weibo Turns Big Data into Revenue: Insights from a 2019 DAMS Talk

The presentation explains how Weibo leverages big‑data technologies, user profiling, and social‑first advertising models to drive commercial growth, detailing data‑driven product development, real‑time and offline data warehouses, scientific experiments, and case studies that illustrate the impact on revenue and user engagement.

Advertising · Big Data · Data Warehouse
0 likes · 24 min read
Big Data Technology & Architecture
Sep 23, 2019 · Big Data

Applying Apache Kylin for Large‑Scale OLAP at Meituan: Architecture, Challenges, and Performance Evaluation

This article describes Meituan’s large‑scale OLAP requirements, how Apache Kylin was integrated to meet them, the architectural solutions, performance benchmarks against other engines, and future work, providing practical insights for building stable, precise, and high‑performance analytics platforms.

Apache Kylin · Big Data · Data Warehouse
0 likes · 20 min read
Tencent Cloud Developer
Aug 30, 2019 · Big Data

How Tencent Cloud Leverages Spark, ElasticSearch, and Flink for PB‑Scale Data Warehousing

The cloud+ community and Kuaishou hosted a big‑data technology salon where experts detailed the evolution, architecture, and practical deployments of Spark‑based cloud data warehouses, Elasticsearch, YARN, and Flink, highlighting trends, optimization techniques, and future directions for enterprise data analytics.

Big Data · Data Warehouse · Elasticsearch
0 likes · 22 min read
Big Data Technology & Architecture
Aug 8, 2019 · Big Data

Comprehensive Guide to Apache Kylin: Architecture, Concepts, Cube Design and Optimization

This article provides an in‑depth overview of Apache Kylin’s pre‑computation architecture, data‑warehouse concepts, step‑by‑step cube creation from Hive tables, and advanced optimization techniques such as derived dimensions, aggregation groups, and HBase row‑key encoding to achieve sub‑second OLAP queries on massive datasets.

Apache Kylin · Big Data · Cube
0 likes · 20 min read
dbaplus Community
Aug 6, 2019 · Databases

How ClickHouse Powers Real‑Time Hotel Data Analytics at Ctrip

This article details Ctrip's hotel data platform challenges with billions of daily updates and nearly a million queries, evaluates various storage options, explains why ClickHouse was chosen, and describes the full‑load and incremental pipelines, monitoring, server clustering, and practical tips that enable sub‑second query performance at massive scale.

Big Data · Ctrip · Data Warehouse
0 likes · 13 min read