Tagged articles
297 articles
Architecture Digest
Feb 3, 2023 · Databases

Comprehensive Guide to Using DataX for Data Synchronization

This article provides a step‑by‑step tutorial on installing, configuring, and using Alibaba's open‑source DataX tool to perform both full and incremental data synchronization between MySQL databases on Linux, covering framework design, job architecture, JSON job files, and practical command‑line examples.

DataX · ETL · JSON
0 likes · 14 min read
Data Thinking Notes
Jan 31, 2023 · Fundamentals

Mastering Data Governance: From Metadata to ETL in One Guide

This comprehensive guide walks you through the entire data governance ecosystem, covering metadata fundamentals, classification, maturity models, data standards, modeling, integration, lifecycle management, quality assurance, security, and ETL processes, all illustrated with clear diagrams and practical steps.

Data Governance · Data Integration · Data Quality
0 likes · 13 min read
Code Ape Tech Column
Jan 28, 2023 · Big Data

Using Alibaba DataX for Offline Data Synchronization and Incremental Sync

This article introduces Alibaba DataX, explains its architecture and role in offline heterogeneous data synchronization, provides step‑by‑step Linux installation, demonstrates full‑load and incremental MySQL‑to‑MySQL sync with JSON job templates, and shares practical tips for handling large data volumes.

Data Integration · DataX · ETL
0 likes · 15 min read
Data Thinking Notes
Jan 12, 2023 · Big Data

Mastering Alibaba DataWorks: Data Warehouse Architecture & Modeling Guide

This comprehensive tutorial walks you through Alibaba DataWorks' data warehouse architecture, covering technical stack selection, three‑layer warehouse design (ODS, CDM, ADS), detailed data modeling with DDL examples, storage strategies, dimension and fact table conventions, and best‑practice hierarchical call standards.

DataModeling · DataWarehouse · DataWorks
0 likes · 27 min read
Ctrip Technology
Jan 12, 2023 · Big Data

Evolution of Ctrip's Log System: From Elasticsearch to ClickHouse and Log 3.0

This article details the evolution of Ctrip's log infrastructure, describing the shift from fragmented departmental logging to a unified Elasticsearch-based platform, the migration to ClickHouse for cost‑effective, high‑performance storage, and the subsequent Log 3.0 redesign that leverages Kubernetes, sharding, and a unified query governance layer to handle petabyte‑scale data.

Big Data · ClickHouse · Cloud Native
0 likes · 16 min read
DataFunTalk
Jan 6, 2023 · Big Data

ZhongAn's Hundred‑Billion‑Scale Data Integration Service: Architecture, Business Support, and Evolution

This article presents the architecture and practical experience of ZhongAn's hundred‑billion‑scale data integration service, covering common integration technologies, business support scenarios for offline and real‑time data, technical challenges, evolution from single‑machine to service‑oriented designs, and future directions using Flink and DataX.

Data Integration · Data Platform · DataX
0 likes · 31 min read
DataFunTalk
Dec 24, 2022 · Big Data

Evolution of Data Platforms: From Early Computers to the Modern Data Stack

This article traces the history of data platforms—from the first general‑purpose computers and traditional BI, through the rise of data warehouses, big‑data frameworks like Hadoop, Spark and Flink, to the modern data‑stack era with cloud‑native architectures, Lambda/Kappa models, and emerging tools—highlighting key technologies, architectural shifts, and future prospects.

Big Data · Cloud Computing · ETL
0 likes · 26 min read
Data Thinking Notes
Dec 23, 2022 · Big Data

How Real-Time Data Warehouses Power Modern Business: Architecture, Cases, and Best Practices

This article explains why real‑time data warehouses are becoming essential, outlines their goals, compares them with traditional offline warehouses, and presents detailed design patterns, naming conventions, and case studies from Didi, Kuaishou, Tencent, Youzan and other enterprises, highlighting challenges and solutions for streaming, storage, and query layers.

Big Data Architecture · Data Lake · ETL
0 likes · 49 min read
Ziru Technology
Dec 16, 2022 · Big Data

How to Effectively Test Offline Data Metrics and Data Warehouse Pipelines

This article explains what data metrics are, compares offline metric testing with traditional testing, and provides a comprehensive step‑by‑step guide for testing data collection, ETL, warehouse models, metric calculations, scheduling, security, and API outputs in a Hive‑based data warehouse.

ETL · data validation · data-warehouse
0 likes · 9 min read
Architecture Digest
Dec 1, 2022 · Big Data

Understanding Data Warehouse Architecture and Layered Design

This article explains the concepts, architecture, and layered design of data warehouses, covering data flow, ETL processes, ODS, DWD, DWM, DWS, ADS layers, their characteristics, differences from databases, and the role of data marts in supporting OLAP and decision‑making.

Analytics · Big Data · Data Layers
0 likes · 13 min read
DevOps Cloud Academy
Nov 22, 2022 · Big Data

Components and Key Terminology in Apache Airflow

Apache Airflow’s architecture consists of schedulers, executors, workers, a web server, and a metadata database, enabling scalable workflow orchestration, while essential terminology such as DAGs, operators, and sensors defines how tasks are organized, executed, and monitored within data pipelines.

Apache Airflow · Big Data · DAG
0 likes · 8 min read
Data Thinking Notes
Nov 16, 2022 · Big Data

Why Metadata Management Is Essential for Data Warehouses

This article explains the concept of metadata, its role in data warehouses, why managing metadata is critical for building, maintaining, and scaling data warehouse systems, and outlines practical steps, use cases, and tools for effective metadata management.

Data Governance · ETL · data-warehouse
0 likes · 15 min read
Tencent Cloud Developer
Nov 7, 2022 · Big Data

Data Engineering and Data Warehouse Design: Principles, Practices, and Governance

The article outlines comprehensive data‑engineering and warehouse‑design principles—covering collection (four Ws and methods like SDK, point‑code, binlog), reporting strategies, source selection, modeling with fact, aggregation, dimension and model tables, quality checks, and governance practices such as standardized SDKs, metric libraries, automated lineage, and cost optimization—to share actionable experience for any organization.

Big Data · Data Governance · ETL
0 likes · 32 min read
Architecture Digest
Nov 5, 2022 · Big Data

Why Data Warehouse Modeling and Layered Architecture Matter

Data warehouse modeling organizes data into layered structures—ODS, DWD, DWS, and ADS—to improve performance, reduce costs, ensure data quality, enable traceability, simplify maintenance, and support both batch and real‑time analytics, while outlining best practices for ETL processes and schema design.

ETL · Modeling · layered architecture
0 likes · 37 min read
dbaplus Community
Oct 30, 2022 · Big Data

Why Layered Data Warehouse Modeling Boosts Performance and Cuts Costs

This article explains the importance of layering in data warehouse modeling, outlines the four ETL steps, describes common pitfalls, presents a typical technical stack, and details each warehouse layer (ODS, DWD, DWS, ADS) along with best‑practice naming conventions and implementation tips for big‑data environments.

ETL · Modeling · Spark
0 likes · 38 min read
Big Data Technology Architecture
Oct 25, 2022 · Big Data

Rebuilding Shopee's Data Integration Platform with Apache SeaTunnel

Shopee faced fragmented data‑ingestion pipelines, limited source support, and high maintenance overhead, so it evaluated open‑source tools and adopted Apache SeaTunnel to unify batch and streaming data transfers, simplify ETL workflows, and provide a scalable, extensible solution for its multi‑TB daily data processing needs.

Apache · Data Integration · ETL
0 likes · 17 min read
Big Data Technology & Architecture
Oct 24, 2022 · Big Data

Comprehensive Guide to Big Data Modeling and Data Warehouse Design

This article provides an in‑depth overview of big‑data modeling concepts, covering why data modeling is essential, relational versus analytical systems, common warehouse modeling methodologies, Alibaba's practical implementations, dimension design techniques, and detailed fact‑table design principles for modern data platforms.

ETL · dimensional modeling
0 likes · 50 min read
37 Interactive Technology Team
Aug 8, 2022 · Backend Development

Time Management in Programming: Concepts, Practices, and Common Pitfalls

Time management in programming spans human concepts of time, language-specific handling of zones and timestamps, 32‑bit overflow risks, sync versus async processing, log timestamping, business‑level period calculations, and common pitfalls, emphasizing that mastering these nuances prevents bugs, improves performance, and enables reliable analytics.

ClickHouse · ETL · PHP
0 likes · 20 min read
Snowball Engineer Team
Aug 5, 2022 · Big Data

Snowball Data Warehouse Modeling and OneData System Implementation

This article outlines Snowball's data warehouse background, compares major modeling approaches such as ER, dimensional, DataVault and Anchor models, describes the current challenges of their dimensional model, and details the OneData methodology—including OneModel, OneID, and OneService—along with its practical implementation, results, and future plans.

Big Data · Data Governance · ETL
0 likes · 23 min read
Big Data Technology & Architecture
Aug 4, 2022 · Big Data

Comprehensive Guide to DataX: Introduction, Architecture, Usage, and Deployment

This article provides a detailed overview of DataX, covering its purpose, framework design, core architecture, scheduling process, practical examples of MySQL-to-MySQL synchronization, step‑by‑step installation and configuration of DataX‑WEB, UI usage, routing strategies, task types, and advanced task building techniques.

Big Data · Data Integration · DataX
0 likes · 14 min read
Alibaba Cloud Native
Jul 15, 2022 · Cloud Native

Boost Data Analysis and ETL with Alibaba Cloud Function Compute Async Tasks

This guide explains how to use Alibaba Cloud Function Compute asynchronous tasks for large‑scale data analysis, database autonomous services, Kafka‑based ETL pipelines, and high‑performance video transcoding, highlighting architecture migration, cost reduction, deployment steps, and observable serverless task capabilities.

Async Tasks · Cloud Native · ETL
0 likes · 16 min read
Programmer DD
Jul 14, 2022 · Big Data

Master Fast Data Synchronization with Alibaba DataX: A Step‑by‑Step Guide

This article explains why traditional mysqldump and file‑based methods struggle with massive tables, introduces Alibaba DataX as a high‑performance offline data integration tool, details its architecture, and provides comprehensive installation and configuration steps for full and incremental MySQL‑to‑MySQL synchronization using JSON job files.

Big Data · DataX · ETL
0 likes · 15 min read
Baidu Geek Talk
Jun 15, 2022 · Big Data

Replacing Classic Data Warehouse with a One‑Layer Wide Table Model: Architecture, Benefits, and Challenges

The article proposes replacing the traditional multi‑layered data‑warehouse architecture (ODS‑DWD‑DWS‑ADS) with a single, column‑store wide‑table per business theme, achieving roughly 30 % storage savings and faster queries, while acknowledging higher ETL complexity, back‑tracking costs, and production timing challenges.

Big Data · ETL · Parquet
0 likes · 11 min read
Architect's Tech Stack
May 28, 2022 · Big Data

Data Lake Challenges and the Open SPL Computing Engine

The article examines the inherent trade‑offs of data lakes—maintaining raw data, enabling efficient computation, and keeping costs low—explains why traditional data‑warehouse approaches fall short, and introduces the open‑source SPL engine that provides multi‑source, file‑based, high‑performance analytics to overcome these limitations.

Big Data · Data Lake · ETL
0 likes · 12 min read
Architect
May 25, 2022 · Big Data

Metadata Infrastructure and Governance in Bilibili's Data Platform

The article details how Bilibili built a unified metadata infrastructure—including a URN‑based model, collection pipelines, quality assurance, storage in TiDB/ES/HugeGraph, and query services—to support data discovery, lineage, impact analysis, and governance across its growing data platform.

Big Data · Data Catalog · Data Governance
0 likes · 21 min read
DataFunTalk
May 24, 2022 · Big Data

Integrating Apache Flink with Apache Hudi: From Data Warehouse to Data Lake

This article explains how Apache Flink integrates with Apache Hudi to enable real‑time data lake ingestion, covering the evolution from traditional data warehouses to data lakes, Hudi’s core concepts such as timeline and file grouping, copy‑on‑write vs merge‑on‑read modes, and Flink’s CDC‑based ETL pipeline.

Big Data · CDC · Data Lake
0 likes · 18 min read
DataFunTalk
May 19, 2022 · Big Data

SeaTunnel: Distributed Data Integration Platform and Its Application in Traffic Management

This article introduces Apache SeaTunnel, a distributed, high‑performance data integration platform built on Spark and Flink, outlines its technical features, workflow, and plugin ecosystem, and details a concrete traffic‑management use case involving incremental Oracle‑to‑warehouse data synchronization with Spark resources and scheduled shell scripts.

Apache Flink · Apache Spark · Big Data
0 likes · 12 min read
ITPUB
Apr 27, 2022 · Databases

Mastering Data Warehouse Standards: Architecture, Layer Design, and Naming Conventions

This comprehensive guide explains data‑warehouse construction standards, covering model architecture principles, public development rules, layer‑by‑layer design specifications, and systematic naming conventions for tables, dimensions, and metrics to ensure consistency, scalability, and reliable data governance.

Big Data · Database Standards · ETL
0 likes · 26 min read
DataFunSummit
Apr 4, 2022 · Big Data

User Portrait Scenarios and Technical Implementation Solutions

This article presents a comprehensive overview of user portrait applications across various industries, detailing common scenarios, product functionalities, and a step‑by‑step technical solution that includes data collection, tag management, ETL pipelines, and service architecture for real‑time and offline processing.

ETL · SCRM · Tag Management
0 likes · 18 min read
58 Tech
Mar 29, 2022 · Big Data

Design and Implementation of the 58 Group Penalty Data Center

This article presents the design, architecture, and implementation of a unified penalty data center for 58 Group, detailing the challenges of heterogeneous data sources, the selection of Flink for real‑time ETL, the use of a DSL and LRU aggregation, and the adoption of MVEL for feature recognition to achieve standardized, high‑performance penalty data processing.

Big Data · ETL · Flink
0 likes · 13 min read
DataFunTalk
Mar 5, 2022 · Big Data

Designing Cross‑Period Dependencies in Data Scheduling Systems

This article explains how data scheduling systems manage task execution, ETL processes, and cross‑period dependencies by linking task versions, data partitions, and time parameters, and introduces the offset‑and‑cnt model to express dynamic dependencies in big‑data pipelines.

DAG · Data Scheduling · ETL
0 likes · 14 min read
ByteDance Data Platform
Feb 21, 2022 · Big Data

Choosing the Right Components for Enterprise Data Warehouses: Hive vs SparkSQL

This article examines how to design enterprise‑grade data warehouses by evaluating development convenience, ecosystem, decoupling, performance and security, compares Hive and SparkSQL along with other engines such as Presto, Doris and ClickHouse, and outlines best‑practice component selections for long‑running batch and interactive analytics.

Architecture · Big Data · ETL
0 likes · 19 min read
dbaplus Community
Feb 15, 2022 · Big Data

Mastering Data Warehouse Architecture: Concepts, Modeling Techniques, and Real‑Time Strategies

This comprehensive guide explains data warehouse fundamentals, architecture layers, modeling methods such as dimensional and entity modeling, metadata management, and the transition from offline to real‑time processing with Lambda and Kappa architectures, providing practical steps, best practices, and key terminology for building robust analytical platforms.

Big Data · ETL · Real-time Processing
0 likes · 63 min read
DataFunTalk
Jan 22, 2022 · Big Data

Alibaba Cloud Data Integration (DataX) Architecture, Design Principles, and Solution Overview

This presentation details Alibaba Cloud DataWorks Data Integration (DataX), covering its architecture, core design principles, offline and real‑time synchronization mechanisms, deployment modes, product positioning, use‑case scenarios, and its role within the broader DataWorks ecosystem, highlighting its capabilities for large‑scale data movement and processing.

Alibaba Cloud · Big Data · Data Integration
0 likes · 19 min read
Big Data Technology & Architecture
Dec 31, 2021 · Big Data

Apache SeaTunnel Joins the Apache Incubator: Overview, Features, and Real‑World Use Cases

SeaTunnel, the China‑originated data‑integration platform built on Spark and Flink, has been accepted into the Apache Incubator, and this article introduces its history, architecture, plugin ecosystem, deployment requirements, and numerous enterprise deployments across batch and streaming big‑data scenarios.

Apache · Big Data · Data Integration
0 likes · 7 min read
Big Data Technology & Architecture
Nov 30, 2021 · Big Data

User Portrait Development Process and Key Deliverables

This article outlines a comprehensive seven‑stage workflow for building enterprise user portraits—from goal interpretation and requirement analysis through tag development, scheduling, service‑layer integration, productization, optimization, and finally deployment and performance tracking—highlighting critical outputs and common challenges at each step.

ETL · data engineering · tag development
0 likes · 8 min read
dbaplus Community
Nov 27, 2021 · Big Data

How Vipshop’s Hera Data Service Boosts Big Data Access and Performance

The article details the design, architecture, core features, scheduling logic, and performance gains of Vipshop’s self‑built Hera data service, which unifies data‑warehouse access, supports multiple engines, adapts SQL execution, and dramatically improves SLA for both B‑to‑B and B‑to‑C workloads.

Big Data · Data Service · ETL
0 likes · 22 min read
DataFunTalk
Nov 20, 2021 · Big Data

How to Build a Big Data Platform from Zero to One: Architecture, Components, and Best Practices

This article provides a comprehensive guide to designing and implementing a big‑data platform, covering architecture overview, data ingestion with Flume, storage on HDFS/Hive/HBase, processing engines such as Hive, Spark and Flink, scheduling solutions like Azkaban and Airflow, and the construction of self‑service analytics systems.

Big Data · ETL · Hadoop
0 likes · 29 min read
Big Data Technology Architecture
Nov 13, 2021 · Big Data

Case Study: Migrating Baicaowei's On‑Premise Hadoop Data Platform to Alibaba Cloud Native Data Lake

This article details Baicaowei's migration from an IDC‑hosted Hadoop cluster to a cloud‑native data lake on Alibaba Cloud, outlining the business drivers, pain points of the legacy platform, architectural goals, design principles, solution selection, implementation steps, and future outlook for the new big‑data ecosystem.

Alibaba Cloud · Big Data · Delta Lake
0 likes · 16 min read
Architects' Tech Alliance
Sep 11, 2021 · Big Data

Understanding Data Warehouses: Definitions, Differences, Architecture, Modeling, and Best Practices

This article explains what a data warehouse is, contrasts it with traditional databases, outlines how to design and build a warehouse—including model selection, subject‑area definition, bus matrix, layering, and data quality—while also covering related concepts such as data middle platforms, data lakes, metadata, and modeling techniques.

Big Data · Data Quality · ETL
0 likes · 16 min read
dbaplus Community
Aug 31, 2021 · Big Data

How Meituan Waimai Built and Evolved Its Massive Data Warehouse from V1 to V3

This article details Meituan Waimai's data warehouse evolution—covering business context, four‑layer architecture, Spark‑based ETL, successive V1.0, V2.0, and V3.0 redesigns, data governance practices, resource‑optimization tactics, security measures, and future road‑maps—illustrated with diagrams and concrete technical choices.

Data Governance · ETL · Resource Optimization
0 likes · 24 min read
DataFunSummit
Aug 22, 2021 · Big Data

Evolution and Optimization of Meituan Waimai Offline Data Warehouse: Architecture, ETL, Modeling, Governance, and Future Plans

This article details the historical development, architectural layers, ETL migration to Spark, data modeling standards, governance processes, resource optimization, security measures, and future roadmap of Meituan Waimai's offline data warehouse, illustrating how the team addressed scalability and efficiency challenges.

Big Data · Data Governance · ETL
0 likes · 21 min read
IT Architects Alliance
Aug 22, 2021 · Big Data

Understanding ETL and Building Enterprise Data Warehouses: Concepts, Architecture, and Step‑by‑Step Techniques

This article explains the fundamentals of ETL, describes data warehouse architectures such as star and snowflake schemas, outlines a five‑step methodology for constructing enterprise‑level data warehouses, and discusses advanced ETL techniques, tools, and algorithm choices for effective data integration and management.

DW Architecture · ETL · data-warehouse
0 likes · 24 min read
Qunar Tech Salon
Aug 16, 2021 · Operations

Design and Practice of Qunar Data Synchronization Platform: ES Multi‑Version Migration, High Availability, and Data Consistency

The article details Qunar's data synchronization platform that aggregates MySQL data into Elasticsearch, covering its architecture, component choices, ES5‑to‑ES7 migration, hot‑plugging, reindexing, high‑availability design, consistency guarantees, operational optimizations, and future roadmap.

ETL · Elasticsearch · MySQL
0 likes · 16 min read
IT Architects Alliance
Aug 14, 2021 · Big Data

An Introduction to Dimensional Modeling in Data Warehousing

This article provides a comprehensive overview of data warehouse concepts, compares classic warehouse models, explains dimensional modeling fundamentals such as fact and dimension tables, demonstrates a practical e‑commerce scenario with schema design and SQL query examples, and discusses real‑world trade‑offs.

Big Data · ETL · Star Schema
0 likes · 9 min read
IT Architects Alliance
Aug 9, 2021 · Big Data

Data Warehouse Architecture Overview: Layers, Sources, Modeling, Storage, and Management

This article explains the logical layered architecture of modern data warehouses, covering data sources, ODS, DW/DWS layers, collection, storage on HDFS, synchronization tools, dimensional modeling (star, snowflake, constellation), metadata management, and task scheduling and monitoring, highlighting best practices for scalable big‑data solutions.

ETL · data-warehouse · metadata
0 likes · 12 min read
ITFLY8 Architecture Home
Aug 3, 2021 · Big Data

How BIGO Scaled Real‑Time Messaging by Migrating from Kafka to Pulsar

BIGO replaced its Kafka‑based message‑flow platform with Apache Pulsar to overcome scaling, stability, and operational cost challenges, leveraging Pulsar’s storage‑compute separation, seamless horizontal expansion, low latency, and tight integration with Flink for real‑time ETL and AB‑test pipelines, resulting in billions of messages processed daily with half the hardware cost.

Apache Pulsar · ETL · Flink
0 likes · 17 min read
Big Data Technology & Architecture
Jun 6, 2021 · Big Data

Understanding Data Warehouses: Concepts, Architecture, Modeling, and Governance

This article provides a comprehensive overview of data warehouses, explaining their purpose, differences from databases, OLTP vs OLAP, traditional versus internet data warehouse models, layered architecture, modeling theories, metric dictionaries, date dimensions, naming conventions, data governance, and incremental synchronization techniques with practical SQL examples.

Big Data · Data Governance · ETL
0 likes · 24 min read
dbaplus Community
Jun 2, 2021 · Databases

How to Build a Mature Data Warehouse: 7 Essential Steps and Best Practices

This article explains why data warehouses are critical for decision‑making, outlines the challenges of immature warehouses, and provides a step‑by‑step framework—including goal setting, technology selection, problem identification, domain modeling, layer design, modeling principles, and governance standards—to help teams build a robust, maintainable data warehouse.

Big Data · Data Architecture · Database design
0 likes · 22 min read
IT Architects Alliance
May 30, 2021 · Big Data

NetEase Game Streaming ETL Architecture and Practices Based on Flink

This article presents NetEase Game's Flink‑based streaming ETL system, detailing business background, log classifications, specialized and generic ETL services, Python UDF integration, runtime optimizations, HDFS write tuning, SLA metrics, fault‑tolerance mechanisms, and future roadmap for unified data lakes and PyFlink support.

Big Data · Data Integration · ETL
0 likes · 19 min read
IT Architects Alliance
May 25, 2021 · Big Data

How Modern Data Middle Platforms Power Real‑Time and Offline Analytics

This article provides a comprehensive technical overview of data middle platforms, covering data aggregation, offline and real‑time development, smart operations, data asset management, governance, service layers, platform implementations, warehouse layering, and key differences between offline and real‑time data warehouses.

Big Data · Data Governance · Data Platform
0 likes · 26 min read
Programmer DD
May 22, 2021 · Big Data

What Is a Data Lake? Origins, Architecture, and How It Powers Modern Big Data

This article explains the concept of a data lake—its origin in 2011, how it differs from traditional databases and data warehouses, its core characteristics such as raw data storage, on‑demand computing, and schema‑on‑read, as well as its advantages, challenges, architectural components, and future outlook within the big‑data ecosystem.

Big Data · Data Architecture · Data Governance
0 likes · 20 min read
JD Retail Technology
May 13, 2021 · Big Data

Evolution and Architecture of JD.com Self‑Operated Rebate Platform

The article details the development, challenges, and redesign of JD.com’s self‑operated rebate system, describing its early monolithic architecture, data‑intensive processing pipeline, migration to a modular, high‑availability platform built on Spark, Hive, and Elasticsearch, and the resulting performance and operational improvements.

Big Data · ETL · Spark
0 likes · 16 min read
Architecture Digest
May 7, 2021 · Big Data

Comprehensive Overview of Data Middle Platform Architecture and Practices

This article provides a detailed introduction to data middle platform concepts, covering data aggregation, ingestion tools, offline and real‑time development, data governance, service layers, monitoring, and deployment patterns, illustrating how enterprises build unified data ecosystems across various industries.

Big Data · Data Governance · Data Platform
0 likes · 25 min read
ITFLY8 Architecture Home
May 3, 2021 · Big Data

Unlocking the Power of Data Middle Platforms: Key Concepts and Best Practices

This article provides a comprehensive overview of data middle platforms, covering data aggregation, collection tools, offline and real‑time development, scheduling, baseline control, heterogeneous storage, data governance, service layers, monitoring, and the architectural differences between offline and real‑time data warehouses.

ETL · Real-time Processing · data-warehouse
0 likes · 26 min read
DataFunTalk
Mar 24, 2021 · Big Data

Practical Experience of Using DorisDB for Real-Time and Offline Analytics in KuJiaLe's Big Data Platform

This article details how KuJiaLe's big data team replaced their legacy ADB and Presto clusters with a DorisDB MPP database, achieving sub‑second query latency, unified real‑time and offline analytics, simplified ETL pipelines, and significant cost savings while supporting billion‑row tables and high‑QPS workloads.

Big Data · DorisDB · ETL
0 likes · 9 min read
Big Data Technology & Architecture
Mar 2, 2021 · Big Data

An Introduction to Kafka Connect: Architecture, Components, and Hands‑On Setup

This article introduces Kafka Connect, explaining its purpose as a scalable and reliable tool for moving data between Apache Kafka and external systems, detailing its core concepts, architecture, deployment modes, configuration files, and a step‑by‑step example that streams data from a file source to a file sink.

Data Integration · ETL · Streaming
0 likes · 12 min read
Architects' Tech Alliance
Feb 21, 2021 · Big Data

Data Warehouse and Data Lake: Concepts, Architecture, and Comparison

This article provides an extensive overview of data warehouse and data lake concepts, their architectures, differences, components, and implementation considerations, covering topics such as OLTP/OLAP, ETL processes, data quality, cloud solutions, and the role of data platforms in modern enterprises.

Cloud Computing · Data Architecture · Data Lake
0 likes · 92 min read
DataFunTalk
Feb 10, 2021 · Big Data

AirWorks Data Intelligence Platform: Architecture, Cloud‑Native Ingestion, and Financial Asset Management Use Case

The article presents Entropy Simplify's AirWorks data intelligence platform, detailing its three‑layer architecture, cloud‑native multi‑source data ingestion system, low‑code ETL capabilities, technical features such as multi‑engine cooperation and data‑skew handling, and a financial asset‑management case study.

Big Data · ETL · Financial Services
0 likes · 16 min read
ITFLY8 Architecture Home
Feb 4, 2021 · Big Data

Unlocking Data Middle Platform: From Ingestion to Real‑Time Analytics

This article provides a comprehensive overview of data middle platform concepts. It covers data aggregation, ingestion tools, offline and real‑time development, scheduling, baseline control, heterogeneous storage, recommendation dependencies, data permissions, and the layered data architecture (ODS, DW, DWD, DWS, TDM, ADS). It also covers asset management, governance, service APIs, query and analysis services, and the monitoring, alerting, and operational best practices needed to build robust big‑data solutions.

Big Data · ETL · data-warehouse
0 likes · 25 min read
NetEase Smart Enterprise Tech+
Jan 14, 2021 · Big Data

How Yidun Achieves Real-Time, High-Performance Public-Opinion Data Cleaning with Groovy and JVM

Yidun’s public-opinion monitoring platform transforms massive raw web data into a unified format by separating dynamic Groovy-script-driven cleaning from static processing, achieving real-time source integration, high throughput, scalability, and high availability while addressing format diversity, team coordination, and performance-flexibility trade-offs.

Big Data · ETL · Groovy
0 likes · 5 min read
Architect
Dec 22, 2020 · Big Data

Dimensional Modeling in Data Warehousing: Concepts, Theory, and Practical Example

This article explains data warehouse fundamentals, reviews classic warehouse models such as ER, dimensional, Data Vault and Anchor, then dives deep into dimensional modeling concepts, star and snowflake schemas, and demonstrates a practical e‑commerce scenario with SQL examples and trade‑offs.

Big Data · ETL · Star Schema
0 likes · 11 min read
ITFLY8 Architecture Home
Dec 18, 2020 · Big Data

Unlocking the Data Middle Platform: From Ingestion to Real‑Time Analytics

This article provides a comprehensive overview of data middle platform concepts, covering data aggregation, collection tools, development modules, job scheduling, baseline control, heterogeneous storage, permission management, real‑time and offline processing, governance, services, and implementation details for building robust big‑data solutions.

Data Governance · Data Platform · ETL
0 likes · 25 min read
DeWu Technology
Dec 11, 2020 · Big Data

Data Synchronization from MySQL to Elasticsearch using DataX and Canal

The article explains how to improve query performance by flattening multi‑table MySQL data and synchronizing it to Elasticsearch—using DataX for one‑time bulk loading and Canal (with Canal‑Adapter) for real‑time binlog‑driven incremental updates—while detailing configuration steps, job examples, and common pitfalls.

Canal · DataX · ETL
0 likes · 14 min read
Big Data Technology & Architecture
Nov 29, 2020 · Big Data

Installing and Configuring Kettle (Pentaho Data Integration) on Linux for Hadoop ETL

This guide provides a step‑by‑step tutorial on preparing a Linux environment, installing Java, GNOME Desktop, VNC remote access, Chinese language support, downloading and extracting Kettle, configuring its startup scripts, creating desktop shortcuts, and managing essential Kettle configuration files for successful Hadoop ETL development.

ETL · Installation · Kettle
0 likes · 37 min read
Big Data Technology & Architecture
Nov 28, 2020 · Big Data

ETL Fundamentals and Introduction to Kettle (Pentaho Data Integration)

This article gives an in-depth overview of ETL concepts (extraction, transformation, loading, and data warehouse architecture) and then discusses Kettle (Pentaho Data Integration) in detail: its features, design principles, components, transformations, jobs, database connections, and metadata management, with practical examples for building robust data integration pipelines.

Data Integration · ETL · Kettle
0 likes · 57 min read
dbaplus Community
Nov 26, 2020 · Big Data

Silicon Valley's Data Middle Platform Secrets: EA, Twitter, Airbnb, Uber

This article examines how leading Silicon Valley companies such as EA, Twitter, Airbnb, and Uber design and operate data middle platforms—detailing their architectures, data collection pipelines, standardization efforts, real‑time and batch processing, and the business impact of shared data capabilities.

Big Data · Data Architecture · Data Platform
0 likes · 25 min read
DataFunTalk
Nov 26, 2020 · Big Data

Evolution of 58.com Commercial Data Warehouse: From 0‑1 to 3.0 Architecture and Technology

This article details the evolution of 58.com’s commercial data warehouse across three phases—1.0, 2.0, and 3.0—covering its scale, four‑layer architecture, migration from legacy Hadoop‑MapReduce pipelines to Flume/Kafka and Flink streaming, code optimizations, monitoring, and productization for real‑time business insights.

Architecture · Big Data · ETL
0 likes · 9 min read
Beike Product & Technology
Nov 13, 2020 · Big Data

Beike One‑Stop Big Data Development Platform: Architecture, Evolution, and Future Outlook

The article summarizes Beike's one‑stop big data development platform: it describes the data business background, traces the evolution from a simple Hadoop‑Kafka‑Hive stack to a metadata‑driven, asset‑oriented platform, and outlines current capabilities in data management, integration, scheduling, quality, and openness, along with future plans.

Big Data · Data Governance · Data Platform
0 likes · 11 min read