Tagged articles
237 articles
macrozheng
Mar 27, 2023 · Big Data

Top 8 Open-Source ETL Tools for Efficient Data Migration

This guide reviews eight popular ETL and data migration tools—including Kettle, DataX, DataPipeline, Talend, DataStage, Sqoop, FineDataLink, and Canal—detailing their core features, architectures, and use cases to help engineers choose the right solution for reliable data integration.

Big Data · Data Integration · Data Migration
0 likes · 14 min read

Su San Talks Tech
Mar 24, 2023 · Big Data

Top 8 Open-Source ETL Tools You Should Know for Efficient Data Migration

Explore a comprehensive overview of eight popular ETL and data migration tools—including Kettle, DataX, DataPipeline, Talend, DataStage, Sqoop, FineDataLink, and Canal—detailing their features, architectures, and use cases to help you choose the right solution for efficient data integration.

Big Data · Data Integration · Data Migration
0 likes · 13 min read

Architects Research Society
Mar 5, 2023 · Big Data

Best Open‑Source and Commercial ETL Tools: Detailed Comparison

This article introduces the concept of ETL, explains its importance for modern data‑driven applications, and provides a comprehensive comparison of the most popular open‑source and commercial ETL platforms—including their key features, supported data sources, and deployment options—helping readers choose the right tool for their data integration needs.

Big Data · Data Integration · Data Warehouse
0 likes · 19 min read

HomeTech
Mar 1, 2023 · Backend Development

Overview of the Wenjie Low-Code Platform: Architecture, Technologies, and Use Cases

The article presents a comprehensive overview of the Wenjie low-code platform, detailing its motivation, front‑end React framework, back‑end Spring Cloud micro‑services architecture, PowerJob scheduler, custom ORM, various data‑modeling and data‑processing scenarios, dashboard visualizations, monitoring and alerting features, as well as future plans and a concluding summary.

Data Integration · React · Spring Cloud
0 likes · 11 min read

DataFunTalk
Feb 27, 2023 · Big Data

Comprehensive Overview of Data Middle Platform Architecture and Its Core Frameworks

This article provides a detailed overview of data middle platform concepts, describing a decoupled six‑subsystem architecture—including storage, collection, processing, governance, security, and operation frameworks—while illustrating typical enterprise implementations, industry‑specific solutions, and best‑practice considerations for building scalable, secure, and value‑driven data platforms.

Big Data · Data Governance · Data Integration
0 likes · 25 min read

Aikesheng Open Source Community
Feb 24, 2023 · Databases

SQLE 2.2302.0 Release Notes – New Features, Enhancements, and Bug Fixes

The SQLE 2.2302.0 release introduces data source import from external platforms, operation‑record viewing, improved CloudBeaver integration, manual deployment support, numerous UI optimizations, and a long list of bug fixes, providing a more seamless and secure SQL auditing experience for both enterprise and community users.

Bug Fixes · CloudBeaver · Data Integration
0 likes · 8 min read

DataFunTalk
Feb 2, 2023 · Big Data

SeaTunnel: Design Goals, Current Status, Architecture, and Future Roadmap

This article provides a comprehensive overview of Apache SeaTunnel, covering its design objectives, current capabilities such as multi‑engine support and extensive connector ecosystem, detailed architecture including engine‑independent APIs and execution flows, and outlines the upcoming roadmap to expand connectors, launch a visual web UI, and introduce a dedicated SeaTunnel Engine.

Apache · Batch Processing · Big Data
0 likes · 12 min read

Data Thinking Notes
Jan 31, 2023 · Fundamentals

Mastering Data Governance: From Metadata to ETL in One Guide

This comprehensive guide walks you through the entire data governance ecosystem, covering metadata fundamentals, classification, maturity models, data standards, modeling, integration, lifecycle management, quality assurance, security, and ETL processes, all illustrated with clear diagrams and practical steps.

Data Governance · Data Integration · Data Quality
0 likes · 13 min read

DataFunTalk
Jan 31, 2023 · Big Data

SPI Refactoring Practice in Apache InLong Manager to Reduce Maintenance Cost and Enhance Extensibility

This article presents the SPI-based refactoring of Apache InLong Manager, describing the project's background, existing maintenance challenges, the concept of Java Service Provider Interface, the concrete implementation steps, code restructuring, and the resulting benefits such as higher code reuse, easier extension, and reduced DDL changes.

Apache InLong · Big Data · Code Refactoring
0 likes · 10 min read

Data Thinking Notes
Jan 29, 2023 · Big Data

How to Turn Data Assets into Business Value: A Roadmap for Enterprises

Enterprises must shift their perception of data assets and embed data‑value into every digital process, establishing governance, unified asset catalogs, operational metrics, security controls, integration, services, and visualization to transform raw data into strategic business outcomes.

Big Data · Data Governance · Data Integration
0 likes · 12 min read

Code Ape Tech Column
Jan 28, 2023 · Big Data

Using Alibaba DataX for Offline Data Synchronization and Incremental Sync

This article introduces Alibaba DataX, explains its architecture and role in offline heterogeneous data synchronization, provides step‑by‑step Linux installation, demonstrates full‑load and incremental MySQL‑to‑MySQL sync with JSON job templates, and shares practical tips for handling large data volumes.

Data Integration · DataX · ETL
0 likes · 15 min read

ITPUB
Jan 26, 2023 · Big Data

How NetEase’s Arctic Unifies Streaming and Batch with Iceberg for Real‑Time Lakehouse

This article explains the challenges of a Lambda‑architecture data pipeline, introduces NetEase’s Arctic lakehouse built on Apache Iceberg, details its table‑store design, optimization cycles, consistency mechanisms, real‑time features, practical use cases, and future roadmap, highlighting its advantages over similar solutions.

Arctic · Data Integration · Flink
0 likes · 14 min read

DataFunSummit
Jan 24, 2023 · Big Data

Building a Real-Time Data and User Profiling Architecture with Apache Doris at Zhihu

The article details Zhihu's data empowerment team's design and implementation of a low‑cost, high‑response real‑time data platform built on Apache Doris, covering real‑time business metrics, algorithm features, and user profiling, and explains the challenges, architectural choices, tooling, performance gains, and future directions.

Apache Doris · Data Integration · Data Quality
0 likes · 22 min read

JD Tech
Jan 13, 2023 · Big Data

UData: Solving the Last Mile of Data Usage – Architecture, Query Engine Design, and Federated Query Enhancements

This article introduces the UData platform, explains its data‑integration architecture, details the StarRocks‑based query engine workflow from SQL parsing to distributed execution, and describes recent optimizations such as computation push‑down, support for JSF/HTTP/ClickHouse external tables, and a proxy‑based federated query framework.

Big Data · Data Integration · Query Engine
0 likes · 20 min read

Alibaba Cloud Big Data AI Platform
Jan 12, 2023 · Operations

What Is DataOps and How Can It Transform Your Data Management?

DataOps, the data‑centric counterpart of DevOps, combines agile principles, standardized tools, and cross‑team collaboration to manage the full data lifecycle—from integration and development to storage, governance, and service—enabling organizations to handle massive, diverse datasets efficiently, reduce silos, and turn data into actionable value.

Big Data · Data Governance · Data Integration
0 likes · 15 min read

DataFunTalk
Jan 6, 2023 · Big Data

ZhongAn's Hundred‑Billion‑Scale Data Integration Service: Architecture, Business Support, and Evolution

This article presents the architecture and practical experience of ZhongAn's hundred‑billion‑scale data integration service, covering common integration technologies, business support scenarios for offline and real‑time data, technical challenges, evolution from single‑machine to service‑oriented designs, and future directions using Flink and DataX.

Data Integration · Data Platform · DataX
0 likes · 31 min read

DataFunTalk
Nov 29, 2022 · Big Data

Summary of Flink Forward Asia 2022: Keynotes, Technical Innovations, and Industry Deployments of Apache Flink

The 2022 Flink Forward Asia conference highlighted Apache Flink’s rapid growth, showcased major technical advances such as upgraded checkpointing, cloud‑native state storage, Hybrid Shuffle, Flink CDC 2.0, and Flink ML 2.0, and presented real‑world deployments from Alibaba, Midea, miHoYo, and Disney.

Apache Flink · Data Integration · Real-time Streaming
0 likes · 25 min read

Alibaba Cloud Big Data AI Platform
Nov 29, 2022 · Big Data

How Flink’s Stream‑Batch Fusion Is Transforming Real‑Time Big Data

The article explores Apache Flink’s eight‑year journey to becoming a top‑level Apache project, Alibaba’s extensive contributions, the rise of stream‑batch unified computing, its impact on real‑time data integration, cloud‑native deployment, and the emerging Flink‑based data‑warehouse and serverless solutions.

Apache Flink · Big Data · Cloud Native
0 likes · 15 min read

DataFunTalk
Nov 6, 2022 · Big Data

BitSail: ByteDance’s Open‑Source Unified Data Integration Engine – Architecture, Evolution, and Capabilities

BitSail, an open‑source data integration engine from ByteDance, provides a unified solution for batch, streaming, full‑load, and incremental data synchronization across heterogeneous sources, detailing its background, technical evolution, architecture, low‑cost co‑building features, compatibility strategies, and future roadmap.

CDC · Data Integration · Flink
0 likes · 18 min read

IT Services Circle
Oct 26, 2022 · Databases

Debezium: Open‑Source Change Data Capture Platform – Overview, Architecture, Use Cases, and Installation Guide

This article introduces Debezium, an open‑source low‑latency change data capture platform that streams database row changes via Kafka, explains its architecture and common scenarios such as cache invalidation and CQRS, and provides step‑by‑step Docker commands to install ZooKeeper, Kafka, MySQL and the Debezium connector.

CDC · Data Integration · Debezium
0 likes · 15 min read

Big Data Technology Architecture
Oct 25, 2022 · Big Data

Rebuilding Shopee's Data Integration Platform with Apache SeaTunnel

Shopee faced fragmented data‑ingestion pipelines, limited source support, and high maintenance overhead, so it evaluated open‑source tools and adopted Apache SeaTunnel to unify batch and streaming data transfers, simplify ETL workflows, and provide a scalable, extensible solution for its multi‑TB daily data processing needs.

Apache · Data Integration · ETL
0 likes · 17 min read

Alibaba Cloud Native
Sep 29, 2022 · Cloud Native

Why Use RocketMQ Connect for Scalable Data Pipelines?

This article explains the challenges of point‑to‑point data sync, introduces RocketMQ Connect as a cloud‑native solution that decouples upstream and downstream, details its architecture, connectors, REST API, metrics, deployment modes, and provides a step‑by‑step guide to building custom connectors for use cases such as CDC, data lakes, and system migration.

CDC · Cloud Native · Connector
0 likes · 19 min read

HomeTech
Sep 13, 2022 · Big Data

Integrating Heterogeneous Data Sources with openLooKeng and Upgrading the Apache Kylin Connector at AutoHome

This article describes how AutoHome tackled the complexity of managing multiple relational, NoSQL, and Hive data stores by adopting openLooKeng for unified, cross‑source SQL queries, outlines its key features such as ANSI‑SQL support, diverse connectors, and query optimizations, and details the custom enhancements made to the Apache Kylin connector to better serve their commercial data analysis workloads.

Big Data · Connectors · Data Integration
0 likes · 13 min read

Architects Research Society
Sep 6, 2022 · Fundamentals

Understanding Microsoft’s Common Data Model: Components, Benefits, and Real‑World Use

The article explains how Microsoft’s Common Data Model provides a shared metadata system and standardized, extensible data schemas that simplify integration across Power Apps, Power BI, Dynamics 365 and Azure, enabling consistent data semantics, easier app development, and scalable enterprise solutions.

Common Data Model · Data Integration · Dynamics 365
0 likes · 7 min read

Architect
Sep 2, 2022 · Frontend Development

Optimizing Front‑End and Back‑End Collaboration with Data Direct Capability at Baidu Commercial Front‑End Team

The article describes how Baidu's commercial front‑end team introduced a data‑direct capability and BFF layer to streamline front‑end/back‑end cooperation, reduce environment‑maintenance overhead, enable parallel development, and improve overall delivery efficiency across multiple project phases.

BFF · Collaboration · Data Integration
0 likes · 9 min read

Shopee Tech Team
Sep 2, 2022 · Big Data

Shopee Data System Challenges and Apache Hudi Practices

Shopee tackled its data‑system bottlenecks by customizing Apache Hudi to provide unified stream‑batch integration, efficient state‑detail snapshots, and low‑latency wide‑table generation, using CDC‑based bootstrapping, COW/MOR tables, savepoints and partial updates, which cut latency to ten minutes, lowered resource use, and yielded several community‑backed enhancements.

Apache Hudi · Big Data · Data Integration
0 likes · 18 min read

Big Data Technology & Architecture
Aug 4, 2022 · Big Data

Comprehensive Guide to DataX: Introduction, Architecture, Usage, and Deployment

This article provides a detailed overview of DataX, covering its purpose, framework design, core architecture, scheduling process, practical examples of MySQL-to-MySQL synchronization, step‑by‑step installation and configuration of DataX‑WEB, UI usage, routing strategies, task types, and advanced task building techniques.

Big Data · Data Integration · DataX
0 likes · 14 min read

DataFunTalk
Jul 30, 2022 · Databases

StarRocks-Based Unified Data Service and Analytics Platform at JD Logistics

JD Logistics leverages StarRocks to create the Udata unified query engine, addressing data silos, low performance, and high maintenance costs by integrating data services and analytics, enabling low‑code data service generation, high‑speed federated queries, real‑time updates, and future data‑lake and resource isolation capabilities.

Data Integration · Real-time analytics · StarRocks
0 likes · 14 min read

Big Data Technology Architecture
Jul 15, 2022 · Big Data

Using and Designing the Apache SeaTunnel Examples Module

This article introduces Apache SeaTunnel's Examples module, compares SeaTunnel with DataX, explains its multi‑engine design, demonstrates Flink and Spark example implementations, and shares the speaker's experiences contributing to the open‑source community, providing practical guidance for big‑data integration projects.

Apache SeaTunnel · Data Integration · Flink
0 likes · 10 min read

dbaplus Community
Jul 13, 2022 · Big Data

Unpacking the Core Technologies Behind Modern Big Data Platforms

From data ingestion to real‑time analytics, this guide breaks down the essential layers of a typical big‑data platform—covering collection methods, HDFS storage, Hive/Spark analysis, data sharing mechanisms, application use‑cases, streaming with Spark Streaming, and the need for robust scheduling and monitoring.

Big Data · Data Integration · Data Warehouse
0 likes · 9 min read

JD Retail Technology
Jun 10, 2022 · Big Data

Design and Implementation of an International Business Data Platform for JD.com's 618 Promotion

The article details JD International's challenges and solutions in building a unified, real‑time data platform for its multi‑regional 618 promotion, covering business characteristics, data distribution, team organization, dashboard architecture, integration strategies, and short‑ and long‑term technical plans.

Data Integration · Data Platform · Flink
0 likes · 8 min read

Big Data Technology Architecture
Jun 9, 2022 · Databases

Building a Real‑Time Data Warehouse with Apache Doris: Architecture, Benefits, and Lessons Learned

This article details how a fast‑growing supply‑chain platform migrated from MySQL and Hive to Apache Doris for real‑time analytics, describing the architectural evolution, the advantages of the new design, practical implementation steps, encountered challenges, and the performance and cost benefits achieved.

Apache Doris · Data Integration · Flink CDC
0 likes · 12 min read

IT Architects Alliance
Jun 7, 2022 · Databases

Introduction to Change Data Capture (CDC) Practices

This article introduces the concept and practice of Change Data Capture (CDC), explaining how it captures database changes to provide real‑time incremental data for analytics and reporting without impacting source performance, and outlines modern CDC methods, challenges, and production‑ready system requirements.

CDC · Change Data Capture · Data Integration
0 likes · 8 min read

Top Architect
Jun 7, 2022 · Databases

An Introduction to Change Data Capture (CDC) Practices and Modern Approaches

This article introduces the concept of Change Data Capture (CDC), explains why traditional batch reporting strains resources, describes how CDC captures only data changes to keep source databases performant, and outlines modern CDC architectures, production‑ready considerations, and best‑practice guidelines for building reliable data pipelines.

CDC · Change Data Capture · Data Integration
0 likes · 16 min read

DataFunTalk
May 19, 2022 · Big Data

SeaTunnel: Distributed Data Integration Platform and Its Application in Traffic Management

This article introduces Apache SeaTunnel, a distributed, high‑performance data integration platform built on Spark and Flink, outlines its technical features, workflow, and plugin ecosystem, and details a concrete traffic‑management use case involving incremental Oracle‑to‑warehouse data synchronization with Spark resources and scheduled shell scripts.

Apache Flink · Apache Spark · Big Data
0 likes · 12 min read

Top Architect
May 11, 2022 · Databases

An Introduction to Change Data Capture (CDC) Practices

This article introduces the concept and practice of Change Data Capture (CDC), explaining why CDC is needed for real‑time analytics, how it works by capturing DML changes, modern approaches using transaction logs, and key considerations for building a production‑ready CDC system.

CDC · Change Data Capture · Data Integration
0 likes · 8 min read

Alibaba Cloud Native
Apr 20, 2022 · Cloud Native

How to Seamlessly Integrate Cloud Services with Alibaba EventBridge

This guide walks through Alibaba Cloud EventBridge’s event standardization, shows step‑by‑step how to integrate OSS events for automatic file unzipping, demonstrates custom event source filtering to store data in RDS, and explains using EventBridge event streams to route RocketMQ messages to MNS, complete with code snippets and configuration details.

Alibaba Cloud · Cloud Native · Data Integration
0 likes · 9 min read

DataFunTalk
Apr 20, 2022 · Big Data

OpenMLDB Pulsar Connector: A Real‑time Data Integration Guide

This article presents a step‑by‑step tutorial on using the OpenMLDB Pulsar Connector to stream real‑time data from Apache Pulsar into OpenMLDB, covering connector architecture, key features, Docker‑based installation, sink configuration, schema registration, message production, verification queries, and future roadmap details.

Apache Pulsar · Connector · Data Integration
0 likes · 13 min read

Alibaba Cloud Developer
Apr 2, 2022 · Big Data

What’s New in Flink CDC 2.2? A Deep Dive into Added Sources and Core Features

The article introduces Flink CDC 2.2, highlighting its expanded support for twelve data sources—including OceanBase, PolarDB‑X, SqlServer, and TiDB—while detailing core features such as the incremental snapshot framework, multi‑version Flink compatibility, dynamic table addition, and numerous bug fixes and performance improvements.

Apache Flink · Change Data Capture · Connector
0 likes · 9 min read

Architects Research Society
Mar 31, 2022 · R&D Management

The Strategic Role of Enterprise Architects and Their Key Focus Areas

Enterprise architects align IT strategy with business goals by managing application portfolios, technology risk, IT operations, security, data integration, and financial considerations, while balancing long‑term strategic planning with tactical execution in a rapidly changing environment.

Application Portfolio Management · Data Integration · enterprise architecture
0 likes · 5 min read

Data Lake Construction and Practice at NetEase Yanxuan

NetEase Yanxuan replaced its cumbersome data‑warehouse with a flexible Delta‑Lake/Iceberg data lake, creating a unified metadata layer and real‑time ingestion pipelines that cut latency from nightly batches to seconds, slashed compute and storage costs, supported diverse business scenarios and machine‑learning feature engineering, and set the stage for broader future expansion.

Data Integration · Data Lake · Delta Lake
0 likes · 16 min read

IT Xianyu
Mar 3, 2022 · Databases

Introducing SPL: An Open‑Source Structured Data Processing Language with Full SQL‑92 Capabilities

SPL is an open‑source structured data processing language that extends full SQL‑92 functionality to a wide range of data sources—including CSV, Excel, JSON, NoSQL and Hadoop—allowing developers to perform complex queries, multi‑step calculations, and mixed‑source analytics without a traditional relational database.

Big Data · Data Integration · SPL
0 likes · 14 min read

vivo Internet Technology
Feb 23, 2022 · Big Data

Kafka-based Real-Time Data Warehouse: Architecture and Practice for Search

The article explains how Kafka serves as the core of a real‑time data warehouse for search, detailing its advantages over traditional databases, integration with Flink for low‑latency stream processing, architectural patterns such as Lambda/Kappa, scaling challenges, and comprehensive monitoring using Kafka Eagle.

Apache Kafka · Data Integration · Flink
0 likes · 15 min read

DataFunTalk
Jan 28, 2022 · Big Data

Real-Time Customer Data Platform (RT‑CDP) Architecture and Implementation at iFanFan

This article explains the concept, challenges, and key business goals of a real‑time Customer Data Platform, details the technology stack selection—including Nebula Graph, Apache Flink, Apache Beam, Kudu, and Doris—and describes the modular architecture, data model, identity service, streaming computation, storage layers, rule engine, operational results, and future directions.

Big Data · CDP · Data Integration
0 likes · 43 min read

Baidu Geek Talk
Jan 26, 2022 · Big Data

How a Real‑Time CDP Solves Data Silos: Architecture, Tech Choices & Lessons

This article examines the design and implementation of a tenant‑level real‑time Customer Data Platform, detailing CDP fundamentals, business and technical challenges, key architectural components, technology selections such as graph databases, stream processing, storage engines, and the operational practices that enable high‑throughput, low‑latency data integration and analytics.

CDP · Data Integration · Flink
0 likes · 42 min read

IT Architects Alliance
Jan 25, 2022 · Operations

Design and Architecture of a Shared Resource Platform and Its Technical System

This document outlines the logical and technical architecture of a government shared resource platform, describing application system upgrades, data collection and analysis, multi‑layer system design, standards compliance, interface management, and overall system integration for improved service quality and decision support.

Big Data · Data Integration · Government IT
0 likes · 23 min read

DataFunTalk
Jan 22, 2022 · Big Data

Alibaba Cloud Data Integration (DataX) Architecture, Design Principles, and Solution Overview

This presentation details Alibaba Cloud DataWorks Data Integration (DataX), covering its architecture, core design principles, offline and real‑time synchronization mechanisms, deployment modes, product positioning, use‑case scenarios, and its role within the broader DataWorks ecosystem, highlighting its capabilities for large‑scale data movement and processing.

Alibaba Cloud · Big Data · Data Integration
0 likes · 19 min read

21CTO
Jan 6, 2022 · R&D Management

CTO’s Three Phases: Why Architecture, Data, and Management Can’t Be Unified

The article reflects on a CTO’s evolution through three stages, examines why architecture, data, servers and R&D management cannot be fully unified in large organizations, and offers practical guidance on innovation, team growth, and establishing a coherent R&D culture.

CTO · Data Integration · Innovation
0 likes · 13 min read

DataFunTalk
Jan 6, 2022 · Artificial Intelligence

Deep Application‑Driven Construction of Medical Knowledge Graphs

This article presents a comprehensive overview of medical knowledge graph development, covering global and domestic progress, domain characteristics, a detailed seven‑piece ontology and "Huizhi" graph construction process, platform support, and real‑world healthcare applications such as intelligent alerts, guideline recommendations, and data reporting.

Data Integration · Healthcare · Medical Knowledge Graph
0 likes · 11 min read

Big Data Technology & Architecture
Dec 31, 2021 · Big Data

Apache SeaTunnel Joins the Apache Incubator: Overview, Features, and Real‑World Use Cases

SeaTunnel, the China‑originated data‑integration platform built on Spark and Flink, has been accepted into the Apache Incubator, and this article introduces its history, architecture, plugin ecosystem, deployment requirements, and numerous enterprise deployments across batch and streaming big‑data scenarios.

Apache · Big Data · Data Integration
0 likes · 7 min read

DataFunSummit
Dec 28, 2021 · Artificial Intelligence

Deep Application‑Driven Construction of Medical Knowledge Graphs: Methods, Models, and Case Studies

This article presents a comprehensive overview of medical knowledge graph development, covering global and domestic progress, domain characteristics, a six‑step construction workflow—including schema design, ontology term set creation, and graph building—and showcases practical applications such as intelligent alerts, guideline recommendations, and data direct reporting.

Big Data · Data Integration · Healthcare
0 likes · 11 min read

Architects Research Society
Dec 23, 2021 · Fundamentals

Enterprise Integration: Challenges, Models, and Techniques

The article explains enterprise integration as the essential practice of connecting applications, data, and devices across distributed, cloud‑native environments, covering its evolution, key challenges, and core techniques such as messaging, application connectors, data flow platforms, integration patterns, and APIs.

APIs · Cloud Native · Data Integration
0 likes · 7 min read

Ctrip Technology
Dec 16, 2021 · Big Data

Data Standard Management Practices in Ctrip Vacation Data Governance

This article outlines Ctrip Vacation's data standard management approach, covering why standards are needed, the three‑element framework of scope, tools, and policies, and detailed practices for data integration, production change handling, metadata governance, portal dashboard standardization, and self‑service query templating.

Big Data · Data Governance · Data Integration
0 likes · 12 min read
DataFunTalk
Dec 9, 2021 · Big Data

Mobile Cloud LakeHouse: Cloud‑Native Big Data Analytics Architecture and Practices

This article introduces the cloud‑native LakeHouse solution from China Mobile Cloud, covering its lake‑warehouse integration concept, overall architecture, core functions such as storage‑compute separation, one‑click data ingestion, intelligent metadata discovery, serverless execution, JDBC support, incremental updates, and typical application scenarios in public and private clouds.

Big Data · Cloud Native · Data Integration
0 likes · 17 min read
Efficient Ops
Dec 6, 2021 · Operations

How Scenario‑Based AIOps Transforms IT Operations: Insights from GOPS 2023

The article summarizes a GOPS conference presentation by Dingmao Technology on AIOps scenario‑driven construction, detailing challenges, definition of scenarios, technical methods, roadmap planning, and future prospects, while showcasing practical examples and supporting technologies for intelligent IT operations.

Data Integration · IT Operations · Scenario-based
0 likes · 8 min read
Big Data Technology & Architecture
Nov 8, 2021 · Big Data

Understanding Flink CDC 2.0: Core Design, Snapshot & Incremental Reading, and Code Walkthrough

This article introduces Flink CDC 2.0, explains its distributed full‑load and incremental reading mechanisms, details the slice partitioning, snapshot correction, and binlog handling logic, and provides a complete Java example that demonstrates how to configure Flink SQL, MySQL source, and Kafka sink.

Big Data · CDC · Data Integration
0 likes · 29 min read
DataFunTalk
Sep 11, 2021 · Cloud Computing

Industrial Data Cloud Migration: Architecture, Core Technologies, and Case Studies with Alibaba Cloud IoT

This article explains the background, challenges, overall architecture, core technology optimizations, edge‑computing integration, data modeling, serialization, and real‑world case studies of moving industrial IoT data to Alibaba Cloud, illustrating how cloud‑native solutions enable digital transformation in manufacturing.

Big Data · Data Integration · Digital Transformation
0 likes · 16 min read
Architects' Tech Alliance
Sep 2, 2021 · Big Data

Core Technologies and Architecture of a Big Data Platform

The article outlines a typical big data platform architecture, detailing its core layers—data collection, storage and analysis, sharing, application, real-time computation, and task scheduling—while describing key technologies such as Flume, DataX, HDFS, Hive, Spark, Spark Streaming, and Redis.

Data Architecture · Data Integration · Hadoop
0 likes · 9 min read
Big Data Technology Architecture
Aug 17, 2021 · Big Data

Detailed Overview of Flink CDC 2.0: Architecture, Features, and Future Roadmap

This article provides an in‑depth technical overview of Flink CDC 2.0, covering its CDC fundamentals, comparison of query‑based and log‑based approaches, the new lock‑free chunk algorithm, FLIP‑27 based parallel snapshot reading, performance benchmarks, documentation improvements, and future roadmap for stability and ecosystem integration.

Change Data Capture · Data Integration · Debezium
0 likes · 16 min read
Big Data Technology & Architecture
Jul 28, 2021 · Big Data

Understanding Customer Data Platforms (CDP): Why They’re Needed and How to Build One

The article explains what a Customer Data Platform (CDP) is, why businesses need it to overcome fragmented multi‑channel data, enable fine‑grained operations and data‑driven growth, and outlines the key steps for building a CDP, including data collection, OneID unification, tagging, lifecycle management, and marketing execution.

CDP · Data Integration · User Tagging
0 likes · 10 min read
DataFunTalk
Jul 7, 2021 · Big Data

Solving Data Island Challenges and Enabling Advanced OLAP Analysis on Heterogeneous Big Data Platforms – Kyligence Solution Overview

This article explains the growing analytical demands in the big‑data era, the limitations of traditional OLAP, and how Kyligence’s distributed OLAP engine addresses data‑island issues, multi‑dimensional and many‑to‑many analysis, unified security, and performance optimization with MDX on Spark, delivering a seamless Excel‑like experience.

Analytics · Big Data · Data Integration
0 likes · 9 min read
Laravel Tech Community
Jul 1, 2021 · Frontend Development

How to Build a Visual Dashboard (Large Screen) Using FineReport

This article walks through the complete process of creating a high‑impact visual dashboard for large‑screen displays using FineReport, covering tool selection, data preparation, report creation, design, visual polishing, and adding dynamic effects.

Dashboard · Data Integration · FineReport
0 likes · 5 min read
Code Ape Tech Column
Jul 1, 2021 · Backend Development

Master Spring Batch: Core Concepts, Architecture, and Practical Tips

This article provides a comprehensive guide to Spring Batch, covering its purpose, architecture, core components such as Job, Step, ItemReader/Writer/Processor, chunk processing, skip strategies, configuration tips, and common memory issues, all illustrated with code examples and diagrams.

Batch Processing · Chunk Processing · Data Integration
0 likes · 19 min read
ITFLY8 Architecture Home
Jun 21, 2021 · Big Data

What Is a Big Data Platform and How to Design Its Architecture?

This article explains what a big data platform is, outlines its seven‑component overall architecture, details the technical stack from data sources to applications, and describes the key subsystems such as catalog management, data integration, governance, storage, processing, sharing, development, and analysis.

Data Governance · Data Integration · Distributed Systems
0 likes · 11 min read
DataFunTalk
Jun 5, 2021 · Big Data

Building and Evolving a Data Service Platform for NetEase Cloud Music

The article details how NetEase Cloud Music co‑built a unified data service platform with NetEase YouShu, describing its architecture, phased development from internal use to online high‑concurrency services, feature enhancements such as API marketplace, multi‑source support, parameter conversion, and future roadmap for broader data products.

API Platform · Backend · Big Data
0 likes · 16 min read
IT Architects Alliance
May 30, 2021 · Big Data

NetEase Game Streaming ETL Architecture and Practices Based on Flink

This article presents NetEase Game's Flink‑based streaming ETL system, detailing business background, log classifications, specialized and generic ETL services, Python UDF integration, runtime optimizations, HDFS write tuning, SLA metrics, fault‑tolerance mechanisms, and future roadmap for unified data lakes and PyFlink support.

Big Data · Data Integration · ETL
0 likes · 19 min read
DataFunTalk
Apr 23, 2021 · Big Data

Building and Evolving Zhihu’s Flink‑Based Data Integration Platform

This article details Zhihu’s transition from a Sqoop‑driven data integration system to a Flink‑centric platform, covering business scenarios, historical architecture, design goals, technology choices, performance optimizations, and future plans for unified streaming‑batch processing across diverse storage systems.

Batch Processing · Big Data · Data Integration
0 likes · 14 min read
Big Data Technology & Architecture
Mar 2, 2021 · Big Data

An Introduction to Kafka Connect: Architecture, Components, and Hands‑On Setup

This article introduces Kafka Connect, explaining its purpose as a scalable and reliable tool for moving data between Apache Kafka and external systems, detailing its core concepts, architecture, deployment modes, configuration files, and a step‑by‑step example that streams data from a file source to a file sink.

Data Integration · ETL · Streaming
0 likes · 12 min read
DataFunSummit
Dec 13, 2020 · Big Data

Data Services: Definition, Value, Lifecycle, Classification and Construction Guidelines

The article explains how traditional point‑to‑point data integration leads to data quality, consistency and cost issues, introduces the concept of data services as a unified, reusable way to provide data, outlines their benefits, lifecycle stages, classification into data‑set and API services, and presents Huawei’s practical construction strategy and the “Three‑1s” supply‑chain goals.

Data Integration · data services · service lifecycle
0 likes · 23 min read
Architects Research Society
Dec 13, 2020 · Backend Development

Understanding Ballerina’s Native Data Types, Parallel Processing, and Development Tools

This article introduces Ballerina’s unique language features, including native XML/JSON data types, datatable handling, inline definitions, parallel processing with workers and fork‑join, and the comprehensive development toolset such as Composer, Testerina, connectors, and editor plugins, illustrating code examples throughout.

Backend Development · Ballerina · Data Integration
0 likes · 11 min read
Big Data Technology & Architecture
Nov 28, 2020 · Big Data

ETL Fundamentals and Introduction to Kettle (Pentaho Data Integration)

This article provides an in-depth overview of ETL concepts, including extraction, transformation, loading, data warehouse architecture, and detailed discussion of Kettle (Pentaho Data Integration) features, design principles, components, transformations, jobs, database connections, metadata management, and practical examples for building robust data integration pipelines.

Data Integration · Data Warehouse · ETL
0 likes · 57 min read
DataFunTalk
Nov 17, 2020 · Artificial Intelligence

Alink: A Flink‑Based Machine Learning Platform – Overview, Features, and Quick‑Start Guide

This article introduces Alink, Alibaba's open‑source machine‑learning platform built on Flink, explains its core algorithms, performance comparison with Spark ML, version‑wise feature evolution, and provides practical quick‑start instructions for both Java (Maven) and Python (PyAlink) users, including data source handling, type conversion components, unified file‑system operations, and an overview of its FM algorithm implementation.

Alink · Batch Processing · Data Integration
0 likes · 13 min read
360 Tech Engineering
Nov 6, 2020 · Big Data

Guide to Flink SQL: Features, Scenarios, and Productization

Flink SQL, the high‑level SQL interface for Apache Flink, offers language‑independent, dependency‑free, easy‑to‑use stream processing with advanced features such as DDL, UDFs, time semantics, windowing, pattern matching, and built‑in connectors, supporting data synchronization, batch‑stream fusion, Hive integration, and various product enhancements.

Data Integration · Flink · Hive
0 likes · 11 min read
Architects Research Society
Aug 20, 2020 · Big Data

Differences Between Talend and Pentaho ETL Tools

The article explains the fundamentals of ETL, compares Talend and Pentaho in terms of openness, connectivity, support, performance, GUI usability, deployment flexibility, and cost, and concludes with guidance on choosing the appropriate tool based on specific business and technical requirements.

Comparison · Data Integration · ETL
0 likes · 7 min read
Qunar Tech Salon
Jun 3, 2020 · Fundamentals

Optimizing International Hotel Data Aggregation Algorithms at Qunar

The article outlines Qunar’s challenges in aggregating international hotel data, analyzes issues such as localized address formats and limited text similarity parsing, and presents a pattern‑matching and weighted scoring approach that improves aggregation accuracy across multiple countries.

Algorithm Optimization · Data Integration · hotel aggregation
0 likes · 7 min read
Meituan Technology Team
May 28, 2020 · Big Data

Design and Implementation of Meituan Delivery A/B Testing Platform and Evaluation System

The article details Meituan Delivery’s A/B testing platform and evaluation system, explaining its closed‑loop design, multi‑strategy traffic allocation with AA grouping, comprehensive metric hierarchy, statistical rigor, data integration, and implementation architecture, and outlines future tools for traffic‑volume recommendation.

A/B testing · Data Integration · Metrics
0 likes · 20 min read
Big Data Technology & Architecture
May 20, 2020 · Big Data

Technical Overview of Real-time Data Platform (RTDP) Architecture and Component Selection

This article presents a comprehensive technical overview of the Real-time Data Platform (RTDP), detailing its overall architecture, component selection—including DBus, Kafka, Wormhole, Moonbox, and Davinci—design philosophies, functional features, and various deployment patterns such as synchronous, stream-processing, rotation, and intelligent modes.

Data Governance · Data Integration
0 likes · 26 min read