Tagged articles

3675 articles


Dec 29, 2021 · Fundamentals

Collection of System Architecture Templates and Diagrams

This article presents a series of downloadable system architecture templates covering DMP, blockchain, data quality governance, enterprise technology, data architecture, Xelerator, alarm platform, microservices, front‑back separation, and a generic architecture, each illustrated with descriptive diagrams and brief explanations.

Big Data · Blockchain · System Architecture

0 likes · 5 min read


Tencent Cloud Developer

Dec 28, 2021 · Industry Insights

How Flink and ClickHouse Combine to Build High‑Performance Real‑Time Data Warehouses

This article analyzes the challenges of massive data query efficiency, explains how Flink's stream processing and ClickHouse's OLAP engine complement each other, and presents a layered real‑time data‑warehouse architecture with practical guidance on data ingestion, write strategies, quality assurance, and evolving batch‑stream integration patterns.

Big Data · ClickHouse · Flink

0 likes · 19 min read


DataFunSummit

Dec 28, 2021 · Artificial Intelligence

Deep Application‑Driven Construction of Medical Knowledge Graphs: Methods, Models, and Case Studies

This article presents a comprehensive overview of medical knowledge graph development, covering global and domestic progress, domain characteristics, a six‑step construction workflow—including schema design, ontology term set creation, and graph building—and showcases practical applications such as intelligent alerts, guideline recommendations, and data direct reporting.

Big Data · Data Integration · Healthcare

0 likes · 11 min read


Architecture Digest

Dec 28, 2021 · Big Data

HDFS Overview: Architecture, Features, Data Management and Storage Policies

This article provides a comprehensive overview of HDFS, covering basic file system concepts, HDFS architecture, high availability, federation, replica placement, storage policies, colocation, data integrity, and key design considerations for large‑scale distributed storage.

Big Data · Colocation · Distributed File System

0 likes · 23 min read


Big Data Technology & Architecture

Dec 28, 2021 · Big Data

Comprehensive Guide to Spark SQL: Concepts, DataSet/DataFrame, Functions, Optimization and Common Pitfalls

This article provides an in‑depth overview of Spark SQL, covering its architecture, DataSet/DataFrame creation, DSL and SQL usage, integration with Hive, custom UDF/UDAF/Aggregator implementations, handling of small files, Cartesian product detection, and a catalog of useful built‑in functions and window operations.

Big Data · Dataset · Spark SQL

0 likes · 29 min read


Su San Talks Tech

Dec 28, 2021 · Big Data

What Makes Kafka the Backbone of Real‑Time Big Data Processing?

This article provides a comprehensive overview of Apache Kafka, covering its distributed architecture, key advantages and drawbacks, the role of ZooKeeper, message delivery semantics, partitioning strategies, storage mechanisms, and performance optimizations such as zero‑copy and batch processing, all essential for high‑throughput real‑time data pipelines.

Big Data · Distributed Messaging · Streaming

0 likes · 23 min read


DataFunTalk

Dec 27, 2021 · Big Data

Comprehensive Big Data Interview Q&A: Hadoop, Spark, Kafka, Hive, and Related Technologies

This article presents a detailed interview-style walkthrough covering Hadoop cluster setup, HDFS components, MapReduce workflow, YARN advantages, Spark fundamentals, Kafka replication, Hive table types, and related big‑data concepts, providing concise explanations and practical insights for data engineers.

Big Data · Hadoop · Kafka

0 likes · 20 min read


DataFunTalk

Dec 25, 2021 · Artificial Intelligence

Optimizing Spark‑ML Linear Models with Project Matrix: Background, Progress, and Future Plans

This article introduces the Project Matrix initiative that re‑examines and restructures Spark‑ML linear models, detailing the background of Spark‑ML usage at JD, the performance‑focused optimizations such as blockification and virtual centering, and outlines upcoming work to further improve scalability and accuracy.

Big Data · Performance Optimization · Spark

0 likes · 9 min read


Big Data Technology & Architecture

Dec 24, 2021 · Big Data

Key Updates and New Features in Apache Flink 1.14.2 Release

The Apache Flink 1.14.2 release, launched on December 16, fixes a critical Log4j vulnerability, resolves OOM issues with the Pulsar connector, introduces numerous Table API, DataStream API, connector, and checkpoint enhancements, deprecates several legacy APIs, and drops support for Apache Mesos, while also promoting related PDF resources.

Apache Flink · Big Data · Checkpoints

0 likes · 8 min read


Dada Group Technology

Dec 24, 2021 · Databases

StarRocks Practice Experience and SQL Optimization Cases in JD Daojia Data Platform

This article presents JD Daojia's data platform built on StarRocks, detailing business background, challenges such as pagination inconsistency, case‑when performance, and array‑type issues, and provides concrete SQL solutions, best‑practice tips, and future optimization directions.

Big Data · Database Performance · SQL optimization

0 likes · 11 min read


AntTech

Dec 23, 2021 · Databases

Understanding Graph Computing: Fundamentals, Applications, and Future Directions

This article explains graph computing fundamentals, illustrates its use in fraud detection, search ranking, and brain modeling, highlights Ant Group's record‑breaking performance and standards efforts, and outlines future challenges such as standardization, higher performance, and integration with AI.

Artificial Intelligence · Big Data · Performance

0 likes · 13 min read


DataFunTalk

Dec 23, 2021 · Big Data

Building an Advertising Data Platform on ClickHouse: Architecture, Challenges, and Practices

This article details the design and implementation of an advertising data platform at eBay, explaining the business scenario, why ClickHouse was chosen over alternatives, the technical challenges faced, and the solutions involving lambda architecture, table engine choices, compression techniques, data ingestion pipelines, consistency guarantees, and deployment practices.

Advertising · Big Data · ClickHouse

0 likes · 26 min read


Big Data Technology & Architecture

Dec 23, 2021 · Big Data

Key Spark Configuration Parameters and Their Explanations

This article presents a comprehensive list of essential Spark configuration settings—including executor memory, off‑heap memory, memory fractions, shuffle options, and adaptive query execution parameters—each accompanied by a concise description to help users fine‑tune Spark performance.

Adaptive Query Execution · Big Data · Memory Management

0 likes · 6 min read


DataFunSummit

Dec 22, 2021 · Big Data

Data Governance Practices and Experiences at NetEase Cloud Music

This article details NetEase Cloud Music's comprehensive data governance journey, covering data warehouse architecture, data standards, event tracking (埋点) governance, asset lifecycle management, and future automation plans, illustrating how systematic governance improves data quality, cost efficiency, and business insight.

Big Data · Data Governance · data-warehouse

0 likes · 21 min read


Youzan Coder

Dec 22, 2021 · Big Data

Why DP Switched from Airflow to DolphinScheduler: A Deep Dive into Scaling the Data Platform

The article examines DP's rapid growth in daily scheduled tasks, outlines the limitations of its Airflow‑based scheduler, compares Airflow with DolphinScheduler, and details the architectural redesign, migration steps, and future plans for a more scalable, reliable big‑data workflow system.

Airflow · Big Data · Data Platform

0 likes · 13 min read


Big Data Technology & Architecture

Dec 22, 2021 · Big Data

Using Flink CDC to Capture MySQL Changes and Sink Them into ClickHouse

This article explains Change Data Capture (CDC), compares query‑based and log‑based approaches, introduces Debezium and ClickHouse, and provides step‑by‑step Flink CDC and Flink SQL CDC examples—including Java source, deserialization, sink code and required Maven dependencies—to stream MySQL binlog changes into ClickHouse for real‑time analytics.

Big Data · CDC · ClickHouse

0 likes · 14 min read


Architects Research Society

Dec 21, 2021 · Fundamentals

Next-Generation Master Data Management (MDM): Architecture, Business Value, and Technical Challenges

This article explains master data management concepts, regulatory drivers, business benefits, key technical challenges, architectural trends such as graph databases and machine learning, and highlights leading vendors, providing a comprehensive overview for enterprises seeking modern MDM solutions.

Analytics · Big Data · Data Governance

0 likes · 9 min read


DataFunTalk

Dec 21, 2021 · Artificial Intelligence

Personalized Federated Learning and AI for Drug Discovery: Challenges, Applications, and Cloud Solutions

This talk by Huawei senior engineer Xu Chi explores the challenges of drug screening, AI-driven drug discovery practices, and how personalized federated learning combined with Huawei Cloud's high‑performance computing accelerates pharmaceutical research, including case studies, platform services, and collaborative efforts.

AI · Big Data · Cloud Computing

0 likes · 11 min read


Big Data Technology & Architecture

Dec 21, 2021 · Big Data

Understanding Spark 3.0 Adaptive Query Execution (AQE) and Dynamic Partition Pruning (DPP)

This article explains the two most important Spark 3.0 features—Adaptive Query Execution and Dynamic Partition Pruning—detailing how AQE dynamically optimizes join strategies, partition coalescing, and skew handling, while DPP reduces I/O by pruning irrelevant fact‑table partitions at runtime.

Adaptive Query Execution · Big Data · Dynamic Partition Pruning

0 likes · 10 min read


HelloTech

Dec 20, 2021 · Big Data

Building an ElasticSearch-based Search Platform for Ride-Hailing: Architecture, Data Synchronization, and Performance Optimization

Hello Mobility unified its fragmented ElasticSearch clusters into a single real‑time search platform, using Kafka‑driven CDC, Flink stream processing, custom ES plugins, and extensive performance tuning to deliver scalable matching, recommendation, and voice services, ultimately raising completed orders by 49.8% and driver acceptance by 37%.

Big Data · Flink · Search Platform

0 likes · 19 min read


Architecture Digest

Dec 20, 2021 · Backend Development

Understanding Kafka: Core Design, Architecture, and Performance

This article explains Kafka’s fundamental design concepts—including topics, partitions, replicas, consumer groups, and its network architecture—while highlighting performance features such as sequential writes, zero‑copy, log segmentation, and how the controller coordinates with ZooKeeper, providing a comprehensive overview for backend developers.

Big Data · Kafka · Message Queue

0 likes · 12 min read


Big Data Technology & Architecture

Dec 19, 2021 · Big Data

Understanding Spark Catalyst and Tungsten Optimizations in Spark SQL

This article explains how Spark SQL's Catalyst optimizer performs logical and physical planning, details the Tungsten engine's data‑structure and whole‑stage code generation improvements, compares them with the Volcano iterator model, and provides code examples and PDF resources for deeper study.

Big Data · Catalyst · SQL optimization

0 likes · 12 min read


DataFunSummit

Dec 18, 2021 · Big Data

Fast OLAP Forum – Latest Practices and Innovations in Real‑Time OLAP

The Fast OLAP Forum held on December 19 at DataFunCon gathers leading experts from Baidu, Tencent, JD, and FreeWheel to share cutting‑edge techniques in vectorized execution, cloud‑native ClickHouse, large‑scale OLAP architectures, and Presto optimizations, offering deep insights for practitioners dealing with massive real‑time data workloads.

Apache Doris · Big Data · ClickHouse

0 likes · 7 min read


Big Data Technology & Architecture

Dec 18, 2021 · Big Data

Slowly Changing Dimensions (SCD) – Design Principles, Challenges, and Hive Implementation

This article explains the concept of Slowly Changing Dimensions (SCD), discusses practical design questions, compares three change‑tracking requirements, presents three implementation patterns, and provides detailed Hive/SQL examples for historical data initialization and incremental updates in large‑scale data warehouses.

Big Data · SCD · data-warehouse

0 likes · 20 min read


Xueersi Online School Tech Team

Dec 17, 2021 · Databases

An Overview of HBase: Architecture, Data Model, and Use Cases

This article provides a comprehensive overview of HBase, covering its origins, column‑family storage model, key components such as HMaster and HRegionServer, data‑location process, row‑key design strategies, practical use cases, and comparisons with relational and other NoSQL databases.

Big Data · HBase · NoSQL

0 likes · 14 min read


Taobao Frontend Technology

Dec 16, 2021 · Artificial Intelligence

How Virtual Digital Humans Are Shaping the Future of Entertainment and Tech

This article defines virtual characters, outlines their market growth and industry chain, showcases leading products and solutions, and details the technical research—including AI-driven animation, rendering pipelines, scene orchestration, and big‑data algorithms—being pursued by Alibaba's front‑end team.

AI · Big Data · Game Development

0 likes · 12 min read


Ctrip Technology

Dec 16, 2021 · Big Data

Data Standard Management Practices in Ctrip Vacation Data Governance

This article outlines Ctrip Vacation's data standard management approach, covering why standards are needed, the three‑element framework of scope, tools, and policies, and detailed practices for data integration, production change handling, metadata governance, portal dashboard standardization, and self‑service query templating.

Big Data · Data Governance · Data Integration

0 likes · 12 min read


High Availability Architecture

Dec 16, 2021 · Big Data

iQIYI Basic Data Platform: Architecture, High Availability, and Service Practices

The iQIYI Basic Data Platform unifies internal data exchange standards, integrates massive multi‑business data, and implements high‑availability solutions for ID services, messaging, HBase storage, and read‑write scaling, showcasing practical engineering approaches to big‑data reliability and performance.

Big Data · Distributed Systems · HBase

0 likes · 11 min read


政采云技术

Dec 16, 2021 · Big Data

What Is Event Tracking (埋点) and Its Implementation in a Data Analysis System

This article explains the concept of event tracking (埋点), its importance for capturing user behavior, outlines the four‑module architecture of a tracking system, compares code‑based, visual and full tracking methods, describes data models, storage, management, and presents a practical case study with analysis techniques.

Analytics · Backend · Big Data

0 likes · 15 min read


Big Data Technology & Architecture

Dec 16, 2021 · Big Data

Understanding Spark SQL Join Strategies, Catalyst Optimizer, and Tungsten for Big Data Processing

This article explains Spark SQL join classifications, the mechanics of Nested Loop Join, Sort‑Merge Join, and Hash Join, and describes how the Catalyst optimizer and Tungsten project improve query execution and memory efficiency in large‑scale data environments.

Big Data · Catalyst · JOIN

0 likes · 9 min read


Liangxu Linux

Dec 15, 2021 · Fundamentals

Cracking the 4‑Billion QQ Deduplication Challenge with 1 GB Memory

This article walks through four approaches—sorting, hashmap, file splitting, and a bitmap technique—to deduplicate 4 billion QQ numbers within a 1 GB memory limit, explains why the first three fail, and shows how a bitmap solves the problem efficiently.

Big Data · Memory Optimization · algorithm

0 likes · 8 min read


JD Cloud Developers

Dec 15, 2021 · Big Data

How JD Retail Scales Billion‑Item Selection with ClickHouse & Elasticsearch

This article details JD Retail's strategic "Nirvana" product‑selection platform, describing the technical challenges of handling billions of items and hundreds of tags, and presenting a dual‑engine solution using ClickHouse and Elasticsearch with Spark‑driven data pipelines to achieve fast filtering, multidimensional analytics, and efficient storage.

Big Data · ClickHouse · Elasticsearch

0 likes · 15 min read


Big Data Technology & Architecture

Dec 15, 2021 · Big Data

Understanding Spark DataFrames: Creation Methods, Optimizations, and Common Operations

This article explains the origins of Spark DataFrames, compares them with RDDs, describes how Spark SQL optimizes DataFrame execution, and provides detailed examples of creating DataFrames from RDDs, files, and JDBC sources along with common DataFrame operations and code snippets.

Big Data · Scala · Spark

0 likes · 10 min read


DataFunSummit

Dec 14, 2021 · Big Data

Data Map: Background, Definition, and Youzan’s Practical Implementation

This article introduces the concept of a data map, explains its background and goals, describes Youzan’s end‑to‑end data‑map practice—including full data lineage, search, management, link analysis, impact estimation, and optimization—and concludes with a summary and future outlook.

Big Data · Data Governance · Data Lineage

0 likes · 16 min read


Tencent Cloud Developer

Dec 13, 2021 · Cloud Computing

Trends in the Internet of Things and the Differentiated Development Path of Tencent Cloud IoT

Zhou Jiaxin explains that as IoT moves from universal connectivity to intelligent integration, Tencent Cloud IoT’s “Lianlian” platform tackles cost, efficiency and ecosystem gaps by embedding content, AI, big‑data and WeChat services into eight modular solutions, enabling rapid, cross‑industry smart applications.

AI · Big Data · Cloud Computing

0 likes · 13 min read


Top Architect

Dec 13, 2021 · Big Data

Design and Implementation of BanYu's Big Data Access Control System

This article describes the evolution from an unsecured data warehouse to a comprehensive big‑data access control system at BanYu, detailing the background, data access methods, design goals, authentication and authorization mechanisms, policy configuration, integration with Metabase, and the overall workflow that balances security with efficiency.

Big Data · LDAP · Presto

0 likes · 15 min read


Python Crawling & Data Mining

Dec 13, 2021 · Big Data

How to De‑duplicate 4 Billion QQ Numbers with Only 1 GB RAM

This article explains several algorithmic strategies—including sorting, hash maps, file splitting, and bitmap techniques—to remove duplicates from a file containing 4 billion QQ numbers while staying within a 1 GB memory limit, and it provides extension exercises for sorting, median, top‑K, and duplicate detection.

Big Data · Memory Optimization · algorithm

0 likes · 8 min read


Java Architect Essentials

Dec 11, 2021 · Information Security

Protecting Mobile Privacy in the Big Data Era: Risks of Data Leakage and How to Stay Safe

In today's big‑data era, excessive stress leads many to seek relief through risky online activities, but unauthorized app permissions and visits to dubious sites can expose personal information, so users must stay vigilant, limit permissions, avoid harmful sites, and use security tools to protect their mobile privacy.

Big Data · Mobile Security · data leakage

0 likes · 6 min read


IT Architects Alliance

Dec 11, 2021 · Big Data

Design and Implementation of Banyu's Big Data Permission System

This article describes the background, design goals, authentication and authorization mechanisms, system architecture, policy configuration, and Metabase integration of Banyu's big data permission system, which secures Hive, Presto, HDFS and other data access components using Apache Ranger and LDAP.

Apache Ranger · Big Data · LDAP

0 likes · 14 min read


JD Retail Technology

Dec 10, 2021 · Industry Insights

How JD Retail Cloud’s CRM Turned a Convenience Store Chain into a Data‑Driven Growth Engine

The award‑winning JD Retail Cloud store‑CRM solution helped the Haolinju convenience‑store chain overcome fragmented membership systems by rebuilding user data, applying big‑data algorithms and marketing automation, which boosted precise‑marketing ROI by 19% and increased purchase frequency by 0.7 per customer.

Big Data · CRM · Cloud Computing

0 likes · 6 min read


DataFunTalk

Dec 10, 2021 · Big Data

Building and Evolving NetEase Yanxuan Real-Time Computing Platform: Architecture, SQLization, Serviceization, and Data Governance

This article details NetEase Yanxuan's real-time computing platform development from 2017 to present, covering its architecture, Flink‑SQL development environment, service‑oriented deployment, resource optimization, cloud‑native migration, comprehensive data governance, and future plans for stream‑batch integration and intelligent job diagnostics.

Big Data · Cloud Native · Data Governance

0 likes · 14 min read


DataFunSummit

Dec 10, 2021 · Big Data

Real‑Time Platform Construction at NetEase Yanxuan: Architecture, SQL‑Based Streaming, Serviceization, and Data Governance

This article details NetEase Yanxuan's evolution of a real‑time data platform from 2017 to present, covering background, current scale, layered architecture, Flink‑SQL development IDE, service‑oriented task execution, resource‑optimizing deployment modes, cloud‑native migration, comprehensive data governance, and future batch‑stream integration plans.

Big Data · Cloud Native · Data Governance

0 likes · 15 min read


Big Data Technology & Architecture

Dec 10, 2021 · Big Data

Integrating Apache Hudi with Flink CDC for Real‑Time Data Lake Solutions

This article explains how to integrate Apache Hudi with Flink CDC to build a near‑real‑time data lake, covering Hudi’s storage model, streaming primitives, version compatibility, Maven setup, SQL table definitions, data flow from MySQL through Kafka, and practical troubleshooting tips.

Apache Hudi · Big Data · Data Integration

0 likes · 18 min read


NiuNiu MaTe

Dec 10, 2021 · Big Data

Finding Common URLs in Billions of Records: From Naive Search to Hash Divide‑and‑Conquer

This article walks through a classic interview problem of locating common URLs in two massive files, explains why a naïve O(m·n) approach fails, and presents a scalable solution using hash‑based divide‑and‑conquer and file partitioning techniques.

Big Data · divide and conquer · hash algorithm

0 likes · 6 min read


21CTO

Dec 9, 2021 · Big Data

Designing a Scalable Big Data Permission System: From Hive to Metabase

BanYu’s early data warehouse lacked any access controls, prompting the creation of a comprehensive big‑data permission system that integrates authentication and authorization across Hive, Presto, HDFS, and Metabase using LDAP, Ranger policies, workflow automation, and both synchronous and asynchronous policy initialization.

Authorization · Big Data · LDAP

0 likes · 16 min read


DataFunTalk

Dec 9, 2021 · Big Data

Mobile Cloud LakeHouse: Cloud‑Native Big Data Analytics Architecture and Practices

This article introduces the cloud‑native LakeHouse solution from China Mobile Cloud, covering its lake‑warehouse integration concept, overall architecture, core functions such as storage‑compute separation, one‑click data ingestion, intelligent metadata discovery, serverless execution, JDBC support, incremental updates, and typical application scenarios in public and private clouds.

Big Data · Cloud Native · Data Integration

0 likes · 17 min read


Architects' Tech Alliance

Dec 8, 2021 · Cloud Computing

Future Network Architecture and Emerging Data Center Technologies in the New Infrastructure Era

The article examines the concept of "new infrastructure" in China, outlines the evolution of the Internet toward a third generation, discusses candidate future network architectures such as SDN, CCN, and XIA, and reviews emerging data‑center networking technologies like large‑scale L2, VXLAN, virtual switching, and large‑scale switching, highlighting their role in supporting AI, big data, and cloud computing workloads.

AI · Big Data · SDN

0 likes · 12 min read


Alimama Tech

Dec 8, 2021 · Big Data

Marketing Channel Attribution Models and Conversion Effectiveness Evaluation

Effective marketing budget allocation relies on robust channel attribution models that combine dimensions, metrics, and segmentation with rule‑based or data‑driven (Shapley) credit assignment across defined attribution windows, enabling multi‑touch analysis, conversion‑time insights, and ROI‑focused channel performance evaluation.

Big Data · ROI · attribution model

0 likes · 16 min read


Big Data Technology & Architecture

Dec 8, 2021 · Big Data

Presto Overview, Architecture, and Query Optimization Techniques

This article introduces Presto, an open‑source MPP SQL engine, explains its coordinator‑worker architecture and connector model, and provides detailed storage, query, and join optimization strategies—including in‑memory parallelism, dynamic plan compilation, and practical SQL code examples—to achieve low‑latency, high‑performance analytics on big data.

Big Data · Presto · query optimization

0 likes · 7 min read


Open Source Linux

Dec 5, 2021 · Operations

Essential Skill Maps Every DevOps Engineer Should Master

This article compiles a series of visual skill maps covering DevOps, cloud computing, big data, security, architecture, and development practices, offering engineers a comprehensive roadmap to build and expand their technical knowledge across multiple domains.

Big Data · Cloud Computing · DevOps

0 likes · 3 min read


Big Data Technology & Architecture

Dec 4, 2021 · Big Data

Understanding Spark's BlockManager, MemoryStore, and DiskStore

This article explains Spark's storage architecture, detailing the roles and interactions of BlockManager, MemoryStore, and DiskStore, including their initialization, data management mechanisms, code implementations, and eviction strategies, to help readers grasp how Spark efficiently handles in‑memory and on‑disk data.

Big Data · BlockManager · DiskStore

0 likes · 12 min read


Python Programming Learning Circle

Dec 4, 2021 · Big Data

Visualizing US Census Data with Datashader: A Step‑by‑Step Python Guide

This tutorial demonstrates how to load a large US census Parquet dataset, convert WebMercator coordinates, filter by geographic regions, and generate high‑resolution visualizations of population density and racial distribution using Python's Datashader library with various colormaps and export options.

Big Data · Geospatial · census

0 likes · 8 min read


Open Source Linux

Dec 3, 2021 · Big Data

How Big Data Tech Evolved: Lessons from Alibaba, JD, and Didi

This article traces the evolution of big data technologies from early concepts and Google research papers through the rise of Hadoop, examines the platform transformations of Alibaba, JD.com, and Didi, and offers practical stack‑selection guidance for medium‑ and small‑scale enterprises.

Alibaba · Big Data · Didi

0 likes · 17 min read


21CTO

Dec 2, 2021 · Fundamentals

Why China Is Betting on Open‑Source to Revitalize Its Software Industry

China's Ministry of Industry and Information Technology warns that the domestic software sector lags internationally and outlines a comprehensive plan—ranging from talent cultivation and open‑source community building to big‑data expansion—to transform the industry by 2025.

Big Data · China · Open-source

0 likes · 5 min read


Big Data Technology & Architecture

Dec 1, 2021 · Big Data

Understanding Spark Shuffle: Mechanisms, Evolution, and Optimization

This article provides a comprehensive overview of Spark's shuffle process, explaining its definition, internal mechanisms such as shuffle write and read, the evolution of shuffle managers, and practical optimization techniques including parameter tuning and broadcast variables, all aimed at improving performance in large‑scale data processing.

Big Data · Shuffle · Shuffle Reader

0 likes · 18 min read

Alimama Tech

Dec 1, 2021 · Big Data

Optimization Algorithms for Guaranteed Delivery Advertising in Double‑11 Interactive Campaign

During Double‑11, the team created two specialized allocation algorithms—a brand‑score‑driven primal‑dual method for guaranteed‑downline contracts and a guarantee‑and‑balance flow‑re‑ranking approach for guaranteed‑non‑downline contracts—both using near‑line dual adjustments to meet contract volumes while boosting interaction depth, repeat visits, and browsing time.

Allocation Algorithm · Big Data · dual programming

0 likes · 14 min read

Big Data Technology & Architecture

Dec 1, 2021 · Big Data

Understanding Spark Core, RDD, and Scheduler Components: A Practical Guide

This article introduces Spark's core concepts, explains the RDD abstraction and its four main properties, and details the roles of DAGScheduler, SchedulerBackend, TaskScheduler, and ExecutorBackend, providing practical insights for beginners and intermediate users in big‑data processing.

Big Data · DAGScheduler · RDD

0 likes · 9 min read

Big Data Technology & Architecture

Nov 30, 2021 · Big Data

Curated Learning Resources for Big Data and Data Engineering

This article compiles a comprehensive list of Chinese-language articles and tutorials covering big‑data technologies such as Flink, Spark, Hive, ClickHouse, data governance, and related interview preparation resources, providing a structured learning path for aspiring data engineers.

Big Data · ClickHouse · Data Governance

0 likes · 4 min read

Alibaba Cloud Developer

Nov 30, 2021 · Big Data

Scaling Real‑Time Data Warehousing for Double‑11: Flink + Hologres in Action

During the 2021 Double‑11 shopping festival, logistics provider DiSiFang upgraded its real‑time data warehouse with Flink and Hologres, enabling multi‑billion‑row joins, cutting costs by 50%, and delivering stable, low‑latency analytics that powered high‑frequency dashboards and improved overall delivery speed.

Big Data · Cloud Computing · Flink

0 likes · 13 min read

Qunar Tech Salon

Nov 29, 2021 · Big Data

Construction and Practice of Qunar's Business Intelligence Platform

This article details the evolution, architecture, and technical choices of Qunar's BI platform—from early one‑stop reporting to a modular, self‑service system supporting real‑time analytics, multi‑metric calculations, and unified data governance—highlighting challenges, solutions, and performance benchmarks across big‑data technologies.

BI · Big Data · ClickHouse

0 likes · 23 min read

Big Data Technology & Architecture

Nov 28, 2021 · Big Data

OneData Methodology: Building a Unified Data Warehouse Architecture and Governance Framework

This article presents the OneData methodology for designing, standardizing, and governing a data warehouse, detailing background challenges, goals, industry references, core concepts, unified business and design consolidation, data modeling layers, naming conventions, data quality controls, and the resulting operational improvements and business value.

Big Data · Data Governance · Onedata

0 likes · 20 min read

Big Data Technology & Architecture

Nov 28, 2021 · Big Data

Core Techniques of ID Mapping for Data Integration in Big Data Platforms

This article explains why ID mapping is essential for breaking data silos, describes traditional address cleaning and DSP scenarios, and details a graph‑based, six‑step process that builds a unified One‑ID dictionary to enable comprehensive user profiling and analytics in big‑data environments.

Big Data · Data Integration · Elasticsearch

0 likes · 9 min read

DataFunSummit

Nov 28, 2021 · Big Data

Understanding Data Lakes: Definition, Architecture, Core Capabilities, and Comparison with Data Warehouses

The article explains what a data lake is, its architecture and core capabilities, compares it with data warehouses, discusses its value and challenges, and reviews major open‑source platforms such as Delta Lake, Iceberg, and Hudi.

Analytics · Big Data · Data Architecture

0 likes · 11 min read

Big Data Technology & Architecture

Nov 28, 2021 · Big Data

Designing Hive Data Warehouse Schemas: Fact & Dimension Tables, Partitioning, Tag Aggregation, and ID Mapping

This article explains how to design Hive data warehouse schemas, covering fact and dimension table modeling, partitioned storage strategies, tag aggregation techniques, and ID‑mapping implementations using Hive SQL and UDFs to support user profiling and analytics.

Big Data · ETL · ID-Mapping

0 likes · 15 min read

DataFunTalk

Nov 27, 2021 · Big Data

iQIYI Data Middle Platform: Architecture, Data Governance Practices, and Future Plans

The article details iQIYI’s data middle platform architecture and its comprehensive data governance practices, covering platform overview, data flow, unified standards, metadata management, production quality assurance, and future AI‑driven enhancements, illustrating how centralized data services improve reliability, efficiency, and security.

Big Data · Data Governance · Data Quality

0 likes · 27 min read

dbaplus Community

Nov 27, 2021 · Big Data

How Vipshop’s Hera Data Service Boosts Big Data Access and Performance

The article details the design, architecture, core features, scheduling logic, and performance gains of Vipshop’s self‑built Hera data service, which unifies data‑warehouse access, supports multiple engines, adapts SQL execution, and dramatically improves SLA for both B‑to‑B and B‑to‑C workloads.

Big Data · Data Service · ETL

0 likes · 22 min read

Tencent Cloud Developer

Nov 26, 2021 · Big Data

WeChat's ClickHouse Real‑Time Data Warehouse: Challenges, Co‑Construction, and Performance Gains

Facing Hadoop’s minute‑to‑hour query latency on petabyte‑scale data, WeChat partnered with Tencent Cloud to build a ClickHouse‑based real‑time warehouse, adding custom ingestion, query‑optimisation and management tools that deliver billion‑row throughput, sub‑5‑second queries and over ten‑fold performance gains across millions of daily queries.

Big Data · ClickHouse · Cloud Native

0 likes · 9 min read

NiuNiu MaTe

Nov 26, 2021 · Big Data

How to Deduplicate 4 Billion QQ Numbers Using Only 1 GB of Memory

This article walks through four practical techniques—sorting, hashmap, file splitting, and bitmap—to remove duplicate QQ numbers from a 4‑billion‑record file within a 1 GB memory limit, and provides extended exercises for sorting, median, top‑K, and duplicate detection.

Big Data · algorithm · bitmap

0 likes · 8 min read

StarRocks

Nov 24, 2021 · Big Data

Building a Scalable OLAP Platform at SF Express: StarRocks Evaluation and Lessons

SF Express’s data engineering team details how they migrated from a mixed‑component OLAP stack to a unified StarRocks platform, describing the evaluation criteria, performance‑critical design choices, import and query optimizations, and future roadmap for a high‑availability, low‑cost big‑data analytics solution.

Big Data · OLAP · Performance Tuning

0 likes · 14 min read

Qunar Tech Salon

Nov 24, 2021 · Databases

Comprehensive Guide to Elasticsearch Index Design, Settings, and Mapping

This article provides a detailed guide on Elasticsearch index design, covering index settings, shard and replica planning, mapping strategies, complex types, lifecycle management, template usage, and practical best‑practice recommendations for large‑scale log data clusters.

Big Data · Elasticsearch · Mapping

0 likes · 27 min read

DataFunTalk

Nov 24, 2021 · Big Data

Tencent Game Big Data Analysis Engine: Architecture, Practices, and Future Plans

This article presents Tencent's game big‑data analysis platform, detailing its background, the architecture of the iData engine—including offline multi‑dimensional analysis (TGMars), online portrait analysis (TGFace), and real‑time multi‑dimensional analysis (TGDruid)—application scenarios, performance insights, and future ecosystem and open‑source plans.

Big Data · Game Analytics · OLAP

0 likes · 15 min read

DataFunTalk

Nov 23, 2021 · Big Data

ClickHouse Practice at Youzan: Architecture, Deployment, and Future Plans

This article details Youzan's adoption of ClickHouse for real-time analytics, covering its evolution from Presto, Druid, and Kylin, the system's architecture, deployment strategies, use cases, performance characteristics, limitations, and future roadmap, including integration with Apache Doris and emerging big‑data trends.

Big Data · ClickHouse · OLAP

0 likes · 23 min read

Big Data Technology & Architecture

Nov 22, 2021 · Big Data

Comprehensive Big Data Learning Path and Resource Guide

This article presents a detailed learning roadmap for aspiring big‑data experts, covering foundational programming languages, data structures, Linux basics, databases, distributed system theory, and essential frameworks such as Hadoop, Spark, Flink, and Kafka, and providing curated Bilibili video links and reference materials.

Big Data · Flink · Hadoop

0 likes · 9 min read

DataFunTalk

Nov 20, 2021 · Big Data

How to Build a Big Data Platform from Zero to One: Architecture, Components, and Best Practices

This article provides a comprehensive guide to designing and implementing a big‑data platform, covering architecture overview, data ingestion with Flume, storage on HDFS/Hive/HBase, processing engines such as Hive, Spark and Flink, scheduling solutions like Azkaban and Airflow, and the construction of self‑service analytics systems.

Big Data · ETL · Hadoop

0 likes · 29 min read

Big Data Technology & Architecture

Nov 20, 2021 · Big Data

Comprehensive Overview of Apache Flink Concepts, Mechanisms, and Interview Questions

This article provides an extensive technical guide to Apache Flink, covering its exactly‑once consumption guarantees, checkpoint and two‑phase commit mechanisms, differences from Spark, state backends, watermark handling, time semantics, window joins, CEP, backpressure, architecture layers, deployment, resource management, and common operational issues.

Big Data · CEP · Checkpoint

0 likes · 77 min read

Tongcheng Travel Technology Center

Nov 19, 2021 · Big Data

Real‑Time Data Warehouse Practices with Apache Kudu: Architecture, Partitioning, and Platformization

This article reviews the challenges of building a real‑time data warehouse, compares Lambda and Kappa architectures, introduces Apache Kudu’s master‑tablet design, storage model and partition strategies, and shares practical experiences and future directions for a Kudu‑based streaming analytics platform.

Apache Kudu · Big Data · Kappa architecture

0 likes · 8 min read

Shopee Tech Team

Nov 18, 2021 · Industry Insights

How Shopee Automates Southeast Asian Last‑Mile Sorting with AI and Big Data

This article analyzes the inefficiencies of Southeast Asian last‑mile logistics and explains Shopee's AI‑driven, data‑centric solution that builds a trusted address library, uses offline training and online inference, and adopts AOI‑based matching to automate parcel sorting and driver assignment.

AI · Big Data · Logistics

0 likes · 17 min read

Big Data Technology Architecture

Nov 16, 2021 · Databases

ByteHouse: ClickHouse Enterprise Edition Case Studies and Optimizations at ByteDance

ByteDance’s ByteHouse, a ClickHouse enterprise edition, showcases large‑scale real‑time analytics through two detailed case studies—recommendation system metrics and ad‑delivery data—detailing technical selection, challenges, multi‑threaded Kafka Engine, async indexing, buffer engine enhancements, and the resulting performance gains.

Big Data · ByteHouse · ClickHouse

0 likes · 10 min read

DevOps

Nov 16, 2021 · Operations

Key Strategies and Recommendations for Successful Enterprise Digital Transformation

The article outlines how enterprises can assess digital transformation outcomes, formulate effective strategies, build large‑scale capabilities, foster agile culture, and continuously monitor progress, drawing on McKinsey research and real‑world examples to guide traditional firms toward sustainable digital growth.

Big Data · Digital Transformation · Operations

0 likes · 17 min read

Big Data Technology Architecture

Nov 13, 2021 · Big Data

Case Study: Migrating Baicaowei's On‑Premise Hadoop Data Platform to Alibaba Cloud Native Data Lake

This article details Baicaowei's migration from an IDC‑hosted Hadoop cluster to a cloud‑native data lake on Alibaba Cloud, outlining the business drivers, pain points of the legacy platform, architectural goals, design principles, solution selection, implementation steps, and future outlook for the new big‑data ecosystem.

Alibaba Cloud · Big Data · Delta Lake

0 likes · 16 min read

Big Data Technology & Architecture

Nov 11, 2021 · Big Data

Reflections on Java Backend and Big Data Career Paths

The author shares personal insights and advice on working in Java backend and real‑time big‑data platforms, discussing common doubts, the value of continuous learning, and how early career choices can shape long‑term professional growth.

Big Data · backend-development · java

0 likes · 6 min read

Big Data Technology & Architecture

Nov 9, 2021 · Fundamentals

Eight Key Aspects of Digital Transformation – Summary of Ma Xiaodong’s “Digital Transformation Methodology”

This article presents a concise PPT‑style summary of Ma Xiaodong’s book “Digital Transformation Methodology”, outlining eight essential topics—why, when, what, whether, who, how, tools, and case studies of digital transformation—along with numerous illustrative slides and links to related big‑data resources.

Big Data · Case Studies · Digital Transformation

0 likes · 5 min read

21CTO

Nov 8, 2021 · Big Data

How Baidu iFanFan Built a Real-Time Big Data Platform: Challenges & Lessons

Facing rapid business iteration, Baidu’s iFanFan data team designed a unified real‑time and offline big‑data platform, tackling business, technical, and organizational challenges through Lambda/Kappa architectures, data integration, storage, computation, governance, and scalable analytics to deliver timely, accurate, and valuable data products.

Big Data · Data Architecture · Real-time Processing

0 likes · 33 min read

DataFunSummit

Nov 8, 2021 · Big Data

Building JD's OLAP System: From Data Ingestion to Management and Future Plans

This article explains how JD.com designs and evolves its OLAP platform, covering data sources, ingestion, storage, real‑time and offline processing, key challenges such as timeliness, high throughput, consistency, and the solutions implemented to support massive e‑commerce analytics.

Big Data · Distributed Systems · JD.com

0 likes · 13 min read

Big Data Technology & Architecture

Nov 8, 2021 · Big Data

Understanding Flink CDC 2.0: Core Design, Snapshot & Incremental Reading, and Code Walkthrough

This article introduces Flink CDC 2.0, explains its distributed full‑load and incremental reading mechanisms, details the slice partitioning, snapshot correction, and binlog handling logic, and provides a complete Java example that demonstrates how to configure Flink SQL, MySQL source, and Kafka sink.

Big Data · CDC · Data Integration

0 likes · 29 min read

Big Data Technology & Architecture

Nov 7, 2021 · Databases

Understanding Secondary Indexes and Coprocessor Solutions in HBase

This article explains the concept of secondary indexes in HBase, describes how coprocessors (including observers and endpoints) enable server‑side processing, compares coprocessor‑based solutions such as Apache Phoenix with non‑coprocessor approaches using Elasticsearch or Solr, and outlines their advantages and trade‑offs.

Big Data · Coprocessor · HBase

0 likes · 11 min read

Python Crawling & Data Mining

Nov 7, 2021 · Big Data

Mastering Data Mining: A Deep Dive into CRISP‑DM and SEMMA Methodologies

This article explains the two most common data‑mining frameworks—CRISP‑DM and SEMMA—detailing their six and five stages respectively, illustrating each phase with diagrams and highlighting how the iterative nature of data mining drives continuous improvement.

Analytics · Big Data · CRISP-DM

0 likes · 8 min read

Full-Stack Internet Architecture

Nov 6, 2021 · Big Data

Why User Profiling Projects Fail: Common Pitfalls and Deep Causes

The article analyzes why user profiling initiatives frequently collapse, highlighting surface mistakes such as confusing past behavior with future predictions, mixing behavior with motivation, and mistaking correlation for causation, while also exposing deeper issues like unrealistic business expectations, over‑reliance on static tags, and insufficient predictive modeling and causal analysis.

Big Data · Business Intelligence · Predictive Modeling

0 likes · 9 min read

DataFunTalk

Nov 6, 2021 · Big Data

Evolution and Practices of OLAP at Vipshop: Presto, ClickHouse, and Kylin

This article details Vipshop's OLAP evolution, covering the deployment, optimization, and containerization of Presto, ClickHouse, and Kylin, the challenges faced, self‑developed tooling, and future directions for intelligent scaling and resource management.

Big Data · ClickHouse · Flink

0 likes · 27 min read

Tongcheng Travel Technology Center

Nov 2, 2021 · Big Data

Hadoop Cluster Cross-Data Center Migration Practice at Tongcheng Travel

This article details Tongcheng Travel’s month‑long, zero‑downtime migration of hundreds of petabytes of Hadoop HDFS and YARN clusters across data centers, describing the background, migration strategies, lessons learned, tool enhancements, and future plans to improve data locality, balance, and monitoring.

Big Data · Cluster Migration · Data Center

0 likes · 16 min read

Architecture Digest

Nov 2, 2021 · Databases

Comparative Analysis of MySQL and HBase: Architecture, Engine, and Use Cases

This article compares MySQL and HBase across architecture, storage engine, indexing structures (B+ tree vs LSM tree), data access features, and ecosystem integration, highlighting each system's strengths, limitations, and the scenarios where HBase is a suitable complement to MySQL for large‑scale data workloads.

B+Tree · Big Data · HBase

0 likes · 9 min read

Big Data Technology & Architecture

Nov 2, 2021 · Big Data

Understanding Kafka: From Message Engine to Distributed Stream Processing Platform

This article explains Kafka's evolution—highlighting the introduction of Kafka Streams, the shift to a full distributed stream processing platform, practical learning paths, source‑code reading tips, common pitfalls, and the major new features introduced in Kafka 3.0.

Big Data · Distributed Systems · Kafka

0 likes · 7 min read

21CTO

Nov 1, 2021 · Big Data

Essential Data Engineering Roadmap: Skills, Tools, and Technologies to Master

This guide outlines the fast‑growing data engineering career path, covering essential Linux fundamentals, programming languages, testing, database concepts, data warehouses, processing frameworks, messaging systems, cluster computing, workflow scheduling, monitoring, infrastructure as code, and CI/CD tools.

Big Data · data engineering · data pipelines

0 likes · 5 min read

DataFunTalk

Oct 30, 2021 · Big Data

Product Practice of Data Governance Tools at NetEase: Review, Pain Points, Strategy, and Future Planning

The presentation at the DataFun Summit detailed NetEase's data‑governance tool practice, reviewing past initiatives, current challenges, comprehensive product strategies, and future roadmap to improve compute and storage efficiency, cost quantification, and systematic governance across business lines.

Big Data · Data Governance · Data Lifecycle

0 likes · 13 min read

Kuaishou Big Data

Oct 28, 2021 · Big Data

How Kuaishou Cut Object Storage Costs by 50% with LRC Erasure Coding

Kuaishou reduced half of its massive object storage expenses by redesigning its architecture to use HBase indexing, HDFS large‑file storage, MemoryCache, and a cross‑IDC LRC erasure‑coding warm layer that maintains disaster‑recovery while dynamically moving data from hot to warm to cold tiers.

Big Data · Kuaishou · LRC

0 likes · 12 min read

DataFunTalk

Oct 27, 2021 · Big Data

Data Value System and Cockpit Construction: A Case Study from CITIC Bank

This article explains how CITIC Bank's software development center built a data value system and management cockpit, detailing business objectives, overall architecture, digital management methodology, implementation steps, and real‑world usage to support the bank's digital transformation.

Big Data · Data Governance · Digital Transformation

0 likes · 16 min read

dbaplus Community

Oct 26, 2021 · Databases

Scaling JD.com Customer Service with Doris OLAP: Architecture & Caching

JD.com’s customer service team leverages the open‑source MPP database Doris to power real‑time and offline OLAP dashboards, detailing data ingestion pipelines, full‑link monitoring, dual‑stream high‑availability design, dynamic partition management, multi‑layer caching strategies, and performance optimizations applied during the 2020 11.11 shopping festival.

Big Data · Monitoring · OLAP

0 likes · 15 min read

DataFunSummit

Oct 26, 2021 · Big Data

Data Value System and Cockpit Construction: A Case Study from CITIC Bank

This article presents a comprehensive overview of CITIC Bank's data value system and cockpit construction, detailing business objectives, overall planning, digital management framework, methodology, implementation cases, and current usage, illustrating how data-driven analytics support the bank's digital transformation.

Big Data · Data Cockpit · Data Governance

0 likes · 17 min read

High Availability Architecture

Oct 25, 2021 · Big Data

iQIYI Data Governance Practices: Event Tracking (Pingback) Governance and Application

The article details iQIYI's comprehensive data governance initiative for event tracking (Pingback), covering definitions, timing, quality requirements, governance challenges, standardized specifications, coordinate management, testing and gray‑release processes, upgrade workflows, and data security measures that together reduced event volume by 40% and cut resource consumption in half.

Analytics · Big Data · Data Governance

0 likes · 16 min read

DataFunSummit

Oct 25, 2021 · Big Data

Building a Multi-Dimensional Analysis System: Practice at Baixin Bank

This talk by Baixin Bank's BI leader outlines the bank's business model, multi-dimensional data analysis requirements, and the design of a laddered analysis solution, including indicator and analysis system construction, user‑product‑enterprise scenario modeling, and productization of data insights for operational decision‑making.

Big Data · Business Intelligence · Multi-dimensional Analysis

0 likes · 20 min read
