Tagged articles
3675 articles
IT Architects Alliance
Dec 29, 2021 · Fundamentals

Collection of System Architecture Templates and Diagrams

This article presents a series of downloadable system architecture templates covering DMP, blockchain, data quality governance, enterprise technology, data architecture, Xelerator, alarm platforms, microservices, front-end/back-end separation, and a generic architecture, each illustrated with diagrams and brief explanations.

Big Data · Blockchain · System Architecture
5 min read
Tencent Cloud Developer
Dec 28, 2021 · Industry Insights

How Flink and ClickHouse Combine to Build High‑Performance Real‑Time Data Warehouses

This article analyzes the challenges of massive data query efficiency, explains how Flink's stream processing and ClickHouse's OLAP engine complement each other, and presents a layered real‑time data‑warehouse architecture with practical guidance on data ingestion, write strategies, quality assurance, and evolving batch‑stream integration patterns.

Big Data · ClickHouse · Flink
19 min read
DataFunSummit
Dec 28, 2021 · Artificial Intelligence

Deep Application‑Driven Construction of Medical Knowledge Graphs: Methods, Models, and Case Studies

This article presents a comprehensive overview of medical knowledge graph development, covering progress in China and abroad, domain characteristics, and a six-step construction workflow (including schema design, ontology term set creation, and graph building), and showcases practical applications such as intelligent alerts, guideline recommendations, and direct data reporting.

Big Data · Data Integration · Healthcare
11 min read
Big Data Technology & Architecture
Dec 28, 2021 · Big Data

Comprehensive Guide to Spark SQL: Concepts, DataSet/DataFrame, Functions, Optimization and Common Pitfalls

This article provides an in‑depth overview of Spark SQL, covering its architecture, DataSet/DataFrame creation, DSL and SQL usage, integration with Hive, custom UDF/UDAF/Aggregator implementations, handling of small files, Cartesian product detection, and a catalog of useful built‑in functions and window operations.

Big Data · Dataset · Spark SQL
29 min read
Su San Talks Tech
Dec 28, 2021 · Big Data

What Makes Kafka the Backbone of Real‑Time Big Data Processing?

This article provides a comprehensive overview of Apache Kafka, covering its distributed architecture, key advantages and drawbacks, the role of ZooKeeper, message delivery semantics, partitioning strategies, storage mechanisms, and performance optimizations such as zero‑copy and batch processing, all essential for high‑throughput real‑time data pipelines.

Big Data · Distributed Messaging · Streaming
23 min read
DataFunTalk
Dec 25, 2021 · Artificial Intelligence

Optimizing Spark‑ML Linear Models with Project Matrix: Background, Progress, and Future Plans

This article introduces the Project Matrix initiative that re‑examines and restructures Spark‑ML linear models, detailing the background of Spark‑ML usage at JD, the performance‑focused optimizations such as blockification and virtual centering, and outlines upcoming work to further improve scalability and accuracy.

Big Data · Performance Optimization · Spark
9 min read
Big Data Technology & Architecture
Dec 24, 2021 · Big Data

Key Updates and New Features in Apache Flink 1.14.2 Release

The Apache Flink 1.14.2 release, launched on December 16, fixes a critical Log4j vulnerability, resolves OOM issues with the Pulsar connector, introduces numerous Table API, DataStream API, connector, and checkpoint enhancements, deprecates several legacy APIs, and drops support for Apache Mesos, while also promoting related PDF resources.

Apache Flink · Big Data · Checkpoints
8 min read
AntTech
Dec 23, 2021 · Databases

Understanding Graph Computing: Fundamentals, Applications, and Future Directions

This article explains graph computing fundamentals, illustrates its use in fraud detection, search ranking, and brain modeling, highlights Ant Group's record‑breaking performance and standards efforts, and outlines future challenges such as standardization, higher performance, and integration with AI.

Artificial Intelligence · Big Data · Performance
13 min read
DataFunTalk
Dec 23, 2021 · Big Data

Building an Advertising Data Platform on ClickHouse: Architecture, Challenges, and Practices

This article details the design and implementation of an advertising data platform at eBay, explaining the business scenario, why ClickHouse was chosen over alternatives, the technical challenges faced, and the solutions involving lambda architecture, table engine choices, compression techniques, data ingestion pipelines, consistency guarantees, and deployment practices.

Advertising · Big Data · ClickHouse
26 min read
Big Data Technology & Architecture
Dec 23, 2021 · Big Data

Key Spark Configuration Parameters and Their Explanations

This article presents a comprehensive list of essential Spark configuration settings—including executor memory, off‑heap memory, memory fractions, shuffle options, and adaptive query execution parameters—each accompanied by a concise description to help users fine‑tune Spark performance.

Adaptive Query Execution · Big Data · Memory Management
6 min read
DataFunSummit
Dec 22, 2021 · Big Data

Data Governance Practices and Experiences at NetEase Cloud Music

This article details NetEase Cloud Music's comprehensive data governance journey, covering data warehouse architecture, data standards, event tracking (埋点) governance, asset lifecycle management, and future automation plans, illustrating how systematic governance improves data quality, cost efficiency, and business insight.

Big Data · Data Governance · data-warehouse
21 min read
Big Data Technology & Architecture
Dec 22, 2021 · Big Data

Using Flink CDC to Capture MySQL Changes and Sink Them into ClickHouse

This article explains Change Data Capture (CDC), compares query‑based and log‑based approaches, introduces Debezium and ClickHouse, and provides step‑by‑step Flink CDC and Flink SQL CDC examples—including Java source, deserialization, sink code and required Maven dependencies—to stream MySQL binlog changes into ClickHouse for real‑time analytics.

Big Data · CDC · ClickHouse
14 min read
Architects Research Society
Dec 21, 2021 · Fundamentals

Next-Generation Master Data Management (MDM): Architecture, Business Value, and Technical Challenges

This article explains master data management concepts, regulatory drivers, business benefits, key technical challenges, architectural trends such as graph databases and machine learning, and highlights leading vendors, providing a comprehensive overview for enterprises seeking modern MDM solutions.

Analytics · Big Data · Data Governance
9 min read
DataFunTalk
Dec 21, 2021 · Artificial Intelligence

Personalized Federated Learning and AI for Drug Discovery: Challenges, Applications, and Cloud Solutions

This talk by Huawei senior engineer Xu Chi explores the challenges of drug screening, AI-driven drug discovery practices, and how personalized federated learning combined with Huawei Cloud's high‑performance computing accelerates pharmaceutical research, including case studies, platform services, and collaborative efforts.

AI · Big Data · Cloud Computing
11 min read
Big Data Technology & Architecture
Dec 21, 2021 · Big Data

Understanding Spark 3.0 Adaptive Query Execution (AQE) and Dynamic Partition Pruning (DPP)

This article explains the two most important Spark 3.0 features—Adaptive Query Execution and Dynamic Partition Pruning—detailing how AQE dynamically optimizes join strategies, partition coalescing, and skew handling, while DPP reduces I/O by pruning irrelevant fact‑table partitions at runtime.

Adaptive Query Execution · Big Data · Dynamic Partition Pruning
10 min read
HelloTech
Dec 20, 2021 · Big Data

Building an Elasticsearch-based Search Platform for Ride-Hailing: Architecture, Data Synchronization, and Performance Optimization

Hello Mobility unified its fragmented Elasticsearch clusters into a single real-time search platform built on Kafka-driven CDC, Flink stream processing, custom ES plugins, and extensive performance tuning. The platform delivers scalable matching, recommendation, and voice services, and ultimately raised completed orders by 49.8% and driver acceptance by 37%.

Big Data · Flink · Search Platform
19 min read
Architecture Digest
Dec 20, 2021 · Backend Development

Understanding Kafka: Core Design, Architecture, and Performance

This article explains Kafka’s fundamental design concepts—including topics, partitions, replicas, consumer groups, and its network architecture—while highlighting performance features such as sequential writes, zero‑copy, log segmentation, and how the controller coordinates with ZooKeeper, providing a comprehensive overview for backend developers.

Big Data · Kafka · Message Queue
12 min read
DataFunSummit
Dec 18, 2021 · Big Data

Fast OLAP Forum – Latest Practices and Innovations in Real‑Time OLAP

The Fast OLAP Forum, to be held on December 19 at DataFunCon, brings together experts from Baidu, Tencent, JD, and FreeWheel to share techniques in vectorized execution, cloud-native ClickHouse, large-scale OLAP architectures, and Presto optimization, aimed at practitioners who handle massive real-time data workloads.

Apache Doris · Big Data · ClickHouse
7 min read
Big Data Technology & Architecture
Dec 18, 2021 · Big Data

Slowly Changing Dimensions (SCD) – Design Principles, Challenges, and Hive Implementation

This article explains the concept of Slowly Changing Dimensions (SCD), discusses practical design questions, compares three change‑tracking requirements, presents three implementation patterns, and provides detailed Hive/SQL examples for historical data initialization and incremental updates in large‑scale data warehouses.

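The article's Hive/SQL code is not reproduced here. As a minimal sketch of the core step in the Type 2 pattern (one common SCD implementation, assumed here for illustration), the Python below closes the current row when a tracked attribute changes and opens a new version. The names `DimRow` and `scd2_merge`, the `9999-12-31` open-end sentinel, and the half-open `[start, end)` validity interval are all hypothetical choices; some warehouses instead set `end` to the day before the load date.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical sentinel marking the open-ended current version.
OPEN_END = date(9999, 12, 31)


@dataclass(frozen=True)
class DimRow:
    key: str                 # business (natural) key
    attrs: Tuple[str, ...]   # tracked attributes
    start: date              # first day this version is valid
    end: date = OPEN_END     # day this version was superseded (exclusive)

    @property
    def is_current(self) -> bool:
        return self.end == OPEN_END


def scd2_merge(dim: List[DimRow], incoming: Dict[str, Tuple[str, ...]],
               load_date: date) -> List[DimRow]:
    """Apply one day's snapshot of {key: attrs} to a Type 2 dimension."""
    current = {r.key: r for r in dim if r.is_current}
    # Keys that are new or whose tracked attributes changed.
    changed = {k: a for k, a in incoming.items()
               if k not in current or current[k].attrs != a}
    out = []
    for row in dim:
        if row.is_current and row.key in changed:
            out.append(replace(row, end=load_date))  # close the old version
        else:
            out.append(row)                          # history is never rewritten
    for key, attrs in changed.items():
        out.append(DimRow(key, attrs, start=load_date))  # open a new version
    return out
```

Re-applying an unchanged snapshot returns the table as-is, which mirrors the idempotence an incremental Hive job usually needs.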
Big Data · SCD · data-warehouse
20 min read
Taobao Frontend Technology
Dec 16, 2021 · Artificial Intelligence

How Virtual Digital Humans Are Shaping the Future of Entertainment and Tech

This article defines virtual characters, outlines their market growth and industry chain, showcases leading products and solutions, and details the technical research—including AI-driven animation, rendering pipelines, scene orchestration, and big‑data algorithms—being pursued by Alibaba's front‑end team.

AI · Big Data · Game Development
12 min read
Ctrip Technology
Dec 16, 2021 · Big Data

Data Standard Management Practices in Ctrip Vacation Data Governance

This article outlines Ctrip Vacation's data standard management approach, covering why standards are needed, the three‑element framework of scope, tools, and policies, and detailed practices for data integration, production change handling, metadata governance, portal dashboard standardization, and self‑service query templating.

Big Data · Data Governance · Data Integration
12 min read
High Availability Architecture
Dec 16, 2021 · Big Data

iQIYI Basic Data Platform: Architecture, High Availability, and Service Practices

The iQIYI Basic Data Platform unifies internal data exchange standards, integrates massive multi‑business data, and implements high‑availability solutions for ID services, messaging, HBase storage, and read‑write scaling, showcasing practical engineering approaches to big‑data reliability and performance.

Big Data · Distributed Systems · HBase
11 min read
政采云技术
Dec 16, 2021 · Big Data

What Is Event Tracking (埋点) and Its Implementation in a Data Analysis System

This article explains the concept of event tracking (埋点), its importance for capturing user behavior, outlines the four‑module architecture of a tracking system, compares code‑based, visual and full tracking methods, describes data models, storage, management, and presents a practical case study with analysis techniques.

Analytics · Backend · Big Data
15 min read
Liangxu Linux
Dec 15, 2021 · Fundamentals

Cracking the 4‑Billion QQ Deduplication Challenge with 1 GB Memory

This article walks through four approaches—sorting, hashmap, file splitting, and a bitmap technique—to deduplicate 4 billion QQ numbers within a 1 GB memory limit, explains why the first three fail, and shows how a bitmap solves the problem efficiently.

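A minimal sketch of the bitmap technique, assuming every QQ number fits in an unsigned 32-bit integer (the summary does not state the range). One bit per possible value needs 2^32 bits = 512 MiB, which fits inside the 1 GB budget. The names `Bitmap` and `dedup` are hypothetical, and pure Python is far too slow for 4 billion records, so treat this as an illustration of the memory layout rather than a production tool.

```python
from typing import Iterable, Iterator


class Bitmap:
    """One bit per integer in [0, size): a set bit means the value was seen."""

    def __init__(self, size: int = 1 << 32):
        # 2**32 bits = 512 MiB, covering every unsigned 32-bit value.
        self.size = size
        self.bits = bytearray((size + 7) // 8)

    def test_and_set(self, n: int) -> bool:
        """Mark n as seen; return True if it had already been marked."""
        if not 0 <= n < self.size:
            raise ValueError(f"{n} is outside [0, {self.size})")
        index, mask = n >> 3, 1 << (n & 7)  # byte position and bit within it
        seen = bool(self.bits[index] & mask)
        self.bits[index] |= mask
        return seen


def dedup(numbers: Iterable[int], size: int = 1 << 32) -> Iterator[int]:
    """Yield each number the first time it appears, preserving input order."""
    bitmap = Bitmap(size)
    for n in numbers:
        if not bitmap.test_and_set(n):
            yield n
```

For example, `list(dedup([5, 3, 5, 9, 3], size=16))` returns `[5, 3, 9]`. The `size` parameter exists only so the sketch can be tried on a small range; the default allocates the full 512 MiB.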
Big Data · Memory Optimization · algorithm
8 min read
JD Cloud Developers
Dec 15, 2021 · Big Data

How JD Retail Scales Billion‑Item Selection with ClickHouse & Elasticsearch

This article details JD Retail's strategic "Nirvana" product‑selection platform, describing the technical challenges of handling billions of items and hundreds of tags, and presenting a dual‑engine solution using ClickHouse and Elasticsearch with Spark‑driven data pipelines to achieve fast filtering, multidimensional analytics, and efficient storage.

Big Data · ClickHouse · Elasticsearch
15 min read
DataFunSummit
Dec 14, 2021 · Big Data

Data Map: Background, Definition, and Youzan’s Practical Implementation

This article introduces the concept of a data map, explains its background and goals, describes Youzan’s end‑to‑end data‑map practice—including full data lineage, search, management, link analysis, impact estimation, and optimization—and concludes with a summary and future outlook.

Big Data · Data Governance · Data Lineage
16 min read
Tencent Cloud Developer
Dec 13, 2021 · Cloud Computing

Trends in the Internet of Things and the Differentiated Development Path of Tencent Cloud IoT

Zhou Jiaxin explains that as IoT moves from universal connectivity to intelligent integration, Tencent Cloud IoT’s “Lianlian” platform tackles cost, efficiency and ecosystem gaps by embedding content, AI, big‑data and WeChat services into eight modular solutions, enabling rapid, cross‑industry smart applications.

AI · Big Data · Cloud Computing
13 min read
Top Architect
Dec 13, 2021 · Big Data

Design and Implementation of BanYu's Big Data Access Control System

This article describes the evolution from an unsecured data warehouse to a comprehensive big‑data access control system at BanYu, detailing the background, data access methods, design goals, authentication and authorization mechanisms, policy configuration, integration with Metabase, and the overall workflow that balances security with efficiency.

Big Data · LDAP · Presto
15 min read
Python Crawling & Data Mining
Dec 13, 2021 · Big Data

How to De‑duplicate 4 Billion QQ Numbers with Only 1 GB RAM

This article explains several algorithmic strategies—including sorting, hash maps, file splitting, and bitmap techniques—to remove duplicates from a file containing 4 billion QQ numbers while staying within a 1 GB memory limit, and it provides extension exercises for sorting, median, top‑K, and duplicate detection.

Big Data · Memory Optimization · algorithm
8 min read
Java Architect Essentials
Dec 11, 2021 · Information Security

Protecting Mobile Privacy in the Big Data Era: Risks of Data Leakage and How to Stay Safe

In today's big-data era, stress drives many people to seek relief through risky online activities, yet granting apps unnecessary permissions or visiting dubious sites can expose personal information. Users should stay vigilant, limit app permissions, avoid harmful sites, and use security tools to protect their mobile privacy.

Big Data · Mobile Security · data leakage
6 min read
IT Architects Alliance
Dec 11, 2021 · Big Data

Design and Implementation of Banyu's Big Data Permission System

This article describes the background, design goals, authentication and authorization mechanisms, system architecture, policy configuration, and Metabase integration of Banyu's big data permission system, which secures Hive, Presto, HDFS and other data access components using Apache Ranger and LDAP.

Apache Ranger · Big Data · LDAP
14 min read
JD Retail Technology
Dec 10, 2021 · Industry Insights

How JD Retail Cloud’s CRM Turned a Convenience Store Chain into a Data‑Driven Growth Engine

The award‑winning JD Retail Cloud store‑CRM solution helped the Haolinju convenience‑store chain overcome fragmented membership systems by rebuilding user data, applying big‑data algorithms and marketing automation, which boosted precise‑marketing ROI by 19% and increased purchase frequency by 0.7 per customer.

Big Data · CRM · Cloud Computing
6 min read
DataFunTalk
Dec 10, 2021 · Big Data

Building and Evolving NetEase Yanxuan Real-Time Computing Platform: Architecture, SQLization, Serviceization, and Data Governance

This article details NetEase Yanxuan's real-time computing platform development from 2017 to present, covering its architecture, Flink‑SQL development environment, service‑oriented deployment, resource optimization, cloud‑native migration, comprehensive data governance, and future plans for stream‑batch integration and intelligent job diagnostics.

Big Data · Cloud Native · Data Governance
14 min read
DataFunSummit
Dec 10, 2021 · Big Data

Real‑Time Platform Construction at NetEase Yanxuan: Architecture, SQL‑Based Streaming, Serviceization, and Data Governance

This article details NetEase Yanxuan's evolution of a real‑time data platform from 2017 to present, covering background, current scale, layered architecture, Flink‑SQL development IDE, service‑oriented task execution, resource‑optimizing deployment modes, cloud‑native migration, comprehensive data governance, and future batch‑stream integration plans.

Big Data · Cloud Native · Data Governance
15 min read
21CTO
Dec 9, 2021 · Big Data

Designing a Scalable Big Data Permission System: From Hive to Metabase

BanYu’s early data warehouse lacked any access controls, prompting the creation of a comprehensive big‑data permission system that integrates authentication and authorization across Hive, Presto, HDFS, and Metabase using LDAP, Ranger policies, workflow automation, and both synchronous and asynchronous policy initialization.

Authorization · Big Data · LDAP
16 min read
DataFunTalk
Dec 9, 2021 · Big Data

Mobile Cloud LakeHouse: Cloud‑Native Big Data Analytics Architecture and Practices

This article introduces the cloud‑native LakeHouse solution from China Mobile Cloud, covering its lake‑warehouse integration concept, overall architecture, core functions such as storage‑compute separation, one‑click data ingestion, intelligent metadata discovery, serverless execution, JDBC support, incremental updates, and typical application scenarios in public and private clouds.

Big Data · Cloud Native · Data Integration
17 min read
Architects' Tech Alliance
Dec 8, 2021 · Cloud Computing

Future Network Architecture and Emerging Data Center Technologies in the New Infrastructure Era

The article examines the concept of "new infrastructure" in China, outlines the evolution of the Internet toward a third generation, discusses candidate future network architectures such as SDN, CCN, and XIA, and reviews emerging data‑center networking technologies like large‑scale L2, VXLAN, virtual switching, and large‑scale switching, highlighting their role in supporting AI, big data, and cloud computing workloads.

AI · Big Data · SDN
12 min read
Alimama Tech
Dec 8, 2021 · Big Data

Marketing Channel Attribution Models and Conversion Effectiveness Evaluation

Effective marketing budget allocation relies on robust channel attribution models that combine dimensions, metrics, and segmentation with rule‑based or data‑driven (Shapley) credit assignment across defined attribution windows, enabling multi‑touch analysis, conversion‑time insights, and ROI‑focused channel performance evaluation.

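A minimal sketch of Shapley-value credit assignment as the summary names it: each channel receives its marginal contribution averaged over all coalitions of the other channels, weighted by |S|!(n-|S|-1)!/n!. The function name and the conversion counts in the example are hypothetical, and the article's data-driven model may estimate coalition values differently. Exact computation enumerates 2^(n-1) subsets per channel, so it suits only a handful of channels.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_attribution(channels: Sequence[str],
                        value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley credit per channel, given a coalition value function."""
    n = len(channels)
    credit = {}
    for c in channels:
        others = [x for x in channels if x != c]
        total = 0.0
        for k in range(n):  # size of the coalition S that excludes c
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of c when added to coalition S.
                total += weight * (value(s | {c}) - value(s))
        credit[c] = total
    return credit
```

With hypothetical conversions `{} → 0`, `{search} → 10`, `{social} → 6`, and `{search, social} → 20`, the function credits search with 12.0 and social with 8.0. The credits sum to the 20 conversions of the full channel set, which is the efficiency property that makes Shapley values attractive for multi-touch attribution.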
Big Data · ROI · attribution model
16 min read
Big Data Technology & Architecture
Dec 8, 2021 · Big Data

Presto Overview, Architecture, and Query Optimization Techniques

This article introduces Presto, an open‑source MPP SQL engine, explains its coordinator‑worker architecture and connector model, and provides detailed storage, query, and join optimization strategies—including in‑memory parallelism, dynamic plan compilation, and practical SQL code examples—to achieve low‑latency, high‑performance analytics on big data.

Big Data · Presto · query optimization
7 min read
Open Source Linux
Dec 5, 2021 · Operations

Essential Skill Maps Every DevOps Engineer Should Master

This article compiles a series of visual skill maps covering DevOps, cloud computing, big data, security, architecture, and development practices, offering engineers a comprehensive roadmap to build and expand their technical knowledge across multiple domains.

Big Data · Cloud Computing · DevOps
3 min read
Big Data Technology & Architecture
Dec 4, 2021 · Big Data

Understanding Spark's BlockManager, MemoryStore, and DiskStore

This article explains Spark's storage architecture, detailing the roles and interactions of BlockManager, MemoryStore, and DiskStore, including their initialization, data management mechanisms, code implementations, and eviction strategies, to help readers grasp how Spark efficiently handles in‑memory and on‑disk data.

Big Data · BlockManager · DiskStore
12 min read
Open Source Linux
Dec 3, 2021 · Big Data

How Big Data Tech Evolved: Lessons from Alibaba, JD, and Didi

This article traces the evolution of big data technologies from early concepts and Google research papers through the rise of Hadoop, examines the platform transformations of Alibaba, JD.com, and Didi, and offers practical stack‑selection guidance for medium‑ and small‑scale enterprises.

Alibaba · Big Data · Didi
17 min read
21CTO
Dec 2, 2021 · Fundamentals

Why China Is Betting on Open‑Source to Revitalize Its Software Industry

China's Ministry of Industry and Information Technology warns that the domestic software sector lags internationally and outlines a comprehensive plan—ranging from talent cultivation and open‑source community building to big‑data expansion—to transform the industry by 2025.

Big Data · China · Open-source
5 min read
Big Data Technology & Architecture
Dec 1, 2021 · Big Data

Understanding Spark Shuffle: Mechanisms, Evolution, and Optimization

This article provides a comprehensive overview of Spark's shuffle process, explaining its definition, internal mechanisms such as shuffle write and read, the evolution of shuffle managers, and practical optimization techniques including parameter tuning and broadcast variables, all aimed at improving performance in large‑scale data processing.

Big Data · Shuffle · Shuffle Reader
18 min read
Alimama Tech
Dec 1, 2021 · Big Data

Optimization Algorithms for Guaranteed Delivery Advertising in Double‑11 Interactive Campaign

During Double‑11, the team created two specialized allocation algorithms—a brand‑score‑driven primal‑dual method for guaranteed‑downline contracts and a guarantee‑and‑balance flow‑re‑ranking approach for guaranteed‑non‑downline contracts—both using near‑line dual adjustments to meet contract volumes while boosting interaction depth, repeat visits, and browsing time.

Allocation Algorithm · Big Data · dual programming
14 min read
Alibaba Cloud Developer
Nov 30, 2021 · Big Data

Scaling Real‑Time Data Warehousing for Double‑11: Flink + Hologres in Action

During the 2021 Double‑11 shopping festival, logistics provider DiSiFang upgraded its real‑time data warehouse with Flink and Hologres, enabling multi‑billion‑row joins, cutting costs by 50%, and delivering stable, low‑latency analytics that powered high‑frequency dashboards and improved overall delivery speed.

Big Data · Cloud Computing · Flink
13 min read
Qunar Tech Salon
Nov 29, 2021 · Big Data

Construction and Practice of Qunar's Business Intelligence Platform

This article details the evolution, architecture, and technical choices of Qunar's BI platform—from early one‑stop reporting to a modular, self‑service system supporting real‑time analytics, multi‑metric calculations, and unified data governance—highlighting challenges, solutions, and performance benchmarks across big‑data technologies.

BI · Big Data · ClickHouse
23 min read
Big Data Technology & Architecture
Nov 28, 2021 · Big Data

OneData Methodology: Building a Unified Data Warehouse Architecture and Governance Framework

This article presents the OneData methodology for designing, standardizing, and governing a data warehouse, detailing background challenges, goals, industry references, core concepts, unified business and design consolidation, data modeling layers, naming conventions, data quality controls, and the resulting operational improvements and business value.

Big Data · Data Governance · Onedata
20 min read
DataFunTalk
Nov 27, 2021 · Big Data

iQIYI Data Middle Platform: Architecture, Data Governance Practices, and Future Plans

The article details iQIYI’s data middle platform architecture and its comprehensive data governance practices, covering platform overview, data flow, unified standards, metadata management, production quality assurance, and future AI‑driven enhancements, illustrating how centralized data services improve reliability, efficiency, and security.

Big Data · Data Governance · Data Quality
27 min read
dbaplus Community
Nov 27, 2021 · Big Data

How Vipshop’s Hera Data Service Boosts Big Data Access and Performance

The article details the design, architecture, core features, scheduling logic, and performance gains of Vipshop’s self‑built Hera data service, which unifies data‑warehouse access, supports multiple engines, adapts SQL execution, and dramatically improves SLA for both B‑to‑B and B‑to‑C workloads.

Big Data · Data Service · ETL
22 min read
Tencent Cloud Developer
Nov 26, 2021 · Big Data

WeChat's ClickHouse Real‑Time Data Warehouse: Challenges, Co‑Construction, and Performance Gains

Facing Hadoop’s minute‑to‑hour query latency on petabyte‑scale data, WeChat partnered with Tencent Cloud to build a ClickHouse‑based real‑time warehouse, adding custom ingestion, query‑optimisation and management tools that deliver billion‑row throughput, sub‑5‑second queries and over ten‑fold performance gains across millions of daily queries.

Big Data · ClickHouse · Cloud Native
9 min read
NiuNiu MaTe
Nov 26, 2021 · Big Data

How to Deduplicate 4 Billion QQ Numbers Using Only 1 GB of Memory

This article walks through four practical techniques—sorting, hashmap, file splitting, and bitmap—to remove duplicate QQ numbers from a 4‑billion‑record file within a 1 GB memory limit, and provides extended exercises for sorting, median, top‑K, and duplicate detection.

Big Data · algorithm · bitmap
8 min read
StarRocks
Nov 24, 2021 · Big Data

Building a Scalable OLAP Platform at SF Express: StarRocks Evaluation and Lessons

SF Express’s data engineering team details how they migrated from a mixed‑component OLAP stack to a unified StarRocks platform, describing the evaluation criteria, performance‑critical design choices, import and query optimizations, and future roadmap for a high‑availability, low‑cost big‑data analytics solution.

Big Data · OLAP · Performance Tuning
0 likes · 14 min read
Qunar Tech Salon
Nov 24, 2021 · Databases

Comprehensive Guide to Elasticsearch Index Design, Settings, and Mapping

This article provides a detailed guide to Elasticsearch index design, covering index settings, shard and replica planning, mapping strategies, complex types, lifecycle management, template usage, and best-practice recommendations for large-scale log clusters.

Big Data · Elasticsearch · Mapping
0 likes · 27 min read
DataFunTalk
Nov 24, 2021 · Big Data

Tencent Game Big Data Analysis Engine: Architecture, Practices, and Future Plans

This article presents Tencent's game big-data analysis platform, detailing its background; the architecture of the iData engine, which comprises offline multi-dimensional analysis (TGMars), online user-profile analysis (TGFace), and real-time multi-dimensional analysis (TGDruid); application scenarios; performance insights; and future ecosystem and open-source plans.

Big Data · Game Analytics · OLAP
0 likes · 15 min read
DataFunTalk
Nov 23, 2021 · Big Data

ClickHouse Practice at Youzan: Architecture, Deployment, and Future Plans

This article details Youzan's adoption of ClickHouse for real-time analytics, covering how its OLAP stack evolved from Presto, Druid, and Kylin; the system's architecture, deployment strategies, use cases, performance characteristics, and limitations; and the future roadmap, including integration with Apache Doris and emerging big-data trends.

Big Data · ClickHouse · OLAP
0 likes · 23 min read
Big Data Technology & Architecture
Nov 22, 2021 · Big Data

Comprehensive Big Data Learning Path and Resource Guide

This article presents a detailed learning roadmap for aspiring big-data experts, covering foundational programming languages, data structures, Linux basics, databases, distributed systems theory, and essential frameworks such as Hadoop, Spark, Flink, and Kafka, and provides curated Bilibili video links and reference materials.

Big Data · Flink · Hadoop
0 likes · 9 min read
DataFunTalk
Nov 20, 2021 · Big Data

How to Build a Big Data Platform from Zero to One: Architecture, Components, and Best Practices

This article provides a comprehensive guide to designing and implementing a big‑data platform, covering architecture overview, data ingestion with Flume, storage on HDFS/Hive/HBase, processing engines such as Hive, Spark and Flink, scheduling solutions like Azkaban and Airflow, and the construction of self‑service analytics systems.

Big Data · ETL · Hadoop
0 likes · 29 min read
Big Data Technology & Architecture
Nov 20, 2021 · Big Data

Comprehensive Overview of Apache Flink Concepts, Mechanisms, and Interview Questions

This article provides an extensive technical guide to Apache Flink, covering its exactly‑once consumption guarantees, checkpoint and two‑phase commit mechanisms, differences from Spark, state backends, watermark handling, time semantics, window joins, CEP, backpressure, architecture layers, deployment, resource management, and common operational issues.

Big Data · CEP · Checkpoint
0 likes · 77 min read
Tongcheng Travel Technology Center
Nov 19, 2021 · Big Data

Real‑Time Data Warehouse Practices with Apache Kudu: Architecture, Partitioning, and Platformization

This article reviews the challenges of building a real‑time data warehouse, compares Lambda and Kappa architectures, introduces Apache Kudu’s master‑tablet design, storage model and partition strategies, and shares practical experiences and future directions for a Kudu‑based streaming analytics platform.

Apache Kudu · Big Data · Kappa architecture
0 likes · 8 min read
Shopee Tech Team
Nov 18, 2021 · Industry Insights

How Shopee Automates Southeast Asian Last‑Mile Sorting with AI and Big Data

This article analyzes the inefficiencies of last-mile logistics in Southeast Asia and explains Shopee's AI-driven, data-centric solution, which builds a trusted address library, combines offline training with online inference, and uses AOI (area of interest) matching to automate parcel sorting and driver assignment.

AI · Big Data · Logistics
0 likes · 17 min read
Big Data Technology Architecture
Nov 16, 2021 · Databases

ByteHouse: ClickHouse Enterprise Edition Case Studies and Optimizations at ByteDance

ByteDance’s ByteHouse, an enterprise edition of ClickHouse, is presented through two case studies of large-scale real-time analytics (recommendation-system metrics and ad-delivery data), covering technology selection, the challenges encountered, a multi-threaded Kafka engine, asynchronous indexing, Buffer engine enhancements, and the resulting performance gains.

Big Data · ByteHouse · ClickHouse
0 likes · 10 min read
DevOps
Nov 16, 2021 · Operations

Key Strategies and Recommendations for Successful Enterprise Digital Transformation

The article outlines how enterprises can assess digital transformation outcomes, formulate effective strategies, build capabilities at scale, foster an agile culture, and continuously monitor progress, drawing on McKinsey research and real-world examples to guide traditional firms toward sustainable digital growth.

Big Data · Digital Transformation · Operations
0 likes · 17 min read
Big Data Technology Architecture
Nov 13, 2021 · Big Data

Case Study: Migrating Baicaowei's On‑Premise Hadoop Data Platform to Alibaba Cloud Native Data Lake

This article details Baicaowei's migration from an IDC‑hosted Hadoop cluster to a cloud‑native data lake on Alibaba Cloud, outlining the business drivers, pain points of the legacy platform, architectural goals, design principles, solution selection, implementation steps, and future outlook for the new big‑data ecosystem.

Alibaba Cloud · Big Data · Delta Lake
0 likes · 16 min read
Big Data Technology & Architecture
Nov 9, 2021 · Fundamentals

Eight Key Aspects of Digital Transformation – Summary of Ma Xiaodong’s “Digital Transformation Methodology”

This article presents a concise slide-style summary of Ma Xiaodong’s book “Digital Transformation Methodology”, outlining eight essential topics of digital transformation (why, when, what, whether, who, how, tools, and case studies), along with numerous illustrative slides and links to related big-data resources.

Big Data · Case Studies · Digital Transformation
0 likes · 5 min read
21CTO
Nov 8, 2021 · Big Data

How Baidu iFanFan Built a Real-Time Big Data Platform: Challenges & Lessons

Facing rapid business iteration, Baidu’s iFanFan data team designed a unified real‑time and offline big‑data platform, tackling business, technical, and organizational challenges through Lambda/Kappa architectures, data integration, storage, computation, governance, and scalable analytics to deliver timely, accurate, and valuable data products.

Big Data · Data Architecture · Real-time Processing
0 likes · 33 min read
DataFunSummit
Nov 8, 2021 · Big Data

Building JD's OLAP System: From Data Ingestion to Management and Future Plans

This article explains how JD.com designs and evolves its OLAP platform, covering data sources, ingestion, storage, and real-time and offline processing, as well as key challenges (timeliness, high throughput, and consistency) and the solutions implemented to support e-commerce analytics at massive scale.

Big Data · Distributed Systems · JD.com
0 likes · 13 min read
Big Data Technology & Architecture
Nov 8, 2021 · Big Data

Understanding Flink CDC 2.0: Core Design, Snapshot & Incremental Reading, and Code Walkthrough

This article introduces Flink CDC 2.0, explains its distributed full-snapshot and incremental reading mechanisms, details the chunk splitting, snapshot correction, and binlog handling logic, and provides a complete Java example showing how to configure Flink SQL with a MySQL source and a Kafka sink.

Big Data · CDC · Data Integration
0 likes · 29 min read
Big Data Technology & Architecture
Nov 7, 2021 · Databases

Understanding Secondary Indexes and Coprocessor Solutions in HBase

This article explains the concept of secondary indexes in HBase, describes how coprocessors (including observers and endpoints) enable server‑side processing, compares coprocessor‑based solutions such as Apache Phoenix with non‑coprocessor approaches using Elasticsearch or Solr, and outlines their advantages and trade‑offs.

Big Data · Coprocessor · HBase
0 likes · 11 min read
Full-Stack Internet Architecture
Nov 6, 2021 · Big Data

Why User Profiling Projects Fail: Common Pitfalls and Deep Causes

The article analyzes why user profiling initiatives frequently collapse, highlighting surface mistakes such as confusing past behavior with future predictions, mixing behavior with motivation, and mistaking correlation for causation, while also exposing deeper issues like unrealistic business expectations, over‑reliance on static tags, and insufficient predictive modeling and causal analysis.

Big Data · Business Intelligence · Predictive Modeling
0 likes · 9 min read
21CTO
Nov 1, 2021 · Big Data

Essential Data Engineering Roadmap: Skills, Tools, and Technologies to Master

This guide outlines the fast-growing data engineering career path, covering Linux fundamentals, programming languages, testing, database concepts, data warehouses, processing frameworks, messaging systems, cluster computing, workflow scheduling, monitoring, infrastructure as code, and CI/CD tools.

Big Data · data engineering · data pipelines
0 likes · 5 min read
DataFunTalk
Oct 30, 2021 · Big Data

Product Practice of Data Governance Tools at NetEase: Review, Pain Points, Strategy, and Future Planning

This DataFun Summit talk details NetEase's practice in building data-governance tools, reviewing past initiatives, current pain points, the overall product strategy, and the future roadmap for improving compute and storage efficiency, quantifying costs, and governing data systematically across business lines.

Big Data · Data Governance · Data Lifecycle
0 likes · 13 min read
Kuaishou Big Data
Oct 28, 2021 · Big Data

How Kuaishou Cut Object Storage Costs by 50% with LRC Erasure Coding

Kuaishou cut its object storage costs in half by redesigning the architecture around HBase indexing, large-file storage on HDFS, MemoryCache, and a cross-IDC warm tier built on LRC erasure coding, which preserves disaster-recovery capability while data moves dynamically from hot to warm to cold tiers.

Big Data · Kuaishou · LRC
0 likes · 12 min read
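The summary credits part of the savings to an LRC erasure-coding warm tier. What distinguishes LRC from plain Reed-Solomon coding is the local parity group: a single lost block is rebuilt from the few blocks in its group rather than from the whole stripe. The sketch below illustrates only that property, under stated assumptions: local parities are plain XOR (a simplification; production LRC codes may use Galois-field coefficients), the global parities that LRC also carries are omitted, and the group size and helper names are hypothetical rather than Kuaishou's design.

```python
from functools import reduce
from operator import xor
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def encode_local_parities(data: List[bytes], group_size: int) -> List[bytes]:
    """Split data blocks into consecutive local groups; compute one XOR parity per group.

    Hypothetical simplification: real LRC also stores global parities, omitted here.
    """
    return [xor_blocks(data[i:i + group_size]) for i in range(0, len(data), group_size)]


def recover_block(data: List[Optional[bytes]], parities: List[bytes], group_size: int) -> List[bytes]:
    """Rebuild one missing data block (marked None) using only its local group."""
    missing = [i for i, block in enumerate(data) if block is None]
    if len(missing) != 1:
        raise ValueError("this sketch repairs exactly one missing block")
    lost = missing[0]
    group = lost // group_size
    start = group * group_size
    end = min(start + group_size, len(data))
    # At most group_size blocks are read (the surviving group members plus
    # the local parity), instead of every data block in the stripe.
    survivors = [data[i] for i in range(start, end) if i != lost]
    repaired = list(data)
    repaired[lost] = xor_blocks(survivors + [parities[group]])
    return repaired
```

On the cost side, an LRC with k data blocks, l local parities, and r global parities stores (k + l + r) / k bytes per byte of data. With 12 data blocks, 2 local parities, and 2 global parities (the configuration published for Windows Azure Storage), that is 16 / 12 ≈ 1.33×, compared with 3× for triple replication; Kuaishou's actual parameters are not given in this summary.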
DataFunTalk
Oct 27, 2021 · Big Data

Data Value System and Cockpit Construction: A Case Study from CITIC Bank

This article explains how CITIC Bank's software development center built a data value system and management cockpit, detailing business objectives, overall architecture, digital management methodology, implementation steps, and real‑world usage to support the bank's digital transformation.

Big Data · Data Governance · Digital Transformation
0 likes · 16 min read
dbaplus Community
Oct 26, 2021 · Databases

Scaling JD.com Customer Service with Doris OLAP: Architecture & Caching

JD.com’s customer service team uses the open-source MPP database Doris to power real-time and offline OLAP dashboards. The article details data ingestion pipelines, full-link monitoring, a dual-stream high-availability design, dynamic partition management, multi-layer caching strategies, and the performance optimizations applied during the 2020 Singles' Day (11.11) shopping festival.

Big Data · Monitoring · OLAP
0 likes · 15 min read
DataFunSummit
Oct 26, 2021 · Big Data

Data Value System and Cockpit Construction: A Case Study from CITIC Bank

This article presents a comprehensive overview of CITIC Bank's data value system and cockpit construction, detailing business objectives, overall planning, digital management framework, methodology, implementation cases, and current usage, illustrating how data-driven analytics support the bank's digital transformation.

Big Data · Data Cockpit · Data Governance
0 likes · 17 min read
High Availability Architecture
Oct 25, 2021 · Big Data

iQIYI Data Governance Practices: Event Tracking (Pingback) Governance and Application

The article details iQIYI's comprehensive data governance initiative for event tracking (Pingback), covering definitions, timing, quality requirements, governance challenges, standardized specifications, coordinate management, testing and gray‑release processes, upgrade workflows, and data security measures that together reduced event volume by 40% and cut resource consumption in half.

Analytics · Big Data · Data Governance
0 likes · 16 min read
DataFunSummit
Oct 25, 2021 · Big Data

Building a Multi-Dimensional Analysis System: Practice at Baixin Bank

This talk by Baixin Bank's BI leader outlines the bank's business model and its multi-dimensional data analysis requirements, then describes the design of a tiered analysis solution: building metric and analysis systems, modeling user, product, and enterprise scenarios, and productizing data insights to support operational decision-making.

Big Data · Business Intelligence · Multi-dimensional Analysis
0 likes · 20 min read