Tagged articles
3675 articles
DataFunSummit
Aug 2, 2023 · Big Data

Loop Detection in Risk Control: Challenges, Distributed Graph Computing Optimizations, and ArcNeural Engine Case Studies

This article discusses the challenges of loop detection in financial risk control, presents distributed graph computing optimization techniques—including pruning, multi‑graph handling, and memory‑efficient algorithms—shows experimental results, and shares real‑world ArcNeural engine case studies and future directions.

ArcNeural · Big Data · Loop Detection
13 min read
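The core query behind loop detection can be sketched on a single machine as a depth‑limited DFS that enumerates simple directed cycles through one account. This is a minimal illustration under stated assumptions, not the distributed ArcNeural implementation: the adjacency‑list graph, the account names, and the `max_len` bound are hypothetical, and the length bound stands in for the kind of pruning the article describes.

```python
from typing import Dict, List


def find_loops(graph: Dict[str, List[str]], start: str, max_len: int) -> List[List[str]]:
    """Return simple directed cycles that start and end at `start`
    and contain at most `max_len` edges.

    `graph` is a hypothetical in-memory transfer graph: account -> accounts it paid.
    """
    loops: List[List[str]] = []
    path = [start]
    on_path = {start}

    def dfs(node: str) -> None:
        for nxt in graph.get(node, []):
            if nxt == start:
                # An edge back to the start node closes a loop.
                loops.append(path + [start])
            elif nxt not in on_path and len(path) < max_len:
                # Prune: extend only while there is room for the closing edge.
                path.append(nxt)
                on_path.add(nxt)
                dfs(nxt)
                path.pop()
                on_path.remove(nxt)

    dfs(start)
    return loops
```

Cycle enumeration grows exponentially with path length in dense graphs, which is why a length bound (and, at production scale, pruning plus distributed execution) is essential.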
HomeTech
Aug 2, 2023 · Artificial Intelligence

Push Precision Recommendation System: Overview, Iteration, and Design

This article presents a comprehensive overview of the push precision recommendation system, detailing its data processing pipeline, machine‑learning‑driven algorithms, modular architecture—including offline, near‑real‑time, and push layers—and subsequent system iterations, optimizations, visual monitoring platforms, and future development directions.

Architecture · Big Data · machine learning
11 min read
FunTester
Aug 1, 2023 · Big Data

Rethinking Big Data Testing: Defining Problem Domains and Key Test Areas

The article explores how to approach testing for big data platforms and applications by first defining problem domains, categorizing concrete user‑oriented questions, and then mapping them to focused test areas such as data extraction, real‑time updates, algorithm verification, and response timeliness.

Big Data · Quality assurance · application
7 min read
Alibaba Cloud Developer
Jul 31, 2023 · Big Data

From BI to Kappa: How Data Architecture Evolved in the Big Data Era

This article traces the evolution of data architecture from early BI systems through traditional big‑data stacks, streaming, Lambda and Kappa designs, and explains how a unified stream‑batch model simplifies development while keeping logic consistent across data‑analysis and pipeline applications.

BI systems · Big Data · Data Architecture
16 min read
DataFunSummit
Jul 28, 2023 · Big Data

User Path Analysis and SessionAnalytics: Business Practices, Technical Architecture, and Open‑Source Framework

This article introduces user path analysis and the SessionAnalytics open‑source framework, covering business scenarios, data processing techniques, algorithmic mining methods, technical architecture, implementation details, comparisons with event‑based analysis, and a comprehensive Q&A for practical deployment.

Big Data · NLP · data engineering
19 min read
Top Architect
Jul 27, 2023 · Big Data

Performance Comparison of Elasticsearch and ClickHouse for Log Search

This article compares Elasticsearch and ClickHouse as log‑search solutions, detailing their architectures, Docker‑compose deployments, data‑ingestion pipelines with Vector, query syntax differences, and benchmark results that show ClickHouse generally outperforms Elasticsearch in speed and aggregation efficiency.

Big Data · ClickHouse · Docker
13 min read
vivo Internet Technology
Jul 26, 2023 · Big Data

Understanding HBase Compaction: Principles, Process, Throttling Strategies, and Optimization Cases

This article explains HBase compaction: its minor and major types, trigger mechanisms, file‑selection policies such as RatioBased and Exploring, throttling controls based on file count, and practical tuning of key parameters to avoid latency spikes, illustrated by real‑world production cases.

Big Data · HBase · compaction
36 min read
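The idea behind ratio‑based file selection can be shown with a simplified sketch. Assumptions: store files are listed oldest to newest, `ratio` defaults to 1.2 and `min_files` to 3 (HBase's defaults for `hbase.hstore.compaction.ratio` and `hbase.hstore.compaction.min`), and the real policy's extra limits (minimum and maximum compact size, maximum files per compaction, off‑peak ratio) are omitted. This is an illustration, not HBase's code.

```python
from typing import List


def select_by_ratio(sizes: List[int], ratio: float = 1.2, min_files: int = 3) -> List[int]:
    """Simplified ratio-based selection over store-file sizes ordered
    oldest -> newest. Returns the indices of the files to compact."""
    start = 0
    while start < len(sizes):
        newer_total = sum(sizes[start + 1:])
        # Skip a file that is too large relative to everything newer than it.
        if sizes[start] > ratio * newer_total:
            start += 1
        else:
            break
    selected = list(range(start, len(sizes)))
    # Only compact when enough files qualify.
    return selected if len(selected) >= min_files else []
```

In `[100, 20, 10, 8, 5]` the oldest 100‑unit file is skipped because rewriting it would cost far more I/O than the small files it would absorb; this write‑amplification trade‑off is what the tuning parameters control.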
DataFunTalk
Jul 25, 2023 · Databases

Building an Integrated Metric Data Service Platform with Apache Doris: Architecture Evolution and Millisecond‑Level Query Performance

This article describes how OneConnect (Financial One Account), Ping An's financial technology services arm, migrated from a Hadoop‑Presto‑Kylin stack to an Apache Doris‑based data platform, detailing the architectural evolution, OLAP engine selection, metric system design, performance optimizations, and future roadmap for real‑time analytics.

Apache Doris · Big Data · OLAP
15 min read
Architect's Guide
Jul 24, 2023 · Big Data

Using Bitmap and Bloom Filter to De‑duplicate 4 Billion IDs Within 1 GB of Memory

The article explains how to store and de‑duplicate 4 billion unsigned integers using a bitmap to reduce memory from 14.9 GB to under 500 MB, introduces the concept and benefits of bitmaps, describes Bloom filters, their principles, advantages, limitations, typical use cases, and provides Java and Redis implementation examples.

Big Data · Bitmap · Data Structures
10 min read
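The memory figures in the summary follow from simple arithmetic: 4 billion 4‑byte integers take about 14.9 GiB, while one bit per value below 4 billion takes about 477 MiB (covering the full unsigned 32‑bit range would need 2^32 bits, i.e. 512 MiB). Below is a minimal Python sketch of bitmap de‑duplication; the class and function names are illustrative. A Bloom filter is the usual alternative when keys are not small integers, at the cost of occasional false positives.

```python
from typing import Iterable, List

# Memory estimate from the summary (1 GiB = 2**30 bytes, 1 MiB = 2**20 bytes).
RAW_GIB = 4_000_000_000 * 4 / 2**30    # about 14.9 GiB stored as 32-bit integers
BITMAP_MIB = 4_000_000_000 / 8 / 2**20  # about 476.8 MiB at one bit per value


class Bitmap:
    """One bit per possible ID in [0, size); bit n is set once ID n is seen."""

    def __init__(self, size: int) -> None:
        self.bits = bytearray((size + 7) // 8)

    def add(self, n: int) -> bool:
        """Mark n as seen and return True if it had not been seen before."""
        index, mask = n >> 3, 1 << (n & 7)
        if self.bits[index] & mask:
            return False
        self.bits[index] |= mask
        return True


def dedupe(ids: Iterable[int], size: int) -> List[int]:
    """Keep the first occurrence of each ID, preserving input order."""
    seen = Bitmap(size)
    return [n for n in ids if seen.add(n)]
```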
21CTO
Jul 17, 2023 · Big Data

How WeChat Cut Query Latency from Seconds to 100 ms with Druid Optimizations

This case study explains how the WeChat multi‑dimensional monitoring platform identified performance bottlenecks in its Druid‑based data layer, analyzed user query patterns, and applied sub‑query splitting, Redis caching, and segment size reductions to achieve over 85% cache‑hit rates and bring average query latency down to around 100 ms.

Big Data · Druid · caching
13 min read
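Sub‑query splitting pays off because finished time slices never change and can be cached, while only the current slice has to reach the backend. A minimal sketch under assumptions: day granularity, a plain dict standing in for Redis, and a hypothetical `run_query` callback in place of a real Druid client.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, int]  # (metric, day)


def query_with_cache(
    metric: str,
    start_day: int,
    end_day: int,
    today: int,
    run_query: Callable[[str, int], float],  # hypothetical backend call
    cache: Dict[Key, float],                 # stand-in for Redis
) -> List[float]:
    """Split a [start_day, end_day) query into per-day sub-queries and
    serve already-finished days from the cache."""
    results = []
    for day in range(start_day, end_day):
        key = (metric, day)
        if key in cache:
            results.append(cache[key])  # cache hit: skip the backend
            continue
        value = run_query(metric, day)  # cache miss: query the backend
        if day < today:
            # Only finished days are immutable, so only they are cached.
            cache[key] = value
        results.append(value)
    return results
```

Overlapping dashboard queries (for example, "last 7 days" refreshed every minute) then reuse almost all of their sub‑results, which is one plausible route to a high cache‑hit rate.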
ITPUB
Jul 16, 2023 · Big Data

How WeChat Reduced Query Latency from 1000ms to 100ms in Its Multi‑Dimensional Monitoring Platform

This article explains how the WeChat multi‑dimensional monitoring platform, which processes billions of data points daily, identified performance bottlenecks in its Druid‑based data layer and applied sub‑query splitting, Redis caching, and sub‑dimension tables to achieve over 85% cache hit rate and bring average query time down to around 100 ms.

Big Data · Druid · Performance Optimization
13 min read
Top Architect
Jul 14, 2023 · Big Data

Lambda Architecture: Real-Time Big Data Processing and Practical Use Cases

This article introduces the Lambda Architecture for billion‑scale real‑time data analysis, explains its three layers—Batch, Speed, and Serving—covers its flexibility, fault tolerance, and scalability, and demonstrates concrete applications such as Twitter hashtag analysis and a smart‑parking recommendation system.

Batch Layer · Big Data · Lambda architecture
11 min read
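The three layers can be sketched with the hashtag‑counting example. This is a toy, single‑process illustration under assumptions (integer timestamps, a fixed batch cutoff, in‑memory counters); in a real deployment the batch layer would run on a system such as Hadoop or Spark and the speed layer on a stream processor.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[int, str]  # (timestamp, hashtag)


def batch_view(master: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: recompute counts from all events up to the cutoff."""
    return Counter(tag for ts, tag in master if ts <= cutoff)


class SpeedLayer:
    """Speed layer: incrementally count events newer than the batch cutoff."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff
        self.counts: Counter = Counter()

    def on_event(self, ts: int, tag: str) -> None:
        if ts > self.cutoff:
            self.counts[tag] += 1


def serve(batch: Counter, speed: SpeedLayer, tag: str) -> int:
    """Serving layer: merge the precomputed batch view with recent counts."""
    return batch[tag] + speed.counts[tag]
```

When the next batch run moves the cutoff forward, the speed layer's counts for the newly covered period are discarded, so any approximation in the speed layer is eventually replaced by batch results.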
DataFunSummit
Jul 11, 2023 · Big Data

Tencent's Autonomous Big Data Platform: Data‑Driven Governance and AI‑Powered Optimization

Tencent’s big data platform introduces a data‑plus‑algorithm driven autonomous solution that automates self‑diagnosis, self‑optimization, and self‑management for trillion‑scale analytics, addressing challenges of massive task governance, resource efficiency, and stability through observable data foundations, pluggable decision engines, and generalized AI decision intelligence.

AI decision · Autonomous Platform · Big Data
17 min read
DataFunTalk
Jul 11, 2023 · Big Data

Analysis of Lakehouse Storage Systems: Design, Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Apache Hudi, and Apache Iceberg

This article examines the architecture and core design of lakehouse storage systems, compares the metadata handling and Merge‑On‑Read mechanisms of Delta Lake, Apache Hudi, and Apache Iceberg, and presents practical performance‑optimization techniques and real‑world case studies on Alibaba Cloud EMR.

Apache Hudi · Apache Iceberg · Big Data
18 min read
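Merge‑On‑Read keeps the base data file immutable and records changes in a log that readers merge on the fly. A minimal sketch of that read path, assuming rows keyed by an integer primary key and a log of hypothetical `("upsert" | "delete", key, row)` entries; Delta Lake, Hudi, and Iceberg each use their own file layouts and delete representations.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
# Hypothetical log entry: (op, key, row), where op is "upsert" or "delete".
LogEntry = Tuple[str, int, Optional[Row]]


def merge_on_read(base: Dict[int, Row], log: List[LogEntry]) -> Dict[int, Row]:
    """Produce the current table snapshot without rewriting the base file."""
    snapshot = dict(base)  # copy so the immutable base stays untouched
    for op, key, row in log:  # entries are applied in commit order
        if op == "upsert" and row is not None:
            snapshot[key] = row
        elif op == "delete":
            snapshot.pop(key, None)
    return snapshot
```

Compaction periodically folds the log into a new base file so that reads do not have to replay an ever‑growing log; balancing write cost against read cost in this way is the trade‑off the performance optimizations target.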
Java Architecture Diary
Jul 11, 2023 · Big Data

Redpanda vs Apache Kafka with KRaft: Why Redpanda Is Up to 10× Faster

This article presents a detailed benchmark comparing Redpanda 23.1 and Apache Kafka 3.4.0 (with and without KRaft) across multiple AWS instance types, showing how Redpanda consistently delivers higher throughput and dramatically lower end‑to‑end latency, often outperforming Kafka by 4‑20× even with extra hardware.

Apache Kafka · Big Data · KRaft
12 min read
Architect
Jul 10, 2023 · Big Data

Understanding Lambda Architecture for Real‑Time Billion‑Scale Data Analysis

This article explains the Lambda Architecture, a three‑layer big‑data processing model that combines batch and speed layers to deliver accurate, low‑latency analytics, and illustrates its use with Twitter hashtag tracking and a smart‑parking recommendation system.

Batch Processing · Big Data · Lambda architecture
10 min read
DataFunSummit
Jul 9, 2023 · Big Data

Data Governance and Application for Behavior Analysis: Modeling Methods, Architecture, and Practical Cases

This article explains how a data‑ecosystem team governs and applies behavior‑analysis data by describing common analysis scenarios, data‑warehouse modeling methods and their pros and cons, the concepts and overall architecture of behavior‑centric analytics, key system components, and several concrete analysis examples such as retention, funnel and path analysis.

Big Data · Columnar Storage · User Segmentation
12 min read
DataFunTalk
Jul 9, 2023 · Operations

Building High‑Performance Observability Data Pipelines with Vector and Honghu

This article explains the concepts and importance of observability, introduces the Vector data‑pipeline tool and its architecture, demonstrates how to configure sources, transforms and sinks, and shows how to integrate Vector with the Honghu platform to build a complete, real‑time monitoring solution for modern distributed systems.

Big Data · Honghu · Observability
33 min read
AntTech
Jul 6, 2023 · Industry Insights

Unlocking AI Value: Data Quality, Privacy, and Blockchain in the Smart Era

The article examines how high‑quality data, robust privacy protection, and blockchain‑enabled trust infrastructure are essential for unlocking the value of AI models, citing market forecasts, examples from smart‑car and fintech firms, and the growing Chinese big‑data market through 2026.

AI · Big Data · Blockchain
9 min read
Huolala Tech
Jul 6, 2023 · Big Data

How to Optimize DAG Task Scheduling to Cut 30 Minutes from Critical Path

This article explains how to analyze and automatically optimize complex DAG‑based data platform task chains, identify bottlenecks, adjust upstream task timings, and reduce critical‑path execution time by up to 30 minutes while preventing resource contention and peak overloads.

Big Data · DAG · Resource Optimization
15 min read
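Finding the critical path is a longest‑path computation over the DAG in topological order. A minimal sketch using Python's standard `graphlib` module (3.9+); the task names, durations in minutes, and the `deps` mapping are hypothetical, and every task should appear as a key in `deps` (with an empty set if it has no upstream).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple


def critical_path(durations: Dict[str, int], deps: Dict[str, Set[str]]) -> Tuple[int, List[str]]:
    """Return the length and task sequence of the longest (critical) path.

    `deps[t]` holds the upstream tasks that must finish before `t` starts.
    """
    finish: Dict[str, int] = {}
    prev: Dict[str, str] = {}
    for task in TopologicalSorter(deps).static_order():
        start = 0
        for up in deps.get(task, set()):
            # A task starts when its slowest upstream finishes.
            if finish[up] > start:
                start, prev[task] = finish[up], up
        finish[task] = start + durations[task]
    end = max(finish, key=finish.get)
    path = [end]
    while path[-1] in prev:
        path.append(prev[path[-1]])
    return finish[end], path[::-1]
```

Only tasks on this path bound the total runtime, so moving their upstream start times earlier shortens the chain, while speeding up off‑path tasks does not.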
Python Programming Learning Circle
Jul 6, 2023 · Big Data

Analyzing Google Ngram Data with Python and PyTubes

This article demonstrates how to download the Google Ngram 1‑gram dataset, load the roughly 1.4 billion rows with Python and the PyTubes library, use NumPy to compute yearly word‑frequency percentages, filter and plot the trends for the word “Python” and compare it with other programming languages.

Big Data · Google Ngram · PyTubes
8 min read
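The yearly percentage divides a word's count by the total number of 1‑grams recorded that year, which is what makes years comparable as the corpus grows. A plain‑Python sketch with made‑up counts; the article itself uses PyTubes and NumPy to handle the full dataset, and the function name here is illustrative.

```python
from typing import Dict


def yearly_percentage(word_counts: Dict[int, int], totals: Dict[int, int]) -> Dict[int, float]:
    """Percentage of all 1-gram occurrences in each year that belong to one word.

    Years with a zero total are skipped to avoid division by zero.
    """
    return {
        year: 100.0 * word_counts.get(year, 0) / total
        for year, total in totals.items()
        if total > 0
    }
```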
DataFunSummit
Jul 6, 2023 · Big Data

Design and Practice of Alibaba Cloud's Billion‑Scale Real‑Time Log Analysis

This article presents Alibaba Cloud's SLS billion‑scale real‑time log analysis architecture, covering business background, core challenges such as low‑latency queries, massive data scale, high concurrency, and multi‑tenant isolation, and detailing key design solutions like LSM‑based storage, index‑columnar storage, data locality, layered caching, and future directions.

Big Data · distributed storage · high concurrency
17 min read
Data Thinking Notes
Jul 5, 2023 · Big Data

Top 10 Big Data Trends Shaping China’s Data Industry in 2023

At the 2023 Big Data Industry Development Conference in Beijing, the China Communications Standards Association unveiled the top ten big‑data keywords, highlighting trends such as lake‑warehouse integration, data assetization, DataOps, intelligent analytics, data ethics, security, public data licensing, and cross‑border data flows.

Big Data · Data Ethics · Data Governance
16 min read
dbaplus Community
Jul 5, 2023 · Databases

Mid‑Year 2023 Database Industry Roundup: Major Releases and Trends

The 2023 first‑half newsletter compiles a comprehensive overview of the database sector, highlighting the surge of domestic vendors, key technological breakthroughs such as HTAP and serverless, and detailed version updates across RDBMS, NewSQL, graph, time‑series, big‑data, and cloud databases, offering valuable insights for practitioners and decision‑makers.

Big Data · Industry Report · NewSQL
48 min read
Big Data Technology & Architecture
Jul 4, 2023 · Big Data

Building a Real‑Time Streaming Data Warehouse with Paimon on Kubernetes for Supply‑Chain Logistics

This article presents a step‑by‑step guide on how the logistics provider Haicheng Bangda implemented a streaming data warehouse using Paimon, Flink CDC, and Kubernetes, covering business background, architecture choices, environment setup, SQL examples, troubleshooting tips, and future roadmap for their digital transformation.

Big Data · CDC · Flink
27 min read
Data Thinking Notes
Jul 2, 2023 · Big Data

Mastering Data Governance: A Comprehensive Framework for Enterprise Success

This article outlines a complete data governance framework, detailing the five managerial domains—control, process, governance, technology, and value—along with strategies for data strategy, organizational structure, policies, processes, standards, quality, security, and platform tools, and highlights AI’s pivotal role in enhancing governance efficiency.

Big Data · Data Governance · Data Quality
10 min read
DataFunSummit
Jul 2, 2023 · Big Data

Building a One‑Stop AB Testing Platform at NetEase Cloud Music: Architecture, Metric Infrastructure, Scientific Evaluation, and Efficiency

The article describes how NetEase Cloud Music designed and deployed a comprehensive AB testing platform, covering system infrastructure, metric modeling, scientific experiment validation (including SRM mitigation and statistical power), and operational efficiency improvements to support rapid product iteration across multiple devices.

AB testing · Big Data · Data Infrastructure
13 min read
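A sample ratio mismatch (SRM) check compares observed group sizes with the configured traffic split using a chi‑square goodness‑of‑fit test. A minimal sketch assuming a two‑group experiment; the p < 0.001 cutoff (critical value 10.828 at one degree of freedom) is a common convention, not necessarily the one NetEase Cloud Music uses.

```python
def srm_chi_square(control: int, treatment: int, expected_ratio: float = 0.5) -> float:
    """Chi-square statistic (1 degree of freedom) comparing observed group
    sizes with the configured traffic split."""
    total = control + treatment
    expected = (total * expected_ratio, total * (1 - expected_ratio))
    observed = (control, treatment)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Assumed convention: critical value for p = 0.001 with 1 degree of freedom.
SRM_THRESHOLD = 10.828


def has_srm(control: int, treatment: int, expected_ratio: float = 0.5) -> bool:
    """True when the observed split is too unlikely under the configured ratio."""
    return srm_chi_square(control, treatment, expected_ratio) > SRM_THRESHOLD
```

A flagged SRM usually points to an assignment or logging problem, so the experiment's metrics should not be trusted until the mismatch is explained.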
DataFunTalk
Jul 2, 2023 · Big Data

Bilibili Data Service Middle Platform: Architecture, Practices, and Future Roadmap

This article presents Bilibili's data service middle platform, detailing its background, one‑stop data service architecture, core processes, model and API construction, query mechanisms, full‑link control, cost‑reduction, high‑availability strategies, achieved results, and future roadmap.

Architecture · Big Data · Data Governance
18 min read
21CTO
Jun 30, 2023 · Information Security

How WeChat’s Security Data Warehouse Powers Billions of Daily Feature Reads

This article explains the origins, evolution, and current architecture of WeChat’s security data warehouse, detailing its unified feature storage, data quality guarantees, multi‑IDC synchronization, and operational system that streamlines feature management, analysis, and deployment to support the platform’s massive security strategy.

Big Data · Feature Management · Operations
15 min read
iQIYI Technical Product Team
Jun 30, 2023 · Big Data

Advertising Data Lake Architecture and Real-time Optimizations

By replacing the costly Lambda architecture with a unified data‑lake built on Iceberg and Flink CDC, the advertising team achieved minute‑level latency, strong consistency, and lower storage expenses, cutting end‑to‑end processing times from hours to a few minutes across budgeting, warehousing, OLAP and ETL workloads.

Advertising · Big Data · Flink
13 min read
StarRocks
Jun 29, 2023 · Big Data

How StarRocks Boosted Mango TV’s Data Platform Performance by Over 10×

Mango TV replaced its fragmented EMR‑Hive‑Kudu‑Presto stack with a unified StarRocks lakehouse, simplifying architecture, cutting operational costs, and achieving more than a ten‑fold increase in query speed while supporting real‑time analytics, materialized views, bitmap indexing, and store‑compute separation.

Big Data · Bitmap Index · Materialized Views
14 min read
DataFunTalk
Jun 29, 2023 · Big Data

Practical Deployment of Delta Lake in BI and AI Products

This article summarizes a technical presentation on how Delta Lake is integrated into a BI+AI platform, covering the product background, data‑lake architecture, Delta Lake features such as ACID transactions, schema management, multi‑engine support, performance optimizations, and future development directions.

AI · BI · Big Data
12 min read
Alibaba Cloud Big Data AI Platform
Jun 27, 2023 · Big Data

How MaxCompute’s Lakehouse Architecture Enables Near‑Real‑Time Incremental Processing

This article details Alibaba Cloud MaxCompute’s lakehouse evolution, describing its unified storage‑metadata‑compute design, the Transactional Table 2.0 format, near‑real‑time incremental ingestion, clustering and compaction services, transaction handling, TimeTravel and incremental queries, and future roadmap for big‑data workloads.

Big Data · Incremental Processing · Lakehouse
23 min read
DataFunTalk
Jun 26, 2023 · Big Data

Iceberg Data Lake: Core Features, Xiaomi Use Cases, and Future Plans

This presentation details Iceberg's core capabilities—transactional writes, schema evolution, implicit partitioning, and row‑level updates—while showcasing Xiaomi's real‑world applications such as log ingestion redesign, near‑real‑time warehousing, offline optimizations, column‑level encryption, Hive migration strategies, and outlining upcoming enhancements like materialized views and cloud migration.

Big Data · Column Encryption · Data Lake
20 min read
dbaplus Community
Jun 25, 2023 · Big Data

WeChat’s 10× Query Speedup: From 1000ms to 100ms with Druid & Redis

WeChat’s multi‑dimensional monitoring platform faced severe query latency and I/O bottlenecks, so the team analyzed user behavior and Druid architecture, then introduced sub‑query splitting, Redis caching, and segment size reductions, achieving over 85% cache hit rate and reducing average query time to around 100 ms.

Big Data · Cache · Druid
12 min read
DataFunTalk
Jun 25, 2023 · Big Data

Multi‑Cloud Cache Evolution at Zhihu: From Multi‑HDFS to UnionStore to Alluxio

This technical presentation details Zhihu's journey in multi‑cloud caching, covering the motivations for a multi‑cloud architecture, the design and limitations of the self‑built UnionStore component, and the adoption of Alluxio to achieve significant performance, stability, and cost improvements across model serving and training workloads.

Alluxio · Big Data · caching
24 min read
Data Thinking Notes
Jun 24, 2023 · Fundamentals

Why a Robust Data Metric System Is the Lifeblood of Modern Businesses

This article explains the concepts, construction, and value of data metric systems and tag systems, describing how they help product managers turn raw data into actionable indicators, support decision‑making, guide operations, drive user growth, and ensure a unified statistical standard across the enterprise.

Big Data · Business Intelligence · Data Product Management
16 min read
DataFunTalk
Jun 24, 2023 · Big Data

Design and Architecture of MaxCompute Lakehouse Near‑Real‑Time Incremental Processing

This article explains the evolution of Alibaba Cloud's MaxCompute platform into a lakehouse architecture that supports near‑real‑time incremental processing, detailing its development history, core design of transactional tables, five‑module technical stack, data ingestion methods, optimization services, transaction management, query capabilities, ecosystem integration, practical applications, future roadmap, and common user questions.

Big Data · Data Lake · Incremental Processing
24 min read
DataFunSummit
Jun 22, 2023 · Big Data

Building a Data Middle Platform Indicator System for the Automotive Industry

This article explains how a comprehensive indicator system within a data middle platform can address the automotive industry's data challenges, outlines the evolution of data platforms, details a step‑by‑step methodology for indicator design, development, and management, and presents real‑world case studies.

Big Data · Data Middle Platform · Digital Marketing
12 min read
Big Data Technology & Architecture
Jun 21, 2023 · Big Data

Design and Optimization of Bilibili's Real-Time Data Quality Monitoring Platform

This article details the background, architecture, challenges, and iterative improvements of Bilibili's real-time data quality monitoring platform, covering offline and streaming DQC, resource-efficient Flink designs, InfluxDB proxy integration, CQ table handling, operational safeguards, and future engineering plans.

Big Data · Data Quality · Flink
22 min read
Code Ape Tech Column
Jun 21, 2023 · Big Data

From Java Streams to Spark: Basic Big Data Operations Explained

This article demonstrates how developers familiar with Java Stream APIs can quickly grasp fundamental Spark operations—including map, flatMap, groupBy, and reduce—by translating stream examples into Spark code, providing complete code snippets, explanations of transformations versus actions, and practical tips for handling exceptions in distributed processing.

Big Data · Java Stream · MAP
24 min read
MaGe Linux Operations
Jun 20, 2023 · Big Data

What Is Kafka? A Beginner’s Guide to Distributed Streaming and Messaging

Kafka is an open‑source, distributed streaming platform that uses a publish/subscribe message queue architecture to provide high‑throughput, fault‑tolerant real‑time data processing, featuring topics, partitions, replicas, consumer groups, and multiple APIs for producers, consumers, streams, connectors, and administration.

Big Data · Distributed Streaming · Kafka
20 min read
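Two of the concepts in the summary, key‑based partitioning and partition assignment within a consumer group, can be illustrated in a few lines. This is a simplified sketch, not Kafka's code: `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner applies to keyed records, and the round‑robin assignment only approximates the idea behind Kafka's built‑in assignors.

```python
import zlib
from typing import Dict, List


def partition_for(key: str, num_partitions: int) -> int:
    """Same key -> same partition, which preserves per-key ordering.
    crc32 is a stand-in for Kafka's murmur2-based default partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over the members of one consumer group; each
    partition goes to exactly one member, so no two members read it."""
    if not consumers:
        return {}
    members = sorted(consumers)
    result: Dict[str, List[int]] = {m: [] for m in members}
    for p in range(partitions):
        result[members[p % len(members)]].append(p)
    return result
```

With three consumers and two partitions, one member receives nothing, which is why adding consumers beyond the partition count does not add parallelism.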
Wukong Talks Architecture
Jun 20, 2023 · Databases

Evolution of JD Baitiao’s Data Architecture: From MySQL to Apache ShardingSphere

This article chronicles JD Baitiao’s journey from early MySQL and NoSQL solutions through DBRep to the adoption of Apache ShardingSphere, highlighting the technical motivations, decoupling strategies, performance comparisons, and the broader Database Plus vision for scalable, stable financial‑grade data architectures.

Architecture · Big Data · JD Baitiao
14 min read
Architects' Tech Alliance
Jun 19, 2023 · Fundamentals

Understanding Complex Systems and Software Architecture: Definitions, Types, Principles, and Design Considerations

This article explains what complex systems and software architecture are, outlines various architectural categories, discusses essential functional and non‑functional requirements, and presents design principles and typical solutions such as domain‑driven design, microservices, cloud‑native, DevOps, and big‑data architectures for building stable, scalable, and maintainable systems.

Big Data · Complex Systems · Domain-Driven Design
13 min read
DaTaobao Tech
Jun 19, 2023 · Product Management

User Experience Analysis of Taobao Detail Page Using User Journey and VOC Data

The article, the second in a ten‑part Taobao app UX series, explains how module‑level user‑journey metrics and Voice‑of‑Customer chat data are collected, labeled with a BIO‑CRF taxonomy, clustered via DBSCAN, and correlated to identify size and quality concerns on the men’s‑clothing detail page, prompting module redesigns and A/B tests that resulted in higher conversion rates and reduced dwell time.

A/B testing · Big Data · User experience
11 min read
Data Thinking Notes
Jun 18, 2023 · Big Data

Data Lake vs Data Warehouse: Uncover the Real Differences

This article explores the evolving concept of data lakes, compares them with traditional data warehouses across storage, modeling, tooling, and user roles, and examines the emerging lake‑warehouse integration, highlighting why both remain essential in modern big‑data architectures.

Big Data · Data Architecture · Data Lake
12 min read
DeWu Technology
Jun 16, 2023 · Big Data

Traffic Replay Platform for Data Platform Testing

The team built an online traffic‑replay platform that captures real user requests, replays them in a synchronized pre‑release environment, and automatically compares the responses using AAdiff and field‑ignore rules. It achieved 86% interface coverage, 30% fewer regression bugs, and a 98% replay success rate while halving manual testing effort, providing a zero‑intrusion, high‑concurrency solution for ongoing smoke, regression, stress, and cache validation.

Big Data · Data Platform · traffic replay
10 min read
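AAdiff is the team's own comparison tool; the sketch below only illustrates the field‑ignore idea with a hypothetical recursive comparator over decoded JSON responses. Fields such as timestamps or trace IDs that legitimately differ between production and pre‑release are listed in `ignore`; the function name and path format are assumptions.

```python
from typing import Any, List, Set


def diff(expected: Any, actual: Any, ignore: Set[str], path: str = "$") -> List[str]:
    """Return the paths where two decoded JSON responses differ,
    skipping any object field whose name is in `ignore`."""
    out: List[str] = []
    if isinstance(expected, dict) and isinstance(actual, dict):
        for key in sorted(set(expected) | set(actual)):
            if key in ignore:
                continue  # e.g. timestamps or trace IDs that always differ
            if key not in expected or key not in actual:
                out.append(f"{path}.{key}")  # field missing on one side
            else:
                out.extend(diff(expected[key], actual[key], ignore, f"{path}.{key}"))
    elif isinstance(expected, list) and isinstance(actual, list):
        if len(expected) != len(actual):
            out.append(path)
        else:
            for i, (e, a) in enumerate(zip(expected, actual)):
                out.extend(diff(e, a, ignore, f"{path}[{i}]"))
    elif expected != actual:
        out.append(path)  # scalar mismatch or differing types
    return out
```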
JD Tech
Jun 16, 2023 · Big Data

Comprehensive Introduction to Apache Kafka: Architecture, Features, and Best Practices

This article provides a detailed overview of Apache Kafka, covering its distributed streaming architecture, storage mechanisms, replication, consumer groups, compression techniques, exactly‑once semantics, configuration tips, and performance optimizations for building reliable high‑throughput data pipelines.

Big Data · Distributed Streaming · Exactly-Once
19 min read
Data Thinking Notes
Jun 14, 2023 · Big Data

Why Data Warehouse Standards Matter and How to Implement Them Effectively

This article explains why data‑warehouse standards are essential for improving team efficiency, product quality, and maintenance costs, and provides a step‑by‑step guide covering standard creation, discussion, rollout, supervision, continuous improvement, as well as detailed design, process, quality, and security specifications.

Big Data · Standards · data modeling
18 min read
DataFunTalk
Jun 14, 2023 · Big Data

Active Data Governance with Operator-Level Lineage: Practices and Exploration

This article presents a big data company's active data governance practice using operator-level lineage, detailing the shortcomings of traditional lineage, the implementation of indicator chain governance, and the exploration of proactive model governance to achieve smarter, more precise data management.

Big Data · Data Governance · Operator-Level Lineage
14 min read
Big Data Technology & Architecture
Jun 13, 2023 · Big Data

Iceberg Data Lake Implementation and Optimization at iQIYI

This article details iQIYI's adoption of Iceberg for its data lake, covering the OLAP architecture, reasons for a data lake, Iceberg's table format advantages over Hive, platform construction, streaming ingestion, query and performance optimizations, real‑world business deployments, and future plans.

Big Data · Data Lake · Flink
21 min read
DataFunSummit
Jun 9, 2023 · Artificial Intelligence

Construction and Application of a Power Industry Knowledge Graph

This article describes how a power‑industry knowledge graph is built using AI, big‑data and cloud techniques, outlines its multi‑dimensional structure, and demonstrates various application scenarios such as personal achievement aggregation, professional learning, job training, generic knowledge services, and decision support for power production.

Big Data · knowledge graph · knowledge management
10 min read
Alibaba Cloud Native
Jun 9, 2023 · Cloud Native

Accelerate AI & Big Data on Kubernetes with Elastic File Client & Fluid

This article explains how the Elastic File Client (EFC) and Fluid together provide a cloud‑native, high‑performance storage solution for AI and big‑data workloads on Kubernetes, detailing architecture challenges, core features, performance benchmarks, and a step‑by‑step deployment guide.

AI, Big Data, Cloud Native
0 likes · 16 min read
Huolala Tech
Jun 8, 2023 · Big Data

How Huolala Built a Robust Big Data Security Framework: Lessons and Practices

This article details Huolala's practical experience in constructing a comprehensive big data security system, covering data lifecycle protection, classification standards, capability development, and governance, while balancing regulatory compliance and business growth.

Big Data, Data Governance, cloud infrastructure
0 likes · 10 min read
DevOps
Jun 7, 2023 · Big Data

Deploying Apache Spark on YARN vs Kubernetes: Architecture, Benefits, and Comparison

This article explains how Apache Spark can be deployed using the traditional Hadoop YARN resource manager and the newer Kubernetes approach, detailing configuration steps, submission methods, and a comprehensive comparison of isolation, scalability, learning curve, logging, performance, and cost considerations.

Big Data, Kubernetes, Spark
0 likes · 10 min read
Alibaba Cloud Big Data AI Platform
Jun 7, 2023 · Big Data

How Alibaba Cloud’s Flink Advisor Transforms Real‑Time Log Diagnosis

Alibaba Cloud's Flink Intelligent Diagnosis (Advisor) combines a real‑time data warehouse with log‑clustering and decision‑tree algorithms to automatically analyze error logs, diagnose job anomalies, and suggest optimizations. This significantly reduces manual support tickets and improves the user experience across Flink managed services.

AI, Big Data, Flink
0 likes · 12 min read
FunTester
Jun 7, 2023 · Big Data

Optimizing Query Performance in WeChat's Multi‑Dimensional Monitoring Platform with Druid and Redis

The article details how WeChat's multi‑dimensional metric monitoring platform, which handles billions of data points per minute, reduced average query latency from over 1000 ms to around 140 ms and reached a cache hit rate above 85%. The gains came from analyzing query behavior, redesigning the data‑layer architecture, splitting queries into sub‑queries, adding Redis caching, and introducing sub‑dimension tables.

Big Data, Cache, Druid
0 likes · 13 min read
DataFunSummit
Jun 5, 2023 · Big Data

Building an Intelligent Data Analysis Platform Based on a Unified Semantic Layer

This article presents a comprehensive overview of Xiaomi's intelligent data analysis platform built on a unified semantic layer, covering business scenarios, system architecture, core modules such as data assets and semantic modeling, and the platform's product capabilities like visual analytics, alerts, and embedded dashboards.

Big Data, Intelligent Analytics, row-level security
0 likes · 14 min read
DataFunSummit
Jun 4, 2023 · Fundamentals

The Role of Metadata in Data Governance and Its Applications

Metadata serves as a foundational element of data governance, enabling analysis, monitoring, discovery, and understanding of data assets, while applications such as data lineage, impact analysis, and data mapping help organizations assess quality, trace origins, and optimize processing workflows.

Big Data, Information Management, metadata
0 likes · 5 min read
Architects Research Society
Jun 3, 2023 · Big Data

Understanding Azure Synapse Analytics: Architecture, Features, and Workloads

Azure Synapse Analytics is a cloud‑native, limitless analytics service that combines data warehousing, big‑data processing, and AI integration. It offers unified SQL and Spark engines, broad language support, workload management, and tight integration with Power BI, Azure Data Lake, and Azure Databricks for fast, scalable data insights.

Azure, Big Data, Synapse
0 likes · 11 min read
DataFunSummit
Jun 2, 2023 · Artificial Intelligence

Knowledge Graph–Based Root Cause Analysis for Intelligent Manufacturing

This article explains how knowledge‑graph technology combined with artificial‑intelligence methods can enhance intelligent manufacturing by improving quality and reliability through advanced root‑cause analysis, detailing development trends, analytical techniques, challenges, practical frameworks, and real‑world case studies.

Big Data, Root Cause Analysis, intelligent manufacturing
0 likes · 17 min read
WeiLi Technology Team
Jun 2, 2023 · Big Data

Flink RocksDB State Backend: Practical Tuning Guide for Large Jobs

This article explains how to optimize Flink’s RocksDB state backend for large‑scale streaming jobs, covering state types, enabling latency tracking, incremental checkpoints, predefined options, and advanced memory and thread settings, with practical configuration examples and performance comparisons.

Big Data, Flink, Performance Tuning
0 likes · 16 min read
360 Tech Engineering
Jun 2, 2023 · Big Data

Overcoming Challenges in User Profiling: A Big Data‑Driven Framework for Precise Marketing

The article outlines how a unified, big‑data‑based user profiling platform addresses traditional data silos, high costs, and limited functionality by standardizing tags, integrating Spark and RHadoop processing, and enabling a closed‑loop marketing workflow that improves accuracy and operational efficiency.

Big Data, Data Integration, Marketing Automation
0 likes · 7 min read
DataFunTalk
Jun 2, 2023 · Big Data

Iceberg Data Lake Implementation and Optimization at iQIYI

This article details iQIYI's adoption of Iceberg for its data lake. It covers the OLAP architecture, the reasons for building a data lake, the advantages of the Iceberg table format over Hive, platform construction, extensive performance optimizations, and real‑world business use cases such as ad‑flow unification, log analysis, auditing, and CDC pipelines.

Big Data, Data Lake, Flink
0 likes · 18 min read
WeChat Backend Team
Jun 1, 2023 · Big Data

How WeChat Boosted Flink Stability with TaskManager Recovery and Load Balancing

This article details WeChat’s Gemini‑2.0 real‑time streaming platform built on Flink, explaining two key stability enhancements: a TaskManager‑level partial failure recovery that avoids data loss during node crashes, and a load‑balancing scheduler that evenly distributes tasks across TaskManagers to improve resource utilization and reduce latency.

Big Data, Flink, Kubernetes
0 likes · 16 min read
DataFunTalk
May 30, 2023 · Big Data

Optimizing Chart Query Performance in YouShu BI: Data Query Principles, Intelligent Caching, Query Merging, and Diagnostics

This article explains the data query fundamentals of YouShu BI charts, introduces intelligent caching design, describes query merging and various optimization techniques—including partition filters, value acceleration, and SQL generation—and outlines performance diagnosis methods to improve BI chart responsiveness.

BI, Big Data, Chart Performance
0 likes · 16 min read
Architects Research Society
May 28, 2023 · Big Data

Understanding Azure Synapse Analytics: An Integrated Data Lake and Data Warehouse Platform

This article examines Microsoft Azure Synapse Analytics, explaining how its unified framework combines data lake and data warehouse capabilities through components such as Pipelines, Dedicated SQL pools, Spark pools, and Serverless SQL, and evaluates its advantages over separate tools like Snowflake and Databricks.

Azure Synapse, Big Data, Cloud Analytics
0 likes · 7 min read
Architects Research Society
May 28, 2023 · Big Data

Databricks vs Snowflake: Comparing Data Lake and Data Warehouse Cloud Solutions

This article compares the cloud‑based analytics platforms Databricks and Snowflake, examining how Databricks serves as a data‑lake processing tool with emerging warehouse features while Snowflake operates as a scalable data‑warehouse that incorporates lake‑style capabilities, and discusses their complementary use cases.

Big Data, Cloud Analytics, Databricks
0 likes · 7 min read
StarRocks
May 26, 2023 · Big Data

How SeaTunnel’s StarRocks Connector Enables High‑Performance Data Sync

This article explains SeaTunnel’s architecture and its StarRocks connector, detailing source and sink features such as field projection, predicate push‑down, parallel reading, state recovery, data type mapping, Stream Load writes, CDC support, configuration examples, and future roadmap for exactly‑once semantics.

Big Data, Connector, Data Integration
0 likes · 16 min read
DataFunTalk
May 23, 2023 · Big Data

Building a Millisecond‑Response Lakehouse Platform with Apache Iceberg: Architecture, Query Acceleration, and Intelligent Optimization

This article details Bilibili's practice of building a millisecond‑response lakehouse platform on Apache Iceberg. It covers the background challenges, the unified architecture, multi‑dimensional sorting and indexing for query acceleration, the Magnus service for intelligent optimization, and the current production deployment and performance metrics.

Big Data, Cube, Iceberg
0 likes · 14 min read
DataFunTalk
May 22, 2023 · Big Data

Alibaba Cloud Data Lake: Unified Metadata and Storage Management Practices

This article explains Alibaba Cloud's data lake architecture, unified metadata services, storage management optimizations, and format handling techniques, illustrating how lakehouse concepts, multi‑engine support, and lifecycle policies enable efficient, secure, and cost‑effective big data processing in the cloud.

Big Data, Cloud Services, Data Lake
0 likes · 22 min read
Data Thinking Notes
May 21, 2023 · Information Security

Why Government Data Sharing Stalls and How a “Three‑Rights” Model Can Unlock It

The article analyzes why government data sharing often fails—citing legal, technical, security, and organizational hurdles—then outlines one‑to‑one and centralized sharing models, highlights four critical success factors, and proposes a “three‑rights” framework supported by blockchain to create trustworthy, sustainable inter‑departmental data exchange.

Big Data, Blockchain, Data Governance
0 likes · 11 min read
Data Thinking Notes
May 17, 2023 · Big Data

Inside Wing Pay’s Scalable Big Data Platform: Architecture & Governance

This article details how Wing Pay built a comprehensive data development and governance platform. It covers the company background, business scenarios, goals, challenges, the task development workflow, task types, SparkSQL editor features, dual‑environment deployment, Airflow scheduling, the DataX data bus, resource isolation, compute optimization, data quality monitoring, cloud‑native practices, and future outlook, and closes with a Q&A on data permissions and governance.

Airflow, Big Data, Cloud Native
0 likes · 17 min read
DataFunTalk
May 17, 2023 · Databases

Evolution of 360 Commercial Real-Time Data Warehouse and Apache Doris Deployment

This article details the three‑stage evolution of 360's real‑time data warehouse—from Storm + Druid + MySQL to Flink + Druid + TiDB and finally to Flink + Apache Doris—explaining architectural pain points, the reasons for choosing Doris, and how the new system delivers sub‑second query latency, strong consistency, and simplified operations across advertising scenarios.

Apache Doris, Big Data, Data Consistency
0 likes · 17 min read
Tongcheng Travel Technology Center
May 17, 2023 · Databases

StarRocks Production Practice at Tongcheng Travel: Architecture, Use Cases, and Technical Evaluation

This article details Tongcheng Travel’s production deployment of the StarRocks OLAP database, covering background, business scenarios, technical evaluation against ClickHouse and Greenplum, implementation with Flink SQL, real‑time analytics, offline reporting, CDP use cases, performance optimizations, and future cloud‑native plans.

Big Data, Flink, OLAP
0 likes · 12 min read
WeChat Backend Team
May 17, 2023 · Big Data

Boosting Real-Time Recommendations: Apache Pulsar Optimizations at WeChat

This article details how WeChat's Gemini‑2.0 big‑data platform leverages Apache Pulsar, outlining cloud‑native advantages, load‑balancing refinements, cache and SSD tuning, high‑availability safeguards, and cost‑saving strategies that together enable large‑scale, real‑time, deep‑learning recommendation workloads.

Apache Pulsar, Big Data, Cloud Native
0 likes · 17 min read