Tagged articles
370 articles
DataFunSummit
Jan 9, 2024 · Big Data

Introducing Yunqi Lakehouse: An Integrated Cloud‑Native Data Platform with Incremental Computing and Auto Materialized Views

This article introduces Yunqi's self‑developed Lakehouse product, explaining its cloud‑native, one‑stop data platform architecture, incremental computing that balances freshness, performance and cost, and the autoMV feature that automatically creates materialized views to boost query speed up to nine times.

Auto Materialized View · Big Data · Data Platform
0 likes · 14 min read
DataFunSummit
Jan 4, 2024 · Big Data

YY Live Business Metric Governance Practice

This presentation details YY Live’s data product team’s end‑to‑end business metric governance practice, covering problem background, analysis, governance objectives, multi‑team collaboration, implementation steps, achieved efficiencies, and future directions leveraging large language models.

Big Data · Data Platform · LLM
0 likes · 16 min read
Goodme Frontend Team
Jan 1, 2024 · Frontend Development

How Guming’s Front‑End Data Center Enables Real‑Time Monitoring for Web, Mini‑Programs, Flutter & Node.js

Guming’s Front‑End Data Center integrates monitoring, performance, logging, and analytics for web, mini‑programs, Flutter clients, and Node.js services, offering real‑time alerts, high availability, sampling, multi‑channel data pipelines, custom charting, and detailed CPU/GC profiling to streamline issue diagnosis and business insights.

Data Platform · frontend · monitoring
0 likes · 10 min read
Zuoyebang Tech Team
Dec 28, 2023 · Big Data

How We Scaled Our Data Platform by Migrating to Apache DolphinScheduler

Facing growing task volumes and diverse workload types, we upgraded our data development platform's scheduling engine to Apache DolphinScheduler, detailing the migration process, architectural enhancements, stability and observability improvements, multi‑tenant support, and the resulting performance gains and future roadmap.

Apache DolphinScheduler · Big Data · Data Platform
0 likes · 12 min read
DataFunTalk
Dec 28, 2023 · Product Management

Building an Effective Data Platform: Insights and Practices from Tencent's Senior Product Manager

Senior Tencent product manager He Zhichao shares his experience and methodology for creating a high‑quality data platform, covering the transition from technical roles to product, understanding data users’ needs, the Euler asset‑factory implementation, product‑manager best practices, and solutions to common data‑engineering challenges.

Data Governance · Data Platform · DataOps
0 likes · 16 min read
Alibaba Cloud Big Data AI Platform
Dec 26, 2023 · Big Data

How Panasonic Overcame Data Silos: A Big Data Governance Journey

Panasonic's digital transformation case study details the challenges of fragmented data across 64 subsidiaries, the strategic adoption of a serverless big‑data platform, governance milestones from 2021 to 2023, tool comparisons, standardization efforts, talent development, and future outlook driven by five core values.

Big Data · Data Governance · Data Platform
0 likes · 15 min read
58 Tech
Dec 22, 2023 · Big Data

Design and Implementation of the JiShi BI Data Visualization Platform

This article details the architecture, core processes, and module designs of the JiShi BI platform, a self‑built data visualization and analysis system that integrates data ingestion, processing, enhanced AI‑driven analytics, and multi‑dimensional dashboard capabilities to support enterprise decision‑making.

BI · Data Platform · Data visualization
0 likes · 12 min read
Zhuanzhuan Tech
Dec 14, 2023 · Big Data

Design and Implementation of a Data Service Platform for New Media Business

This article details the background, challenges, design principles, and implementation of a unified data service platform—including data modeling, multi-source governance, real-time processing, and a Doris-based storage solution—to support large‑scale video data for a new media operation.

Apache Doris · Data Governance · Data Platform
0 likes · 7 min read
DataFunTalk
Dec 5, 2023 · Big Data

Design and Practice of Xiaomi’s One‑Stop Data Production Platform

This article presents a comprehensive overview of Xiaomi’s data production platform, detailing the full data lifecycle, the technical‑driven product design methodology, the platform’s architecture and core capabilities, as well as real‑world case studies and a Q&A session that illustrate how the system improves data collection, storage, processing, and usage across the organization.

Data Lifecycle · Data Platform · ETL
0 likes · 17 min read
Sohu Tech Products
Nov 22, 2023 · Big Data

Field Extraction and Read‑Time Modeling in the Honghu Data Platform

The Honghu data platform delivers a unified, UI‑driven environment that uses read‑time modeling to dynamically structure massive heterogeneous logs during queries, replacing pre‑defined schemas and ETL pipelines with rule‑based field extraction (regex, JSON, key‑value, IP) bound to data‑source types, trading CPU cycles for flexible, accurate analysis.

Data Platform · field extraction · read-time modeling
0 likes · 17 min read
Data Thinking Notes
Nov 2, 2023 · Operations

How Bilibili Built a Scalable Data Quality Assurance System for Its Data Warehouse

This article details Bilibili's data quality assurance framework, covering its evolution across four data platform stages, the architecture of its quality data warehouse, core capabilities such as a complete assurance system, digital‑driven continuous optimization, and efficient incident handling, plus case studies, future plans, and a Q&A session.

Big Data · Bilibili · Data Platform
0 likes · 27 min read
Big Data Technology & Architecture
Oct 23, 2023 · Big Data

Bilibili Data Quality Assurance: Architecture, Goals, Core Capabilities, and Future Outlook

This article outlines Bilibili's data quality assurance framework, detailing its evolution across four development stages, the current data platform architecture, identified pain points, four key quality objectives, core capabilities such as a quality data warehouse, comprehensive monitoring, digital optimization, fault handling, and future directions.

Big Data · Data Governance · Data Platform
0 likes · 22 min read
DataFunSummit
Oct 15, 2023 · Big Data

Construction and Architecture of JD One-Service Data Service System

This article details JD's three‑stage evolution of its data service platform, explains thematic (topic‑based) data services, introduces the One‑Service unified architecture, and outlines future plans for standardization, low‑code front‑end, and operational improvements.

Big Data · Data Platform · Data Service
0 likes · 13 min read
JD Tech
Oct 10, 2023 · Operations

Technical Case Study of JDV Visual Dashboard Platform for the 618 Promotion

This article details how JDV, JD.com’s internal visual dashboard platform, tackled the massive data‑intensive 618 promotion by implementing real‑time updates, cross‑midnight count stops, request‑state control, heartbeat monitoring, proxy data sources, and a suite of developer tools to ensure stability, performance, and rapid feature delivery.

Data Platform · Real-Time · large scale
0 likes · 18 min read
Architects Research Society
Sep 26, 2023 · Big Data

From a Single Data Lake to a Decentralized Data Mesh: A Step‑by‑Step Migration Guide

This article explains why traditional centralized data lakes hinder modern software development, introduces the data‑mesh concept as a decentralized alternative, and walks through an e‑commerce microservice example with concrete steps, data‑API designs, and migration tactics to transition from a monolithic lake to a distributed data mesh.

Data Lake · Data Mesh · Data Platform
0 likes · 22 min read
iQIYI Technical Product Team
Sep 22, 2023 · Big Data

Data Lake: Concepts, Architecture, and Application in iQIYI's Data Platform

iQIYI’s data‑middle‑platform team built a four‑zone data lake—raw, product, work, and sensitive—integrated with unified ODS/DWD/MID layers, a metadata catalog, and self‑service tools, leveraging HDFS, Hive/Iceberg, Spark/Trino, and Flink, migrated to Apache Iceberg for real‑time freshness, and now aims to further streamline modules and adopt new technologies.

Apache Iceberg · Data Governance · Data Lake
0 likes · 13 min read
Data Thinking Notes
Sep 6, 2023 · Big Data

How to Build an Effective Tagging System for Data Platforms

This article explains what objects and tags are, distinguishes physical, network and electronic tags, outlines how to construct and manage a comprehensive tag taxonomy for user profiling, product labeling, and data platforms, and details quality assessment criteria for tags in DMP, CDP, and recommendation systems.

CDP · DMP · Data Platform
0 likes · 13 min read
Efficient Ops
Aug 30, 2023 · Operations

How New Oriental Built a Scalable DevOps Platform to Cut Costs and Boost Security

New Oriental’s recent DevOps transformation details how the company tackled siloed platforms, built a unified service‑tree‑driven infrastructure, created a real‑time data processing platform, and implemented comprehensive security measures—including red‑blue exercises, penetration testing, sensitive data monitoring, and CA/KMS—to boost efficiency and reduce costs.

Cost reduction · Data Platform · DevOps
0 likes · 7 min read
Big Data Technology & Architecture
Aug 22, 2023 · Big Data

DataOps Practices and Challenges at ByteDance: From Model to Productization

The article summarizes ByteDance's DataOps journey, detailing its mid‑platform tool and Data BP model, core performance metrics, quality, hardware and human efficiency challenges, concrete DataOps implementation, productization through DataLeap, best‑practice promotion, and future outlook for data‑driven business value.

Big Data · ByteDance · Data Governance
0 likes · 17 min read
DataFunSummit
Aug 20, 2023 · Big Data

Kuaishou Data Service System: Modeling, Architecture, and Future Directions

This article presents Kuaishou's comprehensive data service system, covering its domain modeling, evolution from custom to unified services, the Octo query engine and data preparation platform architecture, the dual data API and analysis services, and future plans for intelligence and serverless high‑performance capabilities.

Big Data · Data Platform · Data Service
0 likes · 16 min read
Data Thinking Notes
Aug 13, 2023 · Big Data

How to Successfully Deliver a Data Governance Project: Step‑by‑Step Guide

This article outlines a comprehensive methodology for delivering a data governance project, covering planning, blueprint design, implementation, and acceptance phases, with detailed guidance on team formation, stakeholder roles, requirement analysis, platform architecture, management processes, and post‑deployment operations.

Big Data · Data Governance · Data Platform
0 likes · 12 min read
DataFunSummit
Aug 13, 2023 · Big Data

KwaiBI: Evolution of Kuaishou’s One‑Stop Business Intelligence Platform from 1.0 to 2.0

The article details Kuaishou’s KwaiBI business intelligence platform evolution, covering its 1.0 tool‑based implementation, the 2.0 standardized architecture built on an indicator middle‑platform, core processes, data integration, self‑service features, and future directions for self‑service and intelligent analytics.

BI · Big Data · Data Integration
0 likes · 22 min read
DataFunSummit
Aug 8, 2023 · Artificial Intelligence

Xiaomi’s Experience in Deploying Intelligent Analytics: Productization, Challenges, and Future Plans

The article shares Xiaomi’s practical experience in building and productizing intelligent analytics, explaining why it is needed, how it integrates with BI, the essential prerequisites, staged implementation, technical challenges, and future roadmap including smart alerts, automated insights, and data Q&A.

AI · BI Integration · Big Data
0 likes · 15 min read
Didi Tech
Jul 31, 2023 · Big Data

Data Serviceization at Didi: Architecture, Phases, and Standard Metric Service

Didi’s data serviceization converts raw business data into consumable services through a four‑stage pipeline—integration, development, production, and back‑flow—while the Data Dream Factory and Shu‑Chain platform automate synchronization, provide a unified access gateway for thousands of APIs, and introduce a standard metric service that abstracts storage complexities and ensures high‑performance, secure data delivery.

Data Integration · Data Platform · data serviceization
0 likes · 16 min read
DataFunSummit
Jul 30, 2023 · Big Data

Data Platform Evolution and Digital Practice in the FMCG Industry: A Baicaowei Case Study

This article presents a comprehensive case study of Baicaowei's data platform evolution, digital workflow, metric governance, business modeling, and BI insights, illustrating how big‑data technologies and rational architecture simplification empower the fast‑moving consumer goods sector to enhance operational efficiency and decision‑making.

BI · Data Platform · Digital Transformation
0 likes · 10 min read
DataFunSummit
Jul 24, 2023 · Big Data

Design and Practice of OPPO Big Data Diagnostic Platform

This article presents the background, technical architecture, feature set, workflow, and practical results of OPPO's big data diagnostic platform, illustrating how intelligent, non‑intrusive task analysis improves efficiency, stability, and cost across massive offline and real‑time workloads.

Data Platform · OPPO · Task Optimization
0 likes · 10 min read
Architects Research Society
Jul 16, 2023 · Big Data

Four Innovation Phases of Netflix’s Trillion‑Scale Real‑Time Data Infrastructure

The article chronicles Netflix’s evolution from a failing batch pipeline to a cloud‑native, self‑service streaming platform, detailing four development phases, the technical challenges faced, the stream‑processing patterns introduced, key learnings, and future opportunities for real‑time data and machine‑learning workloads.

Data Platform · Flink · Kafka
0 likes · 30 min read
Architects Research Society
Jun 24, 2023 · Big Data

Why Enterprise Data Architects Should Build Distributed Data Meshes Instead of Large Centralized Platforms

The article argues that traditional centralized data warehouses and lakes are increasingly unsustainable for large enterprises and proposes a paradigm shift to a distributed data mesh architecture that emphasizes domain‑owned data products, discoverability, interoperability, and global governance to overcome inherent inefficiencies.

Data Mesh · Data Platform · distributed architecture
0 likes · 6 min read
DeWu Technology
Jun 16, 2023 · Big Data

Traffic Replay Platform for Data Platform Testing

The team built an online traffic‑replay platform that captures real user requests, replays them in a synchronized pre‑release environment, automatically compares responses using AAdiff and field‑ignore rules, achieving 86% interface coverage, 30% fewer regression bugs, 98% replay success and halving manual testing effort, while providing a zero‑intrusion, high‑concurrency solution for ongoing smoke, regression, stress and cache validation.

Big Data · Data Platform · traffic replay
0 likes · 10 min read
DataFunSummit
Jun 16, 2023 · Big Data

Apache Kyuubi Practices and Service Evolution at iQIYI

This article details iQIYI's implementation of Apache Kyuubi for Spark Thrift Server, covering the evolution from native Spark Thrift to Kyuubi 0.7 and 1.x, multi‑tenant architecture, tag‑based configurations, SQL auditing, lineage collection, service monitoring, small‑file and Z‑order optimizations, and a brief Q&A.

Apache Kyuubi · Data Platform · Spark SQL
0 likes · 15 min read
DataFunTalk
Jun 13, 2023 · Big Data

Building a Big Data Security Center with Apache Ranger: Practices and Technical Insights from NetEase

This article presents NetEase's practical experience of constructing a big‑data security center using Apache Ranger, covering Ranger's core features, a comprehensive security solution, detailed technical analyses, and the outcomes of commercializing the platform across multiple enterprise environments.

Apache Ranger · Data Platform · access control
0 likes · 30 min read
DevOps
May 29, 2023 · Artificial Intelligence

Key Takeaways from Microsoft Build 2023: Azure OpenAI Service, Microsoft Fabric, and the Copilot Stack

The article summarizes Microsoft Build 2023 highlights, detailing Azure OpenAI Service's GA and new plugins, enterprise data integration via vector search, Microsoft Fabric's OneLake data lake, Windows Copilot features, and the Copilot Stack architecture that enables developers to build AI‑powered Copilot applications.

AI · Azure OpenAI · Copilot
0 likes · 6 min read
DataFunSummit
May 27, 2023 · Big Data

Building and Practicing the Performance Assurance System of YouShu BI

This article presents an in‑depth overview of the YouShu BI product, outlines the high‑concurrency performance challenges faced by enterprise BI, and details the multi‑layer performance architecture—including front‑end, back‑end, data engine, and data source layers—along with smart caching, MPP acceleration, materialized views, and the Data Doctor operations that together ensure low‑latency, reliable analytics for large‑scale users.

BI · Data Platform · MPP
0 likes · 16 min read
Data Thinking Notes
May 17, 2023 · Big Data

Inside Wing Pay’s Scalable Big Data Platform: Architecture & Governance

This article details how Wing Pay built a comprehensive data development and governance platform, covering company background, business scenarios, goals, challenges, task development workflow, task types, SparkSQL editor features, double‑environment deployment, Airflow scheduling, DataX data bus, resource isolation, compute optimization, data quality monitoring, cloud‑native practices, future outlook, and a Q&A on data permissions and governance.

Airflow · Big Data · Cloud Native
0 likes · 17 min read
Bilibili Tech
May 12, 2023 · Big Data

Upgrade of Dependency Model in Bilibili Data Platform

Bilibili’s data platform upgraded its dependency model by shifting from project‑level to task‑level dependencies, adding root and end nodes, using virtual tasks for external data, introducing offset handling, implementing an abstract DependencySubject and asynchronous callbacks, achieving sub‑second latency for tens of thousands of daily tasks while planning automated lineage and richer rule support.

Bilibili · Data Platform · data dependency
0 likes · 10 min read
Data Thinking Notes
May 7, 2023 · Big Data

How Financial Institutions Can Master Data‑Driven Transformation in 2024

This article examines two decades of data warehouse evolution in the financial sector, identifies persistent pain points such as platform lag, data quality, and low service efficiency, and proposes a cloud‑native, data‑centric framework—including a unified blueprint, three‑layer architecture, and six core capabilities—to accelerate enterprise‑wide data capability building and drive high‑quality digital growth.

Big Data · Cloud Native · Data Governance
0 likes · 18 min read
Top Architect
May 4, 2023 · Big Data

Data Middle Platform: General Architecture and Core Components

The article explains the concept, benefits, and detailed modular architecture of a data middle platform, covering data storage, acquisition, processing, governance, security, and operation frameworks, and illustrates how enterprises can build and evolve such platforms to turn data into valuable services.

Big Data · Data Architecture · Data Governance
0 likes · 19 min read
DataFunTalk
Apr 9, 2023 · Big Data

Building an Agile Business Intelligence Platform at Zhongyuan Bank: Architecture, Practices, and Future Outlook

The article details Zhongyuan Bank's end‑to‑end agile BI platform construction, covering business goals, a step‑by‑step development timeline, core architecture, eight key functionalities, low‑code data processing, real‑time streaming, visualization dashboards, intelligent Q&A, and future directions for platform intelligence and openness.

BI · Big Data · Data Platform
0 likes · 19 min read
ITPUB
Apr 7, 2023 · Big Data

How WeDataSphere Builds a One‑Stop, Open‑Source Big Data Platform

This article outlines the motivations for building a comprehensive data platform, describes the measurement and tailoring approach, details WeDataSphere’s architecture—including DataSphere Studio and Apache Linkis middleware—and shares the open‑source roadmap and future vision for the platform.

Apache Linkis · Data Platform · DataSphere Studio
0 likes · 11 min read
DataFunSummit
Mar 11, 2023 · Big Data

Insights on Data Platform SaaS Transformation and Customization Strategies

The article examines the opportunities and challenges of turning data platforms into SaaS solutions, compares sales‑driven and product‑driven models, analyzes cost factors and industry gaps, and shares practical approaches such as platform‑plus‑component architecture, real‑world case studies, and product‑management considerations for better meeting B2B customization demands.

Data Platform · SaaS · cloud computing
0 likes · 21 min read
ShiZhen AI
Mar 1, 2023 · Cloud Native

Why We Chose Kafka for Our Open‑Source Real‑Time Streaming Platform

The article explains how market trends, data‑driven enterprise needs, and internal platform experience led Didi to build Know Streaming—a zero‑intrusion, plugin‑based real‑time streaming solution built on Kafka—to address scalability, operability, and community adoption challenges.

Cloud Native · Data Platform · Kafka
0 likes · 12 min read
DataFunTalk
Feb 27, 2023 · Big Data

Comprehensive Overview of Data Middle Platform Architecture and Its Core Frameworks

This article provides a detailed overview of data middle platform concepts, describing a decoupled six‑subsystem architecture—including storage, collection, processing, governance, security, and operation frameworks—while illustrating typical enterprise implementations, industry‑specific solutions, and best‑practice considerations for building scalable, secure, and value‑driven data platforms.

Big Data · Data Governance · Data Integration
0 likes · 25 min read
DataFunTalk
Feb 26, 2023 · Big Data

Design, Optimization, and Use Cases of Data Lineage in ByteDance's DataLeap Platform

This article presents an in‑depth overview of DataLeap's data lineage capabilities, covering the challenges, multi‑layer model design, implementation with Apache Atlas and JanusGraph, performance optimizations, diverse use cases across asset, development, governance and security domains, and future trends for lineage technology.

Apache Atlas · Big Data · Data Governance
0 likes · 19 min read
StarRing Big Data Open Lab
Feb 17, 2023 · Big Data

Inside Xinghuan Tech’s Next‑Gen Big Data 3.0 Architecture: Unified, Cloud‑Native, Real‑Time

This article details Xinghuan Technology’s evolution from 2013 to the present, describing its self‑developed Big Data 3.0 stack—including a unified data platform, SQL‑centric development, cloud‑native resource scheduling, distributed storage managed by Raft, DAG‑based compute engines, and real‑time stream processing—while highlighting key milestones and design principles that differentiate it from traditional Hadoop‑based solutions.

Data Platform · Real-time Processing · SQL Optimizer
0 likes · 19 min read
DataFunSummit
Feb 17, 2023 · Big Data

Data Governance Practices and Platform Construction with Alibaba DataWorks

Alibaba’s DataWorks team shares extensive experiences in building and operating a large‑scale data platform, covering data governance across stages—from data stability and quality to security, cost control, and organizational culture—illustrating how systematic practices and tools drive efficiency, reliability, and value for enterprises.

Big Data · Cost Optimization · Data Governance
0 likes · 55 min read
Data Thinking Notes
Feb 14, 2023 · Big Data

How Cloud Music Turned 60k Tables into Valuable Data Assets

This article details Cloud Music's year‑long data assetization journey, covering the background, practical achievements, governance methods, and future roadmap for turning massive data warehouses into high‑value, well‑governed assets that drive cost reduction and business insight.

Big Data · Data Governance · Data Platform
0 likes · 10 min read
Kuaishou Big Data
Feb 14, 2023 · Big Data

How OAX Revolutionizes Open Analysis in Kuaishou’s Data Platform

This article introduces OAX (Open Analysis eXpressions), Kuaishou’s unified open‑analysis language, detailing its design background, guiding principles, five‑layer language model, syntax—including data types, compute capabilities and five analysis elements—its access protocol, runtime architecture, optimization steps, and the benefits it brings to the company’s big‑data analytics ecosystem.

Analytics · Data Platform · OAX
0 likes · 19 min read
Data Thinking Notes
Feb 6, 2023 · Big Data

How Tencent Tackles Data Governance Challenges with the WeData Platform

This article outlines Tencent's data governance challenges, its internal three‑stage practice, detailed case studies such as Tencent News and PCG cost governance, and introduces the WeData platform's architecture and tools for standardization, quality, security, and metadata management, concluding with a Q&A session.

Big Data · Data Governance · Data Platform
0 likes · 17 min read
DataFunSummit
Jan 16, 2023 · Big Data

Building an O2O Industry Data Platform: From Monitoring to Diagnosis

This article shares practical insights on constructing an O2O industry data platform, detailing user classification, business pain points, and a three‑step strategy—monitoring, analysis, and diagnosis—to extract core metrics, implement tailored reporting, conduct operational and pricing analyses, and drive data‑driven product improvements.

Analysis · Business Intelligence · Data Platform
0 likes · 15 min read
DataFunTalk
Jan 14, 2023 · Big Data

Lean Data Methodology: A Visual Guide to Data‑Driven Digital Transformation

Lean Data Methodology combines lean thinking, design thinking, the Cynefin framework, and agile principles to create a data‑driven digital transformation system that defines value, eliminates waste, and equips enterprises with strategic, product, governance, collaboration, platform, and cultural capabilities for building lean digital enterprises.

Data Governance · Data Platform · Digital Transformation
0 likes · 11 min read
DataFunSummit
Jan 7, 2023 · Big Data

Redefining the Customer Data Platform (CDP) for New Energy Vehicle Companies

This article explores why the automotive industry's shift to new energy vehicles necessitates a redefinition of the Customer Data Platform (CDP), detailing the changing traffic structure, varied departmental demands, CDP typologies, implementation strategies, and the benefits of a unified, extensible CDP architecture for marketing, sales, and after‑sales.

Big Data · CDP · Data Platform
0 likes · 13 min read
StarRing Big Data Open Lab
Jan 6, 2023 · Big Data

Designing a Unified Enterprise Data Storage and Compute Platform

This article explains how enterprises can build a unified data storage and compute foundation, covering strategic goals, functional and architectural requirements, and the layered design of business support, storage‑compute, and resource management to enable scalable, secure, and high‑performance data platforms.

Compute, Data Platform, enterprise architecture
0 likes · 15 min read
DataFunTalk
Jan 6, 2023 · Big Data

ZhongAn's Hundred‑Billion‑Scale Data Integration Service: Architecture, Business Support, and Evolution

This article presents the architecture and practical experience of ZhongAn's hundred‑billion‑scale data integration service, covering common integration technologies, business support scenarios for offline and real‑time data, technical challenges, evolution from single‑machine to service‑oriented designs, and future directions using Flink and DataX.

Data Integration, Data Platform, DataX
0 likes · 31 min read
Data Thinking Notes
Jan 5, 2023 · Big Data

Why Data Lakes Are Outshining Traditional Data Warehouses: A Deep Dive

This comprehensive guide explains the evolution from traditional data warehouses to modern data lakes, detailing concepts, architectures, differences, implementation steps, and real‑world case studies, while also comparing major cloud providers' solutions and highlighting how data platforms support digital transformation and analytics.

Analytics, Big Data, Data Lake
0 likes · 97 min read
DataFunSummit
Jan 4, 2023 · Big Data

Data Intelligence Expert Interview – Maturity, Trends, and Practices of Data Middle Platforms

The interview gathers insights from data‑platform experts on the maturity stages, technology trends, implementation methodologies, open‑source ecosystems, system architectures, governance, security, and assessment criteria of modern data middle platforms, offering a comprehensive guide for practitioners.

Big Data, Data Governance, Data Observability
0 likes · 28 min read
Data Thinking Notes
Jan 3, 2023 · Big Data

How a Scalable Data Service Platform Transforms Big Data into APIs

This article outlines the design and implementation of a unified data service platform that standardizes data access, accelerates model processing, provides flexible API construction, and ensures high availability through gateway, caching, and monitoring, ultimately reducing cost and improving efficiency for both C‑end and B‑end applications.

Big Data, Data Platform, Service Architecture
0 likes · 25 min read
vivo Internet Technology
Dec 28, 2022 · Big Data

Vivo Real-Time Computing Platform: Architecture, Practices, and Applications

The Vivo Real‑Time Computing Platform, built on Apache Flink, delivers a one‑stop data construction and governance solution that processes up to 5 PB daily, offering high‑availability submission and control services, robust stability, rich SQL usability, efficient Kubernetes deployment, strong security, and supports real‑time warehouses and short‑video recommendation, while targeting future elastic scaling and lake‑house unification.

Apache Flink, Data Platform, Real‑Time Computing
0 likes · 18 min read
High Availability Architecture
Dec 27, 2022 · Big Data

Design and Implementation of a Data Service Middle Platform for Scalable Data SaaS

This article presents a comprehensive overview of a data service middle platform, detailing its background, architectural design, data construction, model definition and acceleration, API creation, query processing, service gateway, common solutions for standardization and cost reduction, as well as achieved results and future plans.

API, Big Data, Data Platform
0 likes · 22 min read
DevOpsClub
Dec 19, 2022 · R&D Management

How ByteDance’s DevMind Platform Transforms R&D Efficiency Measurement

The article details ByteDance’s DevMind platform, describing its origins, the challenges of measuring software development efficiency, the collaborative and value‑driving “flywheel” concepts, the architectural design across data lifecycle and query engine layers, and the principles and future roadmap for scaling R&D performance.

Data Platform, DevMind, R&D efficiency
0 likes · 29 min read
DataFunSummit
Dec 13, 2022 · Big Data

Introducing the Star River Big Data Development Platform: Architecture, Core Capabilities, and Future Plans

This article presents an in‑depth overview of 58.com’s self‑built Star River big data platform, covering its evolution across three eras, resource management hierarchy, core technical capabilities such as metadata services, data maps and lineage, governance practices, and the roadmap for further enhancements.

Big Data, Data Governance, Data Platform
0 likes · 14 min read
DataFunSummit
Dec 10, 2022 · Big Data

Applying Apache Spark in Guanyuan Self-Service Analytics System: Architecture, Challenges, and Solutions

This presentation details how Guanyuan Data leverages Apache Spark within its self‑service analytics platform, covering product features, flexible deployment, resource isolation, performance challenges, architectural solutions, and future cloud‑native enhancements to support thousands of users and massive query workloads.

Apache Spark, Big Data, Data Platform
0 likes · 14 min read
转转QA
Dec 8, 2022 · Backend Development

Applying AOP to Reduce Coupling in a Data Construction Platform

This article explains how Aspect‑Oriented Programming (AOP) was introduced into a data construction platform to address high development effort, strong business coupling, and maintenance difficulty by isolating cross‑cutting concerns such as logging, thereby improving modularity, development speed, and code maintainability.

Aspect Oriented Programming, Data Platform, Software Architecture
0 likes · 5 min read
DataFunSummit
Dec 7, 2022 · Big Data

Modern Data Governance at NetEase DataFan: Evolution, Challenges, and Solutions

This article details NetEase DataFan's journey in building a full‑stack big‑data platform, explains the design‑first data‑mid‑platform approach, analyzes cost, quality, and security problems encountered, and presents the modern data‑governance framework that integrates development, governance, and consumption into a closed loop.

Big Data, Cost Management, Data Governance
0 likes · 22 min read
DataFunTalk
Dec 5, 2022 · Big Data

Data Governance Practices at ZTO Express: Challenges, Solutions, and Future Plans

The article details ZTO Express's data governance journey, covering company background, drivers and goals, challenges such as data asset inventory, standardization, quality, and modeling, and outlines their multi‑layered governance framework, practical implementations in data quality, model and metadata, and future plans.

Data Platform, Logistics, metadata
0 likes · 17 min read
DataFunSummit
Dec 1, 2022 · Big Data

City Data Acquisition Platform: Architecture, Core Technologies, and Incremental Synchronization Strategies

This article presents an overview of a smart city unified perception platform, detailing its modular architecture, solutions for multi-source heterogeneity, incremental synchronization strategies, and real-time API data collection, while discussing extensibility and practical implementation considerations.

Big Data, Data Platform, Incremental Sync
0 likes · 20 min read
ITPUB
Nov 25, 2022 · Big Data

How Berserker’s Big Data Platform Solved Scheduling, State and Scaling Challenges

This article details the architecture, evolution, and technical solutions of the Berserker big‑data platform—including component design, state‑management problems, release strategies, two‑phase commit, RPC handling, routing, message queuing, containerized execution, dependency model redesign, and future roadmap—demonstrating how the system achieved high availability, low latency, and scalable operations.

Data Platform, Docker, Kubernetes
0 likes · 19 min read
Huolala Tech
Nov 24, 2022 · Big Data

How Huolala Built Its Own Self-Service Data Analysis Platform from Scratch

This article details Huolala's journey from identifying the need for a fast, secure, and scalable BI solution to designing and implementing a self‑service data analysis platform that integrates diverse data sources, offers intuitive visualisation, and addresses real‑world operational challenges.

BI, Data Platform, Product Development
0 likes · 13 min read
DataFunSummit
Nov 22, 2022 · Big Data

BI Platform Practice at Xiaomi: Evolution, Architecture, and Future Directions

This article details Xiaomi's multi‑year journey in building a group‑wide Business Intelligence platform, covering its historical evolution, technical challenges in performance, modeling, visualization and permissions, the current four‑layer architecture, and future plans to make the platform more business‑centric and simpler.

Analytics, BI, Big Data
0 likes · 15 min read
DataFunTalk
Nov 21, 2022 · Big Data

Building a Unified Data Analytics Platform at TCL Using StarRocks

The article describes how TCL leveraged StarRocks to create a unified data analytics platform, detailing the company’s background, OLAP evolution, typical StarRocks use cases such as real‑time dashboards, HR analytics, and email alerts, and outlines future plans for further integration and performance improvements.

Case Study, Data Platform, OLAP
0 likes · 10 min read
DataFunTalk
Nov 20, 2022 · Big Data

Evolution and Practices of Modern Data Governance at NetEase DataFun

This article outlines NetEase DataFun's journey in building a full‑stack big data platform, describing the four‑stage development of data governance—from designing a unified data middle‑platform to addressing cost, quality, and security challenges—and presents the principles of modern data governance that integrate development, consumption, and continuous improvement.

Data Platform, data security
0 likes · 23 min read
Data Thinking Notes
Nov 10, 2022 · Big Data

Building Kuaishou’s Scalable Metadata Management Platform for Big Data

This article details Kuaishou’s evolution of its metadata management platform—from early Hive‑centric beginnings to a unified 2.0 architecture and a forward‑looking 3.0 vision—highlighting challenges, key technologies, and how metadata drives data production, consumption, governance, and cost optimization across the big‑data middle platform.

Data Governance, Data Platform, metadata lineage
0 likes · 17 min read
政采云技术
Nov 8, 2022 · Industry Insights

How Small Big‑Data Frontend Teams Can Thrive: A Survival Guide

This guide outlines the essential concepts of big data, the roles of a front‑end data team, practical workflow steps, platform architecture, industry benchmarks, and actionable strategies for small teams to improve efficiency, visualization capabilities, and digital operations.

Big Data, Data Platform, Data visualization
0 likes · 14 min read
Alibaba Cloud Big Data AI Platform
Nov 5, 2022 · Big Data

How Alibaba Cloud’s Integrated Big Data & AI Platform Is Evolving

The talk outlines the evolution of Alibaba Cloud’s integrated big data and AI platform, highlighting the three‑V fundamentals, the AI‑inspired usability‑scale‑efficiency triangle, open‑source trends, and how the platform unifies offline and real‑time analytics while simplifying governance and development.

AI integration, Alibaba Cloud, Data Platform
0 likes · 9 min read
DataFunTalk
Nov 3, 2022 · Big Data

How Meituan Food Service SaaS Built a Data Middle Platform on StarRocks

This article describes how Meituan Food Service SaaS built a high‑quality, large‑scale data middle platform using StarRocks, covering business overview, technical selection, multi‑layer architecture, virtual views, intelligent tiered querying, multi‑active hot standby, and the performance gains achieved.

Data Platform, Meituan, StarRocks
0 likes · 17 min read
DataFunTalk
Nov 2, 2022 · Big Data

Tencent Oula Data Governance Platform: Architecture, Practices, and Solutions

Tencent's Oula platform, launched in 2019, provides a DataOps‑driven, end‑to‑end data governance solution covering data discovery, asset factory, metric platform, and governance engine, and the talk details its construction goals, data development governance, unified metric system, data map, and Q&A on asset health and lineage.

Data Platform, DataOps, metadata
0 likes · 17 min read
DataFunSummit
Oct 12, 2022 · Big Data

Practical Application of Kyuubi in Xiaomi’s Big Data Platform

This article details how Xiaomi integrated the open‑source Kyuubi SQL gateway into its evolving big‑data platform, describing the challenges of multiple SQL services, the architectural redesign for a unified, high‑availability service, performance gains, new features such as engine pooling and Z‑ordering, and future roadmap plans.

Big Data, Data Platform, Kyuubi
0 likes · 15 min read

Solving Real‑World Data Quality Challenges with X‑Select’s DQC Platform

This article explains how X‑Select’s Data Quality Platform (DQC) addresses common data quality problems in large‑scale data development by defining six quality dimensions, leveraging open‑source solutions such as Apache Griffin and Qualitis, and implementing rule definition, execution, alerting, and workflow interruption within a Spark‑based architecture.

Big Data, Data Platform, Data Quality
0 likes · 15 min read
DataFunTalk
Sep 22, 2022 · Big Data

Architecture and Practices of Zhihu DMP System Based on Doris

This article presents a comprehensive overview of Zhihu's Data Management Platform (DMP), covering its business background, three core business modes, detailed architecture, offline and real‑time data pipelines, feature storage design, performance optimization techniques, and future iteration directions.

DMP, Data Platform, doris
0 likes · 14 min read
DataFunSummit
Sep 12, 2022 · Big Data

DataFun Summit 2022: Data Integration Platform – SeaTunnel V2 Architecture Evolution and DataOps Practices

The DataFun Summit 2022, held on September 17, gathered leading experts from Baiji Whale Open Source, NetEase, Tapdata, and Alibaba Cloud to share deep technical insights on SeaTunnel V2 architecture, DataOps implementations, and open‑source big‑data studio tools, offering attendees practical guidance for modern data platforms.

Apache, Big Data, Data Platform
0 likes · 8 min read
DataFunSummit
Sep 2, 2022 · Big Data

ZhongAn Insurance Data Platform: Digital Transformation, 4633 Framework, and Real‑time Data Warehouse with StarRocks

This article details ZhongAn Insurance's digital transformation through its 4633 data‑centric framework, the architecture of its JiZhi data platform, the challenges of its original ClickHouse‑based real‑time warehouse, and how migrating to StarRocks improved performance, scalability, and operational efficiency across advertising and insurance use cases.

Big Data, Data Platform, Digital Transformation
0 likes · 13 min read
Aikesheng Open Source Community
Aug 31, 2022 · Big Data

Tencent's Big Data Construction: Philosophy, Architecture Evolution, and Open‑Source Strategy

The article introduces Tencent's big‑data platform philosophy and overall architecture, detailing three generations of evolution from offline Hadoop‑based processing to real‑time Spark/Storm integration and finally AI‑driven machine‑learning platforms, while also highlighting the team, book publication, and a related giveaway event.

Big Data, Cloud Native, Data Platform
0 likes · 12 min read
DataFunSummit
Aug 12, 2022 · Big Data

JD's Big Data Cross‑Domain and Hierarchical Storage Practices

This article details the cross‑domain and hierarchical storage solutions of JD's big‑data platform, describing the challenges of multi‑datacenter data synchronization, the architecture of its storage layer, the implemented asynchronous and synchronous data flows, topology management, metadata tagging, and performance‑enhancing techniques for efficient, disaster‑resilient data handling.

Data Platform, Hierarchical Storage, cross-domain storage
0 likes · 11 min read
DataFunSummit
Aug 11, 2022 · Big Data

Huya Data Platform: Cost Reduction and SLA Strategies

This article presents Huya's big data platform evolution, detailing cost‑saving measures, SLA practices, multi‑datacenter architecture, containerized resources, metadata‑driven intelligence, and future directions such as hybrid‑engine materialized views to improve efficiency and service reliability.

Cost Optimization, Data Platform, SLA
0 likes · 15 min read
Meituan Technology Team
Aug 4, 2022 · Big Data

Optimizing Kafka for Large-Scale Data Platforms at Meituan

The article details Meituan's massive Kafka deployment—over 15,000 machines handling more than 30 PB of daily data—its performance and management challenges, and the comprehensive application‑layer, system‑layer, and hybrid‑layer optimizations Meituan implemented to reduce read/write latency and improve large‑scale cluster reliability.

Cluster Management, Data Platform, Full‑Link Monitoring
0 likes · 25 min read
DataFunTalk
Aug 4, 2022 · Big Data

Kyuubi Application Practice on Xiaomi's Big Data Platform

This talk presents the end‑to‑end deployment of Kyuubi as a unified, high‑availability SQL gateway on Xiaomi’s big‑data platform, covering its integration, architecture upgrades, multi‑engine support, performance gains, operational improvements, and future roadmap.

Data Platform, Kyuubi, SQL Gateway
0 likes · 16 min read
Ctrip Technology
Jul 7, 2022 · Big Data

Design and Implementation of a Unified Data Service Platform for Reducing Development Cost and Enhancing Efficiency

The article describes how Ctrip built a unified data service platform that standardizes API development, leverages multiple storage engines, introduces token‑based security, Sentinel rate‑limiting, caching, and automatic contract generation to dramatically cut development cycles and improve reliability for big‑data workloads.

API, Big Data, Data Platform
0 likes · 10 min read

Data Indicator Testing Platform and Quality Assurance

The article presents an Indicator Testing Platform that automates metric validation—covering timeliness, completeness, accuracy, and consistency—through model‑level comparison, regression, online monitoring, and TDD‑style testing, dramatically reducing manual effort and enabling rapid detection and correction of data quality issues across thousands of business indicators.

Automated Testing, Data Platform, Data Quality
0 likes · 10 min read