Tagged articles
3672 articles
Instant Consumer Technology Team
Oct 28, 2025 · Artificial Intelligence

Can Data Virtualization Deliver Millisecond Real‑Time Features Across Stores?

This article shares a three‑year journey of building a data‑virtualization‑based, multi‑environment feature management framework for real‑time risk decision platforms, detailing challenges like heterogeneous storage, cold‑start, and operational stability, and presenting a unified architecture that decouples physical storage from business logic.

Big Data · Real-time analytics · data virtualization
0 likes · 16 min read
DataFunSummit
Oct 28, 2025 · Fundamentals

Why Unstructured Data Management Is the Next Frontier for Enterprises

This article explores the evolution, current state, and challenges of enterprise unstructured data management, reviews case studies from traditional firms, Huawei and Ant Group, proposes an ECM‑based reference framework, compares it with structured data governance, and outlines future integration strategies with AI and unified data platforms.

AI · Big Data · Data Governance
0 likes · 28 min read
Alibaba Cloud Big Data AI Platform
Oct 28, 2025 · Big Data

How Huolala Scaled Elasticsearch to 40B Records with Serverless Cloud Architecture

Huolala, a leading smart logistics platform serving over 14 markets and millions of users, detailed its massive Elasticsearch deployment (over 15,000 CPU cores, 40 billion records, and 4 PB of data), highlighting a multi‑AZ design, serverless migration, and a comprehensive management platform that boosted performance, reduced costs, and enabled AI‑driven services.

AI search · Big Data · Elasticsearch
0 likes · 10 min read
StarRocks
Oct 28, 2025 · Databases

How Cisco Migrated from Pinot to StarRocks and Boosted Query Performance by Up to 70%

This article details Cisco Webex's migration from a complex Pinot‑Trino OLAP stack to StarRocks, covering the challenges of the legacy system, the step‑by‑step migration process—including storage, compute, and SQL dialect transformation—and the resulting performance gains, cost reductions, and operational improvements.

Big Data · OLAP · Pinot
0 likes · 23 min read
Alibaba Cloud Big Data AI Platform
Oct 24, 2025 · Big Data

How Leapmotor Scaled to 1M Cars with a Real‑Time Flink Data Platform

Leapmotor’s rapid growth to one million production cars drove a shift from daily batch data to minute‑level real‑time analytics, prompting the adoption of Flink as the core engine of a multi‑layered big‑data platform that handles massive IoT signals, supports fault diagnosis, and integrates batch and streaming workloads on the cloud.

Big Data · Data Platform · Flink
0 likes · 13 min read
Big Data Tech Team
Oct 23, 2025 · Industry Insights

How to Build a Reusable, Well‑Designed Data Warehouse Model

This article analyzes why analysts and data engineers clash over non‑reusable data models, presents metrics such as cross‑layer reference rate and model reuse coefficient, and outlines a step‑by‑step framework—including ODS takeover, subject‑domain mapping, dimension consistency, fact‑table integration, development best practices, and tool support—to transform siloed warehouses into a shared data‑platform.

Big Data · Data Governance · Data Platform
0 likes · 15 min read
DataFunSummit
Oct 22, 2025 · Big Data

How Douyin’s Data Asset Platform Revolutionizes Big Data Lineage

This article introduces Douyin Group’s comprehensive data asset management platform, explains why it emphasizes data assets over raw metadata, outlines its full‑linkage lineage capabilities, and presents practical insights on building, applying, and future‑proofing big data lineage within complex enterprise environments.

Big Data · Data Asset Management · Data Lineage
0 likes · 5 min read
Architect Chen
Oct 22, 2025 · Big Data

How to Eliminate Kafka Message Backlog with Practical Optimizations

This guide presents concrete techniques for improving Kafka consumer and producer performance, scaling clusters, tuning broker settings, and designing asynchronous buffering layers to prevent message accumulation and boost overall throughput.

Big Data · Kafka · Performance Optimization
0 likes · 5 min read
Raymond Ops
Oct 21, 2025 · Big Data

Deep Dive into Kafka Architecture: Topics, Partitions, and Reliable Data Pipelines

This article explains Kafka’s core concepts—including topics, partitions, log segmentation, indexing, and acknowledgment mechanisms—then provides a step‑by‑step guide to deploy a Zookeeper‑Kafka cluster integrated with Filebeat, Logstash, and the ELK stack for reliable log collection and analysis.

Big Data · ELK · Filebeat
0 likes · 11 min read
Selected Java Interview Questions
Oct 21, 2025 · Big Data

How to Sync Massive MySQL Datasets Efficiently with DataX

This guide walks through the challenges of synchronizing tens of millions of records between heterogeneous MySQL databases, explains why traditional mysqldump or file‑based methods fail, and provides a step‑by‑step tutorial on installing, configuring, and using Alibaba's open‑source DataX tool for both full and incremental data synchronization.

Big Data · DataX · ETL
0 likes · 15 min read
DataFunSummit
Oct 19, 2025 · Big Data

How Apache Gravitino and OpenLineage Transform Data Governance in the AI Era

This article explains how the rapid rise of AI and large‑model technologies is driving a paradigm shift in data governance toward intelligent, automated, and real‑time collaboration, outlines the challenges of multi‑cloud environments, and demonstrates how Apache Gravitino and OpenLineage provide a unified metadata and lineage solution that improves data quality, compliance, and business agility.

Apache Gravitino · Big Data · Data Lineage
0 likes · 12 min read
DataFunTalk
Oct 19, 2025 · Big Data

How Zhihu’s Big Data Strategy Cuts Costs and Boosts Efficiency

This article outlines Zhihu’s big‑data cost‑reduction journey, covering its background, the FinOps‑driven financial management system, technical strategies for lowering expenses, and a forward‑looking summary of challenges and sustainable efficiency gains within the organization and industry context.

Big Data · Data Platform · FinOps
0 likes · 4 min read
DataFunSummit
Oct 18, 2025 · Big Data

How Zhihu’s Big Data FinOps Cuts Costs and Boosts Efficiency

This article outlines Zhihu’s practical use of big‑data FinOps, describing its hybrid‑cloud architecture, the challenges of multi‑vendor cost management, and how a systematic billing system launched in 2022 drives sustainable cost reduction across the organization.

Big Data · Cost reduction · Data Platform
0 likes · 4 min read
Huolala Tech
Oct 17, 2025 · Big Data

How HuoLala Accelerated User Profiling 30× Faster with Apache Doris

This article details how HuoLala built a high‑performance user profiling platform on Apache Doris, redesigning data models, leveraging bitmap storage, and applying query‑level optimizations to achieve up to 30‑fold speed gains, lower memory usage, and scalable real‑time analytics.

Apache Doris · Big Data · Bitmap
0 likes · 17 min read
StarRocks
Oct 14, 2025 · Big Data

How Ctrip Scaled UBT Analytics by Migrating from ClickHouse to StarRocks

Ctrip's User Behavior Tracking (UBT) system, handling 30 TB of daily data, moved from ClickHouse to StarRocks' compute‑storage separated architecture, cutting average query latency from 1.4 seconds to 203 ms, halving storage, reducing nodes from 50 to 40, and boosting write throughput to 3 million rows per second.

Big Data · ClickHouse · Data Migration
0 likes · 15 min read
DataFunSummit
Oct 14, 2025 · Big Data

How Douyin’s Data Asset Platform Redefines Big Data Lineage

This article introduces Douyin Group’s one‑stop Data Asset Management Platform, explains why the company focuses on data assets rather than raw metadata, and details the evolution, architecture, applications, and future outlook of its comprehensive big‑data lineage system.

Big Data · Data Asset Management · Data Governance
0 likes · 5 min read
Baidu Geek Talk
Oct 13, 2025 · Big Data

How Baidu Scaled Its Data Warehouse to Handle Billions of PVs and Petabytes

This article details Baidu APP's massive data‑warehouse overhaul, describing the two‑step strategy that stabilized log cleaning, modernized the ETL framework, introduced wide‑table architectures, and implemented tiered storage to dramatically improve processing speed, reliability, and cost efficiency for petabyte‑scale workloads.

Big Data · Data Warehouse · ETL
0 likes · 25 min read
DataFunSummit
Oct 11, 2025 · Big Data

What Small Banks Can Learn from Cutting-Edge Data Governance Practices

This article shares a data‑governance roadmap for small and medium banks, covering industry pain points, high‑quality data sets, a three‑step governance path, data standards, metadata management, master‑data strategy, business data modeling, a hybrid Greenplum‑Hadoop platform, quality monitoring, and a maturity assessment framework.

Banking · Big Data · Data Architecture
0 likes · 21 min read
DataFunTalk
Oct 8, 2025 · Big Data

How ByteHouse Cuts Data Warehouse Costs: Tackling Explicit and Implicit Challenges

As data volumes explode, enterprises struggle with the high hardware, performance, operational, and migration costs of traditional OLAP warehouses, but ByteHouse’s cloud‑native architecture offers a cost‑effective, high‑performance solution that dramatically reduces both explicit and hidden expenses.

Big Data · ByteHouse · Cost reduction
0 likes · 6 min read
DataFunTalk
Oct 6, 2025 · Big Data

What Ant Group Learned: 5 Pillars of Effective Data Governance

Ant Group shares its practical experience in big data governance, outlining five key focus areas—architecture, security, compliance, quality, and value—through four structured sections and detailed discussions on data quality and storage governance, while also exploring future challenges and the economics of data.

Ant Group · Big Data · Data Architecture
0 likes · 4 min read
DataFunSummit
Oct 4, 2025 · Operations

How Zhihu Leverages FinOps and Mixed‑Cloud Architecture to Slash Costs

This article explains how Zhihu’s big‑data platform applies FinOps principles and a mixed‑cloud strategy to overcome multi‑vendor complexity, organizational challenges, and sustainability issues, ultimately achieving continuous cost reduction and efficiency gains.

Big Data · Cost reduction · FinOps
0 likes · 4 min read
ITPUB
Oct 3, 2025 · Big Data

How Qunar Travel Cut 2000 CPU Cores by Optimizing Kafka Production

This case study details how Qunar Travel's engineering team analyzed Kafka production bottlenecks during peak traffic, added targeted monitoring, tuned thread and batch parameters, and validated the changes through canary (gray‑release) tests, ultimately saving about 2000 CPU cores across three clusters while reducing request volume and improving network and disk utilization.

Big Data · CPU Savings · Kafka
0 likes · 14 min read
Amap Tech
Sep 29, 2025 · Artificial Intelligence

How Gaode’s AI‑Powered Route Planner Saves Money and Time During Holiday Travel

Gaode Map introduces three AI‑driven routing features—toll‑free exit recommendation, global faster‑route optimization, and high‑frequency rapid‑route detection—that combine massive traffic data, multi‑objective algorithms, and real‑time prediction to help users save both toll costs and travel time during the National Day travel peak.

AI · Big Data · multi-objective
0 likes · 8 min read
DataFunSummit
Sep 28, 2025 · Big Data

How ByteHouse Cuts Data Warehouse Costs: Tackling Hidden and Visible Expenses

This article examines the exploding data volumes that pressure modern enterprises, outlines the explicit (hardware, performance) and implicit (operations, migration) costs of operating an OLAP‑based data warehouse, and explains how ByteHouse’s cloud‑native architecture reduces both cost categories while delivering real‑time analytics.

Big Data · ByteHouse · Data Warehouse
0 likes · 5 min read
DataFunTalk
Sep 27, 2025 · Artificial Intelligence

How Bilibili Uses LLMs to Diagnose Big Data Platform Issues

This article explains how Bilibili leverages a large‑language‑model‑driven assistant to diagnose and resolve failures and slowdowns in its massive big‑data platform, detailing the platform’s five‑layer architecture, common task issues, and the need for intelligent troubleshooting tools.

AI Assistant · Big Data · Bilibili
0 likes · 5 min read
Huolala Tech
Sep 26, 2025 · Big Data

How We Migrated 40 PB of Hive Data Across Clouds with Zero Downtime

This article details the end‑to‑end design, challenges, and implementation of a cross‑cloud migration of over 200 k Hive tables and nearly 40 PB of data using the self‑developed Kirk service, covering architecture, verification steps, and lessons learned to achieve 100 % data consistency without impacting production services.

Big Data · Data Consistency · Data Migration
0 likes · 20 min read
DataFunTalk
Sep 25, 2025 · Big Data

How Tencent Cloud’s AI‑Ready Data Platform Redefines Big Data for AI

This article outlines the challenges of high‑quality data for AI, introduces Tencent Cloud’s AI‑Ready data platform with three core capabilities—DIaaS, Setats, and ES‑based knowledge search—covers the end‑to‑end WeData integration, intelligent agents for automation, and showcases ecosystem partnerships driving industry‑wide intelligent transformation.

AI · Big Data · Data Platform
0 likes · 14 min read
AI2ML AI to Machine Learning
Sep 22, 2025 · Big Data

Why AI‑Native Big Data Platforms Are About to Explode

The article examines how large‑model limitations in accuracy, explainability, and stability have stalled decision‑support use, prompting industry leaders to champion AI‑Ready data infrastructures, Data 4.0 concepts, and AI‑generated service code as the next wave of AI‑native big data platforms.

AI · AI-native · Big Data
0 likes · 6 min read
NiuNiu MaTe
Sep 22, 2025 · Big Data

How to De‑duplicate 4 Billion QQ Numbers with Only 1 GB RAM

Learn four practical techniques (simple sorting, hashmap deduplication, external merge sort, and bitmap bit‑set optimization) for removing duplicate QQ numbers from a 4‑billion‑record file within a strict 1 GB memory limit, including how to cope with a tighter 100 MB constraint.

Big Data · Bitmap · algorithm
0 likes · 9 min read
DataFunTalk
Sep 22, 2025 · Big Data

How Kuaishou Scales Intelligent BI: Insights from Its Data Platform

This article outlines Kuaishou's Data Platform team's mission to boost data‑driven decision making through advanced compute engines, high‑performance services, and AI‑enhanced BI, detailing its architecture, challenges, solutions, and future outlook for large‑scale intelligent analytics.

AI · Analytics · BI
0 likes · 6 min read
DataFunSummit
Sep 21, 2025 · Big Data

Breaking the CPU Wall: BIGO’s Gluten Engine Accelerates Spark and Flink

When big‑data workloads hit the CPU wall, BIGO’s adoption of the open‑source Gluten project delivers native‑engine execution for Spark and a roadmap for Flink, achieving up to 30% end‑to‑end speedup, 50% memory savings, and a scalable, cost‑effective data processing platform.

Big Data · Flink · Gluten
0 likes · 16 min read
Alibaba Cloud Big Data AI Platform
Sep 19, 2025 · Big Data

How MMS Powered a 50 PB BigQuery‑to‑MaxCompute Migration for GoTerra

This article details GoTerra's massive six‑month, 50 PB migration from GCP BigQuery to Alibaba Cloud MaxCompute, covering project scope, technical challenges such as complex data types, partition strategies, and high‑speed requirements, and explaining how the MaxCompute Migration Service (MMS) solved them with innovative architecture, scheduling, and data‑reorder techniques.

Big Data · BigQuery · Data Migration
0 likes · 14 min read
DataFunTalk
Sep 19, 2025 · Big Data

How Kuaishou’s Data Platform Powers Intelligent BI with AI and Big Data

This article outlines how Kuaishou’s Data Platform Department enhances decision‑making efficiency by building advanced compute engines and high‑performance services, detailing the platform’s architecture, challenges of intelligent BI, AI‑driven solutions, and the end‑to‑end BI workflow from data ingestion to analysis.

Analytics · BI · Big Data
0 likes · 5 min read
Big Data Tech Team
Sep 17, 2025 · Big Data

How to Build a Scalable Tag System for Recommendation Engines

This article explains why a robust tag system is essential for recommendation and mining strategies, outlines the hierarchy of entity, concept, and theme tags, and provides practical principles, architecture, and step‑by‑step methods for constructing and managing tags in large‑scale data platforms.

Big Data · Data Architecture · data labeling
0 likes · 14 min read
Alibaba Cloud Big Data AI Platform
Sep 15, 2025 · Big Data

How a FinTech Firm Boosted Real‑Time Decision Making with StarRocks Data Warehouse

This case study details how Shuhe Technology, a leading fintech company, overcame data redundancy, low resource utilization, and slow reporting by adopting Alibaba Cloud EMR Serverless StarRocks for a unified, real‑time data warehouse, achieving standardized data pipelines, cost savings, and minute‑level decision latency.

Big Data · FinTech · StarRocks
0 likes · 8 min read
Data Party THU
Sep 14, 2025 · Big Data

How to Evaluate Battery Storage Health with Big Data and LightGBM

This report details a university big‑data project that builds a data‑driven framework for assessing lithium‑ion battery storage health, cleaning operational data, detecting abnormal cells with DBSCAN, and predicting SOC/SOH using LightGBM, while highlighting findings, limitations, and future improvements.

Big Data · DBSCAN · LightGBM
0 likes · 4 min read
Data Party THU
Sep 13, 2025 · Artificial Intelligence

How a Multi‑Agent Large Model Transforms Ecological Big‑Data Analysis

This report details a university project that built a flexible, high‑performance multi‑agent large‑model framework for ecological environment big‑data analysis, covering system architecture, individual agents, memory mechanisms, report generation, a FastAPI‑LangGraph backend, a React frontend, testing methodology, and future directions.

AI · Big Data · FastAPI
0 likes · 7 min read
Data Party THU
Sep 12, 2025 · Big Data

Key Lessons from Winning the 2025 China University Big Data Competition

The author shares a detailed account of their experience in the 2025 China University Big Data Competition, describing the team’s top national ranking, the shift from absolute stock price prediction to robust ranking learning, extensive feature engineering, and reflections on balancing technical ambition with real‑world constraints.

Big Data · Stock Prediction · data competition
0 likes · 5 min read
Data Party THU
Sep 11, 2025 · Big Data

How We Conquered the 2025 Chinese University Big Data Challenge: Financial Time‑Series Lessons

Our team "Stay Overnight" from Chongqing University of Posts and Telecommunications placed second nationally in the 2025 China University Computer Competition Big Data Challenge, navigating volatile financial data, shifting from time‑series to supervised learning, and emphasizing feature engineering to boost model performance.

Big Data · Model Selection · competition report
0 likes · 4 min read
360 Zhihui Cloud Developer
Sep 11, 2025 · Big Data

How Paimon Transforms Membership Data Warehousing: From Legacy Lambda to Real‑Time Lakehouse

This article examines the challenges of a legacy Lambda‑based membership data warehouse, introduces Apache Paimon’s lakehouse architecture and its key features, and showcases three real‑world implementations—partial‑update order wide tables, Bitmap‑based UV counting, and branch‑based data correction—while discussing benefits, remaining challenges, and future directions.

Big Data · Data Lake · Data Warehouse
0 likes · 29 min read
Alibaba Cloud Big Data AI Platform
Sep 10, 2025 · Big Data

Unlock Seamless BigQuery to MaxCompute Migration with dbt‑maxcompute

This article details the real‑world migration of Southeast Asian tech leader GoTerra from BigQuery to MaxCompute, showcasing how the open‑source dbt‑maxcompute adapter enables smooth ELT transitions, advanced incremental strategies, performance gains, ecosystem compatibility, and comprehensive best‑practice implementations for large‑scale data pipelines.

Big Data · Data Migration · ELT
0 likes · 13 min read
Architect Chen
Sep 10, 2025 · Big Data

How Kafka Achieves Million‑Message Throughput: Sequential Writes, Page Cache, Batching & Zero‑Copy

The article explains how Kafka attains high‑throughput performance by using sequential disk writes, leveraging the OS page cache, employing producer and consumer batching with configurable parameters, and utilizing zero‑copy sendfile to minimize CPU and memory overhead, enabling stable million‑message per second rates.

Batching · Big Data · High Throughput
0 likes · 5 min read
Data Party THU
Sep 8, 2025 · Big Data

What We Learned from the 2025 China University Big Data Competition

The article shares a top‑5 team's experience in the 2025 China University Big Data Challenge, detailing their roster, competition rules, four key technical insights on data pitfalls, model alignment, generalization, and leveraging SOTA models, plus reflections on the event's excellent support and collaborative atmosphere.

Big Data · feature engineering · model generalization
0 likes · 6 min read
Data Party THU
Sep 6, 2025 · Big Data

From Data Chaos to Predictive Insight: My Solo Journey in the 2025 Big Data Competition

A solo participant recounts the 2025 China University Computer Competition Big Data Challenge, covering data cleaning, feature engineering, and model building on historical prices for 300 stocks, and reflects on the challenges, lessons, and future directions in financial AI.

Big Data · competition · data engineering
0 likes · 4 min read
DataFunSummit
Sep 2, 2025 · Big Data

How Xiaomi Cuts Costs and Boosts Performance with Cloud‑Native Data Lake Architecture

Xiaomi’s engineers explain how they tackled data‑lake challenges—small files, metadata latency, and multi‑cloud costs—by combining compact storage, Gravitino‑based metadata governance, Iceberg and Paimon formats, and JuiceFS abstraction, achieving lower storage expenses, faster queries, and a roadmap toward intelligent, real‑time, multimodal lakehouses.

Big Data · Data Lake · Storage Optimization
0 likes · 14 min read
StarRocks
Sep 2, 2025 · Big Data

How StarRocks + Paimon Powered Real‑Time Analytics for Alibaba’s Flash Sale

Faced with billions of marketing events and minute‑level decision requirements during Taobao's flash‑sale campaign, the e‑commerce data team built a real‑time lakehouse using StarRocks and Paimon, leveraged asynchronous materialized views and RoaringBitmap deduplication, and achieved sub‑second query latency, massive cost savings, and stable high‑concurrency performance.

Big Data · Lakehouse · Materialized Views
0 likes · 26 min read
Baidu Geek Talk
Sep 1, 2025 · Big Data

How Baidu Netdisk Built a High‑Performance Real‑Time Engine with Flink

This article explains how Baidu Netdisk transitioned from Spark Streaming to a Flink‑based Tiangong real‑time computing engine, detailing the evolution, reasons for choosing Flink, architecture, configuration examples, business use cases, technical challenges, and future platform plans.

Baidu Netdisk · Big Data · Flink
0 likes · 16 min read
Instant Consumer Technology Team
Sep 1, 2025 · Frontend Development

What’s Hot in Frontend, AI, and Cloud This Week? Top Insights and Tools

This weekly tech roundup highlights Meituan’s dynamic container performance breakthrough, Huawei’s Mate X5 foldable adaptation, ByteDance’s Rspack 1.5 features, AI‑driven automation advances, MQTT and Crush terminal tools, Alibaba Cloud’s AI platform milestones, and practical guides for performance optimization and Chrome extension development.

AI · Big Data · DevTools
0 likes · 8 min read
DataFunTalk
Sep 1, 2025 · Big Data

How JD Retail Tackles Data Governance Challenges to Boost Efficiency

JD Retail outlines the growing data management challenges it faces—including asset discovery, architecture agility, development quality, and rising IT costs—and presents a comprehensive data governance framework that leverages standards, agile architecture, development isolation, and resource optimization to improve efficiency and reduce operational expenses.

Big Data · Data Governance · Data Management
0 likes · 7 min read
Architects' Tech Alliance
Aug 31, 2025 · Artificial Intelligence

Why the Last Decade Became the Golden Age of AI Chip Architecture

The article traces the evolution of AI hardware over the past ten years, outlining three key phases—from early chip limitations that sidelined neural networks, through CPU advances that still fell short, to the rise of GPUs and specialized AI chips that finally unlocked rapid AI deployment, while also highlighting the parallel impact of algorithmic breakthroughs and massive data growth.

AI hardware · Big Data · GPU
0 likes · 5 min read
Alibaba Cloud Big Data AI Platform
Aug 29, 2025 · Big Data

How MaxCompute Streaming Insert Revolutionized Real‑Time Data Migration from BigQuery

This article details how a leading Southeast Asian tech group migrated its real‑time write workloads from Google BigQuery to MaxCompute using MaxCompute Streaming Insert, covering architecture, core features, migration challenges, optimization strategies, business impact, and future enhancements.

Big Data · BigQuery Migration · MaxCompute
0 likes · 9 min read
Mike Chen's Internet Architecture
Aug 29, 2025 · Fundamentals

Understanding Distributed Storage: HDFS, CephFS, GlusterFS, and FastDFS Compared

This article compares four major distributed storage solutions—HDFS, CephFS, GlusterFS, and FastDFS—detailing their architectures, strengths, weaknesses, and ideal use cases for big‑data processing, cloud-native environments, and high‑concurrency file services, and how they fit into modern infrastructure strategies.

Big Data · CephFS · FastDFS
0 likes · 5 min read
Kuaishou Tech
Aug 28, 2025 · Big Data

Auron Joins Apache Incubator: High‑Performance Vectorized Engine Accelerates Big Data Workloads

The Auron project, originally the Blaze engine from Kuaishou, has entered the Apache Software Foundation incubator, offering a Rust‑based native vectorized execution engine that integrates with Spark, delivers over two‑fold performance gains on TPC‑DS benchmarks, and is supported by a growing open‑source community.

Apache Incubator · Auron · Big Data
0 likes · 6 min read
DataFunTalk
Aug 28, 2025 · Big Data

How JD Retail Tackles Data Governance Challenges to Boost Efficiency

JD Retail faces growing data volume, redundant models, and resource‑intensive storage, prompting a comprehensive data‑governance strategy that defines standards, streamlines architecture, isolates development, and optimizes compute and storage costs, ultimately enabling more efficient, secure, and agile data operations across the enterprise.

Big Data · Data Architecture · Data Governance
0 likes · 8 min read
DataFunTalk
Aug 27, 2025 · Big Data

How JD Retail Overcomes Data Governance Challenges to Boost Efficiency

JD Retail confronts growing data volume, redundant models, shared account risks, and rising storage costs, and responds with a comprehensive data governance framework that standardizes data, streamlines architecture, isolates development, and optimizes resources to achieve efficient, secure, and cost‑effective data operations.

Big Data · Data Architecture · Data Governance
0 likes · 8 min read
Alibaba Cloud Big Data AI Platform
Aug 26, 2025 · Big Data

How MaxCompute Evolves for Python & AI: From SDK to Native Distributed Engine

This article outlines MaxCompute's decade‑long evolution—from the early PyODPS SDK to the native Distributed Python Engine—highlights the challenges big‑data platforms face in the AI era, and showcases Data+AI solutions and real‑world case studies across multimodal processing, massive text deduplication, and autonomous‑driving data pipelines.

AI Functions · Big Data · Data+AI
0 likes · 15 min read
Big Data Tech Team
Aug 25, 2025 · Interview Experience

Essential Big Data Interview Questions for Data Warehouse Engineer Roles

A comprehensive list of interview topics covering self‑introduction, career moves, data‑warehouse design, team building, architecture comparisons, fact‑table classification, common dimensions, performance tuning, and data‑governance for aspiring big‑data engineers.

Big Data · Data Governance · Flink
0 likes · 4 min read
Alibaba Cloud Big Data AI Platform
Aug 19, 2025 · Big Data

Cut Shuffle Costs by 60% with MaxCompute’s Cluster Optimization Tool

MaxCompute’s new Cluster Optimization Recommendation analyzes 31 days of shuffle data to automatically suggest optimal hash clustering keys, dramatically cutting shuffle traffic and CU consumption for large jobs, while providing one‑click ALTER TABLE scripts and detailed benefit reports to boost big‑data processing efficiency.

Big Data · Cost reduction · Hash Clustering
0 likes · 8 min read
Big Data Technology & Architecture
Aug 18, 2025 · Fundamentals

5 Common Interview Pitfalls Uncovered from 16 Mock Sessions

After conducting 16 one‑on‑one mock interviews and debriefs, we identified five recurring issues—from lacking a holistic project view and poor expression to sloppy resume formatting, underutilizing large‑language models, and neglecting regular self‑review—that candidates should address to improve their interview performance.

Big Data · career advice · communication
0 likes · 6 min read
Alibaba Cloud Big Data AI Platform
Aug 13, 2025 · Big Data

How ODPS Evolved Over 15 Years into a Next‑Gen AI‑Ready Big Data Platform

This article chronicles ODPS's 15‑year journey from its exploratory beginnings to a modern, AI‑enabled big data platform, detailing its four development phases, architectural layers, SQL engine upgrades, real‑time processing, lakehouse integration, and the new Data+AI capabilities offered by MaxCompute and DataWorks.

AI integration · Big Data · Data Warehouse
0 likes · 12 min read
Big Data Technology Tribe
Aug 12, 2025 · Databases

Why Lakehouse Architecture Is Redefining Modern Data Platforms

This article explains the evolution from traditional data warehouses and data lakes to the unified Lakehouse architecture, detailing its design, benefits, challenges, and research directions for delivering high‑performance SQL and advanced analytics on open‑format storage.

Big Data · Data Lake · Data Warehouse
0 likes · 20 min read
ITPUB
Aug 10, 2025 · Databases

Why Did These Database Titans Fall? Lessons from 50 Years of DB Evolution

The article chronicles half a century of database history, analyzing the rise and collapse of systems like Informix, Sybase, FoxPro, HBase, and dBase, while examining how Oracle, Microsoft, and IBM are adapting to cloud and AI, and forecasting the forces reshaping the future of data storage.

AI · Big Data · Database History
0 likes · 9 min read
Sohu Smart Platform Tech Team
Aug 9, 2025 · Artificial Intelligence

How SimHash and Cosine Similarity Accelerate Large-Scale Text Deduplication

This article explains why traditional pairwise text comparison is impractical for massive news corpora, introduces cosine similarity and SimHash as efficient deduplication techniques, walks through their mathematical foundations, step‑by‑step implementation details, code examples, and discusses trade‑offs such as accuracy versus speed.

Big Data · Cosine Similarity · SimHash
0 likes · 12 min read
iQIYI Technical Product Team
Aug 7, 2025 · Big Data

Building a Low‑Latency, High‑Capacity Real‑Time Data Platform for Finance

Facing growing data demands in finance, we replaced two legacy synchronization pipelines with a unified, low‑latency architecture using BabelX Real‑Time, Flink CDC, Iceberg v2 and Paimon, achieving minute‑level data freshness, ten‑to‑thirty‑fold query speedups, reduced storage costs, and streamlined schema management across multiple business units.

Big Data · Flink · Iceberg
0 likes · 12 min read
Alibaba Cloud Big Data AI Platform
Aug 5, 2025 · Big Data

How Alibaba Built a World‑Class Big Data Platform Over a Decade

Over ten years, Alibaba’s data engineers transformed a modest Hadoop‑based system into a globally‑scalable, high‑performance big data platform—ODPS/MaxCompute—supporting massive offline and real‑time workloads, pioneering innovations like the 5K cluster expansion, Blink streaming, and the unified ‘Moon’ migration.

Alibaba · Big Data · Data Platform
0 likes · 25 min read
Alibaba Cloud Big Data AI Platform
Aug 5, 2025 · Operations

Inside Alibaba’s Tesla: Data‑Driven Ops for 100k+ Big Data Nodes

The article details how Alibaba’s Tesla SRE platform supports the massive offline and real‑time big‑data ecosystems through a layered, data‑driven operations framework—DataOps—integrating unified portals, configuration, job, workflow, and analytics platforms, enabling automated monitoring, intelligent decision‑making, and self‑healing capabilities across 100,000+ nodes.

Big Data · DataOps · Operations
0 likes · 20 min read
Alibaba Cloud Big Data AI Platform
Aug 5, 2025 · Big Data

How MaxQA Supercharges Query Performance for Large‑Scale Data Warehouses

This article details the migration of Southeast Asia's leading tech group GoTerra from Google BigQuery to Alibaba Cloud MaxCompute, explaining the performance challenges, the MaxQA accelerator architecture, optimization techniques, resource‑quota strategies, and future enhancements that together double query efficiency while reducing costs.

Big Data · Data Warehouse · Performance Optimization
0 likes · 19 min read
Big Data Technology Tribe
Aug 5, 2025 · Big Data

How Spark’s Catalyst Optimizer Transforms SQL Queries: Trees, Rules, and Code Generation

This article explains Spark SQL’s Catalyst optimizer, describing its extensible design, tree‑based representation, rule‑driven transformations, batch execution to a fixed point, and how Scala’s pattern matching and quasiquotes enable efficient analysis, logical optimization, physical planning, and code generation.

Big Data · Catalyst Optimizer · Code Generation
0 likes · 18 min read
Kuaishou Tech
Jul 31, 2025 · Big Data

How Kuaishou Overcame the ‘Impossible Triangle’ of Performance, Flexibility, and Cost in Real‑Time Big Data Analytics

This article details how Kuaishou’s content middle platform tackled the massive challenges of real‑time, flexible, and cost‑effective data analysis at trillion‑scale by redesigning its architecture, adopting ClickHouse, splitting wide tables, and implementing a scatter‑gather execution model with pre‑shuffle and bitmap optimizations.

Big Data · ClickHouse · Performance Optimization
0 likes · 17 min read
Data Party THU
Jul 31, 2025 · Industry Insights

How a 30‑Minute Steel Melt Can Unlock a 10% Production Boost – Insights from Industrial Data Analysis

The article explores real‑world industrial cases—from steel furnace timing and historic lithography to modern manufacturing—showing how continuous improvement, root‑cause analysis, and careful handling of correlation versus causation can reveal hidden inefficiencies, while highlighting the limits of traditional statistics and the emerging role of AI in industrial data analytics.

AI · Big Data · Continuous Improvement
0 likes · 14 min read
ITPUB
Jul 29, 2025 · Big Data

How to Deduplicate 4 Billion QQ IDs Using a Bitmap Within 1 GB Memory

Learn how to efficiently remove duplicates from 4 billion QQ numbers using a memory‑friendly Bitmap approach that fits within a 1 GB limit, including calculations, step‑by‑step implementation, Java code, and a discussion of its advantages and drawbacks.

Big Data · Bitmap · Data Structures
0 likes · 9 min read
360 Tech Engineering
Jul 29, 2025 · Information Security

How AI and Big Data Are Redefining Global Cybersecurity – Insights from Zhou Hongyi

In his 2025 World Internet Conference Digital Silk Road Forum keynote, Zhou Hongyi warned that the programmable, AI‑driven, data‑centric world amplifies cyber vulnerabilities, described the rise of state‑level cyber warfare and AI‑powered attacks, and outlined 360’s security‑as‑service strategy and global cooperation plans to protect nations and enterprises.

AI · Big Data · Security Operations
0 likes · 5 min read
Alibaba Cloud Big Data AI Platform
Jul 29, 2025 · Big Data

How GoTerra Cut Costs and Boosted Speed: BigQuery‑to‑MaxCompute Performance Secrets

This article details the real‑world migration of a leading Southeast Asian tech group from BigQuery to MaxCompute, exposing the three major challenges, the data‑driven performance‑optimization methodology, and the concrete techniques—Auto Partition, UNNEST redesign, large‑query graph optimizations, and intelligent tuning—that delivered dramatic cost reductions and query‑speed gains.

Auto Partition · Big Data · Data Warehouse Migration
0 likes · 17 min read
Bilibili Tech
Jul 25, 2025 · Big Data

How Unified Metadata Lineage Transforms Big Data Governance and Security

This article introduces the comprehensive design and evolution of a unified metadata lineage platform for big data, covering background, data processing chain, lineage models, system architecture, quality metrics, application scenarios, and future plans to enhance data governance, quality, and security.

Big Data · Data Governance · Data Quality
0 likes · 27 min read
Alibaba Cloud Big Data AI Platform
Jul 25, 2025 · Big Data

Cross-Contrastive Learning Cuts Flink Anomaly Detection Errors by 12%

The paper “Noise Matters: Cross Contrastive Learning for Flink Anomaly Detection”, accepted at VLDB 2025, introduces a novel cross‑contrastive method that leverages attention‑based representations and a boundary‑aware loss to detect Flink‑specific hotspot anomalies, achieving a 12.1% F1 improvement over state‑of‑the‑art techniques.

Big Data · Cross-Contrastive Learning · Flink
0 likes · 6 min read
DataFunSummit
Jul 19, 2025 · Artificial Intelligence

Big Data Meets Generative AI: Industry Transformations from Prof. Dou

Prof. Dou Dejing shares his journey into Fudan University's Data Intelligence Lab, outlines the history and synergy of big data and AI, reviews generative AI breakthroughs, evaluates large‑model strengths and weaknesses, and explores their expanding industrial applications and market potential.

Big Data · artificial intelligence · generative AI
0 likes · 13 min read
DataFunSummit
Jul 18, 2025 · Databases

Boosting ClickHouse on WeChat: Performance Tools, Lakehouse Hacks & AI

This article explores how ClickHouse is deployed across WeChat for real‑time analytics, introduces a suite of performance‑monitoring tools, details lakehouse read and bitmap optimizations, and describes the integration of AI‑driven vector search, showcasing substantial speedups and scalability improvements.

AI · Big Data · ClickHouse
0 likes · 12 min read
Youzan Coder
Jul 18, 2025 · Cloud Native

How Mixed Workloads Boost Kubernetes CPU Utilization by Over 40%

This article explains how Youzan transformed its Kubernetes clusters from static over‑commit scheduling to load‑balanced mixed workloads using Koordinator and the Anolis OS (Longxi) kernel, achieving higher CPU utilization, lower costs, and better resource management for both online and offline services.

Big Data · Cloud Native · Koordinator
0 likes · 10 min read
DataFunSummit
Jul 18, 2025 · Big Data

Data Lake & Lakehouse Innovations: Real-Time Analytics and Industry Case Studies

This article presents a curated collection of cutting‑edge data lake and lakehouse case studies—including real‑time analytics, cloud‑native architectures, industry implementations from sales platforms to automotive IoT, and the latest advancements in open‑source projects—offering insights into modern big‑data strategies and governance.

Big Data · Data Lake · Lakehouse
0 likes · 2 min read