Tagged articles

3672 articles

Oct 28, 2025 · Artificial Intelligence

Can Data Virtualization Deliver Millisecond Real‑Time Features Across Stores?

This article shares a three‑year journey of building a data‑virtualization‑based, multi‑environment feature management framework for real‑time risk decision platforms, detailing challenges like heterogeneous storage, cold‑start, and operational stability, and presenting a unified architecture that decouples physical storage from business logic.

Big Data · Real-time analytics · data virtualization

0 likes · 16 min read

DataFunSummit

Oct 28, 2025 · Fundamentals

Why Unstructured Data Management Is the Next Frontier for Enterprises

This article explores the evolution, current state, and challenges of enterprise unstructured data management, reviews case studies from traditional firms, Huawei and Ant Group, proposes an ECM‑based reference framework, compares it with structured data governance, and outlines future integration strategies with AI and unified data platforms.

AI · Big Data · Data Governance

0 likes · 28 min read

Alibaba Cloud Big Data AI Platform

Oct 28, 2025 · Big Data

How Huolala Scaled Elasticsearch to 40B Records with Serverless Cloud Architecture

Huolala, a leading smart logistics platform serving over 14 markets and millions of users, detailed its massive Elasticsearch deployment (over 15,000 CPU cores, 40 billion records, and 4 PB of data), highlighting its multi‑AZ design, serverless migration, and a comprehensive management platform that boosted performance, reduced costs, and enabled AI‑driven services.

AI search · Big Data · Elasticsearch

0 likes · 10 min read

StarRocks

Oct 28, 2025 · Databases

How Cisco Migrated from Pinot to StarRocks and Boosted Query Performance by Up to 70%

This article details Cisco Webex's migration from a complex Pinot‑Trino OLAP stack to StarRocks, covering the challenges of the legacy system, the step‑by‑step migration process—including storage, compute, and SQL dialect transformation—and the resulting performance gains, cost reductions, and operational improvements.

Big Data · OLAP · Pinot

0 likes · 23 min read

Mike Chen's Internet Architecture

Oct 26, 2025 · Big Data

4 Proven Strategies to Speed Up Kafka Consumer Performance

This guide explains how to boost Kafka consumer throughput by increasing concurrency, streamlining consumer logic, tuning key configuration parameters, applying practical settings, and scaling hardware or using buffering layers to handle peak loads efficiently.

Big Data · Configuration · Consumer

0 likes · 5 min read

Alibaba Cloud Big Data AI Platform

Oct 24, 2025 · Big Data

How Leapmotor Scaled to 1M Cars with a Real‑Time Flink Data Platform

Leapmotor’s rapid growth to one million production cars drove a shift from daily batch data to minute‑level real‑time analytics, prompting the adoption of Flink as the core engine of a multi‑layered big‑data platform that handles massive IoT signals, supports fault diagnosis, and integrates batch and streaming workloads on the cloud.

Big Data · Data Platform · Flink

0 likes · 13 min read

Big Data Tech Team

Oct 23, 2025 · Industry Insights

How to Build a Reusable, Well‑Designed Data Warehouse Model

This article analyzes why analysts and data engineers clash over non‑reusable data models, presents metrics such as cross‑layer reference rate and model reuse coefficient, and outlines a step‑by‑step framework—including ODS takeover, subject‑domain mapping, dimension consistency, fact‑table integration, development best practices, and tool support—to transform siloed warehouses into a shared data‑platform.

Big Data · Data Governance · Data Platform

0 likes · 15 min read

DataFunSummit

Oct 22, 2025 · Big Data

How Douyin’s Data Asset Platform Revolutionizes Big Data Lineage

This article introduces Douyin Group’s comprehensive data asset management platform, explains why it emphasizes data assets over raw metadata, outlines its full‑linkage lineage capabilities, and presents practical insights on building, applying, and future‑proofing big data lineage within complex enterprise environments.

Big Data · Data Asset Management · Data Lineage

0 likes · 5 min read

Architect Chen

Oct 22, 2025 · Big Data

How to Eliminate Kafka Message Backlog with Practical Optimizations

This guide presents concrete techniques for improving Kafka consumer and producer performance, scaling clusters, tuning broker settings, and designing asynchronous buffering layers to prevent message accumulation and boost overall throughput.

Big Data · Kafka · Performance Optimization

0 likes · 5 min read

Raymond Ops

Oct 21, 2025 · Big Data

Deep Dive into Kafka Architecture: Topics, Partitions, and Reliable Data Pipelines

This article explains Kafka’s core concepts—including topics, partitions, log segmentation, indexing, and acknowledgment mechanisms—then provides a step‑by‑step guide to deploy a Zookeeper‑Kafka cluster integrated with Filebeat, Logstash, and the ELK stack for reliable log collection and analysis.

Big Data · ELK · Filebeat

0 likes · 11 min read

Selected Java Interview Questions

Oct 21, 2025 · Big Data

How to Sync Massive MySQL Datasets Efficiently with DataX

This guide walks through the challenges of synchronizing tens of millions of records between heterogeneous MySQL databases, explains why traditional mysqldump or file‑based methods fail, and provides a step‑by‑step tutorial on installing, configuring, and using Alibaba's open‑source DataX tool for both full and incremental data synchronization.

Big Data · DataX · ETL

0 likes · 15 min read

Big Data Technology & Architecture

Oct 20, 2025 · Big Data

Unlocking Lakehouse Power: Paimon and Doris Integrated Solutions

This article reviews how Paimon and Doris combine to solve unified storage, data visibility, and performance challenges in modern lakehouse architectures, detailing their complementary features, integration capabilities, and real‑world use cases from leading companies.

Analytics · Big Data · Data Lake

0 likes · 8 min read

DataFunSummit

Oct 19, 2025 · Big Data

How Apache Gravitino and OpenLineage Transform Data Governance in the AI Era

This article explains how the rapid rise of AI and large‑model technologies is driving a paradigm shift in data governance toward intelligent, automated, and real‑time collaboration, outlines the challenges of multi‑cloud environments, and demonstrates how Apache Gravitino and OpenLineage provide a unified metadata and lineage solution that improves data quality, compliance, and business agility.

Apache Gravitino · Big Data · Data Lineage

0 likes · 12 min read

DataFunTalk

Oct 19, 2025 · Big Data

How Zhihu’s Big Data Strategy Cuts Costs and Boosts Efficiency

This article outlines Zhihu’s big‑data cost‑reduction journey, covering its background, the FinOps‑driven financial management system, technical strategies for lowering expenses, and a forward‑looking summary of challenges and sustainable efficiency gains within the organization and industry context.

Big Data · Data Platform · FinOps

0 likes · 4 min read

DataFunSummit

Oct 18, 2025 · Big Data

How Zhihu’s Big Data FinOps Cuts Costs and Boosts Efficiency

This article outlines Zhihu’s practical use of big‑data FinOps, describing its hybrid‑cloud architecture, the challenges of multi‑vendor cost management, and how a systematic billing system launched in 2022 drives sustainable cost reduction across the organization.

Big Data · Cost reduction · Data Platform

0 likes · 4 min read

Ray's Galactic Tech

Oct 17, 2025 · Big Data

Unlock Kafka’s Billion-Message Performance: The Four Core Techniques

This article breaks down Kafka’s architecture, explaining how sequential I/O, zero‑copy, batching with compression, and partition‑based horizontal scaling combine to deliver ultra‑high throughput, low latency, and strong reliability for handling billions of messages.

Big Data · Kafka · Streaming

0 likes · 10 min read

Huolala Tech

Oct 17, 2025 · Big Data

How HuoLala Accelerated User Profiling 30× Faster with Apache Doris

This article details how HuoLala built a high‑performance user profiling platform on Apache Doris, redesigning data models, leveraging bitmap storage, and applying query‑level optimizations to achieve up to 30‑fold speed gains, lower memory usage, and scalable real‑time analytics.

Apache Doris · Big Data · Bitmap

0 likes · 17 min read

JD Tech Talk

Oct 16, 2025 · Big Data

Understanding Apache Hudi Core Concepts: Timeline, File Layout, and Table Types

This article explains Apache Hudi's architecture, including its timeline mechanism, file layout, indexing strategies, table types (COW and MOR), query options, storage format versioning, backward compatibility, and key configuration settings for managing data lake tables.

Apache Hudi · Big Data · Copy-on-Write

0 likes · 8 min read

Instant Consumer Technology Team

Oct 14, 2025 · Big Data

How to Boost Spark SQL DAG Efficiency with Regex‑Driven Temporary Views

This article explains how to reduce intermediate tables, simplify dependencies, and improve execution efficiency in Spark SQL pipelines by using session‑level temporary views and regex‑based SQL parsing to automatically merge and rewrite DAG tasks in large‑scale data platforms.

Big Data · DAG Optimization · ETL

0 likes · 13 min read
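
As a rough illustration of the regex‑based rewriting this summary describes, the sketch below turns `CREATE TABLE ... AS` statements for known intermediate tables into session‑level temporary views. The function name `to_temp_view`, the pattern, and the table names are hypothetical and are not taken from the article; the sketch also assumes intermediate tables are referenced by unqualified names.

```python
import re

# Matches "CREATE TABLE [IF NOT EXISTS] <name> AS" (case-insensitive).
CREATE_TABLE_AS = re.compile(
    r"\bCREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?(?P<name>[\w.]+)\s+AS\b",
    re.IGNORECASE,
)


def to_temp_view(sql, intermediate_tables):
    """Rewrite CTAS statements for intermediate tables into temporary views.

    `intermediate_tables` is a hypothetical set of table names that only
    exist to pass data between steps of the same session.
    """
    def rewrite(match):
        name = match.group("name")
        if name not in intermediate_tables:
            return match.group(0)  # leave real output tables untouched
        return f"CREATE OR REPLACE TEMPORARY VIEW {name} AS"

    return CREATE_TABLE_AS.sub(rewrite, sql)
```

A regex is only adequate for simple, regular SQL; statements with comments or nested DDL would need a real SQL parser.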

StarRocks

Oct 14, 2025 · Big Data

How Ctrip Scaled UBT Analytics by Migrating from ClickHouse to StarRocks

Ctrip's User Behavior Tracking (UBT) system, handling 30 TB of daily data, moved from ClickHouse to StarRocks' compute‑storage separated architecture, cutting average query latency from 1.4 seconds to 203 ms, halving storage, reducing nodes from 50 to 40, and boosting write throughput to 3 million rows per second.

Big Data · ClickHouse · Data Migration

0 likes · 15 min read

DataFunSummit

Oct 14, 2025 · Big Data

How Douyin’s Data Asset Platform Redefines Big Data Lineage

This article introduces Douyin Group’s one‑stop Data Asset Management Platform, explains why the company focuses on data assets rather than raw metadata, and details the evolution, architecture, applications, and future outlook of its comprehensive big‑data lineage system.

Big Data · Data Asset Management · Data Governance

0 likes · 5 min read

Baidu Geek Talk

Oct 13, 2025 · Big Data

How Baidu Scaled Its Data Warehouse to Handle Billions of PVs and Petabytes

This article details Baidu APP's massive data‑warehouse overhaul, describing the two‑step strategy that stabilized log cleaning, modernized the ETL framework, introduced wide‑table architectures, and implemented tiered storage to dramatically improve processing speed, reliability, and cost efficiency for petabyte‑scale workloads.

Big Data · Data Warehouse · ETL

0 likes · 25 min read

DataFunSummit

Oct 11, 2025 · Big Data

What Small Banks Can Learn from Cutting-Edge Data Governance Practices

This article shares a data‑governance roadmap for small and medium banks, covering industry pain points, high‑quality data sets, a three‑step governance path, data standards, metadata management, master‑data strategy, business data modeling, a hybrid Greenplum‑Hadoop platform, quality monitoring, and a maturity assessment framework.

Banking · Big Data · Data Architecture

0 likes · 21 min read

DataFunTalk

Oct 8, 2025 · Big Data

How ByteHouse Cuts Data Warehouse Costs: Tackling Explicit and Implicit Challenges

As data volumes explode, enterprises struggle with the high hardware, performance, operational, and migration costs of traditional OLAP warehouses, but ByteHouse’s cloud‑native architecture offers a cost‑effective, high‑performance solution that dramatically reduces both explicit and hidden expenses.

Big Data · ByteHouse · Cost reduction

0 likes · 6 min read

DataFunTalk

Oct 6, 2025 · Big Data

What Ant Group Learned: 5 Pillars of Effective Data Governance

Ant Group shares its practical experience in big data governance, outlining five key focus areas—architecture, security, compliance, quality, and value—through four structured sections and detailed discussions on data quality and storage governance, while also exploring future challenges and the economics of data.

Ant Group · Big Data · Data Architecture

0 likes · 4 min read

DataFunSummit

Oct 4, 2025 · Operations

How Zhihu Leverages FinOps and Mixed‑Cloud Architecture to Slash Costs

This article explains how Zhihu’s big‑data platform applies FinOps principles and a mixed‑cloud strategy to overcome multi‑vendor complexity, organizational challenges, and sustainability issues, ultimately achieving continuous cost reduction and efficiency gains.

Big Data · Cost reduction · FinOps

0 likes · 4 min read

ITPUB

Oct 3, 2025 · Big Data

How Qunar Travel Cut 2000 CPU Cores by Optimizing Kafka Production

This case study details how Qunar Travel's engineering team analyzed Kafka production bottlenecks during peak traffic, added targeted monitoring, tuned thread and batch parameters, and validated the changes through gray‑scale tests, ultimately saving about 2000 CPU cores across three clusters while reducing request volume and improving network and disk utilization.

Big Data · CPU Savings · Kafka

0 likes · 14 min read

Amap Tech

Sep 29, 2025 · Artificial Intelligence

How Gaode’s AI‑Powered Route Planner Saves Money and Time During Holiday Travel

Gaode Map introduces three AI‑driven routing features—toll‑free exit recommendation, global faster‑route optimization, and high‑frequency rapid‑route detection—that combine massive traffic data, multi‑objective algorithms, and real‑time prediction to help users save both toll costs and travel time during the National Day travel peak.

AI · Big Data · multi-objective

0 likes · 8 min read

DataFunSummit

Sep 28, 2025 · Big Data

How ByteHouse Cuts Data Warehouse Costs: Tackling Hidden and Visible Expenses

This article examines the exploding data volumes that pressure modern enterprises, outlines the explicit (hardware, performance) and implicit (operations, migration) costs of operating an OLAP‑based data warehouse, and explains how ByteHouse’s cloud‑native architecture reduces both cost categories while delivering real‑time analytics.

Big Data · ByteHouse · Data Warehouse

0 likes · 5 min read

DataFunTalk

Sep 27, 2025 · Artificial Intelligence

How Bilibili Uses LLMs to Diagnose Big Data Platform Issues

This article explains how Bilibili leverages a large‑language‑model‑driven assistant to diagnose and resolve failures and slowdowns in its massive big‑data platform, detailing the platform’s five‑layer architecture, common task issues, and the need for intelligent troubleshooting tools.

AI Assistant · Big Data · Bilibili

0 likes · 5 min read

Huolala Tech

Sep 26, 2025 · Big Data

How We Migrated 40 PB of Hive Data Across Clouds with Zero Downtime

This article details the end‑to‑end design, challenges, and implementation of a cross‑cloud migration of over 200 k Hive tables and nearly 40 PB of data using the self‑developed Kirk service, covering architecture, verification steps, and lessons learned to achieve 100 % data consistency without impacting production services.

Big Data · Data Consistency · Data Migration

0 likes · 20 min read

DataFunTalk

Sep 25, 2025 · Big Data

How Tencent Cloud’s AI‑Ready Data Platform Redefines Big Data for AI

This article outlines the challenges of high‑quality data for AI, introduces Tencent Cloud’s AI‑Ready data platform with three core capabilities—DIaaS, Setats, and ES‑based knowledge search—covers the end‑to‑end WeData integration, intelligent agents for automation, and showcases ecosystem partnerships driving industry‑wide intelligent transformation.

AI · Big Data · Data Platform

0 likes · 14 min read

Big Data Technology & Architecture

Sep 24, 2025 · Big Data

Avoid These 6 Common Paimon Data Loss Pitfalls in Flink and Spark

Learn the six typical scenarios that cause data loss when writing to Paimon—ranging from checkpoint failures and misconfigured partial‑update mode to incorrect sequence fields, snapshot retention issues, concurrent bucket writes, and outdated Spark versions—and how to prevent each problem.

Big Data · Checkpoint · Data loss

0 likes · 5 min read

DataFunTalk

Sep 23, 2025 · Big Data

How Kuaishou’s Data Platform Powers Intelligent BI: Architecture, Challenges, and Solutions

This article outlines Kuaishou Data Platform's mission to boost data decision efficiency, describes its three‑layer architecture, explains the BI process from data ingestion to application, and shares practical experiences and future outlook for intelligent BI powered by AI and big data.

AI · BI · Big Data

0 likes · 5 min read

AI2ML AI to Machine Learning

Sep 22, 2025 · Big Data

Why AI‑Native Big Data Platforms Are About to Explode

The article examines how large‑model limitations in accuracy, explainability, and stability have stalled decision‑support use, prompting industry leaders to champion AI‑Ready data infrastructures, Data 4.0 concepts, and AI‑generated service code as the next wave of AI‑native big data platforms.

AI · AI-native · Big Data

0 likes · 6 min read

NiuNiu MaTe

Sep 22, 2025 · Big Data

How to De‑duplicate 4 Billion QQ Numbers with Only 1 GB RAM

Learn four practical techniques (simple sorting, hashmap deduplication, external merge sort, and bitmap bit‑set optimization) to efficiently remove duplicate QQ numbers from a 4‑billion‑record file while staying within a strict 1 GB memory limit, and even under a tighter 100 MB constraint.

Big Data · Bitmap · algorithm

0 likes · 9 min read
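
To make the bitmap technique named in this summary concrete, here is a minimal sketch that keeps one bit per possible value and emits each number only the first time it is seen. The function name `dedupe_with_bitmap` is hypothetical, and the sketch assumes the numbers are non‑negative integers no larger than `max_value`; it is not the article's own implementation.

```python
def dedupe_with_bitmap(numbers, max_value):
    """Yield each number the first time it appears, using one bit per value."""
    bitmap = bytearray((max_value >> 3) + 1)  # one bit for every value in [0, max_value]
    for n in numbers:
        byte_index = n >> 3        # which byte holds this value's bit
        bit_mask = 1 << (n & 7)    # which bit inside that byte
        if not bitmap[byte_index] & bit_mask:
            bitmap[byte_index] |= bit_mask  # mark as seen
            yield n
```

If the values fit in an unsigned 32‑bit range, the bitmap needs 2^32 / 8 bytes = 512 MiB regardless of how many records the input holds, which is why this approach can stay under a 1 GB limit.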

DataFunTalk

Sep 22, 2025 · Big Data

How Kuaishou Scales Intelligent BI: Insights from Its Data Platform

This article outlines Kuaishou's Data Platform team's mission to boost data‑driven decision making through advanced compute engines, high‑performance services, and AI‑enhanced BI, detailing its architecture, challenges, solutions, and future outlook for large‑scale intelligent analytics.

AI · Analytics · BI

0 likes · 6 min read

DataFunSummit

Sep 21, 2025 · Big Data

Breaking the CPU Wall: BIGO’s Gluten Engine Accelerates Spark and Flink

When big‑data workloads hit the CPU wall, BIGO’s adoption of the open‑source Gluten project delivers native‑engine execution for Spark and a roadmap for Flink, achieving up to 30% end‑to‑end speedup, 50% memory savings, and a scalable, cost‑effective data processing platform.

Big Data · Flink · Gluten

0 likes · 16 min read

Alibaba Cloud Big Data AI Platform

Sep 19, 2025 · Big Data

How MMS Powered a 50 PB BigQuery‑to‑MaxCompute Migration for GoTerra

This article details GoTerra's massive six‑month, 50 PB migration from GCP BigQuery to Alibaba Cloud MaxCompute, covering project scope, technical challenges such as complex data types, partition strategies, and high‑speed requirements, and explaining how the MaxCompute Migration Service (MMS) solved them with innovative architecture, scheduling, and data‑reorder techniques.

Big Data · BigQuery · Data Migration

0 likes · 14 min read

DataFunTalk

Sep 19, 2025 · Big Data

How Kuaishou’s Data Platform Powers Intelligent BI with AI and Big Data

This article outlines how Kuaishou’s Data Platform Department enhances decision‑making efficiency by building advanced compute engines and high‑performance services, detailing the platform’s architecture, challenges of intelligent BI, AI‑driven solutions, and the end‑to‑end BI workflow from data ingestion to analysis.

Analytics · BI · Big Data

0 likes · 5 min read

Big Data Tech Team

Sep 17, 2025 · Big Data

How to Build a Scalable Tag System for Recommendation Engines

This article explains why a robust tag system is essential for recommendation and mining strategies, outlines the hierarchy of entity, concept, and theme tags, and provides practical principles, architecture, and step‑by‑step methods for constructing and managing tags in large‑scale data platforms.

Big Data · Data Architecture · data labeling

0 likes · 14 min read

Mike Chen's Internet Architecture

Sep 16, 2025 · Big Data

Why ElasticSearch Is Essential for Modern Search, Logging, and Big Data Solutions

This article explains how ElasticSearch serves as a core middleware for large‑scale architectures, covering its role in search engines, log analysis with the ELK stack, massive data querying, and even as an independent database system, illustrated with practical examples and diagrams.

Big Data · Data Architecture · log analysis

0 likes · 4 min read

Data Party THU

Sep 16, 2025 · Big Data

How Big Data Transforms Petrochemical Price Forecasting: A Student Project Review

This report details a university big‑data project that built a full pipeline, from raw petrochemical market data and text mining through variable selection to XGBoost/LightGBM/Lasso regression and RNN/LSTM/GRU models, to predict product prices across multiple horizons, evaluate errors, and deliver an interactive demo.

Big Data · LSTM · RNN

0 likes · 8 min read

Alibaba Cloud Big Data AI Platform

Sep 15, 2025 · Big Data

How a FinTech Firm Boosted Real‑Time Decision Making with StarRocks Data Warehouse

This case study details how Shuhe Technology, a leading fintech company, overcame data redundancy, low resource utilization, and slow reporting by adopting Alibaba Cloud EMR Serverless StarRocks for a unified, real‑time data warehouse, achieving standardized data pipelines, cost savings, and minute‑level decision latency.

Big Data · FinTech · StarRocks

0 likes · 8 min read

Data Party THU

Sep 14, 2025 · Big Data

How to Evaluate Battery Storage Health with Big Data and LightGBM

This report details a university big‑data project that builds a data‑driven framework for assessing lithium‑ion battery storage health, cleaning operational data, detecting abnormal cells with DBSCAN, and predicting SOC/SOH using LightGBM, while highlighting findings, limitations, and future improvements.

Big Data · DBSCAN · LightGBM

0 likes · 4 min read

Data Party THU

Sep 13, 2025 · Artificial Intelligence

How a Multi‑Agent Large Model Transforms Ecological Big‑Data Analysis

This report details a university project that built a flexible, high‑performance multi‑agent large‑model framework for ecological environment big‑data analysis, covering system architecture, individual agents, memory mechanisms, report generation, a FastAPI‑LangGraph backend, a React frontend, testing methodology, and future directions.

AI · Big Data · FastAPI

0 likes · 7 min read

Data Party THU

Sep 12, 2025 · Big Data

Key Lessons from Winning the 2025 China University Big Data Competition

The author shares a detailed account of their experience in the 2025 China University Big Data Competition, describing the team’s top national ranking, the shift from absolute stock price prediction to robust ranking learning, extensive feature engineering, and reflections on balancing technical ambition with real‑world constraints.

Big Data · Stock Prediction · data competition

0 likes · 5 min read

Data Party THU

Sep 11, 2025 · Big Data

How We Conquered the 2025 Chinese University Big Data Challenge: Financial Time‑Series Lessons

Our team "Stay Overnight" from Chongqing University of Posts and Telecommunications placed second nationally in the 2025 China University Computer Competition Big Data Challenge, navigating volatile financial data, shifting from time‑series to supervised learning, and emphasizing feature engineering to boost model performance.

Big Data · Model Selection · competition report

0 likes · 4 min read

360 Zhihui Cloud Developer

Sep 11, 2025 · Big Data

How Paimon Transforms Membership Data Warehousing: From Legacy Lambda to Real‑Time Lakehouse

This article examines the challenges of a legacy Lambda‑based membership data warehouse, introduces Apache Paimon’s lakehouse architecture and its key features, and showcases three real‑world implementations—partial‑update order wide tables, Bitmap‑based UV counting, and branch‑based data correction—while discussing benefits, remaining challenges, and future directions.

Big Data · Data Lake · Data Warehouse

0 likes · 29 min read

Mike Chen's Internet Architecture

Sep 10, 2025 · Fundamentals

Understanding Key Distributed Storage Systems: HDFS, Ceph, FastDFS, and TFS

This article provides a concise overview of four major distributed storage solutions—HDFS, Ceph, FastDFS, and TFS—highlighting their architectures, strengths, weaknesses, and typical use cases for large‑scale data and e‑commerce applications.

Big Data · Ceph · FastDFS

0 likes · 4 min read

Alibaba Cloud Big Data AI Platform

Sep 10, 2025 · Big Data

Unlock Seamless BigQuery to MaxCompute Migration with dbt‑maxcompute

This article details the real‑world migration of Southeast Asian tech leader GoTerra from BigQuery to MaxCompute, showcasing how the open‑source dbt‑maxcompute adapter enables smooth ELT transitions, advanced incremental strategies, performance gains, ecosystem compatibility, and comprehensive best‑practice implementations for large‑scale data pipelines.

Big Data · Data Migration · ELT

0 likes · 13 min read

Architect Chen

Sep 10, 2025 · Big Data

How Kafka Achieves Million‑Message Throughput: Sequential Writes, Page Cache, Batching & Zero‑Copy

The article explains how Kafka attains high‑throughput performance by using sequential disk writes, leveraging the OS page cache, employing producer and consumer batching with configurable parameters, and utilizing zero‑copy sendfile to minimize CPU and memory overhead, enabling stable throughput of a million messages per second.

Batching · Big Data · High Throughput

0 likes · 5 min read
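
The batching idea mentioned in this summary can be sketched as a buffer that is flushed either when it reaches a size threshold or when its oldest record has waited long enough. The class below is a toy model loosely inspired by the Kafka producer settings `batch.size` and `linger.ms`; the class name, its methods, and the choice to check the timeout only on `append` are assumptions made for illustration, not Kafka's actual implementation.

```python
class BatchAccumulator:
    """Toy producer-side buffer: flush on size or on linger timeout."""

    def __init__(self, batch_size_bytes, linger_ms):
        self.batch_size_bytes = batch_size_bytes
        self.linger_ms = linger_ms
        self._records = []
        self._bytes = 0
        self._first_ts = None  # arrival time of the oldest buffered record

    def append(self, record, now_ms):
        """Buffer a record; return a batch to send, or None if still waiting."""
        if not self._records:
            self._first_ts = now_ms
        self._records.append(record)
        self._bytes += len(record)
        full = self._bytes >= self.batch_size_bytes
        expired = now_ms - self._first_ts >= self.linger_ms
        return self.flush() if full or expired else None

    def flush(self):
        """Hand over everything buffered so far and reset the buffer."""
        batch, self._records, self._bytes = self._records, [], 0
        return batch
```

Sending one larger batch instead of many single records trades a small, bounded delay for fewer network requests.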

Data Party THU

Sep 8, 2025 · Big Data

What We Learned from the 2025 China University Big Data Competition

The article shares a top‑5 team's experience in the 2025 China University Big Data Challenge, detailing their roster, competition rules, four key technical insights on data pitfalls, model alignment, generalization, and leveraging SOTA models, plus reflections on the event's excellent support and collaborative atmosphere.

Big Data · feature engineering · model generalization

0 likes · 6 min read

Data Party THU

Sep 6, 2025 · Big Data

From Data Chaos to Predictive Insight: My Solo Journey in the 2025 Big Data Competition

An individual participant recounts their journey in the 2025 China University Computer Competition Big Data Challenge, detailing data cleaning, feature engineering, model building on 300‑stock historical prices, and insights gained from solo competition experience, highlighting challenges, lessons, and future directions in financial AI.

Big Data · competition · data engineering

0 likes · 4 min read

DataFunTalk

Sep 6, 2025 · Big Data

How Xiaomi Cuts Costs and Boosts Efficiency with a Cloud‑Native Lakehouse Architecture

Xiaomi’s data‑lake team explains how they tackled small‑file issues, unified metadata with Gravitino, migrated Hive to Iceberg and Fileset, leveraged JuiceFS for multi‑cloud storage, and combined Iceberg and Paimon to achieve cost‑effective, high‑performance batch and real‑time analytics.

Big Data · Cloud Native · Data Lake

0 likes · 13 min read

DataFunSummit

Sep 2, 2025 · Big Data

How Xiaomi Cuts Costs and Boosts Performance with Cloud‑Native Data Lake Architecture

Xiaomi’s engineers explain how they tackled data‑lake challenges—small files, metadata latency, and multi‑cloud costs—by combining compact storage, Gravitino‑based metadata governance, Iceberg and Paimon formats, and JuiceFS abstraction, achieving lower storage expenses, faster queries, and a roadmap toward intelligent, real‑time, multimodal lakehouses.

Big Data · Data Lake · Storage Optimization

0 likes · 14 min read

StarRocks

Sep 2, 2025 · Big Data

How StarRocks + Paimon Powered Real‑Time Analytics for Alibaba’s Flash Sale

Faced with billions of marketing events and minute‑level decision requirements during Taobao's flash‑sale campaign, the e‑commerce data team built a real‑time lakehouse using StarRocks and Paimon, leveraged asynchronous materialized views and RoaringBitmap deduplication, and achieved sub‑second query latency, massive cost savings, and stable high‑concurrency performance.

Big Data · Lakehouse · Materialized Views

0 likes · 26 min read

Baidu Geek Talk

Sep 1, 2025 · Big Data

How Baidu Netdisk Built a High‑Performance Real‑Time Engine with Flink

This article explains how Baidu Netdisk transitioned from Spark Streaming to a Flink‑based Tiangong real‑time computing engine, detailing the evolution, reasons for choosing Flink, architecture, configuration examples, business use cases, technical challenges, and future platform plans.

Baidu Netdisk · Big Data · Flink

0 likes · 16 min read

Instant Consumer Technology Team

Sep 1, 2025 · Frontend Development

What’s Hot in Frontend, AI, and Cloud This Week? Top Insights and Tools

This weekly tech roundup highlights Meituan’s dynamic container performance breakthrough, Huawei’s Mate X5 foldable adaptation, ByteDance’s Rspack 1.5 features, AI‑driven automation advances, MQTT and Crush terminal tools, Alibaba Cloud’s AI platform milestones, and practical guides for performance optimization and Chrome extension development.

AI · Big Data · DevTools

0 likes · 8 min read

DataFunTalk

Sep 1, 2025 · Big Data

How JD Retail Tackles Data Governance Challenges to Boost Efficiency

JD Retail outlines the growing data management challenges it faces—including asset discovery, architecture agility, development quality, and rising IT costs—and presents a comprehensive data governance framework that leverages standards, agile architecture, development isolation, and resource optimization to improve efficiency and reduce operational expenses.

Big Data · Data Governance · Data Management

0 likes · 7 min read

Architects' Tech Alliance

Aug 31, 2025 · Artificial Intelligence

Why the Last Decade Became the Golden Age of AI Chip Architecture

The article traces the evolution of AI hardware over the past ten years, outlining three key phases—from early chip limitations that sidelined neural networks, through CPU advances that still fell short, to the rise of GPUs and specialized AI chips that finally unlocked rapid AI deployment, while also highlighting the parallel impact of algorithmic breakthroughs and massive data growth.

AI hardware · Big Data · GPU

0 likes · 5 min read

Alibaba Cloud Big Data AI Platform

Aug 29, 2025 · Big Data

How MaxCompute Streaming Insert Revolutionized Real‑Time Data Migration from BigQuery

This article details how a leading Southeast Asian tech group migrated its real‑time write workloads from Google BigQuery to MaxCompute using MaxCompute Streaming Insert, covering architecture, core features, migration challenges, optimization strategies, business impact, and future enhancements.

Big Data · BigQuery Migration · MaxCompute

0 likes · 9 min read

Mike Chen's Internet Architecture

Aug 29, 2025 · Fundamentals

Understanding Distributed Storage: HDFS, CephFS, GlusterFS, and FastDFS Compared

This article compares four major distributed storage solutions (HDFS, CephFS, GlusterFS, and FastDFS), detailing their architectures, strengths, weaknesses, and ideal use cases for big‑data processing, cloud‑native environments, and high‑concurrency file services, and explaining how they fit into modern infrastructure strategies.

Big Data · CephFS · FastDFS

0 likes · 5 min read

Kuaishou Tech

Aug 28, 2025 · Big Data

Auron Joins Apache Incubator: High‑Performance Vectorized Engine Accelerates Big Data Workloads

The Auron project, originally the Blaze engine from Kuaishou, has entered the Apache Software Foundation incubator, offering a Rust‑based native vectorized execution engine that integrates with Spark, delivers over two‑fold performance gains on TPC‑DS benchmarks, and is supported by a growing open‑source community.

Apache Incubator · Auron · Big Data

0 likes · 6 min read

DataFunTalk

Aug 28, 2025 · Big Data

How JD Retail Tackles Data Governance Challenges to Boost Efficiency

JD Retail faces growing data volume, redundant models, and resource‑intensive storage, prompting a comprehensive data‑governance strategy that defines standards, streamlines architecture, isolates development, and optimizes compute and storage costs, ultimately enabling more efficient, secure, and agile data operations across the enterprise.

Big Data · Data Architecture · Data Governance

0 likes · 8 min read

DataFunTalk

Aug 27, 2025 · Big Data

How JD Retail Overcomes Data Governance Challenges to Boost Efficiency

JD Retail confronts growing data volume, redundant models, shared account risks, and rising storage costs, and responds with a comprehensive data governance framework that standardizes data, streamlines architecture, isolates development, and optimizes resources to achieve efficient, secure, and cost‑effective data operations.

Big Data · Data Architecture · Data Governance

0 likes · 8 min read

Alibaba Cloud Big Data AI Platform

Aug 26, 2025 · Big Data

How MaxCompute Evolves for Python & AI: From SDK to Native Distributed Engine

This article outlines MaxCompute's decade‑long evolution—from the early PyODPS SDK to the native Distributed Python Engine—highlights the challenges big‑data platforms face in the AI era, and showcases Data+AI solutions and real‑world case studies across multimodal processing, massive text deduplication, and autonomous‑driving data pipelines.

AI Functions · Big Data · Data+AI

0 likes · 15 min read

Alibaba Cloud Big Data AI Platform

Aug 26, 2025 · Big Data

How to Build a Multi‑Tenant Big Data Platform on MaxCompute: Lessons from a GCP Migration

This article details how a leading Southeast Asian tech group migrated from BigQuery to MaxCompute, designing a multi‑tenant big data platform with separate control and data planes, addressing cross‑account access, governance, and cost challenges on Alibaba Cloud.

Alibaba Cloud · Big Data · MaxCompute

0 likes · 7 min read

Big Data Tech Team

Aug 25, 2025 · Interview Experience

Essential Big Data Interview Questions for Data Warehouse Engineer Roles

A comprehensive list of interview topics covering self‑introduction, career moves, data‑warehouse design, team building, architecture comparisons, fact‑table classification, common dimensions, performance tuning, and data‑governance for aspiring big‑data engineers.

Big Data · Data Governance · Flink

0 likes · 4 min read

Alibaba Cloud Big Data AI Platform

Aug 19, 2025 · Big Data

Cut Shuffle Costs by 60% with MaxCompute’s Cluster Optimization Tool

MaxCompute’s new Cluster Optimization Recommendation analyzes 31 days of shuffle data to automatically suggest optimal hash clustering keys, dramatically cutting shuffle traffic and CU consumption for large jobs, while providing one‑click ALTER TABLE scripts and detailed benefit reports to boost big‑data processing efficiency.

Big Data · Cost reduction · Hash Clustering

0 likes · 8 min read

Tencent Technical Engineering

Aug 18, 2025 · Artificial Intelligence

How Multi‑Agent AI Powers Zero‑Barrier Big Data Analysis in Tomoro’s Lumos

This article explores Tomoro’s Lumos data‑intelligent agent, detailing its multi‑agent architecture, technical design, practical implementations, and performance optimizations that enable seamless, low‑threshold big‑data self‑service analysis powered by AI.

AI · Big Data · Low-Code Analytics

0 likes · 30 min read

Big Data Technology & Architecture

Aug 18, 2025 · Fundamentals

5 Common Interview Pitfalls Uncovered from 16 Mock Sessions

After conducting 16 one‑on‑one mock interviews and debriefs, we identified five recurring issues that candidates should address to improve their interview performance: lacking a holistic view of their projects, expressing themselves poorly, formatting resumes sloppily, underusing large language models, and neglecting regular self‑review.

Big Data · career advice · communication

0 likes · 6 min read

Mike Chen's Internet Architecture

Aug 17, 2025 · Big Data

Master Kafka: Essential Commands for Starting, Managing Topics, and Messaging

This guide walks you through the core Kafka commands for starting and stopping the service, creating, listing, describing, and deleting topics, as well as producing and consuming messages, while explaining key parameters such as Zookeeper, partitions, and replication factors.

Big Data · Distributed Systems · Kafka

0 likes · 4 min read

Mike Chen's Internet Architecture

Aug 16, 2025 · Big Data

Mastering ELK: A Complete Guide to Elasticsearch, Logstash, and Kibana

This article introduces the ELK stack—Elasticsearch, Logstash, and Kibana—explaining each component, their roles in large‑scale log processing, and the step‑by‑step workflow for collecting, storing, and visualizing log data in modern big‑data environments.

Big Data · ELK · Elasticsearch

0 likes · 4 min read

Alibaba Cloud Big Data AI Platform

Aug 13, 2025 · Big Data

How ODPS Evolved Over 15 Years into a Next‑Gen AI‑Ready Big Data Platform

This article chronicles ODPS's 15‑year journey from its exploratory beginnings to a modern, AI‑enabled big data platform, detailing its four development phases, architectural layers, SQL engine upgrades, real‑time processing, lakehouse integration, and the new Data+AI capabilities offered by MaxCompute and DataWorks.

AI integration · Big Data · Data Warehouse

0 likes · 12 min read

Big Data Technology Tribe

Aug 12, 2025 · Databases

Why Lakehouse Architecture Is Redefining Modern Data Platforms

This article explains the evolution from traditional data warehouses and data lakes to the unified Lakehouse architecture, detailing its design, benefits, challenges, and research directions for delivering high‑performance SQL and advanced analytics on open‑format storage.

Big Data · Data Lake · Data Warehouse

0 likes · 20 min read

ITPUB

Aug 10, 2025 · Databases

Why Did These Database Titans Fall? Lessons from 50 Years of DB Evolution

The article chronicles half a century of database history, analyzing the rise and collapse of systems like Informix, Sybase, FoxPro, HBase, and dBase, while examining how Oracle, Microsoft, and IBM are adapting to cloud and AI, and forecasting the forces reshaping the future of data storage.

AI · Big Data · Database History

0 likes · 9 min read

Sohu Smart Platform Tech Team

Aug 9, 2025 · Artificial Intelligence

How SimHash and Cosine Similarity Accelerate Large-Scale Text Deduplication

This article explains why traditional pairwise text comparison is impractical for massive news corpora, introduces cosine similarity and SimHash as efficient deduplication techniques, walks through their mathematical foundations, step‑by‑step implementation details, code examples, and discusses trade‑offs such as accuracy versus speed.

Big Data · Cosine Similarity · SimHash

0 likes · 12 min read

JD Retail Technology

Aug 8, 2025 · Big Data

How JD.com Transformed Its Traffic Data Pipeline from Lambda to a Lakehouse Architecture

This article examines JD.com's migration of its massive traffic data processing from a dual Lambda architecture to an integrated lakehouse solution, detailing the challenges, innovative optimizations with Flink and Hudi, performance gains, cost reductions, and future directions for real‑time data handling.

Big Data · Flink · Hudi

0 likes · 10 min read

iQIYI Technical Product Team

Aug 7, 2025 · Big Data

Building a Low‑Latency, High‑Capacity Real‑Time Data Platform for Finance

Facing growing data demands in finance, we replaced two legacy synchronization pipelines with a unified, low‑latency architecture using BabelX Real‑Time, Flink CDC, Iceberg v2 and Paimon, achieving minute‑level data freshness, ten‑to‑thirty‑fold query speedups, reduced storage costs, and streamlined schema management across multiple business units.

Big Data · Flink · Iceberg

0 likes · 12 min read

Alibaba Cloud Big Data AI Platform

Aug 5, 2025 · Big Data

How Alibaba Built a World‑Class Big Data Platform Over a Decade

Over ten years, Alibaba’s data engineers transformed a modest Hadoop‑based system into a globally‑scalable, high‑performance big data platform—ODPS/MaxCompute—supporting massive offline and real‑time workloads, pioneering innovations like the 5K cluster expansion, Blink streaming, and the unified ‘Moon’ migration.

Alibaba · Big Data · Data Platform

0 likes · 25 min read

Alibaba Cloud Big Data AI Platform

Aug 5, 2025 · Operations

Inside Alibaba’s Tesla: Data‑Driven Ops for 100k+ Big Data Nodes

The article details how Alibaba’s Tesla SRE platform supports the massive offline and real‑time big‑data ecosystems through a layered, data‑driven operations framework—DataOps—integrating unified portals, configuration, job, workflow, and analytics platforms, enabling automated monitoring, intelligent decision‑making, and self‑healing capabilities across 100,000+ nodes.

Big Data · DataOps · Operations

0 likes · 20 min read

Alibaba Cloud Big Data AI Platform

Aug 5, 2025 · Big Data

How MaxQA Supercharges Query Performance for Large‑Scale Data Warehouses

This article details the migration of Southeast Asia's leading tech group GoTerra from Google BigQuery to Alibaba Cloud MaxCompute, explaining the performance challenges, the MaxQA accelerator architecture, optimization techniques, resource‑quota strategies, and future enhancements that together double query efficiency while reducing costs.

Big Data · Data Warehouse · Performance Optimization

0 likes · 19 min read

Big Data Technology Tribe

Aug 5, 2025 · Big Data

How Spark’s Catalyst Optimizer Transforms SQL Queries: Trees, Rules, and Code Generation

This article explains Spark SQL’s Catalyst optimizer, describing its extensible design, tree‑based representation, rule‑driven transformations, batch execution to a fixed point, and how Scala’s pattern matching and quasiquotes enable efficient analysis, logical optimization, physical planning, and code generation.

Big Data · Catalyst Optimizer · Code Generation

0 likes · 18 min read

Alibaba Cloud Big Data AI Platform

Aug 4, 2025 · Big Data

Decoupling Ops Troubleshooting: Building a DataOps Warehouse with ETL

This article explains how to transform traditional SRE troubleshooting into a data‑driven process by pre‑collecting operational metrics into a data warehouse, using ETL to create layered data models (ODS, DIM, DWD, DWS) that enable efficient, repeatable analysis while balancing data freshness and storage costs.

Big Data · Data Warehouse · DataOps

0 likes · 7 min read

Alibaba Cloud Big Data AI Platform

Aug 4, 2025 · Operations

Unlocking Unmanned Ops: DataOps & SRE Strategies for Big Data Management

The article explains how DataOps and SRE practices enable large‑scale, data‑driven operations in big‑data environments, aiming for fully automated, intelligent, and ultimately unmanned management of complex systems.

AI Ops · Big Data · DataOps

0 likes · 6 min read

Kuaishou Tech

Jul 31, 2025 · Big Data

How Kuaishou Overcame the ‘Impossible Triangle’ of Performance, Flexibility, and Cost in Real‑Time Big Data Analytics

This article details how Kuaishou’s content middle platform tackled the massive challenges of real‑time, flexible, and cost‑effective data analysis at trillion‑scale by redesigning its architecture, adopting ClickHouse, splitting wide tables, and implementing a scatter‑gather execution model with pre‑shuffle and bitmap optimizations.

Big Data · ClickHouse · Performance Optimization

0 likes · 17 min read

Data Party THU

Jul 31, 2025 · Industry Insights

How a 30‑Minute Steel Melt Can Unlock a 10% Production Boost – Insights from Industrial Data Analysis

The article explores real‑world industrial cases—from steel furnace timing and historic lithography to modern manufacturing—showing how continuous improvement, root‑cause analysis, and careful handling of correlation versus causation can reveal hidden inefficiencies, while highlighting the limits of traditional statistics and the emerging role of AI in industrial data analytics.

AI · Big Data · Continuous Improvement

0 likes · 14 min read

ITPUB

Jul 29, 2025 · Big Data

How to Deduplicate 4 Billion QQ IDs Using a Bitmap Within 1 GB Memory

Learn how to efficiently remove duplicates from 4 billion QQ numbers using a memory‑friendly Bitmap approach that fits within a 1 GB limit, including calculations, step‑by‑step implementation, Java code, and a discussion of its advantages and drawbacks.

Big Data · Bitmap · Data Structures

0 likes · 9 min read

360 Tech Engineering

Jul 29, 2025 · Information Security

How AI and Big Data Are Redefining Global Cybersecurity – Insights from Zhou Hongyi

In his 2025 World Internet Conference Digital Silk Road Forum keynote, Zhou Hongyi warned that the programmable, AI‑driven, data‑centric world amplifies cyber vulnerabilities, described the rise of state‑level cyber warfare and AI‑powered attacks, and outlined 360’s security‑as‑service strategy and global cooperation plans to protect nations and enterprises.

AI · Big Data · Security Operations

0 likes · 5 min read

Alibaba Cloud Big Data AI Platform

Jul 29, 2025 · Big Data

How GoTerra Cut Costs and Boosted Speed: BigQuery‑to‑MaxCompute Performance Secrets

This article details the real‑world migration of a leading Southeast Asian tech group from BigQuery to MaxCompute, exposing the three major challenges, the data‑driven performance‑optimization methodology, and the concrete techniques—Auto Partition, UNNEST redesign, large‑query graph optimizations, and intelligent tuning—that delivered dramatic cost reductions and query‑speed gains.

Auto Partition · Big Data · Data Warehouse Migration

0 likes · 17 min read

Bilibili Tech

Jul 25, 2025 · Big Data

How Unified Metadata Lineage Transforms Big Data Governance and Security

This article introduces the comprehensive design and evolution of a unified metadata lineage platform for big data, covering background, data processing chain, lineage models, system architecture, quality metrics, application scenarios, and future plans to enhance data governance, quality, and security.

Big Data · Data Governance · Data Quality

0 likes · 27 min read

Alibaba Cloud Big Data AI Platform

Jul 25, 2025 · Big Data

Cross-Contrastive Learning Cuts Flink Anomaly Detection Errors by 12%

The paper “Noise Matters: Cross Contrastive Learning for Flink Anomaly Detection”, accepted at VLDB 2025, introduces a novel cross‑contrastive method that leverages attention‑based representations and a boundary‑aware loss to detect Flink‑specific hotspot anomalies, achieving a 12.1% F1 improvement over state‑of‑the‑art techniques.

Big Data · Cross-Contrastive Learning · Flink

0 likes · 6 min read

Big Data Technology & Architecture

Jul 23, 2025 · Big Data

What’s New in Apache Flink 2.0? Key Features and Cloud‑Native Upgrades for 2025

This article summarizes the major Apache Flink 2.0 updates released in the first half of 2025, covering architecture separation, cloud‑native deployment, AI‑driven agents, SQL enhancements, data integration, operational tools, and performance optimizations for real‑time intelligent computing.

AI integration · Big Data · Cloud Native

0 likes · 10 min read

ITFLY8 Architecture Home

Jul 20, 2025 · Big Data

Exploring the Architecture of a Data Lake and Application Platform

This article outlines the overall architecture, data architecture, logical project structure, and the construction of a data resource center for a data lake and application platform, illustrated through a series of diagrams that depict each component and their interconnections.

Big Data · Data Lake · Data Platform

0 likes · 1 min read

DataFunSummit

Jul 19, 2025 · Artificial Intelligence

Big Data Meets Generative AI: Industry Transformations from Prof. Dou

Prof. Dou Dejing shares his journey into Fudan University's Data Intelligence Lab, outlines the history and synergy of big data and AI, reviews generative AI breakthroughs, evaluates large‑model strengths and weaknesses, and explores their expanding industrial applications and market potential.

Big Data · artificial intelligence · generative AI

0 likes · 13 min read

DataFunSummit

Jul 18, 2025 · Databases

Boosting ClickHouse on WeChat: Performance Tools, Lakehouse Hacks & AI

This article explores how ClickHouse is deployed across WeChat for real‑time analytics, introduces a suite of performance‑monitoring tools, details lakehouse read and bitmap optimizations, and describes the integration of AI‑driven vector search, showcasing substantial speedups and scalability improvements.

AI · Big Data · ClickHouse

0 likes · 12 min read

Alibaba Cloud Big Data AI Platform

Jul 18, 2025 · Big Data

MaxCompute’s Complex Type Overhaul Boosts Performance Beyond BigQuery

The article examines a real-world migration of a major Southeast Asian tech group from Google BigQuery to Alibaba Cloud MaxCompute, highlighting the challenges of complex data types, the columnar storage and execution engine redesigns, and the resulting performance gains that often surpass BigQuery.

Big Data · Complex Types · Data Warehouse

0 likes · 12 min read

Youzan Coder

Jul 18, 2025 · Cloud Native

How Mixed Workloads Boost Kubernetes CPU Utilization by Over 40%

This article explains how Youzan transformed its Kubernetes clusters from static over‑commit scheduling to load‑balanced mixed workloads using Koordinator and the Anolis OS (Longxi) kernel, achieving higher CPU utilization, lower costs, and better resource management for both online and offline services.

Big Data · Cloud Native · Koordinator

0 likes · 10 min read

DataFunSummit

Jul 18, 2025 · Big Data

Data Lake & Lakehouse Innovations: Real-Time Analytics and Industry Case Studies

This article presents a curated collection of cutting‑edge data lake and lakehouse case studies—including real‑time analytics, cloud‑native architectures, industry implementations from sales platforms to automotive IoT, and the latest advancements in open‑source projects—offering insights into modern big‑data strategies and governance.

Big Data · Data Lake · Lakehouse

0 likes · 2 min read
