Tagged articles

Big Data

3720 articles · Page 1 of 38
DataFunSummit
Jul 4, 2026 · Industry Insights

How a Modern Data Platform Is Redefining the Future of Insurance

The article details how Ping An Property & Casualty transformed its legacy siloed data architecture into a systematic Kunpeng Intelligent Platform, built three core pillars—Agent platform, OSI semantic layer, and AI tools—boosted ChatBI accuracy, evaluated OpenClaw’s limits, and delivered end‑to‑end AI across marketing, underwriting, claims, agriculture, and forecasting.

AI · Big Data · Data Platform
0 likes · 12 min read
DataFunTalk
Jun 30, 2026 · Big Data

How Xiaohongshu Evolved Its Data Architecture for the Big AI Data Era

Xiaohongshu, with over 350 million monthly users and daily logs in the trillions, migrated 500 PB of data to Alibaba Cloud and iterated its data platform through four architecture generations (ClickHouse‑based ad‑hoc, Lambda, Lakehouse, and a unified incremental compute model), cutting resource, development, and storage costs to one‑third while delivering sub‑10‑second query latency at petabyte scale.

Big Data · ClickHouse · Data Architecture
0 likes · 22 min read
DataFunTalk
Jun 25, 2026 · Big Data

From Writing SQL to Speaking Requirements: Practical Guide to DataWorks Data Agent

This article walks through using DataWorks Data Agent to automate end‑to‑end data‑warehouse development—from preparing source tables and a structured requirement document, uploading it, crafting task commands, selecting execution modes and models, to the agent generating SQL, building workflows, publishing them, and producing a final report—all without writing SQL manually.

AI Automation · Big Data · Data Agent
0 likes · 16 min read
DataFunTalk
Jun 24, 2026 · Big Data

How Xiaohongshu Re‑engineered Its Data Architecture for the Big AI Data Era

Xiaohongshu, with over 350 million monthly users and daily logs in the billions, migrated its data platform from AWS to Alibaba Cloud and iterated four times—from a ClickHouse‑based ad‑hoc layer to a Lambda architecture and finally a Lakehouse with incremental compute—cutting architecture complexity, resource cost and development effort each to about one‑third while delivering second‑level analytics on trillion‑scale data.

Big Data · ClickHouse · Data Architecture
0 likes · 22 min read
dbaplus Community
Jun 23, 2026 · Big Data

From Hand‑Written SQL to One‑Click Validation: Alibaba’s Verify‑Data Agent Skill Design Review

The article details how Alibaba’s production‑grade Verify‑Data Agent Skill replaces manual, multi‑SQL data validation with a single natural‑language command, automating table discovery, SQL generation, execution, and review‑level reporting, achieving up to 30‑minute turnaround, comprehensive coverage, and robust risk controls for big‑data pipelines.

Big Data · Data Quality · Data Validation
0 likes · 28 min read
DataFunTalk
Jun 21, 2026 · Big Data

How Zhihu Optimized Spark Jobs with Gluten: A Practical Deep‑Dive

This article details Zhihu's end‑to‑end experience of migrating Spark SQL workloads to the open‑source Gluten framework, covering background performance benchmarks, the architecture of Gluten and Velox, consistency and performance challenges encountered during migration, the concrete fixes applied, and the resulting resource savings and future plans.

Big Data · Gluten · Optimization
0 likes · 22 min read
DataFunSummit
Jun 20, 2026 · Big Data

Building an Agentic Analytics Platform for the Gaming Industry with SelectDB

The article analyzes the fourfold challenges of game‑industry data analysis—high timeliness, massive concurrency, heterogeneous sources, and petabyte‑scale volumes—and explains how SelectDB’s evolution to an AI‑Ready, Agentic platform with MCP and a semantic layer addresses these issues through real‑time OLAP, multimodal processing, and autonomous decision loops.

AI-Ready · Agentic AI · Big Data
0 likes · 16 min read
DataFunTalk
Jun 20, 2026 · Big Data

How Xiaohongshu Evolved Its Data Architecture for the Big AI Data Era

The article details Xiaohongshu's step‑by‑step migration from a simple ClickHouse‑based analytics stack to a Lambda‑style 2.0 architecture and finally to a Lakehouse‑based 3.0 design, highlighting concrete performance numbers, cost reductions, and the definition of a generic incremental‑compute model (SPOT) that underpins the evolution.

Big Data · ClickHouse · Data Architecture
0 likes · 22 min read
DataFunSummit
Jun 19, 2026 · Big Data

Near‑Real‑Time Data Warehousing with Yunqi Lakehouse: Cases from Xiaohongshu, Kuaishou, Meituan

The article examines how Xiaohongshu, Kuaishou and Meituan adopted Yunqi Lakehouse’s General Incremental Computing and Single‑Engine architecture to achieve near‑real‑time data warehouses, cutting resource usage to as low as 1/20 of full‑batch jobs, reducing data latency from days to minutes, and improving query performance.

Big Data · Case Study · General Incremental Computing
0 likes · 12 min read
DataFunTalk
Jun 16, 2026 · Big Data

How MaxCompute Evolves Data Platforms for AI: Architecture, Features, and Real‑World Cases

The article explains how Alibaba Cloud's MaxCompute transforms a traditional data warehouse into a cloud‑native, multimodal Data+AI platform by introducing a four‑layer architecture, SQL‑based AI functions, the Python‑native MaxFrame framework, and a series of industry case studies that demonstrate performance gains and flexible resource scheduling.

Big Data · Cloud Native · Data+AI
0 likes · 11 min read
IT Learning Made Simple
Jun 14, 2026 · Industry Insights

Why Data Architects Are the Hottest Talent in the DT Era

The article explains why data architects have become essential in the DT era, detailing their responsibilities, core skills, big‑data technology stack, governance practices, career paths, and the tools they use to turn data into a strategic asset for enterprises.

Big Data · Career Path · Data Architecture
0 likes · 9 min read
dbaplus Community
Jun 14, 2026 · Big Data

Why Big Data Is Falling Silent: When Scale Can’t Fake Value Anymore

Although national data production reached 52.26 ZB in 2025 and keeps growing, the term “big data” is disappearing because it no longer serves as an organizational credit that hides the need for real value, responsibility, and measurable business impact, especially in the AI era.

AI impact · Big Data · Data Governance
0 likes · 13 min read
DataFunTalk
Jun 11, 2026 · Artificial Intelligence

How Qichacha Leverages Large Language Models for Field‑Level Data Lineage

This article details Qichacha's use of large language models to extract field‑level data lineage from heterogeneous, non‑standard code and ETL assets, describing the motivation, architectural blueprint, practical challenges such as cost, accuracy and hallucination, and the resulting improvements in impact analysis, metric tracing, and sensitive‑data governance.

Big Data · Data Governance · Flink
0 likes · 11 min read
iQIYI Technical Product Team
Jun 11, 2026 · Big Data

How iQIYI’s QBFS Enables Seamless Hybrid‑Cloud Storage and Cuts Big‑Data Costs by Over 30%

iQIYI’s big‑data team built a self‑developed QBFS virtual file system that unifies private and multiple public clouds, providing transparent routing, automatic migration, intelligent caching and fine‑grained governance, which together reduce storage and compute costs by more than 30% while supporting scalable analytics.

Big Data · Caching · Data Migration
0 likes · 21 min read
IT Learning Made Simple
Jun 8, 2026 · R&D Management

The Essential Gear to Become a Software Architect

This guide maps the complete skill tree for aspiring software architects, detailing foundational knowledge, core competencies such as system design and performance tuning, extended expertise in cloud‑native and big‑data technologies, and a staged learning roadmap to help newcomers acquire the necessary gear.

Big Data · Cloud Native · Performance Optimization
0 likes · 9 min read
DataFunSummit
Jun 7, 2026 · Artificial Intelligence

How Qichacha Uses Large Language Models for Field‑Level Data Lineage

This article details Qichacha's technical journey of applying large language models to resolve field‑level data lineage challenges in a complex, multi‑source data environment, describing the motivation, architecture, practical implementation, engineering trade‑offs, and measurable outcomes.

AI · Big Data · Data Governance
0 likes · 11 min read
Digital Planet
Jun 6, 2026 · Big Data

Why Has the Term “Big Data” Suddenly Disappeared?

Although data production continues to surge—reaching 52.26 ZB in 2025—the “big data” label is fading because its original narrative of scale as value has run out, exposing a credit‑and‑responsibility gap that forces organizations to demand concrete business impact rather than mere infrastructure.

AI impact · Big Data · Data Governance
0 likes · 15 min read
Alibaba Cloud Big Data AI Platform
Jun 4, 2026 · Big Data

Scalar‑Vector Hybrid Search in a Data Lake with One SQL on EMR Serverless Spark

EMR Serverless Spark now supports scalar‑vector hybrid search via DLF Global Index, allowing a single Spark SQL statement to perform vector similarity and scalar filtering together, eliminating data movement, reducing latency, and boosting performance for scenarios such as autonomous driving, e‑commerce, and knowledge‑base retrieval.

Big Data · DLF Global Index · EMR Serverless Spark
0 likes · 17 min read
Alibaba Cloud Big Data AI Platform
May 28, 2026 · Artificial Intelligence

From Assisted to Autonomous: How DataWorks Data Agent Revolutionizes Data Intelligence

DataWorks Data Agent advances from an assisted, code‑completion tool to a fully autonomous data‑intelligent agent, using a dual‑engine CLI/Claw architecture, unified runtime, open Skill ecosystem, and CPU‑GPU co‑optimization to automatically understand requirements, explore data, generate code, execute tasks, and deliver end‑to‑end results for developers and operators.

AI · Automation · Big Data
0 likes · 10 min read
DataFunSummit
May 28, 2026 · Artificial Intelligence

How DataWorks Data Agent Advances from Augmented Assistance to Full Autonomy

The article analyzes DataWorks Data Agent’s evolution from a helper‑style tool to an autonomous data‑centric AI agent, detailing its five‑stage roadmap, dual‑engine CLI/Claw architecture, unified runtime kernel, open skill ecosystem, and CPU‑GPU joint optimization for enterprise‑grade data automation.

AI · Automation · Big Data
0 likes · 12 min read
DataFunTalk
May 28, 2026 · Big Data

How Xiaohongshu Evolved Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from a simple ClickHouse‑based ad‑hoc analysis to a Lambda‑style architecture and finally to a lakehouse with generic incremental compute, cutting architecture complexity, resource and development costs by one‑third while delivering second‑level queries over trillions of rows.

Big Data · ClickHouse · Data Architecture
0 likes · 21 min read
DataFunTalk
May 25, 2026 · Big Data

MaxCompute’s AI‑Ready Evolution: Architecture, Features, and Real‑World Use Cases

This article examines how Alibaba Cloud’s MaxCompute platform has been transformed for AI workloads, detailing its multi‑layer architecture, multimodal data storage, SQL AI functions, the Python‑based MaxFrame framework, and real‑world deployments in large‑model preprocessing, autonomous driving, and multimodal image labeling.

AI · Big Data · Distributed Computing
0 likes · 12 min read
AI Large-Model Wave and Transformation Guide
May 25, 2026 · Artificial Intelligence

AI‑Powered Underwater Simulation: Autonomous Perception, Decision & Execution

The article presents a comprehensive AI‑driven framework for unmanned underwater vehicles, detailing a three‑layer decision architecture, human‑machine collaboration models, conflict‑resolution mechanisms, data acquisition and simulation pipelines, ontology‑based knowledge graphs, and self‑evolution processes to enable reliable autonomous perception, planning, and actuation in complex marine environments.

Big Data · Operations · R&D Management
0 likes · 30 min read
AI Large-Model Wave and Transformation Guide
May 24, 2026 · Industry Insights

From CIA‑Labeled ‘Garbage’ to Military Disappointment: Palantir’s Series of Failures

The article chronicles Palantir’s two‑decade saga of high‑profile setbacks—from a $5 billion, six‑year military AI project and a failed financial platform to stalled consumer data alliances—showing how advanced algorithms falter when detached from real‑world business needs.

AI · Big Data · Industry Analysis
0 likes · 8 min read
Big Data Tech Team
May 24, 2026 · Big Data

Data Warehouse Interview Pitfall Guide 2.0: Avoid Common SQL, Modeling, and ETL Mistakes

This guide compiles the most frequent interview pitfalls for data warehouse roles, covering SQL join and aggregation errors, window function misuse, subquery versus CTE performance myths, dimensional modeling mistakes, SCD implementation traps, layered design issues, data quality handling, ETL traps, Hive and Spark performance questions, real‑time warehousing considerations, and effective interview strategies.

Big Data · ETL · Hive
0 likes · 3 min read
DataFunTalk
May 22, 2026 · Big Data

How Xiaohongshu Cut Data Architecture Complexity and Cost by One‑Third in the Big AI Data Era

The article details Xiaohongshu's evolution from a simple ClickHouse‑based analytics layer to a Lambda‑enabled 2.0 stack and finally a Lakehouse‑based 3.0 architecture, showing how each iteration reduced infrastructure complexity, resource consumption and development effort by roughly one‑third while supporting trillions of daily events and AI‑driven use cases.

Big Data · ClickHouse · Data Architecture
0 likes · 21 min read
DataFunSummit
May 21, 2026 · Big Data

Alibaba Cloud’s Agent-Ready Big Data AI Infrastructure: Boosting Data Development from Hours to Minutes

Facing a projected 85% of enterprises deploying internal agents within two years, Alibaba Cloud proposes an Agent-Ready big‑data AI infrastructure—comprising a unified data lake, real‑time processing, high‑dimensional vector retrieval, elastic model serving, and comprehensive security governance—that has already cut data‑development cycles from hours to 5‑10 minutes in internal model‑training and Taobao flash‑sale scenarios.

AI · Agent-Ready · Big Data
0 likes · 15 min read
DataFunSummit
May 20, 2026 · Big Data

How Kuaishou’s Real‑Time Data Lake Boosts AI and BI Architecture

The article explains how Kuaishou partnered with Apache Hudi to overhaul its ODS‑based data lake, addressing latency, storage cost, and complexity for AI and BI workloads, detailing the evolution from MySQL‑to‑Hive to MySQL‑to‑Hudi 1.0 and 2.0, the resulting performance gains, cost savings, and future roadmap.

AI · BI · Big Data
0 likes · 20 min read
Linyb Geek Road
May 20, 2026 · Big Data

Why 90% of Companies Get Data Governance Wrong and How to Reduce Friction

Most data‑governance initiatives fail not because of lacking technology but because they add friction; the article explains how companies mistakenly focus on rules, platforms, and processes, and offers a step‑by‑step approach—identifying high‑value tables, minimal metadata, targeted quality rules, and fast issue diagnosis—to make governance truly useful.

Big Data · Data Governance · Data Quality
0 likes · 29 min read
DataFunTalk
May 19, 2026 · Industry Insights

From Single‑Point Copilot to Platform‑Level Agentic: Real Challenges and Future Forks for Data Platforms

A live discussion dissected the shift from single‑point Copilot assistants to platform‑level Agentic data platforms, exposing hard architectural, security, knowledge‑base, evaluation, stability‑cost, and governance challenges while debating whether the future will favor a super‑agent or a multi‑agent ecosystem.

Agentic AI · Big Data · Data Platform
0 likes · 18 min read
DataFunSummit
May 17, 2026 · Industry Insights

From Single‑point Copilot to Platform‑level Agentic: Real Challenges and Future Paths for Data Platforms

A 90‑minute live discussion with data experts from vivo and YangQianGuan reveals that moving from a simple Copilot assistant to a platform‑level Agentic data system requires fundamental architectural changes, new infrastructure for memory, planning, tool orchestration, security guardrails, knowledge management, robust evaluation, and a clear ROI strategy.

AI Governance · Big Data · Data Platform
0 likes · 19 min read
Data Party THU
May 15, 2026 · Artificial Intelligence

2026 Big Data Challenge Announces Monthly Star Winners and Shares Winning Teams’ Insights

The 2026 China University Computer Competition – Big Data Challenge reveals the Monthly Star award winners, each receiving 800 RMB, and presents detailed experience reports from the top teams covering feature engineering, model selection, training validation, and ensemble strategies for stock prediction.

Big Data · Model Fusion · Time Series Validation
0 likes · 7 min read
dbaplus Community
May 14, 2026 · Big Data

Building a ‘One‑Sentence Bank’: Big Data and AI Fusion for Small Banks

The article outlines the evolution of big data in banking, compares management models for heterogeneous data, describes the shift from data engineering to knowledge engineering, introduces LLMOps for high‑quality knowledge bases, and details how integrating AI and data can enable a “one‑sentence bank” that answers queries and executes tasks.

Big Data · Data Governance · Knowledge Engineering
0 likes · 22 min read
vivo Internet Technology
May 13, 2026 · Big Data

How Vivo Upgraded a Million‑Node YARN Cluster: Architecture, Scheduler Switch, and Performance Optimizations

This article details Vivo's end‑to‑end upgrade of a YARN 2.6.0 cluster to a modern version for a million‑node, hundred‑thousand‑tasks‑per‑day platform, covering architectural evolution, scheduler migration, compatibility fixes, performance tuning, and service‑continuity strategies.

Big Data · Capacity Scheduler · Hadoop
0 likes · 28 min read
DeWu Technology
May 13, 2026 · Big Data

How BP Claw Solves AI Coding Input Challenges in FlinkSpec’s Real‑Time Data Warehouse

The article explains how BP Claw tackles unstable AI coding results by automatically converting low‑quality PRD documents into structured, high‑quality requirements, applying token‑saving strategies, strict hallucination guards, and multi‑skill orchestration, which together boost FlinkSpec’s real‑time data‑warehouse delivery efficiency by up to 30%.

AI coding · BP Claw · Big Data
0 likes · 17 min read
DataFunTalk
May 11, 2026 · Big Data

How Xiaohongshu Re‑engineered Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from a simple ClickHouse‑based ad‑hoc analysis to a Lambda‑style architecture and finally to a lakehouse built on Iceberg, StarRocks, Flink and Spark, cutting architecture complexity, resource and development costs by two‑thirds while supporting trillions of daily events with sub‑second query latency.

Big Data · ClickHouse · Flink
0 likes · 22 min read
DataFunTalk
May 8, 2026 · Big Data

How MaxCompute Evolves into a Data+AI Platform: Architecture, Core Capabilities, and Real-World Cases

The article explains how Alibaba Cloud's MaxCompute has been transformed into a cloud‑native Data+AI platform, detailing its layered architecture, multimodal storage, model management, hybrid compute scheduling, SQL AI functions, the MaxFrame Python framework, and several enterprise case studies that demonstrate performance gains and flexible resource orchestration.

AI integration · Big Data · Cloud Native
0 likes · 11 min read
DataFunTalk
May 6, 2026 · Big Data

How Xiaohongshu Evolved Its Data Architecture for the Big AI Data Era

The article details Xiaohongshu's four‑stage data‑platform evolution—from a simple ClickHouse ad‑hoc setup to a Lambda‑based 2.0 design and finally a lakehouse‑driven 3.0 architecture—highlighting the adoption of general incremental compute, cost‑reduction to one‑third, performance gains of up to ten‑fold, and the SPOT standards that guide the new system.

Big Data · ClickHouse · Data Architecture
0 likes · 21 min read
DataFunTalk
Apr 29, 2026 · Big Data

How Xiaohongshu Revamped Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from a simple ClickHouse‑based analytics stack to a unified lakehouse with generic incremental compute, cutting architecture complexity, resource cost, and development effort by roughly one‑third while supporting petabyte‑scale, sub‑second queries across its 350 million‑user app.

Big Data · ClickHouse · Data Architecture
0 likes · 22 min read
Model Perspective
Apr 28, 2026 · Big Data

How a Taiwan Ban Became Free Advertising for Amap’s Map App

A recent Taiwan government warning against Amap turned into a viral boost, exposing the app’s superior traffic‑light countdown, massive data‑driven network effects, and the underlying reverse‑propagation model that explains why the ban accelerated downloads rather than suppressing them.

Amap · Big Data · Network Effects
0 likes · 11 min read
DataFunTalk
Apr 28, 2026 · Artificial Intelligence

From “Lobster” to Ontology: DACon Reveals the Next Trend in Self‑Evolving AI Agents

The DACon conference in Shanghai gathered over 8,000 developers and experts, showcasing 50 talks that explored self‑evolving AI agents, the open‑source GenericAgent framework, data‑governance ontology, Agent‑Ready big‑data infrastructure, and AI+AR ecosystems, while highlighting practical case studies and future industry directions.

AI Agents · AI+AR · Big Data
0 likes · 11 min read
DataFunSummit
Apr 27, 2026 · Artificial Intelligence

How Tencent Games Leverages AI to Turn Data Governance into a Service

Tencent Games’ data governance team details an AI‑driven, end‑to‑end semantic framework that shifts traditional rule‑based data management to a service‑oriented model, cutting storage waste by 30%, halving development time, and boosting asset recommendation accuracy to 95% across its global gaming platform.

AI · Big Data · Data Governance
0 likes · 19 min read
DataFunSummit
Apr 25, 2026 · Big Data

AI‑Era Multimodal Data Lake Infrastructure: TBDS Design, Storage, Compute, and Governance

The article analyzes how Tencent Cloud's TBDS platform tackles the AI era's multimodal data lake challenges through a native storage format (Lance), elastic Ray‑based compute, standardized metadata with Gravitino, and automated governance via Lakekeeper, citing architecture details, performance numbers, and real‑world deployments.

AI Infrastructure · Big Data · Gravitino
0 likes · 13 min read
DataFunSummit
Apr 24, 2026 · Artificial Intelligence

AI‑Driven Data Governance as a Service: Tencent Games' Paradigm Shift

This talk details how Tencent Games leverages AI to transform its data governance from rule‑based, passive processes into a semantic, service‑oriented paradigm, addressing resource waste, low collaboration efficiency, and scalability challenges while delivering measurable improvements in cost, speed, and asset quality.

AI · Automation · Big Data
0 likes · 19 min read
DataFunTalk
Apr 22, 2026 · Industry Insights

How Xiaohongshu Cut Data Platform Costs by Two‑Thirds with Incremental Computing

This article details Xiaohongshu's journey from a ClickHouse‑based batch analytics stack to a unified lakehouse architecture powered by generic incremental computing, showing how the company reduced architecture complexity, resource consumption and development effort each to roughly one‑third while supporting trillions of daily events with sub‑10‑second query latency.

Big Data · Data Architecture · Lakehouse
0 likes · 24 min read
Big Data Tech Team
Apr 22, 2026 · Big Data

Inside Big Tech: Full Breakdown of AI Agents for Data Warehouse Governance

The article analyzes how leading internet companies embed AI agents across the entire data‑warehouse lifecycle to automate governance, presenting real‑world case studies from Alibaba, ByteDance, JD.com and Tencent, and quantifies benefits such as over 65% reduction in manual effort, 50% drop in metric duplication, and a 40% boost in resource utilization.

AI Agents · Automation · Big Data
0 likes · 10 min read
DataFunSummit
Apr 21, 2026 · Industry Insights

How SelectDB Cuts 60% Costs and Boosts Real‑Time Performance for New Energy Batteries

The whitepaper analyzes the data‑driven transformation of the new‑energy battery sector, outlines four core challenges—massive data streams, fast‑changing R&D demands, long manufacturing cycles, and multi‑dimensional quality standards—and demonstrates how SelectDB’s unified lake‑warehouse architecture delivers million‑level throughput, second‑level latency, up to 30× query speedup, and 60% cost reduction across real‑world case studies.

Big Data · Case Study · Data Warehouse
0 likes · 18 min read
DataFunSummit
Apr 19, 2026 · Big Data

How OPPO Built a Multi‑Modal Data Lake with Gravitino and Curvine

OPPO’s data‑lake team, led by David, detailed their transition from Hive‑Spark to a unified multi‑modal lake, leveraging Gravitino for cross‑engine metadata management and the open‑source Curvine cache to eliminate data silos, boost I/O performance, and support massive image, recommendation, and AI‑Agent workloads.

Big Data · Data Lake · Multimodal
0 likes · 11 min read
Big Data Tech Team
Apr 17, 2026 · Industry Insights

Can AI Replace Data Warehouse Engineers? Exploring the Future of Data Modeling

The article examines how large‑language‑model AI can automate data‑warehouse modeling tasks—generating SQL, designing schemas, handling ETL, and tracing lineage—while highlighting current pain points, practical limitations, and four emerging trends that will reshape the role of data engineers over the next few years.

AI · Big Data · Data Warehouse
0 likes · 11 min read
Ctrip Technology
Apr 16, 2026 · Big Data

How Ray + DuckDB Cut 900M-Row Attribution Queries from 40s to 15s

When attribution analysis on over 900 million rows slowed to more than 40 seconds and threatened cluster stability, Ctrip's smart attribution team rebuilt the architecture with Ray and DuckDB, achieving sub‑15‑second query times, 160% performance gain, and complete resource isolation.

Attribution Analysis · Big Data · Distributed Computing
0 likes · 22 min read
DataFunTalk
Apr 16, 2026 · Big Data

How Xiaohongshu Cut Data Architecture Costs by Two‑Thirds with Incremental Computing

This article details Xiaohongshu's data platform evolution from a simple ClickHouse‑based ad‑hoc system to a Lambda‑style architecture and finally a lakehouse solution, highlighting how the adoption of a new incremental computing model reduced architectural complexity, resource consumption and development effort each to roughly one‑third while delivering sub‑second query performance on petabyte‑scale data.

Big Data · Data Architecture · Lakehouse
0 likes · 21 min read
DataFunSummit
Apr 15, 2026 · Industry Insights

Why Traditional Data Platforms Fail and How Ontology Drives Triple‑Digit ROI

The article analyzes costly data‑platform failures—such as a $40 million payroll system in San Francisco schools and a collapsed Healthcare.gov launch—identifies the root cause as ineffective data middle platforms, and demonstrates how Palantir’s ontology‑based three‑layer architecture (semantic, dynamics, decision) can turn data into actionable insights, delivering triple‑digit ROI for enterprises like BP, Novartis, and General Mills.

Big Data · Data Platform · Ontology
0 likes · 5 min read
DataFunTalk
Apr 11, 2026 · Industry Insights

Why Most Intelligent Data Analytics Fail and How Aloudata’s Agent Architecture Solves It

This article examines three common misconceptions in enterprise intelligent data analysis, explains how a semantic metric layer can break data silos, and details Aloudata Agent’s dual‑path engine, multi‑agent collaboration, and product design that together deliver trustworthy, deep, and democratized analytics for modern businesses.

AI · Attribution Analysis · Big Data
0 likes · 18 min read
DataFunTalk
Apr 10, 2026 · Big Data

How Xiaohongshu Cut Data Architecture Costs by Two‑Thirds with Incremental Computing

This article analyzes Xiaohongshu's data platform evolution—from a simple ClickHouse‑based analytics layer to a Lambda architecture and finally a lakehouse design—highlighting how adopting a new incremental computing model reduced architecture complexity, resource consumption, and development effort each to roughly one‑third while delivering sub‑second query performance on petabyte‑scale data.

Big Data · Data Architecture · Lakehouse
0 likes · 22 min read
Big Data Tech Team
Apr 9, 2026 · Industry Insights

Why Data Engineers Are the New AI Powerhouses: 4 Core Reasons & Actionable Tips

The article analyzes why data development engineers are becoming more valuable in the AI era, outlining four core reasons—including data‑driven AI limits, the rise of RAG architectures, heightened data compliance, and a talent shortage—while offering concrete advice on mastering real‑time pipelines, unstructured data, and AI infrastructure.

AI Infrastructure · Big Data · Data Engineering
0 likes · 8 min read
Alibaba Cloud Observability
Apr 6, 2026 · Cloud Native

How Alibaba Cloud Built Real‑Time OpenAPI Monitoring with Flink + SLS

This article details the design and implementation of a cloud‑native, real‑time monitoring system for Alibaba Cloud OpenAPI, covering background challenges, a Flink‑SLS architecture, multi‑region data processing, checkpoint and state‑backend tuning, source‑side predicate pushdown, visualization with Grafana, and production results.

Big Data · Cloud Native · Flink
0 likes · 21 min read
Big Data Tech Team
Apr 1, 2026 · Big Data

Why Your 2026 Big Data Resume Is Being Ignored and How to Fix It

In the 2026 spring hiring season, many big‑data job seekers see their resumes disappear because they still focus on offline batch processing, while employers now demand real‑time streaming, AI‑driven data pipelines, and cloud‑native deployment skills such as Flink, vector databases, and Kubernetes.

AI integration · Big Data · Cloud Native
0 likes · 7 min read
Big Data Tech Team
Mar 30, 2026 · Big Data

2026 Data Warehouse Interview Guide: Essential Questions for All Three Rounds

This article compiles a comprehensive set of data‑warehouse interview questions—including self‑introduction prompts, SQL and window‑function challenges, data‑skew solutions, architecture design, file‑format trade‑offs, governance, and team‑leadership topics—to help candidates prepare for first, second, and third‑round interviews at leading tech firms.

Big Data · Data Governance · SQL
0 likes · 7 min read
vivo Internet Technology
Mar 25, 2026 · Industry Insights

How Vivo Scaled Marketing Automation with Presto, Bitmap, and StarRocks

This case study details how Vivo’s marketing automation platform evolved its data‑driven architecture—from a Presto‑based wide‑table design, through a Bitmap optimization, to a StarRocks migration—addressing performance bottlenecks, reducing resource costs, and enhancing data security.

Big Data · Data Architecture · OLAP
0 likes · 11 min read
DeWu Technology
Mar 25, 2026 · Big Data

How Code LLM Transforms E‑commerce Data Warehouses: From Data Rights to AI‑Driven Automation

This article analyzes how large‑language models for code, exemplified by Claude Code, are integrated into an e‑commerce data‑warehouse ecosystem, defining data‑rights boundaries, introducing agentic workflows, decoupling cognitive and execution runtimes, and establishing standardized I/O contracts to achieve safe, scalable AI‑assisted development and governance.

Big Data · Code LLM · Data Warehouse
0 likes · 24 min read
DataFunSummit
Mar 25, 2026 · Big Data

How Apache Gravitino and OpenLineage Transform Data Governance for AI‑Driven Enterprises

In the era of AI and multi‑cloud, this article analyzes the core challenges of data governance—data silos, quality gaps, and compliance risks—and explains how Apache Gravitino’s unified metadata architecture together with OpenLineage’s standardized lineage model provide a scalable, automated solution for intelligent, real‑time data management.

Apache Gravitino · Big Data · Data Governance
0 likes · 15 min read
DataFunSummit
Mar 24, 2026 · Industry Insights

How DataWorks Is Transforming Big Data Development with AI Agents

The article outlines DataWorks' evolution from a decade‑old big‑data governance platform to an AI‑driven Copilot and autonomous Agent system, detailing its technical foundations, tool‑adaptation layer, context engineering, security safeguards, and future vision of a professional, open, and intelligent big‑data development ecosystem.

AI Copilot · Agent · Big Data
0 likes · 13 min read
DataFunSummit
Mar 16, 2026 · Big Data

How MaxCompute Evolves into an AI‑Native Data Warehouse: Architecture, Capabilities, and Real‑World Cases

This article outlines MaxCompute's 15‑year transformation from a traditional structured‑compute engine to an AI‑native data warehouse, detailing its data, heterogeneous compute, and model capabilities, and presenting three core capability pillars, real‑world case studies, and future development directions.

AI-native · Big Data · Case Study
0 likes · 7 min read
DataFunTalk
Mar 3, 2026 · Big Data

Exploring Tencent Cloud’s Iceberg Batch‑Stream Integration and AI‑Driven Data Governance

This article presents a series of seven technical case studies—including Tencent Cloud’s Iceberg‑based batch‑stream integration, AI‑driven data governance with Apache Gravitino, Xiaohongshu’s lakehouse evolution, and a multimodal data‑lake solution—detailing challenges, architectural designs, implementation steps, performance results, and future directions.

AI · Big Data · Data Lake
0 likes · 8 min read
DeWu Technology
Mar 2, 2026 · Big Data

Mastering Spark UI: Deep Dive into Metrics, Tuning, and Real‑World Cases

This article provides a comprehensive guide to Spark UI, explaining each primary and secondary tab, the key metrics they expose, and how to interpret them for performance bottleneck detection, followed by two detailed case studies and practical tuning recommendations for Spark workloads.

Big Data · Case Study · Metrics
0 likes · 19 min read
DataFunSummit
Mar 1, 2026 · Big Data

How Ant Group’s Flex Engine Supercharges Flink with Vectorization

This article details Ant Group’s Flex vectorized engine built on Velox, covering the current state of vectorization, Flex’s architecture (Flink + Velox), core feature development, correctness guarantees, large‑scale deployment results, and future directions for full‑link vectorization and broader hardware support.

Big Data · Flex · Flink
0 likes · 18 min read
DataFunSummit
Feb 8, 2026 · Big Data

Kuaishou’s Data Lake Upgrade with Hudi: Solving AI & BI Challenges

The article explains how Kuaishou modernized its data lake by partnering with Apache Hudi to address latency, storage cost, and consistency issues in both AI and BI pipelines, detailing architectural changes, new ingestion tools, partitioning strategies, compaction mechanisms, performance gains and future plans.

AI · BI · Big Data
0 likes · 20 min read
DataFunSummit
Feb 7, 2026 · Big Data

How Flink Enables Real‑Time AI Inference and Agent Construction

This article explains Apache Flink’s stream processing fundamentals, introduces the open‑source Flink Agents framework for building event‑driven AI agents, details Alibaba Cloud’s Flink AI Function for real‑time LLM inference, and showcases demos, architecture, integration patterns, and practical use cases such as VOC analysis, live‑stream analytics, and intelligent operations.

Apache Flink · Big Data · Cloud Computing
0 likes · 24 min read
Alibaba Cloud Big Data AI Platform
Feb 4, 2026 · Big Data

How Paimon + StarRocks Power Real‑Time OLAP for Double‑11 Mega‑Sales

During Double‑11 mega‑sales, Taotian Group faced exploding OLAP query traffic, costly data sync pipelines, and slow near‑real‑time analytics, so they unified real‑time and batch data in Paimon, leveraged StarRocks for high‑performance lake queries, tuned cluster settings, and saved nearly ten million yuan annually while cutting refresh latency by 80%.

Big Data · Data Lake · OLAP
0 likes · 22 min read
Alibaba Cloud Big Data AI Platform
Feb 2, 2026 · Big Data

Real‑Time Analytics with Alibaba Cloud Serverless Spark & Paimon for Taobao Flash Sale

This article details how Alibaba Cloud EMR Serverless Spark combined with the Paimon lakehouse framework enables Taobao Flash Sale’s retail data team to achieve low‑latency, high‑throughput real‑time analytics, batch processing, and feature generation, outlining architecture evolution, performance gains, and practical Spark tuning techniques.

Big Data · Lakehouse · Paimon
0 likes · 18 min read
Alibaba Cloud Big Data AI Platform
Feb 2, 2026 · Big Data

How We Built a Scalable Lakehouse Architecture with StarRocks, Paimon, and Flink

This article details the evolution of a data warehouse at RenliJia from a MaxCompute‑centric setup to a modern lakehouse using StarRocks, Paimon, Flink, and Fluss, describing design goals, technical evaluations, implementation steps for offline, OLAP, and real‑time workloads, and the challenges and future plans that emerged.

Big Data · Data Warehouse · Flink
0 likes · 25 min read
Big Data Tech Team
Feb 2, 2026 · Big Data

Choosing the Right Data Sync Tool: Sqoop vs DataX vs Flink CDC vs Airbyte

This article analyzes the architecture, sync modes, latency, scalability, usability, and deployment aspects of four popular data synchronization solutions—Sqoop, DataX, Flink CDC, and Airbyte—and provides a practical decision tree to avoid common misuse pitfalls in enterprise data pipelines.

Airbyte · Big Data · Data synchronization
0 likes · 9 min read
Raymond Ops
Jan 30, 2026 · Big Data

Build an Enterprise‑Grade HDFS HA and YARN Scheduler from Scratch

This guide walks you through designing and deploying a highly available HDFS architecture with dual NameNodes, ZooKeeper‑based failover, and a tuned YARN resource scheduler, covering detailed configuration files, failover testing, performance tuning, monitoring, automated health checks, capacity planning, and best‑practice checklists for production‑grade big‑data platforms.

Automation · Big Data · HA
0 likes · 28 min read
Radish, Keep Going!
Jan 30, 2026 · Big Data

How Uber Scaled Data Replication to Petabytes Daily with Distcp Optimizations

Uber tackled the challenge of replicating over 350 PB of data across on‑premise and cloud lakes by redesigning Hadoop Distcp, moving intensive tasks to the Application Master, parallelising copy‑listing and commit phases, and leveraging Uber‑mapper jobs to dramatically cut latency and improve resource efficiency.

Big Data · Data Replication · Distcp
0 likes · 17 min read
Data Party THU
Jan 29, 2026 · Big Data

How a Tsinghua Big Data Program Turned a Chemistry PhD into an AI‑Powered Process Engineer

This article recounts a Tsinghua University PhD student's journey through a multidisciplinary big‑data training program, detailing the acquisition of AI and data‑science skills, the creation of novel algorithms like MicroFlowSAM and ImageRAG, and their successful application to chemical engineering research and industry projects.

Big Data · Chemical Engineering · Industrial Application
0 likes · 8 min read
Big Data Tech Team
Jan 22, 2026 · Industry Insights

Top 10 Open‑Source Data Visualization Platforms You Should Know

This article presents a concise overview of ten popular open‑source data visualization tools—including ECharts, D3.js, Grafana, Plotly, Redash, Metabase, Superset, Kibana, AntV, and pyecharts—highlighting their main features, typical use cases, and visual examples to help readers choose the right solution for their needs.

Big Data · D3.js · Data Visualization
0 likes · 6 min read
Ray's Galactic Tech
Jan 22, 2026 · Big Data

Export 1 Billion Elasticsearch Docs in 3 Hours Using PIT + Slice

This guide explains how to reliably export over a billion Elasticsearch documents within a few hours by using Point‑In‑Time (PIT) snapshots combined with parallel Slice processing, covering diagnostics, performance modeling, consistency levels, failure recovery, and resource isolation.

Big Data · Elasticsearch · PIT
0 likes · 7 min read
StarRocks
Jan 22, 2026 · Big Data

How Paimon + StarRocks Cuts Double‑11 OLAP Refresh Time by 80%

This article explains how Taotian Group unified real‑time and offline data using Paimon as lake storage and StarRocks for high‑performance OLAP, eliminating costly sync pipelines, cutting refresh time by about 80%, saving nearly ten million yuan annually, and detailing the architecture, cluster safeguards, configuration tweaks, monitoring, and future roadmap for large‑scale promotional events.

Big Data · Data Architecture · OLAP
0 likes · 24 min read
DataFunSummit
Jan 18, 2026 · Big Data

How Ray Reinvents AI Data Pipelines for Massive Multimodal Inference

This article examines the shortcomings of traditional big‑data engines for AI workloads, presents a Ray‑based heterogeneous fusion architecture that unifies CPU/GPU scheduling, Python ecosystems, and streaming‑batch processing, and details fault‑tolerance, checkpointing, compute‑storage separation, resource‑utilization, scalability, and observability improvements that enable thousands of nodes and dramatically higher GPU efficiency.

Big Data · Cloud Native · Distributed Computing
0 likes · 31 min read
ByteDance Data Platform
Jan 15, 2026 · Artificial Intelligence

Why Model Evaluation Can Be Cool: Innovative Automated Testing for Data‑Driven LLM Agents

In the era of rapidly advancing large‑model technology, the article outlines the challenges of evaluating data‑centric LLM agents, proposes a three‑layer evaluation framework covering basic capabilities, component‑level checks, and end‑to‑end business impact, and shares practical innovations such as semantic‑equivalence SQL matching, agent‑as‑judge pipelines, and a unified assessment platform.

Agent as judge · Big Data · Data Agent
0 likes · 22 min read
StarRocks
Jan 15, 2026 · Artificial Intelligence

How AI‑First Lakehouse Redefines Data Platforms for Multimodal Analytics

The article outlines the evolution from traditional OLAP to an AI‑first Lakehouse, detailing unified multimodal storage, CPU/GPU heterogeneous scheduling, native vector search, in‑database AI inference, agent‑centric execution, and self‑evolving platform capabilities that together reshape modern data analytics.

AI · Big Data · In‑Database Inference
0 likes · 11 min read
AsiaInfo Technology: New Tech Exploration
Jan 6, 2026 · Industry Insights

Apache Paimon: Boosting Real-Time Data Lakes for Fraud Detection & Manufacturing

This article examines Apache Paimon’s innovative lakehouse architecture, detailing its LSM‑Tree storage, flexible merge engine, and multi‑engine integration, and showcases two real‑world deployments—a telecom operator’s real‑time fraud‑prevention system and a manufacturing firm’s unified data platform—highlighting performance gains and cost reductions.

Apache Paimon · Big Data · Case Study
0 likes · 15 min read
Past Memory Big Data
Dec 29, 2025 · Industry Insights

How Chinese Open‑Source Projects Dominated Half of 2025 Apache Top‑Level Projects

In 2025, five Apache Top‑Level Projects with Chinese origins—Uniffle, StreamPark, Gravitino, DevLake and HertzBeat—emerged, illustrating a shift toward central, platform‑oriented solutions driven by growing system scale, engineering complexity, and collaborative costs rather than a deliberate national agenda.

Big Data · Cloud Native · Top-Level Projects
0 likes · 7 min read
Big Data Tech Team
Dec 29, 2025 · Big Data

Master Big Data Development: A Complete Roadmap from Beginner to Expert

This guide presents a comprehensive big‑data development roadmap, detailing industry opportunities, a six‑module technology stack, four progressive learning stages, hands‑on project ideas, interview question strategies, common pitfalls, and curated resources, helping aspiring engineers become proficient and interview‑ready.

Big Data · Data Engineering · Roadmap
0 likes · 11 min read
Big Data Tech Team
Dec 26, 2025 · Interview Experience

How to Nail a 2‑Minute Data Engineer Self‑Introduction

This guide outlines a concise self‑introduction of 1.5 to 2 minutes for data engineering interviews, highlighting essential personal details, technical stack, project achievements, business impact, and common pitfalls to avoid, with a concrete example and actionable tips.

Big Data · Career Advice · Data Engineering
0 likes · 5 min read
Alibaba Cloud Big Data AI Platform
Dec 24, 2025 · Big Data

How Paimon’s Column‑Separation Architecture Powers Real‑Time Multi‑Modal Lakehouse for AI

This article explains the challenges of frequent column changes in AI feature engineering, introduces Paimon’s column‑separation storage with a global continuous Row ID, details its Blob data type for efficient multi‑modal handling, and outlines production results and future roadmap for building an AI‑native data lakehouse.

Apache Paimon · BLOB · Big Data
0 likes · 11 min read
DataFunTalk
Dec 17, 2025 · Artificial Intelligence

How Large Language Models Unlock Field‑Level Data Lineage at Scale

This talk explains how a data platform tackled massive, heterogeneous enterprise data by using large language models and prompt engineering to automatically extract field‑level lineage from SQL scripts, achieve over 80% coverage, and raise accuracy above 95%, dramatically cutting impact‑analysis time.

AI for data engineering · Big Data · Large Language Model
0 likes · 6 min read