Showing 100 articles max
DataFunTalk
DataFunTalk
Apr 10, 2026 · Big Data

How Xiaohongshu Cut Data Architecture Costs by Two‑Thirds with Incremental Computing

This article analyzes Xiaohongshu's data platform evolution—from a simple ClickHouse‑based analytics layer to a Lambda architecture and finally a lakehouse design—highlighting how adopting a new incremental computing model reduced architecture complexity, resource consumption, and development effort each to roughly one‑third while delivering sub‑second query performance on petabyte‑scale data.

Big DataData ArchitectureLakehouse
0 likes · 22 min read
How Xiaohongshu Cut Data Architecture Costs by Two‑Thirds with Incremental Computing
Lobster Programming
Lobster Programming
Apr 8, 2026 · Big Data

How to Implement Real‑Time API Traffic Counting at Scale

This article compares three practical approaches—direct database storage, a Flink‑Kafka‑Redis‑Grafana pipeline, and an ELK stack—to achieve real‑time API request counting for high‑concurrency scenarios, outlining their architectures, advantages, and trade‑offs.

API analyticsELKFlink
0 likes · 6 min read
How to Implement Real‑Time API Traffic Counting at Scale

How Kafka Powers Scalable E‑commerce Order Processing with Go

This article walks through the challenges of a fast‑growing e‑commerce platform during peak sales, explains why Apache Kafka is the ideal asynchronous messaging backbone, and provides a complete Go implementation—including producers, consumers, best‑practice patterns, and real‑world use cases—to achieve high throughput, fault tolerance, and seamless scalability.

Distributed SystemsMessage QueueSarama
0 likes · 14 min read
How Kafka Powers Scalable E‑commerce Order Processing with Go
Big Data Tech Team
Big Data Tech Team
Apr 1, 2026 · Big Data

Why Your 2026 Big Data Resume Is Being Ignored and How to Fix It

In the 2026 spring hiring season, many big‑data job seekers see their resumes disappear because they still focus on offline batch processing, while employers now demand real‑time streaming, AI‑driven data pipelines, and cloud‑native deployment skills such as Flink, vector databases, and Kubernetes.

AI integrationBig DataCloud Native
0 likes · 7 min read
Why Your 2026 Big Data Resume Is Being Ignored and How to Fix It
Big Data Tech Team
Big Data Tech Team
Mar 30, 2026 · Big Data

2026 Data Warehouse Interview Guide: Essential Questions for All Three Rounds

This article compiles a comprehensive set of data‑warehouse interview questions—including self‑introduction prompts, SQL and window‑function challenges, data‑skew solutions, architecture design, file‑format trade‑offs, governance, and team‑leadership topics—to help candidates prepare for first, second, and third‑round interviews at leading tech firms.

Big DataInterview preparationSQL
0 likes · 7 min read
2026 Data Warehouse Interview Guide: Essential Questions for All Three Rounds
DeWu Technology
DeWu Technology
Mar 25, 2026 · Big Data

How Code LLM Transforms E‑commerce Data Warehouses: From Data Rights to AI‑Driven Automation

This article analyzes how large‑language models for code, exemplified by Claude Code, are integrated into an e‑commerce data‑warehouse ecosystem, defining data‑rights boundaries, introducing agentic workflows, decoupling cognitive and execution runtimes, and establishing standardized I/O contracts to achieve safe, scalable AI‑assisted development and governance.

Big DataCode LLMData Warehouse
0 likes · 24 min read
How Code LLM Transforms E‑commerce Data Warehouses: From Data Rights to AI‑Driven Automation
DataFunSummit
DataFunSummit
Mar 25, 2026 · Big Data

How Apache Gravitino and OpenLineage Transform Data Governance for AI‑Driven Enterprises

In the era of AI and multi‑cloud, this article analyzes the core challenges of data governance—data silos, quality gaps, and compliance risks—and explains how Apache Gravitino’s unified metadata architecture together with OpenLineage’s standardized lineage model provide a scalable, automated solution for intelligent, real‑time data management.

Apache GravitinoBig DataData Lineage
0 likes · 15 min read
How Apache Gravitino and OpenLineage Transform Data Governance for AI‑Driven Enterprises
Big Data Tech Team
Big Data Tech Team
Mar 18, 2026 · Big Data

From Zero to One: Building Enterprise Data Standards for Data Warehouses

This guide explains why data standards are essential for data warehouses, outlines the four categories of standards, and provides a step‑by‑step process—including research, framework design, template creation, review, implementation, and ongoing maintenance—to help practitioners and interviewees establish robust, business‑aligned data standards.

Data StandardizationData WarehouseMetrics
0 likes · 10 min read
From Zero to One: Building Enterprise Data Standards for Data Warehouses
DataFunSummit
DataFunSummit
Mar 16, 2026 · Big Data

How MaxCompute Evolves into an AI‑Native Data Warehouse: Architecture, Capabilities, and Real‑World Cases

This article outlines MaxCompute's 15‑year transformation from a traditional structured‑compute engine to an AI‑native data warehouse, detailing its data, heterogeneous compute, and model capabilities, showcasing three core ability pillars, real‑world case studies, and future development directions.

AI-nativeBig DataCloud XPU
0 likes · 7 min read
How MaxCompute Evolves into an AI‑Native Data Warehouse: Architecture, Capabilities, and Real‑World Cases
mikechen
mikechen
Mar 12, 2026 · Big Data

How Kafka Handles Million‑Message Concurrency: Architecture Deep Dive

This article explains how Kafka’s sequential disk writes, zero‑copy data path, partition‑based parallelism, and configurable broker and partition settings enable linear‑scale throughput that can reach millions of transactions per second in large‑scale streaming systems.

Distributed SystemsPartitioningThroughput
0 likes · 5 min read
How Kafka Handles Million‑Message Concurrency: Architecture Deep Dive
Alibaba Cloud Developer
Alibaba Cloud Developer
Mar 6, 2026 · Big Data

How DataWorks Turns Data Quality Rules into Code with Data Contracts

This article explains how DataWorks integrates data quality specifications directly into the SQL development workflow using Data Contracts, addressing governance lag, versioning gaps, and trust issues while providing a unified, version‑controlled, and automated quality assurance process for offline data pipelines.

DataWorksSQLYAML
0 likes · 12 min read
How DataWorks Turns Data Quality Rules into Code with Data Contracts
StarRocks
StarRocks
Mar 5, 2026 · Big Data

How Fanatics Scaled to PB‑Level Data with StarRocks & Apache Iceberg Lakehouse

Fanatics unified its fragmented data stack by building a StarRocks‑powered Lakehouse on Apache Iceberg, replacing Redshift, Snowflake, Athena, and Druid, which cut costs by up to 95%, delivered sub‑second dashboard queries on petabyte‑scale data, and enabled real‑time and historical analytics on a single platform.

Apache IcebergData ArchitectureFanatics
0 likes · 10 min read
How Fanatics Scaled to PB‑Level Data with StarRocks & Apache Iceberg Lakehouse
DataFunTalk
DataFunTalk
Mar 3, 2026 · Big Data

Exploring Tencent Cloud’s Iceberg Batch‑Stream Integration and AI‑Driven Data Governance

This article presents a series of seven technical case studies—including Tencent Cloud’s Iceberg‑based batch‑stream integration, AI‑driven data governance with Apache Gravitino, Xiaohongshu’s lakehouse evolution, and a multimodal data‑lake solution—detailing challenges, architectural designs, implementation steps, performance results, and future directions.

AIBig DataIceberg
0 likes · 8 min read
Exploring Tencent Cloud’s Iceberg Batch‑Stream Integration and AI‑Driven Data Governance
DeWu Technology
DeWu Technology
Mar 2, 2026 · Big Data

Mastering Spark UI: Deep Dive into Metrics, Tuning, and Real‑World Cases

This article provides a comprehensive guide to Spark UI, explaining each primary and secondary tab, the key metrics they expose, and how to interpret them for performance bottleneck detection, followed by two detailed case studies and practical tuning recommendations for Spark workloads.

Big DataMetricsOptimization
0 likes · 19 min read
Mastering Spark UI: Deep Dive into Metrics, Tuning, and Real‑World Cases