Tagged articles
278 articles
DataFunSummit
May 3, 2026 · Databases

ScopeDB: Real-Time Data Analytics Solution for the Cloud‑Native Era

ScopeDB is a cloud‑native, real‑time analytics database that combines structured core columns with a flexible JSON column, adaptive indexing, a custom query language (ScopeQL), and true compute‑storage separation, delivering sub‑second query latency, high throughput, and up to 70% lower cost than traditional big‑data stacks.

Cloud Native · Real-time analytics · ScopeDB
0 likes · 14 min read

DataFunSummit
Apr 21, 2026 · Industry Insights

How SelectDB Cuts 60% Costs and Boosts Real‑Time Performance for New Energy Batteries

The whitepaper analyzes the data‑driven transformation of the new‑energy battery sector, outlines four core challenges—massive data streams, fast‑changing R&D demands, long manufacturing cycles, and multi‑dimensional quality standards—and demonstrates how SelectDB’s unified lake‑warehouse architecture delivers million‑level throughput, second‑level latency, up to 30× query speedup, and 60% cost reduction across real‑world case studies.

Big Data · Case Study · Data Warehouse
0 likes · 18 min read

StarRocks
Apr 16, 2026 · Databases

Why Traditional Databases Stall AI Agents—and How StarRocks Overcomes the Bottleneck

Traditional databases were built for low‑frequency, human‑driven queries, but AI agents generate dozens of concurrent, sub‑second queries that expose architectural limits, and StarRocks addresses these challenges with self‑healing optimization, real‑time data pipelines, extreme concurrency handling, and seamless lakehouse access.

Database Concurrency · Lakehouse · Real-time analytics
0 likes · 13 min read

Alibaba Cloud Native
Apr 2, 2026 · Industry Insights

How EventHouse Redefines AI‑Native Event Data Platforms for the Agent Era

EventHouse, Alibaba Cloud’s AI‑native data platform, unifies event ingestion, storage, governance and intelligent analysis through a layered architecture that supports real‑time SQL, zero‑ETL federation and Luma Agent‑driven conversational analytics, positioning it as a next‑generation AI data foundation for enterprises seeking to turn event streams into actionable insights.

AI Data Platform · Agentic AI · Cloud Native
0 likes · 16 min read

StarRocks
Mar 11, 2026 · Databases

How StarRocks Supercharges Real‑Time Ad Funnel Monitoring and Creative Optimization

This article dissects the full advertising funnel, explains why CTR and eCPM are critical, and demonstrates how StarRocks combined with Flink can deliver minute‑level real‑time monitoring, material selection, anomaly alerts, A/B testing, and a successful migration from Druid for massive ad‑tech workloads.

Advertising · Materialized Views · Real-time analytics
0 likes · 20 min read

ByteDance Data Platform
Feb 11, 2026 · Databases

How ByteHouse Redefines Real‑Time Multimodal Analytics with a Cloud‑Native Data Warehouse

ByteHouse, ByteDance's cloud‑native data warehouse, evolves from a traditional warehouse to a next‑generation AI‑ready platform that handles 800+ PB of data, supports 25,000 nodes, and delivers real‑time, multimodal analytics through a decoupled storage‑compute architecture, AI‑driven query optimization, and native vector search integration.

AI Optimization · Cloud Native · Real-time analytics
0 likes · 9 min read

Alibaba Cloud Big Data AI Platform
Feb 2, 2026 · Big Data

Real‑Time Analytics with Alibaba Cloud Serverless Spark & Paimon for Taobao Flash Sale

This article details how Alibaba Cloud EMR Serverless Spark combined with the Paimon lakehouse framework enables Taobao Flash Sale’s retail data team to achieve low‑latency, high‑throughput real‑time analytics, batch processing, and feature generation, outlining architecture evolution, performance gains, and practical Spark tuning techniques.

Big Data · Lakehouse · Paimon
0 likes · 18 min read

Alibaba Cloud Big Data AI Platform
Feb 2, 2026 · Big Data

How We Built a Scalable Lakehouse Architecture with StarRocks, Paimon, and Flink

This article details the evolution of a data warehouse at RenliJia from a MaxCompute‑centric setup to a modern lakehouse using StarRocks, Paimon, Flink, and Fluss, describing design goals, technical evaluations, implementation steps for offline, OLAP, and real‑time workloads, and the challenges and future plans that emerged.

Big Data · Data Warehouse · Flink
0 likes · 25 min read

StarRocks
Jan 22, 2026 · Big Data

How Paimon + StarRocks Accelerates Double‑11 OLAP Queries by 80% Refresh Speed

This article explains how Taotian Group unified real‑time and offline data using Paimon as lake storage and StarRocks for high‑performance OLAP, eliminating costly sync pipelines, cutting refresh time by about 80%, saving nearly ten million yuan annually, and detailing the architecture, cluster safeguards, configuration tweaks, monitoring, and future roadmap for large‑scale promotional events.

Big Data · Data Architecture · OLAP
0 likes · 24 min read

Alibaba Cloud Big Data AI Platform
Dec 30, 2025 · Big Data

How StarRocks and Apache Paimon Unite to Build a True Lakehouse Native Engine

StarRocks and Apache Paimon have been progressively integrated across multiple releases, enabling a unified lakehouse architecture that supports multi-source federated analysis, time-travel queries, native readers/writers, distributed planning, and advanced profiling, while delivering performance gains that bring Paimon query speed on par with native StarRocks tables.

Apache Paimon · Data Integration · Lakehouse
0 likes · 9 min read

Java Companion
Dec 6, 2025 · Backend Development

Replacing MySQL with Apache Doris in Spring Boot for Real‑Time Analytics

This article demonstrates how to integrate Apache Doris, a high‑performance MPP analytical database, into a Spring Boot application as a drop‑in replacement for MySQL, detailing environment setup, Maven dependencies, configuration, entity mapping, repository, service and controller code, and performance testing that shows Doris’s superior real‑time query speed.

Apache Doris · Java · MPP database
0 likes · 15 min read

Ctrip Technology
Nov 20, 2025 · Big Data

How Ctrip Achieved Minute‑Level Real‑Time Analytics with Flink CDC & Apache Paimon

Ctrip transformed its traditional T+1 offline warehouse into a near‑real‑time lakehouse by integrating Flink CDC with Apache Paimon, designing a two‑stage CDC ingestion, optimizing performance, implementing dynamic updates, and deploying the solution across multiple business scenarios, achieving minute‑level latency, reduced costs, and faster data‑driven decisions.

CDC · Flink · Paimon
0 likes · 27 min read

Big Data Technology & Architecture
Nov 17, 2025 · Big Data

Flink 2025 Updates: Disaggregated State, AI Agents, and SQL Enhancements

The 2025 Flink release introduces a disaggregated state management architecture for cloud‑native elasticity, AI‑driven Flink Agents with LLM, Memory and Tool support, Delta Join and VARIANT type for semi‑structured data, adaptive batch execution, incremental checkpoints, high‑speed network optimizations, and new SQL and Process Table Functions, reshaping real‑time analytics.

Disaggregated State · Flink · Real-time analytics
0 likes · 8 min read

Instant Consumer Technology Team
Oct 28, 2025 · Artificial Intelligence

Can Data Virtualization Deliver Millisecond Real‑Time Features Across Stores?

This article shares a three‑year journey of building a data‑virtualization‑based, multi‑environment feature management framework for real‑time risk decision platforms, detailing challenges like heterogeneous storage, cold‑start, and operational stability, and presenting a unified architecture that decouples physical storage from business logic.

Big Data · Real-time analytics · data virtualization
0 likes · 16 min read

Alibaba Cloud Big Data AI Platform
Oct 22, 2025 · Big Data

Li Auto’s Trillion‑Row Real‑Time Car‑Network Analytics Using Hologres + Flink

Li Auto’s data team tackled the explosion of vehicle‑telemetry data—over a trillion rows and millions of signals per second—by redesigning their data foundation with Alibaba Cloud’s Hologres and Flink, achieving sub‑second latency, elastic scaling, high availability, and significant cost reductions across real‑time and offline workloads.

Car Telemetry · Data Platform · Flink
0 likes · 16 min read

Huolala Tech
Oct 17, 2025 · Big Data

How HuoLala Accelerated User Profiling 30× Faster with Apache Doris

This article details how HuoLala built a high‑performance user profiling platform on Apache Doris, redesigning data models, leveraging bitmap storage, and applying query‑level optimizations to achieve up to 30‑fold speed gains, lower memory usage, and scalable real‑time analytics.

Apache Doris · Big Data · Bitmap
0 likes · 17 min read

ITPUB
Oct 11, 2025 · Databases

How OceanBase Achieves Real‑Time HTAP: Inside Its Unified Storage and Vectorized Engine

This article details OceanBase's evolution from a distributed OLTP system to a unified HTAP database, covering its cost‑based optimizer, vectorized execution, integrated row‑column storage, bypass import, materialized views, external tables, full‑text search, and real‑world use cases for real‑time analytics.

Columnar Storage · HTAP · OceanBase
0 likes · 12 min read

DataFunTalk
Sep 25, 2025 · Big Data

How Tencent Cloud’s AI‑Ready Data Platform Redefines Big Data for AI

This article outlines the challenges of high‑quality data for AI, introduces Tencent Cloud’s AI‑Ready data platform with three core capabilities—DIaaS, Setats, and ES‑based knowledge search—covers the end‑to‑end WeData integration, intelligent agents for automation, and showcases ecosystem partnerships driving industry‑wide intelligent transformation.

AI · Big Data · Data Platform
0 likes · 14 min read

Alibaba Cloud Big Data AI Platform
Sep 5, 2025 · Big Data

How StarRocks + Paimon Powered Real‑Time Analytics for Alibaba’s Taobao Flash Sale

Facing minute‑level decision demands and billions of marketing events during Taobao's Flash Sale, the Ele.me data team built a real‑time lakehouse with StarRocks and Paimon, leveraging asynchronous materialized views, RoaringBitmap de‑duplication, and resource isolation to achieve sub‑second query latency, lower storage costs, and stable high‑concurrency.

Lakehouse · Materialized Views · Paimon
0 likes · 25 min read

Kuaishou Tech
Jul 31, 2025 · Big Data

How Kuaishou Overcame the ‘Impossible Triangle’ of Performance, Flexibility, and Cost in Real‑Time Big Data Analytics

This article details how Kuaishou’s content middle platform tackled the massive challenges of real‑time, flexible, and cost‑effective data analysis at trillion‑scale by redesigning its architecture, adopting ClickHouse, splitting wide tables, and implementing a scatter‑gather execution model with pre‑shuffle and bitmap optimizations.

Big Data · ClickHouse · Performance Optimization
0 likes · 17 min read

DataFunSummit
Jul 18, 2025 · Big Data

Data Lake & Lakehouse Innovations: Real-Time Analytics and Industry Case Studies

This article presents a curated collection of cutting‑edge data lake and lakehouse case studies—including real‑time analytics, cloud‑native architectures, industry implementations from sales platforms to automotive IoT, and the latest advancements in open‑source projects—offering insights into modern big‑data strategies and governance.

Big Data · Data Lake · Lakehouse
0 likes · 2 min read

DataFunTalk
Jul 13, 2025 · Big Data

Unlock Real-Time Multidimensional Insights with Cloud Lakehouse Technology

This guide presents a series of expert case studies and insights on how Cloud Lakehouse solutions enable real‑time, fully managed multidimensional data analysis, improve user data experiences, balance performance and cost, and power large‑scale IoT and big‑data platforms across industries.

Data Architecture · IoT · Lakehouse
0 likes · 2 min read

DataFunSummit
Jul 12, 2025 · Big Data

How Fluss Unifies Stream and Lake to Power AI Data Pipelines

In the era of rapid AI growth, Fluss offers a unified lake‑stream architecture that tackles data quality, timeliness, scale, and multimodal challenges by tightly integrating Flink streaming with a high‑performance data lake, enabling seamless real‑time and batch analytics for AI workloads.

AI · Data Lake · Flink
0 likes · 12 min read

Big Data Technology Architecture
Jul 8, 2025 · Big Data

Why Fluss Is the Next Big Leap in Real‑Time Stream Storage

Fluss, an open‑source next‑generation stream storage engine donated by Alibaba, has entered the Apache Software Foundation incubator. It offers columnar streaming, real‑time updates, lake‑stream integration, impressive performance metrics, and a growing global developer community.

Apache Incubator · Flink Integration · Fluss
0 likes · 7 min read

DataFunTalk
Jul 8, 2025 · Big Data

Explore Cutting-Edge Lakehouse Solutions: Real-Time Analytics & Data Governance

This guide presents a curated collection of case studies and insights on cloud-native Lakehouse architectures, real‑time analytics, data‑driven user experiences, and data governance, showcasing implementations from companies like SalesEasy, Changan Auto, TikTok, Tencent, JD.com, and more.

Case Studies · Data Governance · Lakehouse
0 likes · 2 min read

DataFunTalk
Jul 7, 2025 · Big Data

Unlock Real-Time Analytics with Cloud Lakehouse: A Complete Guide

This article presents a curated list of sessions covering cloud Lakehouse technology for real-time, multidimensional data analysis, including case studies from SalesEasy, Changan Auto, Tencent, and JD, as well as discussions on data lake adoption, streaming lake Paimon, and the relevance of metadata‑driven data governance in the digital economy.

Big Data · Case Study · Data Governance
0 likes · 2 min read

DataFunTalk
Jul 6, 2025 · Big Data

How Cloud Lakehouse Is Redefining Real-Time Multi-Dimensional Data Analytics

This article presents a curated list of case studies and insights on cloud Lakehouse technology, covering real-time intelligent analytics, data architecture simplification, IoT big‑data platforms, integrated data platforms, and the evolving role of metadata‑driven data governance in the digital economy.

Big Data · Case Studies · Data Governance
0 likes · 2 min read

StarRocks
Jul 1, 2025 · Big Data

How StarRocks Boosted Suixingfu’s Real‑Time Data Platform: 3× Faster Queries & 10× Faster Analytics

Suixingfu rebuilt its payment data pipeline by replacing a fragmented Lambda stack with a unified Porter CDC + StarRocks + Elasticsearch architecture, achieving three‑fold query speed, ten‑fold analytics efficiency, 20% storage reduction, and sub‑second data‑capture latency across high‑concurrency, ad‑hoc, and batch workloads.

CDC · Data Warehouse · Flink
0 likes · 14 min read

Alibaba Cloud Developer
Jun 13, 2025 · Artificial Intelligence

Accelerate Enterprise Data Insights with Alibaba Cloud Hologres and AI Agents

Learn how to rapidly build an intelligent data analysis agent by integrating multi‑source data through Alibaba Cloud Hologres, leveraging the Bailian AI model service and the serverless Function AI platform. The guide covers architecture, step‑by‑step deployment, verification, and resource cleanup for cost‑effective, real‑time business insights.

AI · Alibaba Cloud · Data Integration
0 likes · 8 min read

Big Data Technology & Architecture
Jun 9, 2025 · Databases

Why Data Warebase Could Be the Next Game‑Changer for AI Workloads

The article examines how emerging data‑infrastructure trends, multi‑modal databases like Neon, Supabase, and ClickHouse, and the convergence of OLTP, OLAP, and vector search are reshaping AI workloads, introducing the Data Warebase concept that unifies warehouse and database capabilities to meet modern AI workflow demands.

AI · HTAP · Multimodal Retrieval
0 likes · 32 min read

DataFunSummit
Jun 3, 2025 · Big Data

BiFang: A Unified Lake‑Stream Storage Engine for Real‑Time and Batch Data Processing

BiFang is a lake‑stream integrated storage engine that merges Apache Pulsar message‑queue capabilities with Iceberg data‑lake features, providing a single unified data store with full‑incremental queries, sub‑second visibility, exactly‑once semantics, and seamless integration with Flink, Spark, and StarRocks for both real‑time analytics and batch processing.

Apache Iceberg · Apache Pulsar · Lakehouse
0 likes · 13 min read

Volcano Engine Developer Services
May 8, 2025 · Operations

How ByteBrain-LogParser Achieves 1‑2 Orders Faster Log Parsing in Cloud Services

ByteBrain-LogParser is a cloud‑native log‑parsing framework that transforms unstructured logs into dynamic templates with real‑time precision control, delivering parsing speeds up to two orders of magnitude faster than state‑of‑the‑art methods while maintaining near‑SOTA accuracy and low storage overhead.

Cloud Services · Hierarchical Clustering · Real-time analytics
0 likes · 27 min read

Tencent Cloud Developer
May 8, 2025 · Big Data

How Setats Unifies Stream, Batch, and Incremental Processing for Real‑Time Data Lakes

At the 2025 DA Data+AI Conference in Shanghai, Tencent Cloud unveiled Setats—a unified stream‑batch‑incremental engine that cuts system costs, delivers second‑level data visibility and real‑time changelog generation, and demonstrates measurable performance gains in automotive IoT analytics while integrating tightly with the WeData platform.

Batch Processing · Big Data Architecture · Data Lake
0 likes · 5 min read

DeWu Technology
Apr 28, 2025 · Databases

GreptimeDB Distributed Architecture, Transparent Caching, and Flow‑Based Real‑Time Analytics

GreptimeDB solves front‑end observability challenges with a distributed architecture (frontend, datanode, flownode, metasrv), transparent two‑level caching, elastic scaling, and an SQL‑based flow engine for real‑time multi‑granularity aggregation and approximate counting, delivering millisecond query latency and cost‑effective storage.

GreptimeDB · HyperLogLog · Real-time analytics
0 likes · 12 min read

DataFunSummit
Apr 3, 2025 · Big Data

Apache Hudi Asia Technical Salon Highlights: Practices and Innovations from Kuaishou, Meituan, Douyin, Huawei, and JD

The Apache Hudi Asia technical salon held in Beijing on March 29 gathered over 230 on‑site participants and 16,000 online viewers, featuring expert talks from leading Chinese tech companies that showcased real‑world Hudi implementations, performance optimizations, and future roadmap for data‑lake technologies.

Apache Hudi · Big Data · Data Lake
0 likes · 13 min read

Big Data Technology & Architecture
Apr 2, 2025 · Databases

Replacing Elasticsearch with Apache Doris for Real‑Time Big Data Analytics: Architecture, Performance, and Enterprise Cases

This article analyzes why Elasticsearch struggles with large‑scale, complex real‑time analytics and demonstrates how Apache Doris’s MPP, columnar storage, and native SQL support provide a cost‑effective, high‑performance alternative, illustrated with detailed enterprise case studies.

Apache Doris · Big Data · Elasticsearch
0 likes · 11 min read

Alibaba Cloud Big Data AI Platform
Mar 24, 2025 · Artificial Intelligence

How to Build a Real‑Time Data Analysis Agent with LLMs, Hologres, and MCP

This article explains the challenges LLMs face in data analysis, introduces the Model Context Protocol (MCP) as a standard bridge, and provides a step‑by‑step guide to integrate Hologres, MCP, and large language models—using Claude Desktop as an example—to create a fast, multi‑source data‑analysis agent.

AI Agent · Hologres · LLM
0 likes · 11 min read

Alibaba Cloud Big Data AI Platform
Jan 14, 2025 · Big Data

How Fluss Unifies Lake and Stream for Real‑Time Analytics: Architecture, Benefits, and Future Roadmap

This article summarizes a talk by Alibaba Cloud senior engineer and Flink Committer Luo Yuxia on the challenges of separating lake and stream storage, introduces the Fluss lake‑stream unified architecture, explains its technical benefits such as second‑level data freshness, unified metadata, efficient changelog generation, and outlines future plans for broader ecosystem integration.

Data Lake · Flink · Fluss
0 likes · 13 min read

Alibaba Cloud Big Data AI Platform
Dec 9, 2024 · Big Data

Why Kafka Falls Short for Real‑Time Analytics and How Fluss Changes the Game

Flink Forward Asia 2024 highlighted the limitations of Kafka for real‑time analytics—lack of updates, poor data exploration, costly back‑tracking, and high network overhead—while introducing Fluss, a columnar streaming storage that offers low‑latency reads, CDC, lake‑stream integration, and efficient Delta Join for scalable, fast analytics.

Big Data · Delta Join · Flink
0 likes · 15 min read

Tongcheng Travel Technology Center
Nov 27, 2024 · Big Data

Highlights of Tongcheng Travel’s 8th Big Data Technology Salon

The 8th Tongcheng Travel Big Data Technology Salon in Suzhou featured four expert talks covering Tencent Cloud’s Meson Spark engine, near‑line computing for travel itineraries, a Flink‑based real‑time risk control system, and Apache Paimon’s latest lake‑warehouse innovations, followed by a data‑driven business perspective session.

Apache Paimon · Big Data · Data Lake
0 likes · 7 min read

Big Data Technology & Architecture
Nov 25, 2024 · Big Data

Tencent Real-Time Lakehouse Architecture and Intelligent Optimization Practices

This article presents Tencent's real‑time lakehouse architecture, detailing its three‑layer design of compute, management and storage, and explains the six components of the Intelligent Optimization Service—including Compaction, Index, Clustering, and AutoEngine—along with scenario‑based capabilities, migration strategies, and future optimization directions.

Big Data · Real-time analytics · Tencent
0 likes · 11 min read

DataFunSummit
Nov 23, 2024 · Big Data

Bilibili's Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practice

This article presents Bilibili's end‑to‑end exploration of a streaming‑batch unified data pipeline built on Apache Iceberg, detailing the original and iterated architectures for massive user behavior transmission, online AI training, DB synchronization, and dimension‑join, along with performance gains, cost savings, and future plans.

Batch Processing · Data Lake · Flink
0 likes · 20 min read

Open Source Tech Hub
Nov 16, 2024 · Databases

Build Real‑Time Analytics with StarRocks: Quickstart Tutorial and Sample Queries

This guide introduces StarRocks, a high‑performance MPP database, explains its architecture and typical use cases, walks through a Docker‑based quickstart, shows how to create databases and tables, load NYC crash and weather datasets via Stream Load, and demonstrates analytical SQL queries that reveal traffic‑accident patterns under different weather conditions.

Data Warehouse · Docker · MPP database
0 likes · 18 min read

Bilibili Tech
Oct 11, 2024 · Big Data

Business Observability and Real-Time Event Streaming Architecture for Content Production

The paper proposes a business‑observability framework for a content‑production pipeline—illustrated by Bilibili’s workflow—by modeling archives as entities, assigning global AIDs for end‑to‑end tracing, and leveraging a Kafka‑Flink‑ClickHouse event‑streaming platform to monitor real‑time latency, bottlenecks, and safety audits across the entire production line.

Content Production · Event Streaming · Real-time analytics
0 likes · 19 min read

DataFunSummit
Sep 16, 2024 · Databases

DataFun Summit: Technical Papers on Graph Databases, Vector Databases, Real‑Time Data Warehouses and Industry Data Practices

The DataFun Summit page presents a collection of technical papers covering graph database parallel queries, next‑generation vector databases, real‑time data warehouse architectures, and best practices in finance and e‑commerce, while also providing instructions for obtaining the e‑book via a public account.

Big Data · Data Warehouse · Real-time analytics
0 likes · 5 min read

ZhongAn Tech Team
Sep 3, 2024 · Big Data

Real-Time Log Clustering Architecture and Continuous Clustering Algorithm

This article presents a comprehensive overview of a log clustering system, detailing its background, architecture based on Filebeat, Kafka, Flink, Elasticsearch, and Grafana, and introduces a continuous clustering algorithm using SimHash and Hamming distance for real‑time log governance and anomaly detection.

Flink · Log Clustering · Real-time analytics
0 likes · 14 min read

Data Thinking Notes
Aug 15, 2024 · Big Data

How to Build a Scalable Data Warehouse: Theory, Architecture, and Best Practices

This article outlines practical approaches to data warehouse construction, covering dimensional modeling, layered architecture, capability development, real‑time and batch processing with technologies like Hive, Spark, Flink, Iceberg, and discusses governance, security, and future trends toward data value and real‑time metrics.

Data Governance · Data Warehouse · Iceberg
0 likes · 13 min read

DataFunSummit
Aug 6, 2024 · Big Data

Implementing a Multi‑Tenant Lakehouse Data Platform for Real‑Time Analytics at a SaaS CRM Company

This article details how a SaaS CRM provider built a cloud‑native Lakehouse platform to support multi‑tenant real‑time analytics, describing data challenges, metadata‑driven architecture, virtual database design, query optimization, BI integration, AI readiness, migration steps, and the resulting performance and scalability gains.

Big Data · Data Platform · Lakehouse
0 likes · 19 min read

JD Cloud Developers
Aug 6, 2024 · Big Data

Master Real-Time Stream Processing with Flink: Windows & Watermarks

This article provides a comprehensive overview of real-time stream processing, covering data streams, window types, event and processing time, Flink's operator model, watermark mechanisms, and strategies for handling out-of-order and late data to ensure accurate, timely analytics.

Flink · Real-time analytics · Watermarks
0 likes · 15 min read

Architects' Tech Alliance
Jul 18, 2024 · Databases

Evaluating In-Memory Database Performance on the HaiGuang CPU: Challenges, Requirements, and Application Scenarios

This article examines the growing challenges faced by traditional databases, explains the fundamentals and advantages of in‑memory databases, and details a practical evaluation of the Chinese HaiGuang CPU’s suitability for such workloads, highlighting performance, security, and reliability aspects across various application scenarios.

CPU performance · HaiGuang processor · In-Memory Database
0 likes · 9 min read

Baidu Tech Salon
Jul 11, 2024 · Industry Insights

How Baidu Feed Evolved Its Data Warehouse with Multi‑Version Wide Tables

This article outlines the step‑by‑step evolution of Baidu's Feed data warehouse, from traditional layered modeling to hour‑level core tables, then real‑time wide tables, and finally a stream‑batch integrated multi‑version wide‑table architecture, highlighting the motivations, design choices, challenges, and resulting benefits.

Big Data · Data Warehouse · Real-time analytics
0 likes · 10 min read
Sohu Tech Products
Jul 10, 2024 · Industry Insights

How StarRocks and Apache Paimon Transform Data Lake Analytics and Migration

This article provides a practical deep‑dive into StarRocks and Apache Paimon, covering data‑lake fundamentals, the technical advantages of both platforms, performance gains over traditional engines, step‑by‑step migration strategies, deployment options on Alibaba Cloud EMR, and future roadmap plans.

Apache Paimon · Data Lake · Real-time analytics
0 likes · 15 min read
AntData
Jun 26, 2024 · Databases

In‑Depth Analysis of Rockset’s Cloud‑Native Real‑Time Analytics Architecture

This article examines Rockset’s cloud‑native real‑time analytics database, detailing its document‑oriented data model, RocksDB‑Cloud storage engine, compute‑storage separation, sharding, converged indexing, query processing pipeline, and the implications of OpenAI’s recent acquisition for the broader database ecosystem.

Real-time analytics · RocksDB · Rockset
0 likes · 14 min read
Baidu Tech Salon
Jun 18, 2024 · Big Data

Scalable, High‑Accuracy Event Logging Monitoring for Baidu's Log Platform

Baidu’s log platform processes billions of daily page‑view events and, to monitor them accurately with minute‑level latency, implements a downstream streaming‑task architecture that maps limited custom dimensions, uses watermarks for completeness, trims raw data, aggregates into 5‑minute windows, and outputs concise metrics to Elasticsearch, achieving high accuracy, configurability, and low cost.

Log Monitoring · Real-time analytics · UBC
0 likes · 11 min read
Baidu Geek Talk
Jun 17, 2024 · Industry Insights

How Baidu Scales Real‑Time Event Monitoring for Billions of Log Events

This article explains Baidu's log platform architecture, the UBC event‑tracking protocol, monitoring requirements, and the low‑cost, high‑accuracy solutions—including dimension mapping, watermark handling, data trimming, and time‑window aggregation—that enable real‑time, customizable monitoring of petabyte‑scale log streams.

Cost Optimization · Log Monitoring · Real-time analytics
0 likes · 13 min read
Big Data Technology & Architecture
Jun 16, 2024 · Big Data

Real-time Big Data Analytics with Apache Paimon and the Streaming Lakehouse Architecture

This article summarizes Wang Feng's presentation on the next‑generation Lakehouse architecture, explaining how Apache Paimon provides a unified, real‑time data lake format that bridges batch and streaming workloads, enabling low‑latency analytics and AI integration for modern big‑data applications.

Apache Paimon · Big Data · Real-time analytics
0 likes · 9 min read
Alibaba Cloud Big Data AI Platform
Jun 6, 2024 · Databases

How StarRocks Redefines Lakehouse Architecture with Ultra-Fast Unified Analytics

StarRocks combines extreme query speed and a unified architecture to deliver a lakehouse solution that separates storage and compute, supports multi‑warehouse resource isolation, offers Trino compatibility, materialized‑view acceleration, and cost‑effective scaling, making it suitable for real‑time analytics, data‑lake queries, and traditional OLAP workloads.

Big Data · Lakehouse · Real-time analytics
0 likes · 23 min read
DataFunTalk
Jun 4, 2024 · Databases

From Lambda Architecture to an All‑in‑One Apache Doris Real‑Time/Offline Data Platform for 5G Connected Factories

The article explains how China Unicom transformed its 5G fully‑connected factory data pipeline from a complex Lambda architecture into a streamlined, real‑time and offline‑integrated solution built on Apache Doris, detailing system requirements, architectural redesign, performance gains, and future plans.

5G · Apache Doris · Big Data
0 likes · 15 min read
DataFunSummit
May 20, 2024 · Big Data

Real-Time High-Performance Analytics on Data Lakes with CloudLakehouse Multi-Cluster Architecture

This article explains how CloudLakehouse’s Multi‑Cluster elastic architecture enables high‑concurrency, low‑latency real‑time analytics on data lakes by addressing storage‑compute separation, dynamic caching, and automated scaling, providing a cost‑effective solution for customer‑facing data products.

Cloud Native · Multi-Cluster · Real-time analytics
0 likes · 18 min read
Didi Tech
Mar 28, 2024 · Big Data

How We Unified Real‑Time and Batch Features with StarRocks in Financial Risk Control

This article analyzes the challenges of building real‑time and batch risk‑control features, compares Lambda and Kappa architectures, evaluates storage‑unified and compute‑unified solutions, and details how StarRocks was selected, validated, and deployed to achieve high‑performance, low‑latency feature serving in a financial context.

Big Data · Data Warehouse · Real-time analytics
0 likes · 19 min read
Alibaba Cloud Native
Mar 24, 2024 · Cloud Native

How RocketMQ 5.0 Enables Lightweight Cloud‑Native Stream Processing with RStreams and RSQLDB

This article explains the evolution of message middleware, introduces core concepts of stream processing, and details RocketMQ 5.0's native lightweight stream engine RStreams and its stream database RSQLDB, showing how they simplify real‑time data integration, computation, and scaling in cloud‑native environments.

RSQLDB · RStreams · Real-time analytics
0 likes · 14 min read
dbaplus Community
Mar 19, 2024 · Big Data

How JD’s Mini‑Program Data Center Powers Real‑Time Analytics and Monitoring

JD’s Mini‑Program Data Center integrates data collection, storage, and real‑time analysis using Flink, ClickHouse, and Elasticsearch to provide comprehensive monitoring, user behavior insights, and scalable analytics for mini‑programs across JD’s ecosystem, enabling precise operations and future AI‑driven enhancements.

ClickHouse · Data center · Elasticsearch
0 likes · 19 min read
DataFunSummit
Mar 4, 2024 · Big Data

Near Real-Time Metric System Architecture for Dongchedi Used Car Business

This article introduces Dongchedi's near real‑time metric system architecture, covering business background, technical challenges, the unified storage‑compute and query service design using the Las lakehouse built on Apache Hudi, solutions to consistency issues, achieved results, and future plans for further real‑time improvements.

Apache Hudi · Flink · Real-time analytics
0 likes · 13 min read
Sohu Tech Products
Jan 31, 2024 · Industry Insights

How Didi Scaled Real‑Time Dashboards with StarRocks Materialized Views

This article details Didi's evolution from a multi‑engine OLAP stack to a unified StarRocks solution, explains the design of global dictionaries and materialized views for real‑time dashboard acceleration, and shares performance results, challenges, and future optimization directions.

Big Data · Didi · Materialized Views
0 likes · 19 min read
DataFunSummit
Jan 24, 2024 · Big Data

Trends, Challenges, and Technical Practices of Modern Data Analysis and Indicator Platforms

This article reviews the evolution of data analysis and business intelligence, highlights current trends such as precision, agility, and real‑time needs, discusses common challenges, and presents the design and implementation of a unified semantic layer and indicator platform to enable agile, accurate, and real‑time analytics.

Big Data · Metrics Platform · Real-time analytics
0 likes · 14 min read
DataFunTalk
Jan 20, 2024 · Big Data

How ByteDance Leverages the Data Flywheel in Large‑Scale Projects

This article explains how ByteDance (Douyin) transforms its data infrastructure from isolated workshops to a unified middle platform and finally to a data flywheel, detailing the three development stages, the Data BP organizational model, real‑time analytics, A/B testing, and the resulting business benefits for large‑scale event projects.

Big Data · Data Flywheel · Data Governance
0 likes · 13 min read
ByteDance Data Platform
Dec 27, 2023 · Databases

How ByteHouse Redefines Cloud‑Native Data Warehousing for Real‑Time Analytics

This article details ByteHouse's evolution from a ClickHouse‑based OLAP engine to a cloud‑native, massively parallel data warehouse, highlighting its distributed and cloud‑native architectures, enhanced table engines, HaKafka and Materialized MySQL extensions, and real‑world use cases in short‑video, marketing and gaming analytics.

Big Data · ByteHouse · HaKafka
0 likes · 20 min read
DataFunTalk
Dec 15, 2023 · Big Data

Flink Forward Asia 2023: New Flink Releases, Apache Paimon, and Flink CDC 3.0

The Flink Forward Asia 2023 conference showcased major updates to Apache Flink (versions 1.17 and 1.18), introduced the Apache Paimon lakehouse project, announced Flink CDC 3.0, and highlighted community growth, cloud‑native deployments, and real‑time data‑warehouse use cases across industry leaders.

Apache Flink · Apache Paimon · Big Data
0 likes · 17 min read
DataFunSummit
Dec 7, 2023 · Databases

Apache Doris: A High‑Performance Real‑Time Analytical Database for Online High‑Concurrency Reporting

This article introduces Apache Doris, a real‑time analytical database built on an MPP architecture, explains its suitability for massive data workloads and online high‑concurrency reporting scenarios, and details the core technologies—storage models, vectorized query engine, materialized views, partitioning, indexing, row‑store and prepared statements—that enable sub‑second query latency and high QPS, while also showing a real‑world case study and how to join the Doris community.

Apache Doris · Data Warehouse · Materialized Views
0 likes · 13 min read
Big Data Technology Architecture
Nov 29, 2023 · Big Data

Building Real-Time Wide Tables with Partial-Update Using Apache Paimon for NetEase News Recommendation

The article describes how NetEase News' recommendation team replaced a slow, batch‑oriented data‑warehouse pipeline with a Flink‑based, Apache Paimon real‑time wide‑table solution that supports partial updates, reduces latency from hours to minutes, and lowers processing costs while handling both deduplication and non‑deduplication recommendation scenarios.

Apache Paimon · Data Lake · Flink
0 likes · 8 min read
DataFunSummit
Oct 31, 2023 · Big Data

Customer Data Platform (CDP) at Qunar Travel: Business Background, Construction Practice, Applications, and Future Outlook

This article details Qunar Travel's multi‑year development of a Customer Data Platform (CDP), covering its business motivations, architectural design, tag‑based data processing, real‑time and offline pipelines, user segmentation, marketing automation, performance optimizations, and future directions for model‑driven personalization.

Big Data · Real-time analytics · Tagging
0 likes · 18 min read
DataFunTalk
Oct 25, 2023 · Databases

Apache Doris Summit Asia 2023: Highlights, Innovations, and Industry Use Cases

The Apache Doris Summit Asia 2023 showcased the milestone 2.0 release, impressive performance gains, rapid community growth, and diverse industry deployments, while outlining future cloud‑native and unified analytics directions that position Doris as a leading real‑time data warehouse solution.

Apache Doris · Big Data · Cloud Native
0 likes · 13 min read
DataFunTalk
Oct 13, 2023 · Big Data

Design Principles, Architecture, and Applications of the Open‑Source LakeSoul Lakehouse Framework

This article provides a comprehensive technical overview of LakeSoul, an open‑source, cloud‑native lakehouse framework, covering its design philosophy, core features, architecture, performance benchmarks, real‑time ingestion, incremental computation, multi‑stream joining, security, community progress, and future roadmap.

Big Data · Data Lakehouse · Flink
0 likes · 16 min read
DeWu Technology
Sep 6, 2023 · Industry Insights

From Simple Split Tests to Real‑Time Multi‑Layer Experiments: The Evolution of an AB Testing Platform

This article traces the step‑by‑step evolution of an AB testing platform—from its initial 1.0 version with basic traffic splitting, through the 2.0 era that introduced multi‑layer orthogonal traffic models and real‑time metric pipelines, to the 3.0 era focused on usability, stability, and advanced analysis—while sharing concrete design decisions, implementation details, and lessons learned.

A/B testing · Experiment Platform · Real-time analytics
0 likes · 25 min read
JD Retail Technology
Sep 4, 2023 · Big Data

JD Mini Program Data Center: Architecture, Milestones, and Real‑time Analytics Solutions

The article details the JD Mini Program platform, its data‑center development milestones, comprehensive business panorama, technical architecture, data collection, storage, and analysis pipelines—including Flink‑based real‑time monitoring, ClickHouse custom analytics, and Elasticsearch user‑behavior insights—while outlining current challenges and future AI‑driven enhancements.

Big Data · ClickHouse · Data Warehouse
0 likes · 16 min read
DataFunTalk
Sep 4, 2023 · Big Data

Unified Batch‑Stream Storage with Hudi and LAS: Architecture, Design, and Deployment

This article presents a comprehensive overview of a batch‑stream unified storage solution built on Hudi and the Lakehouse Analysis Service (LAS), covering background challenges, architectural design, data organization, read/write mechanisms, BTS architecture, real‑world deployment scenarios, and future development plans.

Batch-Stream · Data Warehouse · Hudi
0 likes · 22 min read
DataFunTalk
Sep 3, 2023 · Big Data

Evolution of OLAP at Xingyun Retail Credit Using Apache Doris

This article details how Xingyun Retail Credit transitioned from traditional data warehouses to an Apache Doris‑based OLAP solution, covering data demand generation, OLAP engine selection challenges, multi‑stage implementation, performance optimizations, data‑warehouse construction, real‑world use cases, and future roadmap.

Apache Doris · Big Data · Data Warehouse
0 likes · 16 min read

How Lakehouse Architecture is Transforming Hadoop: A Deep Dive into Hudi, Iceberg, and Delta Lake

This article analyzes the rise of lake‑house architecture in the Hadoop ecosystem, compares the technical capabilities of Hudi, Iceberg and Delta Lake, details implementation enhancements such as MOR and multi‑writer support, showcases Flink integration, presents a real‑time marketing use case, and outlines future development directions.

Big Data · Data Governance · Delta Lake
0 likes · 14 min read
Big Data Technology & Architecture
Aug 7, 2023 · Big Data

Using Doris for Real‑Time Data Warehousing: Benefits, Drawbacks, and Comparison with Flink

The article examines Doris‑based real‑time data warehousing, outlining why teams choose this approach, comparing its low barrier to entry and operational simplicity with Flink’s higher‑cost streaming development, and highlighting its latency and scale limits as well as the strict monitoring required for production use.

Big Data · Data Warehouse · Flink
0 likes · 5 min read
JD Cloud Developers
Jul 19, 2023 · Databases

Why ClickHouse Is the Ideal Choice for Massive Data Storage and Real‑Time Analytics

This article examines the massive‑scale data requirements of an activity‑tracking platform, compares MySQL, Elasticsearch and HBase, and explains why ClickHouse—with its columnar storage, MergeTree engine, vectorized execution, and distributed architecture—offers the best combination of storage capacity, write performance, real‑time analysis, and query speed for billions of records.

ClickHouse · Columnar Database · Data Warehouse
0 likes · 31 min read
Architect
Jul 10, 2023 · Big Data

Understanding Lambda Architecture for Real‑Time Billion‑Scale Data Analysis

This article explains the Lambda Architecture—a three‑layer big‑data processing model combining batch and speed layers to deliver accurate, low‑latency analytics, and illustrates its use with Twitter hashtag tracking and a smart‑parking recommendation system.

Batch Processing · Big Data · Lambda architecture
0 likes · 10 min read
DataFunTalk
Jul 7, 2023 · Databases

Apache Doris 2.0-beta Release: New Query Optimizer, Pipeline Execution Engine, Workload Management and Major Performance Improvements

Apache Doris 2.0-beta, released on July 3, 2023, introduces a new Cascades‑based query optimizer, adaptive pipeline execution engine, workload‑aware resource isolation, enhanced memory management, partial column updates, multi‑catalog support, and numerous performance gains across real‑time analytics, ETL, and high‑concurrency point queries.

Apache Doris · Database Performance · Pipeline Execution
0 likes · 25 min read
iQIYI Technical Product Team
Jun 30, 2023 · Big Data

Advertising Data Lake Architecture and Real-time Optimizations

By replacing the costly Lambda architecture with a unified data‑lake built on Iceberg and Flink CDC, the advertising team achieved minute‑level latency, strong consistency, and lower storage expenses, cutting end‑to‑end processing times from hours to a few minutes across budgeting, warehousing, OLAP and ETL workloads.

Advertising · Big Data · Flink
0 likes · 13 min read
Alibaba Cloud Big Data AI Platform
Jun 27, 2023 · Big Data

How MaxCompute’s Lakehouse Architecture Enables Near‑Real‑Time Incremental Processing

This article details Alibaba Cloud MaxCompute’s lakehouse evolution, describing its unified storage‑metadata‑compute design, the Transactional Table 2.0 format, near‑real‑time incremental ingestion, clustering and compaction services, transaction handling, TimeTravel and incremental queries, and future roadmap for big‑data workloads.

Big Data · Data Warehouse · Incremental Processing
0 likes · 23 min read
DataFunTalk
May 19, 2023 · Big Data

Log Analysis and Schema‑On‑Read: Design and Implementation of the Honghu Real‑Time Heterogeneous Data Platform

This article examines the challenges and value of log analysis, introduces the concepts of schema‑on‑read versus schema‑on‑write, and details how the Honghu platform implements real‑time, one‑stop heterogeneous data analytics with flexible storage, indexing, and SQL‑based query engines.

Real-time analytics · log analysis · schema-on-read
0 likes · 24 min read
Tongcheng Travel Technology Center
May 17, 2023 · Databases

StarRocks Production Practice at Tongcheng Travel: Architecture, Use Cases, and Technical Evaluation

This article details Tongcheng Travel’s production deployment of the StarRocks OLAP database, covering background, business scenarios, technical evaluation against ClickHouse and Greenplum, implementation with Flink SQL, real‑time analytics, offline reporting, CDP use cases, performance optimizations, and future cloud‑native plans.

Big Data · Data Warehouse · Flink
0 likes · 12 min read
ITPUB
Mar 28, 2023 · Big Data

How We Turned a Hive Data Warehouse into a Real‑Time Lakehouse with Apache Hudi

This article details the migration from a traditional Hive‑based data warehouse to a lakehouse architecture using Apache Hudi, covering the original Lambda setup, its pain points, lake‑vs‑warehouse differences, Hudi features, integration challenges, practical solutions, and future roadmap.

Apache Hudi · Big Data · Data Warehouse
0 likes · 11 min read