Tag

Real-time Analytics


DataFunSummit
Jun 3, 2025 · Big Data

BiFang: A Unified Lake‑Stream Storage Engine for Real‑Time and Batch Data Processing

BiFang is a lake‑stream integrated storage engine that merges Apache Pulsar message‑queue capabilities with Iceberg data‑lake features, providing a single unified data store with full‑incremental queries, sub‑second visibility, exactly‑once semantics, and seamless integration with Flink, Spark, and StarRocks for both real‑time analytics and batch processing.

Apache Iceberg · Apache Pulsar · Big Data
0 likes · 13 min read
DeWu Technology
Apr 28, 2025 · Databases

GreptimeDB Distributed Architecture, Transparent Caching, and Flow‑Based Real‑Time Analytics

GreptimeDB solves front‑end observability challenges with a distributed architecture (frontend, datanode, flownode, metasrv), transparent two‑level caching, elastic scaling, and an SQL‑based flow engine for real‑time multi‑granularity aggregation and approximate counting, delivering millisecond query latency and cost‑effective storage.

GreptimeDB · HyperLogLog · Real-time Analytics
0 likes · 12 min read
DataFunSummit
Apr 3, 2025 · Big Data

Apache Hudi Asia Technical Salon Highlights: Practices and Innovations from Kuaishou, Meituan, Douyin, Huawei, and JD

The Apache Hudi Asia technical salon, held in Beijing on March 29, gathered over 230 on‑site participants and 16,000 online viewers. Experts from leading Chinese tech companies presented real‑world Hudi implementations, performance optimizations, and the future roadmap for data‑lake technologies.

Apache Hudi · Big Data · Data Lake
0 likes · 13 min read
Tongcheng Travel Technology Center
Nov 27, 2024 · Big Data

Highlights of Tongcheng Travel’s 8th Big Data Technology Salon

The 8th Tongcheng Travel Big Data Technology Salon in Suzhou featured four expert talks covering Tencent Cloud’s Meson Spark engine, near‑line computing for travel itineraries, a Flink‑based real‑time risk control system, and Apache Paimon’s latest lake‑warehouse innovations, followed by a data‑driven business perspective session.

Apache Paimon · Big Data · Data Lake
0 likes · 7 min read
DataFunSummit
Nov 23, 2024 · Big Data

Bilibili's Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practice

This article presents Bilibili's end‑to‑end exploration of a streaming‑batch unified data pipeline built on Apache Iceberg, detailing the original and iterated architectures for massive user behavior transmission, online AI training, DB synchronization, and dimension‑join, along with performance gains, cost savings, and future plans.

Data Lake · Iceberg · Real-time Analytics
0 likes · 20 min read
DataFunSummit
Nov 5, 2024 · Big Data

Tencent Real-Time Lakehouse Architecture and Intelligent Optimization Practices

This article presents Tencent's real-time lakehouse architecture, detailing its three-layer design, the Auto Optimize Service with compaction, indexing, clustering and engine acceleration, scenario capabilities such as multi‑stream joins and in‑place migration, and outlines future optimization directions.

Big Data · Data Lake · Iceberg
0 likes · 11 min read
DataFunSummit
Oct 11, 2024 · Big Data

Kuaishou’s Data Lake Technical Maturity Curve: Challenges and Solutions with Apache Hudi

Kuaishou’s data‑lake initiative tackled exploding offline warehouse costs, redundant model proliferation, and data‑consistency complexities by adopting Apache Hudi’s schema‑evolution capabilities and real‑time lake ingestion, improving cross‑team collaboration and narrowing the real‑time‑offline data gap.

Apache Hudi · Big Data · Data Engineering
0 likes · 6 min read
Bilibili Tech
Oct 11, 2024 · Big Data

Business Observability and Real-Time Event Streaming Architecture for Content Production

The paper proposes a business‑observability framework for a content‑production pipeline—illustrated by Bilibili’s workflow—by modeling archives as entities, assigning global AIDs for end‑to‑end tracing, and leveraging a Kafka‑Flink‑ClickHouse event‑streaming platform to monitor real‑time latency, bottlenecks, and safety audits across the entire production line.

Big Data · Content Production · Event Streaming
0 likes · 19 min read
DataFunSummit
Sep 16, 2024 · Databases

DataFun Summit: Technical Papers on Graph Databases, Vector Databases, Real‑Time Data Warehouses and Industry Data Practices

The DataFun Summit page presents a collection of technical papers covering graph database parallel queries, next‑generation vector databases, real‑time data warehouse architectures, and best practices in finance and e‑commerce, while also providing instructions for obtaining the e‑book via a public account.

Big Data · Data Warehouse · Real-time Analytics
0 likes · 5 min read
ZhongAn Tech Team
Sep 3, 2024 · Big Data

Real-Time Log Clustering Architecture and Continuous Clustering Algorithm

This article presents a comprehensive overview of a log clustering system, detailing its background, architecture based on Filebeat, Kafka, Flink, Elasticsearch, and Grafana, and introduces a continuous clustering algorithm using SimHash and Hamming distance for real‑time log governance and anomaly detection.

Log Clustering · Real-time Analytics · Simhash
0 likes · 14 min read
DataFunSummit
Aug 11, 2024 · Big Data

Real‑time Business Data Anomaly Attribution with TuGraph‑Analytics at Huolala

This article describes how Huolala used the open‑source, high‑performance streaming graph engine TuGraph‑Analytics together with Flink to build a real‑time system for detecting and attributing business data anomalies, covering the background, architectural evolution, technical choices, implementation details, benefits, and future plans.

Anomaly Detection · Big Data · Real-time Analytics
0 likes · 12 min read
DataFunSummit
Aug 6, 2024 · Big Data

Implementing a Multi‑Tenant Lakehouse Data Platform for Real‑Time Analytics at a SaaS CRM Company

This article details how a SaaS CRM provider built a cloud‑native Lakehouse platform to support multi‑tenant real‑time analytics, describing data challenges, metadata‑driven architecture, virtual database design, query optimization, BI integration, AI readiness, migration steps, and the resulting performance and scalability gains.

Big Data · Data Platform · Lakehouse
0 likes · 19 min read
Architects' Tech Alliance
Jul 18, 2024 · Databases

Evaluating In-Memory Database Performance on the HaiGuang CPU: Challenges, Requirements, and Application Scenarios

This article examines the growing challenges faced by traditional databases, explains the fundamentals and advantages of in‑memory databases, and details a practical evaluation of the Chinese HaiGuang CPU’s suitability for such workloads, highlighting performance, security, and reliability aspects across various application scenarios.

CPU performance · Database Scalability · HaiGuang processor
0 likes · 9 min read
DataFunTalk
Jul 1, 2024 · Big Data

DataFunCon2024 Beijing: Real‑Time Lakehouse and Big Data Sessions

The DataFunCon2024 Beijing conference on July 5‑6 showcases a series of technical talks about real‑time lakehouse architectures, big‑data analytics, and cloud‑native data warehouses, offering practitioners insights into Apache Paimon, SelectDB, and Doris implementations for faster, more agile data processing.

Apache Paimon · Big Data · Conference
0 likes · 8 min read
AntData
Jun 26, 2024 · Databases

In‑Depth Analysis of Rockset’s Cloud‑Native Real‑Time Analytics Architecture

This article examines Rockset’s cloud‑native real‑time analytics database, detailing its document‑oriented data model, RocksDB‑Cloud storage engine, compute‑storage separation, sharding, converged indexing, and query processing pipeline, and considers what OpenAI’s recent acquisition of Rockset means for the broader database ecosystem.

Real-time Analytics · RocksDB · Rockset
0 likes · 14 min read
Baidu Tech Salon
Jun 18, 2024 · Big Data

Scalable, High‑Accuracy Event Logging Monitoring for Baidu's Log Platform

Baidu’s log platform processes billions of page‑view events per day. To monitor them accurately at minute‑level latency, it runs a downstream streaming task that maps a limited set of custom dimensions, uses watermarks to judge completeness, trims raw data, aggregates into 5‑minute windows, and writes concise metrics to Elasticsearch, achieving high accuracy, configurability, and low cost.

Big Data · Real-time Analytics · UBC
0 likes · 11 min read
DataFunTalk
Jun 4, 2024 · Databases

From Lambda Architecture to an All‑in‑One Apache Doris Real‑Time/Offline Data Platform for 5G Connected Factories

The article explains how China Unicom transformed its 5G fully‑connected factory data pipeline from a complex Lambda architecture into a streamlined, real‑time and offline‑integrated solution built on Apache Doris, detailing system requirements, architectural redesign, performance gains, and future plans.

5G · Apache Doris · Data Warehouse
0 likes · 15 min read
DataFunSummit
May 20, 2024 · Big Data

Real-Time High-Performance Analytics on Data Lakes with CloudLakehouse Multi-Cluster Architecture

This article explains how CloudLakehouse’s Multi‑Cluster elastic architecture enables high‑concurrency, low‑latency real‑time analytics on data lakes by addressing storage‑compute separation, dynamic caching, and automated scaling, providing a cost‑effective solution for customer‑facing data products.

Big Data · Data Lake · Real-time Analytics
0 likes · 18 min read
DataFunTalk
Mar 14, 2024 · Big Data

Applying TuGraph-Analytics for Graph Computing and Data Warehouse Acceleration

This article introduces TuGraph-Analytics, a real‑time stream‑graph engine and its DSL, explains its architecture and core capabilities, demonstrates how graph modeling can accelerate data‑warehouse workloads, and outlines future plans for SQL‑to‑graph translation, performance optimizations, and open‑source development.

Big Data · DSL · Data Warehouse
0 likes · 13 min read
DataFunSummit
Mar 4, 2024 · Big Data

Near Real-Time Metric System Architecture for Dongchedi Used Car Business

This article introduces Dongchedi's near real‑time metric system architecture, covering business background, technical challenges, the unified storage‑compute and query service design using the Las lakehouse built on Apache Hudi, solutions to consistency issues, achieved results, and future plans for further real‑time improvements.

Apache Hudi · Big Data · Data Warehouse
0 likes · 13 min read