Tagged articles

Incremental Processing

16 articles · Page 1 of 1

Apr 19, 2025 · Artificial Intelligence

Databricks Acquires Fennel: Is Real-Time Computing + AI the Ultimate Data Platform?

The article examines Databricks' acquisition of the incremental computation engine Fennel, detailing how its unified batch‑stream processing, incremental updates, Python‑native development, and built‑in data governance can eliminate data silos, cut costs by up to 90 % and accelerate real‑time feature engineering for AI models, while also discussing industry impact and future roadmap.

AI InfrastructureDatabricksFennel

0 likes · 6 min read

Databricks Acquires Fennel: Is Real-Time Computing + AI the Ultimate Data Platform?

DataFunSummit

Sep 26, 2024 · Big Data

Apache Hudi Incremental Processing and Change Data Capture (CDC): Overview, Incremental Query, and CDC

This article explains Apache Hudi's incremental processing capabilities, covering an overview of the medallion architecture, detailed configuration for incremental queries, the introduction of Change Data Capture (CDC) with required table properties, and a review of how these features enable richer data insights in modern data lake environments.

Apache HudiBig DataChange Data Capture

0 likes · 9 min read

Apache Hudi Incremental Processing and Change Data Capture (CDC): Overview, Incremental Query, and CDC

Alibaba Cloud Big Data AI Platform

Jun 27, 2023 · Big Data

How MaxCompute’s Lakehouse Architecture Enables Near‑Real‑Time Incremental Processing

This article details Alibaba Cloud MaxCompute’s lakehouse evolution, describing its unified storage‑metadata‑compute design, the Transactional Table 2.0 format, near‑real‑time incremental ingestion, clustering and compaction services, transaction handling, TimeTravel and incremental queries, and future roadmap for big‑data workloads.

Big DataData WarehouseIncremental Processing

0 likes · 23 min read

How MaxCompute’s Lakehouse Architecture Enables Near‑Real‑Time Incremental Processing

DataFunTalk

Jun 24, 2023 · Big Data

Design and Architecture of MaxCompute Lakehouse Near‑Real‑Time Incremental Processing

This article explains the evolution of Alibaba Cloud's MaxCompute platform into a lakehouse architecture that supports near‑real‑time incremental processing, detailing its development history, core design of transactional tables, five‑module technical stack, data ingestion methods, optimization services, transaction management, query capabilities, ecosystem integration, practical applications, future roadmap, and common user questions.

Big DataData LakeIncremental Processing

0 likes · 24 min read

Design and Architecture of MaxCompute Lakehouse Near‑Real‑Time Incremental Processing

Bilibili Tech

Apr 4, 2023 · Big Data

How Bilibili’s Flink‑Based Real‑Time Incremental Pipeline Cuts Costs and Boosts Latency

This article details Bilibili’s migration from a Spark‑based offline ODS‑to‑DWD sharding process to a Flink real‑time incremental pipeline, explaining the background challenges, the design of multi‑level partitioning, small‑file optimizations, stability enhancements, and the measurable performance gains achieved.

Big DataData WarehouseFlink

0 likes · 19 min read

How Bilibili’s Flink‑Based Real‑Time Incremental Pipeline Cuts Costs and Boosts Latency

Shopee Tech Team

Sep 2, 2022 · Big Data

Shopee Data System Challenges and Apache Hudi Practices

Shopee tackled its data‑system bottlenecks by customizing Apache Hudi to provide unified stream‑batch integration, efficient state‑detail snapshots, and low‑latency wide‑table generation, using CDC‑based bootstrapping, COW/MOR tables, savepoints and partial updates, which cut latency to ten minutes, lowered resource use, and yielded several community‑backed enhancements.

Apache HudiBig DataData Integration

0 likes · 18 min read

Shopee Data System Challenges and Apache Hudi Practices

Big Data Technology Architecture

Aug 23, 2022 · Big Data

Comparative Analysis of Apache Hudi, Delta Lake, and Apache Iceberg for Lakehouse Architectures

This article examines the technical differences and feature sets of Apache Hudi, Delta Lake, and Apache Iceberg, highlighting incremental pipelines, concurrency control, merge‑on‑read storage, partition evolution, multi‑mode indexing, and real‑world use cases to help practitioners choose the most suitable lakehouse solution for their workloads.

Apache HudiApache IcebergConcurrency Control

0 likes · 18 min read

Comparative Analysis of Apache Hudi, Delta Lake, and Apache Iceberg for Lakehouse Architectures

Big Data Technology & Architecture

May 17, 2022 · Big Data

Apache Hudi: Core Concepts, Architecture, Storage Types, Write Operations, Querying, and Management

This article provides a comprehensive guide to Apache Hudi, covering its basic concepts, timeline architecture, storage types (Copy‑On‑Write and Merge‑On‑Read), write operations, DeltaStreamer usage, Hive/Spark/Presto query integration, data management, indexing, compaction, and best‑practice recommendations for big‑data lake workloads.

Apache HudiBig DataCopy-on-Write

0 likes · 43 min read

Apache Hudi: Core Concepts, Architecture, Storage Types, Write Operations, Querying, and Management

DataFunTalk

May 14, 2021 · Big Data

Real‑time Billion‑Scale Data Transmission and AI Pipeline Architecture at Bilibili

This article presents a technical deep‑dive into Bilibili’s evolution from offline to real‑time data processing, describing the challenges of timeliness, ETL, AI feature engineering, and the design of a Flink‑on‑YARN incremental pipeline that supports trillion‑scale message throughput and AI‑driven real‑time applications.

.aiBig DataFlink

0 likes · 27 min read

Real‑time Billion‑Scale Data Transmission and AI Pipeline Architecture at Bilibili

DataFunTalk

Feb 15, 2021 · Big Data

Flink-Driven Incremental Data Warehouse Production at Meituan: Architecture, Streaming Integration, and Future Plans

This article presents Meituan's use of Flink to enable incremental data warehouse production, covering the warehouse architecture, streaming data integration evolution, real-time OLAP applications, platform design, and future directions for unified stream‑batch processing.

Big DataFlinkIncremental Processing

0 likes · 11 min read

Flink-Driven Incremental Data Warehouse Production at Meituan: Architecture, Streaming Integration, and Future Plans

Big Data Technology & Architecture

Oct 21, 2020 · Big Data

An Introduction to Apache Hudi: Concepts, Design Principles, and Architecture

This article introduces Apache Hudi, explaining its core concepts, design principles, table architecture, write and compaction mechanisms, and the three query modes that enable efficient batch and incremental processing on modern data lakes.

Apache HudiBig DataData Lake

0 likes · 21 min read

An Introduction to Apache Hudi: Concepts, Design Principles, and Architecture

Big Data Technology & Architecture

Sep 2, 2020 · Big Data

An Overview of Apache Hudi: Architecture, Features, and Query Types

Apache Hudi is an open‑source data‑lake framework that leverages Spark to ingest, manage, and incrementally query large analytical datasets on HDFS‑compatible storage, offering features such as timeline management, copy‑on‑write and merge‑on‑read tables, and support for snapshot, incremental, and read‑optimized queries across engines like Hive, Spark SQL and Presto.

Apache HudiBig DataData Lake

0 likes · 12 min read

An Overview of Apache Hudi: Architecture, Features, and Query Types

Big Data Technology & Architecture

Aug 23, 2020 · Big Data

Apache Hudi Overview, Core Concepts, and Quick‑Start Guide

This article introduces Apache Hudi, explaining its storage types, query views, timeline feature, typical use cases such as near‑real‑time ingestion and incremental pipelines, and provides a step‑by‑step Scala/Spark quick‑start guide with code examples for compiling, inserting, updating, querying, and syncing data to Hive.

Apache HudiBig DataData Lake

0 likes · 18 min read

Apache Hudi Overview, Core Concepts, and Quick‑Start Guide

Big Data Technology Architecture

May 21, 2020 · Big Data

Near Real-Time Ingestion, Analysis, Incremental Pipelines, and Data Distribution with Apache Hudi

The article explains how Apache Hudi enables near‑real‑time data ingestion from various sources, supports low‑latency analytics, provides incremental processing pipelines, and simplifies data distribution on Hadoop, improving efficiency and reducing operational complexity.

Apache HudiBig DataHadoop

0 likes · 6 min read

Near Real-Time Ingestion, Analysis, Incremental Pipelines, and Data Distribution with Apache Hudi

Architect

May 12, 2020 · Big Data

An Overview of Apache Hudi: Architecture, Concepts, and Query Types

Apache Hudi is an open‑source data‑lake framework that leverages Spark and Hadoop‑compatible storage to provide efficient ingestion, incremental processing, and multiple query modes such as snapshot, incremental, and read‑optimized for large analytical datasets.

Apache HudiBig DataData Lake

0 likes · 11 min read

An Overview of Apache Hudi: Architecture, Concepts, and Query Types

Big Data Technology & Architecture

Jan 23, 2020 · Big Data

Understanding Apache Hudi: Incremental Processing and Low‑Latency Data Management on Hadoop

This article explains how Apache Hudi enables efficient, low‑latency incremental data ingestion and processing on Hadoop by providing a unified service layer, describing its motivation, architecture, storage components, write and read paths, compaction, fault recovery, and incremental query capabilities.

Apache HudiHadoopIncremental Processing

0 likes · 17 min read

Understanding Apache Hudi: Incremental Processing and Low‑Latency Data Management on Hadoop