Tagged articles
560 articles
Bilibili Tech
Apr 28, 2023 · Cloud Native

Remote StateBackend for Flink: Design, Optimizations, and Cloud‑Native Migration

To enable Bilibili’s cloud‑native migration, the team built a RemoteStateBackend that moves Flink’s keyed state to the Taishan KV store, using deterministic KeyGroup placement, per‑shard snapshots, asynchronous write batching, off‑heap caching with Bloom‑filter filtering, and a fixed‑size memory model, which together reduce checkpoint overhead, improve disk utilization, and accelerate rescaling for more than one hundred production jobs.
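
For readers new to KeyGroups, the deterministic placement the backend builds on can be sketched in Python. The two range formulas follow Flink's `KeyGroupRangeAssignment` as we understand it. `zlib.crc32` is a stand-in (assumption) for the murmur hash Flink applies to `key.hashCode()`, and none of this is Bilibili's RemoteStateBackend code.

```python
import zlib


def key_group_for(key: str, max_parallelism: int) -> int:
    # Assumption: crc32 stands in for Flink's murmur hash of key.hashCode().
    return zlib.crc32(key.encode("utf-8")) % max_parallelism


def operator_index_for_key_group(key_group: int, max_parallelism: int, parallelism: int) -> int:
    # Which subtask owns a given KeyGroup.
    return key_group * parallelism // max_parallelism


def key_group_range(operator_index: int, max_parallelism: int, parallelism: int) -> range:
    # The contiguous block of KeyGroups owned by one subtask.
    start = (operator_index * max_parallelism + parallelism - 1) // parallelism
    end_inclusive = ((operator_index + 1) * max_parallelism - 1) // parallelism
    return range(start, end_inclusive + 1)


if __name__ == "__main__":
    # Rescaling from 3 to 4 subtasks moves whole KeyGroups, never individual keys.
    for parallelism in (3, 4):
        owned = [key_group_range(i, 128, parallelism) for i in range(parallelism)]
        print(parallelism, [(r.start, r.stop - 1) for r in owned])
    # 3 [(0, 42), (43, 85), (86, 127)]
    # 4 [(0, 31), (32, 63), (64, 95), (96, 127)]
```

Because a key's KeyGroup never changes, a remote store that shards state by KeyGroup can hand whole ranges to new subtasks on rescale instead of rehashing every key.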

Cloud Native · Flink · Performance Optimization
18 min read

Architects Research Society
Apr 25, 2023 · Big Data

Understanding Transactions in Apache Kafka: Semantics, API, and Practical Guidance

This article explains the purpose, semantics, and design of Apache Kafka's transaction API, describes how exactly‑once processing is achieved in stream‑processing applications, outlines the Java client usage, and discusses the internal components, performance considerations, and best‑practice tips for developers.

Distributed Systems · Exactly-Once · Java
16 min read

IT Architects Alliance
Mar 28, 2023 · Big Data

Kafka Core Concepts, Architecture, Performance Optimization, and Production Deployment Guide

This comprehensive guide explains Kafka's core value as a message queue, its fundamental concepts, cluster architecture, high‑performance data handling, resource planning for large‑scale deployments, operational tools, consumer‑group mechanics, offset management, rebalance strategies, and custom partitioner implementation.
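
The custom-partitioner topic can be illustrated with a small Python sketch of key-based routing. It is a stand-in built on stated assumptions, not Kafka's Java `Partitioner` API: `zlib.crc32` replaces the murmur2 hash that Kafka's default partitioner applies to key bytes, and keyless records go round-robin here, whereas recent Kafka producers use sticky partitioning for them. `SimpleKeyPartitioner` is a hypothetical name.

```python
import itertools
import zlib
from typing import Optional


class SimpleKeyPartitioner:
    """Hypothetical partitioner sketch, not Kafka's DefaultPartitioner."""

    def __init__(self, num_partitions: int) -> None:
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self.num_partitions = num_partitions
        # Keyless records are spread round-robin in this sketch.
        self._next = itertools.cycle(range(num_partitions))

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            return next(self._next)
        # Same key -> same partition, which preserves per-key ordering.
        # Assumption: crc32 stands in for Kafka's murmur2.
        return zlib.crc32(key) % self.num_partitions
```

Because every record with the same key lands on the same partition, per-key ordering is preserved. Changing `num_partitions` remaps keys, which is why partition counts are usually planned up front.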

Deployment · Replication · Streaming
29 min read

Qunar Tech Salon
Mar 21, 2023 · Frontend Development

Performance Analysis of React CSR, SSR, and React 18 Streaming SSR

This article examines how different React rendering strategies—client‑side rendering, server‑side rendering, and the new React 18 Streaming SSR—affect key web performance metrics such as TTI, FCP, and first‑paint, and demonstrates substantial latency reductions achieved through streaming and selective hydration.

SSR · Streaming · TTI
11 min read

Architect
Mar 17, 2023 · Backend Development

Million‑Scale Data Export with JPA and MyBatis in Spring Boot

This article explains how to export tens of millions of rows from MySQL using Spring Boot by streaming data with JPA or MyBatis, avoiding OutOfMemoryError, switching to CSV format, and provides complete code examples, performance comparison, and tips for generating test data.
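
The core idea, streaming rows out in bounded batches instead of materializing the whole result set, can be sketched with Python's built-in `sqlite3` and `csv` modules. In the Spring Boot setting this role is played by, for example, a MyBatis `Cursor` or `ResultHandler`; MySQL Connector/J typically also needs `fetchSize = Integer.MIN_VALUE` before it actually streams. The `orders` table and `export_orders_csv` function are hypothetical.

```python
import csv
import sqlite3
from typing import TextIO


def export_orders_csv(conn: sqlite3.Connection, out: TextIO, batch_size: int = 1000) -> int:
    """Write the hypothetical `orders` table to CSV without loading it all."""
    cur = conn.execute("SELECT id, amount FROM orders ORDER BY id")
    writer = csv.writer(out)
    writer.writerow(["id", "amount"])
    written = 0
    while True:
        rows = cur.fetchmany(batch_size)  # at most batch_size rows in memory
        if not rows:
            break
        writer.writerows(rows)
        written += len(rows)
    return written
```

Memory use stays bounded by `batch_size` rows regardless of table size. That is the property that avoids `OutOfMemoryError` in the Java version.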

CSV · DataExport · MyBatis
12 min read

DataFunTalk
Mar 9, 2023 · Big Data

Real‑Time Data Platform Architecture and Cloud‑Native Flink Migration at Manbang

This article presents a comprehensive case study of Manbang's real‑time data platform, detailing its business background, cloud‑native Flink + Hologres architecture, migration from self‑built clusters, real‑time product features, decision‑making workflows, and future roadmap, highlighting performance and cost benefits.

Flink · Logistics · Streaming
16 min read

Alibaba Cloud Big Data AI Platform
Mar 1, 2023 · Big Data

How We Built a Scalable Real‑Time Data Architecture for a Complex Supply Chain

This article describes the challenges of a highly complex supply‑chain system, the evolution from early MySQL‑based reporting to a modern real‑time data platform using Flink, Kafka, ClickHouse, Hologres and other cloud services, and the tools and lessons learned to achieve low‑latency, high‑throughput analytics.

ClickHouse · Flink · Kafka
11 min read

DataFunSummit
Feb 28, 2023 · Big Data

Iceberg Technology Overview and Its Application at Xiaomi: Practices, Stream‑Batch Integration, and Future Plans

This article introduces the Iceberg table format, explains its core architecture and advantages such as transactionality, implicit partitioning and row‑level updates, details Xiaomi's practical deployments—including CDC pipelines, partition strategies, compaction services, and stream‑batch integration—and outlines future development directions.

Data Lake · Flink · Iceberg
20 min read

DeWu Technology
Feb 24, 2023 · Big Data

Real-Time Data Architecture Evolution for a Complex Supply Chain

The article traces Dewu’s supply‑chain data platform from slow MySQL reporting through early CDC‑based wide tables to a Flink‑Kafka‑ClickHouse 1.0 design, then to a more scalable Flink‑Kafka‑Hologres 2.0 architecture that solves upsert and compute‑storage separation, while detailing key operational tricks, code‑generation tools, and future plans for lake‑house integration.

Big Data · ClickHouse · Flink
10 min read

Alimama Tech
Feb 15, 2023 · Big Data

Dolphin: Alibaba's Hyper‑Converged Multi‑Modal Big Data Engine Overview

Dolphin, Alibaba’s hyper‑converged multi‑modal big‑data engine, unifies OLAP, AI, streaming, and batch workloads on a decoupled compute‑storage MPP foundation. It offers a Dolphin SQL layer, advanced bitmap/GroupTable/AFile indexes, intelligent materialization, and one‑write‑multiple‑read storage that cuts costs by more than 70% while delivering sub‑millisecond queries on trillion‑row datasets.

AI · Big Data · OLAP
14 min read

Architecture Digest
Feb 9, 2023 · Big Data

Understanding Kafka Messages, Topics, Partitions, and Consumers

This article explains Kafka's core concepts—including messages as byte arrays, optional keys for partition control, topic and partition organization, producer and consumer roles, offsets, consumer groups, and broker clusters—providing a concise technical overview for developers learning Kafka.

Consumer · Kafka · Message
6 min read

Bilibili Tech
Jan 31, 2023 · Big Data

Design and Optimization of Real-Time Data Quality Control (DQC) Platform on Bilibili's Big Data System

Bilibili redesigned its real-time data-quality control platform by replacing per-rule Flink jobs with a unified, dynamically-configured architecture that classifies Kafka topics, aggregates via InfluxDB full-table and continuous queries, mitigates data inflation, adds a high-performance proxy, and implements robust monitoring and recovery to ensure scalable, reliable data quality for its big-data services.

Big Data · DQC · Flink
22 min read

ITPUB
Jan 22, 2023 · Big Data

How Flink Table Store Powers Real‑Time Financial Data Warehousing

This article details a banking‑focused real‑time data‑warehouse solution that leverages Flink Table Store to handle both incremental fact‑table updates and full‑table dimension calculations, compares three traditional approaches, and explains data ingestion, query modes, export options, and future streaming‑warehouse directions.

Banking · ELT · Flink
20 min read

DataFunSummit
Jan 8, 2023 · Big Data

Streaming‑Batch Integrated Real‑time Multi‑dimensional Analysis

This article presents a comprehensive overview of evolving big‑data architectures—from classic offline warehouses to Lambda and Kappa models—and details a streaming‑batch integrated solution that addresses latency, data freshness, and multi‑table join challenges to achieve minute‑level real‑time multi‑dimensional analytics.

Batch Processing · Data Warehouse · Kappa architecture
18 min read

Big Data Technology & Architecture
Jan 3, 2023 · Big Data

Migrating Hive SQL Jobs to Flink Using the SQL Gateway

This article explains how to use Apache Flink 1.16's SQL Gateway to migrate Hive SQL tasks to Flink, covering the underlying Hive‑on‑Flink architecture, dialect compatibility, streaming and batch demos, configuration details, and practical tips for developers and platform engineers.

Batch Processing · Big Data · Flink
19 min read

Big Data Technology & Architecture
Dec 15, 2022 · Big Data

Migrating Hive SQL to Flink SQL: Motivation, Challenges, Practice, Demo, and Future Plans

This technical article presents a comprehensive overview of migrating Hive SQL to Flink SQL, covering the motivations behind the migration, key challenges such as compatibility, stability and performance, practical implementation steps, a detailed demo, future development directions, and a Q&A session addressing common concerns.

Batch Processing · Big Data · Data Lake
13 min read

Architect's Tech Stack
Dec 8, 2022 · Backend Development

Implementing Streaming Reads with MyBatis for Large-Scale Java Report Export

This article explains how to overcome export failures caused by large data volumes in a legacy Java system by switching from default full-result JDBC reads to a forward‑only streaming approach using MyBatis, detailing environment setup, configuration changes, and complete code examples for controller, service, DAO, and mapper layers.

JDBC · Java · MyBatis
5 min read

Bilibili Tech
Nov 29, 2022 · Big Data

How Bilibili Supercharged Flink: Checkpoint, HA, and Runtime Optimizations

This article details Bilibili's extensive enhancements to Flink's runtime—including checkpoint recoverability, operator ID stability, state processor extensions, hybrid high‑availability, regional checkpointing, and load‑based channel selection—to improve scalability, reliability, and operational efficiency of large‑scale streaming jobs.

Big Data · Checkpoint · Flink
32 min read

Big Data Technology & Architecture
Nov 28, 2022 · Big Data

Comprehensive Guide to Big Data Interview Topics: Log Collection, Data Synchronization, Offline Development, Real‑time Technology, Data Services, and Data Mining

This article provides an extensive overview of big‑data interview subjects, covering browser and mobile log collection methods, data synchronization techniques (batch, real‑time, sharding), offline data development platforms, streaming architectures, data service evolution, performance optimization, and data‑mining layers and applications.

Big Data · Streaming · data mining
17 min read

ITPUB
Nov 18, 2022 · Big Data

How Xiaomi Uses Iceberg for Real‑Time Streaming and Batch Data Lakes

This article introduces Iceberg’s table‑format fundamentals, details Xiaomi’s large‑scale deployment of Iceberg for CDC and log ingestion, explores their streaming‑batch integration experiments, outlines future roadmap items, and provides a comprehensive Q&A covering practical challenges and solutions.

Batch Processing · Big Data · Data Lake
23 min read

DataFunTalk
Nov 6, 2022 · Big Data

BitSail: ByteDance’s Open‑Source Unified Data Integration Engine – Architecture, Evolution, and Capabilities

BitSail, an open‑source data integration engine from ByteDance, provides a unified solution for batch, streaming, full‑load, and incremental data synchronization across heterogeneous sources. The article covers its background, technical evolution, architecture, low‑cost co‑building features, compatibility strategies, and future roadmap.

CDC · Data Integration · Flink
18 min read

DataFunSummit
Nov 4, 2022 · Big Data

Real-Time Data Lake Practice at ByteDance: Architecture, Challenges, and Solutions

ByteDance’s data platform team explains their real‑time data lake implementation, covering its evolving definition, six core capabilities, challenges such as data management, concurrent updates, performance and log ingestion, and detailed case studies of multi‑stage deployment, indexing, metadata services, and future roadmap.

Hudi · Real-time Data Lake · Streaming
32 min read

21CTO
Nov 1, 2022 · Cloud Computing

How Netflix Engineers Seamless Streaming with Cloud‑Based Encoding and CDN

Netflix delivers billions of hours of video by compressing and transcoding raw movies into multiple formats, splitting them into small chunks processed in parallel on AWS, storing them in S3, and distributing them via its custom Open Connect CDN to ensure low‑latency, high‑quality playback worldwide.

AWS · CDN · Netflix
10 min read

Mike Chen's Internet Architecture
Oct 27, 2022 · Backend Development

Understanding Kafka: Architecture, Principles, Features, and Use Cases

This article explains Kafka's distributed publish‑subscribe architecture, detailing its core components, underlying mechanisms with Zookeeper coordination, key features such as high throughput and fault tolerance, and common application scenarios like log collection, user activity tracking, and stream processing.

Backend Architecture · Distributed Messaging · Kafka
5 min read

DataFunTalk
Oct 19, 2022 · Big Data

Understanding Flink Table Store: Design, Usage, and Roadmap

Flink Table Store, an Apache Flink subproject, provides a unified stream‑batch storage layer with SQL‑based table APIs, addressing real‑time and offline data needs, detailing its design goals, usage patterns, architectural layers, implementation choices, and upcoming roadmap.

Flink · LSM‑Tree · Streaming
14 min read

ITPUB
Oct 15, 2022 · Big Data

Flink & Apache Hudi: Design, Practices, and Roadmap for Streaming Data Lakes

This talk introduces the evolution of data lakes, outlines Apache Hudi’s core features, details the Flink‑Hudi integration architecture—including write pipelines, small‑file handling, and read strategies—covers real‑world use cases such as near‑real‑time DB ingestion, OLAP, and ETL, and previews upcoming Hudi roadmap items.

Apache Hudi · Big Data · Data Lake
21 min read

Java High-Performance Architecture
Oct 11, 2022 · Operations

How Meituan Optimized Kafka for Massive Scale: Reducing Latency and Managing Clusters

This article details Meituan's real‑world challenges with a 15,000‑node Kafka deployment and explains the application‑layer and system‑layer optimizations—such as disk balancing, migration pipeline acceleration, fetcher isolation, RAID acceleration, cgroup isolation, and an SSD‑based cache—that together dramatically cut read/write latency and simplify large‑scale cluster management.

Cluster Management · Meituan · Streaming
23 min read

DataFunTalk
Oct 4, 2022 · Big Data

Near‑Real‑Time Data Lake Practices in TikTok E‑commerce Data Warehouse

The presentation by TikTok e‑commerce data‑warehouse engineer Ma Wenyuan explains data‑lake characteristics, near‑real‑time architecture, and practical e‑commerce use cases, highlighting Apache Hudi features, hybrid batch‑stream processing, and future challenges for scaling and integration.

Data Lake · Hudi · Streaming
13 min read

DataFunTalk
Sep 17, 2022 · Big Data

Real-Time Data Warehouse Practices with Hudi at ByteDance

This presentation details ByteDance's real‑time data‑warehouse implementations using Apache Hudi, covering scenario classifications, challenges of traditional offline warehouses, practical solutions for ingestion, upsert, validation, indexing, query optimization, and future plans for extensible indexing and unified batch‑stream processing.

Data Lake · Hudi · Streaming
16 min read

Top Architect
Sep 17, 2022 · Big Data

Meituan's Kafka Architecture: Challenges and Optimizations at Massive Scale

This article details the read/write latency, slow‑node, and large‑scale cluster management challenges faced by Meituan's Kafka platform, which serves over 15,000 machines and handles petabytes of daily traffic. It then describes the application‑layer, system‑layer, and operational optimizations used to improve performance and reliability: disk balancing, migration pipelines, fetcher isolation, asynchronous consumer processing, SSD caching, isolation strategies, full‑link monitoring, lifecycle management, and TOR disaster recovery.

Kafka · Meituan · Streaming
22 min read

DataFunTalk
Sep 11, 2022 · Big Data

Flink Table Store v0.2: Application Scenarios, Core Features, and Future Roadmap

This article introduces Flink Table Store v0.2, explains its four primary application scenarios—offline warehouse acceleration, partial update, pre‑aggregation rollup, and real‑time warehouse enhancement—details the core lake‑storage architecture, bucket management, append‑only mode, and outlines the project’s future roadmap and trade‑off considerations.
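
The partial-update scenario can be sketched as a merge rule over rows that share a primary key. Newer non-null fields overwrite older values, and nulls leave them untouched, which is how we understand the partial-update merge engine to behave. The Python below is a hypothetical in-memory model, not the Table Store implementation, and it assumes records arrive in sequence order.

```python
from typing import Any, Dict, Iterable, Optional

Row = Dict[str, Optional[Any]]


def partial_update(records: Iterable[Row], pk: str) -> Dict[Any, Row]:
    """Merge rows per primary key; later non-null values win, nulls are ignored."""
    merged: Dict[Any, Row] = {}
    for rec in records:  # assumed to be in sequence (arrival) order
        current = merged.setdefault(rec[pk], {})
        for col, val in rec.items():
            if val is not None:
                current[col] = val
    return merged
```

Two streams that each fill different columns of the same order (say, one carrying `price` and another carrying `status`) converge on one complete row without a stream-stream join.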

Batch · Flink · Lake Storage
16 min read

DataFunTalk
Aug 25, 2022 · Big Data

Applying OpenMLDB for Efficient AI Toolchain and Data‑Driven Architecture at Akulaku

This article presents Akulaku’s practical experience with OpenMLDB, describing the company’s data‑driven requirements, the design of a unified stream‑batch architecture, implementation details across offline, online and RocksDB modes, and future recommendations for high‑performance, scenario‑agnostic big‑data processing.

AI · Batch Processing · OpenMLDB
17 min read

Hulu Beijing
Aug 19, 2022 · Artificial Intelligence

Disney’s M5 Model: Multi‑Modal, Multi‑Interest, Multi‑Scenario Boost for Streaming Recommendations

Disney’s Content Discovery team introduces M5, a multi‑modal, multi‑interest, multi‑scenario recall model that enhances VOD and live streaming recommendations by leveraging rich metadata, user behavior, and contextual features, outperforming baseline methods with significant hit‑ratio gains across Hulu and Disney+.

Deep Learning · M5 model · Recommendation Systems
22 min read

ITPUB
Aug 13, 2022 · Big Data

How Alibaba Uses Flink to Power Massive Real‑Time Risk Control

This article explains how Alibaba leverages Flink to handle over 40 billion events per second across all business units, detailing risk‑control concepts, rule types, architectural stages, resource tuning, dynamic CEP, shared computing, and the FY23 roadmap for large‑scale streaming risk management.

Alibaba · Big Data · CEP
16 min read

Wukong Talks Architecture
Aug 9, 2022 · Big Data

Kafka Basics: 15 Key Questions and In‑Depth Answers

This comprehensive guide covers Kafka’s core concepts, architecture, Zookeeper role, producer sending modes, partitioning strategies, replica types, message deletion, performance optimizations, consumer models, offset management, and best‑practice recommendations for scaling and ensuring ordered delivery in distributed streaming systems.

Partitioning · Streaming · ZooKeeper
31 min read

IT Architects Alliance
Aug 3, 2022 · Big Data

Understanding Kafka Architecture: Topics, Partitions, Replication, Log Segmentation, Zero‑Copy, and Zookeeper Integration

This article explains Kafka's core concepts—including topics, partitions and replicas, log segment storage, leader‑follower mechanics, consumer groups, network threading model, zero‑copy I/O, and the essential role of Zookeeper for broker, topic, consumer, and offset management—providing a comprehensive overview for developers and architects.
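
To make log segmentation concrete: Kafka names each segment file after its base offset, zero-padded to 20 digits. Finding the segment that holds a given offset is therefore a floor lookup over the sorted base offsets. The Python sketch below covers only that step, and its helper names are hypothetical. The broker then uses the segment's sparse `.index` file to locate the byte position.

```python
import bisect
from typing import List


def segment_file_name(base_offset: int) -> str:
    # Segment files are named by base offset, zero-padded to 20 digits.
    return f"{base_offset:020d}.log"


def find_segment(base_offsets: List[int], target_offset: int) -> int:
    """Return the base offset of the segment that would contain target_offset.

    base_offsets must be sorted in ascending order.
    """
    i = bisect.bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise KeyError(f"offset {target_offset} is before the first segment")
    return base_offsets[i]
```

For example, `find_segment([0, 1000, 2500], 1999)` returns `1000`, which corresponds to the file `00000000000000001000.log`.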

Big Data · Kafka · Streaming
10 min read

HomeTech
Jul 20, 2022 · Big Data

Design and Implementation of a Real-Time Advertising Data Warehouse Using Flink and StarRocks

This article presents a comprehensive case study of building a real‑time advertising data warehouse at Autohome. It details the evaluation of streaming engines and storage solutions, the layered architecture design, implementation steps with Flink and StarRocks, monitoring practices, issues encountered, and the future roadmap, and shows how second‑level data freshness and high accuracy were achieved.

Flink · StarRocks · Streaming
10 min read

Open Source Linux
Jul 19, 2022 · Backend Development

Master Kafka Basics: Visual Guide to Topics, Partitions, and Architecture

This article visually explains Kafka's core concepts—including producers, consumers, topics, partitions, consumer groups, and cluster architecture—so readers can clearly understand how messages flow, are stored, and remain fault‑tolerant within a distributed streaming system.

Backend · Kafka · Message Queue
6 min read

MaGe Linux Operations
Jun 19, 2022 · Big Data

Visualizing Kafka: Core Concepts Explained with Diagrams

This article provides a diagram‑driven walkthrough of Kafka’s fundamental concepts—including topics, partitions, producers, consumers, consumer groups, and cluster architecture—explaining how messages flow, are stored, and achieve reliability and ordering within a distributed streaming system.

Cluster Architecture · Kafka · Partitions
6 min read

Alibaba Cloud Developer
Jun 14, 2022 · Big Data

Can a Streaming Data Warehouse Balance Freshness, Latency, and Cost?

This article examines the core trade‑offs of data warehouses—freshness, query latency, and cost—compares offline and real‑time architectures, introduces the concept of a streaming data warehouse, and details how Apache Flink Table Store aims to provide a unified, low‑cost solution.

Big Data · Data Warehouse · Flink
19 min read

Efficient Ops
Jun 7, 2022 · Big Data

Visualizing Kafka: Core Concepts Explained with Diagrams

This article visually breaks down Kafka’s fundamental concepts—including topics, partitions, producers, consumers, consumer groups, and cluster architecture—so readers can grasp how messages flow, are stored, and achieve load balancing and ordering within a distributed streaming platform.

Distributed Systems · Kafka · Message Queue
6 min read

DataFunTalk
May 23, 2022 · Big Data

Real-Time Data Lake Practices at ByteDance: Architecture, Challenges, and Solutions

ByteDance shares its real‑time data lake implementation, covering the evolving definition of data lakes, six core capabilities, challenges such as data management, weak concurrent updates, performance, and log ingestion, and detailed solutions including Hudi Metastore Server, bucket indexing, multi‑stage use cases, and future roadmap.

Batch Processing · Hudi · Real-time Data Lake
32 min read

Alibaba Cloud Developer
May 18, 2022 · Big Data

Why Delta Lake Is Revolutionizing Data Lakes with ACID Guarantees

This article explains how Delta Lake adds reliability to data lakes by offering ACID transactions, scalable metadata, and unified batch‑and‑stream processing, outlines the challenges it solves, details its implementation principles, and demonstrates a practical demo for building an integrated data warehouse.

ACID · Big Data · Data Lake
9 min read

Big Data Technology & Architecture
May 15, 2022 · Big Data

Understanding Flink Window Table-Valued Functions (TVF) and Incremental Optimization

This article explains the concept of window table-valued functions in Flink, compares the old grouped‑window syntax with the new TVF syntax, details the physical and execution plans, introduces sliced windows for state reduction, and presents a small incremental‑output improvement with code examples.
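
The slicing idea can be sketched for a hopping (HOP) window SUM. With a slice length of gcd(size, slide), each record updates exactly one slice, and each window result is assembled from the slices it spans. This Python model illustrates the idea under those assumptions (`sliced_hop_sum` is hypothetical). It is not Flink's operator code.

```python
from collections import defaultdict
from math import gcd
from typing import Dict, Iterable, Tuple


def sliced_hop_sum(events: Iterable[Tuple[int, int]], size: int, slide: int) -> Dict[int, int]:
    """SUM over hopping windows [start, start + size), keyed by window end."""
    slice_len = gcd(size, slide)
    slices: Dict[int, int] = defaultdict(int)
    for ts, value in events:
        # Each record updates exactly one slice instead of size // slide windows.
        slices[ts - ts % slice_len] += value
    if not slices:
        return {}
    first, last = min(slices), max(slices)
    results: Dict[int, int] = {}
    # Earliest window start (a multiple of slide) whose window still covers `first`.
    start = (first - size) // slide * slide + slide
    while start <= last:
        end = start + size
        covered = [slices[s] for s in range(start, end, slice_len) if s in slices]
        if covered:  # skip windows that received no records
            results[end] = sum(covered)  # merge the slices this window spans
        start += slide
    return results
```

For size 10 and slide 5, events at timestamps 0, 3, 7, and 12 with values 1, 2, 4, and 8 produce the window sums `{5: 3, 10: 7, 15: 12, 20: 8}`, keyed by window end.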

Big Data · Flink · Incremental Aggregation
12 min read

Zuoyebang Tech Team
May 9, 2022 · Big Data

How Flink SQL Powered Real‑Time Learning Analytics at Zuoyebang

Zuoyebang’s big‑data team shares how they evolved from Spark Streaming to a Flink‑SQL‑centric real‑time platform, detailing three development stages, challenges in DAG optimization, Redis‑based table design, and platform features for unified deployment, ease of use, and operational governance.

Flink · Real-Time · SQL
14 min read

58 Tech
May 5, 2022 · Big Data

Low-Code Real-Time Data Warehouse Construction System Using Flink

This article describes a low‑code, Flink‑based real‑time data‑warehouse construction system that abstracts the warehouse building process into ODS, DWD, DWS, and ADS layers, leverages a domain‑specific language and plugin engine to reduce development effort, and details its architecture, DSL design, plugin extensibility, dimension‑table completion, stream merging, aggregation, and storage strategies.

Big Data · DSL · Flink
11 min read

Baidu Geek Talk
Apr 18, 2022 · Backend Development

Boost Mini‑Program Install Speed by 21% with Streaming Download Pipeline

This article analyzes the performance bottlenecks of Baidu mini‑program package installation, proposes a streaming download approach that parallelizes network I/O with signature verification and decompression, and provides detailed Java implementation using a MultiPipe pipeline to achieve a 21% reduction in download time.
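
The streaming idea is to verify and decompress each chunk as it arrives rather than after the whole package has downloaded. It can be sketched in Python with `hashlib` and `zlib.decompressobj`. This is a simplified, single-threaded stand-in for the article's Java MultiPipe pipeline: a SHA-256 digest substitutes for signature verification, and `stream_install` is a hypothetical name.

```python
import hashlib
import zlib
from typing import Iterable, Tuple


def stream_install(chunks: Iterable[bytes]) -> Tuple[bytes, str]:
    """Hash and inflate each downloaded chunk as it arrives."""
    digest = hashlib.sha256()      # stand-in for signature verification
    inflater = zlib.decompressobj()
    out = bytearray()
    for chunk in chunks:           # e.g. chunks yielded by a network reader
        digest.update(chunk)       # verification advances per chunk
        out += inflater.decompress(chunk)  # decompression advances per chunk
    out += inflater.flush()
    return bytes(out), digest.hexdigest()
```

A production version would run the download, verification, and decompression stages on separate threads connected by bounded queues. It would also commit the decompressed files only after the final digest or signature check passes.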

Backend · Download Optimization · Java
12 min read

DataFunSummit
Apr 6, 2022 · Big Data

Real-time Dimension Modeling with Flink SQL: Challenges and Solutions

This article presents a JD.com case study on applying Flink SQL for real‑time dimension modeling, detailing two complex streaming scenarios—full‑join of multiple streams and full‑group aggregation—along with the associated challenges of historical data handling, state management, and performance optimization, and proposes component‑based architectural solutions.

Big Data · Flink · Real-Time
14 min read

Big Data Technology & Architecture
Mar 28, 2022 · Big Data

Real-time Dimension Modeling with Flink SQL: Problems, Challenges, and Solutions

This article presents JD's real-time dimension modeling case using Flink SQL, detailing two complex streaming scenarios, the difficulties of handling historical data and state management, and a component‑based solution that leverages external KV stores and optimized Flink operators to improve performance and scalability.

Big Data · Flink · Real-Time
13 min read

Su San Talks Tech
Mar 27, 2022 · Big Data

Top 10 Advanced Kafka Interview Questions with In‑Depth Answers

This article provides a comprehensive collection of advanced Kafka interview questions covering core architecture, storage mechanisms, replication, leader election, controller responsibilities, consumer group rebalancing, message semantics, partition assignment strategies, performance tuning, and practical tips for handling large message backlogs.

ConsumerGroup · Kafka · Replication
33 min read

Baidu Geek Talk
Mar 7, 2022 · Backend Development

Design and Implementation of GDP Streaming RPC Framework (Go Version of brpc Streaming)

The Go‑based GDP Streaming RPC framework extends Baidu’s internal brpc‑compatible RPC system with a high‑performance streaming transport that preserves message order, supports multiple concurrent streams per socket, offers customizable serialization and event‑driven handlers, and enables efficient large‑scale data transfers such as voice or replica synchronization, achieving comparable latency and throughput to the original C++ implementation.

Go · RPC · Streaming
12 min read

vivo Internet Technology
Feb 23, 2022 · Big Data

Kafka-based Real-Time Data Warehouse: Architecture and Practice for Search

The article explains how Kafka serves as the core of a real‑time data warehouse for search, detailing its advantages over traditional databases, integration with Flink for low‑latency stream processing, architectural patterns such as Lambda/Kappa, scaling challenges, and comprehensive monitoring using Kafka Eagle.

Apache Kafka · Data Integration · Flink
15 min read

Big Data Technology & Architecture
Feb 23, 2022 · Big Data

Understanding Mini‑Batch Streaming Aggregation in Flink SQL

This article explains Flink SQL’s streaming aggregation Mini‑Batch feature, covering its purpose, configuration parameters, underlying optimizer rules, operator implementations, watermark handling, buffer processing, and the optional Local‑Global two‑phase aggregation optimization for improving throughput and reducing state overhead in large‑scale data pipelines.
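
Mini-batch is switched on through configuration such as `table.exec.mini-batch.enabled`, `table.exec.mini-batch.allow-latency`, and `table.exec.mini-batch.size`. The Python sketch below models only the size trigger for a per-key COUNT: records are pre-aggregated in a buffer, and state is touched once per key per batch rather than once per record. `MiniBatchCount` is hypothetical and is not a Flink operator. The latency-based trigger is omitted.

```python
from typing import Dict


class MiniBatchCount:
    """Per-key COUNT with a size-triggered mini-batch buffer (sketch)."""

    def __init__(self, max_batch_size: int) -> None:
        self.max_batch_size = max_batch_size
        self.buffer: Dict[str, int] = {}   # per-batch partial counts
        self.buffered_records = 0
        self.state: Dict[str, int] = {}    # stands in for keyed state
        self.state_accesses = 0

    def process(self, key: str) -> None:
        self.buffer[key] = self.buffer.get(key, 0) + 1
        self.buffered_records += 1
        if self.buffered_records >= self.max_batch_size:
            self.flush()

    def flush(self) -> None:
        for key, partial in self.buffer.items():
            # One state read/write per distinct key in the batch.
            self.state[key] = self.state.get(key, 0) + partial
            self.state_accesses += 1
        self.buffer.clear()
        self.buffered_records = 0
```

Feeding `a, a, b, a, b, b, c` with a batch size of 4 (plus a final flush) yields counts a=3, b=3, c=1 with four state accesses instead of seven. The saving grows with key skew.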

Big Data · Flink · Mini-Batch
10 min read
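
The benefit of Mini-Batch comes from buffering records and touching state once per key per batch instead of once per record. Flink SQL exposes this through options such as `table.exec.mini-batch.enabled`, `table.exec.mini-batch.allow-latency` and `table.exec.mini-batch.size`, plus `table.optimizer.agg-phase-strategy` for the two-phase variant. The sketch below is plain Java, not Flink's operator. The class `MiniBatchCount` is hypothetical, and it counts state updates so the savings are visible.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java simulation of mini-batch aggregation for a per-key COUNT.
// Hypothetical class; it only mimics the buffering idea, not Flink internals.
final class MiniBatchCount {
    private final int maxBatchSize;         // plays the role of table.exec.mini-batch.size
    private final long allowLatencyMillis;  // plays the role of table.exec.mini-batch.allow-latency
    private final Map<String, Long> buffer = new HashMap<>(); // in-memory partial counts
    private final Map<String, Long> state = new HashMap<>();  // stands in for keyed state
    private int bufferedRecords = 0;
    private long batchStartMillis = 0;
    private int stateUpdates = 0;

    MiniBatchCount(int maxBatchSize, long allowLatencyMillis) {
        this.maxBatchSize = maxBatchSize;
        this.allowLatencyMillis = allowLatencyMillis;
    }

    // Buffers one record and flushes when the batch is full or too old.
    // The age check only runs when a record arrives; a real operator would also
    // use timers or watermarks so that an idle stream still flushes.
    void add(String key, long nowMillis) {
        if (bufferedRecords == 0) {
            batchStartMillis = nowMillis;
        }
        buffer.merge(key, 1L, Long::sum);
        bufferedRecords++;
        if (bufferedRecords >= maxBatchSize || nowMillis - batchStartMillis >= allowLatencyMillis) {
            flush();
        }
    }

    // Applies each key's partial count to state exactly once, then clears the buffer.
    void flush() {
        for (Map.Entry<String, Long> e : buffer.entrySet()) {
            state.merge(e.getKey(), e.getValue(), Long::sum);
            stateUpdates++;
        }
        buffer.clear();
        bufferedRecords = 0;
    }

    long count(String key) {
        return state.getOrDefault(key, 0L);
    }

    int stateUpdates() {
        return stateUpdates;
    }
}
```

With a batch size of 4, the input `a, a, b, a, b, b` produces correct counts (3 and 3) with 3 state updates instead of 6. In the Local-Global variant, a buffer like this also runs upstream of the shuffle, so only partial results cross the network.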
Volcano Engine Developer Services
Feb 16, 2022 · Big Data

ByteDance’s Journey to a Unified Data Lake with Flink and Hudi

This article recounts ByteDance’s evolution from batch‑only Flink pipelines to a unified data‑lake integration platform, detailing the three integration modes, challenges with Spark‑based CDC, the decision to adopt Hudi over Iceberg, and how Hudi’s indexing and Merge‑On‑Read formats enable near‑real‑time analytics at massive scale.

CDC · Flink · Hudi
10 min read
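
Merge-On-Read keeps a columnar base file and appends updates and deletes to delta log files. A snapshot query merges the two at read time, while a read-optimized query reads only the base. The sketch below illustrates that merge with in-memory maps. `MergeOnReadSketch` and `LogRecord` are hypothetical names, not Hudi's API, and the sketch assumes log records are already in commit order.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of Merge-On-Read semantics; not Hudi's API.
final class MergeOnReadSketch {
    // One change in a delta log; value == null marks a delete.
    static final class LogRecord {
        final String key;
        final String value;

        LogRecord(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Snapshot read: start from the base file and replay the delta log in order,
    // so the latest change for each record key wins.
    static Map<String, String> snapshotRead(Map<String, String> baseFile, List<LogRecord> deltaLog) {
        Map<String, String> view = new TreeMap<>(baseFile);
        for (LogRecord r : deltaLog) {
            if (r.value == null) {
                view.remove(r.key);
            } else {
                view.put(r.key, r.value);
            }
        }
        return view;
    }

    // Read-optimized read: base file only. It is cheaper, but it misses
    // changes that have not yet been compacted into the base.
    static Map<String, String> readOptimized(Map<String, String> baseFile) {
        return new TreeMap<>(baseFile);
    }
}
```

Compaction can be thought of as running the same merge and writing the result out as a new base file. This is why MOR trades some read-time work for cheap, near-real-time writes.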
Architect
Feb 12, 2022 · Big Data

In-Depth Overview of Apache Kafka Architecture and Core Concepts

This article provides a comprehensive introduction to Apache Kafka, covering its distributed streaming platform features, message queue patterns, topic and partition design, broker and cluster roles, producer and consumer mechanics, partition assignment strategies, data storage, reliability guarantees, and performance optimizations such as zero‑copy and batch processing.

Consumer · Message Queue · Producer
23 min read
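
Among the partition assignment strategies, Kafka's range-style assignment is the easiest to show in code. It works per topic: consumers are sorted, each gets a contiguous block of partitions, and the first `numPartitions % numConsumers` consumers receive one extra partition. The sketch below is modeled on that documented behavior for a single topic. `RangeAssignSketch` is a hypothetical class, not the client's `RangeAssignor`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-topic sketch of range-style partition assignment.
final class RangeAssignSketch {
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("need at least one consumer");
        }
        List<String> sorted = new ArrayList<>(consumers);
        Collections.sort(sorted); // deterministic order across group members
        int n = sorted.size();
        int perConsumer = numPartitions / n; // base block size
        int withExtra = numPartitions % n;   // how many consumers get one more
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            int start = perConsumer * i + Math.min(i, withExtra);
            int length = perConsumer + (i < withExtra ? 1 : 0);
            List<Integer> parts = new ArrayList<>();
            for (int p = start; p < start + length; p++) {
                parts.add(p);
            }
            result.put(sorted.get(i), parts);
        }
        return result;
    }
}
```

For 7 partitions and consumers `c1`, `c2`, `c3`, this gives `c1 -> [0, 1, 2]`, `c2 -> [3, 4]` and `c3 -> [5, 6]`. Because the remainder always lands on the first consumers, range assignment can skew load when a group subscribes to many topics. This is one motivation for round-robin and sticky strategies.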
Su San Talks Tech
Jan 27, 2022 · Big Data

Why Kafka 2.8 Is Dropping Zookeeper and What It Means for You

This article explains how Kafka 2.8 takes the first step toward removing its dependency on ZooKeeper, describes the roles of brokers, topics, partitions, and the controller in the ZooKeeper-based architecture, and outlines KIP-500. That proposal replaces ZooKeeper with a quorum-based KRaft controller (available only as early access in 2.8) to improve scalability and operational simplicity.

Distributed Systems · KIP-500 · KRaft
9 min read
Baidu Geek Talk
Jan 26, 2022 · Big Data

How a Real‑Time CDP Solves Data Silos: Architecture, Tech Choices & Lessons

This article examines the design and implementation of a tenant-level real-time Customer Data Platform (CDP). It covers CDP fundamentals, business and technical challenges, key architectural components, technology choices (graph databases, stream processing, storage engines), and the operational practices that enable high-throughput, low-latency data integration and analytics.

CDP · Data Integration · Flink
42 min read
JD Cloud Developers
Jan 13, 2022 · Fundamentals

How AVS2 Outperforms HEVC: Inside China’s Next‑Gen Video Codec

The article introduces China’s AVS2 video codec, detailing its standards, technical implementation, code extensions for FLV and HLS, bitstream structure, performance comparisons with HEVC and x265, and JD Cloud’s current support and future plans for commercial deployment.

AVS2 · HEVC · Streaming
7 min read
dbaplus Community
Jan 5, 2022 · Big Data

How ByteDance Optimized Flink SQL for Real‑World Streaming at Scale

This article details ByteDance's practical experience with Apache Flink, covering SQL extensions, a visual SQL platform, performance tweaks such as window mini‑batching and custom windows, join and checkpoint recovery improvements, stream‑batch integration experiments, and future roadmap plans.

Batch Integration · Checkpoint · Flink
16 min read
DataFunTalk
Jan 1, 2022 · Big Data

JD's Flink Journey: Evolution, Optimizations, and Future Directions

This article details JD's adoption of Flink for real‑time computing, covering its evolution from Storm to Flink on Kubernetes, the platform architecture, major optimization techniques such as preview topology, backpressure handling, dynamic rebalance, checkpoint‑as‑savepoint, and outlines future plans including stream‑batch integration, stability improvements, intelligent operations, and AI integration.

Big Data · Flink · JD
10 min read
Tencent Cloud Developer
Dec 28, 2021 · Industry Insights

How Flink and ClickHouse Combine to Build High‑Performance Real‑Time Data Warehouses

This article analyzes the challenges of massive data query efficiency, explains how Flink's stream processing and ClickHouse's OLAP engine complement each other, and presents a layered real‑time data‑warehouse architecture with practical guidance on data ingestion, write strategies, quality assurance, and evolving batch‑stream integration patterns.

Big Data · ClickHouse · Flink
19 min read
Su San Talks Tech
Dec 28, 2021 · Big Data

What Makes Kafka the Backbone of Real‑Time Big Data Processing?

This article provides a comprehensive overview of Apache Kafka, covering its distributed architecture, key advantages and drawbacks, the role of ZooKeeper, message delivery semantics, partitioning strategies, storage mechanisms, and performance optimizations such as zero‑copy and batch processing, all essential for high‑throughput real‑time data pipelines.

Big Data · Distributed Messaging · Streaming
23 min read
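
The zero-copy optimization mentioned here refers to serving log-segment bytes with Java's `FileChannel.transferTo`. On Linux, when the target is a socket, this can map to `sendfile(2)`, so data moves from the page cache to the network without an extra user-space copy. The sketch below shows the transfer loop. `ZeroCopySend` is a hypothetical helper, and it assumes a blocking target channel.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper illustrating the transferTo-based send path.
final class ZeroCopySend {
    // Sends the whole file to target and returns the number of bytes transferred.
    // transferTo may move fewer bytes than requested, so it is called in a loop.
    static long sendFile(Path file, WritableByteChannel target) {
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = src.size();
            long position = 0;
            while (position < size) {
                position += src.transferTo(position, size - position, target);
            }
            return position;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a `SocketChannel` as the target, this is the path a broker can use to answer fetch requests. With other channel types, the JDK may fall back to an ordinary buffered copy, which produces the same result without the zero-copy benefit.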
Youku Technology
Dec 10, 2021 · Mobile Development

Overview and Architecture Design of Youku Playback SDK Kernel with Performance Optimization

The Youku Playback SDK kernel provides a cross‑platform, high‑reliability framework that decouples data acquisition, decoding, and rendering into independent AVSource, AVDecoder, and AVRender modules, enabling efficient thread usage, configurable builds for partners, adaptive buffering, health monitoring, and comprehensive error handling for optimal playback performance.

DRM · Media Kernel · Performance Optimization
10 min read
Java Architect Essentials
Dec 7, 2021 · Big Data

Apache Kafka 3.0 Release Highlights and New Features

The article provides a comprehensive overview of Apache Kafka 3.0, detailing its core APIs, two main use‑cases, major feature additions, deprecations, KRaft consensus improvements, enhanced producer guarantees, and numerous KIP‑driven changes across the broker, client, Connect, Streams, and MirrorMaker components.

Apache Kafka · Event Streaming · KIP
14 min read
Programmer DD
Nov 27, 2021 · Operations

How Netflix’s Open Connect CDN Powers Seamless Streaming Worldwide

Netflix's Open Connect CDN, a proprietary content-delivery network built over a decade, places millions of content copies on servers located close to ISPs, keeps replicas at multiple bitrates, and dynamically shifts content to flash storage, sustaining high-quality streaming even during peak demand and network outages.

CDN · Infrastructure · Netflix
12 min read
Tongcheng Travel Technology Center
Nov 19, 2021 · Big Data

Real‑Time Data Warehouse Practices with Apache Kudu: Architecture, Partitioning, and Platformization

This article reviews the challenges of building a real‑time data warehouse, compares Lambda and Kappa architectures, introduces Apache Kudu’s master‑tablet design, storage model and partition strategies, and shares practical experiences and future directions for a Kudu‑based streaming analytics platform.

Apache Kudu · Big Data · Kappa architecture
8 min read
Douyu Streaming
Nov 12, 2021 · Fundamentals

How FLV and RTP Interact in Douyu’s Low‑Latency WebRTC Streaming

This article explains the end‑to‑end workflow of Douyu’s fast live streaming system, detailing how FLV tags are converted to RTP packets and back, covering WebRTC’s SDP/ICE/DTLS handshake, FLV and RTP header structures, payload formats for audio (OPUS) and video (H.264), and the server‑side processing pipeline.

FLV · RTP · Streaming
19 min read
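
When FLV tags are repackaged as RTP, each packet starts with the 12-byte fixed header defined in RFC 3550: version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp and SSRC. FLV timestamps are in milliseconds, while RTP timestamps use the codec's clock rate: 90 kHz for H.264 (RFC 6184) and 48 kHz for Opus (RFC 7587). The sketch below builds and parses that header. `RtpHeader` is a hypothetical helper, and the payload type 96 used in the example is an assumed dynamic value that would normally be negotiated in SDP.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: RFC 3550 fixed header without CSRCs or extensions.
final class RtpHeader {
    static byte[] build(boolean marker, int payloadType, int sequence, long timestamp, long ssrc) {
        ByteBuffer buf = ByteBuffer.allocate(12);  // big-endian (network order) by default
        buf.put((byte) 0x80);                      // V=2, P=0, X=0, CC=0
        buf.put((byte) ((marker ? 0x80 : 0) | (payloadType & 0x7F))); // M bit + 7-bit PT
        buf.putShort((short) sequence);            // 16-bit sequence number, wraps at 65535
        buf.putInt((int) timestamp);               // 32-bit media timestamp
        buf.putInt((int) ssrc);                    // 32-bit synchronization source
        return buf.array();
    }

    // Converts an FLV millisecond timestamp to RTP clock units, truncated to 32 bits.
    // Real senders usually also add a random initial offset, as RFC 3550 recommends.
    static long flvMillisToRtp(long millis, int clockRate) {
        return (millis * clockRate / 1000) & 0xFFFFFFFFL;
    }

    static int version(byte[] h) { return (h[0] & 0xFF) >>> 6; }

    static boolean marker(byte[] h) { return (h[1] & 0x80) != 0; }

    static int payloadType(byte[] h) { return h[1] & 0x7F; }

    static int sequence(byte[] h) { return ((h[2] & 0xFF) << 8) | (h[3] & 0xFF); }

    static long timestamp(byte[] h) { return ByteBuffer.wrap(h, 4, 4).getInt() & 0xFFFFFFFFL; }
}
```

For example, an H.264 frame whose FLV tag timestamp is 40 ms maps to RTP timestamp 3600 at 90 kHz. The marker bit is typically set on the last packet of a video frame.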
Big Data Technology & Architecture
Nov 8, 2021 · Big Data

Understanding Flink CDC 2.0: Core Design, Snapshot & Incremental Reading, and Code Walkthrough

This article introduces Flink CDC 2.0, explains its distributed full-load and incremental reading mechanisms, details the chunk splitting, snapshot correction, and binlog handling logic, and provides a complete Java example that configures Flink SQL with a MySQL source and a Kafka sink.

Big Data · CDC · Data Integration
29 min read
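
The chunk-based snapshot with correction can be summarized as follows:

1. Split the table into primary-key ranges.
2. For each chunk, record a low watermark (binlog position).
3. SELECT the chunk's rows.
4. Record a high watermark.
5. Apply binlog events for that key range that fall between the two watermarks, so the chunk reflects its state as of the high watermark.

The sketch below is a simplified, in-memory illustration of that idea. It is not Flink CDC's code. `ChunkSnapshotCorrection` and `ChangeEvent` are hypothetical, and the sketch assumes a numeric, roughly evenly distributed primary key and full-row change events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of watermark-based chunk snapshot correction.
final class ChunkSnapshotCorrection {
    // A binlog change event; after == null means the row was deleted.
    static final class ChangeEvent {
        final long position;
        final long key;
        final String after;

        ChangeEvent(long position, long key, String after) {
            this.position = position;
            this.key = key;
            this.after = after;
        }
    }

    // Splits [minKey, maxKey] into half-open chunks [start, end) of at most chunkSize keys.
    static List<long[]> splitChunks(long minKey, long maxKey, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<long[]> chunks = new ArrayList<>();
        for (long start = minKey; start <= maxKey; start += chunkSize) {
            chunks.add(new long[] {start, Math.min(start + chunkSize, maxKey + 1)});
        }
        return chunks;
    }

    // Replays events with lowWatermark < position <= highWatermark whose key lies in
    // [chunkStart, chunkEnd). The result is the chunk's state as of the high watermark.
    // Later events are left for the incremental (binlog) phase.
    static Map<Long, String> correct(Map<Long, String> chunkSnapshot,
                                     long chunkStart, long chunkEnd,
                                     long lowWatermark, long highWatermark,
                                     List<ChangeEvent> binlog) {
        Map<Long, String> result = new TreeMap<>(chunkSnapshot);
        for (ChangeEvent e : binlog) {
            boolean inWindow = e.position > lowWatermark && e.position <= highWatermark;
            boolean inChunk = e.key >= chunkStart && e.key < chunkEnd;
            if (!inWindow || !inChunk) {
                continue;
            }
            if (e.after == null) {
                result.remove(e.key);
            } else {
                result.put(e.key, e.after);
            }
        }
        return result;
    }
}
```

Because each chunk is corrected independently, chunks can be read in parallel without a global lock. That is the main property this design aims for.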