Tagged articles

560 articles

May 5, 2023 · Backend Development

Exporting Millions of Records with JPA and MyBatis Using Streaming and CSV in Spring Boot

This article explains how to avoid OutOfMemoryError when exporting massive MySQL datasets by streaming data with JPA or MyBatis and writing each record directly to a CSV file. It provides complete Spring Boot code examples, performance comparisons, and deployment tips.

CSV · LargeDataExport · MemoryOptimization

0 likes · 11 min read

Java High-Performance Architecture

May 3, 2023 · Backend Development

Export Millions of MySQL Records with SpringBoot Without OOM

This article explains how to export large MySQL datasets in SpringBoot by streaming data directly to CSV, avoiding full‑memory loads that cause OutOfMemoryError, and provides complete JPA and MyBatis implementations, performance testing, and practical code examples for production use.

CSV · Data Export · MyBatis

0 likes · 13 min read

Bilibili Tech

Apr 28, 2023 · Cloud Native

Remote StateBackend for Flink: Design, Optimizations, and Cloud‑Native Migration

To enable Bilibili’s cloud‑native migration, the team built a RemoteStateBackend that moves Flink’s keyed state to the Taishan KV store, using deterministic KeyGroup placement, per‑shard snapshots, asynchronous write batching, off‑heap caching with Bloom‑filter filtering, and a fixed‑size memory model, which together reduce checkpoint overhead, improve disk utilization, and accelerate rescaling for more than one hundred production jobs.

CloudNative · Flink · PerformanceOptimization

0 likes · 18 min read

Architects Research Society

Apr 25, 2023 · Big Data

Understanding Transactions in Apache Kafka: Semantics, API, and Practical Guidance

This article explains the purpose, semantics, and design of Apache Kafka's transaction API, describes how exactly‑once processing is achieved in stream‑processing applications, outlines the Java client usage, and discusses the internal components, performance considerations, and best‑practice tips for developers.

Distributed Systems · Exactly-Once · Java

0 likes · 16 min read

IT Architects Alliance

Mar 28, 2023 · Big Data

Kafka Core Concepts, Architecture, Performance Optimization, and Production Deployment Guide

This comprehensive guide explains Kafka's core value as a message queue, its fundamental concepts, cluster architecture, high‑performance data handling, resource planning for large‑scale deployments, operational tools, consumer‑group mechanics, offset management, rebalance strategies, and custom partitioner implementation.

Deployment · Replication · Streaming

0 likes · 29 min read

Big Data Technology & Architecture

Mar 27, 2023 · Big Data

Key Updates in Apache Flink 1.17: Batch and Streaming Enhancements

The article reviews Apache Flink 1.17's major batch and streaming improvements, including new Delete/Update APIs, performance boosts, SQL client gateway, checkpoint and watermark enhancements, StateBackend upgrades, and practical use‑case scenarios for data engineers.

Apache Flink · Batch Processing · Big Data

0 likes · 7 min read

Qunar Tech Salon

Mar 21, 2023 · Frontend Development

Performance Analysis of React CSR, SSR, and React 18 Streaming SSR

This article examines how different React rendering strategies—client‑side rendering, server‑side rendering, and the new React 18 Streaming SSR—affect key web performance metrics such as TTI, FCP, and first‑paint, and demonstrates substantial latency reductions achieved through streaming and selective hydration.

SSR · Streaming · TTI

0 likes · 11 min read

Architect

Mar 17, 2023 · Backend Development

Million‑Scale Data Export with JPA and MyBatis in Spring Boot

This article explains how to export tens of millions of rows from MySQL in Spring Boot by streaming data with JPA or MyBatis and switching the output to CSV format, which avoids OutOfMemoryError. It provides complete code examples, a performance comparison, and tips for generating test data.

CSV · DataExport · MyBatis

0 likes · 12 min read

DataFunTalk

Mar 9, 2023 · Big Data

Real‑Time Data Platform Architecture and Cloud‑Native Flink Migration at Manbang

This article presents a comprehensive case study of Manbang's real‑time data platform, detailing its business background, cloud‑native Flink + Hologres architecture, migration from self‑built clusters, real‑time product features, decision‑making workflows, and future roadmap, highlighting performance and cost benefits.

Flink · Logistics · Streaming

0 likes · 16 min read

Selected Java Interview Questions

Mar 7, 2023 · Backend Development

Streaming Large-Scale Data Export in SpringBoot Using JPA and MyBatis to Avoid OOM

The article explains how to prevent OutOfMemoryError when exporting massive MySQL datasets by streaming records with JPA or MyBatis, writing them directly to CSV files, and demonstrates significant memory savings compared to traditional batch export methods.

CSV · DataExport · MyBatis

0 likes · 10 min read

Architecture Digest

Mar 1, 2023 · Databases

Handling Large Data Queries in MySQL with MyBatis: Regular, Stream, and Cursor Approaches

To efficiently process massive MySQL result sets without exhausting JVM memory, this article explains three query strategies—regular pagination, stream-based retrieval using MyBatis cursors, and cursor-based fetching with configurable fetchSize—detailing their implementations, advantages, and practical considerations.

Cursor · LargeData · MyBatis

0 likes · 9 min read

Alibaba Cloud Big Data AI Platform

Mar 1, 2023 · Big Data

How We Built a Scalable Real‑Time Data Architecture for a Complex Supply Chain

This article describes the challenges of a highly complex supply‑chain system, the evolution from early MySQL‑based reporting to a modern real‑time data platform using Flink, Kafka, ClickHouse, Hologres and other cloud services, and the tools and lessons learned to achieve low‑latency, high‑throughput analytics.

ClickHouse · Flink · Kafka

0 likes · 11 min read

DataFunSummit

Feb 28, 2023 · Big Data

Iceberg Technology Overview and Its Application at Xiaomi: Practices, Stream‑Batch Integration, and Future Plans

This article introduces the Iceberg table format, explains its core architecture and advantages such as transactionality, implicit partitioning and row‑level updates, details Xiaomi's practical deployments—including CDC pipelines, partition strategies, compaction services, and stream‑batch integration—and outlines future development directions.

Data Lake · Flink · Iceberg

0 likes · 20 min read

DeWu Technology

Feb 24, 2023 · Big Data

Real-Time Data Architecture Evolution for a Complex Supply Chain

The article traces Dewu’s supply‑chain data platform from slow MySQL reporting through early CDC‑based wide tables to a Flink‑Kafka‑ClickHouse 1.0 design, then to a more scalable Flink‑Kafka‑Hologres 2.0 architecture that solves upsert and compute‑storage separation, while detailing key operational tricks, code‑generation tools, and future plans for lake‑house integration.

Big Data · ClickHouse · Flink

0 likes · 10 min read

Alimama Tech

Feb 15, 2023 · Big Data

Dolphin: Alibaba's Hyper‑Converged Multi‑Modal Big Data Engine Overview

Dolphin, Alibaba’s hyper‑converged multi‑modal big‑data engine, unifies OLAP, AI, streaming, and batch workloads on a decoupled compute‑storage MPP foundation, offering a Dolphin SQL layer, advanced bitmap/GroupTable/AFile indexes, intelligent materialization, and one‑write‑multiple‑read storage that cuts costs over 70% while delivering sub‑millisecond queries on trillion‑row datasets.

AI · Big Data · OLAP

0 likes · 14 min read

Architecture Digest

Feb 9, 2023 · Big Data

Understanding Kafka Messages, Topics, Partitions, and Consumers

This article explains Kafka's core concepts—including messages as byte arrays, optional keys for partition control, topic and partition organization, producer and consumer roles, offsets, consumer groups, and broker clusters—providing a concise technical overview for developers learning Kafka.

Consumer · Kafka · Message

0 likes · 6 min read

Big Data Technology & Architecture

Feb 8, 2023 · Big Data

Enabling Early‑Fire Window Computation in Flink SQL for Real‑Time Metrics

This article explains how to configure Flink SQL to emit early‑fire results for tumbling windows, allowing real‑time aggregation of metrics like PV and UV, and provides complete example code, execution output, and a discussion of current limitations.

Early Fire · Flink · Kafka

0 likes · 10 min read

Bilibili Tech

Jan 31, 2023 · Big Data

Design and Optimization of Real-Time Data Quality Control (DQC) Platform on Bilibili's Big Data System

Bilibili redesigned its real-time data-quality control platform by replacing per-rule Flink jobs with a unified, dynamically-configured architecture that classifies Kafka topics, aggregates via InfluxDB full-table and continuous queries, mitigates data inflation, adds a high-performance proxy, and implements robust monitoring and recovery to ensure scalable, reliable data quality for its big-data services.

Big Data · DQC · Flink

0 likes · 22 min read

ITPUB

Jan 22, 2023 · Big Data

How Flink Table Store Powers Real‑Time Financial Data Warehousing

This article details a banking‑focused real‑time data‑warehouse solution that leverages Flink Table Store to handle both incremental fact‑table updates and full‑table dimension calculations, compares three traditional approaches, and explains data ingestion, query modes, export options, and future streaming‑warehouse directions.

Banking · ELT · Flink

0 likes · 20 min read

Sohu Tech Products

Jan 18, 2023 · Big Data

Root Cause Analysis of Flink TaskManager Failover Causing Data Reprocessing and Business Impact

An incident report details how a scheduled machine reboot on Alibaba Cloud triggered a Flink TaskManager failover, leading to excessive data replay, increased ES pressure, and significant business latency, and explains the root cause involving disabled checkpoints and timestamp‑based offset consumption.

Checkpoint · Flink · Kafka

0 likes · 10 min read

DataFunSummit

Jan 8, 2023 · Big Data

Streaming‑Batch Integrated Real‑time Multi‑dimensional Analysis

This article presents a comprehensive overview of evolving big‑data architectures—from classic offline warehouses to Lambda and Kappa models—and details a streaming‑batch integrated solution that addresses latency, data freshness, and multi‑table join challenges to achieve minute‑level real‑time multi‑dimensional analytics.

Batch Processing · Data Warehouse · Kappa architecture

0 likes · 18 min read

Big Data Technology & Architecture

Jan 3, 2023 · Big Data

Migrating Hive SQL Jobs to Flink Using the SQL Gateway

This article explains how to use Apache Flink 1.16's SQL Gateway to migrate Hive SQL tasks to Flink, covering the underlying Hive‑on‑Flink architecture, dialect compatibility, streaming and batch demos, configuration details, and practical tips for developers and platform engineers.

Batch Processing · Big Data · Flink

0 likes · 19 min read

Java Architect Essentials

Dec 29, 2022 · Databases

Implementing Streaming Reads with MyBatis and JDBC for Large Report Exports

This article explains how to overcome export failures for reports exceeding ten thousand rows by using MyBatis streaming reads with JDBC. It details the three read modes and the required MyBatis configurations, and provides complete controller, service, DAO, and mapper code examples.

JDBC · MyBatis · ResultHandler

0 likes · 5 min read

Big Data Technology & Architecture

Dec 15, 2022 · Big Data

Migrating Hive SQL to Flink SQL: Motivation, Challenges, Practice, Demo, and Future Plans

This technical article presents a comprehensive overview of migrating Hive SQL to Flink SQL, covering the motivations behind the migration, key challenges such as compatibility, stability and performance, practical implementation steps, a detailed demo, future development directions, and a Q&A session addressing common concerns.

Batch Processing · Big Data · Data Lake

0 likes · 13 min read

Architect's Tech Stack

Dec 8, 2022 · Backend Development

Implementing Streaming Reads with MyBatis for Large-Scale Java Report Export

This article explains how to overcome export failures caused by large data volumes in a legacy Java system by switching from default full-result JDBC reads to a forward‑only streaming approach using MyBatis, detailing environment setup, configuration changes, and complete code examples for controller, service, DAO, and mapper layers.

JDBC · Java · MyBatis

0 likes · 5 min read

政采云技术

Dec 8, 2022 · Big Data

Understanding Flink's Asynchronous Barrier Snapshotting (ABS) Checkpoint Algorithm

This article explains the Asynchronous Barrier Snapshotting algorithm used by Apache Flink for checkpointing, detailing its origins from the Chandy‑Lamport algorithm, its operation in both acyclic and cyclic dataflow graphs, barrier alignment, and the fault‑recovery process.

Asynchronous Barrier Snapshotting · Checkpoint · Distributed Systems

0 likes · 10 min read

Bilibili Tech

Nov 29, 2022 · Big Data

How Bilibili Supercharged Flink: Checkpoint, HA, and Runtime Optimizations

This article details Bilibili's extensive enhancements to Flink's runtime—including checkpoint recoverability, operator ID stability, state processor extensions, hybrid high‑availability, regional checkpointing, and load‑based channel selection—to improve scalability, reliability, and operational efficiency of large‑scale streaming jobs.

Big Data · Checkpoint · Flink

0 likes · 32 min read

Big Data Technology & Architecture

Nov 28, 2022 · Big Data

Comprehensive Guide to Big Data Interview Topics: Log Collection, Data Synchronization, Offline Development, Real‑time Technology, Data Services, and Data Mining

This article provides an extensive overview of big‑data interview subjects, covering browser and mobile log collection methods, data synchronization techniques (batch, real‑time, sharding), offline data development platforms, streaming architectures, data service evolution, performance optimization, and data‑mining layers and applications.

Big Data · Streaming · data mining

0 likes · 17 min read

ITPUB

Nov 18, 2022 · Big Data

How Xiaomi Uses Iceberg for Real‑Time Streaming and Batch Data Lakes

This article introduces Iceberg’s table‑format fundamentals, details Xiaomi’s large‑scale deployment of Iceberg for CDC and log ingestion, explores their streaming‑batch integration experiments, outlines future roadmap items, and provides a comprehensive Q&A covering practical challenges and solutions.

Batch Processing · Big Data · Data Lake

0 likes · 23 min read

DataFunTalk

Nov 13, 2022 · Big Data

Iceberg Data Lake: Technology Overview, Xiaomi Practices, and Stream‑Batch Integration

This article presents an overview of the Iceberg table format, its core architecture and advantages, details Xiaomi’s large‑scale deployment and use cases, explores stream‑batch integration with Spark and Flink, outlines data correction methods, future plans, and answers common technical questions.

Data Lake · Flink · Iceberg

0 likes · 20 min read

DataFunTalk

Nov 9, 2022 · Artificial Intelligence

Design and Usage of Flink ML Java and Python APIs, Ecosystem Construction, and Future Directions

This article introduces the Flink Machine Learning Library, detailing the design and usage of its Java and Python APIs, core interfaces such as WithParams, Stage, Estimator, and AlgoOperator, workflow for training and inference, pipeline/graph construction, ecosystem initiatives, and upcoming development plans.

AI · Flink · Java API

0 likes · 12 min read

DataFunTalk

Nov 6, 2022 · Big Data

BitSail: ByteDance’s Open‑Source Unified Data Integration Engine – Architecture, Evolution, and Capabilities

BitSail, an open‑source data integration engine from ByteDance, provides a unified solution for batch, streaming, full‑load, and incremental data synchronization across heterogeneous sources. This article covers its background, technical evolution, architecture, low‑cost co‑building features, compatibility strategies, and future roadmap.

CDC · Data Integration · Flink

0 likes · 18 min read

DataFunSummit

Nov 4, 2022 · Big Data

Real-Time Data Lake Practice at ByteDance: Architecture, Challenges, and Solutions

ByteDance’s data platform team explains their real‑time data lake implementation, covering its evolving definition, six core capabilities, challenges such as data management, concurrent updates, performance and log ingestion, and detailed case studies of multi‑stage deployment, indexing, metadata services, and future roadmap.

Hudi · Real-time Data Lake · Streaming

0 likes · 32 min read

21CTO

Nov 1, 2022 · Cloud Computing

How Netflix Engineers Seamless Streaming with Cloud‑Based Encoding and CDN

Netflix delivers billions of hours of video by compressing and transcoding raw movies into multiple formats, splitting them into small chunks processed in parallel on AWS, storing them in S3, and distributing them via its custom Open Connect CDN to ensure low‑latency, high‑quality playback worldwide.

AWS · CDN · Netflix

0 likes · 10 min read

Mike Chen's Internet Architecture

Oct 27, 2022 · Backend Development

Understanding Kafka: Architecture, Principles, Features, and Use Cases

This article explains Kafka's distributed publish‑subscribe architecture, detailing its core components, underlying mechanisms with ZooKeeper coordination, key features such as high throughput and fault tolerance, and common application scenarios like log collection, user activity tracking, and stream processing.

Backend Architecture · Distributed Messaging · Kafka

0 likes · 5 min read

DataFunTalk

Oct 19, 2022 · Big Data

Understanding Flink Table Store: Design, Usage, and Roadmap

Flink Table Store, an Apache Flink subproject, provides a unified stream‑batch storage layer with SQL‑based table APIs that addresses both real‑time and offline data needs. This article covers its design goals, usage patterns, architectural layers, implementation choices, and upcoming roadmap.

Flink · LSM‑Tree · Streaming

0 likes · 14 min read

DataFunTalk

Oct 17, 2022 · Big Data

Thoughts and Practices on ByteDance Streaming Data Warehouse and Real‑Time Service Analysis

The article presents ByteDance's challenges with massive real‑time data processing and describes how they integrated a streaming data warehouse with Flink Table Store, cloud‑native architecture, and real‑time service analysis to achieve low‑latency, high‑throughput analytics and end‑to‑end consistency.

Flink · Real-time analytics · Streaming

0 likes · 13 min read

ITPUB

Oct 15, 2022 · Big Data

Flink & Apache Hudi: Design, Practices, and Roadmap for Streaming Data Lakes

This talk introduces the evolution of data lakes, outlines Apache Hudi’s core features, details the Flink‑Hudi integration architecture—including write pipelines, small‑file handling, and read strategies—covers real‑world use cases such as near‑real‑time DB ingestion, OLAP, and ETL, and previews upcoming Hudi roadmap items.

Apache Hudi · Big Data · Data Lake

0 likes · 21 min read

DataFunTalk

Oct 14, 2022 · Big Data

Exploring Flink and Apache Hudi for Streaming Data Lakes: Design, Practices, and Roadmap

This article presents a comprehensive overview of using Flink with Apache Hudi to build streaming data lake solutions, covering Hudi's background, core features, Flink‑Hudi integration design, practical use cases, recent roadmap updates, and a Q&A session.

Apache Hudi · Data Lake · Flink

0 likes · 19 min read

Java High-Performance Architecture

Oct 11, 2022 · Operations

How Meituan Optimized Kafka for Massive Scale: Reducing Latency and Managing Clusters

This article details Meituan's real‑world challenges with a 15,000‑node Kafka deployment and explains the application‑layer and system‑layer optimizations—such as disk balancing, migration pipeline acceleration, fetcher isolation, RAID acceleration, cgroup isolation, and an SSD‑based cache—that together dramatically cut read/write latency and simplify large‑scale cluster management.

Cluster Management · Meituan · Streaming

0 likes · 23 min read

DataFunTalk

Oct 4, 2022 · Big Data

Near‑Real‑Time Data Lake Practices in TikTok E‑commerce Data Warehouse

The presentation by TikTok e‑commerce data‑warehouse engineer Ma Wenyuan explains data‑lake characteristics, near‑real‑time architecture, and practical e‑commerce use cases, highlighting Apache Hudi features, hybrid batch‑stream processing, and future challenges for scaling and integration.

Data Lake · Hudi · Streaming

0 likes · 13 min read

DataFunTalk

Sep 17, 2022 · Big Data

Real-Time Data Warehouse Practices with Hudi at ByteDance

This presentation details ByteDance's real‑time data‑warehouse implementations using Apache Hudi, covering scenario classifications, challenges of traditional offline warehouses, practical solutions for ingestion, upsert, validation, indexing, query optimization, and future plans for extensible indexing and unified batch‑stream processing.

Data Lake · Hudi · Streaming

0 likes · 16 min read

Top Architect

Sep 17, 2022 · Big Data

Meituan's Kafka Architecture: Challenges and Optimizations at Massive Scale

This article details how Meituan's Kafka platform, serving over 15,000 machines and handling petabytes of daily traffic, faces read/write latency, slow nodes, and large‑scale cluster management challenges, and describes a series of application‑layer, system‑layer, and operational optimizations—including disk balancing, migration pipelines, fetcher isolation, consumer async, SSD caching, isolation strategies, full‑link monitoring, lifecycle management, and TOR disaster recovery—to improve performance and reliability.

Kafka · Meituan · Streaming

0 likes · 22 min read

DataFunTalk

Sep 11, 2022 · Big Data

Flink Table Store v0.2: Application Scenarios, Core Features, and Future Roadmap

This article introduces Flink Table Store v0.2, explains its four primary application scenarios—offline warehouse acceleration, partial update, pre‑aggregation rollup, and real‑time warehouse enhancement—details the core lake‑storage architecture, bucket management, append‑only mode, and outlines the project’s future roadmap and trade‑off considerations.

Batch · Flink · Lake Storage

0 likes · 16 min read

DataFunTalk

Aug 25, 2022 · Big Data

Applying OpenMLDB for Efficient AI Toolchain and Data‑Driven Architecture at Akulaku

This article presents Akulaku’s practical experience with OpenMLDB, describing the company’s data‑driven requirements, the design of a unified stream‑batch architecture, implementation details across offline, online and RocksDB modes, and future recommendations for high‑performance, scenario‑agnostic big‑data processing.

AI · Batch Processing · OpenMLDB

0 likes · 17 min read

Zhuanzhuan Tech

Aug 24, 2022 · Big Data

Real-Time Data Warehouse Architecture Using Flink: Design, Implementation, and Challenges

This article details the design and implementation of a real‑time data warehouse for an advertising platform, covering business background, challenges, a Lambda‑based architecture, Flink stream processing setup, ETL logic, sink handling, and performance results, concluding with future improvement directions.

ETL · Flink · Lambda architecture

0 likes · 11 min read

Hulu Beijing

Aug 19, 2022 · Artificial Intelligence

Disney’s M5 Model: Multi‑Modal, Multi‑Interest, Multi‑Scenario Boost for Streaming Recommendations

Disney’s Content Discovery team introduces M5, a multi‑modal, multi‑interest, multi‑scenario recall model that enhances VOD and live streaming recommendations by leveraging rich metadata, user behavior, and contextual features, outperforming baseline methods with significant hit‑ratio gains across Hulu and Disney+.

Deep Learning · M5 model · Recommendation Systems

0 likes · 22 min read

ITPUB

Aug 13, 2022 · Big Data

How Alibaba Uses Flink to Power Massive Real‑Time Risk Control

This article explains how Alibaba leverages Flink to handle over 40 billion events per second across all business units, detailing risk‑control concepts, rule types, architectural stages, resource tuning, dynamic CEP, shared computing, and the FY23 roadmap for large‑scale streaming risk management.

Alibaba · Big Data · CEP

0 likes · 16 min read

Wukong Talks Architecture

Aug 9, 2022 · Big Data

Kafka Basics: 15 Key Questions and In‑Depth Answers

This comprehensive guide covers Kafka’s core concepts, architecture, ZooKeeper role, producer sending modes, partitioning strategies, replica types, message deletion, performance optimizations, consumer models, offset management, and best‑practice recommendations for scaling and ensuring ordered delivery in distributed streaming systems.

Partitioning · Streaming · ZooKeeper

0 likes · 31 min read

IT Architects Alliance

Aug 3, 2022 · Big Data

Understanding Kafka Architecture: Topics, Partitions, Replication, Log Segmentation, Zero‑Copy, and Zookeeper Integration

This article explains Kafka's core concepts—including topics, partitions and replicas, log segment storage, leader‑follower mechanics, consumer groups, network threading model, zero‑copy I/O, and the essential role of ZooKeeper for broker, topic, consumer, and offset management—providing a comprehensive overview for developers and architects.

Big Data · Kafka · Streaming

0 likes · 10 min read

HomeTech

Jul 20, 2022 · Big Data

Design and Implementation of a Real-Time Advertising Data Warehouse Using Flink and StarRocks

This article presents a comprehensive case study of building a real‑time advertising data warehouse at Autohome, detailing the evaluation of streaming engines and storage solutions, the layered architecture design, implementation steps with Flink and StarRocks, monitoring practices, encountered issues, and future roadmap, demonstrating how second‑level data freshness and high accuracy were achieved.

Flink · StarRocks · Streaming

0 likes · 10 min read

Top Architect

Jul 20, 2022 · Big Data

Kafka Core Concepts: Basics, Producers/Consumers, Topics, Partitions, and Architecture

This article provides a comprehensive overview of Kafka, covering its fundamental concepts such as producers and consumers, topics and consumer groups, partitions and ordering, as well as the cluster architecture involving ZooKeeper, replication, and leader‑follower mechanisms, illustrated with diagrams.

Big Data · Message Queue · Streaming

0 likes · 7 min read

Open Source Linux

Jul 19, 2022 · Backend Development

Master Kafka Basics: Visual Guide to Topics, Partitions, and Architecture

This article visually explains Kafka's core concepts—including producers, consumers, topics, partitions, consumer groups, and cluster architecture—so readers can clearly understand how messages flow, are stored, and remain fault‑tolerant within a distributed streaming system.

Backend · Kafka · Message Queue

0 likes · 6 min read

Top Architect

Jun 25, 2022 · Big Data

Kafka Core Concepts: Basics, Producers & Consumers, Topics, Partitions, and Architecture

This article provides a visual and textual overview of Kafka's fundamental concepts—including its role as a streaming/message queue system, the producer‑consumer model, topics and partitions, consumer groups, and the cluster architecture managed by ZooKeeper—while also noting promotional offers and community links.

Message Queue · Streaming

0 likes · 9 min read

MaGe Linux Operations

Jun 19, 2022 · Big Data

Visualizing Kafka: Core Concepts Explained with Diagrams

This article provides a diagram‑driven walkthrough of Kafka’s fundamental concepts—including topics, partitions, producers, consumers, consumer groups, and cluster architecture—explaining how messages flow, are stored, and achieve reliability and ordering within a distributed streaming system.

Cluster Architecture · Kafka · Partitions

0 likes · 6 min read

Alibaba Cloud Developer

Jun 14, 2022 · Big Data

Can a Streaming Data Warehouse Balance Freshness, Latency, and Cost?

This article examines the core trade‑offs of data warehouses—freshness, query latency, and cost—compares offline and real‑time architectures, introduces the concept of a streaming data warehouse, and details how Apache Flink Table Store aims to provide a unified, low‑cost solution.

Big Data · Data Warehouse · Flink

0 likes · 19 min read

Efficient Ops

Jun 7, 2022 · Big Data

Visualizing Kafka: Core Concepts Explained with Diagrams

This article visually breaks down Kafka’s fundamental concepts—including topics, partitions, producers, consumers, consumer groups, and cluster architecture—so readers can grasp how messages flow, are stored, and achieve load balancing and ordering within a distributed streaming platform.

Distributed Systems · Kafka · Message Queue

0 likes · 6 min read

IT Architects Alliance

Jun 1, 2022 · Big Data

Kafka Core Concepts: Producers, Consumers, Topics, Partitions, and Architecture

This article explains the fundamental concepts of Apache Kafka, covering its role as a streaming platform, the producer‑consumer model, how topics and partitions work, consumer groups for load balancing, message ordering, replication with leaders and followers, and the coordination role of ZooKeeper.

Big Data · Consumer · Kafka

0 likes · 5 min read

DataFunTalk

May 23, 2022 · Big Data

Real-Time Data Lake Practices at ByteDance: Architecture, Challenges, and Solutions

ByteDance shares its real‑time data lake implementation, covering the evolving definition of data lakes, six core capabilities, challenges such as data management, weak concurrent updates, performance, and log ingestion, and detailed solutions including Hudi Metastore Server, bucket indexing, multi‑stage use cases, and future roadmap.

Batch Processing · Hudi · Real-time Data Lake

0 likes · 32 min read

Alibaba Cloud Developer

May 18, 2022 · Big Data

Why Delta Lake Is Revolutionizing Data Lakes with ACID Guarantees

This article explains how Delta Lake adds reliability to data lakes by offering ACID transactions, scalable metadata, and unified batch‑and‑stream processing, outlines the challenges it solves, details its implementation principles, and demonstrates a practical demo for building an integrated data warehouse.

ACID · Big Data · Data Lake

0 likes · 9 min read

Big Data Technology & Architecture

May 15, 2022 · Big Data

Understanding Flink Window Table-Valued Functions (TVF) and Incremental Optimization

This article explains the concept of window table-valued functions in Flink, compares the old grouped‑window syntax with the new TVF syntax, details the physical and execution plans, introduces sliced windows for state reduction, and presents a small incremental‑output improvement with code examples.

Big Data · Flink · Incremental Aggregation

0 likes · 12 min read

Zuoyebang Tech Team

May 9, 2022 · Big Data

How Flink SQL Powered Real‑Time Learning Analytics at Zuoyebang

Zuoyebang’s big‑data team shares how they evolved from Spark Streaming to a Flink‑SQL‑centric real‑time platform, detailing three development stages, challenges in DAG optimization, Redis‑based table design, and platform features for unified deployment, ease of use, and operational governance.

Flink · Real-Time · SQL

0 likes · 14 min read

58 Tech

May 5, 2022 · Big Data

Low-Code Real-Time Data Warehouse Construction System Using Flink

This article describes a low‑code, Flink‑based real‑time data‑warehouse construction system that abstracts the warehouse building process into ODS, DWD, DWS, and ADS layers, leverages a domain‑specific language and plugin engine to reduce development effort, and details its architecture, DSL design, plugin extensibility, dimension‑table completion, stream merging, aggregation, and storage strategies.

Big Data · DSL · Flink

0 likes · 11 min read

Big Data Technology & Architecture

Apr 27, 2022 · Big Data

Understanding Window Table-Valued Functions (TVF) in Flink and Their Optimizations

This article explains Flink's window table-valued functions (TVF), shows how they replace the old grouped‑window syntax with concrete SQL examples, describes the physical planning rules, introduces sliced windows for state efficiency, and presents a small incremental‑output improvement for cumulative windows.

Big Data · Flink · SQL

0 likes · 11 min read

Baidu Geek Talk

Apr 18, 2022 · Backend Development

Boost Mini‑Program Install Speed by 21% with Streaming Download Pipeline

This article analyzes the performance bottlenecks of Baidu mini‑program package installation, proposes a streaming download approach that parallelizes network I/O with signature verification and decompression, and provides detailed Java implementation using a MultiPipe pipeline to achieve a 21% reduction in download time.

Backend · Download Optimization · Java

0 likes · 12 min read

DataFunSummit

Apr 6, 2022 · Big Data

Real-time Dimension Modeling with Flink SQL: Challenges and Solutions

This article presents a JD.com case study on applying Flink SQL for real‑time dimension modeling, detailing two complex streaming scenarios—full‑join of multiple streams and full‑group aggregation—along with the associated challenges of historical data handling, state management, and performance optimization, and proposes component‑based architectural solutions.

Big Data · Flink · Real-Time

0 likes · 14 min read

Youku Technology

Mar 30, 2022 · Operations

Ali266: Deployment of the H.266/VVC Codec on Youku – Architecture, Performance, and Business Impact

In January 2022 Youku became the first platform to deploy Alibaba’s Ali266 H.266/VVC codec, cutting bitrate by up to 40% and stall rates by 50% while delivering real‑time 720p/1080p encoding and 4K decoding on mobile devices, yielding significant bandwidth savings and power efficiency.

Ali266 · H.266 · Streaming

0 likes · 12 min read

Big Data Technology & Architecture

Mar 28, 2022 · Big Data

Real-time Dimension Modeling with Flink SQL: Problems, Challenges, and Solutions

This article presents JD's real-time dimension modeling case using Flink SQL, detailing two complex streaming scenarios, the difficulties of handling historical data and state management, and a component‑based solution that leverages external KV stores and optimized Flink operators to improve performance and scalability.

Big Data · Flink · Real-Time

0 likes · 13 min read

Su San Talks Tech

Mar 27, 2022 · Big Data

End‑to‑End Streaming Data Pipeline with Kafka, Flink, and Apache Griffin

This tutorial demonstrates how to build a complete streaming data pipeline by configuring JDK, MySQL, Hadoop, Hive, Spark, Kafka, and Griffin, generating test data with shell scripts, processing it with Flink, and validating data quality using Apache Griffin in a Spark‑based deployment.

Apache Griffin · Big Data · Data Quality

0 likes · 13 min read

Big Data Technology & Architecture

Mar 8, 2022 · Big Data

Flink CDC 2.0: Concepts, Architecture, and Hands‑On Implementation

This article introduces the fundamentals of Flink CDC, explains its application scenarios and underlying technologies, compares query‑based and log‑based CDC, showcases open‑source solutions, and provides detailed Java and SQL examples for building real‑time ETL pipelines with MySQL and Flink.

Apache Flink · Change Data Capture · ETL

0 likes · 24 min read

Baidu Geek Talk

Mar 7, 2022 · Backend Development

Design and Implementation of GDP Streaming RPC Framework (Go Version of brpc Streaming)

The Go‑based GDP Streaming RPC framework extends Baidu’s internal brpc‑compatible RPC system with a high‑performance streaming transport that preserves message order, supports multiple concurrent streams per socket, offers customizable serialization and event‑driven handlers, and enables efficient large‑scale data transfers such as voice or replica synchronization, achieving comparable latency and throughput to the original C++ implementation.

Go · RPC · Streaming

0 likes · 12 min read

Bitu Technology

Mar 1, 2022 · Product Management

US Streaming Industry Insights and Outlook: AVOD Growth, Tubi 2021 Performance, and Pandemic Impact

The article analyzes the rapid rise of ad‑supported streaming (AVOD) in the United States, highlighting Tubi's 2021 viewership surge, pandemic‑driven user behavior shifts, demographic trends, advertising revenue growth, and future content and hiring plans.

AVOD · Advertising · Market Trends

0 likes · 8 min read

vivo Internet Technology

Feb 23, 2022 · Big Data

Kafka-based Real-Time Data Warehouse: Architecture and Practice for Search

The article explains how Kafka serves as the core of a real‑time data warehouse for search, detailing its advantages over traditional databases, integration with Flink for low‑latency stream processing, architectural patterns such as Lambda/Kappa, scaling challenges, and comprehensive monitoring using Kafka Eagle.

Apache Kafka · Data Integration · Flink

0 likes · 15 min read

Big Data Technology & Architecture

Feb 23, 2022 · Big Data

Understanding Mini‑Batch Streaming Aggregation in Flink SQL

This article explains Flink SQL’s streaming aggregation Mini‑Batch feature, covering its purpose, configuration parameters, underlying optimizer rules, operator implementations, watermark handling, buffer processing, and the optional Local‑Global two‑phase aggregation optimization for improving throughput and reducing state overhead in large‑scale data pipelines.

Big Data · Flink · Mini-Batch

0 likes · 10 min read

IT Architects Alliance

Feb 22, 2022 · Big Data

Understanding Kafka's Core Design: Topics, Partitions, Consumer Groups, and Cluster Architecture

This article explains Kafka's fundamental concepts, including topics, partitions, producers, consumers, replication, consumer groups, and the role of ZooKeeper, and covers performance optimizations such as sequential writes, zero‑copy, log segmentation, and its reactor‑style network design.

Big Data · Kafka · Streaming

0 likes · 11 min read

Architects Research Society

Feb 19, 2022 · Big Data

Understanding Apache Kafka Transactions: Semantics, API Usage, and Practical Guidance

This article explains the design goals, exactly‑once semantics, Java transaction API, internal components such as the coordinator and transaction log, data‑flow interactions, performance considerations, and best‑practice tips for using Apache Kafka transactions in stream‑processing applications.

Distributed Systems · Java · Kafka

0 likes · 18 min read

Volcano Engine Developer Services

Feb 16, 2022 · Big Data

ByteDance’s Journey to a Unified Data Lake with Flink and Hudi

This article recounts ByteDance’s evolution from batch‑only Flink pipelines to a unified data‑lake integration platform, detailing the three integration modes, challenges with Spark‑based CDC, the decision to adopt Hudi over Iceberg, and how Hudi’s indexing and Merge‑On‑Read formats enable near‑real‑time analytics at massive scale.

CDC · Flink · Hudi

0 likes · 10 min read

Selected Java Interview Questions

Feb 13, 2022 · Databases

Implementing Streaming Reads with MyBatis to Export Large Datasets

To overcome export failures when report data exceeds ten thousand rows, this guide demonstrates how to configure MyBatis for forward-only streaming reads by adjusting JDBC settings, adding ResultHandler callbacks, and modifying mapper XML, enabling efficient memory usage during large result set processing.

JDBC · Java · MyBatis

0 likes · 5 min read

Architect

Feb 12, 2022 · Big Data

In-Depth Overview of Apache Kafka Architecture and Core Concepts

This article provides a comprehensive introduction to Apache Kafka, covering its distributed streaming platform features, message queue patterns, topic and partition design, broker and cluster roles, producer and consumer mechanics, partition assignment strategies, data storage, reliability guarantees, and performance optimizations such as zero‑copy and batch processing.

Consumer · Message Queue · Producer

0 likes · 23 min read

Su San Talks Tech

Jan 27, 2022 · Big Data

Why Kafka 2.8 Is Dropping Zookeeper and What It Means for You

This article explains how Kafka 2.8 removes its dependency on ZooKeeper, describes the roles of brokers, topics, partitions, and the controller in the ZooKeeper‑based architecture, and outlines the KIP‑500 upgrade that replaces ZooKeeper with a quorum‑based KRaft controller to improve scalability and operational simplicity.

Distributed Systems · KIP-500 · KRaft

0 likes · 9 min read

Baidu Geek Talk

Jan 26, 2022 · Big Data

How a Real‑Time CDP Solves Data Silos: Architecture, Tech Choices & Lessons

This article examines the design and implementation of a tenant‑level real‑time Customer Data Platform, detailing CDP fundamentals, business and technical challenges, key architectural components, technology selections such as graph databases, stream processing, storage engines, and the operational practices that enable high‑throughput, low‑latency data integration and analytics.

CDP · Data Integration · Flink

0 likes · 42 min read

Tencent IMWeb Frontend Team

Jan 24, 2022 · Fundamentals

Understanding Video Transmission: GOP, Frame Types, and DTS/PTS Explained

Video transmission relies on compressing frames into groups (GOP) composed of I, P, and B frames, with decoding and presentation timestamps (DTS/PTS) coordinating playback order, ensuring efficient storage and smooth streaming despite differing frame dependencies.

B-frame · DTS · GOP

0 likes · 6 min read

JD Cloud Developers

Jan 13, 2022 · Fundamentals

How AVS2 Outperforms HEVC: Inside China’s Next‑Gen Video Codec

The article introduces China’s AVS2 video codec, detailing its standards, technical implementation, code extensions for FLV and HLS, bitstream structure, performance comparisons with HEVC and x265, and JD Cloud’s current support and future plans for commercial deployment.

AVS2 · HEVC · Streaming

0 likes · 7 min read

DataFunTalk

Jan 13, 2022 · Big Data

Advanced Features of the Pravega Flink Connector Table API: Schema Registry, Catalog Integration, and Debezium Support

This article summarizes the Pravega Schema Registry project, its integration with Flink's Catalog API, the addition of Debezium CDC support, and the related implementation challenges, providing detailed DDL examples, code snippets, and architectural diagrams for building real‑time data pipelines.

CDC · Catalog API · Debezium

0 likes · 15 min read

Tencent IMWeb Frontend Team

Jan 10, 2022 · Backend Development

Mastering Efficient File Downloads and Large-Scale Excel Export in Node.js

This article explains how to implement simple and streaming file downloads with Node.js and Koa, handle content disposition and progress display, support resumable downloads, and efficiently generate large Excel reports using ExcelJS with chunked queries and streaming to overcome memory bottlenecks.

ExcelJS · File Download · Koa

0 likes · 12 min read

dbaplus Community

Jan 5, 2022 · Big Data

How ByteDance Optimized Flink SQL for Real‑World Streaming at Scale

This article details ByteDance's practical experience with Apache Flink, covering SQL extensions, a visual SQL platform, performance tweaks such as window mini‑batching and custom windows, join and checkpoint recovery improvements, stream‑batch integration experiments, and future roadmap plans.

Batch Integration · Checkpoint · Flink

0 likes · 16 min read

DataFunTalk

Jan 1, 2022 · Big Data

JD's Flink Journey: Evolution, Optimizations, and Future Directions

This article details JD's adoption of Flink for real‑time computing, covering its evolution from Storm to Flink on Kubernetes, the platform architecture, major optimization techniques such as preview topology, backpressure handling, dynamic rebalance, checkpoint‑as‑savepoint, and outlines future plans including stream‑batch integration, stability improvements, intelligent operations, and AI integration.

Big Data · Flink · JD

0 likes · 10 min read

Tencent Cloud Developer

Dec 28, 2021 · Industry Insights

How Flink and ClickHouse Combine to Build High‑Performance Real‑Time Data Warehouses

This article analyzes the challenges of massive data query efficiency, explains how Flink's stream processing and ClickHouse's OLAP engine complement each other, and presents a layered real‑time data‑warehouse architecture with practical guidance on data ingestion, write strategies, quality assurance, and evolving batch‑stream integration patterns.

Big Data · ClickHouse · Flink

0 likes · 19 min read

Su San Talks Tech

Dec 28, 2021 · Big Data

What Makes Kafka the Backbone of Real‑Time Big Data Processing?

This article provides a comprehensive overview of Apache Kafka, covering its distributed architecture, key advantages and drawbacks, the role of ZooKeeper, message delivery semantics, partitioning strategies, storage mechanisms, and performance optimizations such as zero‑copy and batch processing, all essential for high‑throughput real‑time data pipelines.

Big Data · Distributed Messaging · Streaming

0 likes · 23 min read

Architects Research Society

Dec 17, 2021 · Big Data

Understanding Apache Kafka Transactions: Semantics, API Usage, and Performance Considerations

This article explains the design goals, exactly‑once semantics, Java transaction API, coordinator and log architecture, and practical performance trade‑offs of Apache Kafka's transactional messaging, helping developers build reliable stream‑processing applications.

Exactly-Once · Java · Kafka

0 likes · 16 min read

Youku Technology

Dec 10, 2021 · Mobile Development

Overview and Architecture Design of Youku Playback SDK Kernel with Performance Optimization

The Youku Playback SDK kernel provides a cross‑platform, high‑reliability framework that decouples data acquisition, decoding, and rendering into independent AVSource, AVDecoder, and AVRender modules, enabling efficient thread usage, configurable builds for partners, adaptive buffering, health monitoring, and comprehensive error handling for optimal playback performance.

DRM · Media Kernel · Performance Optimization

0 likes · 10 min read

Java Architect Essentials

Dec 7, 2021 · Big Data

Apache Kafka 3.0 Release Highlights and New Features

The article provides a comprehensive overview of Apache Kafka 3.0, detailing its core APIs, two main use‑cases, major feature additions, deprecations, KRaft consensus improvements, enhanced producer guarantees, and numerous KIP‑driven changes across the broker, client, Connect, Streams, and MirrorMaker components.

Apache Kafka · Event Streaming · KIP

0 likes · 14 min read

DataFunSummit

Dec 4, 2021 · Big Data

Building a Real-Time Data Warehouse with Flink: Hive Integration, Upsert‑Kafka, and CDC Connectors

This tutorial explains how to use Apache Flink 1.12 to construct a unified streaming‑batch data warehouse by integrating Hive via HiveCatalog and HiveDialect, performing read/write operations, configuring upsert‑Kafka sinks, and leveraging Flink CDC connectors for change data capture from MySQL and other sources.

CDC · Flink · Hive

0 likes · 46 min read

Programmer DD

Nov 27, 2021 · Operations

How Netflix’s Open Connect CDN Powers Seamless Streaming Worldwide

Netflix’s Open Connect CDN, a proprietary content‑delivery network built over a decade, strategically places millions of server copies close to ISPs, uses multiple bitrate replicas, and dynamically shifts content to flash storage, ensuring high‑quality streaming even during peak demand and network outages.

CDN · Infrastructure · Netflix

0 likes · 12 min read

Tongcheng Travel Technology Center

Nov 19, 2021 · Big Data

Real‑Time Data Warehouse Practices with Apache Kudu: Architecture, Partitioning, and Platformization

This article reviews the challenges of building a real‑time data warehouse, compares Lambda and Kappa architectures, introduces Apache Kudu’s master‑tablet design, storage model and partition strategies, and shares practical experiences and future directions for a Kudu‑based streaming analytics platform.

Apache Kudu · Big Data · Kappa architecture

0 likes · 8 min read

Douyu Streaming

Nov 12, 2021 · Fundamentals

How FLV and RTP Interact in Douyu’s Low‑Latency WebRTC Streaming

This article explains the end‑to‑end workflow of Douyu’s fast live streaming system, detailing how FLV tags are converted to RTP packets and back, covering WebRTC’s SDP/ICE/DTLS handshake, FLV and RTP header structures, payload formats for audio (OPUS) and video (H.264), and the server‑side processing pipeline.

FLV · RTP · Streaming

0 likes · 19 min read

The Dominant Programmer

Nov 8, 2021 · Backend Development

How to Build an Nginx RTMP Server on Windows and Stream Local Video with FFmpeg

This guide walks through the fundamentals of the RTMP protocol, introduces FFmpeg, shows how to download and configure Nginx‑RTMP on Windows, create the necessary directories, start the server, craft a batch script to push a local video stream, and verify playback with VLC.

Nginx · RTMP · Streaming

0 likes · 6 min read

Big Data Technology & Architecture

Nov 8, 2021 · Big Data

Understanding Flink CDC 2.0: Core Design, Snapshot & Incremental Reading, and Code Walkthrough

This article introduces Flink CDC 2.0, explains its distributed full‑load and incremental reading mechanisms, details the slice partitioning, snapshot correction, and binlog handling logic, and provides a complete Java example that demonstrates how to configure Flink SQL, MySQL source, and Kafka sink.

Big Data · CDC · Data Integration

0 likes · 29 min read

Big Data Technology & Architecture

Nov 4, 2021 · Big Data

Understanding Flink State, Checkpoints, Savepoints, and Fault Tolerance

This article explains Flink's state concepts, the distinction between keyed and operator state, available state backends, TTL configuration, the mechanics of checkpoints and savepoints, and the two‑phase commit protocol for ensuring exactly‑once processing in streaming applications.

Checkpoints · Flink · Savepoints

0 likes · 21 min read
