Tagged articles

560 articles

Sep 29, 2020 · Big Data

Implementing Real-Time TopN Rankings with Apache Flink

This article demonstrates how to develop a real-time TopN ranking feature in Apache Flink, covering stream setup, word count aggregation, global and grouped TopN calculations, and nested TopN strategies to mitigate hotspot issues, complete with Java code examples.

Big Data · Flink · Real-Time

0 likes · 8 min read

Big Data Technology & Architecture

Sep 19, 2020 · Big Data

Understanding Flink Timer Mechanism and Its Internal Implementation

This article explains how Flink's Timer mechanism works, covering its usage in KeyedProcessFunction, the underlying TimerService and InternalTimerService implementations, the role of triggers, and the detailed code paths for processing‑time and event‑time timers, while highlighting performance considerations.

Flink · InternalTimerService · KeyedProcessFunction

0 likes · 16 min read

DataFunTalk

Sep 6, 2020 · Big Data

OPPO's Real-Time Data Warehouse Architecture and Practices Based on Apache Flink

OPPO's data platform engineer Zhang Jun shares the design and implementation of OPPO's real‑time data warehouse built on Apache Flink, covering background, top‑level architecture, practical deployment, and future directions such as enhanced SQL development, resource scheduling, and automated configuration.

Data Platform · Flink · Streaming

0 likes · 15 min read

58 Tech

Aug 24, 2020 · Big Data

Design and Practice of an Online Real-Time Feature System for Intelligent Risk Control

This article presents the concepts, architecture, and practical techniques of an online real‑time feature system used in intelligent risk‑control, covering feature definition, time‑window types, calculation functions, distributed processing, low‑latency storage, and operational challenges in high‑concurrency environments.

Big Data · Real-time Processing · Streaming

0 likes · 16 min read

Big Data Technology & Architecture

Aug 23, 2020 · Big Data

Integrating Flink 1.11 with Hive Streaming, Kafka, and Table API

This article demonstrates how to use Flink 1.11's enhanced Hive integration to stream data from a Kafka source, write it into partitioned Hive tables with checkpoint‑driven commits, and read Hive tables as a continuous source using dynamic table options and table hints.

Big Data · Flink · Kafka

0 likes · 13 min read

Top Architect

Aug 11, 2020 · Big Data

Kafka Basics and Cluster Architecture Overview

This article provides a comprehensive introduction to Kafka, covering its role as a messaging system, core concepts such as topics, partitions, producers, consumers, and messages, and then delves into the cluster architecture including replicas, consumer groups, controller coordination with Zookeeper, performance optimizations, log segmentation, and network design.

Cluster Architecture · Kafka · Message Queue

0 likes · 11 min read

Architects Research Society

Aug 4, 2020 · Backend Development

Apache Kafka vs RabbitMQ: Architecture, Pull vs Push, Performance, and Best Use Cases

This article compares Apache Kafka and RabbitMQ, detailing their architectural differences, message handling models (pull vs push), performance characteristics, and ideal use cases, helping readers choose the appropriate messaging system for streaming, high‑throughput, or legacy protocol scenarios.

Kafka · Messaging · Performance

0 likes · 10 min read

Big Data Technology & Architecture

Aug 4, 2020 · Big Data

Manual Kafka Offset Management in Spark Streaming using createDirectStream (Java & Scala)

This article explains how to use Spark Streaming's Direct Approach with Kafka, manually manage offsets, and provides complete Java and Scala implementations—including a JavaKafkaManager class, a demo application, and a Scala KafkaManager—illustrating the creation of DirectKafkaInputDStream, offset handling, and integration with Spark.

Kafka · Offset Management · Scala

0 likes · 14 min read

DataFunTalk

Aug 2, 2020 · Big Data

Building Real-Time Data Warehouses with Apache Flink: Goals, Architecture, and Best Practices

This article presents a comprehensive guide to constructing real-time data warehouses using Apache Flink, covering the motivations, design principles, application scenarios, layer-by-layer architecture, metadata and lineage management, quality assurance, and the supporting toolchain for reliable streaming analytics.

Data Architecture · ETL · Flink

0 likes · 24 min read

Top Architect

Aug 1, 2020 · Cloud Computing

Design Analysis of Netflix’s Cloud‑Based Microservices Architecture

This article examines Netflix’s cloud‑based microservices architecture, detailing its client, backend, and CDN components; design goals such as high availability, low latency, and scalability; and the trade‑offs, resilience mechanisms, and scalability strategies employed on AWS to support millions of global streaming users.

AWS · Microservices · Netflix

0 likes · 22 min read

Top Architect

Jul 30, 2020 · Backend Development

RabbitMQ vs Apache Kafka: Architectural Differences, Pros & Cons, and How to Choose

This article compares RabbitMQ and Apache Kafka, explaining their internal mechanisms, key differences in ordering, routing, timing, retention, fault tolerance, scalability, and consumer complexity, and provides guidance on when to choose each technology for modern micro‑service architectures.

Architecture · Comparison · Kafka

0 likes · 23 min read

Alibaba Cloud Developer

Jul 22, 2020 · Big Data

Exploring the Apache Big Data Ecosystem: Hadoop, Spark, Flink, and More

This article surveys the rapidly evolving big data landscape by reviewing a wide range of Apache projects—including Hadoop, Spark, Flink, HBase, Kudu, Impala, Kafka, and others—detailing their core components, architectures, strengths, and typical use‑cases for building distributed data platforms.

Apache · Big Data · Distributed Systems

0 likes · 20 min read

Big Data Technology & Architecture

Jul 18, 2020 · Big Data

Common Spark SQL, Spark Core, PySpark, and Streaming Issues and Their Solutions

This article compiles frequent Spark SQL, Spark Core, PySpark, and Streaming problems—such as filesystem errors, configuration pitfalls, memory limits, shuffle failures, and version incompatibilities—along with concise explanations of their causes and step‑by‑step remediation methods for big‑data environments.

Big Data · PySpark · Spark

0 likes · 14 min read

Architects Research Society

Jul 16, 2020 · Big Data

Differences Between MQTT and Kafka: Protocol Design, Use Cases, and Integration

The article explains how MQTT, a lightweight IoT messaging protocol, and Kafka, a distributed streaming platform, differ in architecture, purpose, and design goals despite both using publish/subscribe, and discusses their complementary integration via bridges such as EMQ X.

IoT · Kafka · MQTT

0 likes · 5 min read

Architects Research Society

Jul 15, 2020 · Big Data

Introduction to Apache Kafka: A Distributed Streaming Platform

This article provides a comprehensive overview of Apache Kafka, explaining its distributed, fault‑tolerant architecture, horizontal scalability, disk‑based commit log, replication mechanisms, Streams API, KSQL, and why it is widely adopted as the backbone of event‑driven, high‑throughput systems.

Distributed Systems · Kafka · Message Queue

0 likes · 15 min read

dbaplus Community

Jul 7, 2020 · Big Data

How Flink + ClickHouse Power Real‑Time Analytics at Scale

This article explains how QuTouTiao builds a high‑performance real‑time analytics pipeline using Flink, Hive, and ClickHouse, covering business scenarios, hour‑level and second‑level Flink‑to‑Hive architectures, streaming file sink mechanics, multi‑user permissions, ClickHouse performance tricks, and future roadmap for unified stream‑batch storage.

Big Data · ClickHouse · Flink

0 likes · 18 min read

DataFunTalk

Jun 30, 2020 · Big Data

Flink Real‑Time Data Warehouse Practices at Shopee Singapore Data Team

This article details Shopee Singapore Data Team’s implementation of a Flink‑based real‑time data warehouse, covering background challenges, a layered architecture integrating Kafka, HBase, Druid, and Hive, streaming pipelines, job management, monitoring, and future plans to expand Flink SQL support.

Flink · Real-Time · Shopee

0 likes · 15 min read

Big Data Technology Architecture

Jun 29, 2020 · Big Data

Real‑time Data Warehouse Construction: Goals, Architecture, and Best Practices with Apache Flink

This article summarizes the objectives, design principles, application scenarios, layer‑by‑layer construction methods, quality assurance mechanisms, and supporting tools for building a real‑time data warehouse using Apache Flink, providing practical guidance for data engineers and architects.

Apache Flink · Data Quality · Flink

0 likes · 24 min read

Architecture Digest

Jun 24, 2020 · Big Data

Preventing Message Loss and Achieving Exactly‑Once Semantics in Kafka

This article explains common scenarios where Kafka messages can be lost on the producer, consumer, or broker side, and provides practical configurations—including callbacks, acks, retries, manual offset commits, idempotent and transactional producers—to ensure reliable delivery and exactly‑once processing.

Exactly-Once · Idempotence · Message Loss

0 likes · 10 min read

Full-Stack Internet Architecture

Jun 23, 2020 · Backend Development

Common Kafka Interview Questions and Answers

This article reviews common Kafka interview questions, covering delay queues, idempotence, replica states, offsets, message ordering, and handling duplicate consumption, and includes example code for enabling idempotent producers along with explanations of time‑wheel mechanisms and practical solutions to consumer rebalance issues.

Consumer · Idempotence · Kafka

0 likes · 9 min read

DataFunTalk

Jun 18, 2020 · Big Data

Real-time Data Processing at QuTouTiao: Flink + ClickHouse Architecture and Practices

QuTouTiao leverages Flink and ClickHouse to build a high‑performance real‑time analytics platform that supports hourly Hive pipelines and sub‑second ClickHouse queries, achieving sub‑second response for 80% of requests through streaming ingestion, exactly‑once semantics, multi‑cluster coordination, and optimized ClickHouse storage and connector designs.

Big Data · ClickHouse · Flink

0 likes · 16 min read

Big Data Technology & Architecture

Jun 13, 2020 · Big Data

Hot Goods Top‑N Calculation with Flink Event‑Time Sliding Windows

This article explains how to compute the top‑N hot products or brands within a time window using Apache Flink, covering data modeling, event‑time handling, sliding windows, custom aggregation functions, and result sorting with complete Java code examples.

EventTime · Flink · Streaming

0 likes · 14 min read

Beike Product & Technology

Jun 12, 2020 · Big Data

Design and Implementation of SQL on Streaming (SQL 1.0 → SQL 2.0) in a Real‑Time Computing Platform

This article describes the evolution of a real‑time computing platform from SQL 1.0 built on Spark Structured Streaming to SQL 2.0 powered by Flink‑SQL, covering dynamic tables, continuous queries, dimension‑table joins, cache optimization, DDL extensions, platformization, operational challenges and future roadmap.

Big Data · Dimension Table · Flink

0 likes · 19 min read

Big Data Technology & Architecture

Jun 9, 2020 · Big Data

Comprehensive Overview and Best Practices for Apache Spark Streaming

This article provides a detailed introduction to Spark Streaming, covering its architecture, DStream concepts, initialization, data sources, transformations, windowed aggregations, output operations, checkpointing, fault‑tolerance semantics, deployment, performance tuning, and monitoring for building reliable high‑throughput streaming applications.

Big Data · Dstream · Scala

0 likes · 17 min read

dbaplus Community

Jun 2, 2020 · Big Data

How Cainiao Built a Scalable Real‑Time Data Warehouse with Flink

Facing growing order volumes and strict timeliness demands, Cainiao’s tech team overhauled its real‑time data warehouse by redesigning data models, adopting Flink for streaming computation, upgrading data services, and exploring innovative tools, sharing practical lessons and future directions for large‑scale logistics analytics.

Big Data · Flink · Logistics

0 likes · 18 min read

ITPUB

May 28, 2020 · Databases

How UPSQL Proxy Implements MySQL Streaming to Boost Performance

This article explains the MySQL communication protocol, result‑set structure, client library interfaces, and the difference between store‑result and streaming modes, then details how UPSQL Proxy 2.4.0 adopts streaming to reduce latency and memory usage in distributed database environments.

Database Middleware · MySQL · ResultSet

0 likes · 6 min read

macrozheng

May 21, 2020 · Big Data

Mastering Kafka: Core Concepts, Architecture, and Reliability Guarantees

This comprehensive guide covers Kafka's definition, publish/subscribe model, key components, storage mechanisms, producer and consumer strategies, and reliability features such as ACK levels, ISR, and exactly‑once semantics, providing a solid foundation for real‑time big‑data processing.

Big Data · Distributed Systems · Kafka

0 likes · 16 min read

DataFunTalk

May 14, 2020 · Big Data

Building a Real-Time Data Warehouse at Cainiao: Architecture, Model Upgrades, Engine Enhancements, and Service Innovations

This article shares Cainiao's practical experience in constructing a real-time data warehouse, covering the shortcomings of the previous architecture, the evolution of data models, the migration to Flink with advanced features like retraction and timer services, and the modernization of data services and tooling to support high‑throughput logistics scenarios.

Big Data · Data Service · Flink

0 likes · 16 min read

Big Data Technology & Architecture

May 14, 2020 · Big Data

Understanding Flink 1.10 TaskManager Memory Model and Configuration Parameters

This article explains the new unified TaskManager memory model introduced in Flink 1.10, detailing each memory component, its configuration parameters, how they map to JVM settings, and practical guidance for both standalone and containerized deployments, including a concrete YARN example.

Batch · Big Data · Flink

0 likes · 10 min read

DataFunTalk

May 11, 2020 · Big Data

Designing a Real-Time Data System with Flink: Architecture, Data Modeling, and UV Metric Computation

This article outlines a comprehensive real‑time data system built on Apache Flink, covering its application scenarios, layered architecture, data model stratification, construction methods, and a concrete Flink SQL example for calculating UV metrics from Kafka‑sourced page‑view data.

Data Architecture · Flink · Kafka

0 likes · 24 min read

Big Data Technology & Architecture

Apr 28, 2020 · Big Data

Big Data Practice Exercises: Spark, Kafka, and MySQL Integration with Scala and Java

This article presents a series of hands‑on big‑data exercises, including Spark Scala data analysis, Kafka topic creation and custom partitioning, and MySQL table design with Scala‑based streaming calculations, providing complete source code and step‑by‑step solutions for each task.

Big Data · Kafka · MySQL

0 likes · 25 min read

DataFunTalk

Apr 15, 2020 · Big Data

Apache Flink OLAP Engine: Architecture, Optimizations, and Use Cases

This article presents an in‑depth overview of Apache Flink's new OLAP engine, covering OLAP fundamentals, the three OLAP models, Flink's unified streaming‑batch‑OLAP architecture, performance optimizations, benchmark results, and future development directions.

Apache Flink · Big Data · OLAP

0 likes · 11 min read

Qunar Tech Salon

Apr 8, 2020 · Backend Development

RabbitMQ vs Kafka: A Technical Comparison of Messaging Systems

This article explains the fundamental differences between RabbitMQ and Apache Kafka, covering asynchronous messaging patterns, the internal architectures of both systems, their respective strengths and weaknesses, and guidance on choosing the appropriate solution for various scenarios.

Backend · Messaging · Streaming

0 likes · 10 min read

DataFunTalk

Mar 28, 2020 · Big Data

Applying Flink State Management for Real-Time Recommendation Scenarios

This article explains how Apache Flink's flexible state management can be leveraged to solve data correlation challenges in real‑time recommendation platforms, compares Flink with Spark and Storm, describes the underlying broadcast and managed state mechanisms, and provides a step‑by‑step implementation using Kafka, Druid, and custom broadcast functions.

Big Data · Flink · Real-Time

0 likes · 14 min read

DataFunTalk

Mar 6, 2020 · Artificial Intelligence

Advances in Apache Flink AI Ecosystem: ML Pipeline, AI Flow, and Mini‑Batch Streaming Iteration

This article reviews recent progress in Apache Flink's AI ecosystem, explaining how Flink unifies batch and stream processing for machine‑learning pipelines, introduces the Flink ML Pipeline and Alink library, describes the AI Flow framework for end‑to‑end ML workflows, and presents a novel mini‑batch streaming iteration mechanism to support both offline and online learning scenarios.

AI Flow · Apache Flink · Mini-batch Iteration

0 likes · 13 min read

Big Data Technology Architecture

Feb 26, 2020 · Big Data

Comprehensive Guide to Kafka Architecture, Messaging Mechanisms, Replication, Controllers, and Consumer Rebalance

This article provides an in‑depth yet approachable overview of Kafka's core concepts—including its architecture, terminology, message‑sending pipeline, replication strategy, controller role, and consumer group rebalance mechanisms—helping readers quickly grasp how Kafka works as a high‑throughput distributed messaging and streaming platform.

Consumer Rebalance · Distributed Messaging · Kafka

0 likes · 21 min read

Big Data Technology & Architecture

Feb 22, 2020 · Big Data

Understanding Flink's Asynchronous Barrier Snapshot (ABS) Algorithm for Checkpointing

This article explains how Apache Flink implements fault‑tolerant checkpointing using the Asynchronous Barrier Snapshot (ABS) algorithm, a variant of the Chandy‑Lamport distributed snapshot algorithm tailored to Flink's dataflow model, covering barriers, snapshot alignment, exactly‑once versus at‑least‑once semantics, and handling of cyclic dataflow graphs.

Asynchronous Barrier Snapshot · Distributed Systems · Flink

0 likes · 9 min read

Tencent Cloud Developer

Feb 18, 2020 · Backend Development

Technical Overview of Tencent Cloud CKafka for High-Scale Online Classroom Messaging

Tencent Cloud CKafka powers Tencent Classroom’s pandemic‑era online teaching by replacing a custom queue with a high‑performance, highly available, partition‑based message bus that scales to millions of real‑time interactions, offers configurable replication and tuning for reliability, and integrates with big‑data and streaming tools for analytics.

CKafka · Kafka · Message Queue

0 likes · 15 min read

Big Data Technology Architecture

Feb 13, 2020 · Big Data

Evolution of Cainiao's Real-Time Data Warehouse Architecture: Model, Compute Engine, and Data Service Upgrades

The talk details Cainiao’s evolution of its real‑time data warehouse architecture, covering the original 2016 model, compute and service challenges, the 2017 multi‑layer data model redesign, migration to Flink, practical cases of state retraction, timeout statistics, smart optimizations, and the unified data service platform.

Data Service · Flink · Streaming

0 likes · 16 min read

Big Data Technology Architecture

Feb 11, 2020 · Big Data

Building Bilibili's Real-Time Streaming Platform with Apache Flink and AI

The presentation by Bilibili's real‑time platform lead details the design and implementation of a Flink‑based streaming data platform, explains how AI workloads are integrated, shares architectural decisions and operational insights, and provides the full slide deck for knowledge dissemination.

AI integration · Apache Flink · Bilibili

0 likes · 2 min read

Big Data Technology Architecture

Feb 8, 2020 · Big Data

Meituan-Dianping Real-Time Data Warehouse Platform Built on Apache Flink: Architecture, Practices, and Future Directions

Meituan-Dianping’s senior technical expert shares the evolution, architecture, and implementation of their Apache Flink‑based real‑time data warehouse platform, covering platform evolution, layered design, job and resource management, business warehouse use cases, and future development considerations.

Flink · Meituan-Dianping · Streaming

0 likes · 16 min read

dbaplus Community

Jan 14, 2020 · Big Data

How OPPO Built a Real‑Time Data Warehouse with Flink SQL

This article details OPPO's evolution from an offline data warehouse to a real‑time platform, describing the business scale, data‑mid platform architecture, migration strategy using Flink SQL, extensions like AthenaX, and practical use cases such as real‑time ETL, CTR calculation, and tag import.

ETL · Flink · Streaming

0 likes · 18 min read

Big Data Technology & Architecture

Jan 10, 2020 · Big Data

Async I/O for Dimension Table Joins in Apache Flink

This article explains how to handle dimension table joins in Apache Flink streaming by leveraging Async I/O to perform non‑blocking external lookups, provides detailed code examples for both synchronous and asynchronous functions, discusses configuration parameters, and outlines best practices and pitfalls.

Big Data · Dimension Table Join · Flink

0 likes · 16 min read

Tencent Cloud Developer

Jan 10, 2020 · Cloud Computing

Tencent Classroom Cloud VOD HLS Playback Architecture and Optimization

The article outlines Tencent Classroom’s cloud VOD solution, detailing HLS streaming fundamentals, a Mongoose‑based local HTTP proxy with LFU caching and pre‑loading, performance optimizations for latency, buffering, security, and playback reliability, and common transcoding pitfalls with practical fixes, highlighting cloud migration benefits.

Cache Optimization · Streaming · Tencent Cloud

0 likes · 13 min read

Big Data Technology & Architecture

Jan 2, 2020 · Big Data

Structured Streaming: Design, Challenges, Programming Model, and Performance Evaluation

This article provides a comprehensive overview of Apache Spark Structured Streaming, describing its declarative API, the challenges of stream processing, the programming model with code examples, query planning, execution modes, production use cases, and performance benchmarks compared with other streaming systems.

Big Data · Spark · Streaming

0 likes · 42 min read

360 Quality & Efficiency

Dec 31, 2019 · Backend Development

Understanding the LocalServer Video Caching Mechanism

The article explains how LocalServer pre‑caches video data to improve start‑up speed and playback smoothness, detailing when and how much data is cached, storage policies, handling of network interruptions, and practical testing points for developers.

LocalServer · Streaming · network optimization

0 likes · 6 min read

Cloud Native Technology Community

Dec 30, 2019 · Big Data

Kafka 2.4.0 Release Summary: New Features, Improvements, and Bug Fixes

The article provides a comprehensive overview of Apache Kafka 2.4.0, detailing its major new capabilities such as consumer replica fetching, incremental cooperative rebalancing, MirrorMaker 2.0, a new Java authorization API, and extensive bug fixes, along with upgrade considerations and related resources.

Apache Kafka · Big Data · Release Notes

0 likes · 26 min read

Big Data Technology & Architecture

Dec 22, 2019 · Big Data

Dynamic Resource Allocation in Spark Streaming: Problems, Mechanisms, and Practical Guidelines

The article explains Spark's default static resource allocation, analyzes the limitations of its Dynamic Resource Allocation (DRA) for streaming workloads, describes the internal Spark components and code paths involved, and proposes concrete design and configuration recommendations for implementing more responsive executor scaling.

Big Data · Dynamic Resource Allocation · Executor Management

0 likes · 11 min read

Big Data Technology & Architecture

Dec 19, 2019 · Big Data

Apache Kafka 2.4.0 Release: New Features and Improvements

Apache Kafka 2.4.0 introduces a range of new capabilities—including consumer replica fetching, incremental cooperative rebalancing, MirrorMaker 2.0, a new Java authorization API, KTable non‑key joins, administrative replica reassignment, protected REST endpoints, and offset deletion—along with numerous performance and stability improvements.

Apache Kafka · Big Data · Distributed Systems

0 likes · 3 min read

Big Data Technology & Architecture

Dec 10, 2019 · Big Data

Implementing Real-Time TopN Rankings with Apache Flink

This article explains how to develop a real‑time TopN ranking feature using Apache Flink, covering both global and grouped TopN implementations, nested TopN strategies, and provides complete Java code snippets for environment setup, word counting, windowing, and custom TopN functions.

Flink · Real-Time · Streaming

0 likes · 8 min read

Big Data Technology & Architecture

Dec 9, 2019 · Big Data

Building a Real‑Time ETL Pipeline with Apache Flink: Kafka to HDFS with Exactly‑Once Guarantees

This article explains how to develop a real‑time ETL application using Apache Flink that reads events from Kafka, partitions them by event time into HDFS directories, and achieves exactly‑once processing through checkpointing, custom bucket assigners, and proper state backend configuration.

Apache Flink · Big Data · Exactly-Once

0 likes · 11 min read

Alibaba Cloud Developer

Dec 5, 2019 · Artificial Intelligence

How Alibaba’s Alink Empowers Real‑Time Machine Learning on Flink

Alink, Alibaba’s open‑source machine‑learning platform built on Apache Flink, offers a rich library of batch and streaming algorithms, a Python API, iterative computation optimizations, and real‑world case studies, positioning it as a powerful AI solution for large‑scale, low‑latency data processing.

AI · Alink · Flink

0 likes · 13 min read

Big Data Technology & Architecture

Dec 2, 2019 · Big Data

Implementing Custom Flink Sources and Sinks for RocketMQ and HBase Streaming

This article explains how to create custom Flink SourceFunction and SinkFunction implementations, demonstrates a RocketMQ source and an HBase sink with full code examples, and discusses checkpointing, event‑time handling, and deployment of the streaming job on a Flink‑on‑YARN cluster.

Big Data · Flink · HBase

0 likes · 16 min read

Big Data Technology & Architecture

Nov 26, 2019 · Big Data

Understanding Flink SQL Window Functions: Types, Implementation, and Emit Triggers

This article provides a comprehensive overview of Flink SQL window functions, detailing time‑based window types, their underlying implementation in the StreamExecGroupWindowAggregate operator, the processing flow of WindowOperator, timer handling, emit/trigger strategies, and practical code examples for Tumble, Hop, and Session windows.

Big Data · Emit · Flink

0 likes · 20 min read

Big Data Technology & Architecture

Nov 25, 2019 · Big Data

Lightweight Dimension Table Join in Flink Using a Scheduled Cache

The article demonstrates how to enrich Flink streaming ETL jobs with slowly changing dimension data by periodically loading MySQL tables into an in‑memory cache and performing a simple map‑side join within a custom RichMapFunction implementation.

Cache · Dimension join · ETL

0 likes · 5 min read

Architecture Digest

Nov 25, 2019 · Big Data

Introduction to Apache Kafka: Core Concepts, Architecture, and APIs

This article provides a comprehensive overview of Apache Kafka, covering its fundamental capabilities, typical use cases, core components, key APIs, and essential concepts such as topics, partitions, segments, brokers, producers, and consumers, illustrated with diagrams.

APIs · Big Data · Distributed Systems

0 likes · 8 min read

G7 EasyFlow Tech Circle

Nov 21, 2019 · Big Data

How G7 Combines AI, Big Data, and IoT to Transform Logistics

This article presents a detailed overview of G7's AI‑plus‑Big‑Data‑plus‑IoT platform for logistics, describing its neutral open architecture, real‑time data pipelines using Kafka and Flink, Lambda‑style storage in HBase/Hive, and the resulting safety‑insurance and analytics capabilities.

AI · Flink · IoT

0 likes · 10 min read

Architects Research Society

Nov 19, 2019 · Big Data

Understanding Transactions in Apache Kafka: Semantics, API, and Practical Guidance

This article explains the design goals, exactly‑once semantics, Java transaction API, internal components such as the transaction coordinator and log, data‑flow interactions, performance considerations, and practical tips for using Apache Kafka transactions in stream‑processing applications.

Distributed Systems · Exactly-Once · Kafka

0 likes · 15 min read

Hulu Beijing

Nov 15, 2019 · Artificial Intelligence

How Content-Based Video Relevance Prediction Advances Personalized Streaming

The CBVRP (Content-Based Video Relevance Prediction) challenge, co‑hosted by Hulu and ACM MM 2019, showcased the shift from user‑based collaborative filtering to content‑driven recommendation, highlighted winning teams and their papers, and underscored the ongoing research importance of cold‑start video recommendation for streaming platforms.

Multimedia · Streaming · cold start

0 likes · 15 min read

Big Data Technology & Architecture

Nov 11, 2019 · Big Data

Connecting Apache Kafka with Flink 1.9 – Overview, Compatibility, and Code Samples

This article explains how to use Flink 1.9's built‑in Kafka connector, covering supported versions, Maven dependencies, consumer and producer configuration in Java and Scala, checkpointing, offset handling, partition discovery, timestamps, watermarks, and provides a complete runnable example.

Connector · Flink · Kafka

0 likes · 12 min read

Big Data Technology & Architecture

Nov 9, 2019 · Big Data

Comparative Study of Apache Flink and Spark Streaming at Xiaomi: Architecture, Performance, and Serialization

This article examines Xiaomi's migration from Spark Streaming to Apache Flink, comparing scheduling strategies, mini‑batch versus true streaming, resource utilization, latency, and serialization mechanisms, and concludes with practical insights and custom optimization techniques for large‑scale data processing.

Big Data · Flink · Mini-Batch

0 likes · 17 min read

Big Data Technology & Architecture

Nov 7, 2019 · Big Data

Real‑time Dashboard with Flink: Streaming Order Data, Site Metrics, and Top‑N Merchandise Rankings

This article demonstrates how to build a one‑second‑refresh real‑time dashboard for e‑commerce order data using Apache Flink, Kafka, and Redis, covering JSON message parsing, processing‑time windows, stateful aggregation for site‑level KPIs, and efficient top‑N product ranking via Redis sorted sets.

Dashboard · Flink · Kafka

0 likes · 11 min read

DataFunTalk

Nov 7, 2019 · Big Data

Real-Time Computing Engine at Beike: Architecture, Practices, and Future Plans

This article details Beike's real‑time computing engine, covering its background, streaming platform built on Spark Streaming and Flink, data ingestion via Kafka, metadata handling, SQL‑based task development, monitoring, storage solutions, and future roadmap for resource management and AI‑enhanced monitoring.

Big Data · Flink · Kafka

0 likes · 14 min read

FunTester

Oct 24, 2019 · Backend Development

Why qrpc Beats gRPC: A Lightweight, High‑Performance RPC Framework

qrpc is a lightweight, high‑performance RPC framework that adopts gRPC's streaming and bidirectional concepts without HTTP/2, offering a smaller binary, lower memory usage, up to three‑fold throughput gains, and flexible modes such as blocking, non‑blocking, streaming, push, and bidirectional calls, all demonstrated with Go code examples and real‑world use cases.

Go · Microservices · Networking

0 likes · 12 min read

Architects Research Society

Oct 13, 2019 · Databases

What is Debezium? Overview, Architecture, and Features

Debezium is an open‑source distributed platform built on Apache Kafka that turns existing databases into real‑time event streams by capturing row‑level changes via change data capture, offering source and embedded connectors, flexible topic routing, and features such as snapshots, filtering, masking, and monitoring.

CDC · Change Data Capture · Debezium

0 likes · 7 min read

Big Data Technology & Architecture

Sep 28, 2019 · Big Data

Two-Phase Commit (2PC) in Flink: Mechanism, Implementation, and Kafka Integration

This article explains the fundamentals of the two‑phase commit protocol, details its two stages (prepare and commit), discusses its advantages and drawbacks, and shows how Apache Flink implements 2PC for exactly‑once semantics with Kafka using the TwoPhaseCommitSinkFunction and related code examples.

Distributed Systems · Flink · Kafka

0 likes · 9 min read

Big Data Technology & Architecture

Sep 26, 2019 · Big Data

Comparing Apache Pulsar and Kafka: Messaging Models, Subscriptions, Acknowledgment, and Retention

This article compares Apache Pulsar and Kafka, explaining their messaging models, queue versus stream use cases, subscription types, acknowledgment mechanisms, and message retention/TTL features to help readers choose a high‑performance, highly available streaming platform.

Apache Pulsar · Kafka · Message Acknowledgment

0 likes · 10 min read

Big Data Technology & Architecture

Sep 15, 2019 · Big Data

Flink Interview Guide: Concepts, Basics, Advanced Topics, and Source Code

This article presents a comprehensive collection of Flink interview questions covering fundamental concepts, advanced topics, and source‑code details to help candidates prepare effectively for Flink‑related technical interviews.

Apache Flink · Big Data · Flink

0 likes · 6 min read

Big Data Technology & Architecture

Sep 11, 2019 · Big Data

Big Data Technology and Architecture: Case Studies of Taobao, Didi, and Meituan

This article reviews the evolution and key components of big data platforms at leading Chinese internet companies—Taobao, Didi, and Meituan—detailing their data sources, synchronization tools, storage layers, processing engines, and scheduling systems to provide practical guidance for building robust big data infrastructures.

Architecture · Big Data · Data Platform

0 likes · 9 min read

DataFunTalk

Sep 5, 2019 · Big Data

Apache Beam Architecture Principles and Practical Application

This article introduces Apache Beam as a unified programming model for batch and streaming data processing, explains its architecture, core components, advantages, extensibility, and demonstrates practical usage with KafkaIO, BeamSQL, and AIoT scenarios across multiple runners.

Apache Beam · Kafka · Streaming

0 likes · 16 min read

Alibaba Cloud Developer

Sep 5, 2019 · Artificial Intelligence

How AI and Big Data Powered the Hit Series ‘The Longest Day in Chang’an’

The article explains how Alibaba/Youku leveraged AI recommendation, big‑data analytics, HDR upscaling, interactive multi‑stream playback, smart bitrate selection, and a robust anti‑leech system to turn the drama “The Longest Day in Chang’an” into a data‑driven blockbuster, detailing the underlying technologies and their impact on production, distribution, and viewer experience.

AI · HDR · Media Platform

0 likes · 8 min read

Alibaba Cloud Developer

Sep 5, 2019 · Artificial Intelligence

How AI and Big Data Powered the Hit Series “The Longest Day in Chang’an”

The article explores how Alibaba/Youku leveraged AI recommendation, big‑data analytics, interactive streaming, HDR reconstruction, smart bitrate selection, anti‑piracy defenses, and a high‑performance media‑asset platform to turn the drama “The Longest Day in Chang’an” into a blockbuster, detailing the underlying technologies and their impact on production and viewer experience.

AI · Anti-Piracy · HDR

0 likes · 7 min read

Tongcheng Travel Technology Center

Sep 3, 2019 · Big Data

Practical Experiences and Lessons Learned in Building a Flink‑Based Real‑Time Computing Platform at Tongcheng‑Elong

This article details the design, implementation, and optimization of a Flink‑based real‑time computing platform at Tongcheng‑Elong, covering the evolution from Storm to Flink, support for FlinkSQL and FlinkStream, metric collection, logging, data lineage, savepoint management, and numerous stability fixes contributed back to the open‑source community.

Big Data · Data Lineage · Flink

0 likes · 16 min read

dbaplus Community

Aug 27, 2019 · Big Data

How eBay Scales Real‑Time Monitoring with Flink: Metadata‑Driven Streaming

This article explains how eBay’s Sherlock.IO monitoring platform processes billions of logs, events, and metrics daily using Flink Streaming jobs, detailing a metadata‑driven architecture, shared job strategies, Heartbeat‑based monitoring, job isolation, back‑pressure handling, and real‑world use cases such as Event Alerting, Eventzon, and Netmon.

Big Data · Flink · Real-time Processing

0 likes · 18 min read

JavaEdge

Aug 25, 2019 · Big Data

Which Kafka Distribution Fits Your Needs? A Detailed Comparison

This article compares the main Kafka distributions—Apache Kafka, Confluent Kafka, and CDH/HDP Kafka—examining their origins, feature sets, ecosystem support, and trade‑offs to help you choose the most suitable version for your streaming workloads.

Streaming · big-data · confluent

0 likes · 10 min read

Big Data Technology & Architecture

Aug 20, 2019 · Big Data

OPPO’s Real‑Time Data Warehouse Construction with Apache Flink

The article summarizes a talk from a 2019 Apache Flink Meetup in Shenzhen in which OPPO’s big‑data platform lead explains how the company built a real‑time data warehouse using Flink SQL extensions, and walks through four key aspects of the work, including its evolution, application cases, and future directions.

Big Data · Flink · OPPO

0 likes · 3 min read

HomeTech

Aug 14, 2019 · Big Data

Real-Time Data Warehouse Development with Flink: Architecture, Implementation, and Lessons Learned

This article describes the motivation, technology selection, implementation details, and encountered challenges of building a real‑time data warehouse using Flink, covering streaming computation, code examples, dimension‑table caching, state backend choices, and best practices for production deployment.

Flink · Kafka · State Backend

0 likes · 8 min read

DataFunTalk

Aug 9, 2019 · Big Data

Performance Optimization Techniques for Spark and Spark Streaming Applications

This article explains how to improve Spark and Spark Streaming performance by tuning serialization, broadcast variables, parallelism, batch intervals, memory usage, garbage collection, and Kafka integration, providing practical code examples and real‑world optimization results.

Broadcast Variables · Kryo · Memory Optimization

0 likes · 32 min read

vivo Internet Technology

Aug 7, 2019 · Big Data

Understanding Apache Kafka: Concepts, Architecture, Deployment, Monitoring and Offset Management

The article gives a thorough overview of Apache Kafka, explaining its core concepts, architecture, deployment steps, monitoring tools, and offset management, including broker and topic structures, producer/consumer APIs, replication, leader election, consumer groups, offset committing, and practical configuration and troubleshooting guidance.

Big Data · Kafka · Messaging

0 likes · 36 min read

Ctrip Technology

Aug 7, 2019 · Big Data

Improving Log Replay Efficiency with Flink and Elasticsearch at Ctrip Ticket Frontend

The article describes how Ctrip's ticket front‑end team replaced a slow, manual log‑pulling process with a Flink‑based real‑time pipeline that streams Kafka data, indexes it in Elasticsearch, and enables second‑level log retrieval for automated scenario replay, dramatically reducing CI cycle time.

Automation testing · Big Data · Elasticsearch

0 likes · 7 min read

Big Data Technology & Architecture

Aug 4, 2019 · Big Data

Apache Pulsar vs Apache Kafka: Architecture, Performance, and Advantages

This article compares Apache Kafka and Apache Pulsar, detailing Kafka's scalability challenges, Pulsar's architectural benefits, performance gains, multi‑tenant support, security features, and provides code examples and migration guidance for large‑scale streaming applications.

Apache Pulsar · Big Data · Distributed Systems

0 likes · 11 min read

dbaplus Community

Jul 30, 2019 · Big Data

Spark vs Flink: Which Real‑Time Engine Should You Choose for Kafka Streams?

With the surge in real‑time data from sensors and devices, choosing the right streaming engine is critical; this article compares Apache Spark and Apache Flink—examining their architectures, micro‑batch vs continuous processing, strengths, limitations, and use‑case suitability for Kafka‑driven pipelines.

Big Data · Flink · Kafka

0 likes · 14 min read

Big Data Technology & Architecture

Jul 23, 2019 · Big Data

Understanding Google Dataflow: Model, Windowing, Triggers, and Incremental Processing

This article explains the Google Dataflow model, covering its unified batch‑and‑stream architecture, windowing and triggering mechanisms, core primitives, time domains, and how these concepts form the foundation of modern big‑data stream processing systems.

Big Data · Dataflow · Google Cloud

0 likes · 13 min read

Big Data Technology & Architecture

Jun 30, 2019 · Big Data

Curated Collection of Big Data, Flink, Hadoop and Real‑Time Computing Articles from the “Big Data Technology and Architecture” Series

This article presents a carefully organized catalogue of over a hundred technical posts covering Flink source‑code analysis, fundamental and advanced big‑data structures, Hadoop ecosystem components, real‑time streaming with Spark and Kafka, as well as system design guidelines and miscellaneous insights, each linked to its original publication for easy reference.

Big Data · Distributed Systems · Flink

0 likes · 6 min read

Big Data Technology & Architecture

Jun 19, 2019 · Big Data

Understanding Spark Structured Streaming StateStore: Architecture, Operations, and Fault Recovery

This article explains the design and implementation of Spark Structured Streaming's StateStore module, covering its distributed architecture, state sharding, versioning, batch read/write, migration, update/query APIs, maintenance compaction, and fault‑tolerance mechanisms that enable incremental continuous queries with exactly‑once guarantees.

Big Data · Spark · StateStore

0 likes · 8 min read

Big Data Technology Architecture

Jun 4, 2019 · Big Data

Understanding Kafka Exactly-Once Semantics, Idempotence, and Transactions

This article explains Kafka's Exactly-Once Semantics (EOS), the role of idempotence, and how transactional support works, covering EOS semantics, producer id and sequence numbers, configuration properties, and providing Java code examples for initializing, beginning, committing, and aborting transactions.

Exactly-Once · Idempotence · Kafka

0 likes · 8 min read

Huajiao Technology

Jun 4, 2019 · Industry Insights

Understanding Streaming Media: From RTMP to MPEG‑DASH Explained

This article explains streaming media fundamentals and compares major live‑streaming protocols—including RTMP, HTTP‑FLV, HLS, HDS, and MPEG‑DASH—detailing their architectures, workflow diagrams, MPD structure, segment handling, and practical steps for building a DASH demo with MP4Box and dash.js.

MPEG-DASH · RTMP · Streaming

0 likes · 16 min read

DataFunTalk

Jun 3, 2019 · Big Data

Choosing a Real-Time Computing Engine Based on Kafka: Spark vs Flink

This article examines the need for real‑time computation, explains streaming versus real‑time concepts, and compares Apache Spark and Apache Flink—covering their architectures, micro‑batch and continuous processing, advantages, limitations, windowing, event‑time handling, and watermarks—to guide engine selection for Kafka‑driven workloads.

Flink · Kafka · Spark

0 likes · 15 min read

DataFunTalk

May 27, 2019 · Big Data

Practical Applications and Ecosystem Integration of Apache Kafka

This article explores Apache Kafka’s evolution, core messaging and stream processing capabilities, typical use cases, internal storage mechanisms, API choices, and best practices for deploying Kafka on Kubernetes, providing readers with comprehensive guidance to assess suitability and implement effective Kafka solutions.

Apache Kafka · Kafka APIs · Kubernetes

0 likes · 16 min read

Big Data Technology & Architecture

May 20, 2019 · Big Data

Kafka Configuration, Monitoring, and Performance Optimization Best Practices

This article summarizes practical Kafka best‑practice guidelines covering hardware sizing, OS and JVM tuning, disk layout choices, replica and controller settings, broker and topic evaluation, as well as producer and consumer configuration, monitoring metrics, and strategies to prevent data loss.

Kafka · Streaming · bigdata

0 likes · 14 min read

Big Data Technology Architecture

May 18, 2019 · Big Data

Key Concepts of Kafka, Hadoop Shuffle, Spark Cluster Modes, HDFS I/O, and Spark RDD Operations

This article explains Kafka message structure and offset retrieval, details Hadoop's map and reduce shuffle processes, outlines Spark's deployment modes, describes HDFS read/write mechanisms, compares reduceByKey and groupByKey performance, and discusses Spark Streaming integration with Kafka and data loss prevention.

HDFS · Hadoop · Kafka

0 likes · 10 min read

Big Data Technology & Architecture

May 15, 2019 · Backend Development

How ByteDance Uses Kafka – Presentation by Gong Yunfei at the 2019 Apache Flink x Apache Kafka Conference

This article presents Gong Yunfei's 2019 talk from the Beijing Apache Flink x Apache Kafka conference, detailing how ByteDance leverages Kafka for its large‑scale streaming and data processing needs.

Apache Flink · ByteDance · Distributed Systems

0 likes · 2 min read

Big Data Technology & Architecture

Apr 29, 2019 · Big Data

Understanding Retract Updates in FlinkSQL: Append vs Retract Modes

This article explains retract updates in FlinkSQL, which let streaming queries handle data modifications through toRetractStream, in contrast to the append-only toAppendStream mode. It covers the differences between the two modes and when each applies, with illustrative examples and diagrams.

Append Mode · Big Data · FlinkSQL

0 likes · 3 min read

Big Data Technology Architecture

Apr 22, 2019 · Big Data

Comparison of Apache Spark and Apache Flink: Programming Models, Streaming, State Management, and Exactly-Once Semantics

This article compares Apache Spark and Apache Flink, outlining their programming models, streaming mechanisms, state management, time semantics, and exactly‑once guarantees, and highlights the strengths and differences of each framework for batch and real‑time big‑data processing.

Apache Flink · Apache Spark · Exactly-Once

0 likes · 8 min read

Big Data Technology & Architecture

Mar 21, 2019 · Big Data

Apache Flink Table API Tutorial and End‑to‑End Examples

This article provides a comprehensive tutorial on Apache Flink's Table API, explaining its concepts, core features, and a wide range of operators such as SELECT, WHERE, GROUP BY, UNION, JOIN, and various window functions, while offering complete Scala code examples, custom sources, sinks, and an end‑to‑end job that computes page‑view counts per region using event‑time tumbling windows.

Big Data · Flink · Scala

0 likes · 36 min read

Big Data Technology & Architecture

Mar 20, 2019 · Databases

Understanding JOIN Operators: From Traditional Databases to Apache Flink Streaming

This article explains the purpose and types of SQL JOIN operators, demonstrates their syntax and semantics with examples, compares traditional database joins to Apache Flink's streaming two‑stream join implementation, and discusses optimization techniques such as state management, shuffle handling, and join reordering.

Apache Flink · State Management · Streaming

0 likes · 22 min read

Big Data Technology & Architecture

Mar 19, 2019 · Big Data

Comprehensive Overview of SQL and Apache Flink SQL Features with Practical Code Examples

This article provides an in-depth introduction to SQL, its history and ANSI standards, then details Apache Flink's SQL capabilities—including SELECT, WHERE, GROUP BY, UNION, JOIN, window functions, and user-defined functions—accompanied by extensive code examples and a complete end‑to‑end Flink job implementation.

Apache Flink · Big Data · Streaming

0 likes · 34 min read

Big Data Technology & Architecture

Mar 18, 2019 · Big Data

Introduction to Apache Kafka and Its Integration with Apache Flink

This article provides a step‑by‑step guide on installing Apache Kafka, creating topics, producing and consuming messages via command line, and demonstrates how to integrate Kafka with Apache Flink using the Flink‑Kafka connector, custom serialization schemas, and event‑time window processing.

Apache Flink · Event Time · Message Queue

0 likes · 23 min read

Big Data Technology & Architecture

Mar 7, 2019 · Big Data

Real-time Kafka Message Consumption and MySQL Sink with Apache Flink

This tutorial explains how to consume Kafka messages in real time using Apache Flink and persist them into a MySQL database by adding the JDBC dependency, implementing a custom RichSinkFunction, and configuring a Flink job with a Kafka source and MySQL sink.

Flink · MySQL · Sink

0 likes · 4 min read

Big Data Technology & Architecture

Mar 6, 2019 · Big Data

Using Flink Redis Sink for Streaming WordCount from Kafka to Redis

This tutorial demonstrates how to integrate Apache Flink with Redis as a sink, showing the Maven dependency, a custom RedisMapper implementation, and a complete Flink job that reads Kafka messages, performs word count, and stores results in Redis, with plans for HBase and MySQL extensions.

Big Data · Flink · Streaming

0 likes · 4 min read
