Tagged articles
560 articles
Big Data Technology & Architecture
Sep 19, 2020 · Big Data

Understanding Flink Timer Mechanism and Its Internal Implementation

This article explains how Flink's Timer mechanism works, covering its usage in KeyedProcessFunction, the underlying TimerService and InternalTimerService implementations, the role of triggers, and the detailed code paths for processing‑time and event‑time timers, while highlighting performance considerations.

Flink · InternalTimerService · KeyedProcessFunction
16 min read
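The event-time timer behaviour summarized above can be illustrated with a toy Python model. It assumes a heap-ordered queue that keeps at most one timer per (key, timestamp) pair and fires every timer whose timestamp is at or below the new watermark. The class and method names are hypothetical; this is not Flink's `InternalTimerService` API.

```python
import heapq
import itertools
from typing import Callable, Hashable, List, Set, Tuple


class EventTimeTimerQueue:
    """Toy event-time timer queue (hypothetical; not Flink's InternalTimerService)."""

    def __init__(self, on_timer: Callable[[Hashable, int], None]) -> None:
        self._heap: List[Tuple[int, int, Hashable]] = []  # (timestamp, seq, key)
        self._registered: Set[Tuple[Hashable, int]] = set()
        self._seq = itertools.count()  # tie-breaker so keys never need to be comparable
        self._on_timer = on_timer
        self.current_watermark = float("-inf")

    def register(self, key: Hashable, timestamp: int) -> None:
        # Registering the same (key, timestamp) twice is a no-op.
        if (key, timestamp) in self._registered:
            return
        self._registered.add((key, timestamp))
        heapq.heappush(self._heap, (timestamp, next(self._seq), key))

    def advance_watermark(self, watermark: int) -> None:
        # Fire, in timestamp order, every timer with timestamp <= watermark.
        self.current_watermark = watermark
        while self._heap and self._heap[0][0] <= watermark:
            timestamp, _, key = heapq.heappop(self._heap)
            self._registered.discard((key, timestamp))
            self._on_timer(key, timestamp)


fired = []
timers = EventTimeTimerQueue(lambda key, ts: fired.append((key, ts)))
timers.register("user-a", 10)
timers.register("user-a", 10)  # duplicate registration, ignored
timers.register("user-b", 5)
timers.register("user-a", 20)
timers.advance_watermark(10)   # fires user-b@5, then user-a@10
```

Keeping timers in a priority queue means each watermark advance only touches timers that are actually due, which is one reason timer count, rather than record count, tends to drive timer overhead.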
DataFunTalk
Sep 6, 2020 · Big Data

OPPO's Real-Time Data Warehouse Architecture and Practices Based on Apache Flink

OPPO's data platform engineer Zhang Jun shares the design and implementation of OPPO's real‑time data warehouse built on Apache Flink, covering background, top‑level architecture, practical deployment, and future directions such as enhanced SQL development, resource scheduling, and automated configuration.

Data Platform · Flink · Streaming
15 min read
58 Tech
Aug 24, 2020 · Big Data

Design and Practice of an Online Real-Time Feature System for Intelligent Risk Control

This article presents the concepts, architecture, and practical techniques of an online real‑time feature system used in intelligent risk‑control, covering feature definition, time‑window types, calculation functions, distributed processing, low‑latency storage, and operational challenges in high‑concurrency environments.

Big Data · Real-time Processing · Streaming
16 min read
Top Architect
Aug 11, 2020 · Big Data

Kafka Basics and Cluster Architecture Overview

This article provides a comprehensive introduction to Kafka, covering its role as a messaging system, core concepts such as topics, partitions, producers, consumers, and messages, and then delves into the cluster architecture, including replicas, consumer groups, controller coordination with ZooKeeper, performance optimizations, log segmentation, and network design.

Cluster Architecture · Kafka · Message Queue
11 min read
Big Data Technology & Architecture
Aug 4, 2020 · Big Data

Manual Kafka Offset Management in Spark Streaming using createDirectStream (Java & Scala)

This article explains how to use Spark Streaming's Direct Approach with Kafka, manually manage offsets, and provides complete Java and Scala implementations—including a JavaKafkaManager class, a demo application, and a Scala KafkaManager—illustrating the creation of DirectKafkaInputDStream, offset handling, and integration with Spark.

Kafka · Offset Management · Scala
14 min read
Top Architect
Aug 1, 2020 · Cloud Computing

Design Analysis of Netflix’s Cloud‑Based Microservices Architecture

This article examines Netflix’s cloud‑based microservices architecture, detailing its client, backend, CDN components, design goals such as high availability, low latency, scalability, and the trade‑offs, resilience mechanisms, and scalability strategies employed on AWS to support millions of global streaming users.

AWS · Microservices · Netflix
22 min read
Alibaba Cloud Developer
Jul 22, 2020 · Big Data

Exploring the Apache Big Data Ecosystem: Hadoop, Spark, Flink, and More

This article surveys the rapidly evolving big data landscape by reviewing a wide range of Apache projects—including Hadoop, Spark, Flink, HBase, Kudu, Impala, Kafka, and others—detailing their core components, architectures, strengths, and typical use‑cases for building distributed data platforms.

Apache · Big Data · Distributed Systems
20 min read
Architects Research Society
Jul 15, 2020 · Big Data

Introduction to Apache Kafka: A Distributed Streaming Platform

This article provides a comprehensive overview of Apache Kafka, explaining its distributed, fault‑tolerant architecture, horizontal scalability, disk‑based commit log, replication mechanisms, Streams API, KSQL, and why it is widely adopted as the backbone of event‑driven, high‑throughput systems.

Distributed Systems · Kafka · Message Queue
15 min read
dbaplus Community
Jul 7, 2020 · Big Data

How Flink + ClickHouse Power Real‑Time Analytics at Scale

This article explains how QuTouTiao builds a high-performance real-time analytics pipeline using Flink, Hive, and ClickHouse, covering business scenarios, hour-level and second-level Flink-to-Hive architectures, streaming file sink mechanics, multi-user permissions, ClickHouse performance tricks, and the future roadmap for unified stream-batch storage.

Big Data · ClickHouse · Flink
18 min read
DataFunTalk
Jun 30, 2020 · Big Data

Flink Real‑Time Data Warehouse Practices at Shopee Singapore Data Team

This article details Shopee Singapore Data Team’s implementation of a Flink‑based real‑time data warehouse, covering background challenges, layered architecture integrating Kafka, HBase, Druid, Hive, streaming pipelines, job management, monitoring, and future plans to expand Flink SQL support.

Flink · Real-Time · Shopee
15 min read
Big Data Technology Architecture
Jun 29, 2020 · Big Data

Real‑time Data Warehouse Construction: Goals, Architecture, and Best Practices with Apache Flink

This article summarizes the objectives, design principles, application scenarios, layer‑by‑layer construction methods, quality assurance mechanisms, and supporting tools for building a real‑time data warehouse using Apache Flink, providing practical guidance for data engineers and architects.

Apache Flink · Data Quality · Flink
24 min read
Architecture Digest
Jun 24, 2020 · Big Data

Preventing Message Loss and Achieving Exactly‑Once Semantics in Kafka

This article explains common scenarios where Kafka messages can be lost on the producer, consumer, or broker side, and provides practical configurations—including callbacks, acks, retries, manual offset commits, idempotent and transactional producers—to ensure reliable delivery and exactly‑once processing.

Exactly-Once · Idempotence · Message Loss
10 min read
Full-Stack Internet Architecture
Jun 23, 2020 · Backend Development

Common Kafka Interview Questions and Answers

This article reviews common Kafka interview questions, covering delay queues, idempotence, replica states, offsets, message ordering, and handling duplicate consumption, and includes example code for enabling idempotent producers along with explanations of time‑wheel mechanisms and practical solutions to consumer rebalance issues.

Consumer · Idempotence · Kafka
9 min read
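The time-wheel mechanism mentioned in the summary can be sketched as a single-level wheel in Python. This is a deliberate simplification: Kafka's delayed-operation purgatory is generally described as a hierarchical wheel that hands long delays to coarser overflow wheels, whereas this sketch just rejects delays beyond one rotation. `SimpleTimingWheel` and its methods are hypothetical names, and a task fires when its tick-sized bucket comes due, so precision is bounded by `tick_ms`.

```python
from typing import Any, List, Tuple


class SimpleTimingWheel:
    """Single-level timing wheel (hypothetical sketch, not Kafka's TimingWheel class)."""

    def __init__(self, tick_ms: int, wheel_size: int, start_ms: int = 0) -> None:
        self.tick_ms = tick_ms
        self.wheel_size = wheel_size
        self.interval_ms = tick_ms * wheel_size
        # The clock always sits on a tick boundary.
        self.current_ms = start_ms - start_ms % tick_ms
        self.buckets: List[List[Tuple[int, Any]]] = [[] for _ in range(wheel_size)]

    def add(self, task: Any, expire_ms: int) -> bool:
        """Schedule a task; returns False if it is already due (the caller runs it now)."""
        if expire_ms < self.current_ms + self.tick_ms:
            return False
        if expire_ms >= self.current_ms + self.interval_ms:
            # A hierarchical wheel would pass this to a coarser overflow wheel instead.
            raise ValueError("delay exceeds one rotation of this wheel")
        slot = (expire_ms // self.tick_ms) % self.wheel_size
        self.buckets[slot].append((expire_ms, task))
        return True

    def advance(self, now_ms: int) -> List[Any]:
        """Move the clock forward tick by tick and return tasks whose bucket came due."""
        due: List[Any] = []
        while self.current_ms + self.tick_ms <= now_ms:
            self.current_ms += self.tick_ms
            slot = (self.current_ms // self.tick_ms) % self.wheel_size
            bucket, self.buckets[slot] = self.buckets[slot], []
            due.extend(task for _, task in sorted(bucket, key=lambda entry: entry[0]))
        return due
```

Insertion and expiry are O(1) per task per tick, which is the usual argument for a wheel over a priority queue when there are very many short-lived timeouts.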
DataFunTalk
Jun 18, 2020 · Big Data

Real-time Data Processing at QuTouTiao: Flink + ClickHouse Architecture and Practices

QuTouTiao leverages Flink and ClickHouse to build a high‑performance real‑time analytics platform that supports hourly Hive pipelines and sub‑second ClickHouse queries, achieving sub‑second response for 80% of requests through streaming ingestion, exactly‑once semantics, multi‑cluster coordination, and optimized ClickHouse storage and connector designs.

Big Data · ClickHouse · Flink
16 min read
Beike Product & Technology
Jun 12, 2020 · Big Data

Design and Implementation of SQL on Streaming (SQL 1.0 → SQL 2.0) in a Real‑Time Computing Platform

This article describes the evolution of a real‑time computing platform from SQL 1.0 built on Spark Structured Streaming to SQL 2.0 powered by Flink‑SQL, covering dynamic tables, continuous queries, dimension‑table joins, cache optimization, DDL extensions, platformization, operational challenges and future roadmap.

Big Data · Dimension Table · Flink
19 min read
Big Data Technology & Architecture
Jun 9, 2020 · Big Data

Comprehensive Overview and Best Practices for Apache Spark Streaming

This article provides a detailed introduction to Spark Streaming, covering its architecture, DStream concepts, initialization, data sources, transformations, windowed aggregations, output operations, checkpointing, fault‑tolerance semantics, deployment, performance tuning, and monitoring for building reliable high‑throughput streaming applications.

Big Data · DStream · Scala
17 min read
dbaplus Community
Jun 2, 2020 · Big Data

How Cainiao Built a Scalable Real‑Time Data Warehouse with Flink

Facing growing order volumes and strict timeliness demands, Cainiao’s tech team overhauled its real‑time data warehouse by redesigning data models, adopting Flink for streaming computation, upgrading data services, and exploring innovative tools, sharing practical lessons and future directions for large‑scale logistics analytics.

Big Data · Flink · Logistics
18 min read
ITPUB
May 28, 2020 · Databases

How UPSQL Proxy Implements MySQL Streaming to Boost Performance

This article explains the MySQL communication protocol, result‑set structure, client library interfaces, and the difference between store‑result and streaming modes, then details how UPSQL Proxy 2.4.0 adopts streaming to reduce latency and memory usage in distributed database environments.

Database Middleware · MySQL · ResultSet
6 min read
macrozheng
May 21, 2020 · Big Data

Mastering Kafka: Core Concepts, Architecture, and Reliability Guarantees

This comprehensive guide covers Kafka's definition, publish/subscribe model, key components, storage mechanisms, producer and consumer strategies, and reliability features such as ACK levels, ISR, and exactly‑once semantics, providing a solid foundation for real‑time big‑data processing.

Big Data · Distributed Systems · Kafka
16 min read
DataFunTalk
May 14, 2020 · Big Data

Building a Real-Time Data Warehouse at Cainiao: Architecture, Model Upgrades, Engine Enhancements, and Service Innovations

This article shares Cainiao's practical experience in constructing a real-time data warehouse, covering the shortcomings of the previous architecture, the evolution of data models, the migration to Flink with advanced features like retraction and timer services, and the modernization of data services and tooling to support high‑throughput logistics scenarios.

Big Data · Data Service · Flink
16 min read
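The retraction feature mentioned in this summary can be sketched in a few lines of Python, loosely modeled on the retract-stream convention in which an update to an aggregate is emitted as a retraction of the old row followed by an insertion of the new one. The `retracting_count` function and its `"-"`/`"+"` markers are hypothetical illustrations, not Flink's API.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def retracting_count(keys: Iterable[Hashable]) -> List[Tuple[str, Hashable, int]]:
    """Continuous COUNT(*) GROUP BY key that retracts stale results (hypothetical sketch)."""
    counts: Dict[Hashable, int] = {}
    changelog: List[Tuple[str, Hashable, int]] = []
    for key in keys:
        old = counts.get(key)
        new = (old or 0) + 1
        counts[key] = new
        if old is not None:
            # Withdraw the previously emitted result before emitting the update.
            changelog.append(("-", key, old))
        changelog.append(("+", key, new))
    return changelog
```

A consumer that applies each `-` as a delete and each `+` as an insert always holds the latest count per key, which is what makes retraction necessary when one aggregate feeds another.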
DataFunTalk
Apr 15, 2020 · Big Data

Apache Flink OLAP Engine: Architecture, Optimizations, and Use Cases

This article presents an in‑depth overview of Apache Flink's new OLAP engine, covering OLAP fundamentals, the three OLAP models, Flink's unified streaming‑batch‑OLAP architecture, performance optimizations, benchmark results, and future development directions.

Apache Flink · Big Data · OLAP
11 min read
Qunar Tech Salon
Apr 8, 2020 · Backend Development

RabbitMQ vs Kafka: A Technical Comparison of Messaging Systems

This article explains the fundamental differences between RabbitMQ and Apache Kafka, covering asynchronous messaging patterns, the internal architectures of both systems, their respective strengths and weaknesses, and guidance on choosing the appropriate solution for various scenarios.

Backend · Messaging · Streaming
10 min read
DataFunTalk
Mar 28, 2020 · Big Data

Applying Flink State Management for Real-Time Recommendation Scenarios

This article explains how Apache Flink's flexible state management can be leveraged to solve data correlation challenges in real‑time recommendation platforms, compares Flink with Spark and Storm, describes the underlying broadcast and managed state mechanisms, and provides a step‑by‑step implementation using Kafka, Druid, and custom broadcast functions.

Big Data · Flink · Real-Time
14 min read
DataFunTalk
Mar 6, 2020 · Artificial Intelligence

Advances in Apache Flink AI Ecosystem: ML Pipeline, AI Flow, and Mini‑Batch Streaming Iteration

This article reviews recent progress in Apache Flink's AI ecosystem, explaining how Flink unifies batch and stream processing for machine‑learning pipelines, introduces the Flink ML Pipeline and Alink library, describes the AI Flow framework for end‑to‑end ML workflows, and presents a novel mini‑batch streaming iteration mechanism to support both offline and online learning scenarios.

AI Flow · Apache Flink · Mini-batch Iteration
13 min read
Big Data Technology Architecture
Feb 26, 2020 · Big Data

Comprehensive Guide to Kafka Architecture, Messaging Mechanisms, Replication, Controllers, and Consumer Rebalance

This article provides an in‑depth yet approachable overview of Kafka's core concepts—including its architecture, terminology, message‑sending pipeline, replication strategy, controller role, and consumer group rebalance mechanisms—helping readers quickly grasp how Kafka works as a high‑throughput distributed messaging and streaming platform.

Consumer Rebalance · Distributed Messaging · Kafka
21 min read
Big Data Technology & Architecture
Feb 22, 2020 · Big Data

Understanding Flink's Asynchronous Barrier Snapshot (ABS) Algorithm for Checkpointing

This article explains how Apache Flink implements fault-tolerant checkpointing using the Asynchronous Barrier Snapshot (ABS) algorithm, a lightweight variant of the Chandy-Lamport distributed snapshot algorithm, covering barriers, barrier alignment, exactly-once versus at-least-once semantics, and the handling of cyclic dataflow graphs.

Asynchronous Barrier Snapshot · Distributed Systems · Flink
9 min read
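Barrier alignment, the step that separates exactly-once from at-least-once in the summary above, can be simulated with a small Python class. This is a hypothetical single-operator model: it assumes one checkpoint in flight at a time, integer records summed into the operator state, and it omits forwarding the barrier downstream. It is not Flink's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


@dataclass(frozen=True)
class Barrier:
    checkpoint_id: int


class AligningSumOperator:
    """Sums integer records from several input channels with barrier alignment (toy model)."""

    def __init__(self, num_channels: int) -> None:
        self.num_channels = num_channels
        self.state = 0                       # operator state: running sum
        self.snapshots: Dict[int, int] = {}  # checkpoint_id -> snapshotted state
        self._blocked: Set[int] = set()      # channels whose barrier has arrived
        self._buffer: List[Tuple[int, Union[int, Barrier]]] = []

    def receive(self, channel: int, item: Union[int, Barrier]) -> None:
        if isinstance(item, Barrier):
            self._blocked.add(channel)
            if len(self._blocked) == self.num_channels:
                # Barriers from all inputs have arrived: snapshot, then unblock.
                self.snapshots[item.checkpoint_id] = self.state
                self._blocked.clear()
                pending, self._buffer = self._buffer, []
                for buffered_channel, buffered_item in pending:
                    self.receive(buffered_channel, buffered_item)
        elif channel in self._blocked:
            # Post-barrier record on an already-aligned channel: hold it back.
            self._buffer.append((channel, item))
        else:
            self.state += item


op = AligningSumOperator(num_channels=2)
op.receive(0, 1)
op.receive(1, 2)
op.receive(0, Barrier(1))  # channel 0 is now blocked
op.receive(0, 10)          # buffered: it belongs after checkpoint 1
op.receive(1, 4)           # still pre-barrier on channel 1, state becomes 7
op.receive(1, Barrier(1))  # snapshot {1: 7}, then the buffered 10 is applied
```

Without the buffering branch (the at-least-once mode), the record `10` would be added before the snapshot for checkpoint 1 was taken, so restoring that snapshot and replaying the input would count it twice.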
Tencent Cloud Developer
Feb 18, 2020 · Backend Development

Technical Overview of Tencent Cloud CKafka for High-Scale Online Classroom Messaging

Tencent Cloud CKafka powers Tencent Classroom’s pandemic‑era online teaching by replacing a custom queue with a high‑performance, highly available, partition‑based message bus that scales to millions of real‑time interactions, offers configurable replication and tuning for reliability, and integrates with big‑data and streaming tools for analytics.

CKafka · Kafka · Message Queue
15 min read
Big Data Technology Architecture
Feb 13, 2020 · Big Data

Evolution of Cainiao's Real-Time Data Warehouse Architecture: Model, Compute Engine, and Data Service Upgrades

The talk details Cainiao’s evolution of its real‑time data warehouse architecture, covering the original 2016 model, compute and service challenges, the 2017 multi‑layer data model redesign, migration to Flink, practical cases of state retraction, timeout statistics, smart optimizations, and the unified data service platform.

Data Service · Flink · Streaming
16 min read
Big Data Technology Architecture
Feb 8, 2020 · Big Data

Meituan-Dianping Real-Time Data Warehouse Platform Built on Apache Flink: Architecture, Practices, and Future Directions

Meituan-Dianping’s senior technical expert shares the evolution, architecture, and implementation of their Apache Flink‑based real‑time data warehouse platform, covering platform evolution, layered design, job and resource management, business warehouse use cases, and future development considerations.

Flink · Meituan-Dianping · Streaming
16 min read
dbaplus Community
Jan 14, 2020 · Big Data

How OPPO Built a Real‑Time Data Warehouse with Flink SQL

This article details OPPO's evolution from an offline data warehouse to a real-time platform, describing the business scale, the data middle-platform architecture, the migration strategy using Flink SQL, extensions such as AthenaX, and practical use cases such as real-time ETL, CTR calculation, and tag import.

ETL · Flink · Streaming
18 min read
Big Data Technology & Architecture
Jan 10, 2020 · Big Data

Async I/O for Dimension Table Joins in Apache Flink

This article explains how to handle dimension table joins in Apache Flink streaming by leveraging Async I/O to perform non‑blocking external lookups, provides detailed code examples for both synchronous and asynchronous functions, discusses configuration parameters, and outlines best practices and pitfalls.

Big Data · Dimension Table Join · Flink
16 min read
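The non-blocking lookup pattern summarized above can be illustrated with Python's `asyncio`, in the spirit of Flink's `AsyncDataStream.orderedWait`, which takes a timeout and a capacity. This is an analogy under stated assumptions, not Flink code: the semaphore stands in for the capacity limit, `asyncio.gather` provides ordered output, a timed-out lookup falls back to a default value, and `ordered_async_join`, `DIM_CITY`, and `lookup_city` are hypothetical names with an in-memory dict standing in for the external store.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List, Optional, Tuple


async def ordered_async_join(
    records: Iterable[Any],
    lookup: Callable[[Any], Awaitable[Any]],
    capacity: int,
    timeout_s: float,
    default: Optional[Any] = None,
) -> List[Tuple[Any, Any]]:
    """Enrich records with concurrent lookups, preserving input order (hypothetical sketch)."""
    in_flight = asyncio.Semaphore(capacity)  # caps concurrent lookups, like a capacity setting

    async def enrich(record: Any) -> Tuple[Any, Any]:
        async with in_flight:
            try:
                value = await asyncio.wait_for(lookup(record), timeout_s)
            except asyncio.TimeoutError:
                value = default  # on timeout, fall back instead of failing
            return record, value

    # gather returns results in submission order, similar to an "ordered" output mode.
    return await asyncio.gather(*(enrich(r) for r in records))


# Hypothetical in-memory dimension table standing in for an external store.
DIM_CITY = {"u1": "Beijing", "u2": "Shanghai"}


async def lookup_city(user_id: str) -> Optional[str]:
    await asyncio.sleep(0.02 if user_id == "u1" else 0.001)  # u1 answers last
    return DIM_CITY.get(user_id)
```

Even though `u1` completes last, it is emitted first; an unordered mode would trade that guarantee for lower latency.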
Tencent Cloud Developer
Jan 10, 2020 · Cloud Computing

Tencent Classroom Cloud VOD HLS Playback Architecture and Optimization

The article outlines Tencent Classroom’s cloud VOD solution, detailing HLS streaming fundamentals, a Mongoose‑based local HTTP proxy with LFU caching and pre‑loading, performance optimizations for latency, buffering, security, and playback reliability, and common transcoding pitfalls with practical fixes, highlighting cloud migration benefits.

Cache Optimization · Streaming · Tencent Cloud
13 min read
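Since the summary cites LFU caching in the local HTTP proxy, a minimal Python LFU cache is sketched here for reference. It evicts the least frequently used entry and, among entries with equal frequency, the least recently used one. The class name, the segment-style keys, and the tie-breaking rule are assumptions, not details from the article.

```python
from collections import OrderedDict, defaultdict
from typing import Any, Dict, Hashable, Optional


class LFUCache:
    """O(1) LFU cache with LRU tie-breaking (hypothetical sketch)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._values: Dict[Hashable, Any] = {}
        self._freq: Dict[Hashable, int] = {}
        # frequency -> keys in insertion/access order (oldest first)
        self._buckets: Dict[int, OrderedDict] = defaultdict(OrderedDict)
        self._min_freq = 0

    def _touch(self, key: Hashable) -> None:
        # Move key from its current frequency bucket to the next one.
        f = self._freq[key]
        del self._buckets[f][key]
        if not self._buckets[f]:
            del self._buckets[f]
            if self._min_freq == f:
                self._min_freq = f + 1
        self._freq[key] = f + 1
        self._buckets[f + 1][key] = None

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._values:
            return None
        self._touch(key)
        return self._values[key]

    def put(self, key: Hashable, value: Any) -> None:
        if self.capacity <= 0:
            return
        if key in self._values:
            self._values[key] = value
            self._touch(key)
            return
        if len(self._values) >= self.capacity:
            # Evict the oldest key among those with the lowest frequency.
            victim, _ = self._buckets[self._min_freq].popitem(last=False)
            if not self._buckets[self._min_freq]:
                del self._buckets[self._min_freq]
            del self._values[victim]
            del self._freq[victim]
        self._values[key] = value
        self._freq[key] = 1
        self._buckets[1][key] = None
        self._min_freq = 1
```

For video segments, LFU keeps frequently replayed segments resident even when a one-off seek pulls in many cold ones, which plain LRU would let crowd them out.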
Big Data Technology & Architecture
Jan 2, 2020 · Big Data

Structured Streaming: Design, Challenges, Programming Model, and Performance Evaluation

This article provides a comprehensive overview of Apache Spark Structured Streaming, describing its declarative API, the challenges of stream processing, the programming model with code examples, query planning, execution modes, production use cases, and performance benchmarks compared with other streaming systems.

Big Data · Spark · Streaming
42 min read
360 Quality & Efficiency
Dec 31, 2019 · Backend Development

Understanding the LocalServer Video Caching Mechanism

The article explains how LocalServer pre‑caches video data to improve start‑up speed and playback smoothness, detailing when and how much data is cached, storage policies, handling of network interruptions, and practical testing points for developers.

LocalServer · Streaming · network optimization
6 min read
Big Data Technology & Architecture
Dec 22, 2019 · Big Data

Dynamic Resource Allocation in Spark Streaming: Problems, Mechanisms, and Practical Guidelines

The article explains Spark's default static resource allocation, analyzes the limitations of its Dynamic Resource Allocation (DRA) for streaming workloads, describes the internal Spark components and code paths involved, and proposes concrete design and configuration recommendations for implementing more responsive executor scaling.

Big Data · Dynamic Resource Allocation · Executor Management
11 min read
Big Data Technology & Architecture
Dec 19, 2019 · Big Data

Apache Kafka 2.4.0 Release: New Features and Improvements

Apache Kafka 2.4.0 introduces a range of new capabilities—including consumer replica fetching, incremental cooperative rebalancing, MirrorMaker 2.0, a new Java authorization API, KTable non‑key joins, administrative replica reassignment, protected REST endpoints, and offset deletion—along with numerous performance and stability improvements.

Apache Kafka · Big Data · Distributed Systems
3 min read
Big Data Technology & Architecture
Dec 10, 2019 · Big Data

Implementing Real-Time TopN Rankings with Apache Flink

This article explains how to develop a real‑time TopN ranking feature using Apache Flink, covering both global and grouped TopN implementations, nested TopN strategies, and provides complete Java code snippets for environment setup, word counting, windowing, and custom TopN functions.

Flink · Real-Time · Streaming
8 min read
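The global TopN-per-window logic summarized above can be previewed with a batch-style Python simulation: count words per tumbling window, then rank each window with a heap. This is a sketch under assumptions (non-negative millisecond timestamps, windows aligned to 0, ties broken alphabetically); the function names are hypothetical, and a Flink job would compute the counts incrementally rather than over a finished list.

```python
import heapq
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_n(counts: Counter, n: int) -> List[Tuple[str, int]]:
    """The n most frequent words; ties broken alphabetically so output is deterministic."""
    return heapq.nsmallest(n, counts.items(), key=lambda kv: (-kv[1], kv[0]))


def windowed_top_n(
    events: Iterable[Tuple[int, str]], window_ms: int, n: int
) -> Dict[int, List[Tuple[str, int]]]:
    """Count words per tumbling window, then rank each window (batch-style simulation)."""
    windows: Dict[int, Counter] = defaultdict(Counter)
    for ts, word in events:
        windows[ts - ts % window_ms][word] += 1  # key by window start
    return {start: top_n(c, n) for start, c in sorted(windows.items())}


events = [(0, "flink"), (100, "kafka"), (200, "flink"), (999, "spark"),
          (1000, "kafka"), (1500, "kafka"), (1600, "flink")]
ranking = windowed_top_n(events, window_ms=1000, n=2)
```

For the grouped variant, key the counters by `(window_start, group)` instead of `window_start` alone.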
Big Data Technology & Architecture
Dec 9, 2019 · Big Data

Building a Real‑Time ETL Pipeline with Apache Flink: Kafka to HDFS with Exactly‑Once Guarantees

This article explains how to develop a real‑time ETL application using Apache Flink that reads events from Kafka, partitions them by event time into HDFS directories, and achieves exactly‑once processing through checkpointing, custom bucket assigners, and proper state backend configuration.

Apache Flink · Big Data · Exactly-Once
11 min read
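Partitioning records into directories by event time, as described above, comes down to a pure function from a record's timestamp to a bucket path. The sketch below assumes UTC, hourly buckets, and a hypothetical `dt=.../hour=...` layout under a hypothetical base path. In Flink, logic like this would sit inside a custom `BucketAssigner`, whose exact interface is not reproduced here.

```python
from datetime import datetime, timezone


def event_time_bucket(event_time_ms: int, base_path: str = "/warehouse/events") -> str:
    """Map an epoch-millis event timestamp to an hourly, Hive-style directory (hypothetical layout)."""
    # Use the event's own timestamp, not the wall clock, so late data lands in its true hour.
    dt = datetime.fromtimestamp(event_time_ms // 1000, tz=timezone.utc)
    return f"{base_path}/dt={dt:%Y-%m-%d}/hour={dt:%H}"
```

For example, `event_time_bucket(1575885600000)` returns `"/warehouse/events/dt=2019-12-09/hour=10"`. Because the path depends only on the record, a replay after a checkpoint restore writes each record to the same bucket it went to the first time.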
Alibaba Cloud Developer
Dec 5, 2019 · Artificial Intelligence

How Alibaba’s Alink Empowers Real‑Time Machine Learning on Flink

Alink, Alibaba’s open‑source machine‑learning platform built on Apache Flink, offers a rich library of batch and streaming algorithms, a Python API, iterative computation optimizations, and real‑world case studies, positioning it as a powerful AI solution for large‑scale, low‑latency data processing.

AI · Alink · Flink
13 min read
Big Data Technology & Architecture
Nov 26, 2019 · Big Data

Understanding Flink SQL Window Functions: Types, Implementation, and Emit Triggers

This article provides a comprehensive overview of Flink SQL window functions, detailing time‑based window types, their underlying implementation in the StreamExecGroupWindowAggregate operator, the processing flow of WindowOperator, timer handling, emit/trigger strategies, and practical code examples for Tumble, Hop, and Session windows.

Big Data · Emit · Flink
20 min read
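The Tumble, Hop, and Session window types named above differ mainly in how a timestamp is assigned to windows. The following Python sketch shows that assignment arithmetic under stated assumptions: non-negative millisecond timestamps, windows aligned to epoch 0 with no offset, and sessions formed by merging per-event `[ts, ts + gap)` intervals that overlap or touch. The function names are hypothetical; this is not the `WindowOperator` code path.

```python
from typing import Iterable, List, Tuple

Window = Tuple[int, int]  # [start, end) in milliseconds


def tumble(ts: int, size: int) -> Window:
    """The single fixed-size window containing ts."""
    start = ts - ts % size
    return (start, start + size)


def hop(ts: int, size: int, slide: int) -> List[Window]:
    """Every window of length `size`, started every `slide`, that contains ts."""
    windows: List[Window] = []
    start = ts - ts % slide  # latest window start not after ts
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return windows


def sessions(timestamps: Iterable[int], gap: int) -> List[Window]:
    """Merge per-event windows [ts, ts + gap) that overlap or touch into sessions."""
    merged: List[Window] = []
    for ts in sorted(timestamps):
        if merged and ts <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], ts + gap))
        else:
            merged.append((ts, ts + gap))
    return merged
```

A hopping window with `size / slide = k` places each record in `k` windows, which is why hop aggregations cost roughly `k` times more state than tumbling ones, while session windows cannot be assigned up front and must be merged as records arrive.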
Architecture Digest
Nov 25, 2019 · Big Data

Introduction to Apache Kafka: Core Concepts, Architecture, and APIs

This article provides a comprehensive overview of Apache Kafka, covering its fundamental capabilities, typical use cases, core components, key APIs, and essential concepts such as topics, partitions, segments, brokers, producers, and consumers, illustrated with diagrams.

APIs · Big Data · Distributed Systems
8 min read
G7 EasyFlow Tech Circle
Nov 21, 2019 · Big Data

How G7 Combines AI, Big Data, and IoT to Transform Logistics

This article presents a detailed overview of G7's AI‑plus‑Big‑Data‑plus‑IoT platform for logistics, describing its neutral open architecture, real‑time data pipelines using Kafka and Flink, Lambda‑style storage in HBase/Hive, and the resulting safety‑insurance and analytics capabilities.

AI · Flink · IoT
10 min read
Hulu Beijing
Nov 15, 2019 · Artificial Intelligence

How Content-Based Video Relevance Prediction Advances Personalized Streaming

The CBVRP (Content-Based Video Relevance Prediction) challenge, co‑hosted by Hulu and ACM MM 2019, showcased the shift from user‑based collaborative filtering to content‑driven recommendation, highlighted winning teams and their papers, and underscored the ongoing research importance of cold‑start video recommendation for streaming platforms.

Multimedia · Streaming · cold start
15 min read
Big Data Technology & Architecture
Nov 9, 2019 · Big Data

Comparative Study of Apache Flink and Spark Streaming at Xiaomi: Architecture, Performance, and Serialization

This article examines Xiaomi's migration from Spark Streaming to Apache Flink, comparing scheduling strategies, mini‑batch versus true streaming, resource utilization, latency, and serialization mechanisms, and concludes with practical insights and custom optimization techniques for large‑scale data processing.

Big Data · Flink · Mini-Batch
17 min read
Big Data Technology & Architecture
Nov 7, 2019 · Big Data

Real‑time Dashboard with Flink: Streaming Order Data, Site Metrics, and Top‑N Merchandise Rankings

This article demonstrates how to build a one‑second‑refresh real‑time dashboard for e‑commerce order data using Apache Flink, Kafka, and Redis, covering JSON message parsing, processing‑time windows, stateful aggregation for site‑level KPIs, and efficient top‑N product ranking via Redis sorted sets.

Dashboard · Flink · Kafka
11 min read
DataFunTalk
Nov 7, 2019 · Big Data

Real-Time Computing Engine at Beike: Architecture, Practices, and Future Plans

This article details Beike's real‑time computing engine, covering its background, streaming platform built on Spark Streaming and Flink, data ingestion via Kafka, metadata handling, SQL‑based task development, monitoring, storage solutions, and future roadmap for resource management and AI‑enhanced monitoring.

Big Data · Flink · Kafka
14 min read
FunTester
Oct 24, 2019 · Backend Development

Why qrpc Beats gRPC: A Lightweight, High‑Performance RPC Framework

qrpc is a lightweight, high‑performance RPC framework that adopts gRPC's streaming and bidirectional concepts without HTTP/2, offering a smaller binary, lower memory usage, up to three‑fold throughput gains, and flexible modes such as blocking, non‑blocking, streaming, push, and bidirectional calls, all demonstrated with Go code examples and real‑world use cases.

Go · Microservices · Networking
12 min read
Architects Research Society
Oct 13, 2019 · Databases

What is Debezium? Overview, Architecture, and Features

Debezium is an open‑source distributed platform built on Apache Kafka that turns existing databases into real‑time event streams by capturing row‑level changes via change data capture, offering source and embedded connectors, flexible topic routing, and features such as snapshots, filtering, masking, and monitoring.

CDC · Change Data Capture · Debezium
7 min read
Big Data Technology & Architecture
Sep 11, 2019 · Big Data

Big Data Technology and Architecture: Case Studies of Taobao, Didi, and Meituan

This article reviews the evolution and key components of big data platforms at leading Chinese internet companies—Taobao, Didi, and Meituan—detailing their data sources, synchronization tools, storage layers, processing engines, and scheduling systems to provide practical guidance for building robust big data infrastructures.

Architecture · Big Data · Data Platform
9 min read
DataFunTalk
Sep 5, 2019 · Big Data

Apache Beam Architecture Principles and Practical Application

This article introduces Apache Beam as a unified programming model for batch and streaming data processing, explains its architecture, core components, advantages, extensibility, and demonstrates practical usage with KafkaIO, BeamSQL, and AIoT scenarios across multiple runners.

Apache Beam · Kafka · Streaming
16 min read
Alibaba Cloud Developer
Sep 5, 2019 · Artificial Intelligence

How AI and Big Data Powered the Hit Series ‘The Longest Day in Chang’an’

The article explains how Alibaba/Youku leveraged AI recommendation, big‑data analytics, HDR upscaling, interactive multi‑stream playback, smart bitrate selection, and a robust anti‑leech system to turn the drama “The Longest Day in Chang’an” into a data‑driven blockbuster, detailing the underlying technologies and their impact on production, distribution, and viewer experience.

AI · HDR · Media Platform
8 min read
Alibaba Cloud Developer
Sep 5, 2019 · Artificial Intelligence

How AI and Big Data Powered the Hit Series “The Longest Day in Chang’an”

The article explores how Alibaba/Youku leveraged AI recommendation, big‑data analytics, interactive streaming, HDR reconstruction, smart bitrate selection, anti‑piracy defenses, and a high‑performance media‑asset platform to turn the drama “The Longest Day in Chang’an” into a blockbuster, detailing the underlying technologies and their impact on production and viewer experience.

AI · Anti-Piracy · HDR
7 min read
Tongcheng Travel Technology Center
Sep 3, 2019 · Big Data

Practical Experiences and Lessons Learned in Building a Flink‑Based Real‑Time Computing Platform at Tongcheng‑Elong

This article details the design, implementation, and optimization of a Flink‑based real‑time computing platform at Tongcheng‑Elong, covering the evolution from Storm to Flink, support for FlinkSQL and FlinkStream, metric collection, logging, data lineage, savepoint management, and numerous stability fixes contributed back to the open‑source community.

Big Data · Data Lineage · Flink
16 min read
dbaplus Community
Aug 27, 2019 · Big Data

How eBay Scales Real‑Time Monitoring with Flink: Metadata‑Driven Streaming

This article explains how eBay’s Sherlock.IO monitoring platform processes billions of logs, events, and metrics daily using Flink Streaming jobs, detailing a metadata‑driven architecture, shared job strategies, Heartbeat‑based monitoring, job isolation, back‑pressure handling, and real‑world use cases such as Event Alerting, Eventzon, and Netmon.

Big Data · Flink · Real-time Processing
0 likes · 18 min read
JavaEdge
Aug 25, 2019 · Big Data

Which Kafka Distribution Fits Your Needs? A Detailed Comparison

This article compares the main Kafka distributions—Apache Kafka, Confluent Kafka, and CDH/HDP Kafka—examining their origins, feature sets, ecosystem support, and trade‑offs to help you choose the most suitable version for your streaming workloads.

Streaming · big-data · confluent
0 likes · 10 min read
DataFunTalk
Aug 9, 2019 · Big Data

Performance Optimization Techniques for Spark and Spark Streaming Applications

This article explains how to improve Spark and Spark Streaming performance by tuning serialization, broadcast variables, parallelism, batch intervals, memory usage, garbage collection, and Kafka integration, providing practical code examples and real‑world optimization results.

Broadcast Variables · Kryo · Memory Optimization
0 likes · 32 min read
vivo Internet Technology
Aug 7, 2019 · Big Data

Understanding Apache Kafka: Concepts, Architecture, Deployment, Monitoring and Offset Management

The article gives a thorough overview of Apache Kafka, explaining its core concepts, architecture, deployment steps, monitoring tools, and offset management, including broker and topic structures, producer/consumer APIs, replication, leader election, consumer groups, offset committing, and practical configuration and troubleshooting guidance.

Big Data · Kafka · Messaging
0 likes · 36 min read
dbaplus Community
Jul 30, 2019 · Big Data

Spark vs Flink: Which Real‑Time Engine Should You Choose for Kafka Streams?

With the surge in real‑time data from sensors and devices, choosing the right streaming engine is critical; this article compares Apache Spark and Apache Flink—examining their architectures, micro‑batch vs continuous processing, strengths, limitations, and use‑case suitability for Kafka‑driven pipelines.

Big Data · Flink · Kafka
0 likes · 14 min read
Big Data Technology & Architecture
Jun 30, 2019 · Big Data

Curated Collection of Big Data, Flink, Hadoop and Real‑Time Computing Articles from the “Big Data Technology and Architecture” Series

This article presents a carefully organized catalogue of over a hundred technical posts covering Flink source‑code analysis, fundamental and advanced big‑data structures, Hadoop ecosystem components, real‑time streaming with Spark and Kafka, as well as system design guidelines and miscellaneous insights, each linked to its original publication for easy reference.

Big Data · Distributed Systems · Flink
0 likes · 6 min read
Big Data Technology & Architecture
Jun 19, 2019 · Big Data

Understanding Spark Structured Streaming StateStore: Architecture, Operations, and Fault Recovery

This article explains the design and implementation of Spark Structured Streaming's StateStore module, covering its distributed architecture, state sharding, versioning, batch read/write, migration, update/query APIs, maintenance compaction, and fault‑tolerance mechanisms that enable incremental continuous queries with exactly‑once guarantees.

Big Data · Spark · StateStore
0 likes · 8 min read
Huajiao Technology
Jun 4, 2019 · Industry Insights

Understanding Streaming Media: From RTMP to MPEG‑DASH Explained

This article explains streaming media fundamentals and compares major live‑streaming protocols—including RTMP, HTTP‑FLV, HLS, HDS, and MPEG‑DASH—detailing their architectures, workflow diagrams, MPD structure, segment handling, and practical steps for building a DASH demo with MP4Box and dash.js.

MPEG-DASH · RTMP · Streaming
0 likes · 16 min read
DataFunTalk
Jun 3, 2019 · Big Data

Choosing a Real-Time Computing Engine Based on Kafka: Spark vs Flink

This article examines the need for real‑time computation, explains streaming versus real‑time concepts, and compares Apache Spark and Apache Flink—covering their architectures, micro‑batch and continuous processing, advantages, limitations, windowing, event‑time handling, and watermarks—to guide engine selection for Kafka‑driven workloads.

Flink · Kafka · Spark
0 likes · 15 min read
DataFunTalk
May 27, 2019 · Big Data

Practical Applications and Ecosystem Integration of Apache Kafka

This article explores Apache Kafka’s evolution, core messaging and stream processing capabilities, typical use cases, internal storage mechanisms, API choices, and best practices for deploying Kafka on Kubernetes, providing readers with comprehensive guidance to assess suitability and implement effective Kafka solutions.

Apache Kafka · Kafka APIs · Kubernetes
0 likes · 16 min read
Big Data Technology Architecture
May 18, 2019 · Big Data

Key Concepts of Kafka, Hadoop Shuffle, Spark Cluster Modes, HDFS I/O, and Spark RDD Operations

This article explains Kafka's message structure and how offsets are retrieved, details the map-side and reduce-side shuffle in Hadoop, outlines Spark's cluster deployment modes, describes how HDFS reads and writes data, compares the performance of reduceByKey and groupByKey, and discusses integrating Spark Streaming with Kafka without losing data.

HDFS · Hadoop · Kafka
0 likes · 10 min read
Big Data Technology Architecture
Apr 22, 2019 · Big Data

Comparison of Apache Spark and Apache Flink: Programming Models, Streaming, State Management, and Exactly-Once Semantics

This article compares Apache Spark and Apache Flink, outlining their programming models, streaming mechanisms, state management, time semantics, and exactly‑once guarantees, and highlights the strengths and differences of each framework for batch and real‑time big‑data processing.

Apache Flink · Apache Spark · Exactly-Once
0 likes · 8 min read
Big Data Technology & Architecture
Mar 21, 2019 · Big Data

Apache Flink Table API Tutorial and End‑to‑End Examples

This article provides a comprehensive tutorial on Apache Flink's Table API, explaining its concepts, core features, and a wide range of operators such as SELECT, WHERE, GROUP BY, UNION, JOIN, and various window functions, while offering complete Scala code examples, custom sources, sinks, and an end‑to‑end job that computes page‑view counts per region using event‑time tumbling windows.

Big Data · Flink · Scala
0 likes · 36 min read
Big Data Technology & Architecture
Mar 20, 2019 · Databases

Understanding JOIN Operators: From Traditional Databases to Apache Flink Streaming

This article explains the purpose and types of SQL JOIN operators, demonstrates their syntax and semantics with examples, contrasts joins in traditional databases with Apache Flink's two-stream join implementation, and discusses optimization techniques such as state management, shuffle handling, and join reordering.

Apache Flink · State Management · Streaming
0 likes · 22 min read
Big Data Technology & Architecture
Mar 19, 2019 · Big Data

Comprehensive Overview of SQL and Apache Flink SQL Features with Practical Code Examples

This article provides an in-depth introduction to SQL, its history and ANSI standards, then details Apache Flink's SQL capabilities—including SELECT, WHERE, GROUP BY, UNION, JOIN, window functions, and user-defined functions—accompanied by extensive code examples and a complete end‑to‑end Flink job implementation.

Apache Flink · Big Data · Streaming
0 likes · 34 min read