Tagged articles

560 articles

Oct 29, 2021 · Cloud Native

RocketMQ 5.0 Overview: A Cloud‑Native Messaging, Event and Stream Fusion Platform

This article reviews the evolution of RocketMQ from its early MetaQ roots through the 4.x releases, explains the motivations behind RocketMQ 5.0, and details its cloud‑native architecture, lightweight SDK, storage‑compute separation, POP consumption model, elastic scaling, and the upcoming RocketMQ Streams framework.

Distributed Systems · Message Queue · RocketMQ

18 min read

Big Data Technology & Architecture

Oct 29, 2021 · Big Data

Dimension Table Join Strategies in Apache Flink: Preload, Distributed Cache, Hot Storage, Broadcast, and Temporal Table Function

The article explains various dimension‑table join approaches in Apache Flink, including preloading tables into memory, using distributed cache, leveraging hot storage with async I/O, broadcasting state, and temporal table function joins, and compares their trade‑offs for different data volumes and update frequencies.

Dimension Table · Flink · JOIN

10 min read

Big Data Technology & Architecture

Oct 26, 2021 · Big Data

Practical Experience Building a Real‑Time Clickstream Data Warehouse with Flink and ClickHouse

This article shares practical insights on designing and operating a real‑time clickstream data warehouse using Flink for streaming processing and ClickHouse for near‑real‑time OLAP, covering dimensional modeling, layered architecture, Flink‑ClickHouse sink implementation, and data rebalancing strategies.

ClickHouse · Data Warehouse · Flink

10 min read

Big Data Technology & Architecture

Oct 21, 2021 · Big Data

Comparative Overview of Open‑Source CDC Solutions: Debezium, Flink CDC, and Canal

This article provides a detailed comparison of three popular open‑source change data capture tools—Debezium, Flink CDC, and Canal—covering their underlying principles, architecture, deployment options, performance characteristics, and suitability for real‑time data synchronization in big‑data environments.

CDC · Canal · Change Data Capture

21 min read

G7 EasyFlow Tech Circle

Oct 20, 2021 · Backend Development

How to Monitor HLS Video Streams in Connected Cars Using NGINX and Lua

This article explains a lightweight, loosely‑coupled solution for real‑time monitoring of HLS video playback in vehicle IoT, detailing how to embed device and session identifiers in HLS requests, modify M3U8 files, and collect statistics with NGINX Lua scripts.

Lua · Nginx · Streaming

9 min read
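The idea of carrying device and session identifiers inside HLS requests can be illustrated by rewriting an M3U8 playlist. This is a minimal sketch, not the article's implementation: it assumes the identifiers travel as query parameters on segment URIs, and the parameter names `device_id` and `session_id` (and the sample values) are hypothetical. A server-side NGINX Lua script could then read those parameters from each segment request to collect statistics.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_playlist(m3u8_text: str, device_id: str, session_id: str) -> str:
    """Append device/session query parameters to every URI line of an M3U8 playlist.

    The parameter names `device_id` and `session_id` are hypothetical.
    """
    out = []
    for line in m3u8_text.splitlines():
        stripped = line.strip()
        # Tag lines (#EXT...), comments, and blank lines pass through unchanged.
        if not stripped or stripped.startswith("#"):
            out.append(line)
            continue
        # Any other line is a segment or variant-playlist URI: merge in the new params.
        parts = urlsplit(stripped)
        query = parse_qsl(parts.query, keep_blank_values=True)
        query += [("device_id", device_id), ("session_id", session_id)]
        out.append(urlunsplit(parts._replace(query=urlencode(query))))
    return "\n".join(out) + "\n"


playlist = "#EXTM3U\n#EXTINF:10.0,\nseg0.ts\n#EXTINF:10.0,\nseg1.ts?v=2\n#EXT-X-ENDLIST\n"
print(tag_playlist(playlist, "car42", "s1"))
```

Here `seg0.ts` becomes `seg0.ts?device_id=car42&session_id=s1`, and existing parameters such as `v=2` are preserved. URIs embedded in tag attributes (for example `EXT-X-KEY` with `URI="..."`) are deliberately left untouched by this sketch.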

Tencent Cloud Developer

Oct 19, 2021 · Backend Development

Comprehensive Guide to gRPC Communication with Go and PHP: Protobuf, Streaming, TLS, and Timeout

This comprehensive guide walks through building a gRPC user service in Go and PHP: defining protobuf messages and generating code, implementing the server and client stubs, adding client‑streaming, server‑streaming, and bidirectional‑streaming RPCs, securing communication with TLS certificates, and managing request deadlines with timeout controls.

Go · PHP · Streaming

33 min read

IT Architects Alliance

Oct 13, 2021 · Industry Insights

Is Kafka Still Worth the Effort? Rethinking Data Pipeline Costs and Alternatives

The article examines Apache Kafka's strengths and shortcomings, explores the operational complexities of managing a Kafka deployment, and encourages organizations to reassess its value versus emerging alternatives by weighing maturity, scalability, and total cost of ownership.

Alternative Platforms · Kafka · Operational Challenges

12 min read

Big Data Technology & Architecture

Oct 12, 2021 · Big Data

Data Lake Evolution and a Practical Flink + Iceberg Implementation Guide

This article explores the evolution of data lakes, compares major cloud providers' lake architectures, introduces the emerging lakehouse concept, and provides a step‑by‑step Flink‑Iceberg implementation—including dependencies, catalog setup, table creation, checkpointing, and Kafka ingestion—demonstrating practical big‑data streaming solutions.

Data Lake · Flink · Iceberg

14 min read

Big Data Technology Architecture

Oct 9, 2021 · Big Data

Apache Kafka 3.0 Release Highlights and New Features

Apache Kafka 3.0 introduces major enhancements including KRaft consensus, deprecation of Java 8 and Scala 2.12 support, stronger producer guarantees, updated APIs, improved Kafka Connect, MirrorMaker 2 flexibility, and numerous KIP-driven feature upgrades, marking a significant step forward for the distributed streaming platform.

Kafka · Kafka 3.0 · Streaming

13 min read

Big Data Technology & Architecture

Oct 9, 2021 · Big Data

Apache Flink 1.7–1.14 Release Highlights and Feature Evolution

This article provides a comprehensive overview of Apache Flink's major releases from version 1.7 to 1.14, detailing new APIs, state management improvements, Kubernetes integration, SQL and Table API enhancements, checkpointing advances, and performance optimizations that together illustrate the platform's evolution for both streaming and batch processing workloads.

Apache Flink · Batch Processing · Checkpoint

78 min read

21CTO

Oct 6, 2021 · Big Data

Building a Real-Time TB-Scale Bill Query System with Kafka, Kudu, and Presto

This article details the design and implementation of a real‑time, TB‑scale bill‑detail query platform that leverages Kafka for streaming, Debezium and Confluent Platform for change capture, Kudu for low‑latency storage, and Presto/Kylin for fast OLAP queries, while outlining deployment, integration, and future enhancements.

Kafka · Kudu · Presto

19 min read

DataFunTalk

Oct 6, 2021 · Big Data

Optimizing Flink Real‑Time Computing at Bilibili: Connector Stability, SQL, Runtime, and Future Outlook

This article details Bilibili's comprehensive optimization of Flink real‑time computing, covering connector stability improvements, SQL interval‑join enhancements, runtime state and checkpoint refinements, a diagnostic tool, and future directions for high‑throughput streaming workloads.

Big Data · Checkpoint · Flink

18 min read

Douyu Streaming

Sep 27, 2021 · Game Development

Douyu’s Live2D Virtual Avatar Plugin: Unity Architecture & Key Tech

This article details Douyu's virtual avatar tool built on Unity and Live2D, covering project background, core features, layered architecture, key technologies such as Unity rendering, FairyGUI UI, TCP-based IPC, model dressing and recoloring, face‑data processing, and future development plans.

Game Development · IPC · Live2D

13 min read

Cloud Native Technology Community

Sep 26, 2021 · Big Data

Apache Kafka 3.0.0 Release Summary: New Features, Improvements, Bugs, Tasks, and Tests

Apache Kafka 3.0.0, released on September 21, 2021, introduces major changes such as deprecating Java 8 and Scala 2.12, adding Raft‑based metadata quorum, stronger producer delivery guarantees, removal of old message formats, numerous performance optimizations, extensive bug fixes, and a large set of new and updated JIRA issues across features, improvements, bugs, tasks, tests, and subtasks.

Apache · Big Data · Kafka3.0

37 min read

Programmer DD

Sep 26, 2021 · Big Data

What’s New in Apache Kafka 3.0? Key Features and Improvements Explained

Apache Kafka 3.0.0 introduces a host of enhancements, including the deprecation of Java 8 and Scala 2.12 support, Raft metadata snapshots, stronger producer guarantees, MirrorMaker 2 upgrades, and Kafka Streams improvements, while continuing to serve real‑time data pipelines and streaming applications.

Apache Kafka · Big Data · Kafka 3.0

3 min read

IT Architects Alliance

Sep 25, 2021 · Big Data

Apache Kafka 3.0.0 Release: New Features, API Changes, and KRaft Improvements

Apache Kafka 3.0.0 introduces numerous enhancements including deprecation of Java 8 and Scala 2.12 support, KRaft metadata snapshots, stronger default producer delivery guarantees, expanded Connect and Streams APIs, updated MirrorMaker 2 configuration, and many KIP-driven feature and API changes for improved streaming and event processing.

Apache Kafka · Event Processing · KIP

15 min read

Big Data Technology & Architecture

Sep 10, 2021 · Big Data

Understanding Flink Table API and SQL: Dependencies, Planners, and Practical Usage

This article provides a comprehensive guide to Apache Flink's Table API and SQL, covering required dependencies, the differences between the old and Blink planners, program structure, table environment creation, catalog registration, query execution, conversion between DataStream and Table, update modes, and time attribute handling, with Scala code examples throughout.

Flink · SQL · Scala

26 min read

Big Data Technology & Architecture

Sep 8, 2021 · Big Data

Understanding Flink's Memory Model: On‑Heap, Off‑Heap, and Memory Management

This article explains Flink's memory architecture, covering on‑heap and off‑heap memory concepts, garbage collection, allocation strategies, memory segments, buffers, the memory manager, and how network transmission and back‑pressure are handled to achieve efficient streaming processing.

Flink · Memory Management · Off-Heap

20 min read

Big Data Technology & Architecture

Aug 24, 2021 · Big Data

Comprehensive Overview of Data Lake Technologies: Iceberg, Hudi, and Delta Lake

This article provides an in-depth overview of data lake concepts, definitions, and essential features, followed by detailed case studies of enterprise data lake implementations and comparative analysis of leading data lake table formats—Iceberg, Hudi, and Delta Lake—highlighting their architectures, capabilities, and trade‑offs.

Data Lake · Delta Lake · Flink

19 min read

Big Data Technology & Architecture

Aug 21, 2021 · Big Data

Kafka Overview: Background, Core Concepts, Producer/Consumer Configuration, Core Principles, Operations, and Stream Processing

This article provides a comprehensive beginner-friendly guide to Apache Kafka, covering its background, core concepts, producer and consumer settings with code examples, underlying architecture, operational monitoring, integration with Spark and Flink, and an introduction to Kafka Streams.

Consumer · Java · Producer

19 min read

iQIYI Technical Product Team

Aug 13, 2021 · Game Development

Improving VR Video Clarity: PPD, Tile Encoding, and Future Directions

VR video clarity suffers because the required pixels‑per‑degree far exceed what 4K or 8K spherical streams can deliver, but tile‑based encoding that decodes only the viewport, combined with low motion‑to‑photon latency, distortion control, advanced codecs and AI‑driven projection, promises sharper, lower‑bitrate 6DoF experiences.

8K · Latency · PPD

13 min read
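The pixels‑per‑degree gap can be made concrete with back‑of‑the‑envelope arithmetic. This sketch rests on two assumptions not taken from the article: an equirectangular 360° frame whose horizontal pixels are spread evenly across 360 degrees, and a target of roughly 60 PPD (about one arcminute per pixel, a commonly cited figure for 20/20 acuity).

```python
def pixels_per_degree(horizontal_pixels: int, horizontal_fov_deg: float = 360.0) -> float:
    """Average PPD when a frame's width is spread evenly over the given field of view."""
    return horizontal_pixels / horizontal_fov_deg


TARGET_PPD = 60.0  # assumption: ~1 arcminute per pixel

for name, width in [("4K", 3840), ("8K", 7680)]:
    ppd = pixels_per_degree(width)
    print(f"{name}: {ppd:.1f} PPD, {TARGET_PPD / ppd:.1f}x short of {TARGET_PPD:.0f} PPD")
# 4K: 10.7 PPD, 5.6x short of 60 PPD
# 8K: 21.3 PPD, 2.8x short of 60 PPD

# Width a full sphere would need to reach the target everywhere.
print(f"needed width: {int(TARGET_PPD * 360)} px")  # needed width: 21600 px
```

Under these assumptions, even 8K delivers roughly a third of the target. Tile‑based encoding helps because the player decodes only the tiles in the current viewport, so it can draw on a much higher‑resolution source without decoding the whole sphere.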

Ctrip Technology

Aug 5, 2021 · Frontend Development

Understanding React Server Components: Concepts, Usage, and Implementation

This article explains the motivation, component types, naming conventions, runtime mechanism, streaming protocol, design goals, and practical considerations of React Server Components, illustrating how they reduce client bundle size and enable progressive server‑side rendering with code examples.

Code Splitting · React · SSR

15 min read

Big Data Technology & Architecture

Jul 30, 2021 · Big Data

Enterprise Big Data Platform Architecture: Insights from Taobao, Meituan, and Didi

This article examines the architecture of enterprise-level big data platforms at leading Chinese tech firms—Taobao, Meituan, and Didi—detailing their data sources, synchronization components, batch and streaming processing layers, scheduling systems, and common design patterns, while highlighting shared principles across these implementations.

Batch Processing · Enterprise · Streaming

9 min read

Big Data Technology & Architecture

Jul 27, 2021 · Big Data

An Introduction to Apache Pulsar: Core Concepts, Architecture, and Key Features

Apache Pulsar is a cloud‑native distributed messaging platform that combines messaging, storage, and lightweight compute, featuring multi‑tenant support, geo‑replication, and high throughput, and this article introduces its core concepts, architecture components such as brokers, BookKeeper, ZooKeeper, and key design features.

Apache Pulsar · BookKeeper · Cloud Native

13 min read

DataFunTalk

Jul 26, 2021 · Big Data

Accelerating Hive Daily Tables with Flink: A SmartNews Case Study

This article describes how SmartNews integrated Flink into its Airflow‑driven Hive batch pipeline to cut the actions table generation latency from three hours to about thirty‑four minutes, detailing the technical challenges, design decisions, and production results.

AWS · Big Data · Flink

12 min read

Big Data Technology & Architecture

Jul 20, 2021 · Big Data

Common Issues and Solutions for Flink CDC with MySQL

This article summarizes frequent problems encountered when using Flink CDC with MySQL—including Kafka version conflicts, checkpoint timeouts, permission errors, global lock issues, and DDL parsing failures—and provides practical configuration tweaks and code examples to resolve them.

CDC · Checkpoint · Debezium

11 min read

Open Source Linux

Jul 17, 2021 · Big Data

Master Kafka Basics: Topics, Partitions, Producers & Consumers Explained

This article provides a clear, visual guide to Kafka’s core concepts—including producers, consumers, topics, partitions, consumer groups, message ordering, and the underlying ZooKeeper‑managed cluster architecture—helping readers grasp how Kafka enables reliable, scalable stream processing.

Big Data · Consumers · Partitions

6 min read

Big Data Technology & Architecture

Jul 9, 2021 · Big Data

Understanding Kafka: Use Cases, Reliability, Storage, Replication, Consumer Assignment, Transactions, and Exactly-Once Semantics

This article explains why Kafka is used, its buffering, decoupling, redundancy and robustness benefits, details the ack reliability levels, storage design, replica synchronization, ISR handling, consumer partition assignment strategies, transaction support, exactly‑once semantics, and why read‑write separation is not provided.

Consumer Assignment · Exactly-Once · Message Queue

20 min read
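Range assignment, one of the consumer partition assignment strategies, is simple enough to sketch. This is a simplified model in the spirit of Kafka's RangeAssignor, not its source code: for a single topic, consumers are sorted, each receives a contiguous block of `partitions // consumers` partitions, and the first `partitions % consumers` consumers get one extra. The function name and consumer IDs are hypothetical.

```python
from typing import Dict, List


def range_assign(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Assign partitions 0..num_partitions-1 of one topic to consumers in contiguous ranges."""
    members = sorted(consumers)
    if not members:
        return {}
    per_consumer, extra = divmod(num_partitions, len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` consumers (in sorted order) each take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


print(range_assign(7, ["c2", "c1", "c3"]))
# {'c1': [0, 1, 2], 'c2': [3, 4], 'c3': [5, 6]}
```

Because the split is computed topic by topic, the consumers that sort first can accumulate noticeably more partitions when a group subscribes to many topics whose partition counts do not divide evenly; Kafka's round‑robin and sticky assignors exist partly to spread that load more evenly.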

Architect

Jul 7, 2021 · Big Data

Understanding Kafka High Availability and Resolving Consumer Offset Issues

This article explains Kafka's high‑availability architecture, including its multi‑replica design, ISR synchronization, leader election, and acks configuration, and shows how misconfigured replication of the internal __consumer_offsets topic can cause consumer outages, offering practical steps to ensure reliable message delivery.

Consumer Offset · Replication · Streaming

8 min read

Architecture Digest

Jul 3, 2021 · Fundamentals

Message Exchange Patterns: Architecture and Routing

This article explains the fundamental message exchange patterns—including publish‑subscribe, fan‑out, unidirectional and bidirectional streaming—as well as routing models such as unicast, broadcast, multicast, and anycast, illustrating each with common technology examples.

Messaging · Streaming · multicast

8 min read
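The four routing models differ only in which subscribers receive a given message. The toy in‑memory router below is a sketch for illustration, not how Redis, Kafka, RabbitMQ, or any other broker implements routing; the `Router` class, its methods, and the subscriber and group names are hypothetical.

```python
import random
from typing import Dict, List, Set


class Router:
    """Hypothetical toy router: each subscriber has a name and joins zero or more groups."""

    def __init__(self) -> None:
        self.groups: Dict[str, Set[str]] = {}
        self.subscribers: Set[str] = set()

    def subscribe(self, name: str, *groups: str) -> None:
        self.subscribers.add(name)
        for g in groups:
            self.groups.setdefault(g, set()).add(name)

    def unicast(self, target: str) -> List[str]:
        # Exactly one explicitly addressed receiver.
        return [target] if target in self.subscribers else []

    def broadcast(self) -> List[str]:
        # Every subscriber receives the message.
        return sorted(self.subscribers)

    def multicast(self, group: str) -> List[str]:
        # Every member of one group receives the message.
        return sorted(self.groups.get(group, set()))

    def anycast(self, group: str, rng: random.Random) -> List[str]:
        # Any single member of the group receives it (chosen at random here).
        members = sorted(self.groups.get(group, set()))
        return [rng.choice(members)] if members else []


r = Router()
r.subscribe("a", "billing")
r.subscribe("b", "billing")
r.subscribe("c")
print(r.unicast("c"), r.broadcast(), r.multicast("billing"))
# ['c'] ['a', 'b', 'c'] ['a', 'b']
```

In queue‑style systems, a group of competing consumers behaves roughly like anycast: each message is handled by one member of the group.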

DataFunTalk

Jun 29, 2021 · Big Data

In-depth Analysis of Flink SQL 1.13 Features and Improvements

This article provides a comprehensive overview of Apache Flink SQL 1.13, detailing new Window TVF support, cumulate windows, performance optimizations, time‑zone handling, enhanced Hive compatibility, SQL client upgrades, DataStream‑Table conversion improvements, and outlines the roadmap for the upcoming 1.14 release.

DataStream · Flink · Hive Integration

15 min read

360 Tech Engineering

Jun 25, 2021 · Big Data

Introducing ULTRON: A Real‑Time Data Warehouse Platform Powered by FlinkSQL

ULTRON is a one‑stop real‑time data‑warehouse development platform built on FlinkSQL that unifies data integration, asset management, cluster deployment, modeling, ETL, OLAP analysis and governance, addressing the limitations of traditional batch‑oriented warehouses and simplifying streaming data workflows for developers.

Data Governance · FlinkSQL · Streaming

13 min read

Alibaba Terminal Technology

Jun 25, 2021 · Frontend Development

Mastering Web Multimedia Front‑End: A Complete Beginner’s Guide

This comprehensive guide introduces multimedia front‑end development, explains W3C media standards and HTML elements, explores media APIs, outlines playback scenarios and solutions, and details both consumer‑facing live video systems and production‑side tools such as streaming and video‑editing, while sharing Alibaba’s roadmap for the field.

Multimedia · Streaming · media APIs

25 min read

Sohu Tech Products

Jun 9, 2021 · Big Data

Real-time UV Counting with Flink, Hologres, and RoaringBitmap

This article explains how to implement both offline (T+1) and real‑time UV counting using Hologres with RoaringBitmap for high‑cardinality aggregation, and demonstrates a complete Flink‑Hologres pipeline—including table creation, streaming joins, windowed aggregation, and query examples—for fine‑grained user metric analysis.

Flink · Hologres · RoaringBitmap

11 min read
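Bitmaps suit UV counting because a distinct count over any combination of dimension values reduces to a bitwise OR followed by a population count. This sketch swaps in plain Python integers as uncompressed bitmaps instead of RoaringBitmap, and assumes users have already been mapped to dense integer IDs; the function names, dimension values, and IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_bitmaps(events: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Set bit `uid` in the bitmap of each dimension value (e.g., a channel)."""
    bitmaps: Dict[str, int] = defaultdict(int)
    for dim_value, uid in events:
        bitmaps[dim_value] |= 1 << uid
    return dict(bitmaps)


def uv(bitmaps: Dict[str, int], *dim_values: str) -> int:
    """Distinct users across the given dimension values: OR the bitmaps, then count set bits."""
    combined = 0
    for v in dim_values:
        combined |= bitmaps.get(v, 0)
    return bin(combined).count("1")


events = [("app", 1), ("app", 2), ("app", 1), ("web", 2), ("web", 5)]
bm = build_bitmaps(events)
print(uv(bm, "app"), uv(bm, "web"), uv(bm, "app", "web"))  # 2 2 3
```

Summing per‑channel counts would give 2 + 2 = 4, while the union correctly counts user 2 once. RoaringBitmap offers the same OR and cardinality operations on compressed containers, which is what keeps this approach practical at high cardinality.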

Top Architect

Jun 8, 2021 · Backend Development

Architectural Messaging Patterns: Exchange Architectures and Routing Methods

This article explains the fundamental messaging exchange architectures such as Pub‑Sub, Fanout, Unidirectional and Bidirectional streaming, and the routing patterns including Unicast, Broadcast, Multicast and Anycast, illustrating how they are used in systems like Redis, Kafka, RabbitMQ and IBM MQ to simplify communication between producers and consumers.

Fanout · Messaging · Streaming

8 min read

dbaplus Community

Jun 5, 2021 · Big Data

How Flink + Iceberg Transform Data Lakes for Real‑Time Streaming

This article explains the concept of data lakes, outlines a four‑layer open‑source architecture, presents several classic Flink‑Iceberg use cases, details why Iceberg was chosen, and describes the design of Flink’s streaming sink and upcoming community roadmap.

Apache Flink · Apache Iceberg · Big Data

14 min read

Big Data Technology & Architecture

Jun 3, 2021 · Big Data

Comparing Apache Pulsar and Apache Kafka: Architecture, Performance, Use Cases, and Ecosystem

This article compares Apache Pulsar and Apache Kafka across performance, architecture, features, and real‑world use cases, highlighting Pulsar’s multi‑layer design, scalability, client language support, ecosystem integrations, and operational advantages while providing detailed analysis of storage, messaging models, and community resources.

Apache Pulsar · Cloud Native · Message Queue

28 min read

ITFLY8 Architecture Home

Jun 3, 2021 · Big Data

Building a Real‑Time Flink Recommendation System: Architecture, Code & Deployment

This article walks through a complete Flink‑based recommendation system, detailing its v2.0 architecture, recommendation algorithms, front‑end and back‑end components, and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka services.

Big Data · Docker · Flink

10 min read

Big Data Technology & Architecture

Jun 1, 2021 · Big Data

Understanding Idle State Retention Time in Flink SQL

Flink SQL's idle state retention time feature prevents state explosion by automatically cleaning up state for keys that remain inactive beyond a configurable time window, requiring both minimum and maximum retention settings, with implementation details involving CleanupState, timers, and KeyedProcessFunctionWithCleanupState.

Flink · Idle State Retention · SQL

8 min read
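The interplay of the minimum and maximum retention settings can be shown with a small simulation. This is a simplified model following the general approach described for `CleanupState`: a cleanup timer is registered at `now + max`, and it is only pushed back when the existing timer would fire before `now + min`, which avoids registering a new timer on every record. The `IdleStateCleaner` class and its methods are hypothetical; this is not Flink's code.

```python
from typing import Dict


class IdleStateCleaner:
    """Hypothetical model of per-key idle-state cleanup with min/max retention (times in ms)."""

    def __init__(self, min_retention: int, max_retention: int) -> None:
        if max_retention < min_retention:
            raise ValueError("max_retention must be >= min_retention")
        self.min_retention = min_retention
        self.max_retention = max_retention
        self.state: Dict[str, int] = {}         # key -> aggregated value
        self.cleanup_time: Dict[str, int] = {}  # key -> currently registered cleanup timer
        self.timers_registered = 0

    def on_record(self, key: str, value: int, now: int) -> None:
        self.state[key] = self.state.get(key, 0) + value
        current = self.cleanup_time.get(key)
        # Move the timer only if it would delete state before min retention has elapsed.
        if current is None or now + self.min_retention > current:
            self.cleanup_time[key] = now + self.max_retention
            self.timers_registered += 1

    def advance_to(self, now: int) -> None:
        # Fire every timer that is due and drop the state of its key.
        for key in [k for k, t in self.cleanup_time.items() if t <= now]:
            del self.cleanup_time[key]
            self.state.pop(key, None)


c = IdleStateCleaner(min_retention=10, max_retention=20)
c.on_record("u1", 1, now=0)   # timer at 20
c.on_record("u1", 1, now=5)   # 5 + 10 <= 20: timer unchanged
c.on_record("u1", 1, now=12)  # 12 + 10 > 20: timer moved to 32
c.advance_to(32)              # idle since 12, so u1's state is dropped
```

In this model an idle key's state is removed between `min` and `max` after its last record, and an active key registers at most one new timer per `max - min` interval, which is why a wider gap between the two settings reduces timer churn.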

New Oriental Technology

May 31, 2021 · Fundamentals

Live Streaming Network Transmission: Protocols, Encoding, Decoding, and Synchronization

This article explains the end‑to‑end live‑streaming workflow, covering how a broadcaster pushes video and audio to a server, the various streaming protocols (RTMP, HTTP‑FLV, HLS, RTP), encoding formats, FFmpeg‑based decoding, hardware vs software decoding, and audio‑video synchronization techniques.

Audio-Video Sync · Network Protocols · Streaming

26 min read

IT Architects Alliance

May 30, 2021 · Big Data

NetEase Game Streaming ETL Architecture and Practices Based on Flink

This article presents NetEase Game's Flink‑based streaming ETL system, detailing business background, log classifications, specialized and generic ETL services, Python UDF integration, runtime optimizations, HDFS write tuning, SLA metrics, fault‑tolerance mechanisms, and future roadmap for unified data lakes and PyFlink support.

Big Data · Data Integration · ETL

19 min read

dbaplus Community

May 19, 2021 · Big Data

Why Kafka 2.8 Is Dropping Zookeeper: Architecture, Challenges, and the Path to KIP‑500

The article explains how Kafka 2.8 begins removing its dependency on ZooKeeper with an early-access mode, detailing Kafka's core concepts, ZooKeeper's role in broker registration, load balancing, and controller election, the operational drawbacks of this coupling, and how KIP‑500's Quorum Controller modernizes the architecture.

Distributed Systems · KIP-500 · Kafka

10 min read

Byte Quality Assurance Team

May 19, 2021 · Big Data

Streaming 102: The World Beyond Batch

This article extends the concepts introduced in Streaming 101 by deeply exploring data processing patterns for unbounded data, covering windowing, watermarks, triggers, accumulation modes, and their practical implications for building robust low‑latency streaming pipelines.

Big Data · Streaming · Triggers

14 min read
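How event‑time windows, watermarks, triggers, and accumulation modes fit together can be sketched in a few dozen lines. This is a simplified model, not any framework's API: the `FixedWindowSum` class is hypothetical, it uses fixed (tumbling) windows, the caller supplies the watermark, a window fires once the watermark passes its end, and late elements re‑fire the window with the accumulated sum.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pane = Tuple[int, int, int, str]  # (window_start, window_end, sum, timing)


class FixedWindowSum:
    """Hypothetical event-time tumbling-window sum: fires on the watermark, re-fires for late data."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.sums: Dict[int, int] = defaultdict(int)  # window start -> accumulated sum
        self.fired: Set[int] = set()                  # windows already emitted on time
        self.watermark = float("-inf")

    def on_element(self, event_time: int, value: int) -> List[Pane]:
        # Assign the element to its window by event time, not arrival time.
        start = event_time - event_time % self.size
        self.sums[start] += value
        if start in self.fired:
            # Late element for an already-fired window: emit an updated pane
            # carrying the accumulated sum (accumulating mode).
            return [(start, start + self.size, self.sums[start], "LATE")]
        return []

    def on_watermark(self, watermark: int) -> List[Pane]:
        # The watermark asserts earlier events have arrived; fire every window it has passed.
        self.watermark = max(self.watermark, watermark)
        panes: List[Pane] = []
        for start in sorted(self.sums):
            end = start + self.size
            if end <= self.watermark and start not in self.fired:
                self.fired.add(start)
                panes.append((start, end, self.sums[start], "ON_TIME"))
        return panes


w = FixedWindowSum(10)
w.on_element(3, 1)
w.on_element(7, 2)
w.on_element(12, 5)
print(w.on_watermark(10))  # [(0, 10, 3, 'ON_TIME')]
print(w.on_element(8, 4))  # [(0, 10, 7, 'LATE')]
print(w.on_watermark(20))  # [(10, 20, 5, 'ON_TIME')]
```

Returning only `value` instead of the running sum in the late branch would model discarding mode. The sketch never garbage‑collects fired windows; a real pipeline would bound allowed lateness and drop window state once it expires.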

vivo Internet Technology

May 12, 2021 · Backend Development

Understanding RTMP Protocol and Livego Source Code Analysis

The article explains RTMP’s multiplexed, packetized streaming over TCP, detailing its chunk structure, message types, handshake, and connection workflow, then demonstrates livego’s publishing and pulling processes, discusses typical latency sources, and offers mitigation strategies and reference resources for developers.

Go · Livego · RTMP

28 min read

DataFunTalk

May 11, 2021 · Big Data

Design and Practice of Baixin Bank's Flink‑Based Real‑Time Computing Platform and Hudi‑Powered Real‑Time Data Lake

This article details Baixin Bank's construction of a Flink‑driven real‑time computing platform integrated with Hudi as a real‑time data lake, covering background, architecture, data collection, transformation, storage layers, technical challenges, future roadmap, and practical lessons for similar big‑data initiatives.

Big Data · Flink · Hudi

12 min read

IT Architects Alliance

May 11, 2021 · Big Data

Demystifying Kafka: Core Concepts of Topics, Partitions, and Architecture

This article provides a clear, visual walkthrough of Kafka’s fundamental architecture, explaining how producers and consumers interact, the role of topics and partitions, consumer groups, and ZooKeeper’s coordination, helping readers grasp message flow, storage, ordering, and fault‑tolerance in a distributed streaming system.

Kafka · Message Queue · Partition

6 min read

DataFunTalk

May 2, 2021 · Big Data

Continuous Optimization and Practice of Flink at Kuaishou

This article presents Kuaishou's comprehensive engineering practices for improving Flink's stability, task startup latency, and SQL performance, including high‑availability Kafka connectors, fault‑recovery mechanisms, I/O reductions, asynchronous job upgrades, aggregation optimizations, and future resource‑utilization plans.

Big Data · Flink · Kafka

10 min read

Programmer DD

Apr 30, 2021 · Big Data

Kafka 2.8.0 Release: Say Goodbye to ZooKeeper with Raft Metadata Mode

Kafka 2.8.0, released on April 19, 2021, introduces an early-access Raft metadata mode that lets Kafka run without ZooKeeper, alongside numerous new features, bug fixes, and enhancements such as API controls for stream threads, SASL_SSL mutual TLS, and IP rate limiting.

Big Data · Kafka · Raft

5 min read

Big Data Technology & Architecture

Apr 24, 2021 · Big Data

Integrating Apache Flink 1.12.2 with Apache Hudi: Batch and Streaming Modes

This article walks through downloading the required Flink and Hudi components, building Hudi for Scala 2.12, and demonstrates step‑by‑step how to create, populate, query, and update Hudi tables in both batch and streaming modes using Flink SQL, complete with code snippets and result screenshots.

Apache · Batch · Data Lake

8 min read

DataFunTalk

Apr 23, 2021 · Big Data

Building and Evolving Zhihu’s Flink‑Based Data Integration Platform

This article details Zhihu’s transition from a Sqoop‑driven data integration system to a Flink‑centric platform, covering business scenarios, historical architecture, design goals, technology choices, performance optimizations, and future plans for unified streaming‑batch processing across diverse storage systems.

Batch Processing · Big Data · Data Integration

14 min read

Big Data Technology & Architecture

Apr 23, 2021 · Big Data

Reading HBase with Flink 1.12 – Environment Setup, Code Samples, and Result

This article demonstrates how to configure Flink 1.12 to read data from HBase, covering the required environment components, HBase table creation, Maven dependencies, Java POJO and Flink‑SQL code, and showing the query results with and without printing the TableResult.

Flink · HBase · Java

11 min read

Laravel Tech Community

Apr 22, 2021 · Big Data

Apache Kafka 2.8.0 Release Highlights and New Features

Apache Kafka 2.8.0 introduces several significant enhancements, including a new group API, mutual TLS authentication for SASL_SSL listeners, JSON request/response logging, broker connection rate limiting, topic identifiers, an early-access self‑managed quorum that can replace ZooKeeper, and numerous improvements to the Streams and Connect APIs for more reliable real‑time data pipelines.

Apache Kafka · Big Data · Distributed Systems

2 min read

Tencent Cloud Developer

Apr 14, 2021 · Cloud Native

Apache Pulsar Meetup Shenzhen: Cloud-Native Distributed Messaging and Streaming Platform

The Apache Pulsar Meetup in Shenzhen on April 17, 2021, co‑hosted by Tencent Middleware and StreamNative, will showcase Pulsar’s cloud‑native messaging, streaming, and storage capabilities through sessions on KoP migration, big‑data and IoT use cases, cloud‑native deployments, and the StreamNative Cloud Pulsar‑as‑a‑Service offering.

Apache Pulsar · Distributed Systems · Meetup

7 min read

ByteFE

Apr 13, 2021 · Frontend Development

Streaming Server‑Side Rendering in React: Concepts, lazy, Suspense, and Implementation

This article explains the principles of streaming server‑side rendering (SSR) in React, compares it with traditional client‑side rendering, and demonstrates how lazy loading and Suspense can be used together with streaming SSR to parallelize data and JavaScript delivery for faster first‑paint and improved user experience.

@Lazy · React · SSR

10 min read

Programmer DD

Mar 29, 2021 · Big Data

Mastering Kafka: High‑Throughput Distributed Messaging Explained

This comprehensive guide introduces Kafka as a high‑throughput, distributed, publish‑subscribe messaging system, detailing its core concepts, architecture, features, replication, log management, reliability guarantees, and typical use cases such as log collection, real‑time analytics, and cross‑cluster mirroring.

Big Data · Distributed Messaging · Kafka

15 min read

Big Data Technology & Architecture

Mar 25, 2021 · Big Data

Netflix Real-Time Analytics Architecture Using Apache Druid

The article details how Netflix collects massive real‑time device logs, streams them through Kafka into Apache Druid, and uses this high‑performance analytical database to monitor, query, and continuously improve user experience at a scale of over two million events per second.

Apache Druid · Netflix · Real-time analytics

0 likes · 13 min read

Big Data Technology & Architecture

Mar 18, 2021 · Big Data

Flink Job Troubleshooting and Performance Optimization: Data Skew, Kafka Configuration, Resource Management, and Checkpoint Issues

This article details common Flink streaming problems such as data skew causing task back‑pressure, oversized Kafka messages, high‑throughput ack settings, slot removal errors, checkpoint timeouts, and resource constraints, and provides concrete configuration changes and architectural adjustments to resolve them.
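
The summary does not say which skew remedy the article applies. One widely used option for a hot key is two-stage aggregation with key salting; the Python sketch below is a hedged, single-process model of that idea, not Flink code. `salt` and `two_stage_count` are hypothetical names, and the `#<n>` suffix scheme is an assumption. Stage 1 stands in for a keyBy on the salted key, stage 2 for a keyBy on the original key.

```python
import random
from collections import Counter


def salt(key: str, num_salts: int, rng: random.Random) -> str:
    # Hypothetical scheme: append "#<n>" so one hot key becomes up to num_salts sub-keys.
    return f"{key}#{rng.randrange(num_salts)}"


def two_stage_count(keys, num_salts=8, seed=0):
    rng = random.Random(seed)
    # Stage 1 (a keyBy on the salted key in a real job): partial counts per sub-key,
    # so records of a hot key can be spread over up to num_salts parallel subtasks.
    partial = Counter(salt(k, num_salts, rng) for k in keys)
    # Stage 2 (a keyBy on the original key): strip the salt and merge partial counts.
    final = Counter()
    for salted_key, count in partial.items():
        original = salted_key.rsplit("#", 1)[0]
        final[original] += count
    return partial, final


keys = ["hot"] * 1000 + ["cold"] * 10
partial, final = two_stage_count(keys)
print(final)  # same totals as an unsalted count
```

The cost is an extra shuffle and more intermediate state, so the salt count is a tuning knob rather than a fixed rule.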

Checkpoint · Data Skew · Flink

0 likes · 18 min read

Big Data Technology & Architecture

Mar 16, 2021 · Big Data

Using Flink Upsert‑Kafka Connector for Real‑Time Data Aggregation and TiDB Synchronization

This article explains the upsert‑kafka connector in Flink, its configuration parameters, step‑by‑step usage with SQL examples, and demonstrates a complete pipeline that reads Kafka streams, aggregates page view metrics, and synchronizes the results to TiDB in real time.

Flink · SQL · Streaming

0 likes · 13 min read

DataFunTalk

Mar 15, 2021 · Big Data

Ten Gotchas When Migrating Spark Jobs to Flink

This article shares ten practical pitfalls encountered while moving hour‑level Spark session processing jobs to Apache Flink, covering parallelism skew, state TTL, checkpoint handling, logging, debugging, state migration, Reduce vs Process, input validation, event‑time handling, and the trade‑offs of storing data inside Flink.

Big Data · Flink · State Management

0 likes · 19 min read

Big Data Technology & Architecture

Mar 15, 2021 · Big Data

Implementation and Usage of Flink FileSystem, JDBC, and Kafka Connectors

The article provides a comprehensive technical guide on Flink's FileSystem, JDBC, and Kafka connectors, detailing their source and sink implementations, core code logic, checkpoint handling, partition commit strategies, and complete SQL usage examples for streaming applications.

Connector · Filesystem · Flink

0 likes · 25 min read

DataFunTalk

Mar 7, 2021 · Big Data

Building Stream‑Batch Integrated ETL with Flink SQL: Data Warehouse and Data Integration

This article explains how Flink SQL can be used to construct a unified stream‑batch ETL pipeline for data warehouses and data lakes, covering data integration, CDC support, streaming writes to Hive and Iceberg, and various join techniques such as regular, interval, and temporal joins.

CDC · Data Integration · ETL

0 likes · 20 min read

360 Smart Cloud

Mar 4, 2021 · Information Security

Improving Large-Scale Regex Matching Performance with Hyperscan and Flink

This article explains how to boost the efficiency of massive regular‑expression matching by using Intel's Hyperscan library, integrating it with Apache Flink for streaming processing, and providing deployment guidelines for both private and internal environments.

Flink · Security · Streaming

0 likes · 10 min read

Big Data Technology & Architecture

Mar 2, 2021 · Big Data

An Introduction to Kafka Connect: Architecture, Components, and Hands‑On Setup

This article introduces Kafka Connect, explaining its purpose as a scalable and reliable tool for moving data between Apache Kafka and external systems, detailing its core concepts, architecture, deployment modes, configuration files, and a step‑by‑step example that streams data from a file source to a file sink.

Data Integration · ETL · Streaming

0 likes · 12 min read

Laravel Tech Community

Feb 28, 2021 · Big Data

Apache Beam 2.28.0 Release Highlights and New Features

Apache Beam 2.28.0 introduces extensive Parquet support, new hash functions in BeamSQL and ZetaSQL, ApproximateDistinct based on HyperLogLog (HLL), and I/O improvements including Numeric field support in SpannerIO, schema support in ParquetIO, Thrift support in KafkaTableProvider, and an option to skip key/value cloning in HadoopFormatIO, among other changes.

Apache Beam · Batch · Big Data

0 likes · 3 min read

dbaplus Community

Feb 23, 2021 · Big Data

How NetEase Game Teams Built a Scalable Flink‑Based Streaming ETL Platform

This article explains how NetEase games collect heterogeneous logs, design a Flink‑driven streaming ETL pipeline, handle schema‑free sources, implement Python UDFs with Jython, optimize HDFS writes, manage real‑time and offline warehouses, and share practical tuning and fault‑tolerance techniques.

ETL · Flink · Hive

0 likes · 22 min read

DataFunTalk

Feb 22, 2021 · Big Data

Optimizing Flink Real-Time Task Resources: Memory and Message Processing Perspectives

This article explores practical methods for optimizing Flink real‑time task resources on Kubernetes, focusing on memory usage analysis via GC logs and message‑processing capacity assessment, proposing automated detection of over‑provisioned memory and CPU, and outlining a workflow for resource adjustment to reduce costs.

Big Data · Flink · GC Analysis

0 likes · 18 min read

dbaplus Community

Feb 18, 2021 · Big Data

How JD Search Scaled Real‑Time Analytics with Flink and Doris

This article details JD Search's journey from a Storm‑based pipeline to a Flink‑driven architecture backed by Apache Doris, covering business requirements, technical challenges, design trade‑offs, performance optimizations for massive traffic spikes, and future plans for their real‑time OLAP data warehouse.

Big Data · Flink · OLAP

0 likes · 12 min read

Sohu Tech Products

Feb 17, 2021 · Big Data

Dynamic Broadcast State and Data Partitioning in an Apache Flink Fraud Detection Engine

This article demonstrates how to initialize, broadcast, and dynamically update rule sets in an Apache Flink fraud detection pipeline, using BroadcastProcessFunction and MapState to achieve runtime data partitioning without recompiling, and explains the underlying data exchange patterns such as forward, hash, rebalance, and broadcast.

Apache Flink · Broadcast State · Dynamic Key Function

0 likes · 11 min read

DataFunTalk

Feb 17, 2021 · Big Data

Apache Iceberg 0.11.0: New Partition Support, SortOrder, Flink Streaming Reader, and Ecosystem Integrations

The article details Apache Iceberg 0.11.0's core enhancements—including partition changes, SortOrder, extensive Flink and Spark integrations, CDC/Upsert support, hash‑based write distribution to reduce small files, and upcoming 0.12.0 roadmap—while providing practical SQL and API examples for data‑lake practitioners.

Apache Iceberg · Big Data · CDC

0 likes · 13 min read

DataFunTalk

Feb 15, 2021 · Big Data

Flink-Driven Incremental Data Warehouse Production at Meituan: Architecture, Streaming Integration, and Future Plans

This article presents Meituan's use of Flink to enable incremental data warehouse production, covering the warehouse architecture, streaming data integration evolution, real-time OLAP applications, platform design, and future directions for unified stream‑batch processing.

Big Data · Flink · Incremental Processing

0 likes · 11 min read

Big Data Technology & Architecture

Feb 7, 2021 · Big Data

Building a Flink SQL Platform on Zeppelin: Installation, Configuration, and Advanced Use Cases

This guide walks through setting up Apache Zeppelin as a low‑cost, SQL‑centric development platform for Flink, covering environment preparation, installation, interpreter configuration, execution modes, verification, common pitfalls, dimension‑table joins, custom UDFs, Redis integration, and dual‑stream join techniques.

Flink · SQL · Streaming

0 likes · 24 min read

Big Data Technology & Architecture

Feb 1, 2021 · Big Data

Flink 1.12 Enhancements: Full SQL Support, Hive Integration, and Streaming Write to Hive

The article reviews Flink 1.12's major enhancements, including comprehensive SQL capabilities, deep integration with Hive via catalog and streaming support, and a practical code example that demonstrates how to write streaming data into Hive tables while handling partition commits and small‑file merging.

Data Integration · Flink · Hive

0 likes · 7 min read

Full-Stack Internet Architecture

Feb 1, 2021 · Big Data

Kafka Overview: Architecture, Advantages, Disadvantages, and Core Concepts

This article provides a comprehensive introduction to Apache Kafka, covering its distributed publish‑subscribe architecture, its key components such as brokers, topics, partitions, producers, consumers, and ZooKeeper, as well as its advantages, drawbacks, storage mechanisms, partition assignment strategies, and reliability guarantees for high‑throughput big‑data streaming.
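
Among the partition assignment strategies the article covers, Kafka's range assignor is simple to model. The Python function below is a simplified sketch for a single topic, assuming members are ordered by sorting their ids; `range_assign` is a hypothetical name, not a Kafka client API.

```python
def range_assign(consumers: list[str], num_partitions: int) -> dict[str, list[int]]:
    """Split partitions 0..num_partitions-1 of one topic into contiguous ranges."""
    members = sorted(consumers)  # sort members so the assignment is deterministic
    if not members:
        return {}
    per_member, extra = divmod(num_partitions, len(members))
    assignment: dict[str, list[int]] = {}
    for i, member in enumerate(members):
        # The first `extra` members each take one additional partition.
        start = per_member * i + min(i, extra)
        length = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + length))
    return assignment


print(range_assign(["c2", "c1", "c3"], 7))
# {'c1': [0, 1, 2], 'c2': [3, 4], 'c3': [5, 6]}
```

Because ranges are computed per topic, the lowest-sorted members can accumulate extra partitions when a group subscribes to several topics, which is one reason Kafka also ships round-robin and sticky assignors.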

Big Data · Distributed Systems · Message Queue

0 likes · 20 min read

DataFunTalk

Feb 1, 2021 · Big Data

Building a Real-Time Data Warehouse with Apache Flink and Apache Iceberg: Architecture, Challenges, and Best Practices

This article presents Tencent's experience of constructing a real‑time data warehouse by integrating Apache Flink with Apache Iceberg, covering background pain points, Iceberg's table format and capabilities, Flink‑Iceberg streaming and batch processing, practical implementations, and future roadmap for data‑lake acceleration.

Apache Flink · Apache Iceberg · Big Data

0 likes · 21 min read

iQIYI Technical Product Team

Jan 29, 2021 · Cloud Computing

iQIYI Introduces CUVA HDR Standard Support and Explores the Ultra‑HD Video Industry

iQIYI becomes the first video platform to deliver content meeting the China Ultra‑High‑Definition Video Industry Alliance (CUVA) HDR standard, enabling devices such as the Xiaomi 10 Pro, Huawei P30 Pro, iPhone 11 Pro and iPhone XS Max to display richer colors, higher contrast, and greater visual depth, while fostering an open, industry‑wide ecosystem and planning future CUVA HDR live‑streaming support.

CUVA · HDR · Streaming

0 likes · 5 min read

Hulu Beijing

Jan 25, 2021 · Product Management

What Hulu’s Generation Stream Study Reveals About the Next‑Gen TV Audience

Hulu’s Generation Stream study, conducted with Culture Co‑Op and industry experts, surveyed 2,500 U.S. viewers aged 13‑54, revealing that 90% watch TV via streaming, identifying three audience segments—‘Stream Most’, ‘Stream Only’, and ‘Stream Also’—and highlighting how streaming reshapes viewing habits and content expectations.

Generation Stream · Hulu · Streaming

0 likes · 6 min read

Efficient Ops

Jan 17, 2021 · Big Data

Understanding Kafka: Core Concepts, Architecture, and Performance Secrets

This article introduces Kafka’s fundamental role as a messaging system, explains topics, partitions, producers, consumers, replicas, consumer groups, and the controller, and explores its cluster architecture, performance optimizations like sequential writes and zero-copy, providing a comprehensive overview for building scalable data pipelines.
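
To make the link between record keys, partitions, and ordering concrete, here is a Python sketch of key-based routing. It uses `zlib.crc32` as a stand-in hash; Kafka's default partitioner hashes keys with murmur2, so the partition numbers produced here will not match a real cluster. `choose_partition` and `route` are hypothetical names.

```python
import zlib
from collections import defaultdict


def choose_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in hash (crc32), masked to a non-negative 31-bit value, modulo partition count.
    # Kafka itself uses murmur2 here, so these numbers will differ from a real cluster.
    return (zlib.crc32(key) & 0x7FFFFFFF) % num_partitions


def route(records, num_partitions):
    """Append (key, value) records to per-partition logs in arrival order."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return partitions


records = [(b"user-1", "login"), (b"user-2", "login"), (b"user-1", "click"), (b"user-1", "logout")]
logs = route(records, num_partitions=4)
p = choose_partition(b"user-1", 4)
print([v for k, v in logs[p] if k == b"user-1"])  # ['login', 'click', 'logout']
```

Ordering is therefore guaranteed only within a partition, and adding partitions later changes `hash % n`, which remaps existing keys.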

Big Data · Distributed Systems · Message Queue

0 likes · 11 min read

Top Architect

Jan 17, 2021 · Big Data

Migrating LinkedIn’s Who Viewed Your Profile System from Lambda Architecture to a Lambda‑less Architecture

This article describes how LinkedIn’s Who Viewed Your Profile feature was originally built on a Lambda architecture, the operational challenges it caused, and the step‑by‑step migration to a streamlined, Samza‑driven, Lambda‑less design that improves performance, reduces maintenance overhead, and retains essential batch capabilities.

Lambda architecture · LinkedIn · Pinot

0 likes · 11 min read

Alibaba Cloud Developer

Jan 11, 2021 · Backend Development

How Streaming Output and Reactive Programming Boost Web Performance

This article explains the concepts of streaming output and reactive programming, describes the underlying HTTP chunked transfer, SSE, WebSocket and RSocket protocols, provides code examples, and outlines practical scenarios where end‑to‑end streaming improves performance and user experience.
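
HTTP chunked transfer is the most basic of the transports listed. The Python sketch below shows the framing a server emits when it flushes output in pieces (for example, the top of an HTML page before the rest is ready) and how a client reassembles it. It is illustrative only, not a full HTTP implementation; `encode_chunked` and `decode_chunked` are hypothetical names, and web frameworks normally apply this framing automatically.

```python
def encode_chunked(chunks):
    """Yield HTTP/1.1 chunked framing for an iterable of byte strings."""
    for chunk in chunks:
        if not chunk:
            continue  # an empty chunk would be read as the end of the body
        # Each chunk: payload size in hex, CRLF, payload, CRLF.
        yield f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk marker followed by an empty trailer section


def decode_chunked(body: bytes) -> bytes:
    """Reassemble a chunked body (ignores chunk extensions and trailer fields)."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            return bytes(out)
        out += body[pos:pos + size]
        pos += size + 2  # skip the payload and its trailing CRLF


wire = b"".join(encode_chunked([b"<html><body>", b"first screen", b"</body></html>"]))
print(decode_chunked(wire))  # b'<html><body>first screen</body></html>'
```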

Backend · HTTP Chunked Transfer · SSE

0 likes · 18 min read

Big Data Technology & Architecture

Jan 10, 2021 · Big Data

Integrating Apache Flink 1.12 with Hive: Configuration, Catalog, Planner, and UDF Usage

This guide explains how to integrate Flink 1.12 with Hive using HiveCatalog, covering required dependencies, Blink planner configuration, SQL dialect switching, Hive UDF support, temporal table joins, and provides complete code snippets for a streaming‑batch unified data warehouse solution.

Blink Planner · Flink · Hive

0 likes · 16 min read

Big Data Technology & Architecture

Jan 9, 2021 · Big Data

Comprehensive 2021 Flink Interview Questions and Answers

This article presents a detailed collection of 2021 Flink interview questions covering checkpoint mechanisms, watermarks, state backends, join types, fault tolerance, resource configuration, and recent Flink 1.10 features, providing concise explanations and code examples for each topic.
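
As a small companion to the watermark questions, the Python class below models the bounded out‑of‑orderness strategy as we understand Flink implements it (watermark = largest event timestamp seen, minus the allowed delay, minus 1 ms), together with the event‑time window firing condition. It is a plain Python model, not Flink's API; `BoundedOutOfOrderness` and `window_fires` are hypothetical names.

```python
class BoundedOutOfOrderness:
    """Model of a periodic watermark generator that tolerates a fixed lateness."""

    def __init__(self, max_out_of_orderness_ms: int):
        self.delay_ms = max_out_of_orderness_ms
        self.max_ts = None  # largest event timestamp seen so far

    def on_event(self, event_ts_ms: int) -> None:
        if self.max_ts is None or event_ts_ms > self.max_ts:
            self.max_ts = event_ts_ms

    def current_watermark(self):
        # Events at or below this timestamp are no longer expected.
        if self.max_ts is None:
            return None
        return self.max_ts - self.delay_ms - 1


def window_fires(window_end_ms: int, watermark) -> bool:
    # An event-time window [start, end) fires once the watermark reaches end - 1.
    return watermark is not None and watermark >= window_end_ms - 1


gen = BoundedOutOfOrderness(max_out_of_orderness_ms=5_000)
for ts in (10_000, 12_000, 11_000):  # 11_000 arrives out of order
    gen.on_event(ts)
wm = gen.current_watermark()
print(wm, window_fires(7_000, wm), window_fires(10_000, wm))  # 6999 True False
```

A larger delay tolerates more disorder at the cost of later results, which is the usual trade-off interview questions on watermarks probe.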

Checkpoint · Flink · State Backend

0 likes · 23 min read

Big Data Technology & Architecture

Jan 5, 2021 · Big Data

Setting Up Apache Spark Standalone with Docker and Using Apache Zeppelin for Data Processing

This guide demonstrates how to build a Docker‑based Spark standalone environment, configure Apache Zeppelin to connect to it, and perform data analysis on local CSV files, HDFS, and streaming sources such as Twitter and Kafka, with complete code examples.

Apache Zeppelin · Docker · Scala

0 likes · 10 min read

58 Tech

Jan 4, 2021 · Big Data

Building a Real‑Time Data Warehouse with Flink: Architecture, Implementation and Lessons Learned

This article describes how a fast‑growing company built a layered real‑time data warehouse on Flink, detailing the evolution from a simple 1.0 pipeline to a 2.0 architecture with ODS, DWD and ADS layers, dimension joins, exactly‑once sinks, HDFS partitioning, monitoring, and future improvements.

Big Data · ETL · Flink

0 likes · 14 min read

Big Data Technology & Architecture

Dec 29, 2020 · Databases

Setting Up and Using the MySQL CDC Connector with Apache Flink

This article provides a step‑by‑step guide on configuring the MySQL CDC connector for Flink, covering Maven and SQL client dependencies, MySQL user setup, connector options, table creation via SQL and Stream API, key features, common issues, and practical troubleshooting tips.

CDC · Connector · Flink

0 likes · 10 min read

Big Data Technology & Architecture

Dec 25, 2020 · Big Data

Implementing Custom Source and Sink in Flink Streaming with RocketMQ and HBase

This article details how to migrate Spark Streaming jobs to Flink Streaming by creating custom SourceFunction and SinkFunction implementations, including a RocketMQ source connector and an HBase sink, with code examples, configuration tips, and discussion of checkpointing and watermark handling.

Flink · HBase · RocketMQ

0 likes · 20 min read

iQIYI Technical Product Team

Dec 11, 2020 · Fundamentals

Analysis of the MSU World Video Codec Competition and the Current State of the AV1 Ecosystem

The MSU World Video Codec Competition highlighted iQIYI’s QAV1 encoder achieving faster speeds and superior compression compared to H.265, while the expanding AV1 ecosystem—bolstered by widespread hardware decoding, platform adoption, and recent QAV1 enhancements such as 8K, HDR, and improved rate‑distortion optimization—promises higher quality video at lower bandwidth costs.

AV1 · Codec Competition · QAV1

0 likes · 7 min read

Programmer DD

Dec 9, 2020 · Big Data

Master Apache Beam: Build a Portable Word Count Pipeline in Minutes

This tutorial introduces Apache Beam’s unified programming model for batch and streaming, explains its core concepts and terminology, compares it with other runners, and walks through a complete Java word‑count example—including dependencies, pipeline construction, transforms, and execution with DirectRunner.

Apache Beam · Dataflow · Distributed Processing

0 likes · 8 min read

DataFunTalk

Dec 6, 2020 · Artificial Intelligence

Building an AI Ecosystem with Flink: Overview of AI Flow and Its Architecture

This article explains how Flink enables end‑to‑end machine‑learning workflows through AI Flow, covering the background of Lambda architecture, AI task stages, the advantages of Flink, AI Flow components, AI Graph concepts, integration with Python and TensorFlow, and a real‑world advertising recommendation use case.

AI Flow · Flink · Real-Time

0 likes · 14 min read

DataFunTalk

Dec 3, 2020 · Big Data

Streaming Data Lake Ingestion with Apache Flink and Apache Iceberg

This article explains how Apache Flink integrates with data lake architectures, especially using Apache Iceberg as a table format, to enable real‑time streaming ingestion, CDC processing, near‑real‑time lambda architectures, and future enhancements like automatic file merging and row‑level deletes.

Apache Iceberg · Data Lake · Flink

0 likes · 13 min read

DataFunSummit

Dec 1, 2020 · Artificial Intelligence

Building an AI Ecosystem with Flink: AI Flow Architecture, Components, and Applications

This article explains how Flink enables end‑to‑end AI workflows through the AI Flow platform, covering the Lambda architecture background, AI task pipeline stages, the reasons for choosing Flink, AI Flow’s graph model, core services, integration with ML pipelines, and real‑world advertising recommendation use cases.

AI Flow · AI Pipeline · Big Data

0 likes · 12 min read

Java High-Performance Architecture

Nov 18, 2020 · Big Data

Why Pulsar Might Outperform Kafka: Key Advantages and Drawbacks

This article examines Apache Pulsar, an open‑source messaging platform created by Yahoo, compares it with Kafka by outlining Kafka’s common pain points, highlights Pulsar’s multi‑tenant architecture, layered storage, built‑in functions, and security features, and discusses the trade‑offs of each solution.

Apache Pulsar · Big Data · Distributed Systems

0 likes · 6 min read

DataFunTalk

Nov 17, 2020 · Artificial Intelligence

Alink: A Flink‑Based Machine Learning Platform – Overview, Features, and Quick‑Start Guide

This article introduces Alink, Alibaba's open‑source machine‑learning platform built on Flink, explains its core algorithms, performance comparison with Spark ML, version‑wise feature evolution, and provides practical quick‑start instructions for both Java (Maven) and Python (PyAlink) users, including data source handling, type conversion components, unified file‑system operations, and an overview of its FM algorithm implementation.

Alink · Batch Processing · Data Integration

0 likes · 13 min read

Big Data Technology & Architecture

Nov 16, 2020 · Big Data

Understanding Spark Streaming Backpressure Mechanism and Source Code Analysis

This article explains why Spark Streaming introduced backpressure, how the dynamic rate‑control mechanism works, and provides a detailed walkthrough of the relevant source code, including the RateController class, its registration, and the execution flow that adjusts ingestion rates to match processing capacity.
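
The RateController described here delegates the actual calculation to a rate estimator, which in Spark is a PID controller (`PIDRateEstimator`). The Python class below is a simplified re-expression of that control loop as we understand it, not Spark's code. The default gains (proportional 1.0, integral 0.2, derivative 0.0) and the 100 records/s floor are assumed to mirror Spark's `spark.streaming.backpressure.pid.*` defaults and should be checked against the Spark version in use.

```python
class PIDRateEstimator:
    """Simplified model of a PID-based ingestion-rate estimator (records per second)."""

    def __init__(self, batch_interval_ms, proportional=1.0, integral=0.2,
                 derivative=0.0, min_rate=100.0):
        self.batch_interval_ms = batch_interval_ms
        self.kp, self.ki, self.kd = proportional, integral, derivative  # assumed defaults
        self.min_rate = min_rate
        self.first_run = True
        self.latest_time = -1
        self.latest_rate = -1.0
        self.latest_error = -1.0

    def compute(self, time_ms, num_elements, processing_delay_ms, scheduling_delay_ms):
        """Return a new rate limit after a completed batch, or None if there is no update."""
        if time_ms <= self.latest_time or num_elements <= 0 or processing_delay_ms <= 0:
            return None
        delay_since_update = (time_ms - self.latest_time) / 1000.0
        processing_rate = num_elements / processing_delay_ms * 1000.0
        # P term: how far the previous rate exceeds the observed processing rate.
        error = self.latest_rate - processing_rate
        # I term: accumulated backlog, estimated from the scheduling delay.
        historical_error = scheduling_delay_ms * processing_rate / self.batch_interval_ms
        # D term: change in error per second since the last update.
        d_error = (error - self.latest_error) / delay_since_update
        new_rate = max(self.latest_rate - self.kp * error
                       - self.ki * historical_error
                       - self.kd * d_error, self.min_rate)
        self.latest_time = time_ms
        if self.first_run:
            # The first batch only establishes a baseline; no limit is published yet.
            self.latest_rate, self.latest_error, self.first_run = processing_rate, 0.0, False
            return None
        self.latest_rate, self.latest_error = new_rate, error
        return new_rate


est = PIDRateEstimator(batch_interval_ms=1_000)
est.compute(1_000, 5_000, 500, 0)             # baseline: 10,000 records/s observed
print(est.compute(2_000, 4_000, 1_000, 500))  # 3600.0
```

In this run the new limit (3,600 records/s) lands below the observed processing rate (4,000 records/s) because the scheduling delay signals a backlog that the integral term works off.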

RateController · RateLimiter · Spark

0 likes · 14 min read

DataFunSummit

Nov 10, 2020 · Artificial Intelligence

Alink: An Open‑Source Machine Learning Platform on Flink – Features, Performance, and Quick‑Start Guide

This article introduces Alink, Alibaba's open‑source machine‑learning platform built on Flink, detailing its core algorithms, performance advantages over Spark ML, version evolution, Maven and PyAlink installation steps, data‑source integrations, FM algorithm support, and unified file‑system operations for both batch and streaming workloads.

Alink · Flink · PyAlink

0 likes · 11 min read

Top Architect

Nov 9, 2020 · Cloud Computing

Design Analysis of Netflix's Cloud‑Based Microservices Architecture

This article examines how Netflix migrated its video‑streaming platform to AWS, adopted a microservices architecture, and built a global CDN, detailing the system’s components, design goals such as high availability, low latency and scalability, and the trade‑offs and resilience techniques employed.

AWS · Netflix · Streaming

0 likes · 23 min read

360 Tech Engineering

Nov 6, 2020 · Big Data

Guide to Flink SQL: Features, Scenarios, and Productization

Flink SQL, the high‑level SQL interface for Apache Flink, offers language‑independent, dependency‑free, easy‑to‑use stream processing with advanced features such as DDL, UDFs, time semantics, windowing, pattern matching, and built‑in connectors, supporting data synchronization, batch‑stream fusion, Hive integration, and various product enhancements.

Data Integration · Flink · Hive

0 likes · 11 min read

DataFunTalk

Nov 1, 2020 · Big Data

Flink 1.11 Integration with Hive: New Features and Real‑time Data Warehouse

The article explains how Flink 1.11 deepens its integration with Hive, covering background, new connector features, simplified dependency management, enhanced Hive dialect, streaming writes and reads, temporal table joins, and how these capabilities enable a unified batch‑streaming data warehouse.

Batch‑Streaming Integration · Data Warehouse · Flink

0 likes · 16 min read

MaGe Linux Operations

Oct 29, 2020 · Backend Development

Master HTTP Requests with Python httpx: GET, POST, PUT, Streaming & More

This guide walks you through using the Python httpx library to perform various HTTP methods—including GET, POST, PUT, DELETE, HEAD, and OPTIONS—handle query parameters, decode responses, work with JSON, custom headers, form data, file uploads, streaming, cookies, redirects, and authentication, all with clear code examples.

API · Authentication · HTTP

0 likes · 10 min read

DataFunTalk

Oct 29, 2020 · Big Data

Building a Large-Scale Near Real-Time Data Analytics Platform at Lyft Using Apache Flink

Lyft transformed its legacy data pipeline by designing a cloud‑native, Flink‑based near real‑time analytics platform that ingests billions of events, writes Parquet files to S3, leverages Presto for interactive queries, and implements multi‑stage non‑blocking ETL, fault‑tolerant back‑fill, and extensive performance optimizations.

AWS · Data Lake · ETL

0 likes · 12 min read

ITPUB

Oct 16, 2020 · Big Data

How NetEase Cloud Music Built a Real‑Time Data Warehouse with Flink & Calcite

This article details NetEase Cloud Music's evolution of a real‑time data warehouse built on Flink 1.9 and Calcite, covering platform scale, architectural design, metadata management, SDK simplifications, monitoring improvements, and concrete use cases such as AB‑testing, live reporting, and feature serving.

Big Data · Calcite · Flink

0 likes · 8 min read
