Tagged articles
560 articles
Full-Stack Internet Architecture
Oct 29, 2021 · Cloud Native

RocketMQ 5.0 Overview: A Cloud‑Native Messaging, Event and Stream Fusion Platform

This article reviews the evolution of RocketMQ from its early MetaQ roots through the 4.x releases, explains the motivations behind RocketMQ 5.0, and details its cloud‑native architecture, lightweight SDK, storage‑compute separation, POP consumption model, elastic scaling, and the upcoming RocketMQ Streams framework.

Distributed Systems · Message Queue · RocketMQ
0 likes · 18 min read
Big Data Technology & Architecture
Oct 29, 2021 · Big Data

Dimension Table Join Strategies in Apache Flink: Preload, Distributed Cache, Hot Storage, Broadcast, and Temporal Table Function

The article explains various dimension‑table join approaches in Apache Flink, including preloading tables into memory, using distributed cache, leveraging hot storage with async I/O, broadcasting state, and temporal table function joins, and compares their trade‑offs for different data volumes and update frequencies.

Dimension Table · Flink · JOIN
0 likes · 10 min read
Big Data Technology & Architecture
Oct 26, 2021 · Big Data

Practical Experience Building a Real‑Time Clickstream Data Warehouse with Flink and ClickHouse

This article shares practical insights on designing and operating a real‑time clickstream data warehouse using Flink for streaming processing and ClickHouse for near‑real‑time OLAP, covering dimensional modeling, layered architecture, Flink‑ClickHouse sink implementation, and data rebalancing strategies.

ClickHouse · Data Warehouse · Flink
0 likes · 10 min read
Big Data Technology & Architecture
Oct 21, 2021 · Big Data

Comparative Overview of Open‑Source CDC Solutions: Debezium, Flink CDC, and Canal

This article provides a detailed comparison of three popular open‑source change data capture tools—Debezium, Flink CDC, and Canal—covering their underlying principles, architecture, deployment options, performance characteristics, and suitability for real‑time data synchronization in big‑data environments.

CDC · Canal · Change Data Capture
0 likes · 21 min read
Tencent Cloud Developer
Oct 19, 2021 · Backend Development

Comprehensive Guide to gRPC Communication with Go and PHP: Protobuf, Streaming, TLS, and Timeout

This guide walks through building a gRPC user service in Go and PHP: defining protobuf messages and generating code, implementing server and client stubs, adding client, server, and bidirectional streaming, securing communication with TLS certificates, and managing request deadlines with timeouts.

Go · PHP · Streaming
0 likes · 33 min read
Big Data Technology & Architecture
Oct 12, 2021 · Big Data

Data Lake Evolution and a Practical Flink + Iceberg Implementation Guide

This article explores the evolution of data lakes, compares major cloud providers' lake architectures, introduces the emerging lakehouse concept, and provides a step‑by‑step Flink‑Iceberg implementation—including dependencies, catalog setup, table creation, checkpointing, and Kafka ingestion—demonstrating practical big‑data streaming solutions.

Data Lake · Flink · Iceberg
0 likes · 14 min read
Big Data Technology Architecture
Oct 9, 2021 · Big Data

Apache Kafka 3.0 Release Highlights and New Features

Apache Kafka 3.0 introduces major enhancements including KRaft consensus, deprecation of Java 8 and Scala 2.12 support, stronger producer guarantees, updated APIs, improved Kafka Connect, MirrorMaker 2 flexibility, and numerous KIP-driven feature upgrades, marking a significant step forward for the distributed streaming platform.

Kafka · Kafka 3.0 · Streaming
0 likes · 13 min read
Big Data Technology & Architecture
Oct 9, 2021 · Big Data

Apache Flink 1.7–1.14 Release Highlights and Feature Evolution

This article provides a comprehensive overview of Apache Flink's major releases from version 1.7 to 1.14, detailing new APIs, state management improvements, Kubernetes integration, SQL and Table API enhancements, checkpointing advances, and performance optimizations that together illustrate the platform's evolution for both streaming and batch processing workloads.

Apache Flink · Batch Processing · Checkpoint
0 likes · 78 min read
21CTO
Oct 6, 2021 · Big Data

Building a Real-Time TB-Scale Bill Query System with Kafka, Kudu, and Presto

This article details the design and implementation of a real‑time, TB‑scale bill‑detail query platform that leverages Kafka for streaming, Debezium and Confluent Platform for change capture, Kudu for low‑latency storage, and Presto/Kylin for fast OLAP queries, while outlining deployment, integration, and future enhancements.

Kafka · Kudu · Presto
0 likes · 19 min read
Douyu Streaming
Sep 27, 2021 · Game Development

Douyu’s Live2D Virtual Avatar Plugin: Unity Architecture & Key Tech

This article details Douyu's virtual avatar tool built on Unity and Live2D, covering project background, core features, layered architecture, key technologies such as Unity rendering, FairyGUI UI, TCP-based IPC, model dressing and recoloring, face‑data processing, and future development plans.

Game Development · IPC · Live2D
0 likes · 13 min read
Cloud Native Technology Community
Sep 26, 2021 · Big Data

Apache Kafka 3.0.0 Release Summary: New Features, Improvements, Bugs, Tasks, and Tests

Apache Kafka 3.0.0, released on September 21, 2021, brings major changes: it deprecates Java 8 and Scala 2.12, adds a Raft‑based metadata quorum, strengthens producer delivery guarantees, removes old message formats, and includes numerous performance optimizations and bug fixes. The release notes list a large set of JIRA issues across features, improvements, bugs, tasks, tests, and subtasks.

Apache · Big Data · Kafka3.0
0 likes · 37 min read
Programmer DD
Sep 26, 2021 · Big Data

What’s New in Apache Kafka 3.0? Key Features and Improvements Explained

Apache Kafka 3.0.0 introduces a host of enhancements, including the deprecation of Java 8 and Scala 2.12 support, Raft metadata snapshots, stronger producer guarantees, MirrorMaker 2 upgrades, and Kafka Streams improvements, while continuing to serve real‑time data pipelines and streaming applications.

Apache Kafka · Big Data · Kafka 3.0
0 likes · 3 min read
IT Architects Alliance
Sep 25, 2021 · Big Data

Apache Kafka 3.0.0 Release: New Features, API Changes, and KRaft Improvements

Apache Kafka 3.0.0 introduces numerous enhancements including deprecation of Java 8 and Scala 2.12 support, KRaft metadata snapshots, stronger default producer delivery guarantees, expanded Connect and Streams APIs, updated MirrorMaker 2 configuration, and many KIP-driven feature and API changes for improved streaming and event processing.

Apache Kafka · Event Processing · KIP
0 likes · 15 min read
Big Data Technology & Architecture
Sep 10, 2021 · Big Data

Understanding Flink Table API and SQL: Dependencies, Planners, and Practical Usage

This article provides a comprehensive guide to Apache Flink's Table API and SQL, covering required dependencies, the differences between old and Blink planners, program structure, table environment creation, catalog registration, query execution, conversion between DataStream and Table, update modes, and time attribute handling, with Scala code examples throughout.

Flink · SQL · Scala
0 likes · 26 min read
Big Data Technology & Architecture
Aug 24, 2021 · Big Data

Comprehensive Overview of Data Lake Technologies: Iceberg, Hudi, and Delta Lake

This article provides an in-depth overview of data lake concepts, definitions, and essential features, followed by detailed case studies of enterprise data lake implementations and comparative analysis of leading data lake table formats—Iceberg, Hudi, and Delta Lake—highlighting their architectures, capabilities, and trade‑offs.

Data Lake · Delta Lake · Flink
0 likes · 19 min read
Big Data Technology & Architecture
Aug 21, 2021 · Big Data

Kafka Overview: Background, Core Concepts, Producer/Consumer Configuration, Core Principles, Operations, and Stream Processing

This article provides a comprehensive beginner-friendly guide to Apache Kafka, covering its background, core concepts, producer and consumer settings with code examples, underlying architecture, operational monitoring, integration with Spark and Flink, and an introduction to Kafka Streams.

Consumer · Java · Producer
0 likes · 19 min read
iQIYI Technical Product Team
Aug 13, 2021 · Game Development

Improving VR Video Clarity: PPD, Tile Encoding, and Future Directions

VR video clarity suffers because the required pixels‑per‑degree (PPD) far exceeds what 4K or 8K spherical streams can deliver. Tile‑based encoding that decodes only the viewport, combined with low motion‑to‑photon latency, distortion control, advanced codecs, and AI‑driven projection, promises sharper, lower‑bitrate 6DoF experiences.

8K · Latency · PPD
0 likes · 13 min read
Ctrip Technology
Aug 5, 2021 · Frontend Development

Understanding React Server Components: Concepts, Usage, and Implementation

This article explains the motivation, component types, naming conventions, runtime mechanism, streaming protocol, design goals, and practical considerations of React Server Components, illustrating how they reduce client bundle size and enable progressive server‑side rendering with code examples.

Code Splitting · React · SSR
0 likes · 15 min read
Big Data Technology & Architecture
Jul 30, 2021 · Big Data

Enterprise Big Data Platform Architecture: Insights from Taobao, Meituan, and Didi

This article examines the architecture of enterprise-level big data platforms at leading Chinese tech firms—Taobao, Meituan, and Didi—detailing their data sources, synchronization components, batch and streaming processing layers, scheduling systems, and common design patterns, while highlighting shared principles across these implementations.

Batch Processing · Enterprise · Streaming
0 likes · 9 min read
Big Data Technology & Architecture
Jul 27, 2021 · Big Data

An Introduction to Apache Pulsar: Core Concepts, Architecture, and Key Features

Apache Pulsar is a cloud‑native distributed messaging platform that combines messaging, storage, and lightweight compute, with multi‑tenant support, geo‑replication, and high throughput. This article introduces its core concepts, its architecture components (brokers, BookKeeper, and ZooKeeper), and its key design features.

Apache Pulsar · BookKeeper · Cloud Native
0 likes · 13 min read
DataFunTalk
Jul 26, 2021 · Big Data

Accelerating Hive Daily Tables with Flink: A SmartNews Case Study

This article describes how SmartNews integrated Flink into its Airflow‑driven Hive batch pipeline to cut the actions table generation latency from three hours to about thirty‑four minutes, detailing the technical challenges, design decisions, and production results.

AWS · Big Data · Flink
0 likes · 12 min read
Big Data Technology & Architecture
Jul 20, 2021 · Big Data

Common Issues and Solutions for Flink CDC with MySQL

This article summarizes frequent problems encountered when using Flink CDC with MySQL—including Kafka version conflicts, checkpoint timeouts, permission errors, global lock issues, and DDL parsing failures—and provides practical configuration tweaks and code examples to resolve them.

CDC · Checkpoint · Debezium
0 likes · 11 min read
Open Source Linux
Jul 17, 2021 · Big Data

Master Kafka Basics: Topics, Partitions, Producers & Consumers Explained

This article provides a clear, visual guide to Kafka’s core concepts—including producers, consumers, topics, partitions, consumer groups, message ordering, and the underlying ZooKeeper‑managed cluster architecture—helping readers grasp how Kafka enables reliable, scalable stream processing.

Big Data · Consumers · Partitions
0 likes · 6 min read
Big Data Technology & Architecture
Jul 9, 2021 · Big Data

Understanding Kafka: Use Cases, Reliability, Storage, Replication, Consumer Assignment, Transactions, and Exactly-Once Semantics

This article explains why Kafka is used, its buffering, decoupling, redundancy and robustness benefits, details the ack reliability levels, storage design, replica synchronization, ISR handling, consumer partition assignment strategies, transaction support, exactly‑once semantics, and why read‑write separation is not provided.

Consumer Assignment · Exactly-Once · Message Queue
0 likes · 20 min read
Architect
Jul 7, 2021 · Big Data

Understanding Kafka High Availability and Resolving Consumer Offset Issues

This article explains Kafka's high‑availability architecture, including multi‑replica design, ISR synchronization, leader election, and acks configuration. It also shows how misconfigured replication of the __consumer_offsets topic can cause consumer outages, and offers practical steps to ensure reliable message delivery.

Consumer Offset · Replication · Streaming
0 likes · 8 min read
Architecture Digest
Jul 3, 2021 · Fundamentals

Message Exchange Patterns: Architecture and Routing

This article explains the fundamental message exchange patterns—including publish‑subscribe, fan‑out, unidirectional and bidirectional streaming—as well as routing models such as unicast, broadcast, multicast, and anycast, illustrating each with common technology examples.

Messaging · Streaming · multicast
0 likes · 8 min read
DataFunTalk
Jun 29, 2021 · Big Data

In-depth Analysis of Flink SQL 1.13 Features and Improvements

This article provides a comprehensive overview of Apache Flink SQL 1.13, detailing new Window TVF support, cumulate windows, performance optimizations, time‑zone handling, enhanced Hive compatibility, SQL client upgrades, DataStream‑Table conversion improvements, and outlines the roadmap for the upcoming 1.14 release.

DataStream · Flink · Hive Integration
0 likes · 15 min read
360 Tech Engineering
Jun 25, 2021 · Big Data

Introducing ULTRON: A Real‑Time Data Warehouse Platform Powered by FlinkSQL

ULTRON is a one‑stop real‑time data‑warehouse development platform built on FlinkSQL that unifies data integration, asset management, cluster deployment, modeling, ETL, OLAP analysis and governance, addressing the limitations of traditional batch‑oriented warehouses and simplifying streaming data workflows for developers.

Data Governance · FlinkSQL · Streaming
0 likes · 13 min read
Alibaba Terminal Technology
Jun 25, 2021 · Frontend Development

Mastering Web Multimedia Front‑End: A Complete Beginner’s Guide

This comprehensive guide introduces multimedia front‑end development, explains W3C media standards and HTML elements, explores media APIs, outlines playback scenarios and solutions, and details both consumer‑facing live video systems and production‑side tools such as streaming and video‑editing, while sharing Alibaba’s roadmap for the field.

Multimedia · Streaming · media APIs
0 likes · 25 min read
Sohu Tech Products
Jun 9, 2021 · Big Data

Real-time UV Counting with Flink, Hologres, and RoaringBitmap

This article explains how to implement both offline (T+1) and real‑time UV counting using Hologres with RoaringBitmap for high‑cardinality aggregation, and demonstrates a complete Flink‑Hologres pipeline—including table creation, streaming joins, windowed aggregation, and query examples—for fine‑grained user metric analysis.

Flink · Hologres · RoaringBitmap
0 likes · 11 min read
Top Architect
Jun 8, 2021 · Backend Development

Architectural Messaging Patterns: Exchange Architectures and Routing Methods

This article explains the fundamental messaging exchange architectures such as Pub‑Sub, Fanout, Unidirectional and Bidirectional streaming, and the routing patterns including Unicast, Broadcast, Multicast and Anycast, illustrating how they are used in systems like Redis, Kafka, RabbitMQ and IBM MQ to simplify communication between producers and consumers.

Fanout · Messaging · Streaming
0 likes · 8 min read
dbaplus Community
Jun 5, 2021 · Big Data

How Flink + Iceberg Transform Data Lakes for Real‑Time Streaming

This article explains the concept of data lakes, outlines a four‑layer open‑source architecture, presents several classic Flink‑Iceberg use cases, details why Iceberg was chosen, and describes the design of Flink’s streaming sink and upcoming community roadmap.

Apache Flink · Apache Iceberg · Big Data
0 likes · 14 min read
Big Data Technology & Architecture
Jun 3, 2021 · Big Data

Comparing Apache Pulsar and Apache Kafka: Architecture, Performance, Use Cases, and Ecosystem

This article compares Apache Pulsar and Apache Kafka across performance, architecture, features, and real‑world use cases, highlighting Pulsar’s multi‑layer design, scalability, client language support, ecosystem integrations, and operational advantages while providing detailed analysis of storage, messaging models, and community resources.

Apache Pulsar · Cloud Native · Message Queue
0 likes · 28 min read
Big Data Technology & Architecture
Jun 1, 2021 · Big Data

Understanding Idle State Retention Time in Flink SQL

Flink SQL's idle state retention time feature prevents state explosion by automatically cleaning up state for keys that remain inactive beyond a configurable time window, requiring both minimum and maximum retention settings, with implementation details involving CleanupState, timers, and KeyedProcessFunctionWithCleanupState.

Flink · Idle State Retention · SQL
0 likes · 8 min read
New Oriental Technology
May 31, 2021 · Fundamentals

Live Streaming Network Transmission: Protocols, Encoding, Decoding, and Synchronization

This article explains the end‑to‑end live‑streaming workflow, covering how a broadcaster pushes video and audio to a server, the various streaming protocols (RTMP, HTTP‑FLV, HLS, RTP), encoding formats, FFmpeg‑based decoding, hardware vs software decoding, and audio‑video synchronization techniques.

Audio-Video Sync · Network Protocols · Streaming
0 likes · 26 min read
IT Architects Alliance
May 30, 2021 · Big Data

NetEase Game Streaming ETL Architecture and Practices Based on Flink

This article presents NetEase Game's Flink‑based streaming ETL system, detailing business background, log classifications, specialized and generic ETL services, Python UDF integration, runtime optimizations, HDFS write tuning, SLA metrics, fault‑tolerance mechanisms, and future roadmap for unified data lakes and PyFlink support.

Big Data · Data Integration · ETL
0 likes · 19 min read
Byte Quality Assurance Team
May 19, 2021 · Big Data

Streaming 102: The World Beyond Batch

This article extends the concepts introduced in Streaming 101 by deeply exploring data processing patterns for unbounded data, covering windowing, watermarks, triggers, accumulation modes, and their practical implications for building robust low‑latency streaming pipelines.

Big Data · Streaming · Triggers
0 likes · 14 min read
vivo Internet Technology
May 12, 2021 · Backend Development

Understanding RTMP Protocol and Livego Source Code Analysis

The article explains RTMP’s multiplexed, packetized streaming over TCP, detailing its chunk structure, message types, handshake, and connection workflow, then demonstrates livego’s publishing and pulling processes, discusses typical latency sources, and offers mitigation strategies and reference resources for developers.

Go · Livego · RTMP
0 likes · 28 min read
DataFunTalk
May 11, 2021 · Big Data

Design and Practice of Baixin Bank's Flink‑Based Real‑Time Computing Platform and Hudi‑Powered Real‑Time Data Lake

This article details Baixin Bank's construction of a Flink‑driven real‑time computing platform integrated with Hudi as a real‑time data lake, covering background, architecture, data collection, transformation, storage layers, technical challenges, future roadmap, and practical lessons for similar big‑data initiatives.

Big Data · Flink · Hudi
0 likes · 12 min read
IT Architects Alliance
May 11, 2021 · Big Data

Demystifying Kafka: Core Concepts of Topics, Partitions, and Architecture

This article provides a clear, visual walkthrough of Kafka’s fundamental architecture, explaining how producers and consumers interact, the role of topics and partitions, consumer groups, and ZooKeeper’s coordination, helping readers grasp message flow, storage, ordering, and fault‑tolerance in a distributed streaming system.

Kafka · Message Queue · Partition
0 likes · 6 min read
DataFunTalk
May 2, 2021 · Big Data

Continuous Optimization and Practice of Flink at Kuaishou

This article presents Kuaishou's comprehensive engineering practices for improving Flink's stability, task startup latency, and SQL performance, including high‑availability Kafka connectors, fault‑recovery mechanisms, I/O reductions, asynchronous job upgrades, aggregation optimizations, and future resource‑utilization plans.

Big Data · Flink · Kafka
0 likes · 10 min read
Programmer DD
Apr 30, 2021 · Big Data

Kafka 2.8.0 Release: Say Goodbye to ZooKeeper with Raft Metadata Mode

Kafka 2.8.0, released on April 19, 2021, introduces an early‑access Raft metadata mode that lets a cluster run without ZooKeeper, alongside numerous new features, bug fixes, and enhancements such as API controls for stream threads, mutual TLS on SASL_SSL listeners, and IP rate limiting.

Big Data · Kafka · Raft
0 likes · 5 min read
DataFunTalk
Apr 23, 2021 · Big Data

Building and Evolving Zhihu’s Flink‑Based Data Integration Platform

This article details Zhihu’s transition from a Sqoop‑driven data integration system to a Flink‑centric platform, covering business scenarios, historical architecture, design goals, technology choices, performance optimizations, and future plans for unified streaming‑batch processing across diverse storage systems.

Batch Processing · Big Data · Data Integration
0 likes · 14 min read
Laravel Tech Community
Apr 22, 2021 · Big Data

Apache Kafka 2.8.0 Release Highlights and New Features

Apache Kafka 2.8.0 introduces several significant enhancements, including a new group API, mutual TLS authentication for SASL_SSL listeners, JSON request/response logging, broker connection rate limiting, topic identifiers, self‑managed quorum replacing ZooKeeper, and numerous improvements to Streams and Connect APIs for more reliable real‑time data pipelines.

Apache Kafka · Big Data · Distributed Systems
0 likes · 2 min read
Tencent Cloud Developer
Apr 14, 2021 · Cloud Native

Apache Pulsar Meetup Shenzhen: Cloud-Native Distributed Messaging and Streaming Platform

The Apache Pulsar Meetup in Shenzhen on April 17, 2021, co‑hosted by Tencent Middleware and StreamNative, will showcase Pulsar’s cloud‑native messaging, streaming, and storage capabilities through sessions on KoP migration, big‑data and IoT use cases, cloud‑native deployments, and the StreamNative Cloud Pulsar‑as‑a‑Service offering.

Apache Pulsar · Distributed Systems · Meetup
0 likes · 7 min read
ByteFE
Apr 13, 2021 · Frontend Development

Streaming Server‑Side Rendering in React: Concepts, lazy, Suspense, and Implementation

This article explains the principles of streaming server‑side rendering (SSR) in React, compares it with traditional client‑side rendering, and demonstrates how lazy loading and Suspense can be used together with streaming SSR to parallelize data and JavaScript delivery for faster first‑paint and improved user experience.

@Lazy · React · SSR
0 likes · 10 min read
Programmer DD
Mar 29, 2021 · Big Data

Mastering Kafka: High‑Throughput Distributed Messaging Explained

This comprehensive guide introduces Kafka as a high‑throughput, distributed, publish‑subscribe messaging system, detailing its core concepts, architecture, features, replication, log management, reliability guarantees, and typical use cases such as log collection, real‑time analytics, and cross‑cluster mirroring.

Big Data · Distributed Messaging · Kafka
0 likes · 15 min read
Big Data Technology & Architecture
Mar 18, 2021 · Big Data

Flink Job Troubleshooting and Performance Optimization: Data Skew, Kafka Configuration, Resource Management, and Checkpoint Issues

This article details common Flink streaming problems such as data skew causing task back‑pressure, oversized Kafka messages, high‑throughput ack settings, slot removal errors, checkpoint timeouts, and resource constraints, and provides concrete configuration changes and architectural adjustments to resolve them.

Checkpoint · Data Skew · Flink
0 likes · 18 min read
DataFunTalk
Mar 15, 2021 · Big Data

Ten Gotchas When Migrating Spark Jobs to Flink

This article shares ten practical pitfalls encountered while moving hour‑level Spark session processing jobs to Apache Flink, covering parallelism skew, state TTL, checkpoint handling, logging, debugging, state migration, Reduce vs Process, input validation, event‑time handling, and the trade‑offs of storing data inside Flink.

Big Data · Flink · State Management
0 likes · 19 min read
Big Data Technology & Architecture
Mar 2, 2021 · Big Data

An Introduction to Kafka Connect: Architecture, Components, and Hands‑On Setup

This article introduces Kafka Connect, explaining its purpose as a scalable and reliable tool for moving data between Apache Kafka and external systems, detailing its core concepts, architecture, deployment modes, configuration files, and a step‑by‑step example that streams data from a file source to a file sink.

Data Integration · ETL · Streaming
0 likes · 12 min read
Laravel Tech Community
Feb 28, 2021 · Big Data

Apache Beam 2.28.0 Release Highlights and New Features

Apache Beam 2.28.0 introduces extensive Parquet support, new hash functions in BeamSQL and ZetaSQL, and ApproximateDistinct via HLL. I/O connector changes include SpannerIO support for Numeric fields, schema support in ParquetIO, Thrift support in KafkaTableProvider, and an option for HadoopFormatIO to skip key/value cloning, along with various other improvements.

Apache Beam · Batch · Big Data
0 likes · 3 min read
DataFunTalk
Feb 22, 2021 · Big Data

Optimizing Flink Real-Time Task Resources: Memory and Message Processing Perspectives

This article explores practical methods for optimizing Flink real‑time task resources on Kubernetes, focusing on memory usage analysis via GC logs and message‑processing capacity assessment, proposing automated detection of over‑provisioned memory and CPU, and outlining a workflow for resource adjustment to reduce costs.

Big Data · Flink · GC Analysis
0 likes · 18 min read
dbaplus Community
Feb 18, 2021 · Big Data

How JD Search Scaled Real‑Time Analytics with Flink and Doris

This article details JD Search's journey from a Storm‑based pipeline to a Flink‑driven architecture backed by Apache Doris, covering business requirements, technical challenges, design trade‑offs, performance optimizations for massive traffic spikes, and future plans for their real‑time OLAP data warehouse.

Big Data · Flink · OLAP
0 likes · 12 min read
Sohu Tech Products
Feb 17, 2021 · Big Data

Dynamic Broadcast State and Data Partitioning in an Apache Flink Fraud Detection Engine

This article demonstrates how to initialize, broadcast, and dynamically update rule sets in an Apache Flink fraud detection pipeline, using BroadcastProcessFunction and MapState to achieve runtime data partitioning without recompiling, and explains the underlying data exchange patterns such as forward, hash, rebalance, and broadcast.

Apache FlinkBroadcast StateDynamic Key Function
0 likes · 11 min read
DataFunTalk
Feb 17, 2021 · Big Data

Apache Iceberg 0.11.0: New Partition Support, SortOrder, Flink Streaming Reader, and Ecosystem Integrations

The article details Apache Iceberg 0.11.0's core enhancements—including partition changes, SortOrder, extensive Flink and Spark integrations, CDC/Upsert support, hash‑based write distribution to reduce small files, and upcoming 0.12.0 roadmap—while providing practical SQL and API examples for data‑lake practitioners.

Apache IcebergBig DataCDC
0 likes · 13 min read
Big Data Technology & Architecture
Feb 7, 2021 · Big Data

Building a Flink SQL Platform on Zeppelin: Installation, Configuration, and Advanced Use Cases

This guide walks through setting up Apache Zeppelin as a low‑cost, SQL‑centric development platform for Flink, covering environment preparation, installation, interpreter configuration, execution modes, verification, common pitfalls, dimension‑table joins, custom UDFs, Redis integration, and dual‑stream join techniques.

FlinkSQLStreaming
0 likes · 24 min read
Full-Stack Internet Architecture
Feb 1, 2021 · Big Data

Kafka Overview: Architecture, Advantages, Disadvantages, and Core Concepts

This article provides a comprehensive introduction to Apache Kafka, covering its distributed publish‑subscribe architecture, its key components such as brokers, topics, partitions, producers, consumers, and ZooKeeper, as well as its advantages, drawbacks, storage mechanisms, partition assignment strategies, and reliability guarantees for high‑throughput big‑data streaming.

Big DataDistributed SystemsMessage Queue
0 likes · 20 min read
DataFunTalk
Feb 1, 2021 · Big Data

Building a Real-Time Data Warehouse with Apache Flink and Apache Iceberg: Architecture, Challenges, and Best Practices

This article presents Tencent's experience of constructing a real‑time data warehouse by integrating Apache Flink with Apache Iceberg, covering background pain points, Iceberg's table format and capabilities, Flink‑Iceberg streaming and batch processing, practical implementations, and future roadmap for data‑lake acceleration.

Apache FlinkApache IcebergBig Data
0 likes · 21 min read
iQIYI Technical Product Team
Jan 29, 2021 · Cloud Computing

iQIYI Introduces CUVA HDR Standard Support and Explores the Ultra‑HD Video Industry

iQIYI becomes the first video platform to deliver content meeting the China Ultra‑High‑Definition Video Industry Alliance (CUVA) HDR standard, enabling devices like Xiaomi 10 Pro, Huawei P30 Pro, iPhone 11 Pro and XS Max to display richer colors, higher contrast, and deeper visual depth, while fostering an open, industry‑wide ecosystem and planning future CUVA HDR live‑streaming support.

CUVAHDRStreaming
0 likes · 5 min read
Hulu Beijing
Jan 25, 2021 · Product Management

What Hulu’s Generation Stream Study Reveals About the Next‑Gen TV Audience

Hulu’s Generation Stream study, conducted with Culture Co‑Op and industry experts, surveyed 2,500 U.S. viewers aged 13‑54, revealing that 90% watch TV via streaming, identifying three audience segments—‘Stream Most’, ‘Stream Only’, and ‘Stream Also’—and highlighting how streaming reshapes viewing habits and content expectations.

Generation StreamHuluStreaming
0 likes · 6 min read
Efficient Ops
Jan 17, 2021 · Big Data

Understanding Kafka: Core Concepts, Architecture, and Performance Secrets

This article introduces Kafka’s fundamental role as a messaging system, explains topics, partitions, producers, consumers, replicas, consumer groups, and the controller, and explores its cluster architecture, performance optimizations like sequential writes and zero-copy, providing a comprehensive overview for building scalable data pipelines.

Big DataDistributed SystemsMessage Queue
0 likes · 11 min read
Top Architect
Jan 17, 2021 · Big Data

Migrating LinkedIn’s Who Viewed Your Profile System from Lambda Architecture to a Lambda‑less Architecture

This article describes how LinkedIn’s Who Viewed Your Profile feature was originally built on a Lambda architecture, the operational challenges it caused, and the step‑by‑step migration to a streamlined, Samza‑driven, Lambda‑less design that improves performance, reduces maintenance overhead, and retains essential batch capabilities.

Lambda architectureLinkedInPinot
0 likes · 11 min read
Alibaba Cloud Developer
Jan 11, 2021 · Backend Development

How Streaming Output and Reactive Programming Boost Web Performance

This article explains the concepts of streaming output and reactive programming, describes the underlying HTTP chunked transfer, SSE, WebSocket and RSocket protocols, provides code examples, and outlines practical scenarios where end‑to‑end streaming improves performance and user experience.

BackendHTTP Chunked TransferSSE
0 likes · 18 min read
Big Data Technology & Architecture
Jan 9, 2021 · Big Data

Comprehensive 2021 Flink Interview Questions and Answers

This article presents a detailed collection of 2021 Flink interview questions covering checkpoint mechanisms, watermarks, state backends, join types, fault tolerance, resource configuration, and recent Flink 1.10 features, providing concise explanations and code examples for each topic.

CheckpointFlinkState Backend
0 likes · 23 min read
iQIYI Technical Product Team
Dec 11, 2020 · Fundamentals

Analysis of the MSU World Video Codec Competition and the Current State of the AV1 Ecosystem

The MSU World Video Codec Competition highlighted iQIYI’s QAV1 encoder achieving faster speeds and superior compression compared to H.265, while the expanding AV1 ecosystem—bolstered by widespread hardware decoding, platform adoption, and recent QAV1 enhancements such as 8K, HDR, and improved rate‑distortion optimization—promises higher quality video at lower bandwidth costs.

AV1Codec CompetitionQAV1
0 likes · 7 min read
Programmer DD
Dec 9, 2020 · Big Data

Master Apache Beam: Build a Portable Word Count Pipeline in Minutes

This tutorial introduces Apache Beam’s unified programming model for batch and streaming, explains its core concepts and terminology, compares it with other runners, and walks through a complete Java word‑count example—including dependencies, pipeline construction, transforms, and execution with DirectRunner.

Apache BeamDataflowDistributed Processing
0 likes · 8 min read
DataFunTalk
Dec 6, 2020 · Artificial Intelligence

Building an AI Ecosystem with Flink: Overview of AI Flow and Its Architecture

This article explains how Flink enables end‑to‑end machine‑learning workflows through AI Flow, covering the background of Lambda architecture, AI task stages, the advantages of Flink, AI Flow components, AI Graph concepts, integration with Python and TensorFlow, and a real‑world advertising recommendation use case.

AI FlowFlinkReal-Time
0 likes · 14 min read
DataFunTalk
Dec 3, 2020 · Big Data

Streaming Data Lake Ingestion with Apache Flink and Apache Iceberg

This article explains how Apache Flink integrates with data lake architectures, especially using Apache Iceberg as a table format, to enable real‑time streaming ingestion, CDC processing, near‑real‑time lambda architectures, and future enhancements like automatic file merging and row‑level deletes.

Apache IcebergData LakeFlink
0 likes · 13 min read
DataFunSummit
Dec 1, 2020 · Artificial Intelligence

Building an AI Ecosystem with Flink: AI Flow Architecture, Components, and Applications

This article explains how Flink enables end‑to‑end AI workflows through the AI Flow platform, covering the Lambda architecture background, AI task pipeline stages, the reasons for choosing Flink, AI Flow’s graph model, core services, integration with ML pipelines, and real‑world advertising recommendation use cases.

AI FlowAI PipelineBig Data
0 likes · 12 min read
Java High-Performance Architecture
Nov 18, 2020 · Big Data

Why Pulsar Might Outperform Kafka: Key Advantages and Drawbacks

This article examines Apache Pulsar, an open‑source messaging platform created by Yahoo, compares it with Kafka by outlining Kafka’s common pain points, highlights Pulsar’s multi‑tenant architecture, layered storage, built‑in functions, and security features, and discusses the trade‑offs of each solution.

Apache PulsarBig DataDistributed Systems
0 likes · 6 min read
DataFunTalk
Nov 17, 2020 · Artificial Intelligence

Alink: A Flink‑Based Machine Learning Platform – Overview, Features, and Quick‑Start Guide

This article introduces Alink, Alibaba's open‑source machine‑learning platform built on Flink, explains its core algorithms, performance comparison with Spark ML, version‑wise feature evolution, and provides practical quick‑start instructions for both Java (Maven) and Python (PyAlink) users, including data source handling, type conversion components, unified file‑system operations, and an overview of its FM algorithm implementation.

AlinkBatch ProcessingData Integration
0 likes · 13 min read
Big Data Technology & Architecture
Nov 16, 2020 · Big Data

Understanding Spark Streaming Backpressure Mechanism and Source Code Analysis

This article explains why Spark Streaming introduced backpressure, how the dynamic rate‑control mechanism works, and provides a detailed walkthrough of the relevant source code, including the RateController class, its registration, and the execution flow that adjusts ingestion rates to match processing capacity.

RateControllerRateLimiterSpark
0 likes · 14 min read
DataFunSummit
Nov 10, 2020 · Artificial Intelligence

Alink: An Open‑Source Machine Learning Platform on Flink – Features, Performance, and Quick‑Start Guide

This article introduces Alink, Alibaba's open‑source machine‑learning platform built on Flink, detailing its core algorithms, performance advantages over Spark ML, version evolution, Maven and PyAlink installation steps, data‑source integrations, FM algorithm support, and unified file‑system operations for both batch and streaming workloads.

AlinkFlinkPyAlink
0 likes · 11 min read
Top Architect
Nov 9, 2020 · Cloud Computing

Design Analysis of Netflix's Cloud‑Based Microservices Architecture

This article examines how Netflix migrated its video‑streaming platform to AWS, adopted a microservices architecture, and built a global CDN, detailing the system’s components, design goals such as high availability, low latency and scalability, and the trade‑offs and resilience techniques employed.

AWSNetflixStreaming
0 likes · 23 min read
360 Tech Engineering
Nov 6, 2020 · Big Data

Guide to Flink SQL: Features, Scenarios, and Productization

Flink SQL, the high‑level SQL interface for Apache Flink, offers language‑independent, dependency‑free, easy‑to‑use stream processing with advanced features such as DDL, UDFs, time semantics, windowing, pattern matching, and built‑in connectors, supporting data synchronization, batch‑stream fusion, Hive integration, and various product enhancements.

Data IntegrationFlinkHive
0 likes · 11 min read
DataFunTalk
Nov 1, 2020 · Big Data

Flink 1.11 Integration with Hive: New Features and Real‑time Data Warehouse

The article explains how Flink 1.11 deepens its integration with Hive, covering background, new connector features, simplified dependency management, enhanced Hive dialect, streaming writes and reads, temporal table joins, and how these capabilities enable a unified batch‑streaming data warehouse.

Batch‑Streaming IntegrationData WarehouseFlink
0 likes · 16 min read
MaGe Linux Operations
Oct 29, 2020 · Backend Development

Master HTTP Requests with Python httpx: GET, POST, PUT, Streaming & More

This guide walks you through using the Python httpx library to perform various HTTP methods—including GET, POST, PUT, DELETE, HEAD, and OPTIONS—handle query parameters, decode responses, work with JSON, custom headers, form data, file uploads, streaming, cookies, redirects, and authentication, all with clear code examples.

APIAuthenticationHTTP
0 likes · 10 min read
DataFunTalk
Oct 29, 2020 · Big Data

Building a Large-Scale Near Real-Time Data Analytics Platform at Lyft Using Apache Flink

Lyft transformed its legacy data pipeline by designing a cloud‑native, Flink‑based near real‑time analytics platform that ingests billions of events, writes Parquet files to S3, leverages Presto for interactive queries, and implements multi‑stage non‑blocking ETL, fault‑tolerant back‑fill, and extensive performance optimizations.

AWSData LakeETL
0 likes · 12 min read
ITPUB
Oct 16, 2020 · Big Data

How NetEase Cloud Music Built a Real‑Time Data Warehouse with Flink & Calcite

This article details NetEase Cloud Music's evolution of a real‑time data warehouse built on Flink 1.9 and Calcite, covering platform scale, architectural design, metadata management, SDK simplifications, monitoring improvements, and concrete use cases such as AB‑testing, live reporting, and feature serving.

Big DataCalciteFlink
0 likes · 8 min read