Tagged articles

Streaming

570 articles · Page 2 of 6

Feb 10, 2025 · Artificial Intelligence

deepseek4j 1.3: Java SDK adds web search, streaming & multi‑channel AI

deepseek4j 1.3 introduces web‑search capability, streaming responses, system prompts, expanded multi‑platform support, enhanced SSE debugging, and upcoming features like API‑key rotation and resilience, enabling Java developers to integrate DeepSeek models effortlessly while focusing on business logic.

AIDeepSeekStreaming

0 likes · 8 min read

deepseek4j 1.3: Java SDK adds web search, streaming & multi‑channel AI

DataFunSummit

Feb 9, 2025 · Big Data

Modern Data Stack on Alibaba Cloud Using Flink CDC: Architecture, Features, and Use Cases

This article presents a comprehensive overview of Alibaba Cloud's modern data stack built on Flink CDC, detailing its core concepts, extended capabilities, typical application scenarios, performance optimizations, a live demo, and future development plans for large‑scale streaming data integration.

Alibaba CloudBig DataData Integration

0 likes · 13 min read

Modern Data Stack on Alibaba Cloud Using Flink CDC: Architecture, Features, and Use Cases

macrozheng

Jan 24, 2025 · Backend Development

Boost Java Excel Performance with FastExcel: Features, Usage, and Comparison

This article introduces FastExcel, an upgraded Java library for high‑performance Excel read/write, outlines its key features, provides step‑by‑step code examples for entity creation, event listeners, writing, reading, PDF conversion, compares it with EasyExcel, and concludes with its suitability for large‑scale data processing.

ExcelFastExcelPDF

0 likes · 8 min read

Boost Java Excel Performance with FastExcel: Features, Usage, and Comparison

Code Ape Tech Column

Jan 24, 2025 · Backend Development

FastExcel: High‑Performance Java Library for Excel Read/Write – Features, Usage, and Comparison with EasyExcel

FastExcel is a Java library that builds on EasyExcel to provide higher performance, low‑memory streaming, and additional features such as PDF conversion, offering simple APIs, full compatibility, and detailed code examples for creating entity classes, listeners, and read/write operations.

ExcelFastExcelJava

0 likes · 9 min read

FastExcel: High‑Performance Java Library for Excel Read/Write – Features, Usage, and Comparison with EasyExcel

Alibaba Cloud Big Data AI Platform

Jan 14, 2025 · Big Data

How Fluss Unifies Lake and Stream for Real‑Time Analytics: Architecture, Benefits, and Future Roadmap

This article summarizes a talk by Alibaba Cloud senior engineer and Flink Committer Luo Yuxia on the challenges of separating lake and stream storage, introduces the Fluss lake‑stream unified architecture, explains its technical benefits such as second‑level data freshness, unified metadata, efficient changelog generation, and outlines future plans for broader ecosystem integration.

Data LakeFlinkFluss

0 likes · 13 min read

How Fluss Unifies Lake and Stream for Real‑Time Analytics: Architecture, Benefits, and Future Roadmap

Ctrip Technology

Jan 3, 2025 · Big Data

Design and Implementation of a Kafka Gatekeeper for FinOps Billing Data Quality Governance

This article describes the challenges of data quality in Ctrip’s hybrid‑cloud FinOps billing system and presents the design, implementation, and high‑availability deployment of a custom Kafka Gatekeeper proxy that performs pre‑validation, configurable rules, self‑service dashboards, and automated alerts to improve coverage, timeliness, and responsibility attribution.

Big DataCloud NativeData Quality

0 likes · 17 min read

Design and Implementation of a Kafka Gatekeeper for FinOps Billing Data Quality Governance

Big Data Technology & Architecture

Jan 2, 2025 · Big Data

Apache Paimon: Core Capabilities, Table Types, LSM Tree, Buckets, Merge Engines, and Operational Details

This article provides a comprehensive overview of Apache Paimon, covering its real‑time lake ingestion, unified stream‑batch processing, table types (primary‑key and append‑only), LSM‑tree storage, bucket mechanisms, merge‑engine options, compaction strategies, concurrency control, consumption methods, tag management, data cleanup, and system tables for big‑data workloads.

Apache PaimonBig DataFlink

0 likes · 25 min read

Apache Paimon: Core Capabilities, Table Types, LSM Tree, Buckets, Merge Engines, and Operational Details

Zhihu Tech Column

Dec 31, 2024 · Cloud Native

Cloud Native Innovation Forum: AutoMQ Table Topic, OceanBase Integrated Database, and Observability Practices

The article recaps Zhihu's Cloud Native Innovation Forum where experts from AutoMQ, OceanBase, and Flashcat shared practical solutions on streaming data ingestion, unified database architectures, and AI‑driven observability, highlighting real‑world deployments, performance optimizations, and cost‑saving strategies.

AIAutoMQCloud Native

0 likes · 10 min read

Cloud Native Innovation Forum: AutoMQ Table Topic, OceanBase Integrated Database, and Observability Practices

Architect

Dec 15, 2024 · Databases

Efficient MySQL Queries for Millions of Rows: Regular, Stream, and Cursor

When processing massive MySQL result sets, loading all rows into JVM memory can cause OOM and slow performance, so this guide compares three approaches—regular pagination, streaming queries using server-side cursors, and cursor‑based fetchSize control—detailing their implementations, MyBatis configurations, and trade‑offs.

CursorDatabase QueryLarge Data

0 likes · 10 min read

Efficient MySQL Queries for Millions of Rows: Regular, Stream, and Cursor

MaGe Linux Operations

Dec 14, 2024 · Big Data

Master Kafka: From Core Concepts to Real-World Deployment

This comprehensive guide explains Kafka’s architecture, core APIs, topics and partitions, deployment steps, multi‑broker clustering, and practical use cases such as messaging, log aggregation, stream processing, and data import/export with Kafka Connect, providing a hands‑on tutorial for developers and engineers.

InstallationStreamingdistributed systems

0 likes · 30 min read

Master Kafka: From Core Concepts to Real-World Deployment

DaTaobao Tech

Dec 6, 2024 · Big Data

How Paimon + Flink Enables Low‑Cost Real‑Time State Storage for Complex Streaming Jobs

This article explains how Apache Paimon can be used as a real‑time state store for Flink, detailing its low‑cost, scalable storage, lookup‑join design, table schema, bucket configuration, memory tuning, and practical use cases such as handling refund‑adjusted order tags and cumulative metrics.

Apache PaimonBig DataFlink

0 likes · 16 min read

How Paimon + Flink Enables Low‑Cost Real‑Time State Storage for Complex Streaming Jobs

Tencent Advertising Technology

Dec 6, 2024 · Big Data

Building a High‑Performance Advertising Feature Data Lake with Apache Iceberg at Tencent

Tencent's advertising team replaced a traditional HDFS‑Hive warehouse with an Apache Iceberg‑based data lake, adding primary‑key tables, multi‑stream merging, adaptive compaction, and Spark SPJ optimizations to achieve minute‑level feature update latency, 10× back‑fill speed, and up to 60% storage savings.

Big DataCDCCompaction

0 likes · 25 min read

Building a High‑Performance Advertising Feature Data Lake with Apache Iceberg at Tencent

Big Data Technology & Architecture

Dec 2, 2024 · Big Data

Optimizing Primary‑Key and Append‑Scalable Tables in Paimon with Flink

This guide explains how to optimize Paimon primary‑key and Append‑Scalable tables in Flink by adjusting sink and source parallelism, checkpoint intervals, making small‑file merges fully asynchronous, changing file formats, and applying ordering strategies to improve both write and read performance.

BatchBig DataFlink

0 likes · 6 min read

Optimizing Primary‑Key and Append‑Scalable Tables in Paimon with Flink

Bilibili Tech

Nov 26, 2024 · Big Data

Bilibili’s Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practices

Bilibili migrated its massive user‑behavior, commercial AI training, and database synchronization pipelines from Hive and Kafka to an Iceberg‑based streaming‑batch architecture, using Flink and the Magnus optimizer to achieve minute‑level freshness, reduce CPU and memory usage by about 20‑22 %, save roughly 3.55 M CNY annually, and dramatically improve query latency and join performance.

BatchData IntegrationData Lake

0 likes · 20 min read

Bilibili’s Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practices

DataFunSummit

Nov 23, 2024 · Big Data

Bilibili's Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practice

This article presents Bilibili's end‑to‑end exploration of a streaming‑batch unified data pipeline built on Apache Iceberg, detailing the original and iterated architectures for massive user behavior transmission, online AI training, DB synchronization, and dimension‑join, along with performance gains, cost savings, and future plans.

Batch ProcessingData LakeFlink

0 likes · 20 min read

Bilibili's Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practice

JavaEdge

Nov 11, 2024 · Fundamentals

Master Audio/Video Development: Fast‑Track Your FFmpeg Skills and Community Engagement

This guide outlines why audio‑video technology is booming, how mastering FFmpeg provides a rapid entry point, and offers a structured learning path—including core concepts, streaming tools, API usage, and community contribution—to help developers quickly become proficient in media processing.

FFmpegStreamingaudio

0 likes · 8 min read

Master Audio/Video Development: Fast‑Track Your FFmpeg Skills and Community Engagement

Architecture Digest

Nov 8, 2024 · Backend Development

The Evolution of EasyExcel: From Alibaba’s Internal Tool to the Upcoming EasyExcel‑Plus

This article chronicles the birth, technical innovations, open‑source journey, and community impact of Alibaba’s EasyExcel library, explains its memory‑optimized streaming design with a concise code example, and announces the forthcoming EasyExcel‑Plus project along with a free book giveaway.

EasyExcelExcelJava

0 likes · 7 min read

The Evolution of EasyExcel: From Alibaba’s Internal Tool to the Upcoming EasyExcel‑Plus

CSS Magic

Nov 8, 2024 · Artificial Intelligence

LLM Application Development Tips (3): Exploring LLM API Inputs and Outputs

This article explains how to configure key OpenAI chat completion parameters—such as temperature, top_p, streaming, response format, and tool selection—and walks through the structure of the API's JSON response, highlighting fields like id, model, choices, finish_reason, and usage for better control and cost estimation.

AI AgentsAPI parametersJSON response

0 likes · 8 min read

LLM Application Development Tips (3): Exploring LLM API Inputs and Outputs

Mike Chen's Internet Architecture

Oct 31, 2024 · Backend Development

Kafka vs RabbitMQ: Which Messaging System Suits Your Projects?

This article compares Kafka and RabbitMQ across design philosophy, performance, data models, delivery semantics, ecosystem, and typical use cases, helping developers choose the right messaging middleware for high‑throughput streaming or reliable task queues.

ComparisonMessage QueueRabbitMQ

0 likes · 4 min read

Kafka vs RabbitMQ: Which Messaging System Suits Your Projects?

360 Zhihui Cloud Developer

Oct 31, 2024 · Backend Development

Boosting Ozone Block Reads with gRPC Streaming: Up to 30% Faster

This article explains how a gRPC bidirectional streaming read method was added to Ozone to reduce chunk‑by‑chunk request gaps, describes the client‑side implementation, presents single‑ and multi‑threaded performance tests showing roughly 30% faster reads, and outlines future enhancements such as pre‑fetching.

OzoneStreamingblock storage

0 likes · 7 min read

Boosting Ozone Block Reads with gRPC Streaming: Up to 30% Faster

Big Data Technology & Architecture

Oct 12, 2024 · Big Data

Introduction to Apache Paimon: Architecture, Unified Storage, and Core Concepts

This article introduces Apache Paimon, an open‑source table format that supports batch and streaming reads and writes, explains its architecture, unified storage model, and core concepts such as file layout, snapshots, manifests, data files, partitions, and consistency guarantees.

Apache PaimonBig DataOLAP

0 likes · 6 min read

Introduction to Apache Paimon: Architecture, Unified Storage, and Core Concepts

DataFunSummit

Sep 30, 2024 · Big Data

Apache Hudi from Zero to One: The Swiss Army Knife for Data Ingestion – Hudi Streamer (Part 9)

This article introduces Apache Hudi Streamer, a versatile Spark‑based data ingestion tool likened to a Swiss Army knife, detailing its core options—including table configuration, continuous mode, source classes, transformers, table services, catalog synchronization, and advanced features—while guiding users on practical pipeline setup.

Apache HudiBig DataSpark

0 likes · 10 min read

Apache Hudi from Zero to One: The Swiss Army Knife for Data Ingestion – Hudi Streamer (Part 9)

StarRocks

Sep 19, 2024 · Big Data

How Ele.me Built a Real‑Time Lakehouse: From 1.0 to 3.0 with Flink, Paimon & StarRocks

This article details Ele.me's journey in evolving its real‑time data warehouse, covering the original 1.0 architecture, the 2.0 lakehouse redesign with Paimon and StarRocks, performance evaluations of lake formats and query engines, and the roadmap toward a 3.0 streaming lakehouse solution.

Big DataFlinkLakehouse

0 likes · 16 min read

How Ele.me Built a Real‑Time Lakehouse: From 1.0 to 3.0 with Flink, Paimon & StarRocks

JD Retail Technology

Sep 3, 2024 · Backend Development

Design and Architecture of a New Video Review System with Streamlined Frame Extraction and Parallel Processing

This article presents the design goals, architecture, technology selection, and component details of a unified video review system that leverages FFmpeg for frame extraction, stream‑based parallel processing, and flexible synchronous/asynchronous workflows to achieve low latency and high scalability.

FFmpegStreamingVideo Processing

0 likes · 10 min read

Design and Architecture of a New Video Review System with Streamlined Frame Extraction and Parallel Processing

Java Tech Enthusiast

Sep 2, 2024 · Industry Insights

Why Major Pirate Streaming Sites Are Closing: Industry Trends and Copyright Crackdowns

A wave of shutdowns affecting popular free video and anime piracy platforms such as RARBG and Animeflix reveals how pandemic costs, legal pressures, court rulings, and coordinated anti‑piracy actions by industry alliances are reshaping the digital media landscape and pushing users toward legitimate services.

Industry TrendsStreamingcopyright enforcement

0 likes · 7 min read

Why Major Pirate Streaming Sites Are Closing: Industry Trends and Copyright Crackdowns

Big Data Technology & Architecture

Aug 26, 2024 · Big Data

Understanding Flink 1.11 JobManager and TaskManager Memory Configuration

This article details the major memory model changes in Flink 1.11 for JobManager and TaskManager, compares them with Flink 1.9, provides concrete JVM command examples, explains the relationship between memory settings and parallelism, and introduces fine‑grained resource management for streaming workloads.

Big DataFlinkJobManager

0 likes · 9 min read

Understanding Flink 1.11 JobManager and TaskManager Memory Configuration

Big Data Technology & Architecture

Aug 20, 2024 · Big Data

Practical Insights on Using Apache Paimon for Real-World Data Lake Scenarios

This article shares a personal, experience‑driven overview of Apache Paimon, highlighting its design simplicity, key capabilities such as schema evolution, stream‑batch unified processing, primary‑key support, and closed‑loop data handling, while discussing when its features are appropriate for production environments.

Apache PaimonBatch ProcessingBig Data

0 likes · 5 min read

Practical Insights on Using Apache Paimon for Real-World Data Lake Scenarios

MaGe Linux Operations

Aug 7, 2024 · Operations

How to Configure Nginx for Direct MP4 Streaming (Step-by-Step Guide)

Learn how to compile Nginx with the MP4 module and set up a server block to stream MP4 files directly, including required commands, configuration snippets, and restart procedures, enabling seamless offline video playback without downloading.

NGINXServer ConfigurationStreaming

0 likes · 3 min read

How to Configure Nginx for Direct MP4 Streaming (Step-by-Step Guide)

DeWu Technology

Jul 31, 2024 · Big Data

Custom Flink Scheduler Enhancements: Resource Balancing, Task Migration, and TmRestart Strategy

The article details Dewu’s custom Flink scheduler, DwScheduler, which adds JSON‑based resource specifications, per‑TaskManager slot sharing for balanced CPU use, hot TaskManager migration callbacks, and a new TmRestart strategy for rapid pod‑process recovery, offering practical techniques to enhance real‑time stream processing stability and performance.

Apache FlinkPerformance OptimizationResource Management

0 likes · 9 min read

Custom Flink Scheduler Enhancements: Resource Balancing, Task Migration, and TmRestart Strategy

Soul Technical Team

Jul 23, 2024 · Big Data

Kafka Stability Challenges and Governance Framework at Soul

This article analyzes the role, application scenarios, stability challenges, and comprehensive governance framework of Apache Kafka at Soul, covering deployment, configuration, monitoring, standard controls, common misuse, and future directions toward cloud‑native solutions.

MonitoringOperationsStability

0 likes · 30 min read

Kafka Stability Challenges and Governance Framework at Soul

Aikesheng Open Source Community

Jul 22, 2024 · Databases

Performance Comparison of Local, Remote, and Percona XtraBackup 8.0 Backup Methods for MySQL

This article evaluates MySQL backup strategies using Percona XtraBackup, comparing local, remote, and streaming modes in terms of backup duration, primary server load impact, and backup file size, and provides practical recommendations based on detailed test results.

Percona XtraBackupStreamingbackup

0 likes · 17 min read

Performance Comparison of Local, Remote, and Percona XtraBackup 8.0 Backup Methods for MySQL

Rare Earth Juejin Tech Community

Jul 22, 2024 · Big Data

Comprehensive Guide to Kafka: Architecture, Core Concepts, and Configuration

This article provides an in‑depth overview of Apache Kafka, covering its use cases, comparison with other message queues, versioning, performance mechanisms, core concepts such as topics, partitions, offsets, consumer groups, rebalancing, replication, leader election, idempotence, transactions, compression, interceptors, request handling, and practical configuration tips for reliable streaming applications.

Big DataMessage QueueStreaming

0 likes · 25 min read

Comprehensive Guide to Kafka: Architecture, Core Concepts, and Configuration

Alibaba Cloud Big Data AI Platform

Jul 19, 2024 · Big Data

How to Deploy a PySpark Streaming Job on EMR Serverless Spark

This guide walks you through creating a Kafka‑enabled EMR Serverless Spark cluster, configuring network connections and security groups, uploading JARs and Python resources, and finally launching and monitoring a PySpark streaming application.

Big DataEMR ServerlessPySpark

0 likes · 8 min read

How to Deploy a PySpark Streaming Job on EMR Serverless Spark

Tencent Cloud Developer

Jul 16, 2024 · Big Data

In‑Depth Exploration of Apache Kafka: Architecture, High Reliability, and High Performance

Apache Kafka achieves high‑throughput, fault‑tolerant messaging by combining a partitioned log architecture with leader‑follower replication, asynchronous producer pipelines, configurable acknowledgments, page‑cache‑based sequential writes, zero‑copy transfers, batching, compression, and a multi‑reactor network model that together ensure scalability, reliability, and performance.

Apache KafkaReliabilityStreaming

0 likes · 30 min read

In‑Depth Exploration of Apache Kafka: Architecture, High Reliability, and High Performance

Architecture Development Notes

Jul 14, 2024 · Backend Development

Inside Netflix’s Tech Stack: How They Power Billions of Streams

This article breaks down Netflix’s comprehensive technology stack—from mobile and web front‑ends to microservices, data storage, streaming pipelines, and CI/CD tools—showcasing how the platform delivers seamless, high‑performance video experiences to billions of users worldwide.

CloudMicroservicesNetflix

0 likes · 8 min read

Inside Netflix’s Tech Stack: How They Power Billions of Streams

Tencent Cloud Developer

Jul 2, 2024 · Big Data

Apache Flink Deployment with Pulsar Connector: Setup, Demos, and Best Practices

This guide shows how to deploy Apache Flink 1.17 in Docker, configure off‑heap memory, connect it to Pulsar via the 4.1.0‑1.17 connector, run example jobs that copy topics and perform windowed word‑count, and provides Maven dependencies, custom serialization tips, batching settings, and version‑specific best‑practice notes.

Apache FlinkDataStreamDocker deployment

0 likes · 20 min read

Apache Flink Deployment with Pulsar Connector: Setup, Demos, and Best Practices

Code Mala Tang

Jun 29, 2024 · Frontend Development

Master WritableStream: Real-World Uses, Best Practices, and Common Pitfalls

This article introduces the JavaScript WritableStream API, explains its core methods and construction, demonstrates practical scenarios such as file uploads, logging, data transformation, and media handling, and discusses advanced considerations like chunk sizing, error recovery, concurrency control, and performance optimization.

StreamingWeb APIWritableStream

0 likes · 10 min read

Master WritableStream: Real-World Uses, Best Practices, and Common Pitfalls

Code Mala Tang

Jun 27, 2024 · Frontend Development

Mastering ReadableStream: A Deep Dive into Web Streams API

This article introduces the concept of streams, explains the Web Streams API and its ReadableStream component, details constructors, methods, queuing strategies, back‑pressure handling, BYOB and byte streams, and provides practical code examples and usage scenarios for modern web development.

Front-endReadableStreamStreaming

0 likes · 20 min read

Mastering ReadableStream: A Deep Dive into Web Streams API

DataFunTalk

Jun 18, 2024 · Big Data

Real-time Data Warehouse Evolution with Data Lake: Architecture, Challenges, and Solutions

This article presents a comprehensive overview of the evolution from traditional Lambda‑based real‑time data warehouse solutions to a data‑lake‑integrated architecture, detailing the shortcomings of legacy designs, the iterative improvements made at JD Technology, and the technical and operational challenges encountered during implementation.

Data LakeLambda architectureReal-Time Data Warehouse

0 likes · 24 min read

Real-time Data Warehouse Evolution with Data Lake: Architecture, Challenges, and Solutions

Big Data Technology & Architecture

Jun 16, 2024 · Big Data

Real-time Big Data Analytics with Apache Paimon and the Streaming Lakehouse Architecture

This article summarizes Wang Feng's presentation on the next‑generation Lakehouse architecture, explaining how Apache Paimon provides a unified, real‑time data lake format that bridges batch and streaming workloads, enabling low‑latency analytics and AI integration for modern big‑data applications.

Apache PaimonBig DataStreaming

0 likes · 9 min read

Real-time Big Data Analytics with Apache Paimon and the Streaming Lakehouse Architecture

Sohu Tech Products

Jun 5, 2024 · Big Data

Why Kafka Is the Backbone of Modern Data Pipelines: Core Architecture and Use Cases

This article explains Kafka's role as a high‑throughput distributed message queue, detailing its core components, topic‑partition model, consumer groups, storage mechanisms, fault‑tolerance features, delivery guarantees, ZooKeeper coordination, and scalability strategies for building reliable real‑time data pipelines.

Big DataMessage QueueStreaming

0 likes · 14 min read

Why Kafka Is the Backbone of Modern Data Pipelines: Core Architecture and Use Cases

Su San Talks Tech

Jun 2, 2024 · Big Data

Mastering Kafka: Core Architecture, Use Cases, and Design Principles

This article provides a comprehensive overview of Apache Kafka, covering its role as a message queue, core components, topic and partition design, consumer groups, storage mechanisms, high‑availability features, delivery guarantees, ZooKeeper coordination, and scalability strategies for building robust real‑time data pipelines.

Big DataStreamingkafka

0 likes · 15 min read

Mastering Kafka: Core Architecture, Use Cases, and Design Principles

DataFunTalk

May 16, 2024 · Big Data

Streaming Data Lake Warehouse Solution Based on USDP with Flink and Paimon

This article presents UCloud's USDP‑based streaming data lake warehouse solution that leverages Flink for real‑time processing and Paimon for lake storage, detailing its architecture, advantages, practical scenarios, and providing complete SQL and Flink CDC code snippets for end‑to‑end implementation.

CDCData LakeFlink

0 likes · 27 min read

Streaming Data Lake Warehouse Solution Based on USDP with Flink and Paimon

Sohu Tech Products

May 15, 2024 · Artificial Intelligence

OpenAI Assistants API Quickstart Project for Next.js

OpenAI’s open‑source openai‑assistants‑quickstart project shows how to integrate the Assistants API into a Next.js app, offering streaming chat, code‑interpreter, file‑search, and function‑calling tools, and provides step‑by‑step setup instructions so developers can quickly build and customize AI assistants.

AI assistantAssistants APICode interpreter

0 likes · 4 min read

OpenAI Assistants API Quickstart Project for Next.js

DataFunSummit

May 15, 2024 · Big Data

Xiaomi Sales Data Warehouse: Architecture, Construction Theory, and Capability Evolution

This article details Xiaomi's sales data warehouse development, covering its history, architecture, dimensional modeling, layer design, streaming‑batch integration, governance, security, and future directions, while also addressing practical Q&A on implementation challenges and best practices.

Big DataData WarehouseFlink

0 likes · 15 min read

Xiaomi Sales Data Warehouse: Architecture, Construction Theory, and Capability Evolution

Mike Chen's Internet Architecture

May 11, 2024 · Big Data

Comprehensive Introduction to Apache Kafka: Architecture, Features, and Use Cases

This article provides a detailed overview of Apache Kafka, covering its core characteristics, distributed architecture, key components such as topics, partitions, brokers, producers, consumers, ZooKeeper, and common application scenarios like log collection, event‑driven architecture, real‑time analytics, and monitoring.

Big DataMessage QueueStreaming

0 likes · 7 min read

Comprehensive Introduction to Apache Kafka: Architecture, Features, and Use Cases

Mike Chen's Internet Architecture

May 9, 2024 · Big Data

Understanding Apache Kafka: Features, Architecture, and Real‑World Use Cases

This article provides a comprehensive overview of Apache Kafka, covering its core features, architectural components, message flow, and common application scenarios such as log collection, decoupled messaging, activity tracking, operational monitoring, and stream processing.

Apache KafkaMessage QueueStreaming

0 likes · 6 min read

Understanding Apache Kafka: Features, Architecture, and Real‑World Use Cases

dbaplus Community

May 7, 2024 · Big Data

How Twitter Scaled to Process 400 Billion Events Daily: Architecture Evolution

Twitter processes up to 400 billion events per day, moving from a Lambda‑style architecture with Scalding, Heron, and TSAR to a hybrid Twitter‑Data‑Center and Google Cloud pipeline that delivers sub‑10 ms latency, higher throughput, and lower operational cost while simplifying real‑time aggregation.

Event ProcessingStreamingTwitter

0 likes · 9 min read

How Twitter Scaled to Process 400 Billion Events Daily: Architecture Evolution

Big Data Technology & Architecture

Apr 30, 2024 · Big Data

Apache Paimon Becomes a Top-Level Project: A Comprehensive Overview of Lakehouse Framework Capabilities and Future Trends

The article reviews Apache Paimon's graduation to an Apache Top-Level Project, outlines the essential capabilities of modern lakehouse frameworks—including streaming and batch I/O, multi‑engine integration, and advanced features—and discusses the problems they solve and the promising direction of the lakehouse ecosystem.

Apache PaimonBatch ProcessingBig Data

0 likes · 5 min read

Apache Paimon Becomes a Top-Level Project: A Comprehensive Overview of Lakehouse Framework Capabilities and Future Trends

Bilibili Tech

Apr 26, 2024 · Artificial Intelligence

2024 Bilibili Technology Patent Awards – Highlights of Ten Winning Innovations

On World Intellectual Property Day, Bilibili honored ten breakthrough patents that together enable billion‑scale video duplicate detection, AI‑driven story generation, synchronized live rhythm‑games, automatic OTT casting, knowledge‑graph‑based content moderation, glitch‑free multi‑audio streaming, modular playback integration, neural‑network resolution encoding, AV1 reference‑frame pruning, and fine‑grained GPU isolation.

StreamingVideo Processingartificial-intelligence

0 likes · 6 min read

2024 Bilibili Technology Patent Awards – Highlights of Ten Winning Innovations

21CTO

Apr 22, 2024 · Big Data

Inside Uber’s Real‑Time Data Infrastructure: How They Scale Streaming at Massive Scale

This article explores Uber’s sophisticated real‑time data infrastructure, detailing how the company leverages open‑source technologies such as Apache Kafka, Flink, Pinot, and Presto, and describing the architectural components, scaling challenges, multi‑region resilience, data back‑filling, and operational practices that enable low‑latency analytics for millions of daily rides and deliveries.

Big DataFlinkPinot

0 likes · 25 min read

Inside Uber’s Real‑Time Data Infrastructure: How They Scale Streaming at Massive Scale

DataFunSummit

Apr 18, 2024 · Big Data

Real‑time Data Warehouse Evolution with Data Lake: Architecture, Challenges, and Solutions

This article presents a comprehensive overview of JD Tech's real‑time data warehouse evolution, detailing the legacy Lambda‑based design, its shortcomings, the transition to a data‑lake‑integrated architecture, iterative improvements, encountered technical and non‑technical issues, and future outlooks.

ClickHouseData LakeFlink

0 likes · 24 min read

Bilibili Tech

Apr 12, 2024 · Backend Development

Design and Optimization of a High‑Throughput Long‑Connection Service for Live Streaming

The article details a Golang‑based high‑throughput long‑connection service for live‑streaming, describing its five‑layer architecture, multi‑protocol support, load‑balancing, message‑queue decoupling, aggregation with brotli compression, multi‑region deployment, priority channels, and future enhancements for observability and intelligent endpoint selection.

Streamingbackend-architecturegolang

0 likes · 16 min read

Design and Optimization of a High‑Throughput Long‑Connection Service for Live Streaming

Bilibili Tech

Apr 9, 2024 · Big Data

Optimizing Flink State Performance with RocksDB KV Separation and BlobDB

In large‑scale Flink double‑stream joins, terabyte‑sized RocksDB state caused severe compaction latency and CPU spikes, but enabling RocksDB BlobDB KV‑separation (and an inner‑compaction patch) dramatically shrank SST files, reduced read/write latencies to sub‑millisecond levels, and cut CPU spikes by about half.

FlinkKV SeparationPerformance Optimization

0 likes · 12 min read

Optimizing Flink State Performance with RocksDB KV Separation and BlobDB

DataFunSummit

Apr 7, 2024 · Big Data

Li Auto’s Flink on Kubernetes Data Integration Practice

This article presents Li Auto’s end‑to‑end data integration journey, detailing the evolution of its data platform, the challenges of heterogeneous sources, and how a unified Flink‑on‑K8s solution with cloud‑native architecture, operator management, monitoring, and checkpointing addresses batch‑stream convergence and future scalability.

Batch ProcessingBig DataData Integration

0 likes · 12 min read

Li Auto’s Flink on Kubernetes Data Integration Practice

Ctrip Technology

Mar 22, 2024 · Mobile Development

Design and Implementation of the Cloud Touch Platform for Remote Mobile Device Control and Testing

The article presents the background, full‑scenario construction, core architecture, device‑pool strategy, remote iOS control via WebDriverAgent, screen‑sync using ffmpeg, streaming pipeline, data collection, and practical lessons of the Cloud Touch platform that enables unified remote testing and customer‑support workflows for mobile applications.

Cloud TouchFFmpegRemote Device Control

0 likes · 14 min read

Design and Implementation of the Cloud Touch Platform for Remote Mobile Device Control and Testing

Didi Tech

Mar 12, 2024 · Big Data

Understanding Flink Metrics System: Core Concepts, Elastic Design, and Practical Usage

The article explains Flink’s metrics architecture—core concepts, reporter interfaces, built‑in and custom metric types, elastic plugin design, and scheduled reporting—illustrated with a consumption‑latency example, and shows how Didi uses these metrics for real‑time UI curves, alerts, and intelligent task diagnosis.

Big DataFlinkMetrics

0 likes · 11 min read

Understanding Flink Metrics System: Core Concepts, Elastic Design, and Practical Usage

Architect's Guide

Mar 2, 2024 · Fundamentals

RabbitMQ vs Kafka: Core Differences and When to Use Each

This article compares RabbitMQ and Apache Kafka across architecture, message ordering, routing, timing, retention, fault handling, scalability, and consumer complexity, and provides guidance on which platform suits specific use‑cases such as flexible routing, strict ordering, long‑term retention, or high throughput.

Message OrderingMessage QueueRabbitMQ

0 likes · 19 min read

RabbitMQ vs Kafka: Core Differences and When to Use Each

Airbnb Technology Team

Mar 1, 2024 · Big Data

Riverbed: A Scalable Data Framework for Real‑time and Batch Processing at Airbnb

Airbnb’s Riverbed framework unifies streaming CDC events and batch Spark jobs behind a GraphQL‑based declarative API to automatically build and maintain distributed materialized views, using Kafka‑partitioned ordering and version control to deliver billions of daily updates with low‑latency reads for features such as payments and search.

AirbnbApache SparkData Engineering

0 likes · 8 min read

Riverbed: A Scalable Data Framework for Real‑time and Batch Processing at Airbnb

MaGe Linux Operations

Feb 20, 2024 · Big Data

Redis Streams vs Kafka: Which Is Better for Real‑Time Event Processing?

This article compares Redis Streams and Kafka, examining their architectures, ordering guarantees, consumer group models, scalability, and trade‑offs, and shows how Redis can emulate Kafka‑like semantics using the Runnel library, while highlighting memory‑speed benefits versus Kafka’s durable, unlimited log storage.

Event ProcessingRedisRunnel

0 likes · 9 min read

Redis Streams vs Kafka: Which Is Better for Real‑Time Event Processing?

Rare Earth Juejin Tech Community

Feb 8, 2024 · Big Data

What Is Kafka? Overview, Architecture, Features, Deployment, and Sample Code

This article explains Kafka as a distributed publish/subscribe messaging system, detailing its core functions, architecture, advantages, deployment methods, common use cases, and provides Java consumer and producer code examples for real‑time data processing.

Big DataJavaMessage Queue

0 likes · 8 min read

What Is Kafka? Overview, Architecture, Features, Deployment, and Sample Code

Open Source Tech Hub

Jan 31, 2024 · Artificial Intelligence

How to Build Async OpenAI PHP Clients with Workerman & Webman

This guide shows how to install the OpenAI PHP async client and implement streaming and non‑streaming chat, image generation, audio speech, and embedding features using Workerman and Webman, including Azure OpenAI support, with complete code examples.

APIOpenAIPHP

0 likes · 6 min read

How to Build Async OpenAI PHP Clients with Workerman & Webman

StarRocks

Jan 30, 2024 · Big Data

How InLong Guarantees Exactly‑Once Real‑Time Writes to StarRocks

This article explains how Apache InLong provides automatic, secure, high‑performance real‑time data transfer to StarRocks, detailing the transactional Stream Load API, the two‑phase commit process, Flink‑based ingestion architecture, exactly‑once guarantees, and performance test results across different parallelism levels.

Big DataExactly-onceInLong

0 likes · 11 min read

How InLong Guarantees Exactly‑Once Real‑Time Writes to StarRocks

MaGe Linux Operations

Jan 21, 2024 · Big Data

Master Kafka: Core Concepts, Metrics, and Troubleshooting Guide

This article explains Kafka's fundamental components, version evolution, key monitoring metrics for producers, brokers, consumers and Zookeeper, and provides step‑by‑step troubleshooting methods for common issues such as slow topic throughput and message backlog.

Big DataMessage QueueStreaming

0 likes · 8 min read

Master Kafka: Core Concepts, Metrics, and Troubleshooting Guide

Spring Full-Stack Practical Cases

Jan 15, 2024 · Backend Development

Simulating ChatGPT‑Style Typing with Spring WebFlux and SSE

This tutorial demonstrates how to use Spring WebFlux’s reactive streaming to create a ChatGPT‑like typing effect, covering backend setup, SSE integration, frontend Axios handling, and a comparison between Flux and traditional Server‑Sent Events.

JavaReactive StreamsServer‑Sent Events

0 likes · 8 min read

Simulating ChatGPT‑Style Typing with Spring WebFlux and SSE

Rare Earth Juejin Tech Community

Jan 13, 2024 · Big Data

What Is Kafka? Overview, Architecture, Features, Deployment, and Sample Code

Kafka, an Apache‑developed distributed publish/subscribe messaging system, provides reliable, high‑throughput real‑time data streaming with producers, consumers, brokers, streams, and connectors, and the article explains its core concepts, architecture, advantages, deployment methods, use cases, and includes Java code examples for producers and consumers.

Big DataJavaMessage Queue

0 likes · 8 min read

FunTester

Jan 5, 2024 · Big Data

An Overview of Apache Kafka and Kafka Streams Technical Features

This article introduces Apache Kafka as a high‑throughput, scalable, fault‑tolerant distributed streaming platform, explains why it is chosen for real‑time data pipelines, and details key Kafka Streams concepts such as stream processing, interactive queries, stateful processing, windowing, serialization, and testing.

Apache KafkaBig DataStreaming

0 likes · 13 min read

An Overview of Apache Kafka and Kafka Streams Technical Features

Sohu Tech Products

Dec 27, 2023 · Big Data

Practical Implementation of Data Integration with Flink on Kubernetes at Li Auto

Li Auto built a cloud‑native data‑integration platform by deploying Flink on Kubernetes, unifying batch and streaming workloads with a storage layer (JuiceFS + BOS) and Flink Operator, enabling simple source‑sink pipelines, elastic scaling, automated checkpointing, and centralized monitoring while addressing earlier fragmentation and resource inefficiencies.

Big DataCloud NativeData Integration

0 likes · 11 min read

Practical Implementation of Data Integration with Flink on Kubernetes at Li Auto

dbaplus Community

Dec 25, 2023 · Big Data

Why Spark and Flink Can't Stream MySQL via JDBC (And What Works Instead)

This article explains the limitations of using JDBC for true streaming reads in Spark and Flink, demonstrates failed attempts with MySQL, shows workarounds that revert to batch processing, and recommends Flink CDC as the practical solution for incremental MySQL ingestion.

Big DataCDCFlink

0 likes · 8 min read

Why Spark and Flink Can't Stream MySQL via JDBC (And What Works Instead)

ITPUB

Dec 24, 2023 · Backend Development

Why Kafka Is the Backbone of Modern Messaging, Streaming, and Data Pipelines

This article explains how Kafka serves as a high‑throughput, durable messaging system, a reliable storage layer, a log‑aggregation hub, a stream‑processing engine, and a core component for CDC, system migration, monitoring, and event‑sourcing architectures.

CDCEvent SourcingStreaming

0 likes · 9 min read

Why Kafka Is the Backbone of Modern Messaging, Streaming, and Data Pipelines

DataFunTalk

Dec 15, 2023 · Big Data

Flink Forward Asia 2023: New Flink Releases, Apache Paimon, and Flink CDC 3.0

The Flink Forward Asia 2023 conference showcased major updates to Apache Flink (versions 1.17 and 1.18), introduced the Apache Paimon lakehouse project, announced Flink CDC 3.0, and highlighted community growth, cloud‑native deployments, and real‑time data‑warehouse use cases across industry leaders.

Apache FlinkApache PaimonBig Data

0 likes · 17 min read

Flink Forward Asia 2023: New Flink Releases, Apache Paimon, and Flink CDC 3.0

ITPUB

Dec 14, 2023 · Big Data

How to Build a Python‑Hadoop Word Count on a Single‑Node Cluster

This step‑by‑step guide shows how to install and configure a single‑node Hadoop 3.2.0 environment on CentOS 7, set up Python 3.7, write MapReduce mapper and reducer scripts in Python, and run a word‑count job using Hadoop streaming, illustrating core Hadoop concepts and their relevance today.

HadoopMapReducePython

0 likes · 21 min read

How to Build a Python‑Hadoop Word Count on a Single‑Node Cluster

Tencent Cloud Developer

Dec 14, 2023 · Big Data

Master Word Count with Python & Hadoop: A Step‑by‑Step Guide

This tutorial walks you through Hadoop’s core components, sets up a single‑node Hadoop cluster on CentOS 7, installs Python 3, writes mapper and reducer scripts in Python, and runs a Hadoop‑Streaming word‑count job to demonstrate classic big‑data processing techniques.

Big DataHadoopLinux

0 likes · 22 min read

Master Word Count with Python & Hadoop: A Step‑by‑Step Guide

DataFunTalk

Dec 12, 2023 · Big Data

Flink Forward Asia 2023 Recap: Keynote Highlights, Technical Advances, and Community Updates

The Flink Forward Asia 2023 conference recap highlights opening remarks, a keynote on Flink’s dominance in streaming compute, detailed 2023 technical advancements, case studies, the launch of Flink CDC 3.0, and a preview of Flink 2.0, along with links to photos and video recordings.

Apache FlinkBig DataFlink 2.0

0 likes · 5 min read

Flink Forward Asia 2023 Recap: Keynote Highlights, Technical Advances, and Community Updates

DataFunTalk

Dec 8, 2023 · Big Data

Zhihu Bridge Platform: Architecture, Capabilities, and Future Trends of Content Operations

This article presents a comprehensive overview of Zhihu's Bridge platform, detailing its content‑operation architecture—including content pool, management, analysis, monitoring, and intervention modules—explaining the underlying streaming and batch technologies such as Flink, Doris, and Elasticsearch, and outlining future automation and AI‑driven workflow directions.

AIBig DataStreaming

0 likes · 17 min read

Zhihu Bridge Platform: Architecture, Capabilities, and Future Trends of Content Operations

ITPUB

Dec 2, 2023 · Backend Development

Why Did My Flink Kafka Job Lose Data? Uncovering Misconfigured Bootstrap Servers

A Flink job that reads from Kafka and writes to Elasticsearch was losing data because the bootstrap.servers list mixed production and pre‑release clusters, causing random server selection, partition discovery failures, and offset mismatches, which were resolved by correcting the server configuration.

Bootstrap ServersData lossFlink

0 likes · 8 min read

Why Did My Flink Kafka Job Lose Data? Uncovering Misconfigured Bootstrap Servers

JavaEdge

Nov 24, 2023 · Backend Development

Why Kafka Is the Ultimate Backbone for Modern Backend Systems

This article explores how Kafka serves as a versatile backbone for messaging, durable storage, log aggregation, monitoring, commit logs, recommendation pipelines, stream processing, CDC, system migration, and event sourcing, highlighting its performance, reliability, and practical deployment patterns.

Message QueueStreamingbackend

0 likes · 10 min read

Why Kafka Is the Ultimate Backbone for Modern Backend Systems

Alibaba Cloud Big Data AI Platform

Nov 23, 2023 · Big Data

Why Apache Paimon Is Revolutionizing Streaming Lakehouse Architecture with Flink

The article traces the shift from traditional Hive‑based warehouses to modern lakehouse architectures, explains the advantages of lake formats, introduces Apache Paimon as a streaming‑first data lake integrated with Flink, presents performance benchmarks showing its superiority over Hudi, and demonstrates a real‑time streaming lakehouse workflow.

Apache PaimonBig DataFlink

0 likes · 15 min read

Why Apache Paimon Is Revolutionizing Streaming Lakehouse Architecture with Flink

Alibaba Cloud Big Data AI Platform

Nov 22, 2023 · Big Data

Real-Time Data Integration with Flink CDC: Core Tech and Alibaba Cloud Solutions

This article, based on a presentation by Flink CDC and Apache Flink community leaders, explores CDC real‑time integration challenges, delves into Flink CDC’s core technologies such as incremental snapshot and lock‑free processing, and demonstrates Alibaba Cloud’s enterprise‑grade solutions for end‑to‑end real‑time data pipelines.

Alibaba CloudBig DataChange Data Capture

0 likes · 21 min read

Real-Time Data Integration with Flink CDC: Core Tech and Alibaba Cloud Solutions

Tencent Cloud Middleware

Nov 15, 2023 · Big Data

Optimizing Apache Pulsar for MySQL Binlog Ingestion and Sorting in Apache InLong

This article explains how Apache Pulsar is used within Apache InLong to collect, sort, and reliably deliver massive MySQL binlog incremental data, covering component architecture, job isolation, client and producer management, consumption strategies, common pitfalls, performance tuning, and practical code examples.

Apache PulsarBinlogInLong

0 likes · 21 min read

Optimizing Apache Pulsar for MySQL Binlog Ingestion and Sorting in Apache InLong

Big Data Technology Architecture

Nov 14, 2023 · Big Data

Open Source Big Data Platform 3.0: Streaming Lakehouse, Serverless Architecture, and AI Integration

The talk outlines the evolution of Alibaba Cloud's open‑source big data platform from Hadoop‑based EMR to a 3.0 architecture featuring a streaming lakehouse, full serverless compute and storage, AI‑driven operations, and upcoming vector search services, highlighting technical motivations, challenges, and product releases.

Big DataLakehouseServerless

0 likes · 14 min read

Open Source Big Data Platform 3.0: Streaming Lakehouse, Serverless Architecture, and AI Integration

DataFunSummit

Nov 9, 2023 · Big Data

Spark 3.4 New Features Overview: Community Updates, SQL Enhancements, PySpark, Streaming, and AI Ecosystem

This article presents a comprehensive overview of Spark 3.4, covering community growth statistics, major SQL improvements such as default column values and timestamp handling, new PySpark and streaming capabilities, and the emerging AI ecosystem that integrates natural‑language interfaces and Spark AI services.

DatabricksPySparkStreaming

0 likes · 14 min read

Spark 3.4 New Features Overview: Community Updates, SQL Enhancements, PySpark, Streaming, and AI Ecosystem

macrozheng

Nov 9, 2023 · Big Data

7 Real-World Kafka Use Cases Every Engineer Should Know

This article explains Kafka's core components and features, then details seven practical scenarios—including log processing, recommendation streams, monitoring, CDC, system migration, event sourcing, and message queuing—showing how Kafka powers modern distributed systems.

Big DataMessage QueueStreaming

0 likes · 12 min read

7 Real-World Kafka Use Cases Every Engineer Should Know

ITPUB

Nov 7, 2023 · Big Data

7 Real-World Kafka Use Cases That Power Modern Distributed Systems

This article introduces Apache Kafka’s core components and key features, then details seven practical use cases—including log processing, recommendation streams, monitoring, CDC, system migration, event sourcing, and message queuing—illustrated with diagrams and step‑by‑step workflows for distributed systems.

Big DataMessage QueueStreaming

0 likes · 10 min read

7 Real-World Kafka Use Cases That Power Modern Distributed Systems

HelloTech

Oct 31, 2023 · Big Data

Investigation of Data Loss in a Flink Kafka Consumer Caused by Mixed Kafka Cluster Configuration

The data loss in a Flink‑Kafka job was caused by a mis‑configured bootstrap.servers list that mixed production and pre‑release Kafka clusters, leading different subtasks to connect to different clusters, resulting in inconsistent partition discovery and offset fetching, which omitted several partitions until the list was corrected.

Cluster ConfigurationData lossElasticsearch

0 likes · 8 min read

Investigation of Data Loss in a Flink Kafka Consumer Caused by Mixed Kafka Cluster Configuration

Java Architect Essentials

Oct 16, 2023 · Backend Development

RabbitMQ vs Kafka: Comparing Asynchronous Messaging Patterns and Architectural Differences

This article introduces asynchronous messaging patterns, then compares RabbitMQ and Apache Kafka by examining their internal architectures, message models, and trade‑offs, helping architects choose the appropriate solution based on scenario requirements in modern.

Message QueuePub-SubRabbitMQ

0 likes · 11 min read

RabbitMQ vs Kafka: Comparing Asynchronous Messaging Patterns and Architectural Differences

Top Architect

Sep 25, 2023 · Backend Development

RabbitMQ vs Kafka: Detailed Comparison and When to Use Each

This article provides an in‑depth technical comparison of RabbitMQ and Apache Kafka, covering their core architectural differences, message ordering, routing, timing, retention, fault handling, scalability, consumer complexity, and offers guidance on selecting the appropriate platform for various backend scenarios.

Message QueueRabbitMQStreaming

0 likes · 18 min read

RabbitMQ vs Kafka: Detailed Comparison and When to Use Each

JD Cloud Developers

Sep 18, 2023 · Backend Development

Mastering Rust gRPC Streaming with Tonic: Build Server & Client

This guide walks through creating a Rust project that uses the Tonic library to implement gRPC streaming, covering project setup, protobuf definitions, server and client code, testing with grpcurl, and enabling the reflection API for service introspection.

Streamingasyncbackend

0 likes · 15 min read

Mastering Rust gRPC Streaming with Tonic: Build Server & Client

21CTO

Sep 8, 2023 · Big Data

Why Real-Time Data Processing Is the Next Frontier for Data Engineers

Real-time data processing transforms traditional batch pipelines by delivering fresh, low‑latency data to millions of concurrent users, leveraging event‑driven architectures, streaming engines, and real‑time databases, with use cases ranging from fraud detection to personalized e‑commerce and operational dashboards, and includes reference architectures and tool recommendations.

Big DataData EngineeringReal-time Processing

0 likes · 16 min read

Why Real-Time Data Processing Is the Next Frontier for Data Engineers

StarRocks

Sep 6, 2023 · Big Data

How Paimon + StarRocks Revolutionize Lakehouse Analytics

This article reviews traditional Lambda and Kappa data‑warehouse architectures, then details four Paimon‑StarRocks lakehouse solutions—including a data‑lake center, accelerated query with materialized views, hot‑cold data separation, and the JNI connector—while also outlining StarRocks’ future roadmap for lakehouse analytics.

Big DataLakehousePaimon

0 likes · 11 min read

How Paimon + StarRocks Revolutionize Lakehouse Analytics

Data Thinking Notes

Aug 27, 2023 · Big Data

How ByteDance’s LAS Team Unified Real‑Time and Offline Warehousing with a Lakehouse Solution

This article analyzes the shortcomings of mainstream Lambda‑style data warehouse architectures, introduces Hudi‑based lakehouse design principles, details the three‑layer unified storage architecture, data distribution, model and read/write mechanisms, and showcases real‑time streaming, multidimensional analysis, and stream‑batch reuse scenarios along with future roadmap plans.

HudiLakehouseStreaming

0 likes · 14 min read

How ByteDance’s LAS Team Unified Real‑Time and Offline Warehousing with a Lakehouse Solution

Big Data Technology & Architecture

Aug 21, 2023 · Big Data

Key Features and Benefits of Lakehouse Frameworks Hudi, Iceberg, and Paimon

This note outlines how Hudi, Iceberg, and Paimon provide unified batch‑stream storage, UPSERT support, time‑travel capabilities, and lower development costs, enabling a streaming‑warehouse architecture that offers near‑real‑time latency, consistent semantics, persisted intermediate results, and easier historical data repair.

Batch ProcessingHudiIceberg

0 likes · 5 min read

Key Features and Benefits of Lakehouse Frameworks Hudi, Iceberg, and Paimon

Bitu Technology

Aug 9, 2023 · Product Management

Tubi July 2023 Highlights: New CEO, Top FAST Service Ranking, Content Personalization Strategy, Data‑Driven DNA, Emmy Nomination

In July 2023 Tubi announced the appointment of Anjali Sud as CEO, reinforced its position as the highest‑rated FAST service in the U.S., detailed its personalized content strategy, highlighted its data‑driven technology and ad‑tech approach, and celebrated its first Emmy nomination and industry recognitions.

CEOData-DrivenFAST

0 likes · 8 min read

Tubi July 2023 Highlights: New CEO, Top FAST Service Ranking, Content Personalization Strategy, Data‑Driven DNA, Emmy Nomination

ByteDance Data Platform

Aug 9, 2023 · Big Data

Why Traditional Data Warehouses Fail and How a Real‑Time Lakehouse Solves the Pain

This article analyzes the shortcomings of mainstream data‑warehouse and data‑lake architectures, explains the design of ByteDance's real‑time/offline unified lakehouse solution, and demonstrates its practical applications and future roadmap across streaming, multi‑dimensional analysis, and batch‑stream reuse scenarios.

HudiLASLakehouse

0 likes · 14 min read

Why Traditional Data Warehouses Fail and How a Real‑Time Lakehouse Solves the Pain

Mike Chen's Internet Architecture

Aug 2, 2023 · Backend Development

Kafka Core Architecture, Principles, Features, and Application Scenarios

This article explains Kafka's core architecture—including topics, producers, brokers, and consumers—its underlying mechanisms, the role of Zookeeper, key characteristics such as high throughput and fault tolerance, and common use cases like log collection, activity tracking, and stream processing.

Backend DevelopmentMessage QueueStreaming

0 likes · 7 min read

Kafka Core Architecture, Principles, Features, and Application Scenarios

Ctrip Technology

Jul 13, 2023 · Frontend Development

Streaming Rendering with React 18: Next.js, Remix, and Custom SSR Implementations

This article explains the concept of streaming (incremental) rendering introduced in React 18, demonstrates how to apply it in Next.js and Remix using server components and Suspense, and walks through a custom SSR setup that leverages renderToPipeableStream and the upcoming use hook for seamless data fetching.

Next.jsReActRemix

0 likes · 34 min read

Streaming Rendering with React 18: Next.js, Remix, and Custom SSR Implementations

Java Architecture Diary

Jul 11, 2023 · Big Data

Redpanda vs Apache Kafka with KRaft: Why Redpanda Is Up to 10× Faster

This article presents a detailed benchmark comparing Redpanda 23.1 and Apache Kafka 3.4.0 (with and without KRaft) across multiple AWS instance types, showing how Redpanda consistently delivers higher throughput and dramatically lower end‑to‑end latency, often outperforming Kafka by 4‑20× even with extra hardware.

Apache KafkaBig DataKRaft

0 likes · 12 min read

Redpanda vs Apache Kafka with KRaft: Why Redpanda Is Up to 10× Faster

DataFunTalk

Jul 5, 2023 · Big Data

DataFun Summit 2023 Real‑Time Computing Forum – Speaker Line‑up and Session Details

The DataFun Summit 2023 Real‑Time Computing Forum showcases a series of expert talks on Apache Flink, stream‑batch integration, cloud‑native streaming databases, and large‑scale real‑time data warehousing, featuring speakers from Alibaba Cloud, Taobao, Didi, Ant Group and RisingWave.

Big DataCloud NativeData Warehousing

0 likes · 8 min read

DataFun Summit 2023 Real‑Time Computing Forum – Speaker Line‑up and Session Details

Big Data Technology & Architecture

Jul 4, 2023 · Big Data

Building a Real‑Time Streaming Data Warehouse with Paimon on Kubernetes for Supply‑Chain Logistics

This article presents a step‑by‑step guide on how the logistics provider Haicheng Bangda implemented a streaming data warehouse using Paimon, Flink CDC, and Kubernetes, covering business background, architecture choices, environment setup, SQL examples, troubleshooting tips, and future roadmap for their digital transformation.

Big DataCDCData Warehouse

0 likes · 27 min read

Building a Real‑Time Streaming Data Warehouse with Paimon on Kubernetes for Supply‑Chain Logistics

Sanyou's Java Diary

Jun 26, 2023 · Big Data

Master Kafka Interview Questions: Architecture, Partitioning, and Reliability Explained

This article provides a comprehensive overview of Kafka, covering its core architecture, message queue models, communication process, partition selection, consumer groups, rebalancing strategies, partition assignment algorithms, reliability guarantees, replica synchronization, and reasons for removing Zookeeper in newer versions.

Consumer GroupReliabilityStreaming

0 likes · 20 min read

Master Kafka Interview Questions: Architecture, Partitioning, and Reliability Explained