Tag: real-time data processing

DataFunSummit
May 8, 2024 · Artificial Intelligence

Kuaishou’s Practices for Large‑Scale Model Data Processing and Storage

This article shares Kuaishou’s real‑time, massive‑scale model data processing pipeline, covering model scenarios, recommendation workflow complexity, large‑scale data storage, streaming joins, feature computation, NVM‑based storage solutions, strong consistency mechanisms, and future outlook for AI recommendation systems.

Big Data · Kuaishou · Large-Scale Models
16 min read

DataFunTalk
Feb 7, 2024 · Big Data

Kuaishou's Practices for Large‑Scale Model Data Processing, Real‑Time Feature Handling, and Storage

This article presents Kuaishou's end‑to‑end engineering solutions for handling massive, real‑time recommendation model data, covering scenario description, complex business pipelines, trillion‑parameter model storage, high‑throughput processing with Flink and NVM, and future directions for cloud‑native scalability.

Big Data · Kuaishou · Large-Scale Models
15 min read

FunTester
Jan 5, 2024 · Big Data

An Overview of Apache Kafka and Kafka Streams Technical Features

This article introduces Apache Kafka as a high‑throughput, scalable, fault‑tolerant distributed streaming platform, explains why it is chosen for real‑time data pipelines, and details key Kafka Streams concepts such as stream processing, interactive queries, stateful processing, windowing, serialization, and testing.

Apache Kafka · Big Data · Kafka Streams
13 min read

vivo Internet Technology
May 24, 2023 · Big Data

Kafka Real-time Data Archiving to Hive: Flink SQL and DataStream Implementation Solutions

The article explains how to archive Kafka real‑time data to Hive using either Flink SQL, which quickly creates partitioned ORC tables but requires timezone handling, or Flink DataStream for more complex pipelines, and offers best‑practice guidance on data quality, system complexity, security, and performance.

Big Data · DataStream · Flink
15 min read

NetEase Cloud Music Tech Team
Oct 26, 2022 · Big Data

Arctic: NetEase's Streaming Lakehouse Service and Hive-Based Stream-Batch Integration Practice

Arctic, NetEase’s streaming lakehouse built on Apache Iceberg, unifies streaming and batch workloads with millisecond‑level latency, Hive compatibility, and built‑in message‑queue support, delivering CDC, upserts and OLAP without a Lambda architecture, as demonstrated by real‑time processing of 2 PB of Hive data for Cloud Music.

Apache Iceberg · Arctic · Hive Compatibility
15 min read

Bilibili Tech
Jun 10, 2022 · Big Data

Incremental Data Lake Design and Hudi Core Optimizations with Flink

The article describes how combining Apache Flink with Hudi enables an incremental data lake that delivers near‑real‑time analytics. The optimizations include switching to merge‑on‑read, fixing log handling bugs, improving compaction planning, and refactoring table‑service scheduling. It also showcases use cases such as CDC ingestion, data quality control, and real‑time materialized views, and outlines future enhancements like optimistic concurrency and unified schema evolution.

Apache Hudi · CDC · Compaction Optimization
21 min read

DataFunSummit
Dec 10, 2021 · Big Data

Real‑Time Platform Construction at NetEase Yanxuan: Architecture, SQL‑Based Streaming, Serviceization, and Data Governance

This article details NetEase Yanxuan's evolution of a real‑time data platform from 2017 to present, covering background, current scale, layered architecture, Flink‑SQL development IDE, service‑oriented task execution, resource‑optimizing deployment modes, cloud‑native migration, comprehensive data governance, and future batch‑stream integration plans.

Big Data · Flink · Streaming SQL
15 min read

HomeTech
Nov 3, 2021 · Big Data

Real‑time Materialized View Practices with Apache Flink: System Analysis, Algorithm Design, and Implementation

This article presents Autohome's experience building a real‑time materialized view system on Apache Flink, detailing system analysis, problem decomposition, a global‑version‑based CDC algorithm, its implementation as a Flink connector, practical deployment results, and remaining challenges such as clock dependency and state size.

Algorithm · Big Data · CDC
17 min read

Tencent Cloud Developer
Sep 6, 2018 · Big Data

Real-Time Stream Computing: Concepts, Challenges, and Tencent Cloud Solutions

As mobile and IoT data surge, real-time stream computing—especially Flink’s low-latency, high-throughput, exactly-once engine—addresses challenges of latency, accuracy, and usability, and Tencent Cloud’s managed Flink service provides elastic, secure, integrated pipelines for applications ranging from online status monitoring to fraud detection and smart transportation.

Apache Storm · Big Data · Cloud Services
30 min read