Topic: batch processing
Collection size: 120 articles
vivo Internet Technology
Dec 13, 2023 · Big Data

Hudi Data Lake Implementation and Optimization Practice at vivo

vivo's big-data team deployed Apache Hudi to build a lakehouse that unifies streaming and batch workloads. The deployment uses both copy-on-write (COW) and merge-on-read (MOR) tables, automates small-file clustering and compaction, and applies version, streaming, batch, and lifecycle optimizations, delivering minute-level latency, ingestion on the order of a hundred million records per minute, and queries up to 20% faster than Hive.

Apache Hudi · Batch Processing · Big Data
0 likes · 11 min read
DeWu Technology
Oct 10, 2022 · Big Data

Offline and Real-Time User Profile Fusion Architecture

The architecture combines a nightly batch job that generates offline user profiles stored in HBase with a Flink‑based stream layer that lazily loads those profiles on app start and creates real‑time updates, then fuses both streams into a unified, timestamp‑ordered profile in Redis, forming a Lambda‑style pipeline.

Batch Processing · Flink · HBase
0 likes · 10 min read
DaTaobao Tech
Dec 11, 2023 · Big Data

Design and Implementation of an Online Batch Processing Framework for Large-Scale Promotion Systems

The article presents a centralized online batch-processing framework for large-scale promotion systems. Applications integrate via an SDK, and a task center schedules and dispatches sub-tasks through RocketMQ to Dubbo-enabled containers, using MapReduce-style splitting, Guava rate limiting, and heartbeat health checks. The framework handled over 1.3 million tasks during Double-11.

Batch Processing · Big Data · Dubbo
0 likes · 9 min read
DaTaobao Tech
Aug 11, 2022 · Big Data

Unify SQL Engine: Integrating Stream, Batch, and Online Computing for Data Warehousing

The article describes how fragmented real‑time, batch, and online data‑warehouse pipelines suffer from low productivity and inconsistent data quality, and introduces a unified SQL engine built on Apache Calcite that parses, optimizes, and compiles a single SQL statement into executable plans for ODPS, Flink, or Java, leveraging Janino code generation, multi‑backend state storage, and snapshot‑join semantics to boost performance and simplify development.

Batch Processing · Calcite · Flink
0 likes · 16 min read
Bilibili Tech
Oct 25, 2023 · Backend Development

Performance Optimization Practices in Bilibili's Risk Control Engine

To overcome storage, compute, and I/O bottlenecks in Bilibili's risk-control engine, the team combined pre-fetching with Redis caching, batch retrieval, asynchronous writes via Railgun, aggressive log compression, and a multi-level cache plus Bloom filter, cutting latency to under 100 ms, reducing Redis QPS by over 90% and storage by about 38%, while supporting million-level query throughput.

Async · Backend · Batch Processing
0 likes · 22 min read
Tencent Cloud Developer
Aug 1, 2019 · Databases

FeatureKV: A High-Performance Key-Value Storage System for WeChat's Billion-Scale Challenges

FeatureKV, WeChat's high-performance key-value store, handles one billion queries per second and ingests a billion keys per hour. It separates the write-only DataSvr from the read-only KVSvr, supports in-memory, indexed, and block-indexed tables, scales horizontally, and guarantees eventual consistency with versioned reads, delivering up to 11 billion reads per second with sub-15 ms latency.

Batch Processing · Distributed Storage · FeatureKV
0 likes · 22 min read
Shopee Tech Team
Apr 28, 2022 · Big Data

Building Real-Time Data Warehouse with Flink + Hudi at Shopee

Shopee replaced its hourly Hive pipeline with a hybrid Flink‑Hudi real‑time data warehouse that groups Kafka topics, applies lightweight stream ETL, uses partial‑update MOR tables for multi‑stream joins and COW tables for versioned batches, cutting latency from about 90 minutes to 2–30 minutes and halving resource usage.

Apache Flink · Apache Hudi · Batch Processing
0 likes · 20 min read
Baidu Geek Talk
Jul 7, 2021 · Backend Development

Design and Implementation of Baidu's Commodity Promotion System

The article details Baidu’s 2020‑built commodity promotion system for Baijiahao and live‑stream e‑commerce, linking merchants with authors/streamers through CPS billing, three user interfaces and five core services, and highlights technical choices such as dynamic‑library tracking, asynchronous batch writes, and a high‑cohesion, low‑coupling architecture requiring cross‑team collaboration.

Baidu ecosystem · Batch Processing · CPS
0 likes · 15 min read
JD Retail Technology
Feb 29, 2024 · Databases

Optimizing Large‑Scale Batch Processing for an Advertising Platform: From Query Tuning to Load‑Balanced Execution

This article presents a real‑world case study of optimizing massive batch‑processing tasks in an ad‑platform by applying query‑level improvements, cursor‑based pagination, shard‑aware batch updates, JVM‑tuned garbage collection, and distributed load‑balancing, ultimately reducing CPU usage from 80% to under 2% and cutting query‑per‑minute volume from millions to a few thousand.

Batch Processing · Database Optimization · Java
0 likes · 22 min read
Beike Product & Technology
Mar 10, 2020 · Fundamentals

Optimizing String Replacement Using SSE2 SIMD Instructions

This article explains how to use SSE2 SIMD instructions to optimize string replacement operations, demonstrating a 16-character batch processing technique that significantly improves performance for longer strings.

Batch Processing · SIMD · String Optimization
0 likes · 4 min read
ByteFE
Oct 31, 2022 · Backend Development

Image Optimization for ISV Pages: Offline Compression, WebP Conversion, and Batch Processing

This article details a systematic approach to reducing image sizes for ISV‑generated pages, covering offline compression, WebP conversion, data structure design, batch processing pipelines, monitoring, and fallback strategies, while providing code examples and performance comparisons.

Backend · Batch Processing · Image Optimization
0 likes · 26 min read
JD Tech
May 31, 2018 · Backend Development

Design and Architecture of a Unified MySQL Data Synchronization Platform

This article details the design of a unified MySQL data synchronization platform that consolidates offline sync, real‑time subscription, and real‑time sync into BatchJob, StreamJob, and PieJob abstractions, describing task implementations, cluster architecture, high‑availability mechanisms, and evolution challenges such as file loss and metadata handling.

Batch Processing · High Availability · MySQL
0 likes · 10 min read
Ctrip Technology
Jan 25, 2017 · Backend Development

Handling Duplicate Messages, Ordering, Concurrency, and Batch Processing in Message‑Driven Systems

This article shares practical patterns and built‑in mechanisms for dealing with duplicate messages, message ordering, concurrent updates, asynchronous acknowledgments, and batch processing in a large‑scale, message‑driven architecture, illustrated with QMQ examples from Qunar's platform.

Backend Development · Batch Processing · Duplicate Message Handling
0 likes · 16 min read
Qunar Tech Salon
Jul 5, 2019 · Big Data

Understanding Big Data Processing Architectures: Lambda, Kappa, and Lambda Plus

This article explains the technical challenges of large‑scale data processing, compares the classic Lambda and Kappa architectures, and introduces the cloud‑native Lambda Plus solution built on TableStore and Blink that simplifies batch‑stream integration for TB‑scale workloads.

Batch Processing · Big Data · Cloud Services
0 likes · 13 min read
Qunar Tech Salon
Jan 21, 2017 · Backend Development

Message Consumption Patterns and Best Practices in Qunar's QMQ

This article shares Qunar's practical experiences with message-driven architecture, detailing consumer handling of duplicate messages, ordering, concurrency control, asynchronous processing, and batch strategies, and presents concrete solutions such as idempotent checks, deduplication tables, versioning, and QMQ's built‑in executors.

Batch Processing · Concurrency · Message Queue
0 likes · 18 min read
Qunar Tech Salon
Jul 28, 2015 · Fundamentals

Introduction to xargs with Basic and Advanced Usage Examples

This article explains the purpose of the Unix xargs command, demonstrates basic usage for concatenating log files, shows an advanced example for renaming text files to log files using the -I placeholder, and provides a step‑by‑step breakdown of how the command pipeline works.

Batch Processing · Shell · command-line
0 likes · 3 min read
Zhuanzhuan Tech
Jan 11, 2019 · Databases

Differences Between TiDB and MySQL: Transactions, Queries, Server‑Side Prepared Statements, and Batch Processing

This article examines TiDB, an open-source distributed NewSQL database, comparing its transaction and query behavior with MySQL's and discussing the underlying Percolator model, server-side prepared statements, batch processing techniques, and practical optimization strategies for developers.

Batch Processing · Distributed Database · MySQL
0 likes · 10 min read
Rare Earth Juejin Tech Community
Feb 24, 2024 · Backend Development

Introducing Karta: A Lightweight Go Library for Asynchronous and Batch Function Task Processing

This article introduces Karta, a lightweight Go library that provides two modes—Pipeline for unknown‑size asynchronous tasks and Group for known‑size batch tasks—offering a concise API, configurable workers, and built‑in callbacks to simplify high‑performance concurrent processing in backend applications.

Async · Backend · Batch Processing
0 likes · 9 min read
Big Data Technology Architecture
Nov 15, 2021 · Big Data

Flink Sort‑Shuffle: Design, Implementation, and Performance Evaluation

This article explains how Flink's new sort‑shuffle mechanism improves large‑scale batch processing by reducing file counts, optimizing I/O, lowering memory usage, and delivering up to tenfold speedups, while also detailing configuration tips and future enhancements.

Batch Processing · Big Data · Data Shuffle
0 likes · 16 min read