Tag

Offline Computing


vivo Internet Technology
Jan 24, 2024 · Big Data

Evolution of Vivo's Trillions-Scale Data Architecture: Dual-Active Real-Time and Offline Computing

Vivo’s trillion‑scale data platform evolved into a dual‑active real‑time and offline architecture that leverages multi‑datacenter clusters, Kafka/Pulsar caching, a unified sorting layer, HBase‑backed dimension tables, and micro‑batch Spark jobs to deliver low‑cost, high‑performance processing, 99.9% availability, and 99.9995% data integrity.

Data Integrity · HBase · Kafka
16 min read
Baidu Geek Talk
Aug 28, 2023 · Cloud Native

Baidu Search Vertical Offline Computing System Architecture Evolution

Baidu's search vertical offline computing system evolved through four stages—from a fragmented pre‑2018 processing setup to a unified business framework, then serverless functions, and finally a data‑intelligent architecture with multi‑layer abstraction, graph and multi‑language engines, achieving 5‑10× efficiency gains and dramatically reducing failures.

Baidu Search · Cloud Native · DAG Processing
23 min read
Baidu Geek Talk
Jun 7, 2023 · Big Data

Optimization Practices for Offline Big Data Computing and Storage at Baidu MEG

Baidu MEG’s offline big‑data platform cut costs and boosted efficiency by applying intelligent scheduling, storage‑separation, tide‑power workload profiling, remote shuffle services and dynamic quota resizing. These measures raised compute utilization from 55% to 80% and storage utilization from 63% to 78%, cut annual expenses by roughly ¥70 million, and reduced task duration by about 30%.

Offline Computing · RSS · big data
12 min read
Baidu Geek Talk
Jan 5, 2022 · Cloud Native

Baidu Cloud‑Native Mixed Workload (Offline Co‑location) Technology Overview

Baidu’s mixed‑workload approach co‑locates offline batch jobs with latency‑sensitive online services on shared nodes, using a dynamic resource view, priority‑based scheduling, cpuset/NUMA isolation, eBPF policies, and predictive profiling, boosting CPU utilization above 40 % and saving billions of RMB in total cost of ownership.

Cloud Native · Kubernetes · Mixed Workload
17 min read
DataFunTalk
Aug 29, 2021 · Big Data

Building and Optimizing the Offline Computing Platform at Autohome: Challenges, Solutions, and Future Plans

This article details the evolution of Autohome's offline computing platform from a 50‑node cluster in 2013 to a multi‑thousand‑node Hadoop ecosystem, describing performance and stability challenges, multi‑tenant operational issues, low resource utilization, and the comprehensive technical solutions and future roadmap implemented to address them.

AI on Hadoop · Hadoop · MetaStore
11 min read
Big Data Technology Architecture
Jun 4, 2020 · Big Data

58.com Big Data Offline Computing Platform: Architecture, Scaling, Optimization, and Cross‑Data‑Center Migration

This article presents a comprehensive case study of 58.com’s massive Hadoop‑based offline computing platform, detailing its architecture, scaling challenges, performance‑tuning measures, YARN and SparkSQL upgrades, and the systematic cross‑data‑center migration of thousands of nodes and petabytes of data.

Cluster Scaling · Hadoop · Offline Computing
23 min read
DataFunTalk
Apr 9, 2020 · Big Data

Scaling and Optimizing 58.com’s Hadoop‑Based Offline Computing Platform: Architecture, Challenges, and Solutions

This article details how 58.com built a massive Hadoop‑based offline computing platform with over 4,000 servers and hundreds of petabytes of storage, addressing scaling, stability, GC, YARN scheduling, SparkSQL migration, storage operations, and a large‑scale cross‑datacenter migration.

Cluster Scaling · Hadoop · Offline Computing
24 min read
Big Data Technology Architecture
Aug 21, 2019 · Big Data

Key Big Data Terminology: Offline vs Real-time Computing, Real-time vs Ad Hoc Queries, OLTP vs OLAP, Row vs Column Storage

This article explains fundamental big‑data concepts by comparing offline (batch) and real‑time (stream) computing, distinguishing real‑time queries from ad‑hoc queries, clarifying OLTP versus OLAP workloads, and outlining the differences between row‑based and column‑based storage architectures.

OLAP · OLTP · Offline Computing
5 min read