Tagged articles
7 articles
Page 1 of 1
DataFunTalk
DataFunTalk
Sep 15, 2022 · Big Data

Bilibili Offline Platform: Migration from Hive to Spark and Large‑Scale Optimizations

This article details Bilibili's evolution of its offline computing platform from Hadoop‑based Hive to Spark, describing the migration process, automated SQL conversion, result verification, stability and performance enhancements, meta‑store optimizations, and future work on remote shuffle and vectorized execution.

Data SkippingMetaStoreShuffle
0 likes · 28 min read
Bilibili Offline Platform: Migration from Hive to Spark and Large‑Scale Optimizations
ITPUB
ITPUB
Aug 1, 2022 · Big Data

How Bilibili Scaled Offline Computing: Migrating from Hive to Spark and Boosting Performance

This article details Bilibili's evolution from a Hadoop‑based offline platform to a Spark‑driven architecture, covering the Hive‑to‑Spark migration, automated SQL conversion, result validation, stability enhancements, performance tuning, meta‑store federation, and future directions for large‑scale data processing.

Big DataData SkippingMetaStore
0 likes · 31 min read
How Bilibili Scaled Offline Computing: Migrating from Hive to Spark and Boosting Performance
DataFunTalk
DataFunTalk
Aug 29, 2021 · Big Data

Building and Optimizing the Offline Computing Platform at Autohome: Challenges, Solutions, and Future Plans

This article details the evolution of Autohome's offline computing platform from a 50‑node cluster in 2013 to a multi‑thousand‑node Hadoop ecosystem, describing performance and stability challenges, multi‑tenant operational issues, low resource utilization, and the comprehensive technical solutions and future roadmap implemented to address them.

AI on HadoopMetaStoreOffline Computing
0 likes · 11 min read
Building and Optimizing the Offline Computing Platform at Autohome: Challenges, Solutions, and Future Plans
DataFunTalk
DataFunTalk
Mar 10, 2021 · Big Data

Hive MetaStore Challenges and Optimizations at Kuaishou

At Kuaishou, the Hive MetaStore service, which stores metadata for Hive, faced scalability and performance challenges due to massive dynamic partitions and high query volume, leading to a series of architectural optimizations—including read‑write separation, API enhancements, traffic control, and federation—to improve stability and efficiency.

Big DataKuaishouMetaStore
0 likes · 15 min read
Hive MetaStore Challenges and Optimizations at Kuaishou
Big Data Technology & Architecture
Big Data Technology & Architecture
Apr 17, 2019 · Big Data

Step-by-Step Guide to Installing Hive 2.1.0 on a Hadoop 2.7.1 Cluster (Ubuntu 14.04)

This tutorial provides a comprehensive, step-by-step procedure for setting up Hive 2.1.0 on a Hadoop 2.7.1 cluster running Ubuntu 14.04, covering environment preparation, Hive installation, configuration of environment variables, MySQL metastore integration, client setup, service startup, and basic verification commands.

Big DataHadoopInstallation
0 likes · 8 min read
Step-by-Step Guide to Installing Hive 2.1.0 on a Hadoop 2.7.1 Cluster (Ubuntu 14.04)