Tag

Bulkload

0 views collected around this technical thread.

Baidu Geek Talk
Baidu Geek Talk
Oct 21, 2024 · Databases

TDE-ClickHouse Optimization Practice at Baidu MEG: Query Performance, Data Import, and Distributed Architecture

Baidu MEG’s TDE‑ClickHouse optimization in the Turing 3.0 ecosystem boosts query speed up to 10×, halves latency, enables billion‑row bulk imports in under two hours, and migrates to a cloud‑native, ZooKeeper‑free architecture supporting 350 k CPU cores, 10 PB storage, and sub‑3‑second responses for 150 k daily BI queries.

Baidu MEGBulkloadClickHouse
0 likes · 19 min read
TDE-ClickHouse Optimization Practice at Baidu MEG: Query Performance, Data Import, and Distributed Architecture
NetEase Cloud Music Tech Team
NetEase Cloud Music Tech Team
Mar 16, 2022 · Databases

RDB: Cloud Music's Customized Algorithm Feature KV Storage System Based on RocksDB

To meet Cloud Music’s massive algorithm‑feature KV storage needs, the team built RDB—a RocksDB‑based engine within Tair—adding bulk‑load, dual‑version imports, KV‑separation, in‑place sequence appends and protobuf field updates, cutting storage cost, write amplification and latency while scaling to billions of records and millions of QPS.

Algorithm FeaturesBulkloadCompaction
0 likes · 16 min read
RDB: Cloud Music's Customized Algorithm Feature KV Storage System Based on RocksDB
DataFunTalk
DataFunTalk
Jun 11, 2021 · Big Data

Comprehensive Guide to Fast and Stable Hive‑to‑HBase Data Transfer Using Bulkload, MapReduce, and Spark

This article explains how to efficiently move large volumes of data from Hive to HBase by leveraging HBase's bulkload mechanism, detailing the original MapReduce workflow, its performance bottlenecks, and a rewritten Spark‑based solution that simplifies ETL, improves partitioning, and achieves several‑fold speedup.

Big DataBulkloadHBase
0 likes · 17 min read
Comprehensive Guide to Fast and Stable Hive‑to‑HBase Data Transfer Using Bulkload, MapReduce, and Spark
Youzan Coder
Youzan Coder
Dec 18, 2019 · Big Data

HBase Bulkload Practice at Youzan: From MapReduce to Spark Evolution

Youzan’s evolution of HBase bulk‑load—from manual MapReduce jobs to Hive‑SQL and finally Spark—demonstrates how generating HFiles on HDFS, partitioning by region, sorting keys, and handling serialization issues enables billions of records to be loaded efficiently without disrupting production clusters.

Big DataBulkloadDistributed Systems
0 likes · 16 min read
HBase Bulkload Practice at Youzan: From MapReduce to Spark Evolution