Tagged articles

bulkload

7 articles · Page 1 of 1

Oct 21, 2024 · Databases

TDE-ClickHouse Optimization Practice at Baidu MEG: Query Performance, Data Import, and Distributed Architecture

Baidu MEG’s TDE‑ClickHouse optimization in the Turing 3.0 ecosystem boosts query speed by up to 10×, halves latency, enables billion‑row bulk imports in under two hours, and moves to a cloud‑native, ZooKeeper‑free architecture that supports 350k CPU cores, 10 PB of storage, and sub‑3‑second responses for 150k daily BI queries.

Baidu MEG · ClickHouse · Cloud Native

0 likes · 19 min read

Meituan Technology Team

Mar 14, 2024 · Databases

Meituan Large‑Scale KV Storage: Challenges and Architectural Practices

The article details the evolution of Meituan’s KV storage, analyzes the scalability and availability challenges of both its in‑memory (Squirrel) and persistent (Cellar) systems, and presents concrete architectural solutions such as gossip optimization, fork‑less RDB, multi‑threading, bulkload, and cross‑region disaster recovery. It also outlines future directions such as ZooKeeper removal and vector engine support.

Cellar · KV storage · Squirrel

0 likes · 34 min read

Huolala Tech

May 25, 2023 · Big Data

How Huolala Solved HBase Bulkload Challenges: A Practical Guide

This article details Huolala’s experience building a unified Hive‑to‑HBase pipeline, addressing low development efficiency, lack of monitoring, and HBase instability by evaluating two architectures, implementing a generic Transform tool, optimizing compaction and DistCp, and establishing stability and data‑validation mechanisms.

Compaction · DistCp · HBase

0 likes · 12 min read

NetEase Cloud Music Tech Team

Mar 16, 2022 · Databases

RDB: Cloud Music's Customized Algorithm Feature KV Storage System Based on RocksDB

To meet Cloud Music’s massive algorithm‑feature KV storage needs, the team built RDB, a RocksDB‑based engine within Tair, adding bulk‑load, dual‑version imports, KV separation, in‑place sequence appends, and protobuf field updates. These changes cut storage cost, write amplification, and latency while scaling to billions of records and millions of QPS.

Algorithm Features · Compaction · KV Separation

0 likes · 16 min read

Big Data Technology & Architecture

Oct 30, 2021 · Databases

HBase Common Issues, Optimization Tips, and New Features in HBase 2.0

This article compiles frequently asked HBase questions, troubleshooting steps, performance optimization techniques, configuration guidance, and an overview of new HBase 2.0 features such as off‑heap memory, Procedure v2, In‑Memory Compaction, and MOB support, providing practical solutions for administrators and developers.

HBase · In-Memory Compaction · MOB

0 likes · 29 min read

DataFunTalk

Jun 11, 2021 · Big Data

Comprehensive Guide to Fast and Stable Hive‑to‑HBase Data Transfer Using Bulkload, MapReduce, and Spark

This article explains how to efficiently move large volumes of data from Hive to HBase using HBase's bulkload mechanism. It details the original MapReduce workflow and its performance bottlenecks, then a rewritten Spark‑based solution that simplifies ETL, improves partitioning, and achieves a several‑fold speedup.

Big Data · ETL · HBase

0 likes · 17 min read

Youzan Coder

Dec 18, 2019 · Big Data

HBase Bulkload Practice at Youzan: From MapReduce to Spark Evolution

Youzan’s HBase bulkload pipeline evolved from manual MapReduce jobs to Hive SQL and finally to Spark. The article shows how generating HFiles on HDFS, partitioning by region, sorting keys, and handling serialization issues lets billions of records be loaded efficiently without disrupting production clusters.

HBase · Hadoop · NoSQL

0 likes · 16 min read