Tagged articles
312 articles
IT Architects Alliance
May 26, 2021 · Databases

Understanding MySQL Slow Queries, Elasticsearch, and HBase: Causes and Practical Solutions

This article explains why MySQL queries become slow, how indexes work and fail, the impact of MDL locks, large‑table challenges, sharding and read‑write splitting strategies, then introduces Elasticsearch’s search capabilities and HBase’s column‑family storage, offering practical guidance for each technology.

Database Performance · Elasticsearch · HBase
0 likes · 17 min read
DataFunTalk
May 22, 2021 · Databases

Combining HBase and Elasticsearch: Challenges and the Lindorm Searchindex Solution

The article examines the strengths and weaknesses of combining HBase and Elasticsearch for massive data storage and retrieval, outlines three integration patterns and their challenges, and presents Alibaba Cloud's Lindorm Searchindex as a SQL‑driven, low‑cost, strongly consistent solution that simplifies development and improves performance.

Big Data · Elasticsearch · HBase
0 likes · 11 min read
IT Architects Alliance
May 22, 2021 · Big Data

Flink-Based Real‑Time Recommendation System: Architecture, Logic, and Docker Deployment Guide

This article presents a comprehensive walkthrough of a Flink‑powered recommendation system, detailing its v2.0 architecture, module functions, recommendation algorithms (hotness, product similarity, collaborative filtering), front‑end and back‑end UI, and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka services.

Big Data · Docker · Flink
0 likes · 11 min read
Code Ape Tech Column
May 21, 2021 · Databases

Why Your MySQL Queries Are Slow and How ElasticSearch & HBase Can Help

This article analyzes common causes of slow MySQL queries, such as index misuse, MDL locks, and large‑table bottlenecks, presents practical solutions including proper indexing, sharding, and read/write splitting, and evaluates when to complement MySQL with Elasticsearch or HBase for better performance.

Database Performance · Elasticsearch · HBase
0 likes · 19 min read
Architect
May 19, 2021 · Big Data

Flink-Based Real-Time Recommendation System Architecture and Deployment Guide

This article presents a comprehensive overview of a Flink-powered real-time recommendation system, detailing its v2.0 architecture, module functions, recommendation algorithms, front‑end and back‑end interfaces, Docker‑based deployment of MySQL, Redis, HBase, Kafka, and step‑by‑step startup procedures.

Docker · Flink · HBase
0 likes · 9 min read
dbaplus Community
Apr 17, 2021 · Big Data

How a Traditional Finance Firm Tackles Real‑Time Analytics with Flink

This article details a financial company's exploration of Apache Flink for real‑time processing, covering its unique business constraints, end‑to‑end data pipeline, single‑table and multi‑table use cases, implementation challenges, code snippets, data initialization, testing strategies, and lessons learned.

Financial · Flink · HBase
0 likes · 13 min read
Alibaba Cloud Developer
Apr 14, 2021 · Backend Development

Building a Redis‑Based Distributed Queue to Cut HBase IO Bottlenecks

The article explores what makes code 'good'—emphasizing usability, readability, and maintainability—then details the design and implementation of a lightweight Redis‑based distributed consumption queue that alleviates HBase I/O pressure, describing its architecture, modules, logging, and performance gains.

Backend · HBase · distributed queue
0 likes · 10 min read
iQIYI Technical Product Team
Apr 9, 2021 · Big Data

Real-Time Data Warehouse at iQIYI Video Production Using Spark and ClickHouse

To meet iQIYI video production’s requirements for thousands of QPS, petabyte‑scale and frequently updated data, and large‑table joins, the team built a Spark‑plus‑ClickHouse real‑time warehouse that streams Kafka changes, joins HBase dimensions, and writes to ClickHouse, reducing reporting development time from days to hours while supporting both offline and real‑time analytics.

ClickHouse · HBase · Kafka
0 likes · 12 min read
Big Data Technology Architecture
Mar 9, 2021 · Databases

Evaluating ZGC vs G1 GC Performance in HBase Clusters

This article examines the challenges of GC pauses in low‑latency HBase services, explains ZGC’s fully concurrent architecture and key techniques such as colored pointers and read barriers, and presents experimental comparisons of ZGC and G1 GC using YCSB benchmarks, highlighting latency, throughput and CPU usage differences.

Garbage Collection · HBase · YCSB
0 likes · 18 min read
DataFunTalk
Feb 13, 2021 · Databases

Improving HBase Availability and Reducing Latency Spikes with Replication‑Based Multi‑Path Reads and ZGC

This article describes how the Didi HBase team tackled HBase’s weak availability and GC‑induced latency spikes by introducing a replication‑based client multi‑path read mechanism, configuring hedged reads, and adopting the Z Garbage Collector, and presents the resulting performance improvements and remaining challenges.

Big Data · HBase · Multi-Path Read
0 likes · 11 min read
Big Data Technology & Architecture
Jan 7, 2021 · Databases

Comprehensive HBase Optimization Guide: Table Design, RowKey, JVM Tuning, Cache Settings, and Read/Write Performance

This article provides a detailed, practical guide to optimizing HBase in production, covering table pre‑splitting, RowKey design, JVM memory and GC settings, MSLAB and BucketCache configuration, read‑side client and server tuning, write‑side strategies, and additional tips such as compression and scan caching.

Cache · Database Tuning · HBase
0 likes · 29 min read
Didi Tech
Dec 21, 2020 · Big Data

HBase Availability and Latency Optimizations: Replication‑Based Multi‑Read and ZGC Adoption

To overcome HBase’s weak availability and GC‑induced latency spikes, the DiDi team introduced a replication‑based client multi‑read (hedged‑read) mechanism and migrated to the Z Garbage Collector, which together dramatically cut maximum and 99.9th‑percentile latencies while keeping services online during region disruptions.

Big Data · HBase · Low latency
0 likes · 12 min read
Big Data Technology & Architecture
Dec 8, 2020 · Big Data

Horizontal Comparison of HBase, Kudu, and ClickHouse (V2.0)

This article provides a comprehensive technical comparison of HBase, Kudu, and ClickHouse—covering installation dependencies, architecture, basic read/write and query operations, real‑world use cases at Didi, a Kudu‑based real‑time data warehouse, and ClickHouse log‑analysis practices—highlighting each system’s strengths and trade‑offs for big‑data workloads.

ClickHouse · HBase · Kudu
0 likes · 17 min read
DeWu Technology
Nov 19, 2020 · Operations

HBase Operations and Use Cases for High‑Concurrency E‑commerce

In this talk, Yun Jin explains how HBase’s petabyte‑scale, horizontally scalable architecture, built on Hadoop, HMaster, RegionServers, and ZooKeeper, enables e‑commerce platforms to handle extreme promotion‑day traffic by supporting high‑throughput reads/writes, time‑series monitoring, massive order storage, and robust HA. The talk also covers essential table operations, monitoring, and troubleshooting techniques.

Big Data · HBase · Operations
0 likes · 6 min read
Big Data Technology Architecture
Nov 3, 2020 · Big Data

Performance Optimization of Apache Kylin at Beike: HBase Tuning, Region Management, and Slow‑Query Mitigation

This article details how Beike's engineering team scaled Apache Kylin to handle tens of millions of daily queries by optimizing HBase configurations, reducing region count, improving data locality, addressing IO and JVM GC bottlenecks, and implementing comprehensive slow‑query detection and active‑defense mechanisms.

Apache Kylin · HBase · JVM GC
0 likes · 15 min read
Zhongtong Tech
Oct 30, 2020 · Big Data

How Apache Kylin Supercharged OLAP at ZTO Express: A Deep Dive

This article details ZTO Express's journey of adopting Apache Kylin for OLAP, comparing it with Presto, describing platform architecture, performance gains, integration with scheduling and monitoring systems, and the practical optimizations and future plans that enabled sub‑second query responses on massive daily data volumes.

Apache Kylin · Big Data · HBase
0 likes · 16 min read
MaGe Linux Operations
Sep 7, 2020 · Databases

Step-by-Step Guide to Installing an HBase Cluster on Hadoop

This article explains what HBase is, describes its Master, RegionServer, and ZooKeeper components, and provides detailed environment preparation and configuration steps (host setup, SSH key distribution, JDK installation, HBase deployment, configuration file edits, and cluster startup) so you can run HBase on a Hadoop cluster.

HBase · Hadoop · bigdata
0 likes · 8 min read
Big Data Technology & Architecture
Aug 27, 2020 · Big Data

HBase Architecture, Components, and Operations Overview

This article provides a comprehensive overview of Apache HBase’s architecture, detailing its core components such as RegionServer, HMaster, ZooKeeper, WAL, MemStore, and HFiles, and explains key processes including read/write paths, compaction, region splitting, load balancing, and recovery mechanisms.

Big Data · Database Architecture · Distributed Systems
0 likes · 17 min read
Top Architect
Aug 14, 2020 · Big Data

Billion‑Row MySQL to HBase Synchronization: Load Data, Kafka‑Thrift, and Flink Solutions

This article presents a comprehensive guide for transferring massive MySQL datasets to HBase, covering environment setup on Ubuntu, three synchronization methods (MySQL LOAD DATA, a Kafka‑Thrift pipeline using Maxwell, and real‑time Flink processing), performance comparisons, and practical tips for Hadoop, HBase, Kafka, ZooKeeper, Phoenix, and related tools.

DataSync · Flink · HBase
0 likes · 24 min read
Programmer DD
Jul 22, 2020 · Big Data

How to Sync Billions of MySQL Records to HBase: 3 Powerful Methods Using Hadoop, Kafka, and Flink

This comprehensive guide walks you through setting up a pseudo‑distributed Hadoop environment, loading massive MySQL data with LOAD DATA, Python scripts, and multithreading, and then synchronizing the data to HBase using three approaches—Sqoop, a Kafka‑Thrift pipeline, and a real‑time Kafka‑Flink pipeline—while also comparing query performance of HBase and Phoenix.

Flink · HBase · Kafka
0 likes · 28 min read
Big Data Technology & Architecture
Jul 19, 2020 · Big Data

An Overview of Hive, HBase Integration, Apache Phoenix, and Lealone in the Big Data Ecosystem

This article explains Hive's role as a Hadoop‑based data warehouse, its integration with HBase, the advantages and drawbacks of that combination, introduces Apache Phoenix as a high‑performance SQL layer on HBase, and describes the open‑source NewSQL database Lealone, providing practical usage scenarios and performance comparisons.

Big Data · HBase · Lealone
0 likes · 9 min read
Big Data Technology & Architecture
Jul 9, 2020 · Big Data

How ZooKeeper Supports HBase: Coordination, Fault Tolerance, Log Splitting, META Table Management, and Replication

This article explains how ZooKeeper functions as a distributed coordination service for HBase, detailing its role in master and RegionServer fault tolerance, log splitting, META table location tracking, and replication management, illustrating the underlying ZNode structures and failover mechanisms.

Big Data · Distributed Coordination · HBase
0 likes · 7 min read
vivo Internet Technology
Jul 8, 2020 · Databases

OpenTSDB: Architecture, Data Model, and HBase Integration for Time-Series Data Storage

The article offers a detailed technical overview of OpenTSDB’s architecture and data model, explaining how it leverages HBase for scalable time‑series storage, describing core concepts, table schemas, ingestion flow, performance considerations, and future alternatives for large‑scale monitoring workloads.

HBase · OpenTSDB · Time Series Database
0 likes · 12 min read
Big Data Technology & Architecture
Jun 22, 2020 · Databases

JDHBase Multi‑Active Architecture and Replication Mechanisms

This article describes JDHBase’s large‑scale KV storage, its HBase‑based replication principle, the multi‑active cluster architecture with Fox Manager, client routing, automatic failover, dynamic replication tuning, serial replication guarantees, and future directions for improving cross‑region disaster recovery.

Cluster Management · HBase · JDHBase
0 likes · 11 min read
Suning Technology
Jun 19, 2020 · Big Data

How Suning’s Big Data Engine Powered a Record‑Breaking 618 Sale

Suning’s 618 shopping festival showcased a massive sales surge backed by its big‑data platform, which processed over 200 billion requests, handled 38.5 PB of daily data, and delivered 31.5 trillion computations, while Kafka and HBase sustained tens of millions of TPS to ensure a seamless consumer experience.

618 Sale · HBase · Kafka
0 likes · 5 min read
Big Data Technology & Architecture
Jun 16, 2020 · Big Data

Hot and Cold Data Separation in Big Data Systems

The article explains the concept of hot and cold data, why separating them reduces cost, and presents heterogeneous and homogeneous architectural solutions—including Elasticsearch, HBase, AWS S3, and cloud‑based UltraWarm—illustrated with network‑behavior and e‑commerce order system case studies.

AWS S3 · Big Data Architecture · Data Lifecycle
0 likes · 11 min read
Big Data Technology Architecture
Jun 15, 2020 · Databases

Resolving Zookeeper and HBase Master Crash Caused by jute.maxbuffer Misconfiguration

The article details a step‑by‑step investigation of a ZooKeeper outage and subsequent HBase master failure caused by a bug in an outdated ZooKeeper version and an excessively large jute.maxbuffer setting, explaining how to identify the issue, adjust configurations, and improve region assignment performance.

Distributed Systems · HBase · ZooKeeper
0 likes · 5 min read
Big Data Technology & Architecture
Jun 10, 2020 · Databases

Understanding HBase Compaction: Types, Triggers, Algorithms, and Impact on Read/Write Performance

This article explains HBase compaction—a key operation in the Log‑Structured Merge‑Tree model—covering minor and major compaction differences, trigger conditions, configuration parameters, selection algorithms, thread‑pool handling, and the effects on read and write performance in a big‑data database environment.

HBase · LSM · bigdata
0 likes · 10 min read
Didi Tech
Jun 9, 2020 · Databases

Didi HBase Team’s Upgrade from 0.98 to 1.4.8: Challenges, Solutions, and Lessons Learned

Didi's HBase team upgraded eleven clusters from version 0.98 to 1.4.8, tackling maintenance burdens and custom‑patch divergence, validating RPC and HFile compatibility, performing extensive functional and performance tests, opting for a rolling upgrade, fixing a region‑split data‑loss bug, merging critical upstream patches, and establishing a reusable migration methodology.

Compatibility · Database Upgrade · Didi
0 likes · 10 min read
Big Data Technology Architecture
May 22, 2020 · Databases

HBase Compaction Types and Parameter Tuning Guide

This article explains how HBase uses WAL and MemStore to create HFiles, describes the two compaction types (Minor and Major), and provides detailed recommendations for tuning key compaction-related configuration parameters to improve query performance and reduce HDFS impact.

HBase · Performance · Tuning
0 likes · 4 min read
Big Data Technology Architecture
May 21, 2020 · Databases

Quick Start Guide: Running HBase with Docker

This tutorial demonstrates how to rapidly set up and use HBase inside a Docker container, covering Docker installation, image pulling, container execution, host configuration, accessing the HBase Web UI and shell, ZooKeeper interaction, and a Java API example for beginners.

Docker · HBase · Tutorial
0 likes · 5 min read
Youzan Coder
May 20, 2020 · Backend Development

Real-Time Loss Prevention System: Architecture and Implementation at YouZan

YouZan’s real‑time loss‑prevention platform monitors database binlogs, transforms and verifies transaction data across five loosely coupled layers, handling 200 million daily messages and 60 million checks with dynamic sharding, caching and distributed locks to detect over‑charges, duplicate refunds, migration inconsistencies and unauthorized data changes.

Distributed Systems · HBase · Message Queue
0 likes · 12 min read
Big Data Technology Architecture
May 19, 2020 · Big Data

Design and Implementation of a Unified Data Lake Platform Using HBase, Kafka, and Elasticsearch

This article summarizes the design, architecture, and key modules of a company-wide data lake platform—named “Tianchi”—built on HBase, Kafka, and Elasticsearch, detailing data ingestion, strategy output, metadata management, indexing, monitoring, and offline analysis, and shares lessons learned and future plans.

Architecture · Data Platform · Elasticsearch
0 likes · 11 min read
Big Data Technology Architecture
May 12, 2020 · Databases

Key HBase Configuration Parameters and Production Recommendations (HBase 1.1.2)

This article categorizes and explains the most important HBase 1.1.2 configuration parameters—covering Region sizing, BlockCache strategies, Memstore thresholds, Compaction behavior, HLog handling, Call Queue tuning, and miscellaneous settings—while offering practical recommendations for optimal production deployment.

HBase · Performance · configuration
0 likes · 11 min read
Architecture Digest
May 4, 2020 · Databases

HBase Overview, Architecture, Installation, and Basic Shell Operations

This article provides a comprehensive introduction to HBase, covering its origins, key characteristics, architecture components, installation steps, basic shell commands for table management, data structures, read/write processes, and high‑availability configuration within the Hadoop ecosystem.

Big Data · HBase · Hadoop
0 likes · 14 min read
Big Data Technology Architecture
Apr 24, 2020 · Databases

Best Practices for HBase Region Count and Size to Improve Cluster Stability and Performance

The article explains how maintaining an optimal number of HBase regions (typically 20‑200 per RegionServer) and appropriate region size, along with careful MemStore and compaction settings, can prevent memory pressure, reduce GC pauses, and enhance overall cluster stability and throughput.

Cluster Optimization · HBase · Performance Tuning
0 likes · 5 min read
Big Data Technology Architecture
Apr 17, 2020 · Databases

Improving HBase Cluster Performance: Cache Optimization, GC Tuning, and Multiget Concurrency

This article details a series of practical enhancements applied to an HBase 1.2.4‑based cluster—including layered BucketCache, data pre‑heating, GC‑friendly object pooling, and a multiget concurrency model—that together raise throughput several‑fold and consistently keep P99 latency below 50 ms in YCSB benchmarks.

Benchmark · Cache · GC optimization
0 likes · 14 min read
Big Data Technology Architecture
Apr 16, 2020 · Databases

Memory Management Optimizations in HBase MemStore: SkipList, MemStoreLAB, ChunkPool, Off‑heap and CCSMap

The article systematically explains how HBase's MemStore uses a SkipList‑based model and introduces successive memory‑management optimizations—MemStoreLAB, ChunkPool, off‑heap chunks, CompactingMemStore and the CCSMap data structure—to reduce object overhead, GC pressure and improve throughput.

HBase · MemStore · Memory Optimization
0 likes · 20 min read
dbaplus Community
Apr 7, 2020 · Databases

How Pharos Accelerates HBase Multi‑Condition Queries with Low‑Latency Indexing

This article examines Pharos, Everbright Bank's home‑grown HBase indexing middleware, detailing why existing secondary‑index solutions fall short, the design goals of low latency, simple architecture and non‑intrusiveness, and the concrete storage, pagination, and transaction‑consistency techniques that enable fast complex queries on massive data.

HBase · Low latency · Pharos
0 likes · 14 min read
Big Data Technology & Architecture
Apr 1, 2020 · Big Data

HBase Cluster Deployment Architecture, Configuration Optimization, and Application Layer Usage

This article details the evolution of HBase cluster deployment from mixed‑hardware/software setups to fully independent clusters, explains hardware and software considerations, presents memory and region planning, outlines key configuration parameters, and provides Spark integration examples for batch and real‑time queries and writes.

Big Data · Cluster Deployment · Configuration Optimization
0 likes · 24 min read
Big Data Technology & Architecture
Mar 30, 2020 · Databases

HBase Optimization: JVM Tuning, Region Split Policies, BlockCache, and Compaction Strategies

This guide explains how to optimize HBase performance by adjusting JVM memory settings, selecting appropriate garbage collectors, configuring MSLAB and in‑memory compaction, choosing region split policies, tuning BlockCache implementations, and applying suitable compaction policies for different workloads.

Big Data · BlockCache · HBase
0 likes · 18 min read
Top Architect
Mar 13, 2020 · Big Data

Three Billion‑Scale MySQL‑to‑HBase Synchronization Solutions and Practical Implementation

This article presents a comprehensive guide for synchronizing massive MySQL datasets to HBase, covering environment preparation, fast MySQL data loading techniques, and three practical pipelines—Sqoop, Kafka‑Thrift, and Kafka‑Flink—along with performance comparisons and optimization tips for large‑scale data processing.

Big Data · Flink · HBase
0 likes · 24 min read
Big Data Technology & Architecture
Mar 12, 2020 · Databases

HBase FAQ: Performance Optimization, Bulk Load, Single‑Node Mode, Transactions, and Best Practices

This article compiles a series of HBase questions and answers covering write performance, bulk loading, single‑node configuration, column scalability, transaction isolation, fast deletion methods, off‑heap optimizations, bulkload modes, Hive integration, direct HFile reads, and region planning.

HBase · Off-Heap · Single Node
0 likes · 7 min read
Big Data Technology Architecture
Mar 4, 2020 · Databases

HBase Memory‑Related Performance Tuning Guide

This article explains how to optimize HBase performance by properly configuring JVM memory, selecting suitable garbage‑collection strategies, enabling MSLAB and BucketCache, and adjusting read/write cache ratios to reduce fragmentation and improve throughput.

Cache · Garbage Collection · HBase
0 likes · 8 min read
ITPUB
Mar 2, 2020 · Big Data

Mastering ZooKeeper: Core Concepts and Real-World Big Data Applications

This article explains ZooKeeper’s architecture, key concepts such as roles, sessions, ZNodes, versioning, ACLs, and watchers, and demonstrates how it powers essential big‑data components like Hadoop’s ResourceManager and HBase’s master election, naming service, and distributed locking.

Big Data · Distributed Coordination · HBase
0 likes · 23 min read
Big Data Technology Architecture
Feb 22, 2020 · Databases

Using HBase PerformanceEvaluation (PE) Tool for Read/Write Latency Benchmarking (P99/P999)

This article explains how to use HBase's built‑in PerformanceEvaluation tool to run baseline read/write latency tests (P99 and P999), describes key command‑line parameters, presents benchmark results for random and sequential operations, and discusses the implications for HBase performance tuning.

Benchmark · DatabasePerformance · HBase
0 likes · 11 min read
Big Data Technology Architecture
Feb 4, 2020 · Big Data

Using Apache Phoenix on CDH HBase: Installation, Configuration, and Secondary Index Creation

This article explains how to integrate Apache Phoenix with CDH‑based HBase, covering Phoenix overview, version selection, parcel installation, HBase configuration, command‑line usage, mapping existing tables, creating schemas and views, building secondary indexes, and comparing different index types for performance optimization.

Apache Phoenix · CDH · HBase
0 likes · 15 min read
dbaplus Community
Feb 2, 2020 · Databases

JDHBase Multi‑Active Disaster Recovery: Replication, Auto‑Failover & Consistency

JDHBase, JD.com’s large‑scale KV store, powers billions of daily reads and writes across 7,000 nodes, and this article details its multi‑active, cross‑region architecture—including HBase replication fundamentals, Fox Manager routing, automatic failover policies, dynamic replication tuning, and serial replication to ensure strong consistency.

Database Architecture · HBase · Replication
0 likes · 15 min read
Big Data Technology Architecture
Jan 31, 2020 · Big Data

Practical Experience with HBase at NetEase: Architecture, Core Use Cases, HBCK & RIT Troubleshooting, and Diagnosis Strategies

This article summarizes NetEase Hangzhou Research Institute expert Fan Xinxin's presentation on HBase, covering its role in the big‑data ecosystem, core production scenarios, RIT and HBCK troubleshooting techniques, and systematic monitoring and log‑analysis methods for diagnosing HBase issues.

Architecture · HBCK · HBase
0 likes · 11 min read
JD Retail Technology
Jan 6, 2020 · Backend Development

JDHBase Multi‑Active Architecture and Replication Practices

This article describes JDHBase’s large‑scale KV storage deployment, its HBase‑based asynchronous replication mechanism, the multi‑active architecture with active‑standby clusters, client interaction via Fox Manager, automatic failover strategies, dynamic replication tuning, and serial replication techniques to ensure data consistency across data centers.

Cluster Management · HBase · Replication
0 likes · 13 min read
Tongcheng Travel Technology Center
Dec 31, 2019 · Big Data

Apache Kylin Overview and Model Optimization Practices for Trajectory Analytics

This article introduces Apache Kylin, details its deployment at Tongcheng Yilong, explains the design of a large‑scale trajectory model, and provides step‑by‑step optimization techniques (cube dimension reduction, HBase rowkey tuning, build parameter tweaks, high‑cardinality handling, and disabling query compression) to achieve sub‑second OLAP queries on multi‑terabyte data.

Apache Kylin · Big Data · Cube
0 likes · 17 min read
Youzan Coder
Dec 18, 2019 · Big Data

HBase Bulkload Practice at Youzan: From MapReduce to Spark Evolution

Youzan’s evolution of HBase bulk‑load—from manual MapReduce jobs to Hive‑SQL and finally Spark—demonstrates how generating HFiles on HDFS, partitioning by region, sorting keys, and handling serialization issues enables billions of records to be loaded efficiently without disrupting production clusters.

HBase · Hadoop · NoSQL
0 likes · 16 min read
Big Data Technology Architecture
Nov 19, 2019 · Backend Development

CMS GC JVM Parameter Tuning Guide for HBase Clusters

This article explains the fundamentals of the CMS (Concurrent Mark Sweep) garbage collector, presents a comprehensive set of JVM parameters optimized for HBase clusters, and provides detailed analysis of key settings to improve performance and reduce GC pauses.

CMS GC · Garbage Collection · HBase
0 likes · 7 min read
dbaplus Community
Nov 3, 2019 · Databases

Insights from Data Platform Experts: Distributed Transactions, Aurora, and HBase

A recent data platform salon in Beijing gathered five leading experts who shared practical knowledge on data middle platforms, distributed transaction patterns, SQL audit design, Amazon Aurora's architecture, and JD's large‑scale HBase deployment, offering actionable guidance for modern enterprise data engineering.

Cloud Databases · Data Platform · Distributed Transactions
0 likes · 6 min read
Big Data Technology & Architecture
Oct 28, 2019 · Big Data

Big Data Technology and Architecture: Leveraging Spark and HBase for Real‑Time and Offline Processing

This article outlines the challenges of various big‑data scenarios such as financial risk control, recommendation systems, and social feeds, explains why Spark is chosen over alternatives, describes a one‑stop data platform architecture with Spark‑HBase integration, and shares best‑practice tips and case studies.

Big Data · Data Architecture · HBase
0 likes · 7 min read
Hulu Beijing
Oct 28, 2019 · Big Data

How Hulu Uses Big Data to Power Precise Advertising and Real‑Time Streaming

At a Tsinghua University forum, Hulu presented a comprehensive overview of its big‑data solutions for advertising and streaming, covering challenges of massive, complex data, the limits of MySQL, and advanced techniques using HBase, Protobuf, Redis batch pipelines, and its own MPP engine Nesto for high‑performance, scalable analytics.

Advertising · HBase · MPP
0 likes · 6 min read
DataFunTalk
Oct 25, 2019 · Big Data

Migrating Data from HBase to Kafka Using MapReduce

This article explains how to reverse the typical data flow by extracting large volumes of rowkeys from HBase with MapReduce, storing them on HDFS, and then using batch Get operations to retrieve the full records and write them into Kafka, while handling retries and monitoring progress.

Big Data · Data Migration · HBase
0 likes · 9 min read
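The batch-Get-with-retries flow described in the summary above might look roughly like the following Python sketch. It makes several assumptions: `fetch_batch` and `publish` are hypothetical callables standing in for an HBase batch Get and a Kafka producer send (neither is a real client API), and the batch size and retry count are arbitrary. The article's own implementation may differ.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def migrate(rowkeys, fetch_batch, publish, batch_size=100, max_retries=3, backoff_s=0.0):
    """Fetch rows batch by batch and publish them, retrying failed batches.

    fetch_batch(batch) and publish(record) are hypothetical stand-ins for
    an HBase batch Get and a Kafka send. Returns (published_count,
    failed_batches) so progress can be monitored and failures replayed.
    """
    published, failed = 0, []
    for batch in chunked(rowkeys, batch_size):
        records = None
        for attempt in range(max_retries):
            try:
                records = fetch_batch(batch)
                break
            except Exception:
                # Wait a little longer after each failed attempt.
                time.sleep(backoff_s * (attempt + 1))
        if records is None:
            failed.append(batch)  # keep the batch so it can be replayed later
            continue
        for record in records:
            publish(record)
            published += 1
    return published, failed


# Demo with a fetch function that fails once, then succeeds.
calls = {"n": 0}


def flaky_fetch(batch):
    calls["n"] += 1
    if calls["n"] == 1:
        raise IOError("transient failure")
    return [{"rowkey": key} for key in batch]


sent = []
published, failed = migrate(["r1", "r2", "r3"], flaky_fetch, sent.append, batch_size=2)
```

Returning the failed batches instead of raising keeps one bad batch from stopping the whole run, which matches the summary's emphasis on retries and progress monitoring.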
Youzan Coder
Oct 25, 2019 · Artificial Intelligence

Personalized Recommendation System Architecture and Techniques at Youzan

Youzan’s personalized recommendation platform combines a four‑layer architecture—data, storage, service, and application—with multi‑dimensional real‑time, offline, and cold‑start recall algorithms, Wide&Deep ranking, HBase/Druid storage, and configurable scene strategies to boost user conversion, traffic monetization, and future scalability.

HBase · Wide&Deep · cold start
0 likes · 16 min read
Big Data Technology & Architecture
Oct 21, 2019 · Databases

High‑Availability Practices of Alibaba HBase: Large Clusters, MTTF/MTTR, Disaster Recovery, and Extreme Experience

This article reviews Alibaba HBase's evolution toward high availability, covering large‑cluster architecture, reliability metrics (MTTF/MTTR), disaster‑recovery strategies such as data replication and traffic switching, performance optimizations for extreme latency requirements, and lessons learned for building resilient distributed database services.

Distributed Systems · HBase · Performance Optimization
0 likes · 20 min read
Sohu Tech Products
Oct 17, 2019 · Databases

HBase Table Design Strategies and Best Practices

This article explains HBase's data model and key components, details column descriptor options such as BloomFilter, Compression, Versions, TTL, and MinVersion, and provides practical design guidelines for columns, rowkeys, tall vs. wide tables, region pre‑splitting, and hotspot mitigation to achieve optimal performance.

HBase · NoSQL · Table Design
0 likes · 17 min read
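One widely used way to combine region pre‑splitting with hotspot mitigation is rowkey salting. The Python sketch below illustrates that general idea only and is not taken from the article: the `salted_rowkey` and `split_points` helpers, the `|` separator, and the bucket count of 4 are all assumptions made for this example.

```python
import hashlib


def salted_rowkey(key: str, buckets: int = 4) -> str:
    """Prefix the key with a stable two-digit bucket derived from its hash.

    Sequential keys (timestamps, incrementing IDs) then spread across
    `buckets` key ranges instead of all landing in the same region.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}|{key}"


def split_points(buckets: int = 4) -> list:
    """Split keys that give each salt bucket its own pre-split region."""
    return [f"{b:02d}|" for b in range(1, buckets)]


# Hypothetical usage: these split points would be supplied at table creation.
points = split_points(4)
example = salted_rowkey("20191017-order-0001")
```

The trade-off is that a range scan over the original key order has to fan out across every bucket, so salting suits write-heavy tables more than scan-heavy ones.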
Sohu Tech Products
Oct 9, 2019 · Databases

HBase Table Design Strategies: Data Model, Column Descriptors, RowKey, Region and Performance Optimization

This article explains HBase’s data model and provides comprehensive table‑design strategies, including column‑descriptor options, row‑key best practices, tall‑vs‑wide table trade‑offs, and region splitting and pre‑splitting techniques, to help achieve optimal performance and scalability in large‑scale NoSQL workloads.

Big Data · Column Family · HBase
0 likes · 16 min read
dbaplus Community
Oct 8, 2019 · Big Data

How to Master Large-Scale Cluster Management: 10 Real-World Troubleshooting Cases

This article shares a senior data‑platform engineer's hands‑on experience managing dozens of thousand‑node clusters, detailing nine common cluster problems and step‑by‑step solutions—including performance tuning, RPC fixes, HDFS cleanup, Hive metadata repair, Spark shuffle optimization, HBase region recovery, and Kafka bottleneck mitigation.

Big Data · Cluster Management · HBase
0 likes · 17 min read
Xueersi Online School Tech Team
Sep 27, 2019 · Big Data

Design Principles and Architecture of Apache Kylin for Sub‑Second OLAP Queries

This article explains how Apache Kylin, an open‑source distributed analytics engine built on Hadoop/Spark, achieves sub‑second OLAP query performance through pre‑computed cubes, a layered cuboid generation algorithm, bitmap‑based distinct counting, dimension optimization techniques, and tight integration with HBase for storage and fast SQL querying.

Apache Kylin · Big Data · Cube
0 likes · 15 min read
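The layered cuboid generation mentioned in the summary above can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions, not Kylin's implementation: the `build_cube` and `aggregate` helpers, the single SUM measure, and the sample rows are all hypothetical, and Kylin itself runs the layers as distributed jobs and stores the results in HBase. The sketch shows the core idea: the base cuboid with all N dimensions is built from the raw rows, and every cuboid with fewer dimensions is derived from an already-built parent one layer up rather than from the raw data.

```python
from collections import defaultdict
from itertools import combinations


def aggregate(parent, parent_dims, child_dims):
    """Roll a parent cuboid up to child_dims by summing the measure."""
    idx = [parent_dims.index(d) for d in child_dims]
    child = defaultdict(int)
    for key, measure in parent.items():
        child[tuple(key[i] for i in idx)] += measure
    return dict(child)


def build_cube(rows, dims):
    """Build every cuboid layer by layer, from N dimensions down to 0.

    rows is a list of (dimension_values_tuple, measure) pairs.
    Returns {dimension_tuple: {key_tuple: summed_measure}}.
    """
    base = defaultdict(int)
    for key, measure in rows:
        base[tuple(key)] += measure
    cube = {tuple(dims): dict(base)}
    for size in range(len(dims) - 1, -1, -1):
        for child_dims in combinations(dims, size):
            # Reuse any already-built parent with exactly one more dimension.
            parent_dims = next(
                p for p in list(cube)
                if len(p) == size + 1 and set(child_dims) <= set(p)
            )
            cube[child_dims] = aggregate(cube[parent_dims], parent_dims, child_dims)
    return cube


dims = ("city", "year")
rows = [(("NY", 2019), 10), (("NY", 2020), 5), (("SF", 2019), 7)]
cube = build_cube(rows, dims)
```

A query such as "total per city" then reads the precomputed `cube[("city",)]` directly, which is where the sub‑second response time comes from.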