Tagged articles

HBase

312 articles · Page 2 of 4

May 26, 2021 · Databases

Understanding MySQL Slow Queries, Elasticsearch, and HBase: Causes and Practical Solutions

This article explains why MySQL queries become slow, how indexes work and fail, the impact of MDL locks, large‑table challenges, sharding and read‑write splitting strategies, then introduces Elasticsearch’s search capabilities and HBase’s column‑family storage, offering practical guidance for each technology.

Database Performance · Elasticsearch · HBase

0 likes · 17 min read

DataFunTalk

May 22, 2021 · Databases

Combining HBase and Elasticsearch: Challenges and the Lindorm Searchindex Solution

The article examines the strengths and weaknesses of combining HBase and Elasticsearch for massive data storage and retrieval, outlines three integration patterns and their challenges, and presents Alibaba Cloud's Lindorm Searchindex as a SQL‑driven, low‑cost, strongly consistent solution that simplifies development and improves performance.

Big Data · Elasticsearch · HBase

0 likes · 11 min read

IT Architects Alliance

May 22, 2021 · Big Data

Flink-Based Real‑Time Recommendation System: Architecture, Logic, and Docker Deployment Guide

This article presents a comprehensive walkthrough of a Flink‑powered recommendation system, detailing its v2.0 architecture, module functions, recommendation algorithms (hotness, product similarity, collaborative filtering), front‑end and back‑end UI, and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka services.

Big Data · Docker · Flink

0 likes · 11 min read

Code Ape Tech Column

May 21, 2021 · Databases

Why Your MySQL Queries Are Slow and How ElasticSearch & HBase Can Help

This article analyzes common causes of slow MySQL queries such as index misuse, MDL locks, and large‑table bottlenecks, then presents practical solutions like proper indexing, sharding, read/write splitting, and evaluates when to complement MySQL with ElasticSearch or HBase for better performance.

Database Performance · Elasticsearch · HBase

0 likes · 19 min read

Architect

May 19, 2021 · Big Data

Flink-Based Real-Time Recommendation System Architecture and Deployment Guide

This article presents a comprehensive overview of a Flink-powered real-time recommendation system, detailing its v2.0 architecture, module functions, recommendation algorithms, front‑end and back‑end interfaces, Docker‑based deployment of MySQL, Redis, HBase, Kafka, and step‑by‑step startup procedures.

Docker · Flink · HBase

0 likes · 9 min read

Big Data Technology Architecture

May 19, 2021 · Databases

Combining HBase and Elasticsearch: Challenges and the Lindorm Searchindex Solution

This article examines the complementary strengths of HBase and Elasticsearch, outlines three integration patterns and their associated challenges, and introduces Alibaba Cloud's Lindorm Searchindex as a SQL‑driven, low‑cost solution that simplifies storage and full‑text search for massive data workloads.

Elasticsearch · HBase · Lindorm

0 likes · 12 min read

Big Data Technology & Architecture

Apr 23, 2021 · Big Data

Reading HBase with Flink 1.12 – Environment Setup, Code Samples, and Result

This article demonstrates how to configure Flink 1.12 to read data from HBase, covering the required environment components, HBase table creation, Maven dependencies, Java POJO and Flink‑SQL code, and showing the query results with and without printing the TableResult.

Flink · HBase · Java

0 likes · 11 min read

dbaplus Community

Apr 17, 2021 · Big Data

How a Traditional Finance Firm Tackles Real‑Time Analytics with Flink

This article details a financial company's exploration of Apache Flink for real‑time processing, covering its unique business constraints, end‑to‑end data pipeline, single‑table and multi‑table use cases, implementation challenges, code snippets, data initialization, testing strategies, and lessons learned.

Financial · Flink · HBase

0 likes · 13 min read

Big Data Technology Architecture

Apr 15, 2021 · Databases

HBase Read Performance Optimization: Best Practices and Tuning Guide

This article presents a comprehensive, practical guide to diagnosing and optimizing HBase read latency, covering common issues such as Full GC, region‑in‑transition, low write throughput, and high read delay, and offering client‑side, server‑side, column‑family, and HDFS tuning recommendations.

HBase

0 likes · 15 min read

Alibaba Cloud Developer

Apr 14, 2021 · Backend Development

Building a Redis‑Based Distributed Queue to Cut HBase IO Bottlenecks

The article explores what makes code 'good'—emphasizing usability, readability, and maintainability—then details the design and implementation of a lightweight Redis‑based distributed consumption queue that alleviates HBase I/O pressure, describing its architecture, modules, logging, and performance gains.

HBase · Redis · backend

0 likes · 10 min read

iQIYI Technical Product Team

Apr 9, 2021 · Big Data

Real-Time Data Warehouse at iQIYI Video Production Using Spark and ClickHouse

To meet iQIYI video production’s requirements for thousands of QPS, petabyte‑scale and frequently updated data, and large‑table joins, the team built a Spark‑plus‑ClickHouse real‑time warehouse that streams Kafka changes, joins HBase dimensions, and writes to ClickHouse, reducing reporting development time from days to hours while supporting both offline and real‑time analytics.

ClickHouse · Data Engineering · HBase

0 likes · 12 min read

Big Data Technology Architecture

Mar 13, 2021 · Databases

HBCK: What It Checks, Common Commands, and Repair Strategies for HBase

This article explains how the HBase hbck tool verifies region consistency and table integrity, lists the most frequently used hbck commands, and describes both low‑risk and high‑risk repair procedures, including handling of RIT states and log‑based troubleshooting.

Database Repair · HBCK · HBase

0 likes · 4 min read

Big Data Technology Architecture

Mar 9, 2021 · Databases

Evaluating ZGC vs G1 GC Performance in HBase Clusters

This article examines the challenges of GC pauses in low‑latency HBase services, explains ZGC’s fully concurrent architecture and key techniques such as colored pointers and read barriers, and presents experimental comparisons of ZGC and G1 GC using YCSB benchmarks, highlighting latency, throughput and CPU usage differences.

G1GC · Garbage Collection · HBase

0 likes · 18 min read

Code Ape Tech Column

Feb 18, 2021 · Databases

Why Your MySQL Queries Are Slow and How to Fix Them with Indexes, ES, and HBase

This article analyzes common causes of slow MySQL queries—especially index misuse—offers practical indexing techniques, explains MDL locks and large‑table bottlenecks, and then compares ElasticSearch and HBase as complementary solutions for high‑performance search and storage.

Elasticsearch · HBase · Indexing

0 likes · 20 min read

DataFunTalk

Feb 13, 2021 · Databases

Improving HBase Availability and Reducing Latency Spikes with Replication‑Based Multi‑Path Reads and ZGC

This article describes how the Didi HBase team tackled HBase’s weak availability and GC‑induced latency spikes by introducing a replication‑based client multi‑path read mechanism, configuring hedged reads, and adopting the Z Garbage Collector, and presents the resulting performance improvements and remaining challenges.

Big Data · HBase · Multi-Path Read

0 likes · 11 min read

ITFLY8 Architecture Home

Feb 3, 2021 · Databases

Master MySQL Slow Queries, ElasticSearch, and HBase: Practical Performance Tips

This article explores why MySQL queries become slow, delves into index pitfalls and optimization techniques, then compares ElasticSearch and HBase architectures, offering practical guidance on when to use each technology and how to combine them for high‑performance data retrieval.

Database Performance · Elasticsearch · HBase

0 likes · 22 min read

Top Architect

Jan 31, 2021 · Databases

Understanding and Optimizing Fast Queries: MySQL Indexes, ElasticSearch, and HBase

This article explains why MySQL queries become slow, how proper index design, MDL locks, sharding, read‑write separation, and the use of ElasticSearch or HBase can improve query performance in large‑scale systems, and provides practical tips and code examples for each technique.

HBase · Indexing · database optimization

0 likes · 20 min read

Big Data Technology & Architecture

Jan 24, 2021 · Big Data

Design and Implementation of a Big Data OLAP Platform Based on Apache Kylin

This article explains the background, challenges, and architectural design of a big‑data OLAP platform that integrates Apache Kylin with a BI system, detailing pre‑computation strategies, cube construction, user authentication, storage engines, and query mechanisms to achieve sub‑second analytics on massive datasets.

Apache Kylin · Data Warehouse · HBase

0 likes · 11 min read

Full-Stack Internet Architecture

Jan 22, 2021 · Databases

An Overview of HBase: Architecture, Design Principles, and Performance Characteristics

This article provides a comprehensive introduction to HBase, covering its origins, column‑oriented NoSQL design, storage on HDFS, logical and physical structures, read/write workflows, performance optimizations, and common interview questions for big‑data engineers.

Big Data · Columnar Database · Distributed storage

0 likes · 24 min read

Big Data Technology & Architecture

Jan 19, 2021 · Databases

Understanding B+ Trees and Log-Structured Merge Trees (LSM Trees) in HBase

This article explains the fundamentals and drawbacks of B+ trees, introduces the Log-Structured Merge Tree (LSM Tree) used in HBase and other NoSQL databases, and discusses how LSM architecture, Bloom filters, and MemStore improve write performance while affecting read efficiency.

B+Tree · HBase · LSM‑Tree

0 likes · 6 min read

Big Data Technology & Architecture

Jan 7, 2021 · Databases

Comprehensive HBase Optimization Guide: Table Design, RowKey, JVM Tuning, Cache Settings, and Read/Write Performance

This article provides a detailed, practical guide to optimizing HBase in production, covering table pre‑splitting, RowKey design, JVM memory and GC settings, MSLAB and BucketCache configuration, read‑side client and server tuning, write‑side strategies, and additional tips such as compression and scan caching.

Cache · Database Tuning · HBase

0 likes · 29 min read

Big Data Technology & Architecture

Dec 25, 2020 · Big Data

Implementing Custom Source and Sink in Flink Streaming with RocketMQ and HBase

This article details how to migrate Spark Streaming jobs to Flink Streaming by creating custom SourceFunction and SinkFunction implementations, including a RocketMQ source connector and an HBase sink, with code examples, configuration tips, and discussion of checkpointing and watermark handling.

Flink · HBase · RocketMQ

0 likes · 20 min read

Didi Tech

Dec 21, 2020 · Big Data

HBase Availability and Latency Optimizations: Replication‑Based Multi‑Read and ZGC Adoption

To overcome HBase’s weak availability and GC‑induced latency spikes, the DiDi team introduced a replication‑based client multi‑read (hedged‑read) mechanism and migrated to the Z Garbage Collector, which together dramatically cut maximum and 99.9th‑percentile latencies while keeping services online during region disruptions.

Big Data · HBase · Multi-Read

0 likes · 12 min read

Big Data Technology & Architecture

Dec 8, 2020 · Big Data

Horizontal Comparison of HBase, Kudu, and ClickHouse (V2.0)

This article provides a comprehensive technical comparison of HBase, Kudu, and ClickHouse—covering installation dependencies, architecture, basic read/write and query operations, real‑world use cases at Didi, a Kudu‑based real‑time data warehouse, and ClickHouse log‑analysis practices—highlighting each system’s strengths and trade‑offs for big‑data workloads.

ClickHouse · Database Comparison · HBase

0 likes · 17 min read

DeWu Technology

Nov 19, 2020 · Operations

HBase Operations and Use Cases for High‑Concurrency E‑commerce

In this talk, Yun Jin explains how HBase’s petabyte‑scale, horizontally‑scalable architecture—built on Hadoop, HMaster, RegionServers, and Zookeeper—enables e‑commerce platforms to handle extreme promotion‑day traffic by supporting high‑throughput reads/writes, time‑series monitoring, massive order storage, and robust HA, while covering essential table operations, monitoring, and troubleshooting techniques.

Big Data · HBase · Operations

0 likes · 6 min read

JD Tech Talk

Nov 9, 2020 · Big Data

Trajectory-Based Population Flow Analysis for COVID‑19 Prevention Using HBase and Spark

The article presents a comprehensive big‑data solution that stores massive GPS trajectory records in HBase, processes them with Spark to identify individuals who visited a pandemic source region, and visualizes their spatio‑temporal distribution in target cities to support precise epidemic control measures.

Big Data · COVID-19 · HBase

0 likes · 8 min read

Practical DevOps Architecture

Nov 6, 2020 · Databases

HBase Overview and Step‑by‑Step Installation Guide

This article introduces HBase’s column‑oriented architecture, explains the roles of Master, RegionServer, and Zookeeper, and provides detailed environment preparation and installation commands for setting up an HBase cluster on Hadoop.

Big Data · HBase · Installation

0 likes · 8 min read

Big Data Technology Architecture

Nov 3, 2020 · Big Data

Performance Optimization of Apache Kylin at Beike: HBase Tuning, Region Management, and Slow‑Query Mitigation

This article details how Beike's engineering team scaled Apache Kylin to handle tens of millions of daily queries by optimizing HBase configurations, reducing region count, improving data locality, addressing IO and JVM GC bottlenecks, and implementing comprehensive slow‑query detection and active‑defense mechanisms.

Apache Kylin · HBase · JVM GC

0 likes · 15 min read

Zhongtong Tech

Oct 30, 2020 · Big Data

How Apache Kylin Supercharged OLAP at ZTO Express: A Deep Dive

This article details ZTO Express's journey of adopting Apache Kylin for OLAP, comparing it with Presto, describing platform architecture, performance gains, integration with scheduling and monitoring systems, and the practical optimizations and future plans that enabled sub‑second query responses on massive daily data volumes.

Apache Kylin · Big Data · HBase

0 likes · 16 min read

dbaplus Community

Oct 19, 2020 · Databases

Scaling an E‑commerce Order System with Sharding, ES‑HBase Search, and Zero‑Downtime Migration

This article details how a high‑traffic e‑commerce platform migrated from a single MySQL instance to a sharded architecture using Sharding‑JDBC, added Elasticsearch‑HBase for multi‑dimensional queries, and implemented zero‑downtime data migration and scaling strategies.

Canal · Elasticsearch · HBase

0 likes · 19 min read

MaGe Linux Operations

Sep 7, 2020 · Databases

Step-by-Step Guide to Installing an HBase Cluster on Hadoop

This article explains what HBase is, describes its Master, RegionServer, and Zookeeper components, and provides detailed environment preparation and configuration steps—including host setup, SSH key distribution, JDK installation, HBase deployment, configuration file edits, and cluster startup—so you can run HBase on a Hadoop cluster.

HBase · Hadoop · bigdata

0 likes · 8 min read

Big Data Technology & Architecture

Aug 27, 2020 · Big Data

HBase Architecture, Components, and Operations Overview

This article provides a comprehensive overview of Apache HBase’s architecture, detailing its core components such as RegionServer, HMaster, ZooKeeper, WAL, MemStore, and HFiles, and explains key processes including read/write paths, compaction, region splitting, load balancing, and recovery mechanisms.

Big Data · Compaction · Database Architecture

0 likes · 17 min read

Big Data Technology & Architecture

Aug 26, 2020 · Big Data

Understanding HBase RegionServer, HRegion, HStore, and Column Family Management

The article explains HBase's RegionServer management of regions and stores, detailing HStore composition, MemStore flushing, split conditions, column family sharing within regions, and the performance implications of multiple column families, recommending a single column family design for optimal I/O efficiency.

ColumnFamily · HBase · RegionServer

0 likes · 3 min read

Big Data Technology & Architecture

Aug 22, 2020 · Big Data

Integrating Kerberos with Spark on CDH: Configuration, Deployment, and Troubleshooting Guide

This guide explains how to prepare a CDH‑based Spark environment for Kerberos authentication, covering prerequisite knowledge, classpath adjustments, HBase configuration files, Spark‑Env settings, user permission grants, Spark‑Submit execution, and common troubleshooting steps.

Big Data · CDH · HBase

0 likes · 12 min read

Big Data Technology & Architecture

Aug 18, 2020 · Big Data

End-to-End Real-Time Web Log Processing with Flume, Kafka, Spark Streaming, HBase, and Spring Boot

This tutorial demonstrates how to generate simulated web access logs in Python, schedule them with Crontab, collect them in real time using Flume, forward them to Kafka, process the streams with Spark Streaming, store results in HBase, and visualize the data via a Spring Boot application with ECharts.

Big Data · ECharts · Flume

0 likes · 36 min read

Top Architect

Aug 14, 2020 · Big Data

Billion‑Row MySQL to HBase Synchronization: Load Data, Kafka‑Thrift, and Flink Solutions

This article presents a comprehensive guide for transferring massive MySQL datasets to HBase, covering environment setup on Ubuntu, three synchronization methods—MySQL LOAD DATA, a Kafka‑Thrift pipeline using Maxwell, and real‑time Flink processing—along with performance comparisons and practical tips for Hadoop, HBase, Kafka, Zookeeper, Phoenix, and related tools.

DataSync · Flink · HBase

0 likes · 24 min read

Architecture Digest

Aug 13, 2020 · Big Data

Synchronizing Billion-Row MySQL Data to HBase: Three Practical Schemes and Implementation Guide

This comprehensive guide details three practical methods for syncing massive MySQL datasets to HBase—including Sqoop, Kafka‑Thrift, and Flink pipelines—covering environment setup, configuration, code examples, performance comparisons, and optimization tips for large‑scale data ingestion and querying.

Big Data · Data synchronization · Flink

0 likes · 24 min read

Big Data Technology Architecture

Aug 13, 2020 · Big Data

iQIYI’s Adoption of Apache Kylin for OLAP: Architecture, Optimizations, and Future Plans

The article details iQIYI’s migration from a Hive + MySQL OLAP stack to Apache Kylin, describing the system’s architecture, typical use cases, performance gains, independent HBase deployment, service platform for monitoring, and future plans such as automated cube building and clustering.

Apache Kylin · Cube · HBase

0 likes · 13 min read

Big Data Technology & Architecture

Aug 2, 2020 · Databases

Introduction to Apache Phoenix: An Open‑Source SQL Layer for HBase

This article introduces Apache Phoenix, an open‑source SQL layer for HBase that enables JDBC‑based table creation, data insertion, and querying, while supporting secondary indexes, transactions, and various optimizations, and outlines a series covering its syntax, tools, best practices, and real‑world use cases.

Apache Phoenix · HBase · JDBC

0 likes · 2 min read

Big Data Technology & Architecture

Jul 29, 2020 · Big Data

Sqoop Tutorial: Importing and Exporting Data between Relational Databases, HDFS, Hive, and HBase

This article provides a comprehensive guide to using Sqoop for importing data from relational databases into HDFS, Hive, and HBase, as well as exporting data back to databases, covering command syntax, options, and practical examples for big‑data workflows.

Big Data · HBase · HDFS

0 likes · 8 min read

Programmer DD

Jul 22, 2020 · Big Data

How to Sync Billions of MySQL Records to HBase: 3 Powerful Methods Using Hadoop, Kafka, and Flink

This comprehensive guide walks you through setting up a pseudo‑distributed Hadoop environment, loading massive MySQL data with LOAD DATA, Python scripts, and multithreading, and then synchronizing the data to HBase using three approaches—Sqoop, a Kafka‑Thrift pipeline, and a real‑time Kafka‑Flink pipeline—while also comparing query performance of HBase and Phoenix.

Flink · HBase · Kafka

0 likes · 28 min read

Big Data Technology & Architecture

Jul 19, 2020 · Big Data

An Overview of Hive, HBase Integration, Apache Phoenix, and Lealone in the Big Data Ecosystem

This article explains Hive's role as a Hadoop‑based data warehouse, its integration with HBase, the advantages and drawbacks of that combination, introduces Apache Phoenix as a high‑performance SQL layer on HBase, and describes the open‑source NewSQL database Lealone, providing practical usage scenarios and performance comparisons.

Big Data · Data Warehouse · HBase

0 likes · 9 min read

Big Data Technology & Architecture

Jul 10, 2020 · Big Data

Creating a Test Table in Phoenix/HBase and Implementing a Custom Bitmap Aggregation Function in Spark

This tutorial demonstrates how to create a VARBINARY test table in HBase using Phoenix, serialize its data with RoaringBitmap, implement a custom Spark aggregation function to merge bitmap values, and query the table via Spark SQL, showcasing a practical big-data processing workflow.

Big Data · HBase · Phoenix

0 likes · 6 min read

Big Data Technology & Architecture

Jul 10, 2020 · Databases

Understanding B+ Trees and Log-Structured Merge (LSM) Trees and Their Use in HBase

This article reviews B+ trees, introduces log‑structured merge (LSM) trees, compares their strengths and weaknesses, and explains how HBase leverages LSM trees, HFiles, compaction, and Bloom filters to achieve high‑performance storage for write‑intensive workloads.

B+Tree · DataStructures · Databases

0 likes · 8 min read

Big Data Technology & Architecture

Jul 9, 2020 · Big Data

How ZooKeeper Supports HBase: Coordination, Fault Tolerance, Log Splitting, META Table Management, and Replication

This article explains how ZooKeeper functions as a distributed coordination service for HBase, detailing its role in master and RegionServer fault tolerance, log splitting, META table location tracking, and replication management, illustrating the underlying ZNode structures and failover mechanisms.

Big Data · Distributed Coordination · HBase

0 likes · 7 min read

vivo Internet Technology

Jul 8, 2020 · Databases

OpenTSDB: Architecture, Data Model, and HBase Integration for Time-Series Data Storage

The article offers a detailed technical overview of OpenTSDB’s architecture and data model, explaining how it leverages HBase for scalable time‑series storage, describing core concepts, table schemas, ingestion flow, performance considerations, and future alternatives for large‑scale monitoring workloads.

HBase · OpenTSDB · UID mapping

0 likes · 12 min read

Big Data Technology & Architecture

Jun 22, 2020 · Databases

JDHBase Multi‑Active Architecture and Replication Mechanisms

This article describes JDHBase’s large‑scale KV storage, its HBase‑based replication principle, the multi‑active cluster architecture with Fox Manager, client routing, automatic failover, dynamic replication tuning, serial replication guarantees, and future directions for improving cross‑region disaster recovery.

HBase · JDHBase · cluster management

0 likes · 11 min read

Suning Technology

Jun 19, 2020 · Big Data

How Suning’s Big Data Engine Powered a Record‑Breaking 618 Sale

Suning’s 618 shopping festival showcased a massive sales surge backed by its big‑data platform, which processed over 200 billion requests, handled 38.5 PB of daily data, and delivered 31.5 trillion computations, while Kafka and HBase sustained tens of millions of TPS to ensure a seamless consumer experience.

618 Sale · HBase · Kafka

0 likes · 5 min read

Big Data Technology & Architecture

Jun 16, 2020 · Big Data

Hot and Cold Data Separation in Big Data Systems

The article explains the concept of hot and cold data, why separating them reduces cost, and presents heterogeneous and homogeneous architectural solutions—including Elasticsearch, HBase, AWS S3, and cloud‑based UltraWarm—illustrated with network‑behavior and e‑commerce order system case studies.

AWS S3 · Big Data Architecture · Data Lifecycle

0 likes · 11 min read

Big Data Technology Architecture

Jun 15, 2020 · Databases

Resolving Zookeeper and HBase Master Crash Caused by jute.maxbuffer Misconfiguration

The article details a step‑by‑step investigation of a Zookeeper outage and subsequent HBase master failure caused by an outdated Zookeeper version bug and an excessively large jute.maxbuffer setting, explaining how to identify the issue, adjust configurations, and improve region assignment performance.

HBase · Troubleshooting · Zookeeper

0 likes · 5 min read

Big Data Technology & Architecture

Jun 10, 2020 · Databases

Understanding HBase Compaction: Types, Triggers, Algorithms, and Impact on Read/Write Performance

This article explains HBase compaction—a key operation in the Log‑Structured Merge‑Tree model—covering minor and major compaction differences, trigger conditions, configuration parameters, selection algorithms, thread‑pool handling, and the effects on read and write performance in a big‑data database environment.

Compaction · HBase · LSM

0 likes · 10 min read

Didi Tech

Jun 9, 2020 · Databases

Didi HBase Team’s Upgrade from 0.98 to 1.4.8: Challenges, Solutions, and Lessons Learned

Didi's HBase team upgraded eleven clusters from version 0.98 to 1.4.8, tackling maintenance burdens and custom‑patch divergence, validating RPC and HFile compatibility, performing extensive functional and performance tests, opting for a rolling upgrade, fixing a region‑split data‑loss bug, merging critical upstream patches, and establishing a reusable migration methodology.

Database Upgrade · Didi · HBase

0 likes · 10 min read

Big Data Technology Architecture

Jun 2, 2020 · Databases

JVM Tuning, Region Split, BlockCache, and Compaction Strategies for HBase

This article explains how to configure JVM memory, choose appropriate garbage‑collector settings, tune HBase region split policies, optimize BlockCache implementations, and select suitable compaction strategies to improve HBase performance on clusters of various sizes.

BlockCache · Compaction · Database Performance

0 likes · 20 min read

Big Data Technology Architecture

May 24, 2020 · Big Data

HBase Region State Machine and Transition Details

The article explains how HBase tracks each region's lifecycle states in hbase:meta and ZooKeeper, lists all possible states with their color codes, and describes the master‑region server interactions for opening, closing, splitting, and merging regions.

HBase · Hadoop · RegionState

0 likes · 7 min read

Big Data Technology Architecture

May 22, 2020 · Databases

HBase Compaction Types and Parameter Tuning Guide

This article explains how HBase uses WAL and MemStore to create HFiles, describes the two compaction types (Minor and Major), and provides detailed recommendations for tuning key compaction-related configuration parameters to improve query performance and reduce HDFS impact.

Compaction · Databases · HBase

0 likes · 4 min read

Big Data Technology Architecture

May 21, 2020 · Databases

Quick Start Guide: Running HBase with Docker

This tutorial demonstrates how to rapidly set up and use HBase inside a Docker container, covering Docker installation, image pulling, container execution, host configuration, accessing the HBase Web UI and shell, Zookeeper interaction, and a Java API example for beginners.

Docker · HBase · database

0 likes · 5 min read

Youzan Coder

May 20, 2020 · Backend Development

Real-Time Loss Prevention System: Architecture and Implementation at YouZan

YouZan’s real‑time loss‑prevention platform monitors database binlogs, transforms and verifies transaction data across five loosely coupled layers, handling 200 million daily messages and 60 million checks with dynamic sharding, caching and distributed locks to detect over‑charges, duplicate refunds, migration inconsistencies and unauthorized data changes.

HBase · Message Queue · Sharding Strategy

0 likes · 12 min read


Big Data Technology Architecture

May 19, 2020 · Big Data

Design and Implementation of a Unified Data Lake Platform Using HBase, Kafka, and Elasticsearch

This article summarizes the design, architecture, and key modules of a company-wide data lake platform—named “Tianchi”—built on HBase, Kafka, and Elasticsearch, detailing data ingestion, strategy output, metadata management, indexing, monitoring, and offline analysis, and shares lessons learned and future plans.

Data Platform · Elasticsearch · HBase

0 likes · 11 min read


Big Data Technology Architecture

May 12, 2020 · Databases

Key HBase Configuration Parameters and Production Recommendations (HBase 1.1.2)

This article categorizes and explains the most important HBase 1.1.2 configuration parameters—covering Region sizing, BlockCache strategies, MemStore thresholds, Compaction behavior, HLog handling, Call Queue tuning, and miscellaneous settings—while offering practical recommendations for optimal production deployment.

Databases · HBase · Performance

0 likes · 11 min read


Architecture Digest

May 4, 2020 · Databases

HBase Overview, Architecture, Installation, and Basic Shell Operations

This article provides a comprehensive introduction to HBase, covering its origins, key characteristics, architecture components, installation steps, basic shell commands for table management, data structures, read/write processes, and high‑availability configuration within the Hadoop ecosystem.

Big Data · HBase · Hadoop

0 likes · 14 min read


Big Data Technology Architecture

Apr 29, 2020 · Databases

Enhancing HBase CAP Model and MTTR with Kafka‑Based IO Decoupling and Native AP Support

The article analyzes HBase's CP‑oriented CAP limitations, proposes native AP support via Replica, decouples WAL IO to Kafka, optimizes MTTR, introduces multi‑datacenter active/active disaster recovery, and redesigns client write paths and LogSplit processing for higher availability and throughput.

CAP · Database Architecture · HBase

0 likes · 11 min read


Big Data Technology Architecture

Apr 24, 2020 · Databases

Best Practices for HBase Region Count and Size to Improve Cluster Stability and Performance

The article explains how maintaining an optimal number of HBase regions (typically 20‑200 per RegionServer) and appropriate region size, along with careful MemStore and compaction settings, can prevent memory pressure, reduce GC pauses, and enhance overall cluster stability and throughput.

Cluster Optimization · Databases · HBase

0 likes · 5 min read


Big Data Technology Architecture

Apr 17, 2020 · Databases

Improving HBase Cluster Performance: Cache Optimization, GC Tuning, and Multiget Concurrency

This article details a series of practical enhancements applied to an HBase 1.2.4‑based cluster—including layered BucketCache, data pre‑heating, GC‑friendly object pooling, and a multiget concurrency model—that together raise throughput several‑fold and consistently keep P99 latency below 50 ms in YCSB benchmarks.

Benchmark · Cache · GC optimization

0 likes · 14 min read


Big Data Technology Architecture

Apr 16, 2020 · Databases

Memory Management Optimizations in HBase MemStore: SkipList, MemStoreLAB, ChunkPool, Off‑heap and CCSMap

The article systematically explains how HBase's MemStore uses a SkipList‑based model and introduces successive memory‑management optimizations—MemStoreLAB, ChunkPool, off‑heap chunks, CompactingMemStore and the CCSMap data structure—to reduce object overhead, GC pressure and improve throughput.

GC · HBase · Java

0 likes · 20 min read


Big Data Technology Architecture

Apr 11, 2020 · Databases

Understanding HBase Write Path and How to Prevent Write Blocking

This article explains the HBase data‑write process—including WAL logging, MemStore caching, and HFile flushing—identifies three levels of write‑blocking (HFile, MemStore, RegionServer), and provides configuration tweaks to mitigate blocking in production environments.

Blocking · Databases · HBase

0 likes · 6 min read


dbaplus Community

Apr 7, 2020 · Databases

How Pharos Accelerates HBase Multi‑Condition Queries with Low‑Latency Indexing

This article examines Pharos, Everbright Bank's home‑grown HBase indexing middleware, detailing why existing secondary‑index solutions fall short, the design goals of low latency, simple architecture and non‑intrusiveness, and the concrete storage, pagination, and transaction‑consistency techniques that enable fast complex queries on massive data.

HBase · Pharos · distributed database

0 likes · 14 min read


Big Data Technology & Architecture

Apr 1, 2020 · Big Data

HBase Cluster Deployment Architecture, Configuration Optimization, and Application Layer Usage

This article details the evolution of HBase cluster deployment from mixed‑hardware/software setups to fully independent clusters, explains hardware and software considerations, presents memory and region planning, outlines key configuration parameters, and provides Spark integration examples for batch and real‑time queries and writes.

Big Data · Cluster Deployment · Configuration Optimization

0 likes · 24 min read


Big Data Technology & Architecture

Mar 30, 2020 · Databases

HBase Optimization: JVM Tuning, Region Split Policies, BlockCache, and Compaction Strategies

This guide explains how to optimize HBase performance by adjusting JVM memory settings, selecting appropriate garbage collectors, configuring MSLAB and in‑memory compaction, choosing region split policies, tuning BlockCache implementations, and applying suitable compaction policies for different workloads.

Big Data · BlockCache · Compaction

0 likes · 18 min read


Big Data Technology & Architecture

Mar 23, 2020 · Big Data

Best Practices for Designing HBase RowKey to Avoid Hotspots

The article explains how to design HBase RowKeys by dispersing keys, controlling their length, and ensuring uniqueness, providing concrete techniques such as salting, hashing, reversing values, and a practical example with table creation to improve scan performance and prevent region hotspot issues.

Big Data · HBase · HotSpot

0 likes · 6 min read


Top Architect

Mar 13, 2020 · Big Data

Three Billion‑Scale MySQL‑to‑HBase Synchronization Solutions and Practical Implementation

This article presents a comprehensive guide for synchronizing massive MySQL datasets to HBase, covering environment preparation, fast MySQL data loading techniques, and three practical pipelines—Sqoop, Kafka‑Thrift, and Kafka‑Flink—along with performance comparisons and optimization tips for large‑scale data processing.

Big Data · Data synchronization · Flink

0 likes · 24 min read


Big Data Technology & Architecture

Mar 12, 2020 · Databases

HBase FAQ: Performance Optimization, Bulk Load, Single‑Node Mode, Transactions, and Best Practices

This article compiles a series of HBase questions and answers covering write performance, bulk loading, single‑node configuration, column scalability, transaction isolation, fast deletion methods, off‑heap optimizations, bulkload modes, Hive integration, direct HFile reads, and region planning.

HBase · Off-Heap · Single Node

0 likes · 7 min read


Big Data Technology Architecture

Mar 4, 2020 · Databases

HBase Memory‑Related Performance Tuning Guide

This article explains how to optimize HBase performance by properly configuring JVM memory, selecting suitable garbage‑collection strategies, enabling MSLAB and BucketCache, and adjusting read/write cache ratios to reduce fragmentation and improve throughput.

Cache · Garbage Collection · HBase

0 likes · 8 min read


Big Data Technology Architecture

Mar 2, 2020 · Databases

Understanding HBase Flush and Compaction Mechanisms and Their Configuration Parameters

This article explains the core mechanisms of HBase—Flush and Compaction—detailing why they are needed, the conditions that trigger Flush, the types and triggers of Compaction, and provides practical recommendations for tuning the most important configuration parameters to improve write and read performance.

Compaction · Flush · HBase

0 likes · 11 min read


ITPUB

Mar 2, 2020 · Big Data

Mastering ZooKeeper: Core Concepts and Real-World Big Data Applications

This article explains ZooKeeper’s architecture, key concepts such as roles, sessions, ZNodes, versioning, ACLs, and watchers, and demonstrates how it powers essential big‑data components like Hadoop’s ResourceManager and HBase’s master election, naming service, and distributed locking.

Big Data · Distributed Coordination · Distributed Lock

0 likes · 23 min read


Big Data Technology Architecture

Feb 22, 2020 · Databases

Using HBase PerformanceEvaluation (PE) Tool for Read/Write Latency Benchmarking (P99/P999)

This article explains how to use HBase's built‑in PerformanceEvaluation tool to run baseline read/write latency tests (P99 and P999), describes key command‑line parameters, presents benchmark results for random and sequential operations, and discusses the implications for HBase performance tuning.

Benchmark · DatabasePerformance · HBase

0 likes · 11 min read


Big Data Technology Architecture

Feb 15, 2020 · Databases

An Introduction to HBase: Architecture, Data Model, Storage Engine, Indexing, Features, and Use Cases

This article provides a comprehensive overview of HBase, covering its LSM‑Tree based storage engine, key‑value data model, column‑family storage design, indexing mechanisms, major advantages and drawbacks, and typical scenarios where HBase excels for massive, high‑throughput data workloads.

Distributed storage · HBase · Indexing

0 likes · 8 min read


MaGe Linux Operations

Feb 8, 2020 · Operations

Why OpenTSDB Is the Ultimate Time‑Series Monitoring Solution for Scalable Operations

This article introduces OpenTSDB, a highly scalable time‑series monitoring system built on HBase, explains its architecture, demonstrates how it solves common monitoring challenges, and shows practical usage examples including data modeling, collector integration, and real‑world deployment insights.

HBase · OpenTSDB · Operations

0 likes · 9 min read


Big Data Technology Architecture

Feb 4, 2020 · Big Data

Using Apache Phoenix on CDH HBase: Installation, Configuration, and Secondary Index Creation

This article explains how to integrate Apache Phoenix with CDH‑based HBase, covering Phoenix overview, version selection, parcel installation, HBase configuration, command‑line usage, mapping existing tables, creating schemas and views, building secondary indexes, and comparing different index types for performance optimization.

Apache Phoenix · CDH · HBase

0 likes · 15 min read


dbaplus Community

Feb 2, 2020 · Databases

JDHBase Multi‑Active Disaster Recovery: Replication, Auto‑Failover & Consistency

JDHBase, JD.com’s large‑scale KV store, serves billions of daily reads and writes across 7,000 nodes. This article details its multi‑active, cross‑region architecture, covering HBase replication fundamentals, Fox Manager routing, automatic failover policies, dynamic replication tuning, and serial replication to ensure strong consistency.

Database Architecture · Disaster Recovery · HBase

0 likes · 15 min read


Big Data Technology Architecture

Jan 31, 2020 · Big Data

Practical Experience with HBase at NetEase: Architecture, Core Use Cases, HBCK & RIT Troubleshooting, and Diagnosis Strategies

This article summarizes NetEase Hangzhou Research Institute expert Fan Xinxin's presentation on HBase, covering its role in the big‑data ecosystem, core production scenarios, RIT and HBCK troubleshooting techniques, and systematic monitoring and log‑analysis methods for diagnosing HBase issues.

HBCK · HBase · RIT

0 likes · 11 min read


Big Data Technology & Architecture

Jan 8, 2020 · Big Data

Real-Time Data Warehouse Architecture and Challenges Using Flink, Kafka, and HBase

This article examines the design of a real-time data warehouse built on Flink, Kafka, and HBase, compares it with traditional offline warehouses, and discusses key challenges such as data accuracy, latency, and the complexity of maintaining real-time dimension tables.

Big Data · Data Warehouse · Flink

0 likes · 10 min read


Big Data Technology & Architecture

Jan 7, 2020 · Big Data

Real-time Data Processing with Kafka, Spark Streaming, and HBase: Implementation Guide

This article presents a step‑by‑step guide for building a real‑time data pipeline using Kafka as a message buffer, Spark‑Streaming's Direct Approach for processing, and HBase for storage, including code examples, Maven configuration, local cluster setup, and troubleshooting tips.

Big Data · HBase · Kafka

0 likes · 12 min read


JD Retail Technology

Jan 6, 2020 · Backend Development

JDHBase Multi‑Active Architecture and Replication Practices

This article describes JDHBase’s large‑scale KV storage deployment, its HBase‑based asynchronous replication mechanism, the multi‑active architecture with active‑standby clusters, client interaction via Fox Manager, automatic failover strategies, dynamic replication tuning, and serial replication techniques to ensure data consistency across data centers.

Distributed storage · HBase · High Availability

0 likes · 13 min read


Tongcheng Travel Technology Center

Dec 31, 2019 · Big Data

Apache Kylin Overview and Model Optimization Practices for Trajectory Analytics

This article introduces Apache Kylin, details its deployment at Tongcheng Yilong, explains the design of a large‑scale trajectory model, and provides step‑by‑step optimization techniques (cube dimension reduction, HBase rowkey tuning, build parameter tweaks, high‑cardinality handling, and disabling query compression) to achieve sub‑second OLAP queries on multi‑terabyte data.

Apache Kylin · Big Data · Cube

0 likes · 17 min read


Youzan Coder

Dec 18, 2019 · Big Data

HBase Bulkload Practice at Youzan: From MapReduce to Spark Evolution

Youzan’s evolution of HBase bulk‑load—from manual MapReduce jobs to Hive‑SQL and finally Spark—demonstrates how generating HFiles on HDFS, partitioning by region, sorting keys, and handling serialization issues enables billions of records to be loaded efficiently without disrupting production clusters.

HBase · Hadoop · NoSQL

0 likes · 16 min read


dbaplus Community

Dec 11, 2019 · Databases

How Alibaba Scales HBase for High Availability: 10‑Year Lessons from Production

This article reviews Alibaba's decade‑long evolution of HBase high‑availability, covering large‑cluster design, MTTF/MTTR metrics, disaster‑recovery strategies, traffic switching, and performance optimizations that together enable millions of requests per second with near‑zero downtime.

Alibaba Cloud · HBase · High Availability

0 likes · 21 min read


Big Data Technology & Architecture

Dec 2, 2019 · Big Data

Implementing Custom Flink Sources and Sinks for RocketMQ and HBase Streaming

This article explains how to create custom Flink SourceFunction and SinkFunction implementations, demonstrates a RocketMQ source and an HBase sink with full code examples, and discusses checkpointing, event‑time handling, and deployment of the streaming job on a Flink‑on‑YARN cluster.

Big Data · Flink · HBase

0 likes · 16 min read


Big Data Technology Architecture

Nov 19, 2019 · Backend Development

CMS GC JVM Parameter Tuning Guide for HBase Clusters

This article explains the fundamentals of the CMS (Concurrent Mark Sweep) garbage collector, presents a comprehensive set of JVM parameters optimized for HBase clusters, and provides detailed analysis of key settings to improve performance and reduce GC pauses.

CMS GC · Garbage Collection · HBase

0 likes · 7 min read


Big Data Technology Architecture

Nov 11, 2019 · Databases

Practical Guide to Querying HBase with Python happybase and JPype

This tutorial walks through setting up the Python happybase library, installing JPype for Java integration, and demonstrates end‑to‑end code examples for connecting to an HBase Thrift server, generating row keys via Java utilities, querying data, and handling type conversions.

HBase · JPype · Python

0 likes · 7 min read


dbaplus Community

Nov 3, 2019 · Databases

Insights from Data Platform Experts: Distributed Transactions, Aurora, and HBase

A recent data platform salon in Beijing gathered five leading experts who shared practical knowledge on data middle platforms, distributed transaction patterns, SQL audit design, Amazon Aurora's architecture, and JD's large‑scale HBase deployment, offering actionable guidance for modern enterprise data engineering.

Cloud Databases · Data Platform · Databases

0 likes · 6 min read


Big Data Technology & Architecture

Oct 28, 2019 · Big Data

Big Data Technology and Architecture: Leveraging Spark and HBase for Real‑Time and Offline Processing

This article outlines the challenges of various big‑data scenarios such as financial risk control, recommendation systems, and social feeds, explains why Spark is chosen over alternatives, describes a one‑stop data platform architecture with Spark‑HBase integration, and shares best‑practice tips and case studies.

Big Data · Data Architecture · HBase

0 likes · 7 min read


Hulu Beijing

Oct 28, 2019 · Big Data

How Hulu Uses Big Data to Power Precise Advertising and Real‑Time Streaming

At a Tsinghua University forum, Hulu presented a comprehensive overview of its big‑data solutions for advertising and streaming, covering challenges of massive, complex data, the limits of MySQL, and advanced techniques using HBase, Protobuf, Redis batch pipelines, and its own MPP engine Nesto for high‑performance, scalable analytics.

Advertising · HBase · MPP

0 likes · 6 min read


DataFunTalk

Oct 25, 2019 · Big Data

Migrating Data from HBase to Kafka Using MapReduce

This article explains how to reverse the typical data flow by extracting large volumes of RowKeys from HBase with MapReduce, storing them on HDFS, and then using batch Get operations to retrieve the full records and write them into Kafka, while handling retries and monitoring progress.

Big Data · Data Migration · HBase

0 likes · 9 min read


Youzan Coder

Oct 25, 2019 · Artificial Intelligence

Personalized Recommendation System Architecture and Techniques at Youzan

Youzan’s personalized recommendation platform combines a four‑layer architecture—data, storage, service, and application—with multi‑dimensional real‑time, offline, and cold‑start recall algorithms, Wide&Deep ranking, HBase/Druid storage, and configurable scene strategies to boost user conversion, traffic monetization, and future scalability.

HBase · Wide&Deep · cold-start

0 likes · 16 min read


Big Data Technology & Architecture

Oct 21, 2019 · Databases

High‑Availability Practices of Alibaba HBase: Large Clusters, MTTF/MTTR, Disaster Recovery, and Extreme Experience

This article reviews Alibaba HBase's evolution toward high availability, covering large‑cluster architecture, reliability metrics (MTTF/MTTR), disaster‑recovery strategies such as data replication and traffic switching, performance optimizations for extreme latency requirements, and lessons learned for building resilient distributed database services.

Databases · Disaster Recovery · HBase

0 likes · 20 min read


Sohu Tech Products

Oct 17, 2019 · Databases

HBase Table Design Strategies and Best Practices

This article explains HBase's data model and key components, details column descriptor options such as BloomFilter, Compression, Versions, TTL, and MinVersion, and provides practical design guidelines for columns, rowkeys, tall vs. wide tables, region pre‑splitting, and hotspot mitigation to achieve optimal performance.

HBase · NoSQL · Table Design

0 likes · 17 min read


Sohu Tech Products

Oct 9, 2019 · Databases

HBase Table Design Strategies: Data Model, Column Descriptors, RowKey, Region and Performance Optimization

This article explains HBase’s data model and provides comprehensive table‑design strategies, including column‑descriptor options, row‑key best practices, tall‑vs‑wide table trade‑offs, and region splitting and pre‑splitting techniques, to help achieve optimal performance and scalability in large‑scale NoSQL workloads.

Big Data · Column Family · HBase

0 likes · 16 min read


dbaplus Community

Oct 8, 2019 · Big Data

How to Master Large-Scale Cluster Management: 10 Real-World Troubleshooting Cases

This article shares a senior data‑platform engineer's hands‑on experience managing dozens of thousand‑node clusters, detailing nine common cluster problems and step‑by‑step solutions—including performance tuning, RPC fixes, HDFS cleanup, Hive metadata repair, Spark shuffle optimization, HBase region recovery, and Kafka bottleneck mitigation.

Big Data · HBase · Hadoop

0 likes · 17 min read


Xueersi Online School Tech Team

Sep 27, 2019 · Big Data

Design Principles and Architecture of Apache Kylin for Sub‑Second OLAP Queries

This article explains how Apache Kylin, an open‑source distributed analytics engine built on Hadoop/Spark, achieves sub‑second OLAP query performance through pre‑computed cubes, a layered cuboid generation algorithm, bitmap‑based distinct counting, dimension optimization techniques, and tight integration with HBase for storage and fast SQL querying.

Apache Kylin · Big Data · Cube

0 likes · 15 min read


Big Data Technology & Architecture

Sep 25, 2019 · Big Data

Designing and Using Global Secondary Indexes in Apache Phoenix

This article explains how Apache Phoenix implements global secondary indexes using separate HBase tables, demonstrates index creation and data synchronization with example SQL, and provides design guidelines to optimize query latency and avoid full‑table scans in big‑data environments.

Big Data · HBase · Phoenix

0 likes · 4 min read
