Tagged articles
312 articles
Big Data Technology & Architecture
Sep 23, 2019 · Backend Development

Design and Evolution of Feed Stream Architecture for High‑Throughput Applications

This article analyzes the business requirements, technical challenges, and mainstream architectural solutions for large‑scale feed streams, and proposes a step‑by‑step evolution path—from a simple push model using cloud Kafka and HBase to hybrid push‑pull and recommendation‑driven designs—suitable for startups and rapidly growing platforms.

Backend, HBase, Kafka
15 min read
Big Data Technology & Architecture
Sep 22, 2019 · Databases

Alibaba Cloud BDS Service for Non‑Stop HBase Cluster Migration

This article explains how Alibaba Cloud's BDS migration service enables continuous, high‑performance migration of HBase clusters—including schema, full data, and incremental sync—across version upgrades, hardware changes, network migrations, and cross‑region scenarios, while ensuring stability and minimal impact on live workloads.

Alibaba Cloud, BDS, Big Data
10 min read
Big Data Technology & Architecture
Sep 2, 2019 · Databases

Apache Phoenix Tutorial: Quick Start, Data Types, DML, Indexes, Salted Tables, and Advanced Features

This comprehensive guide introduces Apache Phoenix as an HBase SQL layer, covering quick‑start steps, supported data types, DML syntax, salted tables to prevent hotspots, various secondary index types, bulk‑load methods, auto‑increment IDs, dynamic columns, pagination, query plan analysis, and data migration techniques.

Apache Phoenix, Data Migration, HBase
33 min read
360 Tech Engineering
Aug 22, 2019 · Big Data

Design and Implementation of XStore: A Hadoop‑Based Sample Storage System

This article details the design, architecture, and operational experience of XStore, a Hadoop‑backed sample storage system that handles billions of APK and other binary samples, addressing functional and non‑functional requirements such as real‑time upload, large‑scale storage, high‑performance reads, and disaster recovery.

HBase, HDFS, Hadoop
11 min read
Big Data Technology Architecture
Aug 16, 2019 · Big Data

In‑Depth Overview of HBase Architecture

This article provides a comprehensive, illustrated explanation of Apache HBase's architecture, covering its master‑slave components, region management, ZooKeeper coordination, data flow for reads and writes, storage structures, compaction processes, fault recovery, and the system's strengths and limitations within the Hadoop ecosystem.

Architecture, Distributed Systems, HBase
21 min read
Big Data Technology & Architecture
Aug 8, 2019 · Big Data

Comprehensive Guide to Apache Kylin: Architecture, Concepts, Cube Design and Optimization

This article provides an in‑depth overview of Apache Kylin’s pre‑computation architecture, data‑warehouse concepts, step‑by‑step cube creation from Hive tables, and advanced optimization techniques such as derived dimensions, aggregation groups, and HBase row‑key encoding to achieve sub‑second OLAP queries on massive datasets.

Apache Kylin, Big Data, Cube
20 min read
Big Data Technology Architecture
Aug 5, 2019 · Big Data

ZooKeeper in Distributed Systems: Roles in Kafka, Hadoop, HBase, and Solr

This article explains ZooKeeper’s core concepts and its ZAB consensus protocol, then surveys its essential roles in major big‑data components such as Kafka, Hadoop, HBase, and Solr, illustrating how it provides configuration, naming, coordination, leader election, and high‑availability services across distributed architectures.

Distributed Systems, HBase, Hadoop
5 min read
System Architect Go
Jul 19, 2019 · Big Data

Introduction to HBase: Architecture, Data Model, and Operations

This article provides a comprehensive overview of HBase, covering its distributed column‑oriented architecture, data model components, storage mechanisms, read/write processes, WAL lifecycle, MemStore flushing, region splitting and merging, and failure recovery within the Hadoop ecosystem.

Architecture, Big Data, HBase
20 min read
dbaplus Community
Jul 18, 2019 · Databases

How JD.com Scales HBase to 90PB: Architecture, Optimizations, and Lessons

This article examines JD.com's massive HBase deployment, detailing its evolution from early adoption to a 90PB, 7,000‑node cluster, the platform's architecture, multi‑active disaster recovery, multi‑tenant isolation, and the integration of Phoenix for SQL‑based access, offering practical insights for large‑scale distributed storage.

Big Data, Database Architecture, HBase
15 min read
DataFunTalk
Jun 28, 2019 · Databases

Deep Dive into Phoenix Index Creation, Maintenance, and SQL Compilation

This article provides a detailed technical analysis of Phoenix's native index creation and maintenance mechanisms, the underlying source code for index building, the role of coprocessors, and the complete SQL compilation pipeline from parsing to execution, highlighting how hints and optimizers influence index usage.

Coprocessor, HBase, Phoenix
26 min read
dbaplus Community
Jun 2, 2019 · Databases

Why Sharding (Database Partitioning) Beats Partitioning and NoSQL for Massive Data

The article explains why sharding (splitting databases and tables) is the preferred solution for handling massive user, order, and transaction data in high‑traffic internet applications, comparing it with partitioning and NoSQL/NewSQL alternatives, and detailing practical middleware choices, sharding column selection, and integration with Elasticsearch and HBase.

Elasticsearch, HBase, MySQL
14 min read
Big Data Technology & Architecture
May 29, 2019 · Cloud Native

Real-Time Computing Solutions with Flink and HBase: Architecture, Market Analysis, and Use Cases

The article presents Alibaba Cloud's real-time computing solution based on Flink and HBase, covering market competition, open‑source ecosystem, containerized architecture on Kubernetes, and typical applications such as online education video analysis, city‑brain traffic management, and fraud detection.

Big Data, Cloud Native, Flink
12 min read
Big Data Technology Architecture
May 27, 2019 · Databases

Understanding HBase Compaction: Types, Triggers, Parameters, and Performance Impact

This article explains HBase's compaction mechanism, covering why it is needed, the differences between minor and major compaction, the conditions that trigger compaction, key configuration parameters, thread‑pool handling, compaction policies, and how compaction influences read and write performance in a large‑scale NoSQL database.

HBase, Performance, bigdata
12 min read
Big Data Technology Architecture
May 13, 2019 · Big Data

Problems Caused by Single-Point Region Assignment in HBase and Possible Solutions

The article analyzes how HBase regions being assigned to a single RegionServer create reliability issues such as jitter, service interruptions, and data loss, examines the underlying hardware, OS, and operational factors, and proposes system optimizations and replica-based high‑availability strategies to mitigate these problems.

Distributed Systems, HBase, Region
10 min read
Big Data Technology Architecture
May 8, 2019 · Databases

Understanding HBase Scan Process and Its Performance Compared to Parquet and Kudu

The article explains why HBase read operations are complex due to its LSM‑Tree storage and multi‑version design, details the step‑by‑step Scan workflow, discusses the reasons for its multi‑request architecture, compares scan performance with Parquet and Kudu, and offers recommendations for large‑scale data scanning.

HBase, LSM‑Tree, Performance
7 min read
Big Data Technology & Architecture
May 7, 2019 · Databases

Design and Multi‑Tenant Management of HBase at Didi

This article details Didi's use of HBase for various online and offline workloads, covering multi‑language support, data types, rowkey designs for order, trajectory and ETA scenarios, multi‑tenant resource management with DHS and RS Group, and operational best practices.

GeoHash, HBase, Resource Management
12 min read
Youzan Coder
Apr 17, 2019 · Big Data

Order Data Synchronization Architecture at YouZan: From MySQL to ES and HBase

Youzan synchronizes order data by parsing MySQL binlogs with Canal and publishing the changes to a message queue. SeqNo‑based optimistic locking and HBase’s column‑version timestamps guarantee ordering for both single‑ and multi‑table updates, while a Logstash‑style configurable pipeline feeds ES for search and HBase for detail queries, eliminating ordered‑queue bottlenecks and keeping data consistent at high throughput.

Binlog, Canal, Distributed Systems
12 min read
Youzan Coder
Apr 12, 2019 · Industry Insights

How Youzan Scaled Its Log Platform to Handle Billions of Daily Logs

This article details Youzan's evolution from a simple Flume‑based log collector to a multi‑tenant, Kafka‑buffered, Spark‑processed, HBase‑backed logging architecture that now handles hundreds of billions of log entries per day, highlighting challenges, design decisions, and future improvements.

Distributed Systems, Elasticsearch, HBase
10 min read
JD Retail Technology
Apr 10, 2019 · Databases

HBase at JD.com: Architecture, Use Cases, and Evolution

This article explains how JD.com leverages the open‑source HBase database for massive, low‑latency data storage across various business lines, detailing its architecture, multi‑tenant isolation, disaster‑recovery mechanisms, and integration with Phoenix SQL for OLTP workloads.

Big Data, Database Architecture, HBase
13 min read
58 Tech
Mar 27, 2019 · Databases

OpenTSDB Architecture, Data Model, Storage Optimizations, and Practical Use Cases

This article introduces OpenTSDB as a distributed, scalable time‑series database built on HBase, explains its architecture, data model, and storage optimizations, presents real‑world monitoring use cases, analyzes performance issues caused by high‑cardinality tags, and details the solution steps taken to restore query speed.

HBase, OpenTSDB, Storage Optimization
9 min read
dbaplus Community
Mar 12, 2019 · Databases

Mastering HBase Cross‑Datacenter Migration: Snapshots, Architecture, and Real‑World Tips

This article provides a comprehensive technical guide on HBase, covering its core concepts, advantages and drawbacks, architecture layers, practical use cases, and a detailed step‑by‑step process for large‑scale cross‑datacenter migration using snapshot‑based strategies, with commands, diagrams, and lessons learned.

Big Data, Data Migration, Database Architecture
19 min read
Youzan Coder
Feb 20, 2019 · Databases

HBase Read Path Analysis

The article first outlines HBase’s overall architecture and core components, then details the end‑to‑end read path—from client request routing to RegionServer processing, data organization and filtering—and finally presents practical client‑ and server‑side optimizations such as heterogeneous storage, HDFS short‑circuit, hedged reads, high‑availability reads, and warm‑up failure fixes, illustrated with Youzan’s production cluster.

Distributed Systems, HBase, Technical Guide
17 min read
JD Tech
Feb 18, 2019 · Big Data

Understanding HBase: Advantages, Use Cases, Data Model, and Architecture

This article explains HBase as a high‑performance, column‑oriented distributed storage system, outlines its advantages and limitations, presents real‑world scenarios such as seller operation logs and message logs, and details its data structures, architecture components, and design considerations for big‑data applications.

Architecture, HBase, NoSQL
9 min read
58 Tech
Jan 7, 2019 · Big Data

Comparison of Kuaishou BlobStore and 58 WOS Object Storage Systems

The article summarizes the technical talk from the 58 Group technology salon, detailing the architectures, scalability, high‑availability mechanisms, and storage models of Kuaishou's BlobStore and 58's WOS, and compares their design choices for large‑scale object storage.

BlobStore, HBase, WOS
9 min read
58 Tech
Dec 28, 2018 · Big Data

Kylin OLAP Platform Architecture, Optimizations, and 58.com Case Study

This article introduces Kylin, an HBase‑based multidimensional analysis platform, explains its architecture and various performance optimizations—including multi‑tenant support, dimension dictionary handling, and cube size estimation—while showcasing a real‑world deployment and case study at 58.com.

Cube Optimization, HBase, Kylin
14 min read
Beike Product & Technology
Dec 27, 2018 · Cloud Computing

HBase Ecosystem Introduction

This article introduces HBase's ecosystem, including its components like OpenTSDB for time-series data, Kylin for cube analysis, Phoenix for SQL operations, and GeoMesa for spatial data, along with the author's experience in deploying these in a production environment.

Cloud Computing, GeoMesa, HBase
9 min read
58 Tech
Dec 12, 2018 · Big Data

Design and Optimization of 58.com’s HBase Platform: Multi‑Tenant Support, Data Access Interfaces, Import/Export Tools, and Performance Tuning

This article details the architecture and operational enhancements of 58.com’s HBase platform, covering multi‑tenant resource isolation, various data access APIs, bulk import/export mechanisms, and a series of performance optimizations that improve stability and scalability for massive data workloads.

HBase, data import, multi-tenant
18 min read
Youzan Coder
Dec 10, 2018 · Backend Development

How Youzan Scaled Order Export to Millions with ES, HBase, and Config‑Driven Design

This article examines the challenges of Youzan's order export system, describes the migration from PHP‑based scripts to an Elasticsearch and HBase stack, and details the step‑by‑step configuration‑driven refactor—including enum field definitions, Groovy scripts, strategy patterns, plugin architecture, and quality‑assurance practices—that enabled million‑order exports with high performance and stability.

Backend Architecture, Elasticsearch, Groovy
13 min read
DataFunTalk
Dec 6, 2018 · Databases

HBase RowKey and Index Design: Principles, Practices, and Case Studies

This article introduces HBase fundamentals, explores effective RowKey and secondary index design principles, discusses demand analysis, presents techniques such as reversing, salting, hashing, and reviews real-world case studies for OpenTSDB, JanusGraph, and GeoMesa, offering practical guidance for scalable NoSQL data modeling.

Database Architecture, HBase, NoSQL
19 min read
DataFunTalk
Dec 1, 2018 · Databases

Apache HBase: Current Status, Development, Features, and Future Roadmap

This article provides a comprehensive overview of Apache HBase, covering its core architecture, key features such as automatic sharding, LSM‑Tree storage, separation of storage and compute, the ecosystem, real‑world use cases, recent 2.0 enhancements, upgrade guidance, future plans, and community recruitment information.

Apache, Database Features, HBase
10 min read
Programmer DD
Nov 18, 2018 · Databases

How We Optimized HBase for 80 Billion Daily Logs: Real‑World Tuning Strategies

This article details the practical performance‑tuning steps applied to a large‑scale HBase deployment handling 80 billion daily log entries, covering rowkey redesign, region redistribution, HDFS write‑timeout fixes, network‑topology adjustments, and JVM parameter tweaks that together stabilized the system and dramatically improved throughput.

HBase, HDFS, Performance Tuning
14 min read
Qunar Tech Salon
Nov 6, 2018 · Operations

Analyzing TCP Connection States and Resolving TIME_WAIT, CLOSE_WAIT, and SYN_RECV Issues in a Java/Tomcat/HBase System

This article walks through a real‑world incident where sudden traffic drops were traced to abnormal TCP states—TIME_WAIT, CLOSE_WAIT, and SYN_RECV—by examining monitoring data, explaining the TCP handshake, reviewing relevant kernel parameters, and debugging Java/ZooKeeper/HBase code to identify and fix the root cause.

HBase, SYN_RECV, TCP
20 min read
DataFunTalk
Oct 19, 2018 · Databases

HBase Application and High‑Availability Practices

This article summarizes the current usage of HBase at Ping An Technology, the challenges it addresses, detailed client‑ and server‑side performance and stability optimizations, high‑availability mechanisms, data migration strategies, monitoring and repair practices, and future development plans.

Data Migration, HBase, Performance Optimization
9 min read
DataFunTalk
Oct 14, 2018 · Big Data

Exploring Real-Time Data Warehouse Practices Based on HBase

The article details the evolution from an offline to a real‑time HBase data warehouse, covering business scenarios, the use of Maxwell for MySQL‑to‑Kafka ingestion, Phoenix for SQL access, CDH cluster tuning, monitoring, and several production case studies.

HBase, Kafka, Phoenix
14 min read
DataFunTalk
Sep 29, 2018 · Big Data

Applying HBase in a Risk‑Control System and High‑Availability Practices

This article summarizes Guo Dongdong’s presentation on leveraging HBase for a risk‑control platform, detailing its architecture, data import/export mechanisms, indexing, region server recovery challenges, monitoring, SQL interception, dual‑cluster high‑availability, and future enhancements for large‑scale, low‑latency big‑data services.

Distributed Systems, HBase, Phoenix
13 min read
DataFunTalk
Jul 13, 2018 · Big Data

Applying BitMap Indexing with HBase for Precise Marketing in Big Data

This article details a big‑data precise‑marketing solution that leverages HBase storage and Roaring BitMap indexing to efficiently handle billions of user records, describing project background, technology selection, architecture, partitioning strategy, and coprocessor implementation for fast multidimensional queries.

Bitmap, HBase, Roaring Bitmap
13 min read
Huawei Cloud Developer Alliance
Apr 17, 2018 · Big Data

How a Big Data Platform Powers Real‑Time Facial Recognition for Billion‑Scale Face Libraries

This case study details how Beijing Hengyuanhua Information Technology Co., Ltd. built a dynamic face‑capture and real‑time recognition solution on Huawei FusionInsight HD, leveraging deep‑learning algorithms, distributed storage, and stream processing to handle hundreds of millions of faces with high speed, efficiency, and security.

Apache Storm, HBase, Huawei FusionInsight
17 min read
dbaplus Community
Apr 2, 2018 · Databases

Why Titan Outperforms Traditional RDBMS for Complex Graph Queries

The article explains how relational databases struggle with many‑to‑many and deep relationship queries, compares popular graph databases, details Titan's modular architecture, data model, Gremlin query examples, storage layout, and demonstrates its successful deployment at Paipaidai for large‑scale fraud detection, achieving over 25% efficiency gains.

Graph Database, Gremlin, HBase
10 min read
Hulu Beijing
Feb 28, 2018 · Big Data

How Hulu’s Nesto Engine Delivers Near‑Real‑Time OLAP on TB‑Scale Data

This article introduces Hulu's in‑house OLAP engine Nesto, detailing its near‑real‑time data ingestion, nested data model, TB‑level storage using HBase and Parquet, MPP query execution, custom predicate library, and the overall architecture that enables sub‑second ad‑hoc queries for user analytics.

Big Data, Columnar Storage, Distributed Systems
22 min read
dbaplus Community
Nov 26, 2017 · Databases

Understanding HBase Region Auto‑Splitting: Policies, Process, and Pitfalls

This article explains how HBase achieves scalable region auto‑splitting, detailing the various split policies, the algorithm for locating split points, the transactional split workflow, reference file handling, data migration via compaction, cleanup procedures, and common troubleshooting tips.

Distributed Systems, HBase, Reference File
17 min read
Qunar Tech Salon
Nov 14, 2017 · Backend Development

Designing Distributed Systems Inspired by McDonald’s Restaurant Operations

The article uses everyday observations from a McDonald’s restaurant to illustrate core distributed system concepts such as master‑slave architecture, two‑phase commit, microservice decomposition, task queues, and container orchestration, showing how these principles apply to backend engineering.

HBase, Master‑Slave, cassandra
15 min read
21CTO
Nov 11, 2017 · Big Data

How We Built a Scalable Seller Log System with Kafka, Storm, ES & HBase

This article explains the design and implementation of a unified seller‑operation logging platform that uses Kafka for ingestion, Storm for real‑time processing, Elasticsearch for hot‑data search, and HBase for cold‑data storage, detailing the challenges faced and the optimizations applied.

Big Data, Elasticsearch, HBase
12 min read
dbaplus Community
Oct 15, 2017 · Big Data

How JD Built a Scalable Seller Log Platform with Kafka, Storm, ES & HBase

This article details JD's end‑to‑end seller log system architecture, explaining why Kafka, Storm, Elasticsearch and HBase were chosen, the challenges faced during scaling, and the practical solutions implemented to achieve a unified, high‑throughput logging platform for merchants and operations.

Big Data, Elasticsearch, HBase
13 min read
Alibaba Cloud Developer
Aug 10, 2017 · Big Data

Alibaba’s HBase Innovations: Powering Big Data at Scale – HBaseCon 2017 Asia Insights

At HBaseCon 2017 Asia, Alibaba showcased a series of groundbreaking HBase enhancements—including strong synchronous replication, SQL-on-HBase capabilities, cross‑cluster range data copy, and read/write path optimizations—that dramatically improve performance, reliability, and usability for large‑scale big‑data storage.

Big Data, HBase, Performance
10 min read
ITFLY8 Architecture Home
Jul 26, 2017 · Big Data

Inside Taobao’s Massive Data Architecture: From Hadoop “Cloud Ladder” to Real‑Time “Galaxy”

This article details Taobao’s multi‑layer massive data platform, covering its five‑tier architecture, the 1500‑node Hadoop “Cloud Ladder” for batch processing, the low‑latency “Galaxy” stream engine, MySQL‑based MyFOX, HBase‑based Prom storage, the glider middle‑layer, and sophisticated caching strategies that together support petabytes of data and millions of daily queries.

Big Data, Distributed Systems, HBase
16 min read
ITFLY8 Architecture Home
Jul 10, 2017 · Databases

How Didi Scales HBase for Real‑Time Orders, Geo‑Tracking, and ETA

This article explains how Didi leverages HBase across multiple business scenarios—including order lifecycle queries, driver‑passenger trajectory tracking, ETA calculations, and cluster monitoring—while addressing multi‑language support, rowkey design, GeoHash indexing, and multi‑tenant resource management.

Database design, GeoHash, HBase
13 min read
21CTO
Jul 6, 2017 · Big Data

How HBase Boosted Tencent Monitoring Platform Performance 3‑5×

Facing the challenge of storing over 120 billion daily monitoring points from hundreds of thousands of servers, Tencent’s monitoring platform migrated from a custom solution and OpenTSDB to a finely tuned HBase architecture, achieving 3‑5× higher throughput, improved reliability, and significant storage savings.

DistributedStorage, HBase, PerformanceTuning
11 min read
Architecture Digest
Jul 5, 2017 · Big Data

Design and Practice of Using HBase for Massive TMP Monitoring Data Storage

This article analyzes the limitations of the original TMP monitoring storage architecture, evaluates OpenTSDB's shortcomings at large scale, and details the design, implementation, and performance tuning of a custom HBase-based solution that achieves 3‑5× higher throughput for billions of monitoring data points per day.

HBase, OpenTSDB, Performance Tuning
12 min read
21CTO
Jun 19, 2017 · Databases

How Didi Scales HBase for Real‑Time Orders, Geo‑Tracking, ETA and Monitoring

This article explains how Didi leverages HBase’s distributed architecture, multi‑language APIs, and custom rowkey designs to support online order queries, driver‑passenger trajectory tracking with GeoHash, real‑time ETA calculations, and a monitoring platform, while managing multi‑tenant resources through DHS and RS Group.

Didi, GeoHash, HBase
13 min read
dbaplus Community
May 24, 2017 · Operations

How to Replace a ZooKeeper Node in a 5‑Node Cluster Without Downtime

This guide details the step‑by‑step process for replacing a faulty ZooKeeper node (myid 5) in a five‑node cluster, covering configuration updates in zoo.cfg, Hadoop’s hdfs‑site.xml and yarn‑site.xml, HBase’s hbase‑site.xml, and the required service restarts to ensure continuous high‑availability.

HBase, Hadoop, ZooKeeper
10 min read
Efficient Ops
May 2, 2017 · Big Data

Mastering ZooKeeper: Core Concepts and Real-World Big Data Applications

This article introduces ZooKeeper’s fundamental architecture, explains its key concepts such as cluster roles, sessions, ZNodes, watches, and ACLs, and then details how it powers essential distributed coordination tasks—including configuration management, naming services, master election, and distributed locks—in large‑scale Hadoop and HBase ecosystems.

Distributed Coordination, Distributed Locks, HBase
25 min read
Qunar Tech Salon
Apr 21, 2017 · Big Data

Ensuring Exactly‑Once Semantics in Spark Streaming with Kafka: Offline Repair and Data Deduplication Strategies

This article explains why Spark Streaming combined with Kafka can only guarantee at‑least‑once delivery, outlines the challenges of delayed and out‑of‑order events, and presents practical offline‑repair, deduplication, and output‑format techniques—including code examples—to achieve exactly‑once semantics in big‑data pipelines.

Exact-Once, HBase, HDFS
11 min read
Alibaba Cloud Developer
Apr 1, 2017 · Databases

Alibaba’s HBase Scaling Secrets: High‑Availability, Replication, Performance

This article details how Alibaba has evolved HBase from an internal storage solution to a cloud service, covering its architecture, high‑availability design, asynchronous and synchronous replication, multi‑link data flows, cost‑effective redundancy, performance optimizations, and future development directions.

HBase, Replication, distributed storage
28 min read
Alibaba Cloud Developer
Mar 24, 2017 · Databases

Alibaba’s Secrets to Scaling HBase for PB‑Level Big Data

This article explains how Alibaba built, customized, and operated a massive HBase platform—covering its architecture, high‑availability design, asynchronous and synchronous replication, multi‑link data flow, cost‑aware redundancy, cross‑cluster migration, performance optimizations, and future directions for the distributed NoSQL database.

Alibaba, HBase, Replication
29 min read
Meituan Technology Team
Mar 2, 2017 · Big Data

Meituan Waimai Feature Archive Platform: Architecture, Tag System, and Data Processing

Meituan Waimai’s Feature Archive platform processes billions of daily orders by managing ~200 user tags and ~400 merchant tags through a three‑layer architecture built on Hive, Elasticsearch, HBase, and MySQL. It offers visual tag selection, instant self‑service queries, full data extraction, and a predicate‑logic query language, and is designed to support future extension.

Big Data · Elasticsearch · HBase
0 likes · 14 min read
Qunar Tech Salon
Feb 26, 2017 · Big Data

Comparative Analysis of Big Data Storage and Query Solutions

This article reviews major big‑data storage and query architectures—including HBase, Dremel/Parquet, pre‑aggregation systems, Lucene, and the custom Tindex solution—evaluating their strengths, weaknesses, and suitability for real‑time, high‑volume analytical workloads.

Big Data · HBase · Parquet
0 likes · 20 min read
Weidian Tech Team
Feb 24, 2017 · Big Data

How We Built a Scalable Dump Index Architecture for 60M Users and 1.3B Products

Facing the challenges of searching across 60 million users and 1.3 billion products, Weidian’s engineering team designed a dump‑based indexing pipeline—Ergate—that consolidates, transforms, version‑controls, and monitors data from MySQL to HBase, enabling fast, flexible, and reliable search across massive datasets.

HBase · Platformization · data indexing
0 likes · 7 min read
Tencent Cloud Developer
Dec 23, 2016 · Databases

Analysis of HBase Write-Ahead Log (WAL) Mechanism and Source Code Call Chain

The article explains HBase’s write‑ahead‑log architecture, detailing how client put/delete requests travel through RPC to the RegionServer, are processed by MultiRowMutationService, written to the WAL via FSHLog.append and sync, and finally stored in MemStore, while describing durability options and the underlying source‑code call chain.

Big Data · HBase · WAL
0 likes · 10 min read
dbaplus Community
Nov 20, 2016 · Databases

How to Slash HBase Read Latency: Proven Client, Server, and HDFS Tweaks

This article examines the common causes of high read latency in HBase—such as full GC, region‑server imbalance, low write throughput, and inefficient client settings—and provides concrete optimization steps for the client, server, column‑family design, and HDFS layers to dramatically improve performance.

Client Tuning · HBase · HDFS
0 likes · 16 min read
ITFLY8 Architecture Home
Oct 27, 2016 · Big Data

Inside Taobao’s Massive Data Architecture: How 1.5 PB Daily Is Processed and Served

The article explains Taobao’s five‑layer data product architecture—covering data sources, compute, storage, query, and product layers—and describes how massive volumes of data are ingested, processed in batch and streaming, stored in MySQL and HBase clusters, and served efficiently through a unified middle‑layer and sophisticated caching mechanisms.

Big Data · Distributed Systems · HBase
0 likes · 15 min read
Architecture Digest
Sep 12, 2016 · Artificial Intelligence

Design and Implementation of a Real‑Time, Highly Available General Recommendation Platform at YHD

The article describes how YHD's precision recommendation team built a real‑time, highly available, traceable general recommendation platform, detailing its background, overall architecture, visual configuration and traceability subsystems, and reporting significant improvements in development speed, reuse and user satisfaction.

AI · HBase · Kafka
0 likes · 8 min read
Ctrip Technology
Aug 26, 2016 · Big Data

Exploring OLAP Engine with Apache Kylin: Architecture, Theory, and Practical Applications in Flight Ticket Big Data

This article presents a comprehensive overview of the Qdata session on OLAP engine exploration, detailing the limitations of traditional MySQL‑based solutions, the requirements for large‑scale analytics, the architecture and theoretical foundations of Apache Kylin, its cube construction process, storage in HBase, query rewriting, real‑world flight‑ticket data applications, and the encountered challenges with corresponding optimization practices.

Apache Kylin · Cube · HBase
0 likes · 7 min read
Qunar Tech Salon
Aug 16, 2016 · Big Data

Exploring OLAP Engine with Apache Kylin: Architecture, Theory, and Applications in Qunar's Big Data Platform

This article presents Qunar's experience transitioning from MySQL‑based OLAP to Apache Kylin, detailing the performance challenges, required features, Kylin's architecture and theory, cube construction process, storage mechanisms, real‑world applications, and the pitfalls and optimization practices discovered along the way.

Apache Kylin · Cube · HBase
0 likes · 6 min read
dbaplus Community
Aug 1, 2016 · Databases

How Facebook Scaled Its Data Storage with NoSQL: Cassandra, HBase, and Beyond

This article traces Facebook's evolution from a small social site to a global platform, explains how its massive data‑storage challenges led to the adoption of NoSQL solutions like Cassandra and HBase, and breaks down the core patterns, consistency models, and scaling techniques that power such large‑scale systems.

Consistency · Facebook · HBase
0 likes · 15 min read
ITPUB
Jun 15, 2016 · Databases

Understanding HBase’s Physical Architecture: Regions, Stores, and WAL

This article explains HBase’s internal architecture, covering the roles of HRegionServer, Client, ZooKeeper, Master, RegionServer, the physical storage layout, StoreFile and HFile structures, and the Write-Ahead Log mechanism that ensures data durability and fault tolerance.

HBase · HDFS · NoSQL
0 likes · 13 min read
Architecture Digest
Jun 9, 2016 · Databases

Understanding HBase Architecture and Core Principles

This article provides a comprehensive overview of HBase, covering its distributed architecture, component roles, data organization, read/write mechanisms, and best practices for schema and region design to ensure efficient big‑data storage and retrieval.

Architecture · Big Data · HBase
0 likes · 17 min read
dbaplus Community
May 24, 2016 · Databases

Which NoSQL DB Fits Your Node Project? HBase, Redis, MongoDB, Couchbase, LevelDB Compared

This article provides a detailed comparison of five popular NoSQL databases—HBase, Redis, MongoDB, Couchbase, and LevelDB—covering their data models, performance characteristics, CAP classification, Node.js client options, advantages, drawbacks, and ideal use‑cases to help developers choose the right storage solution for a new Node project.

Couchbase · HBase · LevelDB
0 likes · 28 min read
Architect
May 6, 2016 · Big Data

Integrating Kylin, Mondrian, and Saiku to Build an OLAP Analysis Tool

This article describes how the Youzan data team combined Apache Kylin, Mondrian, and Saiku into a three‑layer OLAP system, covering background, component overviews, technical architecture, schema integration challenges, count‑distinct handling, Kylin‑specific SQL quirks, and practical solutions.

Big Data · HBase · Kylin
0 likes · 12 min read