Tagged articles
73 articles
JD Tech
Dec 18, 2025 · Backend Development

Can AI Prompts Supercharge Your Backend, Frontend, and Big Data Projects?

This article showcases a series of real‑world development cases—from implementing a guided inventory task in a Java backend and generating Vue rule code, to writing unit tests, analyzing report data, converting SQL to Hive, debugging startup errors, publishing Maven APIs, optimizing slow SQL queries, and resolving MySQL deadlocks—demonstrating how AI‑driven prompts can accelerate coding, testing, and troubleshooting across multiple domains.

Backend · Debugging · SQL
0 likes · 31 min read
Architect's Must-Have
Sep 15, 2025 · Big Data

Mastering Spark Streaming Rate Control: A Deep Dive into Backpressure

This article explains Spark Streaming's rate control mechanisms, covering static limits, the dynamic back‑pressure feature introduced in Spark 1.5, the PID‑based estimator, RPC communication, and how Guava's token‑bucket RateLimiter enforces the calculated thresholds to ensure stability and optimal throughput.

RateControl · Spark · Streaming
0 likes · 13 min read
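As a rough illustration of the token-bucket idea mentioned in the summary above, here is a minimal Python sketch. It is a generic token bucket, not Guava's `RateLimiter` algorithm and not Spark's code; the class name `TokenBucket`, its methods, and the injectable `clock` parameter are all hypothetical names chosen for this example. `set_rate` stands in for the step where a newly estimated limit is applied.

```python
import time


class TokenBucket:
    """Generic token bucket: `rate` permits per second, at most `capacity` stored."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock              # injectable so the bucket can be tested deterministically
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def _refill(self):
        # Add the tokens earned since the last call, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def set_rate(self, rate):
        # Settle tokens at the old rate first, then switch to the new limit
        # (for example, one produced by a back-pressure estimator).
        self._refill()
        self.rate = float(rate)

    def try_acquire(self, permits=1):
        # Non-blocking: return True and consume tokens only if enough are available.
        self._refill()
        if self.tokens >= permits:
            self.tokens -= permits
            return True
        return False
```

A receiver would call `try_acquire()` before accepting each record and drop or delay the record when it returns `False`.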
DataFunTalk
Dec 28, 2024 · Big Data

Next‑Generation Data Analysis Platform: Integrating Chat BI and Headless BI

This article examines the current challenges of enterprise data analysis platforms, outlines three traditional analysis modes, and presents a next‑generation solution that combines Headless BI’s semantic modeling with Chat BI’s large‑language‑model interaction to deliver a more efficient, secure, and user‑friendly analytics experience.

ChatBI · DataGovernance · HeadlessBI
0 likes · 15 min read
JD Cloud Developers
Dec 25, 2024 · Backend Development

How RoaringBitmap Transforms Massive User ID Storage in CDPs

This article explains how a CDP handles user ID tags and groups at billion scale by replacing naïve text‑file storage with bitmap techniques, detailing bitmap basics, encoding strategies, Java BitSet limitations, and the adoption of RoaringBitmap for efficient compression and fast set operations.

RoaringBitmap · bigdata · storage
0 likes · 10 min read
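To make the bitmap idea in the summary above concrete, the following Python sketch stores a user group as one arbitrary-precision integer in which bit `n` is set when user ID `n` belongs to the group. This is a plain, uncompressed bitmap and not RoaringBitmap, which adds container-based compression on top of the same set semantics. The helper names are hypothetical, and user IDs are assumed to be non-negative integers.

```python
def to_bitmap(user_ids):
    """Build a bitmap (a Python int) with bit `uid` set for every user ID."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def to_ids(bitmap):
    """Return the user IDs whose bits are set, in ascending order."""
    ids = []
    position = 0
    while bitmap:
        if bitmap & 1:
            ids.append(position)
        bitmap >>= 1
        position += 1
    return ids


def cardinality(bitmap):
    """Count the users in the group (number of set bits)."""
    return bin(bitmap).count("1")


# Group intersection and union become single bitwise operations.
vip = to_bitmap([1, 5, 9])
active = to_bitmap([5, 9, 12])
vip_and_active = vip & active   # users carrying both tags
vip_or_active = vip | active    # users carrying either tag
```

The uncompressed form wastes space when IDs are sparse, which is the limitation that motivates compressed formats such as RoaringBitmap.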
ITPUB
Dec 14, 2023 · Big Data

How to Build a Python‑Hadoop Word Count on a Single‑Node Cluster

This step‑by‑step guide shows how to install and configure a single‑node Hadoop 3.2.0 environment on CentOS 7, set up Python 3.7, write MapReduce mapper and reducer scripts in Python, and run a word‑count job using Hadoop streaming, illustrating core Hadoop concepts and their relevance today.

Hadoop · MapReduce · Python
0 likes · 21 min read
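The mapper/reducer split described in the summary above can be sketched in a few lines of Python. This is not the article's script: the function names are hypothetical, and `sorted()` stands in for the shuffle-and-sort phase that Hadoop performs between the map and reduce steps. The tab-separated `word<TAB>count` line format follows the usual Hadoop streaming convention.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts of consecutive lines that share a word.

    Relies on the input being sorted by word, as it is after the shuffle phase.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def word_count(lines):
    """Local dry run of the whole job: map, sort (simulated shuffle), reduce."""
    return list(reduce_lines(sorted(map_lines(lines))))
```

In a real streaming job the two functions would live in separate scripts that read `sys.stdin` and print to standard output; the local `word_count` helper only exists to check the logic without a cluster.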
DataFunSummit
Nov 15, 2023 · Big Data

Alibaba Cloud DataWorks Intelligent Data Modeling: Practices and Insights

This article introduces Alibaba Cloud DataWorks' intelligent data modeling tool, outlines the data demand flow, shares best practices and practical demonstrations of data warehouse modeling, discusses model application and data asset management, and answers common questions while highlighting its commercial availability.

AlibabaCloud · DataGovernance · DataWarehouse
0 likes · 12 min read
DataFunSummit
Mar 6, 2023 · Big Data

Building a Unified Scheduling Center with Apache DolphinScheduler: Lenovo’s Practice

This article details Lenovo’s implementation of a unified scheduling center using Apache DolphinScheduler, covering background requirements, reasons for choosing the platform, architectural evolution, feature enhancements, and practical deployments such as HTTP task parameter passing, Java task plugins, global parameters, and future roadmap.

DolphinScheduler · Lenovo · bigdata
0 likes · 19 min read
Tencent Cloud Developer
Mar 1, 2023 · Big Data

We Analysis User Profiling System: Architecture and Technical Implementation

We Analysis, the official data‑analysis platform for WeChat mini‑program providers, delivers a zero‑learning‑curve user‑profiling system that combines basic tag analysis and flexible, rule‑based segmentation, using an ETL pipeline to store pre‑computed data in TDSQL and online bitmap‑optimized queries in ClickHouse with RoaringBitmap, ensuring low‑latency, stable, and comprehensive analytics.

ClickHouse · DataPipeline · Spark
0 likes · 20 min read
Big Data Technology Architecture
Feb 15, 2023 · Databases

ClickHouse Usage Guide: Table Engines, Best Practices, and Cluster Architecture

This comprehensive guide introduces ClickHouse as a high‑performance columnar DBMS, outlines its main application scenarios, details the various table engines and their creation syntax, and provides practical development, deployment, and operational recommendations for building reliable ClickHouse clusters.

ClickHouse · ClusterArchitecture · SQLGuidelines
0 likes · 22 min read
Java Architect Essentials
Jan 31, 2023 · Big Data

Optimizing Large-Scale Data Retrieval: ClickHouse Pagination, Elasticsearch Scroll Scan, ES+HBase, and RediSearch + RedisJSON Solutions

This article examines a business requirement to filter and rank up to 100,000 records from a pool of tens of millions, presenting and evaluating four technical solutions—multithreaded ClickHouse pagination, Elasticsearch scroll‑scan deep paging, an ES‑HBase combined query, and a RediSearch + RedisJSON approach—along with performance data and code examples.

ClickHouse · Elasticsearch · HBase
0 likes · 12 min read
ITPUB
Jan 20, 2023 · Big Data

How Bilibili Supercharged OLAP Queries with Iceberg Lakehouse Optimizations

This article details Bilibili's practical deployment of an Iceberg lake‑warehouse architecture within its OLAP platform, covering the motivations for lake‑warehouse integration, core Iceberg optimizations such as data‑organization sorting, Z‑order and secondary indexes, the Magnus intelligent management platform, and future roadmap plans.

Precomputation · bigdata · indexing
0 likes · 16 min read
Data Thinking Notes
Jan 12, 2023 · Big Data

Mastering Alibaba DataWorks: Data Warehouse Architecture & Modeling Guide

This comprehensive tutorial walks you through Alibaba DataWorks' data warehouse architecture, covering technical stack selection, three‑layer warehouse design (ODS, CDM, ADS), detailed data modeling with DDL examples, storage strategies, dimension and fact table conventions, and best‑practice hierarchical call standards.

DataModeling · DataWarehouse · DataWorks
0 likes · 27 min read
Top Architect
Jan 7, 2023 · Big Data

Real‑time Data Processing with ElasticSearch, Kibana and Logstash: Installation, CRUD, Bulk Import, and Data Transformation

This tutorial walks through building a real‑time data processing pipeline using ElasticSearch, Kibana and Logstash, covering core concepts such as data volume, velocity, variety and accuracy, detailed installation steps, CRUD operations, bulk data import, Java‑based data conversion, and Logstash pipeline configuration with filters and date parsing.

BulkImport · DataPipeline · Java
0 likes · 31 min read
Architect
Dec 19, 2022 · Databases

Understanding Elasticsearch DSL Query Syntax (7.x)

This article provides a comprehensive guide to Elasticsearch 7.x DSL query syntax, explaining core keywords, field mappings, various query types such as match, term, range, fuzzy, and bool, and includes practical code examples for building effective search queries.

DSL · Elasticsearch · Elasticsearch7
0 likes · 8 min read
Data Thinking Notes
Dec 14, 2022 · Big Data

Why Spark Jobs Keep Running After You Kill Them: Daemon Threads and Driver Behavior

This article investigates why Spark tasks that appear killed in the Web UI continue running on the driver, analyzes the role of daemon versus non‑daemon threads and SparkContext shutdown mechanisms, reproduces the issue with sample code, and provides practical solutions such as using daemon threads or checking SparkContext status.

DaemonThread · Spark · bigdata
0 likes · 8 min read
Big Data Technology & Architecture
Nov 14, 2022 · Big Data

Kafka Consumer Group Rebalance: Mechanisms, Strategies, Protocols, and Java Implementation

This article provides a comprehensive overview of Kafka consumer group rebalance, covering version compatibility, rebalance triggers, assignment strategies, generation handling, protocol details, the full rebalance workflow, listener usage, and complete Java code examples for offset management with database integration.

ConsumerGroup · Java · Kafka
0 likes · 19 min read
Alibaba Cloud Big Data AI Platform
Sep 22, 2022 · Artificial Intelligence

Scaling Fashion AI: How Zhiyi Built a Massive Image‑Recognition Platform on Alibaba Cloud

This article details how Hangzhou Zhiyi Technology leverages AI, big‑data pipelines, and Alibaba Cloud services to create a scalable fashion‑focused image‑recognition and visual‑search platform, covering company background, system architecture, model training, vector search, and future technical upgrades.

AI · CloudComputing · FashionTech
0 likes · 13 min read
Big Data Technology & Architecture
Jul 28, 2022 · Big Data

Spark SQL UNION Causing driver.maxResultSize Error and Its Resolution

When executing a Spark SQL query with dozens of UNION subqueries that each contain JOIN operations on Spark 3.1.2, the job fails because the total serialized result size of the tasks exceeds the driver’s maxResultSize limit, and the issue can be resolved by reducing the initial partition number used by Adaptive Query Execution.

DriverMaxResultSize · PerformanceTuning · SQL
0 likes · 10 min read
AntTech
Jun 28, 2022 · Operations

AntMonitor: Evolution, Features, and Core Technologies of Ant Group’s Observability Platform

The article details Ant Group’s AntMonitor observability platform, covering its development timeline, holographic monitoring capabilities, integrated performance analysis, efficient data integration, built‑in AI‑driven analytics, Monitoring‑as‑a‑Service, and the underlying high‑performance time‑series database and cloud‑native architecture that support massive real‑time data processing.

CloudNative · Observability · TimeSeriesDatabase
0 likes · 17 min read
StarRocks
May 19, 2022 · Big Data

How StarRocks Boosted MaFengWo’s OLAP Performance by 4×

MaFengWo’s data platform replaced Kylin, Presto, and Druid with StarRocks, redesigning its four‑layer architecture, unifying metadata, and optimizing single‑table, multi‑table, and precise‑deduplication queries, which improved query performance fourfold, reduced storage by 87%, and lowered operational complexity.

Kylin · bigdata · data-warehouse
0 likes · 15 min read
Architecture Digest
Oct 16, 2021 · Backend Development

Reflections on Technology Choices: Efficiency, Environment, and Team in Backend and Big Data Development

The author shares a personal journey through Java backend development, big‑data frameworks, database evolution, and team decision‑making, analyzing efficiency, environmental influences, and the impact of community and leadership on technology selection, while emphasizing practical trade‑offs over theoretical performance gains.

Backend · Java · TeamManagement
0 likes · 31 min read
DataFunTalk
Aug 28, 2021 · Databases

ClickHouse Projection: Concepts, Use Cases, Implementation and Production Benefits

This article presents an in‑depth overview of ClickHouse's Projection feature, covering its background, definition, storage and query mechanisms, practical use‑case demonstrations, performance comparisons with competing OLAP systems, and real‑world production results that highlight its advantages and limitations.

ClickHouse · DataWarehouse · MaterializedView
0 likes · 20 min read
Big Data Technology Architecture
May 6, 2021 · Databases

Elasticsearch Pagination: From+size, search_after, and Scroll – Differences, Advantages, and Use Cases

This article explains Elasticsearch’s three pagination methods—From + size, search_after, and Scroll—detailing their definitions, code examples, advantages, disadvantages, and suitable scenarios, while also discussing max_result_window limits, PIT views, and best practices for handling large result sets.

Backend · Elasticsearch · Search
0 likes · 13 min read
Suning Technology
Mar 23, 2021 · Operations

How Suning’s All‑Scenario Membership System Drives Private‑Domain Traffic in Post‑COVID Retail

At the 2021 Greater Bay Area Smart Retail Conference, Suning’s Director Wang Junjie revealed how the company’s unified, cross‑scenario membership platform leverages big data and AI to boost private‑domain traffic, streamline member lifecycle management, and deliver seamless digital marketing across all retail formats.

AI · PrivateDomain · Retail
0 likes · 4 min read
Didi Tech
Jan 25, 2021 · Big Data

Migrating Hive SQL to Spark SQL: Design, Implementation, and Performance Evaluation at DiDi

DiDi migrated over 10,000 Hive SQL tasks to Spark SQL using a lightweight dual‑run pipeline that extracts, rewrites, compares, and switches tasks, fixing syntax and UDF differences while adding features such as small‑file merging and enhanced partition pruning, resulting in Spark handling 85 % of workloads with 40 % faster execution, 21 % lower CPU and 49 % lower memory usage.

DataMigration · Hive · SQLOptimization
0 likes · 18 min read
Didi Tech
Jan 12, 2021 · Big Data

Upgrading DiDi Real‑time Computing Engine from Flink 1.4 to Flink 1.10: Challenges, Optimizations, and Lessons Learned

DiDi upgraded its massive real‑time computing engine from Flink 1.4.2 to Flink 1.10, implementing a transparent migration across 1500 machines, adding native DDL, binary rows, MiniBatch, improved scheduling and window functions, and establishing a rigorous testing pipeline that achieved 99.9 % compatibility while preventing OOM issues.

Flink · PerformanceOptimization · RealTimeComputing
0 likes · 11 min read
DataFunTalk
Oct 27, 2020 · Databases

Didi's Large‑Scale Elasticsearch Upgrade: Architecture, Migration Strategy, and Performance Gains

This article systematically details Didi's migration of over 30 Elasticsearch clusters, 3,500 nodes and 8 PB of data from version 2.3.3 to 6.6.1, covering background, problem analysis, multi‑version architecture redesign, capacity planning, tiered storage, FastIndex, query replay, upgrade pitfalls, and the resulting cost reduction and performance improvements.

CapacityPlanning · Elasticsearch · bigdata
0 likes · 15 min read
JD Tech Talk
Oct 20, 2020 · Databases

Using ClickHouse for Time‑Series Data Management and Analysis in JD.com JUST Platform

This article explains how JD.com’s JUST platform leverages the open‑source columnar database ClickHouse to store, query and analyze massive time‑series data, covering data modeling, lifecycle management, system goals, technology selection, cluster architecture, deployment, scaling and future enhancements.

ClickHouse · DistributedSystems · TimeSeries
0 likes · 20 min read
Big Data Technology & Architecture
Sep 19, 2020 · Big Data

Understanding Flink Timer Mechanism and Its Internal Implementation

This article explains how Flink's Timer mechanism works, covering its usage in KeyedProcessFunction, the underlying TimerService and InternalTimerService implementations, the role of triggers, and the detailed code paths for processing‑time and event‑time timers, while highlighting performance considerations.

Flink · InternalTimerService · KeyedProcessFunction
0 likes · 16 min read
MaGe Linux Operations
Sep 7, 2020 · Databases

Step-by-Step Guide to Installing an HBase Cluster on Hadoop

This article explains what HBase is, describes its Master, RegionServer, and Zookeeper components, and provides detailed environment preparation and configuration steps—including host setup, SSH key distribution, JDK installation, HBase deployment, configuration file edits, and cluster startup—so you can run HBase on a Hadoop cluster.

HBase · Hadoop · bigdata
0 likes · 8 min read
Top Architect
Aug 14, 2020 · Big Data

Billion‑Row MySQL to HBase Synchronization: Load Data, Kafka‑Thrift, and Flink Solutions

This article presents a comprehensive guide for transferring massive MySQL datasets to HBase, covering environment setup on Ubuntu, three synchronization methods—MySQL LOAD DATA, a Kafka‑Thrift pipeline using Maxwell, and real‑time Flink processing—along with performance comparisons and practical tips for Hadoop, HBase, Kafka, Zookeeper, Phoenix, and related tools.

DataSync · Flink · HBase
0 likes · 24 min read
Full-Stack Internet Architecture
Jul 6, 2020 · Big Data

Step-by-Step Guide: Installing ElasticSearch, ElasticSearch‑head, and Integrating with Spring Boot

This tutorial walks through installing ElasticSearch on CentOS, setting up the ElasticSearch‑head visual plugin, and integrating ElasticSearch with a Spring Boot application, including environment preparation, configuration, CRUD API implementation, and testing via Postman, providing a comprehensive guide for developers.

Search · bigdata
0 likes · 14 min read
Big Data Technology & Architecture
Jun 10, 2020 · Databases

Understanding HBase Compaction: Types, Triggers, Algorithms, and Impact on Read/Write Performance

This article explains HBase compaction—a key operation in the Log‑Structured Merge‑Tree model—covering minor and major compaction differences, trigger conditions, configuration parameters, selection algorithms, thread‑pool handling, and the effects on read and write performance in a big‑data database environment.

HBase · LSM · bigdata
0 likes · 10 min read
Architect
Jun 10, 2020 · Big Data

Understanding Flink Time Notions: ProcessTime, EventTime, IngestionTime and Watermarks with Code Examples

This article explains the three time notions supported by Apache Flink—ProcessTime, EventTime, and IngestionTime—detailing their semantics, how Watermarks enable event‑time processing, and provides Scala code samples for configuring time characteristics, assigning timestamps, and generating Watermarks in a streaming job.

EventTime · Flink · Scala
0 likes · 16 min read
Bitu Technology
May 29, 2020 · Big Data

Optimizing Data Access in Tubi Data Runtime: Redshift Connector, SQL Cell Magic, and JupyterLab Extensions

This article explains how Tubi Data Runtime (TDR) streamlines data access on JupyterHub by introducing an optimized Redshift connector, custom SQL cell magic, and JupyterLab extensions for data exploration, reducing latency and resource usage while enhancing collaboration and usability for data scientists and engineers.

DataConnector · JupyterHub · Kubernetes
0 likes · 12 min read
Programmer DD
May 23, 2020 · Big Data

How Data Middle Platforms Transform Ingestion, Governance, and Real‑Time Analytics

This article outlines the core concepts of a data middle platform, covering data aggregation, ingestion tools, offline and real‑time development, data governance, service layers, and practical implementation details such as ODS, DWD, and monitoring, illustrating how enterprises build scalable, secure data ecosystems.

DataGovernance · DataMiddlePlatform · DataWarehouse
0 likes · 32 min read
Big Data Technology Architecture
Feb 22, 2020 · Databases

Using HBase PerformanceEvaluation (PE) Tool for Read/Write Latency Benchmarking (P99/P999)

This article explains how to use HBase's built‑in PerformanceEvaluation tool to run baseline read/write latency tests (P99 and P999), describes key command‑line parameters, presents benchmark results for random and sequential operations, and discusses the implications for HBase performance tuning.

Benchmark · DatabasePerformance · HBase
0 likes · 11 min read
Big Data Technology & Architecture
Nov 7, 2019 · Big Data

Real‑time Dashboard with Flink: Streaming Order Data, Site Metrics, and Top‑N Merchandise Rankings

This article demonstrates how to build a one‑second‑refresh real‑time dashboard for e‑commerce order data using Apache Flink, Kafka, and Redis, covering JSON message parsing, processing‑time windows, stateful aggregation for site‑level KPIs, and efficient top‑N product ranking via Redis sorted sets.

Dashboard · Flink · Kafka
0 likes · 11 min read
Beike Product & Technology
Jun 28, 2019 · Big Data

Hadoop NameNode Performance Bottlenecks and Solutions: Federation, ViewFS, FastCopy, Balance & Mover

This article analyzes the performance and stability bottlenecks of a Hadoop 2.7.3 NameNode caused by memory limits, RPC QPS, and long restart times, and presents a comprehensive solution stack—including HDFS federation, ViewFS, FastCopy, and tuned Balance/Mover tools—to improve scalability and reduce downtime.

Balance · FastCopy · Federation
0 likes · 11 min read
Big Data Technology & Architecture
Jun 18, 2019 · Big Data

Understanding Watermarks, Event Time, and Processing Time in Apache Flink

This article explains the three time concepts in Flink—Process Time, Event Time, and Ingestion Time—illustrates their impact on windowed computations with examples, introduces watermarks and allowed lateness for handling out‑of‑order data, and provides complete Scala code for both processing‑time and event‑time streaming applications.

EventTime · Flink · Scala
0 likes · 13 min read
Big Data Technology & Architecture
Jun 17, 2019 · Big Data

Understanding Spark SQL: Concepts, Queries, Data Sources, and Practical Examples

This article introduces Spark SQL fundamentals, including its architecture, DataFrame and Dataset abstractions, query methods, interoperability with RDD, user-defined functions, integration with Hive, data source handling, and provides step‑by‑step Scala code examples for loading data, performing aggregations, and solving common analytical tasks.

DataFrames · Hive · SQL
0 likes · 15 min read
Big Data Technology Architecture
May 27, 2019 · Databases

Understanding HBase Compaction: Types, Triggers, Parameters, and Performance Impact

This article explains HBase's compaction mechanism, covering why it is needed, the differences between minor and major compaction, the conditions that trigger compaction, key configuration parameters, thread‑pool handling, compaction policies, and how compaction influences read and write performance in a large‑scale NoSQL database.

HBase · bigdata · compaction
0 likes · 12 min read
Big Data Technology Architecture
May 8, 2019 · Databases

Understanding HBase Scan Process and Its Performance Compared to Parquet and Kudu

The article explains why HBase read operations are complex due to its LSM‑Tree storage and multi‑version design, details the step‑by‑step Scan workflow, discusses the reasons for its multi‑request architecture, compares scan performance with Parquet and Kudu, and offers recommendations for large‑scale data scanning.

HBase · LSM‑Tree · SCAN
0 likes · 7 min read
Big Data Technology & Architecture
Apr 16, 2019 · Big Data

Features, Configuration Parameters, and Implementation Details of Hadoop Capacity Scheduler

The article provides a comprehensive overview of Hadoop's Capacity Scheduler, describing its resource‑allocation features, configurable XML parameters, queue access controls, dynamic configuration updates, and the internal workflow of application initialization and resource scheduling within YARN.

CapacityScheduler · Hadoop · ResourceManagement
0 likes · 13 min read
Youzan Coder
Mar 8, 2019 · Big Data

Why Spark Shuffle Often Runs Out of Memory and How to Fix It

This article examines Spark's memory management and the shuffle process, identifies the components that consume the most memory during shuffle write and read, analyzes common OOM scenarios such as task concurrency and data skew, and offers configuration tips to prevent out‑of‑memory failures.

MemoryManagement · OutOfMemory · Shuffle
0 likes · 14 min read
DataFunTalk
Jan 25, 2019 · Big Data

Evolution and Technical Architecture of Ant Financial's Data Analysis Platform

This article presents a comprehensive overview of Ant Financial's data analysis platform, detailing its departmental role, the data analysis lifecycle, the platform's evolution from version 1.0 to 3.0, core technical components such as intelligent sync and pre‑computation, and a practical case study of performance optimization.

Analytics · DataAnalysis · DataEngineering
0 likes · 24 min read
21CTO
Jul 6, 2017 · Big Data

How HBase Boosted Tencent Monitoring Platform Performance 3‑5×

Facing the challenge of storing over 120 billion daily monitoring points from hundreds of thousands of servers, Tencent’s monitoring platform migrated from a custom solution and OpenTSDB to a finely tuned HBase architecture, achieving 3‑5× higher throughput, improved reliability, and significant storage savings.

DistributedStorage · HBase · PerformanceTuning
0 likes · 11 min read