Tagged articles

Data Skew

49 articles

Apr 23, 2026 · Backend Development

How JD Upgraded Its B‑Side Order Storage Architecture to Tackle Elasticsearch High‑Concurrency Pressure

Facing explosive merchant growth and soaring order volumes, JD redesigned its B‑side POP order storage by isolating large tenants, applying double‑hash routing, expanding clusters, buffering updates, and automating data archiving, ultimately delivering a high‑performance, scalable Elasticsearch platform that sustains massive traffic spikes.

Data Skew · Elasticsearch · High concurrency

0 likes · 16 min read


Java Architect Handbook

Apr 6, 2026 · Databases

Why MySQL Indexes Still Slow Queries and How to Fix Them

This guide explains the six common reasons why MySQL indexes may fail to improve query speed, shows how interviewers evaluate index knowledge, and provides concrete SQL examples, EXPLAIN analysis, and practical optimization techniques such as redesigning indexes, using covering indexes, avoiding implicit type conversion, and tuning database configuration.

Covering Index · Data Skew · Database Interview

0 likes · 15 min read


Big Data Technology & Architecture

Dec 31, 2024 · Big Data

Eliminating Shuffle in Spark Joins with Storage Partitioned Join (SPJ) for Iceberg Tables

This article explains how Spark 3.3 and later use Storage Partitioned Join (SPJ) to avoid costly shuffle operations when joining partitioned V2 source tables such as Apache Iceberg. It details the required conditions, configuration settings, practical code examples, and various join scenarios, including mismatched partitions and data skew.

Bucketing · Data Skew · SQL

0 likes · 15 min read


Big Data Technology & Architecture

Nov 26, 2024 · Big Data

Understanding Full GC, Data Skew, and Parallelism in Flink Tasks

This article explains how to monitor and interpret Full GC in Flink TaskManagers and how to detect and address data skew through proper data distribution and parallelism settings. It recommends aligning consumer parallelism with Kafka partitions and offers practical tips for using tools like Prometheus and Arthas.

Data Skew · Flink · Kafka

0 likes · 6 min read


DaTaobao Tech

Jun 21, 2024 · Big Data

Flink Real-Time Data Development: Cases on Data Skew, Watermark Failure, and GroupBy Issues

The article walks through three Flink streaming pitfalls—data‑skew‑induced back‑pressure, lost watermarks after interval joins, and ineffective group‑by causing duplicate rows—and shows how to resolve them with two‑stage distinct aggregation, hash‑based key distribution, processing‑time windows or split jobs, and mini‑batch buffering.

Data Skew · Flink · Optimization

0 likes · 14 min read


Alibaba Cloud Developer

Apr 30, 2024 · Big Data

Mastering ODPS SQL: Proven Tips to Slash Query Time and Tackle Data Skew

This article explores practical SQL optimization techniques for Alibaba's ODPS platform, covering fundamentals, common pitfalls like null handling and select *, advanced strategies such as multi‑insert, partition limiting, UDF placement, data‑skew mitigation, parameter tuning, and real‑world case studies that dramatically reduce query runtimes.

Big Data · Data Skew · Hive

0 likes · 23 min read


Big Data Technology & Architecture

Jun 16, 2023 · Big Data

Optimizing Big Data SQL: Handling Data Skew and Data Explosion

This article examines common performance issues in big data SQL queries, such as data skew and data explosion, and provides systematic troubleshooting steps and practical optimization techniques across the Map, Reduce, and Join stages, including partition merging, column pruning, predicate pushdown, and join strategies.

Data Explosion · Data Skew · Distributed Computing

0 likes · 10 min read


JD Tech

Jun 14, 2023 · Big Data

Understanding and Solving Data Skew in Offline Big Data Development (Hive & Spark)

This article explains the concept of data skew in offline big‑data jobs, describes its symptoms and root causes, and provides practical optimization techniques for Hive and Spark—including partitioning strategies, map‑join usage, adaptive query settings, and monitoring approaches—to prevent performance degradation and runtime failures.

Data Skew · Hive · Optimization

0 likes · 17 min read


Alibaba Cloud Developer

Jun 14, 2023 · Big Data

How to Diagnose and Optimize Data Skew and Data Expansion in Big Data SQL

This article shares practical methods, based on real‑world team experience, to identify and resolve data skew and data expansion issues in big data SQL queries, offering systematic investigation steps and optimization techniques for Map, Reduce, and Join stages.

Big Data · Data Skew · Hive

0 likes · 9 min read


JD Retail Technology

Apr 14, 2023 · Big Data

Understanding Data Skew and Its Mitigation in Hive and Spark

This article explains the concept of data skew, its symptoms such as slow tasks and OOM errors, and provides comprehensive mitigation techniques and configuration examples for Hive and Spark, including custom partitioning, map joins, adaptive execution, and key detection methods.

Adaptive Execution · Big Data · Data Skew

0 likes · 15 min read


JD Cloud Developers

Apr 4, 2023 · Databases

How to Scale B‑Token Systems with Horizontal Sharding and Consistent Hashing

This article examines the challenges of growing B‑token data volumes, including table size limits and data skew, and proposes a solution using horizontal sharding with a consistent‑hash ring, dynamic table allocation, water‑level thresholds, periodic archiving, and monitoring to support future growth without costly migrations.

Consistent Hashing · Data Skew · Scalable Architecture

0 likes · 13 min read


Data Thinking Notes

Dec 21, 2022 · Big Data

Why Your Spark Batch Job Fails: Memory Limits, Data Skew, and Practical Fixes

This article examines a recurring Spark batch task failure caused by OutOfMemory errors and data skew, details the investigation steps—including increasing executor memory, raising parallelism, and analyzing shuffle metrics—and proposes solutions such as data validation, filtering oversized keys, and memory adjustments.

Batch Processing · Data Skew · OutOfMemory

0 likes · 4 min read


Data Thinking Notes

Nov 22, 2022 · Big Data

Why Sqoop Sync from RDS to Hive Stalls Over 8 Hours and How to Fix It

A Sqoop job that normally finishes within 2.5 hours occasionally takes more than 8 hours because an unsuitable split column causes data skew. The article details the investigation, root‑cause analysis, and a practical solution that uses a better split column and adjusted parallelism.

Big Data · Data Skew · Hive

0 likes · 5 min read


Data Thinking Notes

Oct 24, 2022 · Big Data

How to Diagnose and Fix Spark Data Skew: Practical Optimization Techniques

This article explains the causes of Spark data skew, how to locate skewed tasks using the Web UI, and presents six optimization methods—including increasing shuffle parallelism, filtering abnormal keys, two‑stage aggregation, map‑join, key sampling, and random‑prefix joins—plus a real‑world case study.

Big Data · Data Skew · JOIN

0 likes · 21 min read


NetEase LeiHuo UX Big Data Technology

Oct 17, 2022 · Big Data

Understanding Data Skew and Its Mitigation Strategies in Distributed Computing

The article explains what data skew is in distributed computing, analyzes its logical and data‑level causes, and presents preventive and remedial techniques such as data partitioning, logical replacement, two‑stage aggregation, increasing parallelism, and data cleaning to improve processing efficiency.

Data Skew · Performance Optimization · Spark

0 likes · 8 min read


DaTaobao Tech

Sep 6, 2022 · Big Data

SQL Optimization Techniques for ODPS (Open Data Processing Service)

The article presents practical ODPS SQL optimization strategies—including explicit column selection, partition limiting, multi‑insert, proper handling of nulls, join‑type choices, map‑join and skew hints, bucketed tables, and tuned task parameters—illustrated with three real‑world cases that dramatically cut execution time and resource usage.

Big Data · Data Skew · Hive

0 likes · 23 min read


DataFunTalk

Jun 28, 2022 · Big Data

JD Retail Traffic Data Warehouse Architecture and Processing Practices

This article presents a comprehensive technical overview of JD.com’s retail traffic data processing pipeline, detailing the multi‑layer data warehouse architecture, real‑time and offline data flows, a large‑scale back‑fill case using Iceberg and OLAP, data‑skew detection and mitigation techniques, and future directions involving unified Flink‑Spark streaming‑batch solutions.

Data Skew · Flink · Iceberg

0 likes · 12 min read


JD Retail Technology

Jan 27, 2022 · Big Data

How JD’s Custom Spark Engine Tackles Data Skew for Massive Offline Jobs

This article explains JD’s self‑developed data‑skew mitigation solution for Spark, detailing the problem of uneven key distribution, the limitations of the open‑source AQE implementation, and JD’s OptimizeSkewedJoinV2 algorithm that dramatically reduces stage latency in large‑scale join workloads.

Adaptive Query Execution · Big Data · Data Skew

0 likes · 13 min read


Architect

Jan 7, 2022 · Big Data

Spark Performance Optimization: Principles, Memory Model, Resource Tuning, Data Skew and Shuffle Tuning

This article provides an in‑depth guide to Spark performance optimization, covering the ten development principles, static and unified memory models, resource parameter tuning, data skew detection and mitigation techniques, as well as shuffle‑related configuration adjustments, supplemented with practical code examples and diagrams.

Data Skew · Memory Model · Performance Tuning

0 likes · 40 min read


Qunar Tech Salon

Aug 26, 2021 · Big Data

Comprehensive Introduction to Apache Spark: History, Core Concepts, Architecture, and Performance Optimization

This article provides a thorough overview of Apache Spark, covering its origins, comparison with MapReduce, core concepts such as RDD, DAG, Jobs, Stages, and Tasks, the submission process, Web UI, and detailed performance tuning techniques including data skew mitigation.

Big Data · Data Skew · MapReduce

0 likes · 15 min read


Big Data Technology Architecture

Aug 24, 2021 · Big Data

Comprehensive Guide to Spark Performance Optimization, Data Skew Mitigation, and Troubleshooting

This article presents a detailed collection of Spark performance‑tuning techniques—including submit‑script parameters, RDD and operator optimizations, parallelism and memory settings, broadcast variables, Kryo serialization, locality wait adjustments—as well as systematic methods for detecting and resolving data skew and common runtime issues such as shuffle failures, serialization errors, and JVM memory problems.

Data Skew · JVM Tuning · Shuffle

0 likes · 21 min read


Big Data Technology & Architecture

Jul 19, 2021 · Big Data

Understanding Hadoop: MapReduce, HDFS, YARN, and Core Big Data Concepts

This article provides a comprehensive overview of Hadoop’s core components, including the MapReduce programming model, the HDFS storage architecture, and YARN resource management. It also discusses common challenges such as data skew and small files and offers learning resources for aspiring big‑data engineers.

Data Skew · HDFS · Hadoop

0 likes · 9 min read


NetEase Smart Enterprise Tech+

Jun 17, 2021 · Big Data

Building a Real‑Time Service Monitoring Framework with Flink at NetEase Cloud

This article explains how NetEase Cloud Communication designed and implemented a Flink‑based streaming aggregation framework that processes massive heartbeat logs in real time, handles data skew with two‑stage aggregation, and outputs metrics to Kafka and InfluxDB for monitoring and alerting.

Aggregation · Data Skew · Flink

0 likes · 11 min read


Laravel Tech Community

May 9, 2021 · Backend Development

Understanding Consistent Hashing: From Simple Modulo Hash to Optimizations

This article explains the drawbacks of a basic modulo hash algorithm for key distribution, demonstrates how consistent hashing resolves scaling and node‑failure issues, and discusses virtual‑node techniques to mitigate data skew and improve load balancing in distributed cache systems.

Consistent Hashing · Data Skew · distributed-caching

0 likes · 5 min read


Architect

Apr 3, 2021 · Big Data

Advanced Spark Performance Optimization: Data Skew and Shuffle Tuning

This article explains advanced Spark performance tuning techniques, focusing on diagnosing and resolving data skew and shuffle bottlenecks through stage analysis, key distribution inspection, and a variety of practical solutions such as Hive pre‑processing, key filtering, parallelism increase, two‑stage aggregation, map‑join, and combined strategies, while also covering ShuffleManager internals and related configuration parameters.

Big Data · Data Skew · Performance Tuning

0 likes · 47 min read


Big Data Technology Architecture

Apr 1, 2021 · Big Data

Spark Adaptive Execution: Dynamic Shuffle Partition, Broadcast Join, and Skew Handling

The article explains the limitations of static shuffle partitions, execution‑plan estimation, and data skew in Spark SQL, and describes how Spark Adaptive Execution can automatically adjust shuffle partition numbers, switch join strategies, and mitigate skew through configurable parameters and code examples.

Adaptive Execution · Broadcast Join · Data Skew

0 likes · 11 min read


Big Data Technology & Architecture

Mar 30, 2021 · Big Data

Implementing Real-Time Data Ingestion with Delta Lake on EMR: Architecture, Challenges, and Solutions

This article describes how Soul's data engineering team replaced nightly batch ETL with real-time Delta Lake ingestion on EMR. It covers the motivations, a comparative analysis of Delta, Hudi, and Iceberg, the implementation architecture, issues encountered such as data skew and schema evolution, and the solutions adopted to improve performance and reliability.

Data Lake · Data Skew · Delta Lake

0 likes · 13 min read


Big Data Technology & Architecture

Mar 18, 2021 · Big Data

Flink Job Troubleshooting and Performance Optimization: Data Skew, Kafka Configuration, Resource Management, and Checkpoint Issues

This article details common Flink streaming problems such as data skew causing task back‑pressure, oversized Kafka messages, high‑throughput ack settings, slot removal errors, checkpoint timeouts, and resource constraints, and provides concrete configuration changes and architectural adjustments to resolve them.

Checkpoint · Data Skew · Flink

0 likes · 18 min read


Big Data Technology Architecture

Mar 10, 2021 · Big Data

Comprehensive Spark Performance Optimization: Development Tuning, Resource Configuration, Data Skew Solutions, and Shuffle Tuning

This guide presents a complete Spark performance optimization handbook covering development‑time best practices, resource‑parameter tuning, detailed data‑skew detection and mitigation techniques, advanced shuffle‑engine configurations, and practical code examples to help engineers build faster, more reliable Spark jobs.

Data Skew · Resource Tuning · Shuffle

0 likes · 69 min read


vivo Internet Technology

Nov 11, 2020 · Big Data

Understanding Distributed Hash Tables (DHT) and Their Improvements

The article explains how Distributed Hash Tables replace simple modulo hashing with a ring‑based scheme, demonstrates severe data skew in basic implementations, and shows that adding multiple virtual nodes plus a load‑boundary factor dramatically balances storage and request distribution across cluster nodes.

DHT · Data Skew · Distributed Hash Table

0 likes · 9 min read


Big Data Technology & Architecture

Jul 1, 2020 · Big Data

Overview of Spark SQL Adaptive Execution Optimization Engine

This article explains Spark SQL's Adaptive Execution engine, covering its background, dynamic plan adjustments, shuffle partition tuning, data skew mitigation techniques, and the key configuration parameters needed to enable and fine‑tune adaptive query execution for improved performance.

Adaptive Execution · Big Data · Configuration

0 likes · 7 min read


Big Data Technology & Architecture

Jun 15, 2020 · Big Data

Hive Optimization Techniques and Best Practices for Big Data Processing

This article provides a comprehensive guide to improving Hive query performance by covering column and partition pruning, predicate pushdown, replacing ORDER BY with SORT BY, using GROUP BY instead of DISTINCT, tuning MapReduce jobs, handling data skew in joins, and selecting appropriate storage formats for large‑scale data warehouses.

Big Data · Data Skew · Hive

0 likes · 19 min read


dbaplus Community

Mar 23, 2020 · Big Data

How to Detect and Resolve Data Skew in Spark and Hadoop

This article explains what data skew is in distributed big‑data systems like Spark and Hadoop, why it hurts performance, how to spot it using the Web UI or key statistics, and presents eight practical mitigation techniques ranging from filtering and shuffle parallelism to custom partitioners and broadcast joins.

Broadcast Join · Data Skew · Hadoop

0 likes · 19 min read


Big Data Technology Architecture

Mar 21, 2020 · Big Data

Comprehensive Spark Performance Optimization: Development Tuning, Resource Configuration, Data Skew Handling, and Shuffle Tuning

This article presents a complete guide to Spark performance optimization, covering development‑time best practices, resource‑parameter tuning, systematic detection and resolution of data skew, and detailed shuffle‑related parameter adjustments, all illustrated with Scala code examples.

Data Skew · Performance Optimization · Spark

0 likes · 67 min read


Big Data Technology Architecture

Mar 19, 2020 · Big Data

Handling Data Skew in Hive: Join, Group By, and COUNT(DISTINCT) Optimizations

Data skew in Hive MapReduce jobs, caused by uneven key distribution during joins, group‑by, or COUNT(DISTINCT) operations, can severely slow tasks. The article explains common scenarios and practical solutions such as using MapJoin, enabling map‑side aggregation, load balancing, and rewriting queries to mitigate skew.

Data Skew · Hive · MapJoin

0 likes · 7 min read


Big Data Technology & Architecture

Feb 13, 2020 · Big Data

Optimizing Hadoop MapReduce Jobs for eBay CAL System to Reduce Execution Time and Resource Usage

This article describes how eBay's Central Application Logging (CAL) system generates massive daily logs, the challenges of Hadoop MapReduce job performance and resource consumption, and the step‑by‑step optimizations—reducing GC time, mitigating data skew, and improving algorithms—that cut execution time by over 60%, lowered cluster resource usage, and raised job success rates to nearly 100%.

Big Data · Data Skew · GC

0 likes · 11 min read


Big Data Technology & Architecture

Jan 30, 2020 · Big Data

Comprehensive Guide to Spark Performance Optimization (Development, Resource, Data Skew, and Shuffle Tuning)

This article provides an in‑depth, step‑by‑step guide to optimizing Spark jobs, covering development‑time best practices, resource‑parameter tuning, data‑skew detection and mitigation techniques, and shuffle‑stage performance tweaks, complete with Scala code examples and practical recommendations.

Big Data · Data Skew · Performance Optimization

0 likes · 67 min read


vivo Internet Technology

Dec 25, 2019 · Big Data

Understanding and Mitigating Data Skew in Spark and Hadoop

Data skew in Spark and Hadoop occurs when a few keys dominate shuffle traffic, causing slow tasks, OOM errors, and job failures; the article describes how to detect skew via UI metrics or sampling and offers mitigation tactics such as filtering keys, increasing shuffle partitions, custom partitioners, broadcast joins, salted keys, and Hadoop‑specific settings.

Data Skew · Performance Optimization · Shuffle

0 likes · 18 min read


Big Data Technology & Architecture

Oct 14, 2019 · Big Data

Optimizing Spark PageRank: Cache, Checkpoint, Data Skew, and Resource Utilization

This article presents a comprehensive analysis of Spark PageRank performance, detailing the algorithm's basics, the original example code, and four key optimizations—caching with checkpointing, memory‑efficient data structures, handling data skew, and maximizing executor and driver resource usage—backed by experimental results and practical recommendations.

Big Data · Cache · Checkpoint

0 likes · 18 min read


Big Data Technology & Architecture

May 30, 2019 · Big Data

Data Skew Optimization Techniques in Spark

This article explains the phenomenon, causes, detection methods, and a comprehensive set of solutions—including Hive preprocessing, key filtering, shuffle parallelism, two‑stage aggregation, map‑join, sampling, random prefixing, and combined strategies—to mitigate data skew in Spark jobs and improve performance.

Big Data · Data Skew · Shuffle

0 likes · 31 min read


Alibaba Cloud Developer

May 23, 2019 · Big Data

How Blink Powers Alibaba’s Real‑Time Supply‑Chain Data Warehouse

This article explains how Alibaba's Blink engine tackles the complex challenges of building a real‑time supply‑chain data warehouse through SQL‑based stream processing and intelligent resource tuning, covering retraction, dimension‑table joins, data skew, timeout statistics, zero‑point optimizations, and future directions.

Data Skew · Dimension join · Flink

0 likes · 14 min read


dbaplus Community

Mar 27, 2019 · Big Data

How eBay Cut Hadoop Job Runtime by 60%: Real‑World CAL Log Optimization

This article explains how eBay's CAL team reduced Hadoop MapReduce job execution time and resource consumption by over 60% through targeted GC tuning, data‑skew mitigation, and algorithmic improvements, boosting job success rates to nearly 100% while handling petabyte‑scale log data.

Big Data · Data Skew · GC Tuning

0 likes · 12 min read


ITPUB

Dec 1, 2017 · Databases

How to Shrink Oracle Indexes for Skewed Columns Using Function Indexes

This article explains why conventional indexes waste space and perform poorly on highly skewed columns, introduces a decode‑based function index that excludes high‑frequency values, details the experimental setup with millions of rows, compares index size and query performance, and outlines the method's limitations.

Data Skew · Function Index · Oracle

0 likes · 10 min read


dbaplus Community

Aug 21, 2017 · Big Data

How to Tackle Spark Data Skew: Practical Solutions and Real‑World Examples

This article explains what Spark data skew is, why it hurts performance, and presents six practical mitigation techniques—including adjusting parallelism, custom partitioners, map‑side joins, and adding random prefixes—backed by detailed experiments, code snippets, and performance comparisons.

Data Skew · Map-side Join · Partitioner

0 likes · 18 min read


Architecture Digest

Apr 24, 2017 · Big Data

Understanding and Solving Data Skew in Hadoop and Spark

This article explains what data skew is, why it occurs in large‑scale Hadoop and Spark jobs, illustrates typical symptoms, and presents practical strategies—including business‑level adjustments, code tweaks, and platform‑specific tuning—to mitigate and resolve skew in big‑data processing.

Big Data · Data Skew · Hadoop

0 likes · 11 min read


StarRing Big Data Open Lab

Dec 23, 2016 · Big Data

Master Inceptor UI: Navigate Local, Holodesk, and Executors for Optimal Job Management

This guide explains how to use Inceptor's management UI, specifically the Local, Storage, Holodesk, Environment, and Executors tabs, to monitor stage counts, inspect in‑memory table distribution, detect data skew, and verify executor health, enabling more effective job optimization.

Data Skew · Executor Management · Inceptor

0 likes · 7 min read


Liulishuo Tech Team

Oct 17, 2016 · Big Data

Practical Tips and Common Pitfalls for Tuning Apache Spark Performance

This article shares hands‑on experience from Spark Summit attendees, covering why Spark is powerful, common performance problems such as slow jobs, OOM, data skew, excessive partitions, and provides concrete tuning advice on executors, cores, memory, and debugging techniques.

Apache Spark · Big Data · Data Skew

0 likes · 11 min read


Architecture Digest

May 25, 2016 · Big Data

Advanced Spark Performance Optimization: Data Skew and Shuffle Tuning

This article provides a comprehensive guide on tackling Spark performance bottlenecks by diagnosing data skew, locating the offending stages and operators, and applying a range of practical solutions—including Hive pre‑processing, key filtering, shuffle parallelism, two‑stage aggregation, map‑join, and combined strategies—followed by an in‑depth discussion of shuffle manager evolution and key configuration parameters for fine‑tuning.

Big Data · Data Skew · Performance Tuning

0 likes · 35 min read

Meituan Technology Team

May 13, 2016 · Big Data

Spark Performance Optimization Guide: Data Skew and Shuffle Tuning

This advanced Spark performance guide explains how data skew arises during shuffles and presents eight practical solutions—including Hive preprocessing, key filtering, increased shuffle parallelism, two‑stage aggregation, map joins, sampling, random prefixes, and combined strategies—while also detailing key shuffle‑tuning parameters such as spark.shuffle.file.buffer, spark.reducer.maxSizeInFlight, and spark.shuffle.manager to improve memory usage and execution speed.

Big Data · Data Skew · Performance Optimization

0 likes · 33 min read
