Tagged articles
62 articles
php Courses
Nov 24, 2025 · Fundamentals

How to Randomly Shuffle Array Elements in PHP with shuffle()

Learn how to use PHP's built-in shuffle() function to randomly reorder array elements, with clear syntax explanation, step-by-step code examples, output demonstration, and important considerations such as in‑place modification, handling associative or multidimensional arrays, and preserving original data when needed.

Array · Backend · Shuffle
0 likes · 3 min read
php Courses
May 19, 2025 · Backend Development

Using PHP shuffle() to Randomly Rearrange Array Elements

This article explains the PHP shuffle() function, detailing its syntax, behavior of modifying the original indexed array, return value, usage with both indexed and associative arrays, and provides multiple code examples demonstrating random reordering and the effect on array keys.

Array · Backend · PHP
0 likes · 5 min read
php Courses
Mar 24, 2025 · Backend Development

Using PHP shuffle() to Randomly Rearrange Array Elements

This article explains PHP's shuffle() function, detailing its syntax, behavior of modifying the original array, return value, usage with indexed and associative arrays, and provides multiple code examples illustrating how to randomize array elements.

Array · PHP · Shuffle
0 likes · 4 min read
php Courses
Oct 28, 2024 · Backend Development

How to Use PHP shuffle() to Randomly Rearrange Array Elements

This article explains the PHP shuffle() function, detailing its syntax, behavior of modifying the original indexed array, return value, and provides multiple code examples—including handling of non-indexed arrays—to demonstrate how to randomly reorder array elements in PHP.

Array · Backend · PHP
0 likes · 4 min read
php Courses
Aug 28, 2024 · Backend Development

How to Use PHP shuffle() to Randomly Rearrange Array Elements

This article explains PHP's shuffle() function, its syntax, behavior on indexed and associative arrays, and provides code examples demonstrating how to randomize array elements and handle the function's boolean return value in practice.

Array · Backend · Shuffle
0 likes · 5 min read
360 Smart Cloud
Jul 9, 2024 · Big Data

Understanding Shuffle in Spark: From Native Shuffle to External and Remote Shuffle Services (Uniffle)

This article examines the critical role of shuffle in big‑data processing, compares Spark's native shuffle with the External Shuffle Service (ESS) and Remote Shuffle Service (RSS) solutions, introduces Uniffle's architecture and configuration, and shares practical deployment experiences and performance results within a 360 internal environment.

Big Data · External Shuffle Service · Remote Shuffle Service
0 likes · 15 min read
DataFunTalk
Jun 22, 2024 · Big Data

Migrating Spark Shuffle Service from ESS to RSS (Celeborn) at Zhihu: Design, Implementation, and Benefits

This article details Zhihu's migration of massive Spark and MapReduce shuffle workloads from the External Shuffle Service (ESS) to a push‑based Remote Shuffle Service (RSS) powered by Celeborn, covering background problems, evaluation of open‑source implementations, deployment architecture, encountered issues, solutions, performance gains, and future plans.

Big Data · RSS · Shuffle
0 likes · 19 min read
DataFunSummit
Mar 20, 2024 · Big Data

Large‑Scale Evolution of Spark Shuffle Cloud‑Native Architecture at ByteDance

This article details ByteDance's large‑scale evolution of Spark Shuffle to a cloud‑native architecture, describing background, stability and mixed‑resource scenarios, challenges such as CPU and I/O limits, custom ESS enhancements, shuffle throttling, spill‑split mechanisms, and the Cloud Shuffle Service with its push‑based design and performance gains.

Big Data · Kubernetes · Shuffle
0 likes · 21 min read
Huolala Tech
Mar 7, 2024 · Big Data

Integrating Apache Tez with Remote Shuffle Service via Uniffle: HuoLala’s Experience

Facing exploding data volumes and rising cluster costs, HuoLala integrated Apache Tez with a Remote Shuffle Service built on Apache Uniffle, redesigning the Tez client to operate without source modifications. The article covers the architecture, implementation challenges, testing, stability measures, and future plans to improve big-data shuffle performance and cost efficiency.

Apache Tez · Big Data · Remote Shuffle Service
0 likes · 14 min read
php Courses
Mar 7, 2024 · Backend Development

How to Randomly Shuffle an Array in PHP Using the shuffle Function

This article explains the PHP shuffle function, its syntax, how it directly modifies an array to randomize element order, provides example code with output, and discusses important considerations such as preserving the original array and handling associative or multidimensional arrays.

Array · PHP · Shuffle
0 likes · 3 min read
php Courses
Jan 29, 2024 · Backend Development

How to Use PHP shuffle() to Randomly Sort Arrays and Generate Random Numbers

This article explains the PHP shuffle() function, demonstrates how to create arrays, use shuffle() to randomize their elements, display the results, and shows additional uses such as generating random numbers with range() and shuffle(), providing clear code examples throughout.

Arrays · Backend · PHP
0 likes · 4 min read
Alibaba Cloud Developer
Jan 11, 2024 · Big Data

Unlock ODPS SQL Performance: Deep Dive into Execution Plans & Optimizations

This article examines ODPS SQL performance by dissecting logical execution plans and Logview visualizations, explaining the underlying principles of various optimization techniques such as multi‑distinct handling, shuffle reduction, system parameters, and different join strategies, and demonstrates how to apply these methods to improve query efficiency in real‑world data engineering tasks.

ODPS · Shuffle · execution plan
0 likes · 17 min read
php Courses
Dec 25, 2023 · Backend Development

How to Randomly Shuffle Array Elements Using PHP's shuffle Function

This article explains how to use PHP's built-in shuffle() function to randomly reorder array elements, covering its syntax, return value, example code for indexed and associative arrays, handling of multidimensional arrays, and important considerations such as in‑place modification and preserving original data.

Array · Backend · PHP
0 likes · 3 min read
Zhongtong Tech
Dec 14, 2023 · Big Data

How Celeborn Transformed Spark Shuffle Performance at ZTO Express

Facing massive daily Spark shuffle volumes and unstable ETL performance, ZTO Express migrated from the community External Shuffle Service to Celeborn's Remote Shuffle Service, achieving higher disk I/O efficiency, better reliability, reduced network connections, and significant reductions in task failures and job latency.

Big Data · Remote Shuffle Service · Shuffle
0 likes · 15 min read
php Courses
Dec 8, 2023 · Backend Development

Using PHP shuffle() to Randomly Rearrange Array Elements

This article explains PHP's shuffle() function, its syntax, behavior, return value, and demonstrates how it randomizes both indexed and associative arrays with code examples, highlighting that it modifies the original array and reindexes non‑sequential keys.

Array · Backend · Shuffle
0 likes · 5 min read
DataFunTalk
Nov 18, 2023 · Big Data

Large‑Scale Evolution of Spark Shuffle Cloud‑Native Architecture at ByteDance

This article details ByteDance's extensive migration of Spark Shuffle to a cloud‑native architecture, describing the massive data volumes, the underlying ESS and CSS services, the challenges of resource isolation, monitoring, throttling, spill‑splitting, and the performance gains achieved across stable and mixed‑resource clusters.

Big Data · ByteDance · Cloud Native
0 likes · 20 min read
php Courses
Aug 1, 2023 · Backend Development

Using PHP shuffle() Function to Randomly Reorder Array Elements

This article explains the PHP shuffle() function, detailing its syntax, return behavior, usage examples, and important considerations such as its effect on the original array, limitations with associative arrays, and handling of duplicate elements, providing a practical code demonstration.

Array · PHP · Shuffle
0 likes · 3 min read
JD Tech
Jun 14, 2023 · Big Data

Understanding and Solving Data Skew in Offline Big Data Development (Hive & Spark)

This article explains the concept of data skew in offline big‑data jobs, describes its symptoms and root causes, and provides practical optimization techniques for Hive and Spark—including partitioning strategies, map‑join usage, adaptive query settings, and monitoring approaches—to prevent performance degradation and runtime failures.

Data Skew · Shuffle · Spark
0 likes · 17 min read
Data Thinking Notes
Oct 24, 2022 · Big Data

How to Diagnose and Fix Spark Data Skew: Practical Optimization Techniques

This article explains the causes of Spark data skew, how to locate skewed tasks using the Web UI, and presents six optimization methods—including increasing shuffle parallelism, filtering abnormal keys, two‑stage aggregation, map‑join, key sampling, and random‑prefix joins—plus a real‑world case study.

Big Data · Data Skew · JOIN
0 likes · 21 min read
DataFunTalk
Sep 15, 2022 · Big Data

Bilibili Offline Platform: Migration from Hive to Spark and Large‑Scale Optimizations

This article details Bilibili's evolution of its offline computing platform from Hadoop‑based Hive to Spark, describing the migration process, automated SQL conversion, result verification, stability and performance enhancements, meta‑store optimizations, and future work on remote shuffle and vectorized execution.

Data Skipping · MetaStore · Shuffle
0 likes · 28 min read
IT Services Circle
Mar 21, 2022 · Big Data

Understanding Spark Shuffle: Hash, Sort, and Tungsten Sort Mechanisms

This article explains the evolution and inner workings of Spark's shuffle phase, comparing the original Hash‑based shuffle, the default Sort‑based shuffle, the optimized Tungsten‑Sort shuffle, and related configuration options that affect performance and file handling in large‑scale data processing.

Hash Shuffle · Shuffle · Sort-Shuffle
0 likes · 17 min read
Architect
Jan 7, 2022 · Big Data

Spark Performance Optimization: Principles, Memory Model, Resource Tuning, Data Skew and Shuffle Tuning

This article provides an in‑depth guide to Spark performance optimization, covering the ten development principles, static and unified memory models, resource parameter tuning, data skew detection and mitigation techniques, as well as shuffle‑related configuration adjustments, supplemented with practical code examples and diagrams.

Data Skew · Memory Model · Shuffle
0 likes · 40 min read
Big Data Technology & Architecture
Dec 23, 2021 · Big Data

Key Spark Configuration Parameters and Their Explanations

This article presents a comprehensive list of essential Spark configuration settings—including executor memory, off‑heap memory, memory fractions, shuffle options, and adaptive query execution parameters—each accompanied by a concise description to help users fine‑tune Spark performance.

Adaptive Query Execution · Big Data · Memory Management
0 likes · 6 min read
Big Data Technology & Architecture
Dec 1, 2021 · Big Data

Understanding Spark Shuffle: Mechanisms, Evolution, and Optimization

This article provides a comprehensive overview of Spark's shuffle process, explaining its definition, internal mechanisms such as shuffle write and read, the evolution of shuffle managers, and practical optimization techniques including parameter tuning and broadcast variables, all aimed at improving performance in large‑scale data processing.

Big Data · Shuffle · Shuffle Reader
0 likes · 18 min read
Big Data Technology & Architecture
Sep 17, 2021 · Big Data

Key Reliability Mechanisms of HDFS, YARN Failover Strategies, and Hadoop Shuffle Process

This article explains HDFS reliability features such as replica policies, rack awareness, heartbeat, safe mode, checksums, trash, metadata protection and snapshots, then details YARN failover handling for ApplicationMaster, NodeManager and ResourceManager, and finally describes the Hadoop MapReduce shuffle workflow and tuning tips.

HDFS · MapReduce · Reliability
0 likes · 13 min read
Big Data Technology & Architecture
Sep 16, 2021 · Big Data

Understanding Hadoop's Circular Buffer in the Shuffle Phase

This article explains how Hadoop's MapReduce shuffle uses a circular buffer data structure to store serialized key/value pairs and their metadata in memory, describes its initialization, write path, spill handling, and the underlying algorithms that ensure efficient in‑memory sorting and disk spilling.

Hadoop · In-Memory Buffer · MapReduce
0 likes · 24 min read
Big Data Technology Architecture
Aug 24, 2021 · Big Data

Comprehensive Guide to Spark Performance Optimization, Data Skew Mitigation, and Troubleshooting

This article presents a detailed collection of Spark performance‑tuning techniques—including submit‑script parameters, RDD and operator optimizations, parallelism and memory settings, broadcast variables, Kryo serialization, locality wait adjustments—as well as systematic methods for detecting and resolving data skew and common runtime issues such as shuffle failures, serialization errors, and JVM memory problems.

Data Skew · Shuffle · Spark
0 likes · 21 min read
Big Data Technology & Architecture
Jun 4, 2021 · Big Data

Comprehensive Spark Interview Questions and Answers

This article provides a detailed collection of Spark interview questions covering deployment modes, performance advantages over MapReduce, shuffle mechanisms, RDD characteristics, optimization techniques, resource management, and various practical aspects of Spark on YARN, Mesos, and Kubernetes.

RDD · Shuffle · Spark
0 likes · 21 min read
dbaplus Community
Apr 14, 2021 · Big Data

Master Spark Performance: Key Tuning, Shuffle & Join Optimization

This guide compiles practical Spark tuning techniques, covering essential configuration parameters, programming best‑practices, detailed shuffle mechanics, and join optimization strategies, while also addressing common errors and mitigation steps, enabling developers to improve performance and resource utilization in large‑scale data processing jobs.

Big Data · Error Handling · JOIN optimization
0 likes · 25 min read
Architect
Apr 3, 2021 · Big Data

Advanced Spark Performance Optimization: Data Skew and Shuffle Tuning

This article explains advanced Spark performance tuning techniques, focusing on diagnosing and resolving data skew and shuffle bottlenecks through stage analysis, key distribution inspection, and a variety of practical solutions such as Hive pre‑processing, key filtering, parallelism increase, two‑stage aggregation, map‑join, and combined strategies, while also covering ShuffleManager internals and related configuration parameters.

Big Data · Data Skew · Scala
0 likes · 47 min read
Architect
Apr 2, 2021 · Big Data

Spark Performance Optimization Guide: Development and Resource Tuning

This article provides a comprehensive guide to Spark performance optimization, covering development‑level tuning principles, resource configuration parameters, practical code examples, and best‑practice recommendations to achieve high‑throughput big‑data processing.

Big Data · RDD · Resource Tuning
0 likes · 33 min read
Big Data Technology Architecture
Mar 10, 2021 · Big Data

Comprehensive Spark Performance Optimization: Development Tuning, Resource Configuration, Data Skew Solutions, and Shuffle Tuning

This guide presents a complete Spark performance optimization handbook covering development‑time best practices, resource‑parameter tuning, detailed data‑skew detection and mitigation techniques, advanced shuffle‑engine configurations, and practical code examples to help engineers build faster, more reliable Spark jobs.

Data Skew · Resource Tuning · Shuffle
0 likes · 69 min read
Laravel Tech Community
Dec 29, 2020 · Backend Development

PHP shuffle() Function – Randomly Shuffle an Array

This article explains the PHP shuffle() function, describing its purpose of randomly reordering array elements, the required array parameter, the boolean return value, and provides a complete example with sample output to illustrate its usage.

Array · Backend · PHP
0 likes · 2 min read
Programmer DD
Nov 27, 2020 · Fundamentals

7 Tiny Code Gems That Pack Massive Power: From Shuffle to Fast Inverse Square Root

This article showcases seven ultra‑compact yet powerful code examples—from a zero‑code deployment tool and a two‑line shuffle algorithm to sleep sort, a one‑line Python AI snippet, a simple tomorrow‑time sleep call, the legendary fast inverse square‑root constant, and the classic hello‑world program.

Algorithms · Fast Inverse Square Root · Shuffle
0 likes · 6 min read
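The article's own two-line shuffle is not reproduced in this listing. As a rough illustration of how compact a correct shuffle can be, here is a minimal Fisher-Yates sketch in Python; the function name `fisher_yates_shuffle` and the optional `rng` parameter are assumptions made for this example, not taken from the article.

```python
import random

def fisher_yates_shuffle(items, rng=random):
    """Return a shuffled copy of items (hypothetical helper, not from the article)."""
    result = list(items)  # copy so the caller's sequence is left untouched
    # Walk from the last index down to 1, swapping each slot with a
    # randomly chosen slot at or before it.
    for i in range(len(result) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        result[i], result[j] = result[j], result[i]
    return result
```

Passing a seeded `random.Random` instance as `rng` makes the result repeatable, which helps when testing.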
ITPUB
Nov 16, 2020 · Fundamentals

7 Unexpected Code Hacks: No‑Code Deployment, Shuffle, Sleep Sort, AI One‑Liner & More

This article showcases seven intriguing code tricks—from a zero‑code deployment project and a concise shuffle algorithm to a sleep‑sort implementation, a one‑line AI chatbot, a simple next‑day timer, the legendary fast inverse square‑root constant, and the classic hello‑world example—each illustrated with brief explanations and runnable snippets.

Algorithms · Fast Inverse Square Root · Python
0 likes · 6 min read
Big Data Technology Architecture
Apr 28, 2020 · Big Data

Understanding Shuffle in Hadoop MapReduce and Spark

This article explains the concept and workflow of shuffle in Hadoop MapReduce and Spark, covering map‑side buffering, spill and merge, reduce‑side copy‑merge‑reduce, the reasons for sorting and file merging, and compares Hash‑Shuffle and Sort‑Shuffle implementations with performance considerations.

Hash Shuffle · Shuffle · Sort-Shuffle
0 likes · 16 min read
dbaplus Community
Mar 23, 2020 · Big Data

How to Detect and Resolve Data Skew in Spark and Hadoop

This article explains what data skew is in distributed big‑data systems like Spark and Hadoop, why it hurts performance, how to spot it using the Web UI or key statistics, and presents eight practical mitigation techniques ranging from filtering and shuffle parallelism to custom partitioners and broadcast joins.

Broadcast Join · Data Skew · Hadoop
0 likes · 19 min read
Big Data Technology & Architecture
Jan 30, 2020 · Big Data

Comprehensive Guide to Spark Performance Optimization (Development, Resource, Data Skew, and Shuffle Tuning)

This article provides an in‑depth, step‑by‑step guide to optimizing Spark jobs, covering development‑time best practices, resource‑parameter tuning, data‑skew detection and mitigation techniques, and shuffle‑stage performance tweaks, complete with Scala code examples and practical recommendations.

Big Data · Data Skew · Resource Tuning
0 likes · 67 min read
vivo Internet Technology
Dec 25, 2019 · Big Data

Understanding and Mitigating Data Skew in Spark and Hadoop

Data skew in Spark and Hadoop occurs when a few keys dominate shuffle traffic, causing slow tasks, OOM errors, and job failures; the article describes how to detect skew via UI metrics or sampling and offers mitigation tactics such as filtering keys, increasing shuffle partitions, custom partitioners, broadcast joins, salted keys, and Hadoop‑specific settings.

Data Skew · Partitioning · Shuffle
0 likes · 18 min read
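The salted-key tactic mentioned in the summary can be sketched without any cluster at all. The plain-Python example below is only an illustration of the idea (it does not use Spark or Hadoop APIs): a hot key is first split across several salted partial counters, then the salt is dropped and the partials are merged. The name `salted_count` and the `num_salts` parameter are hypothetical.

```python
import random
from collections import Counter

def salted_count(keys, num_salts=4, rng=random):
    """Count keys in two stages, spreading each key over num_salts buckets first."""
    # Stage 1: attach a random salt so that a single hot key is split
    # across up to num_salts partial counters instead of one.
    partial = Counter()
    for key in keys:
        salt = rng.randrange(num_salts)
        partial[(key, salt)] += 1
    # Stage 2: drop the salt and merge the partial counts into final totals.
    total = Counter()
    for (key, _salt), count in partial.items():
        total[key] += count
    return total
```

In a distributed job, stage 1 would correspond to the first shuffle on the salted key and stage 2 to a second, much smaller aggregation on the original key.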
Big Data Technology & Architecture
Nov 3, 2019 · Big Data

Understanding Spark Shuffle and Smart Shuffle: Design, Implementation, and Performance Analysis

This article explains the evolution of Spark Shuffle from hash‑based to sort‑based, introduces the Smart Shuffle optimization, details their implementations and configurations, and presents performance comparisons using TPC‑DS benchmarks, highlighting significant speedups and reduced I/O overhead.

Big Data · Shuffle · Smart Shuffle
0 likes · 7 min read
Big Data Technology & Architecture
Jun 9, 2019 · Big Data

Optimizing Spark Shuffle: Can Fetch, Efficient Fetch, and Reliable Fetch

This article analyzes three Spark shuffle bottlenecks—oversized partitions that exceed Netty's 2 GB limit, excessive retry latency caused by dead executors, and insufficient data‑corruption checks—and presents concrete configuration changes, new block identifiers, executor‑liveness checks, and CRC‑32 verification to improve fetchability, efficiency, and reliability at scale.

Big Data · Shuffle · Spark
0 likes · 18 min read
Big Data Technology & Architecture
May 30, 2019 · Big Data

Data Skew Optimization Techniques in Spark

This article explains the phenomenon, causes, detection methods, and a comprehensive set of solutions—including Hive preprocessing, key filtering, shuffle parallelism, two‑stage aggregation, map‑join, sampling, random prefixing, and combined strategies—to mitigate data skew in Spark jobs and improve performance.

Big Data · Data Skew · Shuffle
0 likes · 31 min read
Big Data Technology & Architecture
May 28, 2019 · Big Data

Optimizing Flink Shuffle: New Flow‑Control Mechanism, Serialization Improvements, and Architecture Refactoring

The article explains how Flink's shuffle pipeline—from upstream data serialization to downstream consumption—is optimized through a credit‑based flow‑control mechanism, zero‑copy network buffers, broadcast serialization reduction, external shuffle service, and a plugin‑based shuffle manager, resulting in significant performance gains for both streaming and batch jobs.

Big Data · Flink · Flow Control
0 likes · 15 min read
58 Tech
Mar 15, 2019 · Big Data

Optimizing Spark Join Operations in Spark Core and Spark SQL

This article explains how to improve Spark join performance by reducing shuffle, using appropriate partitioners, applying broadcast hash joins for small tables, and selecting the optimal join strategy (broadcast, shuffle hash, or sort‑merge) in both Spark Core and Spark SQL.

JOIN · Shuffle · Spark
0 likes · 6 min read
Youzan Coder
Mar 8, 2019 · Big Data

Why Spark Shuffle Often Runs Out of Memory and How to Fix It

This article examines Spark's memory management and the shuffle process, identifies the components that consume the most memory during shuffle write and read, analyzes common OOM scenarios such as task concurrency and data skew, and offers configuration tips to prevent out‑of‑memory failures.

MemoryManagement · OutOfMemory · Shuffle
0 likes · 14 min read
Sohu Tech Products
Feb 13, 2019 · Big Data

Evolution and Implementation Details of Spark Shuffle Mechanisms

This article examines the historical evolution of Spark's shuffle implementations—from early Hash‑Based Shuffle to modern SortShuffleWriter, BypassMergeSortShuffleWriter, and UnsafeShuffleWriter—explaining their design choices, selection criteria, and the corresponding shuffle reader architecture in a production‑grade Spark 2.1.1 environment.

Big Data · Shuffle · Shuffle Writer
0 likes · 13 min read
21CTO
May 17, 2018 · Big Data

Understanding Hadoop MapReduce and YARN: Architecture, Shuffle, and Scaling

This article explains Hadoop's core components, the MapReduce programming model, the detailed shuffle and merge processes, and how YARN replaces the classic JobTracker/TaskTracker architecture to improve scalability and resource utilization in large‑scale data processing clusters.

Hadoop · Shuffle · YARN
0 likes · 12 min read
ITPUB
Mar 29, 2018 · Big Data

Demystifying Hadoop: MapReduce, Shuffle, and YARN Architecture

This article explains Hadoop’s core components, the MapReduce programming model, the detailed shuffle and merge processes, and how YARN replaces the classic JobTracker/TaskTracker design to improve scalability and resource utilization in large‑scale data processing clusters.

Big Data · Hadoop · MapReduce
0 likes · 15 min read
dbaplus Community
Aug 21, 2017 · Big Data

How to Tackle Spark Data Skew: Practical Solutions and Real‑World Examples

This article explains what Spark data skew is, why it hurts performance, and presents six practical mitigation techniques—including adjusting parallelism, custom partitioners, map‑side joins, and adding random prefixes—backed by detailed experiments, code snippets, and performance comparisons.

Data Skew · Map-side Join · Partitioner
0 likes · 18 min read
Baidu Tech Salon
Jan 13, 2015 · Big Data

Inside Spark 1.2: New APIs, In‑Memory Columnar Storage, and Baidu’s High‑Performance Shuffle

This article reviews Spark 1.2’s major enhancements—including the External Data Source API, column pruning, predicate pushdown, and in‑memory columnar storage—while also detailing Baidu’s large‑scale Spark deployments, its custom high‑performance Shuffle service, and the integration of Spark with the Tachyon memory file system.

Baidu · Big Data · External Data Source API
0 likes · 16 min read