Spark and Flink Optimization Guide: Parallelism, GC Tuning, Memory Settings, and Production Configurations
This article provides a comprehensive guide on optimizing Spark and Flink workloads, covering parallelism settings, garbage‑collection tuning, out‑of‑memory mitigation, and full production‑grade configuration examples for both frameworks.
Part 1: Spark Optimization
1. Parallelism
Hudi defaults the input partition parallelism to 1500 to keep each Spark partition under the 2 GB limit (the limit was removed after Spark 2.4). If the input is larger, adjust the shuffle parallelism via hoodie.[insert|upsert|bulkinsert].shuffle.parallelism to at least inputDataSize/500MB.
6. GC Tuning
Follow Spark's GC tuning guidelines to avoid OutOfMemory errors. Use G1 or CMS collectors and add the following options to spark.executor.extraJavaOptions:
-XX:NewSize=1g -XX:SurvivorRatio=2 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hoodie-heapdump.hprof7. OutOfMemory Errors
If an OOM occurs, you can mitigate it by setting spark.memory.fraction=0.2 and spark.memory.storageFraction=0.2, allowing memory to spill instead of crashing, though performance may degrade.
8. Full Production Configuration
spark.driver.extraClassPath /etc/hive/conf
spark.driver.extraJavaOptions -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hoodie-heapdump.hprof
spark.driver.maxResultSize 2g
spark.driver.memory 4g
spark.executor.cores 1
spark.executor.extraJavaOptions -XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hoodie-heapdump.hprof
spark.executor.id driver
spark.executor.instances 300
spark.executor.memory 6g
spark.rdd.compress true
spark.kryoserializer.buffer.max 512m
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled true
spark.sql.hive.convertMetastoreParquet false
spark.submit.deployMode cluster
spark.task.cpus 1
spark.task.maxFailures 4
spark.yarn.driver.memoryOverhead 1024
spark.yarn.executor.memoryOverhead 3072
spark.yarn.max.executor.failures 100Part 2: Flink Optimization
1. Available Flink Parameters
1. Table Parameters
Memory
2. Parallelism
Parallelism
3. Compaction
Compaction (only applicable to online compaction)
2. Flink Memory Optimization
1. Memory Parameters
Adjust memory-related settings as needed (see accompanying images for reference).
2. MOR (Merge-On-Read)
Switch the state backend to RocksDB (the default in‑memory backend consumes a lot of memory).
If memory permits, increase compaction.max_memory from the default 100 MB up to 1 GB.
Ensure the TaskManager (TM) allocates enough memory for each write task; for example, with a 4 GB TM running two StreamWriteFunction s, each can get ~2 GB, leaving buffer for network and other tasks.
Monitor compaction.max_memory and compaction.tasks to control memory usage and concurrency of compaction tasks.
Note: write.task.max.size - compaction.max_memory reserves buffer memory for each write task.
3. COW (Copy‑On‑Write)
Switch the state backend to RocksDB.
Increase both write.task.max.size and write.merge.max_memory (default 1 GB and 100 MB, can be raised to 2 GB and 1 GB).
Ensure TM memory is sufficient for each write task, leaving buffers for network and other task types.
Note: write.task.max.size - write.merge.max_memory reserves buffer memory for each write task.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
