Why Is Airflow Draining CPU? A Step‑by‑Step Diagnosis and Fix

A high‑CPU anomaly on a Spark‑enabled machine was traced through application checks, network TIME_WAIT analysis, and Airflow inspection. Kernel tweaks cleared the TIME_WAIT pile‑up but not the CPU load; an Airflow configuration change finally restored normal CPU usage.

Data Thinking Notes

1. Symptoms

Machine A runs Spark Master, Airflow, Hive, Sqoop and other heavy workloads, so its memory and CPU usage are normally high. Over the past three days, however, CPU usage stayed above 95% for most of the day, notably even after 18:00, when few Spark tasks are running.
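A figure like "above 95%" is a busy percentage over some sampling window. As a minimal sketch, assuming a Linux host, the same number can be derived from two samples of the aggregate `cpu` line in `/proc/stat`: busy time is everything except `idle` and `iowait`. The helper names below are illustrative, not part of any tool used in the original investigation.

```python
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(value) for value in fields[1:]]


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()


def busy_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent busy between two samples of the 'cpu' line."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    # Counters: user nice system idle iowait irq softirq steal [guest guest_nice].
    # guest time is already included in user/nice, so only the first 8 are summed.
    total = sum(a[:8]) - sum(b[:8])
    idle = (a[3] + a[4]) - (b[3] + b[4])  # idle + iowait count as not busy
    if total <= 0:
        return 0.0
    return 100.0 * (total - idle) / total
```

On the host itself, call `read_cpu_line()`, wait a second with `time.sleep(1)`, call it again, and pass both lines to `busy_percent`.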

2. Investigation Process

2.1 Check Applications

At around 09:30, while CPU usage was high, five SparkSubmit jobs were running. No abnormal applications were found, and no single application was consuming excessive CPU or memory.
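One way to run this check is to group `ps` output by workload instead of reading it process by process. The sketch below assumes output from `ps -eo pcpu,args`; the substrings in `MARKERS` are hypothetical and would need adjusting to the actual command lines on the host. Keep in mind that `ps` reports %CPU as CPU time divided by process lifetime, so it can understate a short burst that `top` would show.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical markers: substrings assumed to appear in each workload's command line.
# The first match wins, so more specific markers come first.
MARKERS: Tuple[Tuple[str, str], ...] = (
    ("SparkSubmit", "spark-submit"),
    ("org.apache.spark.deploy.master.Master", "spark-master"),
    ("airflow", "airflow"),
    ("hive", "hive"),
    ("sqoop", "sqoop"),
)


def classify(args: str) -> str:
    """Map a process command line to a workload label."""
    for marker, label in MARKERS:
        if marker in args:
            return label
    return "other"


def cpu_by_workload(ps_output: str) -> Dict[str, float]:
    """Sum %CPU per workload from `ps -eo pcpu,args` output."""
    totals: Dict[str, float] = defaultdict(float)
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the "%CPU COMMAND" header
        if not line.strip():
            continue
        pcpu, _, args = line.strip().partition(" ")
        totals[classify(args)] += float(pcpu)
    return dict(totals)
```

Sorting the returned dictionary by value gives a quick per‑workload ranking, which is easier to compare across time of day than a raw process list.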

2.2 Check Network Connections

netstat showed a large number of connections in the TIME_WAIT state, more than 3,700, most of them to MySQL on hadoop11. To let the kernel reuse these sockets sooner, the following kernel parameters were set in /etc/sysctl.conf:

net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1

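After applying them with /sbin/sysctl -p, the number of TIME_WAIT connections returned to normal, but CPU usage stayed high, so network sockets were not the cause. Note that net.ipv4.tcp_tw_recycle can break connections from clients behind NAT and was removed in Linux 4.12; on current kernels only net.ipv4.tcp_tw_reuse applies.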
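To confirm that the TIME_WAIT count really drops after `sysctl -p`, it helps to count those sockets per remote host before and after the change. A minimal sketch, assuming the text output of `netstat -ant` (numeric addresses); the IP addresses in the test are made up.

```python
from collections import Counter


def time_wait_by_host(netstat_output: str) -> Counter:
    """Count TIME_WAIT sockets per remote host in `netstat -ant` output."""
    counts: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips the banner, the column header and non-TCP rows
        if fields[5] != "TIME_WAIT":
            continue
        # Split on the last colon so IPv6 addresses keep their own colons.
        host, _, _port = fields[4].rpartition(":")
        counts[host] += 1
    return counts
```

The input can be captured with `subprocess.run(["netstat", "-ant"], capture_output=True, text=True).stdout`, and `counts.most_common(5)` then shows which remote hosts dominate.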

2.3 Check Airflow

On machine A (hadoop16), Airflow is the only process that connects to MySQL on hadoop11, so the TIME_WAIT pile‑up pointed to Airflow. Airflow runs webserver, scheduler, master, and worker processes and uses the CeleryExecutor with a parallelism of 16.

(1) Confirm Airflow as the cause

Stopping Airflow made CPU usage drop, and as soon as Airflow was started again, CPU usage spiked back up. This confirmed that Airflow was responsible but did not reveal the root cause.

(2) Research similar issues

References include a StackOverflow discussion and the Airflow documentation on min_file_process_interval.

(3) Apply the fix

In airflow.cfg, min_file_process_interval in the [scheduler] section was set to 10 (seconds). After Airflow was restarted, CPU usage returned to normal, consistent with the scheduler now rescanning DAG files only at the new interval.
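As far as we know, Airflow 1.10.x shipped with min_file_process_interval = 0, meaning the scheduler re‑parsed every DAG file on each loop, and Airflow 2.0 raised the default to 30 seconds. Because Airflow also reads overrides from environment variables named AIRFLOW__{SECTION}__{KEY}, an edit to airflow.cfg can be silently ignored. The sketch below resolves the value that is most likely in effect; it uses only the standard library, and `effective_setting` is a hypothetical helper, not an Airflow API. It deliberately ignores Airflow's `_cmd`/`_secret` variants and built‑in defaults.

```python
import configparser
import os
from typing import Mapping, Optional


def effective_setting(cfg_text: str, section: str, key: str,
                      env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the value Airflow would most likely use for [section] key.

    Environment variables AIRFLOW__{SECTION}__{KEY} take precedence over
    airflow.cfg; _cmd/_secret variants and built-in defaults are ignored here.
    """
    env = os.environ if env is None else env
    env_name = f"AIRFLOW__{section.upper()}__{key.upper()}"
    if env_name in env:
        return env[env_name]
    # interpolation=None: real airflow.cfg files contain '%' in log formats.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get(section, key, fallback=None)
```

For example, with a config containing `[scheduler]` and `min_file_process_interval = 10`, the function returns "10" unless AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL is set, in which case the environment value wins.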

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
