
Why Is Airflow Draining CPU? A Step‑by‑Step Diagnosis and Fix

A high‑CPU anomaly on a Spark‑enabled machine was traced through application checks, network TIME_WAIT analysis, and Airflow inspection, leading to kernel tweaks and an Airflow configuration change that finally restored normal CPU usage.

Data Thinking Notes

1. Symptoms

Machine A runs Spark Master, Airflow, Hive, Sqoop and other heavy workloads, so its memory and CPU usage are normally high. Over the past three days, however, CPU usage stayed above 95% for most of the day, especially after 18:00, even though few Spark tasks run at that time.

2. Investigation Process

2.1 Check Applications

At around 09:30 the CPU was high while five SparkSubmit tasks were running; no abnormal applications were found and no single app showed excessive CPU or memory consumption.
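A per-process check like the one described above can be sketched in Python. This is a minimal illustration, not the commands the author ran: it assumes a Linux host with the procps `ps` command, and the helper names `parse_ps`, `top_cpu`, and `snapshot` are hypothetical. Note that the `%CPU` column reported by `ps` is averaged over each process's lifetime, so a tool such as `top` is a better fit for short spikes.

```python
import subprocess


def parse_ps(ps_output):
    """Parse the output of `ps -eo pid,pcpu,pmem,args` into a list of dicts."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)  # pid, %cpu, %mem, full command line
        if len(parts) < 4:
            continue
        pid, pcpu, pmem, args = parts
        rows.append(
            {"pid": int(pid), "cpu": float(pcpu), "mem": float(pmem), "args": args.strip()}
        )
    return rows


def top_cpu(ps_output, n=5):
    """Return the n processes with the highest %CPU, highest first."""
    return sorted(parse_ps(ps_output), key=lambda r: r["cpu"], reverse=True)[:n]


def snapshot():
    """Run ps once and return its raw text output (assumes procps on Linux)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,pmem,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `top_cpu(snapshot())` returns the heaviest processes at that moment. If no single entry stands out, as was the case here, the load is probably spread across many short-lived processes.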

2.2 Check Network Connections

netstat revealed many TIME_WAIT connections, mainly to MySQL on hadoop11, exceeding 3,700 connections. The kernel parameters were adjusted in /etc/sysctl.conf:

<code>net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1</code>

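After applying the change with /sbin/sysctl -p, TCP connections normalized but CPU remained high, indicating the issue was not caused by network sockets. Note that net.ipv4.tcp_tw_recycle is known to break connections from clients behind NAT and was removed in Linux 4.12, so on newer kernels only tcp_tw_reuse is available.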
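The TIME_WAIT tally from this step can be reproduced with a short script that groups connections by remote endpoint. This is a sketch under assumptions: it expects the column layout printed by `netstat -ant` on Linux, and the function names `count_by_peer` and `netstat_snapshot` are hypothetical rather than taken from the original investigation.

```python
import subprocess
from collections import Counter


def count_by_peer(netstat_output, state="TIME_WAIT"):
    """Count TCP connections in the given state, grouped by remote address:port.

    Expects `netstat -ant` style rows:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner and header lines and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != state:
            continue
        counts[fields[4]] += 1  # fields[4] is the foreign (remote) address
    return counts


def netstat_snapshot():
    """Run netstat once and return its raw text output (assumes net-tools is installed)."""
    result = subprocess.run(
        ["netstat", "-ant"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

`count_by_peer(netstat_snapshot()).most_common(5)` lists the remote endpoints holding the most TIME_WAIT sockets, which is how a single dominant peer such as the MySQL port would stand out.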

2.3 Check Airflow

Machine A (hadoop16) connects to MySQL on hadoop11 only via Airflow. Airflow runs webserver, scheduler, master, and worker processes, using CeleryExecutor with a parallelism of 16.

(1) Confirm Airflow as the cause

Restarting Airflow temporarily lowered CPU usage, but it spiked again once Airflow was back up. This confirmed that Airflow was driving the load, although a restart alone did not fix the root problem.

(2) Research similar issues

References include a StackOverflow discussion and the Airflow documentation on min_file_process_interval.

(3) Apply fix

The Airflow configuration file airflow.cfg was updated:

<code>min_file_process_interval = 10</code>

After restarting Airflow, CPU usage returned to normal, which is consistent with the scheduler now re-parsing each DAG file at most once every 10 seconds instead of continuously.
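To confirm that the new value is actually present in the file Airflow reads, the setting can be checked with Python's standard `configparser`. This is a minimal sketch: it assumes the option lives under the `[scheduler]` section, which the original does not state, and the function name `file_process_interval` is hypothetical.

```python
import configparser


def file_process_interval(cfg_text, section="scheduler"):
    """Return min_file_process_interval from airflow.cfg text, or None if unset.

    Assumption: the option is defined in the [scheduler] section.
    """
    # Interpolation is disabled because airflow.cfg can contain literal
    # '%' characters (for example in log format strings).
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.getint(section, "min_file_process_interval", fallback=None)
```

Reading the real file is a matter of passing its contents, for example `file_process_interval(open(path).read())`, where `path` points to the airflow.cfg in use. A return value of `None` means the option is not set in that section and Airflow is falling back to its built-in default.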

Written by

Data Thinking Notes

Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.
