Cloud Native

Building a CI/CD Pipeline with GitHub Actions for Testing and Deploying Apache Airflow DAGs to Amazon MWAA

This guide explains how to create a robust GitHub Actions CI/CD workflow that automatically tests Apache Airflow DAGs using pytest, flake8, and Black, then securely deploys them to Amazon Managed Workflows for Apache Airflow (MWAA) with optional Git hooks and fork‑and‑pull collaboration models.

DevOps Cloud Academy

Introduction

In this article we will learn how to build an effective CI/CD workflow with GitHub Actions for our Apache Airflow DAGs. Using DevOps concepts of continuous integration and continuous delivery, we will automatically test and deploy Airflow DAGs to Amazon Managed Workflows for Apache Airflow (Amazon MWAA) on AWS.

Technologies

Apache Airflow

According to the documentation, Apache Airflow is an open‑source platform for programmatically authoring, scheduling, and monitoring workflows. With Airflow you create workflows as directed acyclic graphs (DAGs) written in Python.

Amazon Managed Workflows for Apache Airflow (MWAA)

Amazon MWAA is a highly available, secure, fully managed service for orchestrating Apache Airflow workflows. MWAA automatically scales execution capacity and integrates with AWS security services for fast, secure data access.

GitHub Actions

GitHub Actions makes CI/CD automation easy. It allows you to build, test, and deploy code directly from GitHub, triggered by events such as pushes, issue creation, or releases, and you can leverage community‑maintained actions.

Glossary

DataOps

DataOps is an automated, process‑oriented approach that data teams use to improve data analysis quality and shorten cycle time. It applies agile methods across the entire data lifecycle, from preparation to reporting.

DevOps

DevOps combines software development (Dev) and IT operations (Ops) practices to shorten system development lifecycles and enable continuous delivery of high‑quality software.

DevOps is a set of practices aimed at shortening the time between committing a change and that change being in production while ensuring high quality. – Wikipedia

Fast Failure

A fast‑failure system reports any condition that may indicate a fault immediately, allowing errors to be discovered early in the software development life cycle (SDLC) rather than in production.
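As an illustration of the principle (a hypothetical validator, not part of the demo repository), a DAG module can fail fast by rejecting an incomplete configuration at import time instead of letting the error surface when a task eventually runs:

```python
def validate_default_args(default_args):
    """Fail fast: raise at import time if required DAG settings are missing,
    rather than discovering the problem when a task finally executes."""
    required = ("owner", "retries")
    missing = [key for key in required if key not in default_args]
    if missing:
        raise ValueError(f"default_args missing required keys: {missing}")
    return default_args
```

Because Airflow imports every DAG file when it parses the dags folder, a raised exception here surfaces immediately as an import error, the earliest possible point of failure.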

Source Code

All source code for this demo, including GitHub Actions, pytest unit tests, and Git hooks, is open‑source on GitHub.

Architecture

The diagram below shows the architecture used in a recent blog post and video demo, where Apache Airflow programmatically loads data from Amazon Redshift and uploads it to an Amazon S3‑based data lake.

We will review how earlier DAGs were developed, tested, and deployed to MWAA using increasingly effective CI/CD workflows. The demonstrated workflow can also be applied to other Airflow resources such as SQL scripts, configuration files, Python requirements, and plugins.

Workflows

No DevOps

This minimal viable workflow loads a DAG directly into Amazon MWAA without applying CI/CD principles. Changes are made locally, copied to an S3 bucket, and automatically synced to MWAA. The changes are also (ideally) pushed back to a central Git repository.

The workflow has two major problems: the DAG can become out of sync between S3 and GitHub, and there is no fast‑failure DevOps concept, so errors may only be discovered after the DAG is imported into MWAA.
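The out‑of‑sync problem can at least be detected mechanically. A minimal stdlib sketch (the directory layout and the *.py filter are assumptions, not part of the demo repository) compares file hashes between two copies of a dags folder:

```python
import hashlib
from pathlib import Path

def dir_digests(root):
    """Map each *.py file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*.py")
    }

def find_drift(local_root, synced_root):
    """Return relative paths whose contents differ, or that exist on one side only."""
    local, synced = dir_digests(local_root), dir_digests(synced_root)
    return sorted(k for k in local.keys() | synced.keys()
                  if local.get(k) != synced.get(k))
```

A non‑empty result means the two copies have drifted, which is exactly the failure mode the CI/CD workflows below are designed to prevent.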

GitHub Actions

Compared with the previous workflow, a major improvement is using GitHub Actions to test and deploy code after it is pushed to GitHub. Although the code is still pushed directly to the main branch, the chance of a faulty DAG reaching MWAA is greatly reduced.

GitHub Actions also eliminate human error that could cause the DAG not to sync to S3, and they remove the need for Airflow developers to have direct access to the S3 bucket, improving security.

Test Types

The first GitHub Action, test_dags.yml, triggers on pushes to the dags directory and on pull‑request events targeting the main branch. It runs a series of tests: Python dependency checks, code style, code quality, DAG import errors, and unit tests. These tests catch problems before the second Action syncs the DAG to S3.

name: Test DAGs

on:
  push:
    paths:
      - 'dags/**'
  pull_request:
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.7'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements/requirements.txt
          pip check
      - name: Lint with Flake8
        run: |
          pip install flake8
          flake8 --ignore E501 dags --benchmark -v
      - name: Confirm Black code compliance (psf/black)
        run: |
          pip install pytest-black
          pytest dags --black -v
      - name: Test with Pytest
        run: |
          pip install pytest
          cd tests || exit
          pytest tests.py -v

Python Dependencies

This test installs the modules listed in requirements.txt and checks for missing or conflicting packages.

- name: Install dependencies
  run: |
    python -m pip install --upgrade pip
    pip install -r requirements/requirements.txt
    pip check

It is essential to develop DAGs with the same Python version and module versions as the Airflow environment. You can retrieve the Python version and installed modules inside Airflow with:

python3 --version; python3 -m pip list

Airflow’s latest stable version is 2.2.2 (released 2021‑11‑15). At the time of writing, Amazon MWAA runs version 2.0.2 (released 2021‑04‑19) with Python 3.7.10.
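The same pinned‑version discipline can be checked programmatically. A minimal sketch, assuming requirements lines use the name==version form (the helper name is hypothetical and not part of the demo repository):

```python
from importlib.metadata import version, PackageNotFoundError

def check_pinned(requirements_lines):
    """Return (package, pinned, installed) tuples for every pinned requirement
    whose installed version is missing or does not match the pin."""
    mismatches = []
    for line in requirements_lines:
        line = line.strip()
        # Skip blanks, comments, and anything not pinned with "==".
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches
```

Run against the contents of requirements.txt, an empty result means the local environment matches the pins; anything else is a candidate for the version skew that pip check also guards against.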

Flake8

Flake8 is a modular source‑code checker that enforces style consistency according to PEP 8. In this demo we ignore rule E501 (line length) to keep the example concise.

- name: Lint with Flake8
  run: |
    pip install flake8
    flake8 --ignore E501 dags --benchmark -v

Black

Black is an uncompromising code formatter that makes all Python code look the same, speeding up code review. The repository uses a pre‑commit Git hook to run Black before committing.

- name: Confirm Black code compliance (psf/black)
  run: |
    pip install pytest-black
    pytest dags --black -v

pytest

pytest is a mature, full‑featured testing framework for Python. The test_dags.yml action runs the tests.py file, which contains several unit tests that verify DAG import, naming conventions, tags, owners, retry limits, and more.

import os
import sys
import pytest
from airflow.models import DagBag

sys.path.append(os.path.join(os.path.dirname(__file__), "../dags"))
sys.path.append(os.path.join(os.path.dirname(__file__), "../dags/utilities"))

os.environ["AIRFLOW_VAR_DATA_LAKE_BUCKET"] = "test_bucket"
# ... other environment variables ...

@pytest.fixture(params=["../dags/"])
def dag_bag(request):
    return DagBag(dag_folder=request.param, include_examples=False)

def test_no_import_errors(dag_bag):
    assert not dag_bag.import_errors
# ... additional tests omitted for brevity ...
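Convention checks need not depend on Airflow at all. A hypothetical naming‑convention helper (the pattern and ids below are illustrative, not from the demo repository) can be tested as a plain function alongside the DagBag tests:

```python
import re

# Assumed convention: DAG ids must be lowercase snake_case.
DAG_ID_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")

def dag_id_is_valid(dag_id):
    """Return True if the DAG id matches the lowercase snake_case convention."""
    return bool(DAG_ID_PATTERN.fullmatch(dag_id))

def test_dag_ids_follow_convention():
    for dag_id in ("redshift_to_s3", "data_lake_load"):
        assert dag_id_is_valid(dag_id)
    assert not dag_id_is_valid("Redshift-To-S3")
```

Keeping such checks free of Airflow imports makes them fast and lets them run anywhere pytest is installed.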

Fork & Pull

Two collaborative development models are recommended:

Shared‑repository model: create feature branches that are reviewed and merged into the main branch.

Fork‑and‑pull model: fork the repository, make changes, open a pull request, and merge after approval and successful tests.

The fork‑and‑pull model greatly reduces the chance of bad code being merged before all tests pass.

Sync DAGs to S3

The second GitHub Action sync_dags.yml runs after test_dags.yml completes successfully (or after a pull request is merged) and syncs the dags folder to an S3 bucket.

name: Sync DAGs

on:
  workflow_run:
    workflows:
      - 'Test DAGs'
    types:
      - completed
  pull_request:
    types:
      - closed

jobs:
  deploy:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' || github.event.pull_request.merged == true }}
    steps:
      - uses: actions/checkout@master
      - uses: jakejarvis/s3-sync-action@master
        env:
          AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_REGION: 'us-east-1'
          SOURCE_DIR: 'dags'
          DEST_DIR: 'dags'

The action requires three encrypted GitHub secrets (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_S3_BUCKET), which must be created in the repository settings.
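A small pre‑flight check can confirm the required secrets are actually exposed to the job before any sync step runs. This stdlib‑only helper is a hypothetical addition, not part of the demo repository:

```python
import os

# The three secrets consumed by the s3-sync step.
REQUIRED_SECRETS = ("AWS_S3_BUCKET", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")

def missing_secrets(environ=os.environ):
    """Return the names of required secrets that are absent or empty."""
    return [name for name in REQUIRED_SECRETS if not environ.get(name)]
```

Failing the job early with a clear message when a secret is missing is another application of the fast‑failure principle described above.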

Local Testing and Git Hooks

To further improve the CI/CD pipeline, use Git hooks to run tests locally before pushing code. A pre‑push hook can execute the same test suite used in the GitHub Action, preventing bad code from ever reaching the remote repository.

#!/bin/sh
# do nothing if there are no commits to push
if [ -z "$(git log @{u}.. )" ]; then
  exit 0
fi
sh ./run_tests_locally.sh

Make the hook executable:

chmod 755 .git/hooks/pre-push

The run_tests_locally.sh script runs flake8, Black, and pytest locally:

#!/bin/sh
echo "Starting Flake8 test..."
flake8 --ignore E501 dags --benchmark || exit 1
echo "Starting Black test..."
python3 -m pytest dags/ --black --cache-clear -v || exit 1
echo "Starting Pytest tests..."
cd tests || exit
python3 -m pytest tests.py -v || exit 1
echo "All tests completed successfully! 🥳"

References

Testing Airflow DAGs (documentation)

Testing Airflow code (YouTube video)

GitHub: Building and Testing Python (documentation)

Manning: Chapter 9 – Data Pipelines with Apache Airflow

Tags: CI/CD, DevOps, AWS, DataOps, GitHub Actions, Apache Airflow, MWAA