
How to Reverse Engineer Docker Images into Dockerfiles with Dive and Dedockify

This tutorial explains how to dissect Docker images, extract layer information using Dive and the Docker Engine API, and automatically reconstruct a functional Dockerfile with the open‑source Dedockify tool, covering simple examples, multi‑stage builds, and practical recovery steps.

MaGe Linux Operations

Introduction

As public Docker registries such as Docker Hub become ubiquitous, developers and administrators often pull images from unknown sources, treating them as black boxes without verifying their safety or the Dockerfile that produced them. Rebuilding a Dockerfile from an existing image is largely possible because each layer's metadata records the command that created it, although some details, such as the names of copied source files, are not preserved.

Using Dive

Dive is a visual tool that inspects each layer of a Docker image. To demonstrate, we create a minimal Dockerfile that copies three empty test files into a scratch base image, build it as example1, and then run Dive to explore the layer contents.

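mkdir -p "$HOME/test1"
cd "$HOME/test1"
# Create the three empty files that the Dockerfile copies
touch testfile1 testfile2 testfile3
cat > Dockerfile <<EOF
FROM scratch
COPY testfile1 /
COPY testfile2 /
COPY testfile3 /
EOF

# Build the image and tag it example1
docker build . -t example1

# Run Dive from its container image, giving it access to the Docker socket
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive:latest example1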

Dive shows each layer, the files it added, and the command that created it (e.g., #(nop) COPY file:… in /). The destination paths are visible in the layer contents, but the source filenames in the COPY commands are replaced by content hashes.

Docker History

The built‑in docker history command also lists each layer’s CreatedBy entry. Using the --no-trunc flag reveals the full command strings, which can be parsed to reconstruct the Dockerfile steps.
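As a rough sketch of that parsing step, the snippet below shells out to docker history and returns the untruncated CreatedBy strings, oldest layer first. It assumes the docker CLI is on the PATH; the helper names history_argv and created_by_lines are made up for this illustration.

```python
import subprocess


def history_argv(image):
    """Build the docker history command line for one image."""
    return ["docker", "history", "--no-trunc",
            "--format", "{{.CreatedBy}}", image]


def created_by_lines(image):
    """Return the full CreatedBy strings of an image, oldest layer first."""
    result = subprocess.run(history_argv(image), check=True,
                            capture_output=True, text=True)
    # docker history prints the newest layer first, so reverse the lines
    # to get them in Dockerfile order.
    return list(reversed(result.stdout.splitlines()))


# Example (requires a local image named example1):
#   for line in created_by_lines("example1"):
#       print(line)
```

Keeping the command construction in its own function makes it easy to check the flags without a running Docker daemon.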

Using the Python Docker Engine API

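The Docker SDK for Python (the docker package) can query image history programmatically. The following script prints the history of a given image:

#!/usr/bin/python3
# Requires the Docker SDK for Python: pip install docker
import docker

# Low-level client that talks to the local Docker daemon over its Unix socket
cli = docker.APIClient(base_url='unix://var/run/docker.sock')
# history() returns a list of dictionaries, one per layer, newest first
print(cli.history('example1'))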

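The output is a list of dictionaries, one per layer, with fields such as CreatedBy, Id, and Tags. The list starts with the newest layer, so the steps must be reversed to match Dockerfile order. After stripping the #(nop) markers from the CreatedBy strings, we can generate a readable Dockerfile.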
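The reversal and #(nop) handling can be sketched as follows. This is a minimal illustration, not Dedockify's actual code: it assumes the classic builder's CreatedBy format, where metadata steps are prefixed with "/bin/sh -c #(nop) " and RUN steps with "/bin/sh -c ". The function names are hypothetical.

```python
NOP_PREFIX = "/bin/sh -c #(nop) "
RUN_PREFIX = "/bin/sh -c "


def to_instruction(created_by):
    """Turn one CreatedBy string into a Dockerfile instruction."""
    created_by = created_by.strip()
    if created_by.startswith(NOP_PREFIX):
        # Metadata steps (COPY, CMD, ENV, ...) already hold the instruction.
        return created_by[len(NOP_PREFIX):].strip()
    if created_by.startswith(RUN_PREFIX):
        # Anything else executed through the shell was a RUN instruction.
        return "RUN " + created_by[len(RUN_PREFIX):].strip()
    # Unknown format: pass it through unchanged.
    return created_by


def history_to_steps(history):
    """Convert history entries (newest first) to Dockerfile steps (oldest first)."""
    return [to_instruction(entry["CreatedBy"])
            for entry in reversed(history)
            if entry.get("CreatedBy")]
```

Passing the list returned by cli.history('example1') to history_to_steps would yield the COPY lines in build order, still with hashed source names.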

Dedockify

Dedockify is an open‑source script that automates the above process. It retrieves the image history via the Docker API, parses each entry, reverses the command list, and prints a reconstructed Dockerfile. Example output for example1:

FROM example1:latest
COPY file:e3c862873fa89cbf... in /
COPY file:2a949ad55eee33f... in /
COPY file:aa717ff85b39d3ed... in /

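Note that the FROM line above is not correct: example1 was built from scratch, so its history contains no tagged base image, and the image's own tag ends up in the FROM line. When the image is built on a tagged base image (e.g., ubuntu:latest), Dedockify emits the proper FROM line, as shown with example2.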

Testing Dedockify Limitations

We build a more realistic Dockerfile that uses ubuntu:latest as the base, creates directories, and copies files. After building example2, Dedockify reproduces the Dockerfile almost exactly, confirming that explicit base images are recovered correctly.
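One way to picture how a base image can be recovered from the history is sketched below. It assumes the entries are ordered newest first, that the image being inspected is tagged, and that the top layer of the base image still carries its tag in the history, which is not guaranteed on every Docker version or builder. The function name and the fallback to scratch are illustrative choices, not Dedockify's implementation.

```python
def split_at_base(history):
    """Return (base_image, own_entries) for history entries ordered newest first."""
    own_entries = []
    seen_own_tag = False
    for entry in history:
        if entry.get("Tags"):
            if seen_own_tag:
                # A second tagged entry marks the top layer of the base image.
                return entry["Tags"][0], own_entries
            # The first tagged entry is the inspected image itself.
            seen_own_tag = True
        own_entries.append(entry)
    # No tagged base layer found: assume the image was built from scratch.
    return "scratch", own_entries
```

Only the entries returned in own_entries need to be turned into Dockerfile steps; everything below the tagged base layer is covered by the FROM line.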

Arbitrary Dockerfile Reconstruction

We load a pre‑encoded image (example3) directly into Docker, then run Dedockify to obtain a skeleton Dockerfile. The script identifies the WORKDIR changes and COPY commands, but the source filenames in the COPY commands remain hashed. By inspecting the image with Dive, we locate the actual files in each layer (zero‑byte test files and a small hello binary) and manually adjust the Dockerfile.
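The same file lookup can be done without Dive by reading an archive produced with docker save (for example, docker save example3 -o example3.tar). The sketch below lists the regular files added by each layer. It assumes the archive contains a manifest.json whose first entry has a Layers list of paths to layer tarballs; the function name layer_files is hypothetical.

```python
import json
import tarfile


def layer_files(archive_path):
    """Map each layer path in a docker save archive to the files it adds."""
    result = {}
    with tarfile.open(archive_path) as image_tar:
        manifest = json.load(image_tar.extractfile("manifest.json"))
        for layer_path in manifest[0]["Layers"]:
            # Each layer is itself a tar archive stored inside the image archive.
            layer_obj = image_tar.extractfile(layer_path)
            with tarfile.open(fileobj=layer_obj) as layer:
                result[layer_path] = [member.name
                                      for member in layer.getmembers()
                                      if member.isfile()]
    return result


# Example (requires an archive created with docker save):
#   for layer, files in layer_files("example3.tar").items():
#       print(layer, files)
```

Matching each layer's file list against the hashed COPY steps, in order, gives the names needed to fill in the skeleton Dockerfile.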

After editing, the final Dockerfile looks like:

FROM scratch
WORKDIR /testdir1
COPY testfile1 .
WORKDIR /testdir2
COPY testfile2 .
WORKDIR /testdir3
COPY testfile3 .
WORKDIR /app
COPY hello .
ENTRYPOINT ["/app/hello"]

Building this Dockerfile reproduces an image equivalent to the original example3 in contents and behavior, as verified by running the container and by re‑examining it with Dive.

Postscript

Future work could extend Dedockify to automatically extract file contents from each layer, infer the correct base image (scratch vs. another image), and handle multi‑stage builds more intelligently, ultimately providing a fully automated Docker image reverse‑engineering pipeline.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
