How to Reverse Engineer Docker Images into Dockerfiles with Dive and Dedockify
This tutorial explains how to dissect Docker images, extract layer information using Dive and the Docker Engine API, and automatically reconstruct a functional Dockerfile with the open‑source Dedockify tool, covering simple examples, multi‑stage builds, and practical recovery steps.
Introduction
As public Docker registries such as Docker Hub have become ubiquitous, developers and administrators often pull images from unknown sources and treat them as black boxes, without verifying their safety or seeing the Dockerfile that produced them. Much of a Dockerfile can nevertheless be rebuilt from an existing image, because the build steps are recorded in the image's history metadata and the resulting files are stored in its layers.
Using Dive
Dive is a visual tool that inspects each layer of a Docker image. To demonstrate, we create a minimal Dockerfile that copies three empty test files into a scratch base image, build it as example1, and then run Dive to explore the layer contents.
mkdir -p $HOME/test1
cd $HOME/test1
touch testfile1 testfile2 testfile3
cat > Dockerfile <<EOF
FROM scratch
COPY testfile1 /
COPY testfile2 /
COPY testfile3 /
EOF
docker build . -t example1
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive:latest example1
Dive shows each layer, the files added, and the commands that created the layer (e.g., #(nop) COPY file:… in /), although the original filenames are hashed.
Docker History
The built‑in docker history command also lists each layer’s CreatedBy entry. Using the --no-trunc flag reveals the full command strings, which can be parsed to reconstruct the Dockerfile steps.
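The parsing step can be sketched in Python. This is a minimal illustration, not Dedockify's actual code: the function name created_by_to_instruction is hypothetical, and it assumes the classic builder format in which metadata steps are recorded as "/bin/sh -c #(nop) ..." and shell steps as "/bin/sh -c ...". Images built with BuildKit may record CreatedBy differently, in which case the string is returned unchanged.

```python
NOP_PREFIX = "/bin/sh -c #(nop) "
SHELL_PREFIX = "/bin/sh -c "


def created_by_to_instruction(created_by: str) -> str:
    """Convert one CreatedBy string into a Dockerfile-style instruction.

    Assumption: the classic builder format described in the lead-in.
    """
    text = created_by.strip()
    if text.startswith(NOP_PREFIX):
        # Metadata-only steps (COPY, ENV, CMD, ...) follow the "#(nop)" marker.
        return text[len(NOP_PREFIX):].strip()
    if text.startswith(SHELL_PREFIX):
        # Anything else run through the shell corresponds to a RUN step.
        return "RUN " + text[len(SHELL_PREFIX):].strip()
    # Unknown format: leave it untouched rather than guess.
    return text


print(created_by_to_instruction("/bin/sh -c #(nop) COPY file:abc123 in / "))
print(created_by_to_instruction("/bin/sh -c mkdir -p /testdir1"))
```

The file:abc123 value is a made-up placeholder; real entries carry a content hash in that position.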
Using the Python Docker Engine API
Docker provides a Python client library to query image history programmatically. The following script prints the history of a given image:
#!/usr/bin/python3
import docker
cli = docker.APIClient(base_url='unix://var/run/docker.sock')
print(cli.history('example1'))
The output contains dictionaries with fields such as CreatedBy, Id, and Tags. By reversing the order of the steps and handling #(nop) markers, we can generate a readable Dockerfile.
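The reversal can be illustrated without a running Docker daemon. The sketch below uses hand-written sample data shaped like the dictionaries described above; the Id values and file hashes are invented for illustration, and history_to_steps is a hypothetical helper name. The history is listed newest layer first, so reversing it yields build order.

```python
def history_to_steps(history):
    """Return CreatedBy strings in build order (oldest step first)."""
    # History entries arrive newest first, so iterate over them in reverse.
    return [entry.get("CreatedBy", "") for entry in reversed(history)]


# Invented sample data; real output comes from cli.history('example1').
sample_history = [
    {"Id": "sha256:ccc", "Tags": ["example1:latest"],
     "CreatedBy": "/bin/sh -c #(nop) COPY file:cc in / "},
    {"Id": "<missing>", "Tags": None,
     "CreatedBy": "/bin/sh -c #(nop) COPY file:bb in / "},
    {"Id": "<missing>", "Tags": None,
     "CreatedBy": "/bin/sh -c #(nop) COPY file:aa in / "},
]

for step in history_to_steps(sample_history):
    print(step)
```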
Dedockify
Dedockify is an open‑source script that automates the above process. It retrieves the image history via the Docker API, parses each entry, reverses the command list, and prints a reconstructed Dockerfile. Example output for example1:
FROM example1:latest
COPY file:e3c862873fa89cbf... in /
COPY file:2a949ad55eee33f... in /
COPY file:aa717ff85b39d3ed... in /
When the original base image is known (e.g., ubuntu:latest), Dedockify correctly emits the proper FROM line, as shown with example2.
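One plausible way to choose the FROM line is to look at the Tags field of each history entry. The heuristic below is an assumption made for illustration and is not confirmed to be Dedockify's logic: guess_base_image is a hypothetical name, the fallback to scratch when no tags exist is a guess, and the approach only works when the base image is still present and tagged on the local machine. It does reproduce the behaviour shown above, where an image with no tagged ancestor yields its own tag.

```python
def guess_base_image(history):
    """Guess a FROM value from history entries listed newest first."""
    tags_seen = []
    for entry in history:
        tags = entry.get("Tags") or []
        if tags:
            tags_seen.append(tags[0])
    # The first tagged entry is the inspected image itself. A second tagged
    # entry, if any, is treated as an ancestor image that exists locally.
    if len(tags_seen) >= 2:
        return tags_seen[1]
    # Assumption: with a single tag, fall back to it; with none, use scratch.
    return tags_seen[0] if tags_seen else "scratch"


# Invented sample data for the two cases discussed in the text.
only_self = [{"Tags": ["example1:latest"]}, {"Tags": None}]
with_base = [{"Tags": ["example2:latest"]}, {"Tags": None},
             {"Tags": ["ubuntu:latest"]}, {"Tags": None}]

print("FROM", guess_base_image(only_self))
print("FROM", guess_base_image(with_base))
```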
Testing Dedockify Limitations
We build a more realistic Dockerfile that uses ubuntu:latest as the base, creates directories, and copies files. After building example2, Dedockify reproduces the Dockerfile almost exactly, confirming that explicit base images are recovered correctly.
Arbitrary Dockerfile Reconstruction
We load a pre‑encoded image (example3) directly into Docker, then run Dedockify to obtain a skeleton Dockerfile. The script identifies WORKDIR changes and COPY commands, but original filenames remain hashed. By inspecting the image with Dive, we locate the actual files (zero‑byte test files and a small hello binary) and manually adjust the Dockerfile.
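The file lookup done with Dive can also be approximated in a script, since each layer is a tar archive. The sketch below lists the regular files in one layer using only the standard library. It is an illustration under assumptions: list_layer_files and make_demo_layer are hypothetical names, the demo archive is synthetic, and finding the layer archives inside the output of docker save is left out because that layout varies between Docker versions.

```python
import io
import tarfile


def list_layer_files(layer_tar_bytes: bytes):
    """Return (path, size) pairs for the regular files in one layer archive."""
    with tarfile.open(fileobj=io.BytesIO(layer_tar_bytes)) as layer:
        return [(m.name, m.size) for m in layer.getmembers() if m.isfile()]


def make_demo_layer() -> bytes:
    """Build a tiny in-memory tar archive standing in for a real layer."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in (("testdir1/testfile1", b""), ("app/hello", b"\x7fELF")):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


for path, size in list_layer_files(make_demo_layer()):
    print(path, size)
```

Matching these paths and sizes against the hashed COPY entries, one layer at a time, gives the filenames needed to fill in the skeleton.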
After editing, the final Dockerfile looks like:
FROM scratch
WORKDIR /testdir1
COPY testfile1 .
WORKDIR /testdir2
COPY testfile2 .
WORKDIR /testdir3
COPY testfile3 .
WORKDIR /app
COPY hello .
ENTRYPOINT ["/app/hello"]
Building this Dockerfile reproduces an image identical to the original example3, as verified by running the container and by re‑examining it with Dive.
Postscript
Future work could extend Dedockify to automatically extract file contents from each layer, infer the correct base image (scratch vs. another image), and handle multi‑stage builds more intelligently, ultimately providing a fully automated Docker image reverse‑engineering pipeline.
MaGe Linux Operations
