Reverse‑Engineer Docker Images into Dockerfiles with Dive and Dedockify
This guide explains how to dissect Docker images, inspect their layers with Dive, extract build history via Docker commands and the Python Docker Engine API, and automatically reconstruct a functional Dockerfile using the open‑source Dedockify tool, complete with code snippets and practical examples.
Introduction
Public Docker registries make it easy to pull images from unknown sources, turning containers into black boxes whose provenance and security are hard to verify. By examining a Docker image’s internal layers, we can recover most of the information needed to rebuild its Dockerfile.
Using Dive
Dive is a visual tool that inspects each layer of an image. The workflow starts with a minimal Dockerfile, builds an image, and then runs Dive to explore the resulting layers.
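mkdir -p "$HOME/test1"
cd "$HOME/test1"
# Create the three files the Dockerfile copies; without them the build fails
touch testfile1 testfile2 testfile3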
cat > Dockerfile << EOF
FROM scratch
COPY testfile1 /
COPY testfile2 /
COPY testfile3 /
EOF
docker build . -t example1
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive:latest example1
Dive shows the three COPY commands, the hash of each added file, and the directory tree for each layer.
Docker History
The built-in docker history command lists the command that created each layer, newest first. Adding --no-trunc reveals the full #(nop) COPY … lines, which are essential for reconstructing the Dockerfile.
docker history example1 --no-trunc
# output shows full COPY commands with hashed file identifiers
Python Docker Engine API
Docker provides a Python client for the Engine API. The following script queries an image’s history and prints the result, a list of dictionaries with one record per layer.
#!/usr/bin/python3
import docker
cli = docker.APIClient(base_url='unix://var/run/docker.sock')
print(cli.history('example1'))
The output contains the CreatedBy strings that correspond to the original Dockerfile instructions.
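Because the history is listed newest layer first, reading it as a build script means reversing it. The following is a minimal sketch of that step, assuming records that carry CreatedBy and Size keys as in the Engine API history response. The helper name build_order and the sample records (including the file:aaa style identifiers) are hypothetical, so the snippet runs without a Docker daemon.

```python
def build_order(history):
    """Return (CreatedBy, Size) pairs, oldest layer first.

    `history` is a list of dicts shaped like the result of
    docker.APIClient.history(), which lists the newest layer first.
    """
    return [(entry.get("CreatedBy", ""), entry.get("Size", 0))
            for entry in reversed(history)]


# Hypothetical sample records; real ones also carry Id, Created, Comment, Tags.
sample_history = [
    {"CreatedBy": "/bin/sh -c #(nop) COPY file:ccc in / ", "Size": 0},
    {"CreatedBy": "/bin/sh -c #(nop) COPY file:bbb in / ", "Size": 0},
    {"CreatedBy": "/bin/sh -c #(nop) COPY file:aaa in / ", "Size": 0},
]

# Print one line per layer in the order the layers were built.
for created_by, size in build_order(sample_history):
    print(f"{size:>8}  {created_by.strip()}")
```

To use it against a real image, pass the list returned by cli.history('example1') instead of sample_history.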
Dedockify
Dedockify is a Python utility that parses the history data, reverses the command order, and prints a reconstructed Dockerfile. The core logic extracts #(nop) lines as Dockerfile directives and formats RUN statements.
from sys import argv
import docker

class ImageNotFound(Exception):
    pass

class MainObj:
    def __init__(self):
        self.commands = []
        self.cli = docker.APIClient(base_url='unix://var/run/docker.sock')
        self._get_image(argv[-1])
        self.hist = self.cli.history(self.img['RepoTags'][0])
        self._parse_history()
        self.commands.reverse()
        self._print_commands()
    # ... (methods omitted for brevity) ...

__main__ = MainObj()
Rebuilding Dockerfiles
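The step that turns history records back into Dockerfile lines can be sketched as two small functions. This is an illustrative sketch rather than Dedockify's actual code: the names step_to_instruction and history_to_dockerfile, the rule that the first tagged record below the top one marks the base image, the fallback to scratch, and the sample records are all assumptions made for this example.

```python
NOP_MARKER = "#(nop) "


def step_to_instruction(created_by):
    """Turn one CreatedBy string into a Dockerfile line."""
    if NOP_MARKER in created_by:
        # Metadata-only steps (COPY, CMD, WORKDIR, ...) follow the marker.
        return created_by.split(NOP_MARKER, 1)[1].strip()
    # Everything else was executed in a shell, so treat it as a RUN step.
    return "RUN " + created_by.strip()


def history_to_dockerfile(history, default_base="scratch"):
    """Rebuild Dockerfile lines from history records (newest first).

    Assumption: a tagged record below the top one belongs to the base
    image. If none is found, `default_base` is used, which is only a guess.
    """
    base = default_base
    lines = []
    for index, entry in enumerate(history):
        tags = entry.get("Tags") or []
        if index > 0 and tags:
            base = tags[0]
            break
        lines.append(step_to_instruction(entry["CreatedBy"]))
    lines.append("FROM " + base)
    lines.reverse()  # history is newest first; a Dockerfile is oldest first
    return lines


# Hypothetical records shaped like docker.APIClient.history() output.
sample_history = [
    {"CreatedBy": "/bin/sh -c #(nop) COPY file:aaa in /testdir1 ",
     "Tags": ["example2:latest"]},
    {"CreatedBy": "/bin/sh -c mkdir testdir1", "Tags": None},
    {"CreatedBy": '/bin/sh -c #(nop)  CMD ["bash"]',
     "Tags": ["ubuntu:latest"]},
]

print("\n".join(history_to_dockerfile(sample_history)))
```

The scratch fallback is a design choice for the sketch: it gives the right answer for an image built FROM scratch, but it would be wrong whenever the real base image is simply not tagged locally.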
Running Dedockify on example1 yields a Dockerfile that correctly lists the three COPY commands but mistakenly assumes the base image is example1:latest instead of scratch. A more realistic example uses an ubuntu:latest base, which Dedockify reconstructs accurately:
$ python3 dedockify.py 05651f084d67
FROM ubuntu:latest
RUN /bin/sh -c mkdir testdir1
COPY file:cc4f6e89... in /testdir1
RUN /bin/sh -c mkdir testdir2
COPY file:a04cdcdf... in /testdir2
RUN /bin/sh -c mkdir testdir3
COPY file:2ed8ccde... in /testdir3
For a multi‑stage image (example3) the tool recovers the final stage’s commands, including WORKDIR and COPY of a compiled binary, allowing a full rebuild.
Limitations and Further Work
Dedockify cannot recover original file names (they are hashed) or multi‑stage build information that is not present in the final image. Future improvements could automate layer‑by‑layer analysis, extract files directly from containers, and infer the correct base image (e.g., scratch) to produce a fully functional Dockerfile without manual tweaks.
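One way to start on the layer-by-layer analysis is to read an archive produced by docker save and list the paths each layer adds. The sketch below is a starting point under stated assumptions, not part of Dedockify: it assumes the archive has a top-level manifest.json whose entries list their layer tarballs under a "Layers" key, the function name list_layer_files is hypothetical, and deletions recorded in a layer are not interpreted.

```python
import io
import json
import tarfile


def list_layer_files(archive):
    """Return [(layer_path, [member names])] for a `docker save` archive.

    `archive` is an open tarfile.TarFile. Assumes a top-level
    manifest.json whose entries name their layer tarballs under "Layers".
    """
    manifest = json.load(archive.extractfile("manifest.json"))
    result = []
    for image in manifest:
        for layer_path in image["Layers"]:
            # Each layer is itself a tar archive stored inside the outer one.
            layer_bytes = archive.extractfile(layer_path).read()
            with tarfile.open(fileobj=io.BytesIO(layer_bytes), mode="r:*") as layer:
                result.append((layer_path, layer.getnames()))
    return result


# Example usage, assuming example1.tar was written by
# `docker save example1 -o example1.tar`:
#
#     with tarfile.open("example1.tar") as archive:
#         for layer_path, names in list_layer_files(archive):
#             print(layer_path, names)
```

Matching the paths found in each layer against the COPY lines from the image history, in the same order, is one possible way to put destination names back next to the hashed file identifiers.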
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc.
