How to Parse Container Images in Go with go‑containerregistry

This article explains how to parse container images programmatically with Google's go‑containerregistry library. It covers the basic concepts (ImageIndex, Image Manifest, layers, and diffIDs) and shows, with Go snippets, how to retrieve image metadata, system packages, and Java application dependencies.

Alibaba Cloud Developer

Background

Container images are essential in modern development workflows. While developers usually build, push, and run images, security‑focused platforms need to scan and analyze the image itself. This article shows how to parse a container image programmatically using Go.

go‑containerregistry

go‑containerregistry is an open‑source Go library from Google that provides a high‑level API for accessing container images. The underlying resources can be remote registries, local tar files, or a Docker daemon. The project also ships the crane and gcrane command‑line tools for interacting with registries.

Basic Interfaces

Image concepts

ImageIndex – An OCI image index (a "manifest list" in Docker terms) that groups multiple platform‑specific images under a single tag.

Image Manifest – Describes a single image and lists the digests of all its layers.

Image Config – Holds metadata such as creation time, architecture, and the uncompressed layer diffIDs. The hash of the Config is the image ID shown by docker images.

Layer – A read‑only filesystem layer. Each layer has a compressed digest (used in registries) and an uncompressed diffID (used locally).
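The difference between a layer's digest and its diffID can be illustrated with the standard library alone. The following is a minimal sketch, assuming gzip compression and SHA‑256 hashing; the helper names sha256Hex and layerHashes and the sample bytes are hypothetical and are not part of go‑containerregistry.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sha256Hex returns the "sha256:<hex>" form used for digests and diffIDs.
func sha256Hex(b []byte) string {
	sum := sha256.Sum256(b)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// layerHashes gzips an uncompressed layer tar and returns both identifiers:
// digest is the hash of the compressed blob, diffID the hash of the raw tar.
func layerHashes(uncompressedTar []byte) (digest, diffID string, err error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err = zw.Write(uncompressedTar); err != nil {
		return "", "", err
	}
	if err = zw.Close(); err != nil {
		return "", "", err
	}
	return sha256Hex(buf.Bytes()), sha256Hex(uncompressedTar), nil
}

func main() {
	// Hypothetical stand-in for the bytes of a real layer tar.
	digest, diffID, err := layerHashes([]byte("pretend this is a layer tar"))
	if err != nil {
		panic(err)
	}
	fmt.Println("digest:", digest)
	fmt.Println("diffID:", diffID)
}
```

Because the digest depends on how the layer was compressed, the same filesystem content can have different digests in different registries, while its diffID stays the same.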

type ImageIndex interface {
    MediaType() (types.MediaType, error)
    Digest() (Hash, error)
    Size() (int64, error)
    IndexManifest() (*IndexManifest, error)
    RawManifest() ([]byte, error)
    Image(Hash) (Image, error)
    ImageIndex(Hash) (ImageIndex, error)
}

type Image interface {
    Layers() ([]Layer, error)
    MediaType() (types.MediaType, error)
    Size() (int64, error)
    ConfigName() (Hash, error)
    ConfigFile() (*ConfigFile, error)
    RawConfigFile() ([]byte, error)
    Digest() (Hash, error)
    Manifest() (*Manifest, error)
    RawManifest() ([]byte, error)
    LayerByDigest(Hash) (Layer, error)
    LayerByDiffID(Hash) (Layer, error)
}

type Layer interface {
    Digest() (Hash, error)
    DiffID() (Hash, error)
    Compressed() (io.ReadCloser, error)
    Uncompressed() (io.ReadCloser, error)
    Size() (int64, error)
    MediaType() (types.MediaType, error)
}

Fetching Image Metadata

remote.Get takes a parsed reference and fetches only the manifest, which may be a single image manifest or an index (manifest list); it does not download the full image. desc.Image() then resolves one image, picking from an index the entry for the requested platform (linux/amd64 by default, overridable with remote.WithPlatform), still without pulling layers. Data is loaded lazily, so further network traffic occurs only when a specific piece of information is accessed.

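package main

import (
    "fmt"

    "github.com/google/go-containerregistry/pkg/authn"
    "github.com/google/go-containerregistry/pkg/name"
    "github.com/google/go-containerregistry/pkg/v1/remote"
)

func main() {
    ref, err := name.ParseReference("xxx") // placeholder: substitute a real image reference
    if err != nil { panic(err) }
    // Fetch only the manifest; credentials are resolved from the local Docker config.
    desc, err := remote.Get(ref, remote.WithAuthFromKeychain(authn.DefaultKeychain))
    if err != nil { panic(err) }
    img, err := desc.Image() // v1.Image; layers are not downloaded yet
    if err != nil { panic(err) }
    digest, err := img.Digest()
    if err != nil { panic(err) }
    fmt.Println("image digest:", digest)
}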

Reading System Packages

After obtaining an Image instance, Image.LayerByDiffID returns a specific layer. Calling Layer.Uncompressed() yields an io.ReadCloser over the layer's tar stream, which can be walked to extract package information from files such as Alpine's apk database (lib/apk/db/installed).

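// tarOnceOpener returns a function that reads r fully on its first call and
// returns the cached bytes (or the cached error) on every later call.
func tarOnceOpener(r io.Reader) func() ([]byte, error) {
    var once sync.Once
    var b []byte
    var err error
    return func() ([]byte, error) {
        once.Do(func() { b, err = io.ReadAll(r) })
        if err != nil { return nil, fmt.Errorf("unable to read tar file: %w", err) }
        return b, nil
    }
}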

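// WalkLayerTar walks the tar stream of one layer and hands each entry to analyzeFn.
// The two returned slices carry the opaque directories and whiteout files recorded
// during the walk. WalkFunc is the caller-supplied callback type (not shown here).
func WalkLayerTar(layer io.Reader, analyzeFn WalkFunc) ([]string, []string, error) {
    var opqDirs, whFiles []string
    tr := tar.NewReader(layer)
    for {
        hdr, err := tr.Next()
        if errors.Is(err, io.EOF) { break }
        if err != nil { return nil, nil, fmt.Errorf("failed to extract the archive: %w", err) }
        _ = hdr // process hdr.Name, handle whiteout files, call analyzeFn(...)
    }
    return opqDirs, whFiles, nil
}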
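As an illustration of extracting package information from a layer, the sketch below looks for Alpine's installed‑package database in a tar stream and reads the P: (name) and V: (version) fields of each record. It is a simplified example under stated assumptions, not the code used by any particular scanner: apkPackage, parseAPKInstalled, findAPKPackages, and sampleLayer are hypothetical names, and the sample package data is made up for the demo.

```go
package main

import (
	"archive/tar"
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

type apkPackage struct{ Name, Version string }

// parseAPKInstalled reads records separated by blank lines and keeps
// the P: (package name) and V: (version) fields of each record.
func parseAPKInstalled(r io.Reader) ([]apkPackage, error) {
	var pkgs []apkPackage
	var cur apkPackage
	flush := func() {
		if cur.Name != "" {
			pkgs = append(pkgs, cur)
		}
		cur = apkPackage{}
	}
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "":
			flush()
		case strings.HasPrefix(line, "P:"):
			cur.Name = line[2:]
		case strings.HasPrefix(line, "V:"):
			cur.Version = line[2:]
		}
	}
	flush() // the last record may not end with a blank line
	return pkgs, sc.Err()
}

// findAPKPackages walks a layer tar and parses the apk database if present.
// It returns nil when the layer does not contain the database.
func findAPKPackages(layer io.Reader) ([]apkPackage, error) {
	tr := tar.NewReader(layer)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return nil, nil
		}
		if err != nil {
			return nil, fmt.Errorf("reading layer tar: %w", err)
		}
		if strings.TrimPrefix(hdr.Name, "./") == "lib/apk/db/installed" {
			return parseAPKInstalled(tr)
		}
	}
}

// sampleLayer builds a tiny in-memory layer tar with made-up package data.
func sampleLayer() (*bytes.Buffer, error) {
	db := "P:musl\nV:1.2.4-r2\n\nP:busybox\nV:1.36.1-r5\n"
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	hdr := &tar.Header{Name: "lib/apk/db/installed", Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(db))}
	if err := tw.WriteHeader(hdr); err != nil {
		return nil, err
	}
	if _, err := tw.Write([]byte(db)); err != nil {
		return nil, err
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	layer, err := sampleLayer()
	if err != nil {
		panic(err)
	}
	pkgs, err := findAPKPackages(layer)
	if err != nil {
		panic(err)
	}
	for _, p := range pkgs {
		fmt.Println(p.Name, p.Version)
	}
}
```

With a real image, the reader returned by Layer.Uncompressed() would take the place of sampleLayer().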

Handling Whiteout Files

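Image layers record deletions with whiteout files, a convention inherited from AUFS and defined in the OCI image layer specification:

.wh..wh..opq – Marks the containing directory as opaque: everything that lower layers placed in that directory is hidden.

.wh. prefix – Marks a single file or directory from a lower layer as deleted; for example, .wh.foo removes foo.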

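These entries are recorded and then skipped during analysis: a whiteout carries no content of its own, but the recorded paths tell the analyzer which files from lower layers no longer exist in the final filesystem.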
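The whiteout rules can be sketched as a small classifier over tar headers. This is an illustrative example that uses only the standard library; layerScan, scanLayer, and sampleLayer are hypothetical names, and the entry paths in main are invented for the demo.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
	"path"
	"strings"
)

const (
	opaqueMarker   = ".wh..wh..opq"
	whiteoutPrefix = ".wh."
)

// layerScan is a hypothetical result type for this sketch.
type layerScan struct {
	OpaqueDirs []string // directories whose lower-layer contents are hidden
	Deleted    []string // paths removed by a .wh.<name> entry
	Files      []string // regular files actually present in this layer
}

// scanLayer classifies every entry of one layer tar.
func scanLayer(layer io.Reader) (layerScan, error) {
	var s layerScan
	tr := tar.NewReader(layer)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return s, nil
		}
		if err != nil {
			return s, fmt.Errorf("reading layer tar: %w", err)
		}
		dir, base := path.Split(path.Clean(hdr.Name))
		switch {
		case base == opaqueMarker: // must be checked before the generic prefix
			s.OpaqueDirs = append(s.OpaqueDirs, path.Clean(dir))
		case strings.HasPrefix(base, whiteoutPrefix):
			s.Deleted = append(s.Deleted, path.Join(dir, strings.TrimPrefix(base, whiteoutPrefix)))
		case hdr.Typeflag == tar.TypeReg:
			s.Files = append(s.Files, path.Join(dir, base))
		}
	}
}

// sampleLayer builds an in-memory tar of empty regular files with the given names.
func sampleLayer(names ...string) (*bytes.Buffer, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, n := range names {
		hdr := &tar.Header{Name: n, Typeflag: tar.TypeReg, Mode: 0o644}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	layer, err := sampleLayer("etc/.wh..wh..opq", "var/log/.wh.old.log", "app/main.jar")
	if err != nil {
		panic(err)
	}
	s, err := scanLayer(layer)
	if err != nil {
		panic(err)
	}
	fmt.Printf("opaque=%v deleted=%v files=%v\n", s.OpaqueDirs, s.Deleted, s.Files)
}
```

An analyzer that processes layers from bottom to top could use Deleted and OpaqueDirs to drop results it collected from earlier layers.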

Analyzing Java Applications

The same layer‑reading approach can open JAR files, initialize a ZIP reader, and parse MANIFEST.MF, pom.properties, or other metadata to determine artifact coordinates and versions.

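// parseArtifact reads a JAR from r and extracts library metadata from it.
// conf and types.Library are defined elsewhere in the analyzer and are not shown here.
func parseArtifact(c conf, fileName string, r io.ReadCloser) ([]types.Library, error) {
    defer r.Close()
    b, err := io.ReadAll(r) // zip.NewReader needs random access, so buffer the whole JAR
    if err != nil { return nil, fmt.Errorf("unable to read the jar file: %w", err) }
    zr, err := zip.NewReader(bytes.NewReader(b), int64(len(b)))
    if err != nil { return nil, fmt.Errorf("zip error: %w", err) }
    var libs []types.Library
    for _, f := range zr.File {
        _ = f // handle pom.properties, MANIFEST.MF, nested jars, etc.
    }
    return libs, nil
}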
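To make the pom.properties step concrete, here is a minimal standard‑library sketch that opens a JAR held in memory and reads Maven coordinates from any pom.properties entry. The names mavenCoord, parsePomProperties, and sampleJar are hypothetical, the coordinates com.example:demo:1.0.0 are invented for the demo, and nested JARs and MANIFEST.MF are deliberately left out.

```go
package main

import (
	"archive/zip"
	"bufio"
	"bytes"
	"fmt"
	"path"
	"strings"
)

type mavenCoord struct{ GroupID, ArtifactID, Version string }

// parsePomProperties scans a JAR for pom.properties entries and returns
// the groupId, artifactId, and version found in each one.
func parsePomProperties(jar []byte) ([]mavenCoord, error) {
	zr, err := zip.NewReader(bytes.NewReader(jar), int64(len(jar)))
	if err != nil {
		return nil, fmt.Errorf("opening jar as zip: %w", err)
	}
	var coords []mavenCoord
	for _, f := range zr.File {
		if path.Base(f.Name) != "pom.properties" {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, fmt.Errorf("opening %s: %w", f.Name, err)
		}
		var c mavenCoord
		sc := bufio.NewScanner(rc)
		for sc.Scan() {
			key, val, ok := strings.Cut(strings.TrimSpace(sc.Text()), "=")
			if !ok {
				continue // comment or blank line
			}
			switch strings.TrimSpace(key) {
			case "groupId":
				c.GroupID = strings.TrimSpace(val)
			case "artifactId":
				c.ArtifactID = strings.TrimSpace(val)
			case "version":
				c.Version = strings.TrimSpace(val)
			}
		}
		rc.Close()
		if err := sc.Err(); err != nil {
			return nil, fmt.Errorf("reading %s: %w", f.Name, err)
		}
		coords = append(coords, c)
	}
	return coords, nil
}

// sampleJar builds a tiny in-memory JAR with a made-up pom.properties.
func sampleJar() ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create("META-INF/maven/com.example/demo/pom.properties")
	if err != nil {
		return nil, err
	}
	props := "#Generated by Maven\ngroupId=com.example\nartifactId=demo\nversion=1.0.0\n"
	if _, err := w.Write([]byte(props)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	jar, err := sampleJar()
	if err != nil {
		panic(err)
	}
	coords, err := parsePomProperties(jar)
	if err != nil {
		panic(err)
	}
	for _, c := range coords {
		fmt.Printf("%s:%s:%s\n", c.GroupID, c.ArtifactID, c.Version)
	}
}
```

A JAR without pom.properties yields an empty result here; a fuller analyzer would fall back to MANIFEST.MF or the file name in that case.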

References

https://github.com/google/go-containerregistry

https://github.com/aquasecurity/fanal
