How to Parse Container Images in Go with go‑containerregistry
This article explains how to parse container images programmatically with Google's go‑containerregistry library. It covers basic concepts such as ImageIndex, Image Manifest, layers, and diffIDs, and uses Go code examples to show how to retrieve image metadata, system packages, and Java application dependencies.
Background
Container images are essential in modern development workflows. While developers usually build, push, and run images, security‑focused platforms need to scan and analyze the image itself. This article shows how to parse a container image programmatically using Go.
go‑containerregistry
go‑containerregistry is an open‑source Go library from Google that provides a high‑level API for accessing container images. The underlying resources can be remote registries, local tar files, or a Docker daemon. The project also ships the crane and gcrane command‑line tools for interacting with registries.
Basic Interfaces
Image concepts
ImageIndex – An OCI ImageIndex that groups multiple architecture‑specific images under a single tag.
Image Manifest – Describes a single image and lists the digests of all its layers.
Image Config – Holds metadata such as creation time, architecture, and the uncompressed layer diffIDs. The hash of the config is the image ID shown by docker images.
Layer – A read‑only filesystem layer. Each layer has a compressed digest (used in registries) and an uncompressed diffID (used locally).
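The difference between a layer's digest and its diffID can be illustrated with the standard library alone. The sketch below is not part of go‑containerregistry: the helper names sha256Hex and gzipBytes are made up for this example, and it assumes a gzip-compressed layer, which is the common case. The diffID is the SHA-256 of the uncompressed tar bytes, while the digest is the SHA-256 of the compressed blob stored in the registry.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sha256Hex returns the "sha256:<hex>" form used for image hashes.
func sha256Hex(b []byte) string {
	sum := sha256.Sum256(b)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// gzipBytes compresses b, standing in for the way a registry stores a layer blob.
func gzipBytes(b []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(b); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Placeholder bytes; a real layer would be a tar archive.
	layerTar := []byte("pretend this is an uncompressed layer tarball")
	blob, err := gzipBytes(layerTar)
	if err != nil {
		panic(err)
	}
	fmt.Println("diffID:", sha256Hex(layerTar)) // hash of the uncompressed content
	fmt.Println("digest:", sha256Hex(blob))     // hash of the compressed blob
}
```

Because the two hashes are computed over different bytes, the same layer content can have different digests under different compression settings while its diffID stays the same.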
type ImageIndex interface {
	MediaType() (types.MediaType, error)
	Digest() (Hash, error)
	Size() (int64, error)
	IndexManifest() (*IndexManifest, error)
	RawManifest() ([]byte, error)
	Image(Hash) (Image, error)
	ImageIndex(Hash) (ImageIndex, error)
}

type Image interface {
	Layers() ([]Layer, error)
	MediaType() (types.MediaType, error)
	Size() (int64, error)
	ConfigName() (Hash, error)
	ConfigFile() (*ConfigFile, error)
	RawConfigFile() ([]byte, error)
	Digest() (Hash, error)
	Manifest() (*Manifest, error)
	RawManifest() ([]byte, error)
	LayerByDigest(Hash) (Layer, error)
	LayerByDiffID(Hash) (Layer, error)
}

type Layer interface {
	Digest() (Hash, error)
	DiffID() (Hash, error)
	Compressed() (io.ReadCloser, error)
	Uncompressed() (io.ReadCloser, error)
	Size() (int64, error)
	MediaType() (types.MediaType, error)
}
Fetching Image Metadata
remote.Get resolves a parsed reference to a descriptor by fetching only the manifest (an image index in the case of a multi-platform tag); it does not download the image itself. desc.Image() then resolves the image for the configured platform (linux/amd64 by default, overridable with remote.WithPlatform), still without pulling layers. All data is loaded lazily, so network traffic occurs only when a specific piece of information is accessed.
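package main

import (
	"context"
	"fmt"

	"github.com/google/go-containerregistry/pkg/authn"
	"github.com/google/go-containerregistry/pkg/name"
	"github.com/google/go-containerregistry/pkg/v1/remote"
)

func main() {
	ref, err := name.ParseReference("xxx") // "xxx" is a placeholder image reference
	if err != nil { panic(err) }
	// Fetch only the manifest descriptor; credentials come from the default keychain.
	desc, err := remote.Get(ref, remote.WithContext(context.TODO()), remote.WithAuthFromKeychain(authn.DefaultKeychain))
	if err != nil { panic(err) }
	img, err := desc.Image() // img is a v1.Image; no layers have been pulled yet
	if err != nil { panic(err) }
	digest, err := img.Digest()
	if err != nil { panic(err) }
	fmt.Println(digest)
}
Reading System Packages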
After obtaining an Image instance, Image.LayerByDiffID returns a specific layer. Calling Layer.Uncompressed() yields an io.ReadCloser over the layer's tar stream, which can be walked to extract package information from files such as the apk installed-package database (lib/apk/db/installed).
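As a self-contained illustration of that walk, the sketch below builds a tiny layer tarball in memory and extracts package names and versions from lib/apk/db/installed. It uses only the standard library; the Package type, the helper names, and the sample package versions are invented for this example, and the parser handles only the P: (name) and V: (version) fields of the apk database. With go‑containerregistry, the reader passed to packagesFromLayer would be the result of Layer.Uncompressed().

```go
package main

import (
	"archive/tar"
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// apkDBPath is where Alpine records installed packages.
const apkDBPath = "lib/apk/db/installed"

// Package is a hypothetical result type for this sketch.
type Package struct{ Name, Version string }

// parseAPKInstalled reads blank-line-separated records in which
// "P:" carries the package name and "V:" the version.
func parseAPKInstalled(r io.Reader) ([]Package, error) {
	var pkgs []Package
	var cur Package
	flush := func() {
		if cur.Name != "" {
			pkgs = append(pkgs, cur)
		}
		cur = Package{}
	}
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "":
			flush()
		case strings.HasPrefix(line, "P:"):
			cur.Name = strings.TrimPrefix(line, "P:")
		case strings.HasPrefix(line, "V:"):
			cur.Version = strings.TrimPrefix(line, "V:")
		}
	}
	flush()
	return pkgs, sc.Err()
}

// packagesFromLayer walks an uncompressed layer tar stream and parses
// the apk database if this layer contains it.
func packagesFromLayer(layer io.Reader) ([]Package, error) {
	tr := tar.NewReader(layer)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil, nil // the database is not in this layer
		}
		if err != nil {
			return nil, err
		}
		if strings.TrimPrefix(hdr.Name, "./") == apkDBPath {
			return parseAPKInstalled(tr)
		}
	}
}

// buildLayer creates a tiny in-memory layer tarball for the demo.
func buildLayer(files map[string]string) (io.Reader, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for name, body := range files {
		hdr := &tar.Header{Name: name, Typeflag: tar.TypeReg, Mode: 0644, Size: int64(len(body))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(body)); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	// Sample data only; the versions are made up.
	layer, err := buildLayer(map[string]string{
		"etc/os-release": "ID=alpine\n",
		apkDBPath:        "P:musl\nV:1.2.4-r2\n\nP:busybox\nV:1.36.1-r5\n",
	})
	if err != nil {
		panic(err)
	}
	pkgs, err := packagesFromLayer(layer)
	if err != nil {
		panic(err)
	}
	for _, p := range pkgs {
		fmt.Printf("%s %s\n", p.Name, p.Version)
	}
}
```

A real scanner would run this over every layer and keep the result from the topmost layer that contains the database, since later layers override earlier ones.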
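import (
	"archive/tar"
	"io"
	"sync"

	"golang.org/x/xerrors"
)

// tarOnceOpener returns a function that reads r at most once and caches the result.
func tarOnceOpener(r io.Reader) func() ([]byte, error) {
	var once sync.Once
	var b []byte
	var err error
	return func() ([]byte, error) {
		once.Do(func() { b, err = io.ReadAll(r) })
		if err != nil { return nil, xerrors.Errorf("unable to read tar file: %w", err) }
		return b, nil
	}
}

// WalkLayerTar iterates over every entry of an uncompressed layer tar stream.
// WalkFunc is the per-file callback type, defined elsewhere (not shown here).
// The two string slices presumably carry the opaque directories and whiteout files found.
func WalkLayerTar(layer io.Reader, analyzeFn WalkFunc) ([]string, []string, error) {
	tr := tar.NewReader(layer)
	for {
		hdr, err := tr.Next()
		if err == io.EOF { break }
		if err != nil { return nil, nil, xerrors.Errorf("failed to extract the archive: %w", err) }
		_ = hdr // placeholder: process hdr.Name, handle whiteout files, call analyzeFn(...)
	}
	return nil, nil, nil
}
Handling Whiteout Files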
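Layer tarballs use special whiteout files, a convention that originated with AUFS, to represent deletions:
.wh..wh..opq – An opaque marker: everything that lower layers placed in this directory is hidden.
.wh. prefix – Marks an individual file as deleted in the current layer (for example, .wh.foo deletes foo).
These entries are recorded and skipped during analysis because the deletion is logical, not physical: the marked content still exists in the lower layers.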
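The naming rules above can be sketched as a small classifier over tar entry names. The function name classify and its string return values are made up for this example; it only inspects the path and does not touch the filesystem.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

const (
	whiteoutPrefix = ".wh."
	opaqueMarker   = ".wh..wh..opq"
)

// classify reports whether a tar entry name is an opaque-directory marker,
// a whiteout, or a regular entry, and returns the path it affects.
func classify(name string) (kind, target string) {
	dir, base := path.Split(path.Clean(name))
	switch {
	case base == opaqueMarker:
		// Checked first, because the marker also starts with ".wh.".
		return "opaque", path.Clean(dir)
	case strings.HasPrefix(base, whiteoutPrefix):
		return "whiteout", path.Join(dir, strings.TrimPrefix(base, whiteoutPrefix))
	default:
		return "file", path.Clean(name)
	}
}

func main() {
	for _, name := range []string{"etc/.wh..wh..opq", "app/.wh.old.jar", "usr/bin/app"} {
		kind, target := classify(name)
		fmt.Printf("%-18s %-9s %s\n", name, kind, target)
	}
}
```

An analyzer can collect the "opaque" and "whiteout" targets per layer and use them to discard matching paths found in lower layers.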
Analyzing Java Applications
The same layer‑reading approach can open JAR files, initialize a ZIP reader, and parse MANIFEST.MF, pom.properties, or other metadata to determine artifact coordinates and versions.
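A minimal, standard-library-only sketch of the pom.properties part of that idea is shown here. The Artifact type and the helper names are invented for this example, the sample coordinates are made up, and MANIFEST.MF parsing and nested JARs are left out.

```go
package main

import (
	"archive/zip"
	"bufio"
	"bytes"
	"fmt"
	"io"
	"path"
	"strings"
)

// Artifact holds Maven coordinates; a hypothetical type for this sketch.
type Artifact struct {
	GroupID, ArtifactID, Version string
}

// parsePomProperties reads key=value lines, skipping blanks and comments.
func parsePomProperties(r io.Reader) (Artifact, error) {
	var a Artifact
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		kv := strings.SplitN(line, "=", 2)
		if len(kv) != 2 {
			continue
		}
		value := strings.TrimSpace(kv[1])
		switch strings.TrimSpace(kv[0]) {
		case "groupId":
			a.GroupID = value
		case "artifactId":
			a.ArtifactID = value
		case "version":
			a.Version = value
		}
	}
	return a, sc.Err()
}

// artifactsFromJar opens JAR bytes as a ZIP archive and collects the
// coordinates from every pom.properties entry it contains.
func artifactsFromJar(jar []byte) ([]Artifact, error) {
	zr, err := zip.NewReader(bytes.NewReader(jar), int64(len(jar)))
	if err != nil {
		return nil, err
	}
	var out []Artifact
	for _, f := range zr.File {
		if path.Base(f.Name) != "pom.properties" {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		a, err := parsePomProperties(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		out = append(out, a)
	}
	return out, nil
}

// buildJar writes a minimal in-memory JAR for the demo.
func buildJar(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, body := range files {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := io.WriteString(w, body); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	jar, err := buildJar(map[string]string{
		"META-INF/MANIFEST.MF":                           "Manifest-Version: 1.0\n",
		"META-INF/maven/com.example/demo/pom.properties": "groupId=com.example\nartifactId=demo\nversion=1.2.3\n",
	})
	if err != nil {
		panic(err)
	}
	arts, err := artifactsFromJar(jar)
	if err != nil {
		panic(err)
	}
	for _, a := range arts {
		fmt.Printf("%s:%s:%s\n", a.GroupID, a.ArtifactID, a.Version)
	}
}
```

The JAR is buffered in memory because zip.NewReader needs random access (an io.ReaderAt), which a streaming tar entry cannot provide.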
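// types here refers to the package that defines Library (not shown),
// not go-containerregistry's types package.
func parseArtifact(c conf, fileName string, r io.ReadCloser) ([]types.Library, error) {
	defer r.Close()
	// zip.NewReader needs an io.ReaderAt, so the whole JAR is buffered in memory.
	b, err := io.ReadAll(r)
	if err != nil { return nil, xerrors.Errorf("unable to read the jar file: %w", err) }
	zr, err := zip.NewReader(bytes.NewReader(b), int64(len(b)))
	if err != nil { return nil, xerrors.Errorf("zip error: %w", err) }
	var libs []types.Library
	_ = zr // placeholder: iterate over zr.File, handle pom.properties, MANIFEST.MF, nested jars, etc.
	return libs, nil
}
References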
https://github.com/google/go-containerregistry
https://github.com/aquasecurity/fanal
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
