Operations 8 min read

How Harbor Enables Seamless Container Image Replication Across Registries

This article explains the design and implementation of Harbor's policy‑based Docker image replication, detailing its architecture, job service workflow, state‑machine handling, and how it reduces storage‑specific dependencies while simplifying large‑scale container registry synchronization.

Efficient Ops

Introduction

Container image replication and publishing have long lacked robust tooling, a major pain point for both development and operations teams. The open-source Harbor registry addresses this with powerful image replication and synchronization, which has become one of its most popular features.

Harbor Project Overview

Harbor, open‑sourced by VMware in March, helps users quickly set up an enterprise‑grade registry with a graphical UI, role‑based access control, remote image replication, AD/LDAP integration, audit logs, and native Chinese support. Since its launch, it has garnered over 900 stars and 200 forks on GitHub.

The latest release adds policy‑based Docker image replication, allowing synchronization across data centers and environments via an intuitive management interface.

Feature Overview

Harbor centers on the concept of a "project". By configuring a replication policy for a project, administrators specify which images to copy and the target registry instance, including its address and credentials.

When a policy is activated, all images in the source project are copied to the destination. Subsequent pushes or deletions in the source are automatically synchronized as long as the policy remains active.
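As a rough illustration of the policy concept, the sketch below models a policy as a small Go struct and decides whether a source event falls under it. The type names, fields, and the `ShouldReplicate` helper are hypothetical and are not taken from Harbor's code base; the example target URL is a placeholder.

```go
package main

import (
	"fmt"
	"strings"
)

// Target describes a destination registry (hypothetical fields).
type Target struct {
	URL      string
	Username string
	Password string
}

// Policy binds one source project to one target (hypothetical fields).
type Policy struct {
	Project string
	Target  Target
	Enabled bool
}

// Event is a change observed on the source registry.
type Event struct {
	Action     string // "push" or "delete"
	Repository string // "<project>/<image>", for example "web/nginx"
	Tag        string
}

// ShouldReplicate reports whether the event must be mirrored to the target:
// the policy has to be active and the repository has to belong to its project.
func ShouldReplicate(p Policy, e Event) bool {
	return p.Enabled && strings.HasPrefix(e.Repository, p.Project+"/")
}

func main() {
	p := Policy{
		Project: "web",
		Target:  Target{URL: "https://harbor-dr.example.com", Username: "replicator", Password: "secret"},
		Enabled: true,
	}
	events := []Event{
		{Action: "push", Repository: "web/nginx", Tag: "1.11"},
		{Action: "delete", Repository: "db/mysql", Tag: "5.7"},
	}
	for _, e := range events {
		fmt.Printf("%s %s:%s replicate=%v\n", e.Action, e.Repository, e.Tag, ShouldReplicate(p, e))
	}
}
```

Matching on the project name plus a trailing slash keeps a policy for `web` from accidentally capturing a project named `webapp`.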

In large container clusters, multiple registry servers can be load‑balanced using a master‑slave publishing model, allowing a single push to propagate to many instances. Hierarchical multi‑level publishing is also supported.

Design and Implementation

Traditional image replication relied on copying raw data via tools like rsync or storage‑specific object copy mechanisms, which tied the process to the underlying storage backend.

Harbor avoids this dependency by invoking the registry API to download and transfer images, making the process storage‑agnostic.
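To make the storage-agnostic idea concrete, here is a minimal sketch of checking whether a blob already exists on a registry with a plain HTTP `HEAD` request. The URL layout follows the Docker Registry HTTP API V2 (`/v2/<name>/blobs/<digest>` and `/v2/<name>/manifests/<reference>`); the helper names and the in-memory fake registry are assumptions made for this example, and authentication is left out. It is not Harbor's actual client code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// blobURL and manifestURL follow the Registry HTTP API V2 path layout.
func blobURL(base, repo, digest string) string {
	return fmt.Sprintf("%s/v2/%s/blobs/%s", base, repo, digest)
}

func manifestURL(base, repo, reference string) string {
	return fmt.Sprintf("%s/v2/%s/manifests/%s", base, repo, reference)
}

// BlobExists sends HEAD for the blob: 200 means present, 404 means absent.
func BlobExists(c *http.Client, base, repo, digest string) (bool, error) {
	resp, err := c.Head(blobURL(base, repo, digest))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	switch resp.StatusCode {
	case http.StatusOK:
		return true, nil
	case http.StatusNotFound:
		return false, nil
	default:
		return false, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
}

// roundTripFunc lets a plain function act as an http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// fakeRegistryClient returns a client whose "registry" lives in memory and
// knows only the given URL paths. It exists purely to exercise BlobExists.
func fakeRegistryClient(knownPaths ...string) *http.Client {
	known := map[string]bool{}
	for _, p := range knownPaths {
		known[p] = true
	}
	return &http.Client{Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
		rec := httptest.NewRecorder()
		if r.Method == http.MethodHead && known[r.URL.Path] {
			rec.WriteHeader(http.StatusOK)
		} else {
			rec.WriteHeader(http.StatusNotFound)
		}
		return rec.Result(), nil
	})}
}

func main() {
	c := fakeRegistryClient(blobURL("", "web/nginx", "sha256:aaa"))
	for _, d := range []string{"sha256:aaa", "sha256:bbb"} {
		ok, err := BlobExists(c, "http://registry.example", "web/nginx", d)
		fmt.Println(d, ok, err)
	}
	fmt.Println(manifestURL("http://registry.example", "web/nginx", "1.11"))
}
```

Because only HTTP calls are involved, the same check works regardless of whether the registry stores its data on a local disk or in an object store.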

A new component, the Job Service, manages replication tasks. When a project-level replication is triggered, the Job Service creates and schedules a series of image-level jobs and records each job's status in the database so that the UI can display progress.
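The fan-out from one project-level trigger to many image-level jobs can be sketched as follows. The `JobStore` type is an in-memory stand-in for the database table the UI would read; its name, methods, and the status values are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"sync"
)

type Status string

const (
	StatusPending  Status = "pending"
	StatusRunning  Status = "running"
	StatusFinished Status = "finished"
)

// Job is one image-level replication task (hypothetical shape).
type Job struct {
	ID         int
	PolicyID   int
	Repository string
	Status     Status
}

// JobStore is an in-memory stand-in for the job table in the database.
type JobStore struct {
	mu   sync.Mutex
	jobs map[int]*Job
	next int
}

func NewJobStore() *JobStore { return &JobStore{jobs: map[int]*Job{}} }

// CreateForProject expands one project-level trigger into one pending job
// per repository and returns the new job IDs.
func (s *JobStore) CreateForProject(policyID int, repos []string) []int {
	s.mu.Lock()
	defer s.mu.Unlock()
	ids := make([]int, 0, len(repos))
	for _, r := range repos {
		s.next++
		s.jobs[s.next] = &Job{ID: s.next, PolicyID: policyID, Repository: r, Status: StatusPending}
		ids = append(ids, s.next)
	}
	return ids
}

// SetStatus records a state change so that readers see current progress.
func (s *JobStore) SetStatus(id int, st Status) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	j, ok := s.jobs[id]
	if !ok {
		return fmt.Errorf("job %d not found", id)
	}
	j.Status = st
	return nil
}

// CountByStatus summarizes the jobs of one policy, as a UI might display them.
func (s *JobStore) CountByStatus(policyID int) map[Status]int {
	s.mu.Lock()
	defer s.mu.Unlock()
	counts := map[Status]int{}
	for _, j := range s.jobs {
		if j.PolicyID == policyID {
			counts[j.Status]++
		}
	}
	return counts
}

func main() {
	store := NewJobStore()
	ids := store.CreateForProject(1, []string{"web/nginx", "web/api", "web/worker"})
	if err := store.SetStatus(ids[0], StatusRunning); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(store.CountByStatus(1))
}
```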

The Job Service receives requests via a REST API and faces two main challenges: rate-limiting large bursts of replication requests to avoid excessive I/O, and handling policy changes that may invalidate tasks that are already running.

It implements a producer‑consumer model using a task queue, dispatcher, and worker pool. Go channels are used: the scheduler places jobs into a channel, the dispatcher pulls jobs and assigns them to workers, and completed workers return to another channel for reuse, enabling easy concurrency control.
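The queue, dispatcher, and worker pool described above can be sketched with a few channels. This is a simplified illustration written for this article, not Harbor's implementation; the `Pool`, `Worker`, and `Job` types, the queue capacity of 64, and the `runDemo` helper are all assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Job is a hypothetical image-level replication task.
type Job struct {
	ID         int
	Repository string
}

// Worker receives jobs from the dispatcher on its own channel.
type Worker struct {
	ID   int
	jobs chan Job
}

// Pool connects a job queue, a channel of idle workers, and a dispatcher.
type Pool struct {
	queue chan Job     // the scheduler (producer) writes jobs here
	idle  chan *Worker // workers park here when they are free
	wg    sync.WaitGroup
}

// NewPool starts size workers; size is the cap on concurrently running jobs.
func NewPool(size int, handle func(workerID int, j Job)) *Pool {
	p := &Pool{queue: make(chan Job, 64), idle: make(chan *Worker, size)}
	for i := 0; i < size; i++ {
		w := &Worker{ID: i, jobs: make(chan Job)}
		p.idle <- w
		go func() {
			for j := range w.jobs {
				handle(w.ID, j)
				p.idle <- w // return to the pool for reuse
				p.wg.Done()
			}
		}()
	}
	go p.dispatch(size)
	return p
}

// dispatch hands each queued job to the next idle worker. Once the queue is
// closed it waits for every worker to become idle and stops it.
func (p *Pool) dispatch(size int) {
	for j := range p.queue {
		w := <-p.idle // blocks while all workers are busy
		w.jobs <- j
	}
	for i := 0; i < size; i++ {
		close((<-p.idle).jobs)
	}
}

func (p *Pool) Submit(j Job) { p.wg.Add(1); p.queue <- j }
func (p *Pool) Wait()        { p.wg.Wait() }
func (p *Pool) Shutdown()    { close(p.queue) }

// runDemo pushes n jobs through a pool and reports how many jobs ran and the
// highest number that were in flight at the same moment.
func runDemo(size, n int) (processed, peak int) {
	var mu sync.Mutex
	inFlight := 0
	p := NewPool(size, func(workerID int, j Job) {
		mu.Lock()
		inFlight++
		if inFlight > peak {
			peak = inFlight
		}
		mu.Unlock()
		time.Sleep(time.Millisecond) // pretend to transfer an image
		mu.Lock()
		inFlight--
		processed++
		mu.Unlock()
	})
	for i := 0; i < n; i++ {
		p.Submit(Job{ID: i, Repository: fmt.Sprintf("web/app-%d", i)})
	}
	p.Wait()
	p.Shutdown()
	return processed, peak
}

func main() {
	processed, peak := runDemo(3, 20)
	fmt.Println("processed:", processed, "peak concurrency:", peak)
}
```

The pool size is the only concurrency knob: because the dispatcher blocks until a worker is idle, no more than `size` jobs can run at once, however many requests are queued.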

Each worker runs a state machine where handlers are registered for different states, allowing task cancellation or error handling. The state machine is extensible, and its transitions are illustrated below.
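A handler-per-state machine of this kind might look like the sketch below. The state names, the `Machine` type, and the rule that `Stop` takes effect at the next transition are assumptions chosen for illustration; the source does not spell out Harbor's exact states or API. The sketch is single-goroutine, so a concurrent version would need to synchronize access to the stop flag.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

type State string

const (
	Pending  State = "pending"
	Running  State = "running"
	Finished State = "finished"
	Stopped  State = "stopped"
	Failed   State = "error"
)

// Handler does the work of one state and names the state to enter next.
type Handler func() (next State, err error)

// Machine maps states to handlers; it is extended by registering more states.
type Machine struct {
	handlers map[State]Handler
	stop     bool // set by Stop, checked after each handler returns
	History  []State
}

func NewMachine() *Machine { return &Machine{handlers: map[State]Handler{}} }

func (m *Machine) Register(s State, h Handler) { m.handlers[s] = h }

// Stop asks the machine to move to Stopped at the next transition.
func (m *Machine) Stop() { m.stop = true }

func isTerminal(s State) bool { return s == Finished || s == Stopped || s == Failed }

// Run executes handlers from start until a terminal state is reached.
// A handler error routes the job to Failed; a stop request routes it to Stopped.
func (m *Machine) Run(start State) error {
	var runErr error
	cur := start
	for {
		m.History = append(m.History, cur)
		h := m.handlers[cur]
		if isTerminal(cur) {
			if h != nil {
				_, _ = h() // optional hook, for example to persist the final status
			}
			return runErr
		}
		if h == nil {
			return fmt.Errorf("no handler registered for state %q", cur)
		}
		next, err := h()
		switch {
		case err != nil:
			runErr, next = err, Failed
		case m.stop:
			next = Stopped
		}
		cur = next
	}
}

// demo runs one job in the given mode ("ok", "fail", or "cancel") and
// returns the sequence of states it passed through.
func demo(mode string) (string, error) {
	m := NewMachine()
	m.Register(Pending, func() (State, error) { return Running, nil })
	m.Register(Running, func() (State, error) {
		switch mode {
		case "fail":
			return "", errors.New("transfer failed")
		case "cancel":
			m.Stop() // for example, the policy was disabled mid-run
		}
		return Finished, nil
	})
	err := m.Run(Pending)
	parts := make([]string, len(m.History))
	for i, s := range m.History {
		parts[i] = string(s)
	}
	return strings.Join(parts, " -> "), err
}

func main() {
	for _, mode := range []string{"ok", "fail", "cancel"} {
		trace, err := demo(mode)
		fmt.Println(mode, ":", trace, err)
	}
}
```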

For remote synchronization, the Running state is further divided into sub‑states, as shown:

The process begins by downloading the manifest of a specific tag from the source Harbor, analyzing its blobs, and checking each blob’s existence on the target. Missing blobs are transferred, followed by a manifest check to avoid redundant uploads and prevent endless sync loops.

Repeating this for every tag of an image completes full synchronization.
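The per-tag procedure can be sketched against an in-memory registry model. The `Registry`, `Manifest`, and `Result` types and the digest strings below are hypothetical stand-ins; a real implementation would perform each step through the registry's HTTP API rather than map lookups.

```go
package main

import "fmt"

// Manifest lists the blobs (layers and config) that make up one tag.
type Manifest struct {
	Digest string
	Blobs  []string
}

// Registry is an in-memory stand-in for a registry reached over its API.
type Registry struct {
	Blobs     map[string][]byte   // blob digest -> content
	Manifests map[string]Manifest // "repo:tag" -> manifest
}

func NewRegistry() *Registry {
	return &Registry{Blobs: map[string][]byte{}, Manifests: map[string]Manifest{}}
}

// Result records what a sync actually transferred.
type Result struct {
	BlobsCopied     int
	ManifestsPushed int
}

// SyncTag copies one tag: fetch the source manifest, transfer only the blobs
// the target lacks, then push the manifest unless the target already has it.
func SyncTag(src, dst *Registry, repo, tag string) (Result, error) {
	var res Result
	key := repo + ":" + tag
	m, ok := src.Manifests[key]
	if !ok {
		return res, fmt.Errorf("%s not found on source", key)
	}
	for _, d := range m.Blobs {
		if _, exists := dst.Blobs[d]; exists {
			continue // already on the target, nothing to transfer
		}
		data, ok := src.Blobs[d]
		if !ok {
			return res, fmt.Errorf("blob %s missing on source", d)
		}
		dst.Blobs[d] = data
		res.BlobsCopied++
	}
	if cur, ok := dst.Manifests[key]; ok && cur.Digest == m.Digest {
		return res, nil // identical manifest: skip the redundant upload
	}
	dst.Manifests[key] = m
	res.ManifestsPushed++
	return res, nil
}

// SyncImage repeats SyncTag for every tag and sums the results.
func SyncImage(src, dst *Registry, repo string, tags []string) (Result, error) {
	var total Result
	for _, t := range tags {
		r, err := SyncTag(src, dst, repo, t)
		if err != nil {
			return total, err
		}
		total.BlobsCopied += r.BlobsCopied
		total.ManifestsPushed += r.ManifestsPushed
	}
	return total, nil
}

// demoSource builds a source registry with two tags that share a base layer.
func demoSource() *Registry {
	src := NewRegistry()
	for _, d := range []string{"sha256:base", "sha256:app1", "sha256:app2"} {
		src.Blobs[d] = []byte(d)
	}
	src.Manifests["web/nginx:1.0"] = Manifest{Digest: "sha256:m1", Blobs: []string{"sha256:base", "sha256:app1"}}
	src.Manifests["web/nginx:1.1"] = Manifest{Digest: "sha256:m2", Blobs: []string{"sha256:base", "sha256:app2"}}
	return src
}

func main() {
	src, dst := demoSource(), NewRegistry()
	tags := []string{"1.0", "1.1"}
	first, _ := SyncImage(src, dst, "web/nginx", tags)
	second, _ := SyncImage(src, dst, "web/nginx", tags)
	fmt.Printf("first run: %+v\nsecond run: %+v\n", first, second)
}
```

In this model the shared base layer is transferred only once, and a second run against an already synchronized target transfers nothing, which is the behavior the existence checks are meant to guarantee.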

Conclusion and Outlook

This article detailed the design and implementation of Harbor's new remote image replication feature. Future enhancements may include richer policy controls, filtering options, and scheduling capabilities, shaped by continued user feedback.

For more information, visit the Harbor project at https://github.com/vmware/harbor.
