
Automate Harbor Image Cleanup with Python: A Step‑by‑Step Guide

This article explains how to use a Python script to automatically delete outdated images from a private Harbor registry. It covers the background, the approach, the deletion workflow, and the current limitations, and provides the full source code for easy adaptation.

Ops Development Stories

Background

The company is adopting Kubernetes and has set up a private Harbor image registry. Developers push code several times a day, so the number of built images grows rapidly. The Harbor server has limited disk space and most older images are no longer needed, but deleting them by hand in the Harbor UI is time-consuming. An automated script is therefore needed. A previous shell script had issues, so the author chose Python for this version.

Approach

Image tags follow a timestamp format such as 20190411.11.23 or 20181212.12.12 (YYYYMMDD.HH.MM). Deletions target an entire month of images, focusing on older tags. The script interactively lets the user select a project, repository, and image type, where the type is derived from the first six characters of the tag (e.g., 201904, 201812).
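
As a minimal sketch of the grouping idea, assuming tags really do follow the YYYYMMDD.HH.MM format described above, the image type is just the first six characters of each tag. The function name group_by_month and the sample tags are illustrative and not part of the original script; tags that do not parse as timestamps (such as latest) are skipped here, which is an assumption about the desired behavior.

```python
from collections import Counter
from datetime import datetime


def group_by_month(tag_names):
    """Count tags per YYYYMM prefix, ignoring tags that are not timestamps."""
    counts = Counter()
    for name in tag_names:
        try:
            # Validate the assumed YYYYMMDD.HH.MM layout before grouping.
            datetime.strptime(name, "%Y%m%d.%H.%M")
        except ValueError:
            continue
        counts[name[:6]] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = ["20190411.11.23", "20190402.09.05", "20181212.12.12", "latest"]
    print(group_by_month(sample))
```

Validating the format first keeps non-timestamp tags out of a monthly group, so they cannot be deleted by accident.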

Deletion Process

Select project → select repository under the project → select image type → delete.
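
The last step of this flow can be previewed without touching the registry. The sketch below is a hypothetical dry-run helper (plan_deletion is not in the original script): given the tag names of one repository and the selected image type, it returns which tags would be deleted and which would remain.

```python
def plan_deletion(tag_names, series):
    """Split tags into (to_delete, to_keep) by matching the series prefix."""
    to_delete = [name for name in tag_names if name.startswith(series)]
    to_keep = [name for name in tag_names if not name.startswith(series)]
    return to_delete, to_keep


if __name__ == "__main__":
    tags = ["20190411.11.23", "20190402.09.05", "20181212.12.12"]
    doomed, kept = plan_deletion(tags, "201812")
    print("would delete:", doomed)
    print("would keep:", kept)
```

Printing the plan before issuing any DELETE request is a cheap safeguard when a whole month of images is removed at once.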

Limitations

This is an initial version: many features are unfinished and nothing has been optimized. The script is only meant to get the job done, and feedback is welcome. The two scripts below are used together and were written for Python 3.6. In clean_harbor_image.py, replace the placeholder host xxx.xx.x.xxx with your own Harbor address.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import json
import re

import requests

import test  # local helper module test.py in the same directory; provides tt()
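class Harbor_API:
    """Thin wrapper around the Harbor HTTP calls needed for image cleanup."""

    def __init__(self):
        # Replace the credentials and the xxx.xx.x.xxx placeholder with your own.
        self.login_user = 'admin'
        self.login_password = 'Harbor12345'
        self.login_url = 'https://xxx.xx.x.xxx/login'
        self.projects_url = 'https://xxx.xx.x.xxx/api/projects'
        self.repo_url = "https://xxx.xx.x.xxx/api/repositories?project_id="
        self.image_url = "https://xxx.xx.x.xxx/api/repositories/"
        # Browser-like headers sent with the login request.
        self.headers = {
            'Host':'xxx.xx.x.xxx',
            'Origin':'https://xxx.xx.x.xxx',
            'Referer':'https://xxx.xx.x.xxx/harbor/sign-in',
            'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
        }
        self.data = {'principal': self.login_user, 'password': self.login_password}
        # One session keeps the login cookie for all later requests.
        self.s = requests.Session()
        # login() skips certificate verification; apply the same setting to the
        # whole session so later requests do not fail on a self-signed
        # certificate. Remove this line if your Harbor uses a trusted certificate.
        self.s.verify = False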
    def get_repo(self, project_id):
        """Return [{"id": ..., "name": ...}] for every repository in a project."""
        url = self.repo_url + str(project_id)
        res = self.s.get(url)
        return [{"id": repo["id"], "name": repo["name"]} for repo in res.json()]

    def get_image(self, project_id, repo_name):
        """Return the tag count of the first repository matching repo_name."""
        url = self.repo_url + str(project_id) + "&q=%s" % repo_name
        res = self.s.get(url)
        return res.json()[0]["tags_count"]

    def get_image_tags(self, project_name, repo_name):
        """Group a repository's tags by their first six characters (YYYYMM).

        Returns [{"name": "201904", "count": 12}, ...] in first-seen order.
        """
        url = self.image_url + project_name + "%2F" + repo_name + "/tags/"
        res = self.s.get(url)
        prefixes = [tag["name"][0:6] for tag in res.json()]
        series = []
        for prefix in prefixes:
            if prefix not in series:
                series.append(prefix)
        return [{"name": p, "count": prefixes.count(p)} for p in series]

    def get_projects(self):
        """Return [{"id": ..., "name": ...}] for every visible project."""
        res = self.s.get(self.projects_url)
        return [{"id": p["project_id"], "name": p["name"]} for p in res.json()]
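    def delete_image(self, all_tags, project_name, repo_name, seri):
        """Delete every tag in the repository whose name starts with seri.

        all_tags is accepted for the caller's convenience but not used: the
        tag list is fetched again so it reflects the registry's current state.
        Returns True only if at least one tag matched and every DELETE succeeded.
        """
        url = self.image_url + project_name + "%2F" + repo_name + "/tags/"
        res = self.s.get(url)
        tag_names = [tag["name"] for tag in res.json()]
        # Match on the prefix only, so a series string that happens to appear
        # in the middle of a tag is not picked up.
        targets = [name for name in tag_names if name.startswith(seri)]
        success = bool(targets)
        for name in targets:
            tag_url = url + name
            print(tag_url)
            ret = self.s.delete(tag_url)
            if not ret.ok:
                success = False
        return success

    def login(self):
        """Log in with the configured account and return the HTTP status code."""
        res = self.s.post(self.login_url, headers=self.headers, data=self.data, verify=False)
        return res.status_code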
def ask_id(prompt, valid_ids, error_message):
    """Prompt until the user enters one of valid_ids; return it as an int."""
    while True:
        try:
            value = int(input(prompt).strip())
        except ValueError:
            value = None  # non-numeric input is treated as an unknown id
        if value in valid_ids:
            return value
        print(error_message)


def run():
    ss = Harbor_API()
    status_code = ss.login()
    if status_code != 200:
        print("Login failed, HTTP status %s" % status_code)
        return

    # Step 1: choose a project.
    all_projects = ss.get_projects()
    print("-------- Projects in this Harbor instance -------")
    for i in all_projects:
        print("id:%s-----project:%s" % (i["id"], i["name"]))
    project_id = ask_id("Enter a project id from the list above to see its repositories: ",
                        [i["id"] for i in all_projects],
                        "That project id does not exist, please try again.")
    project_name = [i["name"] for i in all_projects if i["id"] == project_id][0]

    # Step 2: choose a repository in that project.
    all_repo = ss.get_repo(project_id)
    print("-------- Repositories in this project -------")
    for i in all_repo:
        print("id:%s-----repository:%s" % (i["id"], i["name"]))
    repo_id = ask_id("Enter a repository id from the list above to see its image types and counts: ",
                     [i["id"] for i in all_repo],
                     "That repository id does not exist, please try again.")
    full_name = [i["name"] for i in all_repo if i["id"] == repo_id][0]
    count = ss.get_image(project_id, full_name)
    print("Repository %s holds %s tags in total" % (full_name, count))
    repo_name = test.tt(full_name)  # strip the "project/" prefix

    # Step 3: choose an image type (YYYYMM prefix) and delete it.
    all_tags = ss.get_image_tags(project_name, repo_name)
    for i in all_tags:
        print("project %s, repository %s: image type %s, count %s" % (project_name, repo_name, i["name"], i["count"]))
    counts = {i["name"]: i["count"] for i in all_tags}
    while True:
        seri = input("Enter the image type to delete, e.g. 201805: ").strip()
        if seri not in counts:
            print("That image type does not exist, please try again.")
            continue
        print("Type %s has %s images" % (seri, counts[seri]))
        print("project %s  repo %s seri %s" % (project_name, repo_name, seri))
        if ss.delete_image(all_tags, project_name, repo_name, seri):
            print("Deleted successfully")
            return
        print("Deletion failed")


if __name__ == '__main__':
    run()
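
The placeholder host appears in several places in __init__, so it is easy to miss one when adapting the script. As a hedged sketch, the URLs could be derived from a single base address instead. The function build_endpoints, the environment variable name HARBOR_URL, and the host harbor.example.com are assumptions made for illustration; the paths are copied from the script above.

```python
import os


def build_endpoints(base_url):
    """Derive the URLs used by the script from one Harbor base address."""
    base = base_url.rstrip("/")
    return {
        "login": base + "/login",
        "projects": base + "/api/projects",
        "repositories": base + "/api/repositories?project_id=",
        "images": base + "/api/repositories/",
    }


if __name__ == "__main__":
    # HARBOR_URL is a hypothetical variable name; the fallback is a placeholder host.
    endpoints = build_endpoints(os.environ.get("HARBOR_URL", "https://harbor.example.com"))
    for name, url in endpoints.items():
        print(name, url)
```

The same approach would work for the account and password, which keeps credentials out of the source file.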

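Helper for clean_harbor_image.py, used to extract the repository name. The main script loads it with import test, so save it as test.py in the same directory.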

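#!/usr/bin/env python3


def tt(full_name):
    """Return the repository part of a "project/repository" name.

    Names without a slash are returned unchanged.
    """
    project, sep, repo = full_name.partition("/")
    return repo if sep else full_name

# ss = tt("sc/cmccsq-v2-device")
# print(ss)  # cmccsq-v2-device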
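
The helper exists because the main script receives names like sc/cmccsq-v2-device from the API and then rejoins project and repository with %2F when building URLs. A possible alternative, shown here only as a sketch, is to percent-encode the full name in one step with the standard library. The function name tags_url and the host harbor.example.com are illustrative assumptions.

```python
from urllib.parse import quote


def tags_url(image_url, full_repo_name):
    """Build the tag-list URL for a repository given as "project/repository"."""
    # safe="" makes quote() encode "/" as %2F instead of leaving it as-is.
    return image_url + quote(full_repo_name, safe="") + "/tags/"


if __name__ == "__main__":
    print(tags_url("https://harbor.example.com/api/repositories/", "sc/cmccsq-v2-device"))
```

Because every slash is encoded, this also covers repository names that contain more than one path segment.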
Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
