
How We Shrunk a 9 GB Git Repository to 350 MB: A Complete Step‑by‑Step Guide

When Tencent Meeting's user base exploded to 300 million, its Git repository ballooned to the point where a local clone took more than 17 GB and cloning slowed to a crawl. The team devised a comprehensive slimming process that removed old history, migrated large files to LFS, created a new repository, handled special branches, adapted platform settings, and verified the result, ultimately reducing the repository size from 9 GB to 350 MB.

Tencent Cloud Developer

Facing a rapid increase in online-meeting demand, the Tencent Meeting client team discovered that their Git repository had accumulated more than 17.7 GB of data, including many large files that had never been converted to Git LFS. The oversized repository put heavy pressure on disk space and made clones slow, prompting a full-scale slimming effort.

1. Slimming Results

Repository size reduced from 9 GB to 350 MB.

Local clone size dropped from 17.7 GB to 12.2 GB.

Clone speed increased by 45 % on a MacBook M1 Pro and by 56 % on a Windows devcloud.

Full build pipeline checkout time fell from 16 minutes to under 5 minutes.
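For a sense of scale, the figures above can be restated as relative reductions. A quick calculation, assuming decimal units (9 GB = 9000 MB) and taking the checkout time as exactly 5 minutes even though the source says "under 5 minutes", so that figure is a lower bound:

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`, rounded to 0.1."""
    return round((1 - after / before) * 100, 1)

# Repository size: 9 GB -> 350 MB (decimal units assumed, so 9 GB = 9000 MB)
repo = reduction_pct(9000, 350)
# Local clone size: 17.7 GB -> 12.2 GB
clone = reduction_pct(17.7, 12.2)
# Pipeline checkout: 16 min -> 5 min (the source says "under 5", so a lower bound)
checkout = reduction_pct(16, 5)

print(f"repository {repo}%, clone {clone}%, checkout at least {checkout}%")
```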

2. Preparation Before Slimming

Use a wired network and a machine with at least several hundred gigabytes of free space (the team used a MacBook M1 Pro). Ensure remote access is set up if working from home.

2.1 Repository Locking

Lock the repository to prevent pushes during the slimming window (e.g., during a holiday). All developers must push pending changes beforehand.

2.2 Disable Third‑Party Sync

If the repository is synchronized with external tools (e.g., UGit), disable those sync tasks before proceeding.

3. Overall Slimming Plan

Create a new repository that retains the original project name, path, and ID, while keeping the old repository as a backup. This allows other platforms to continue using the same Git URL.

3.1 What to Remove

Delete history older than six months.

Convert large files that are not yet tracked by LFS.
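One common way to find LFS candidates (not necessarily the one the team used) is to list every object reachable from any ref with git rev-list --objects --all, pipe it through git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)', and keep the blobs above a size threshold; git lfs migrate info --everything gives a similar per-extension summary. The sketch below only parses that batch-check output; the helper name and the 1 MB default threshold are assumptions.

```python
from collections import defaultdict

def large_blobs(batch_check_output: str, threshold: int = 1_000_000) -> dict[str, int]:
    """Return {path: largest blob size} for blobs of at least `threshold` bytes.

    Expects lines like '<type> <sha> <size> <path>' as produced by
      git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'
    """
    sizes: dict[str, int] = defaultdict(int)
    for line in batch_check_output.splitlines():
        # maxsplit=3 keeps paths that contain spaces intact
        parts = line.split(" ", 3)
        if len(parts) < 4 or parts[0] != "blob":
            continue  # commits and trees are not migration candidates
        _, _, size, path = parts
        # The same path appears once per stored version; keep the largest.
        sizes[path] = max(sizes[path], int(size))
    return {path: size for path, size in sizes.items() if size >= threshold}
```

The resulting paths, or their extensions, can then be turned into the comma-separated --include list used in 4.7.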

4. Detailed Commands

4.1 Clone Original Repository

git clone https://example.com/test.git

Copy the cloned directory to a sibling folder copyForCompare for later verification.

4.2 Increase Open‑File Limit

ulimit -n 9999999  # solve "too many open files" errors
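On some systems the shell rejects a value this large, because an unprivileged process can raise its soft limit only up to its hard limit, and the kernel enforces its own maximum on top of that. The same adjustment can be made from Python with the standard resource module (Unix only), raising the soft limit as far as it is allowed. The 9999999 target simply mirrors the command above, and the helper name is hypothetical. A limit set this way applies to the Python process and the child processes it starts, such as git run via subprocess, not to your interactive shell.

```python
import resource

def raise_open_file_limit(target: int = 9_999_999) -> int:
    """Raise this process's soft RLIMIT_NOFILE toward `target`; return the new soft limit.

    An unprivileged process may raise its soft limit up to the hard limit.
    If the kernel still rejects the value, the soft limit is left unchanged.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never ask for more than the hard limit (unless the hard limit is unlimited).
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and wanted > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            pass  # keep the current limit if the kernel refuses the value
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```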

4.3 Fetch All Branches and LFS Objects

git fetch --all
git lfs fetch --all
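git fetch --all updates remote-tracking refs (refs/remotes/origin/*) but creates no local branches, while git push origin --all in 4.8 pushes only local branches (refs/heads/*), and git remote remove origin deletes the remote-tracking refs altogether. The summary does not show this step, but every remote branch that should survive presumably needs a local branch before the rewrite. The sketch below is an assumption, not the team's script: it turns git branch -r output into git branch --track commands, skipping the symbolic HEAD entry and branches that already exist locally.

```python
from collections.abc import Iterable

def tracking_branch_commands(branch_r_output: str, existing: Iterable[str] = (),
                             remote: str = "origin") -> list[list[str]]:
    """Build 'git branch --track <name> <remote>/<name>' argv lists for every
    remote-tracking branch that has no local branch yet."""
    skip = set(existing)
    prefix = f"{remote}/"
    commands = []
    for line in branch_r_output.splitlines():
        ref = line.strip()
        # Skip the symbolic '<remote>/HEAD -> ...' entry and other remotes' branches.
        if not ref.startswith(prefix) or " -> " in ref:
            continue
        name = ref[len(prefix):]
        if name not in skip:
            commands.append(["git", "branch", "--track", name, ref])
    return commands
```

Each argv list can then be run inside the clone with subprocess.run(cmd, check=True).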

4.4 Truncate History with git filter-branch

Identify the oldest commit to keep (e.g., ff75cc5cdbf0423a24b4f5438e52683210813ba0), which will become the new root commit, and its parent 7ffe6782272879056ca9618f1d85a5f9716f8e90. Then run:

git filter-branch --force --parent-filter '
    read parent
    if [ "$parent" = "-p 7ffe6782272879056ca9618f1d85a5f9716f8e90" ]
    then
        echo
    else
        echo "$parent"
    fi' --tag-name-filter cat -- --all
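git filter-branch passes the parent-filter the commit's parent list on standard input, in the form -p <sha> for a normal commit, -p <sha1> -p <sha2> for a merge, and an empty string for a root commit. It then uses whatever the filter prints as the new parent list. The script above therefore re-roots every commit whose only parent is 7ffe6782272879056ca9618f1d85a5f9716f8e90. A merge that lists that parent alongside another never matches the exact string and keeps both parents, and older history stays reachable through any branch that forked before the cut point. A Python model of the decision, trimming whitespace the way the shell's read does (the function name is hypothetical):

```python
CUT_PARENT = "7ffe6782272879056ca9618f1d85a5f9716f8e90"

def parent_filter(parent_line: str, cut_parent: str = CUT_PARENT) -> str:
    """Mirror the shell parent-filter: return an empty parent list if the
    commit's only parent is `cut_parent`, otherwise return the list unchanged."""
    # `read parent` strips leading/trailing whitespace; do the same here.
    parent_line = parent_line.strip()
    if parent_line == f"-p {cut_parent}":
        return ""  # the commit becomes a new root
    return parent_line
```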

4.5 Verify Truncation

git log --all --pretty=oneline | grep ff75cc5cdbf0

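The rewritten commit receives a new ID because its parent list changed, so the old hash may not point to it. Confirm that the rewritten commit now has no parent, i.e. that it has become a root commit.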
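One way to check for a parentless commit is to list root commits, either directly with git rev-list --max-parents=0 --all or by parsing git log --all --format='%H %P' output. In that output each line holds a commit hash followed by its parent hashes, so a root commit has nothing after its own hash. Until refs/original is deleted in 4.6, --all also covers filter-branch's backup refs, so the pre-rewrite root can still appear. A sketch of the parsing variant (hypothetical helper name):

```python
def root_commits(log_output: str) -> set[str]:
    """Hashes of parentless commits in `git log --all --format='%H %P'` output."""
    roots = set()
    for line in log_output.splitlines():
        fields = line.split()
        if len(fields) == 1:  # only the commit's own hash, no parents
            roots.add(fields[0])
    return roots
```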

4.6 Clean Up References

rm -Rf .git/refs/original
rm -Rf .git/logs
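Deleting the directories removes loose backup refs only. If Git has packed them into .git/packed-refs, they survive. The checklist for shrinking a repository in the git-filter-branch documentation instead deletes each backup ref with git update-ref -d (listing them with git for-each-ref --format="%(refname)" refs/original/), expires all reflogs with git reflog expire --expire=now --all, and garbage-collects with git gc --prune=now. A sketch that builds those commands from the for-each-ref output (hypothetical helper name):

```python
def cleanup_commands(original_refs_output: str) -> list[list[str]]:
    """Commands that delete filter-branch backup refs, expire reflogs, and
    prune unreachable objects, following the git-filter-branch checklist."""
    commands = [
        ["git", "update-ref", "-d", ref]
        for ref in original_refs_output.split()  # ref names contain no whitespace
        if ref.startswith("refs/original/")
    ]
    commands.append(["git", "reflog", "expire", "--expire=now", "--all"])
    commands.append(["git", "gc", "--prune=now"])
    return commands
```

Each argv list can be run with subprocess.run(cmd, check=True, cwd=repo_dir). Local garbage collection mainly matters for measuring the size on disk. A push to a fresh repository transfers only objects reachable from the pushed refs anyway.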

4.7 Migrate Large Files to LFS

git lfs migrate import --everything --include="*.jar,tool/ATG/index.js,xxx"

After migration, verify the .gitattributes file contains the new LFS entries.
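git lfs migrate import records each migrated pattern in .gitattributes as a line such as *.jar filter=lfs diff=lfs merge=lfs -text. A quick check can confirm that every pattern in the comma-separated list passed to --include received an entry. git-lfs may write some patterns differently (for example, with escaped spaces), so treat a reported miss as something to inspect by hand. The helper name is hypothetical.

```python
def missing_lfs_patterns(gitattributes_text: str, include: str) -> list[str]:
    """Patterns from a comma-separated --include list with no 'filter=lfs'
    line in the given .gitattributes content."""
    tracked = set()
    for line in gitattributes_text.splitlines():
        fields = line.split()
        # Skip blank lines and comments; collect patterns carrying filter=lfs.
        if fields and not fields[0].startswith("#") and "filter=lfs" in fields[1:]:
            tracked.add(fields[0])
    wanted = [p.strip() for p in include.split(",") if p.strip()]
    return [p for p in wanted if p not in tracked]
```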

4.8 Push to New Repository

git remote remove origin
git remote add origin https://example.com/test_backup.git
git push origin --no-verify --all
git push origin --no-verify --tags
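git push --all pushes local branches (refs/heads/*) and --tags pushes tags, so a branch that exists only as a remote-tracking ref is not pushed. A sketch that compares git for-each-ref --format='%(objectname) %(refname)' refs/heads/ output with git ls-remote --heads origin output (hash and ref name separated by a tab) can catch this. It reports branches that are missing on the remote or point to a different commit. The helper names are hypothetical.

```python
def parse_refs(text: str) -> dict[str, str]:
    """Parse '<sha> <refname>' lines (space- or tab-separated) into {refname: sha}."""
    refs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            refs[fields[1]] = fields[0]
    return refs

def branch_mismatches(local_text: str, remote_text: str) -> list[str]:
    """Local branches that are absent on the remote or point to another commit."""
    local, remote = parse_refs(local_text), parse_refs(remote_text)
    return sorted(ref for ref, sha in local.items()
                  if ref.startswith("refs/heads/") and remote.get(ref) != sha)
```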

5. New Repository Validation

Clone the new repo and run git lfs pull to fetch all LFS objects. Use a diff tool (e.g., Beyond Compare) to compare the new clone with the original copyForCompare directory, ensuring no file changes.
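A scripted comparison can complement a visual diff tool. The sketch below walks both working trees, skips .git directories, and reports files present on only one side or whose bytes differ. It also flags files in the new clone that still begin with the Git LFS pointer header (version https://git-lfs.github.com/spec/v1), which usually means git lfs pull did not download that object. The function names are hypothetical, and the sketch assumes the trees contain no broken symbolic links.

```python
import filecmp
import os

LFS_POINTER_HEADER = b"version https://git-lfs.github.com/spec/v1"

def list_files(root: str) -> set[str]:
    """Relative paths of all files under `root`, excluding .git directories."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # prune .git in place
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found

def compare_trees(old_root: str, new_root: str) -> dict[str, list[str]]:
    """Compare two checkouts byte for byte and list unresolved LFS pointers."""
    old_files, new_files = list_files(old_root), list_files(new_root)
    changed = [
        p for p in sorted(old_files & new_files)
        if not filecmp.cmp(os.path.join(old_root, p), os.path.join(new_root, p),
                           shallow=False)  # always compare contents
    ]
    pointers = []
    for p in sorted(new_files):
        with open(os.path.join(new_root, p), "rb") as fh:
            if fh.read(len(LFS_POINTER_HEADER)) == LFS_POINTER_HEADER:
                pointers.append(p)
    return {
        "only_old": sorted(old_files - new_files),
        "only_new": sorted(new_files - old_files),
        "changed": changed,
        "lfs_pointers": pointers,
    }
```

Usage: compare_trees("copyForCompare", "test_new_clone"), where the second directory name is only an example.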

6. Handling Special Branches

For branches that forked from master before the truncation point, and therefore still carry older history, adjust the parent-filter script to also match each such branch's own cut-point parent ID, and modify the Python callback used by git-filter-repo so that it deletes history only up to each branch's cut-off date.
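The callback in 6.1 compares each commit's committer timestamp with a per-branch cut-off in Unix seconds. git-filter-repo exposes committer_date as bytes of the form b"1654038307 +0800". The sample value 1654038307 corresponds to 2022-06-01 07:05:07 in UTC+8. A sketch for deriving such a cut-off six calendar months before a chosen slimming date follows. The helper names and the day-clamping rule are assumptions.

```python
import calendar
from datetime import datetime, timezone

def months_before(moment: datetime, months: int) -> datetime:
    """Same wall-clock time `months` calendar months earlier, clamping the day
    (e.g. 31 August minus six months becomes the last day of February)."""
    total = moment.year * 12 + (moment.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)

def cut_date_bytes(moment: datetime, months: int = 6) -> bytes:
    """Cut-off as Unix seconds encoded as bytes, like the k_*_cut_date samples.

    Pass a timezone-aware datetime; a naive one is interpreted as local time.
    """
    return str(int(months_before(moment, months).timestamp())).encode()
```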

6.1 Example Python Callback

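#!/usr/bin/env python3
# Requires git-filter-repo importable as a module (git_filter_repo.py on sys.path).
import os

import git_filter_repo as fr

k_work_dir = "/path/to/repo"
k_master_cut_date = 1654038307  # Unix seconds; master commits before this lose their changes
# ... other branch cut dates ...

def commitCallBackFun(commit, metadata):
    # committer_date is bytes such as b"1654038307 +0800"; compare numerically
    ts = int(commit.committer_date.split()[0])
    # Rewrite the message of matching commits after the cut-off ("author1" is a placeholder)
    if ts >= k_master_cut_date and commit.author_name == b"author1":
        commit.message = b"Repository slimming: history trimmed"
    # commit.branch is the ref being exported, e.g. b"refs/heads/master"
    if commit.branch.endswith(b"refs/heads/master") and ts < k_master_cut_date:
        # file_changes are deltas against the parent; emptying them also removes
        # these changes from later trees, so compare the result as in section 5
        commit.file_changes = []
    # handle other branches similarly

if __name__ == '__main__':
    os.chdir(k_work_dir)
    args = fr.FilteringOptions.parse_args(['--force', '--debug'])
    fr.RepoFilter(args, commit_callback=commitCallBackFun).run()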

7. Platform Adaptation

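Update the code-management platform (e.g., Gongfeng, 工蜂) to swap the two projects' IDs and names, so that the slimmed repository pushed to test_backup takes over the name test and the original project ID, while the original repository is renamed to test_backup and kept as the backup. Then synchronize permissions, branch protection rules, and CI pipelines.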

7.1 Third‑Party Tools

Notify owners of third‑party tools (e.g., UGit) to remove old workspaces and re‑clone the new repository.

7.2 Build Machines

Because commit IDs change, CI agents must delete their cached clones and perform a fresh clone of the new repository.

8. Final Verification Checklist

Can the repo be cloned locally?

Do the latest main‑branch files match the backup?

Randomly compare other branches for file count and content.

Build the project and run the main workflow.

Verify CI pipelines trigger correctly and MR operations work.

Ensure write permissions are restored on the slimmed repo while the backup remains read‑only.

9. Common Pitfalls & Solutions

9.1 LFS "User is null or anonymous" Error

git config lfs.https://example.com/test_backup.git/info/lfs.access basic

Run git lfs env to confirm the access mode is set to basic.

9.2 Push Errors After URL Change

git remote set-url origin https://example.com/test_backup.git
git remote -v

If ~/.gitconfig contains url.git@<host>:.insteadof=... entries, which can silently rewrite the remote URL, comment them out.
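Git applies url.<base>.insteadOf = <prefix> by replacing a matching URL prefix with <base>, and when several prefixes match it uses the longest one. pushInsteadOf does the same for pushes only. That is how an old rule can redirect a new remote URL. A small model of the rewrite, with the rules given as a hypothetical {prefix: base} dictionary, helps reason about which rule wins:

```python
def rewrite_url(url: str, rules: dict[str, str]) -> str:
    """Apply insteadOf-style rules {prefix: base}; the longest matching prefix wins."""
    matches = [prefix for prefix in rules if url.startswith(prefix)]
    if not matches:
        return url
    best = max(matches, key=len)
    return rules[best] + url[len(best):]
```

For example, a rule whose prefix is https://example.com/test also matches https://example.com/test_backup.git, because the comparison is a plain string prefix.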

9.3 "Too many open files" During git lfs fetch

ulimit -n 9999999

9.4 Git/LFS Version Issues

Upgrade Git and Git‑LFS to the latest stable versions to avoid checkout hangs and other errors.

10. Rollback Plan

If any step fails, the original repository remains untouched as a backup. The code‑management platform can revert the project ID swap and restore the old repo.

11. Closing Remarks

The repository slimming process was meticulous and time-consuming, but it reduced the repository from 9 GB to 350 MB, significantly improving developer productivity and CI performance.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
