How We Shrank a 9 GB Git Repository to 350 MB: A Complete Step‑by‑Step Guide
When Tencent Meeting’s user base grew to 300 million, its Git repository swelled until a full local clone exceeded 17 GB, which crippled clone speed. The team devised a slimming process that removed old history, migrated large files to LFS, created a new repository, handled special branches, adapted platform settings, and verified the result, ultimately reducing the repository size from 9 GB to 350 MB.
Facing a rapid increase in online‑meeting demand, the Tencent Meeting client team discovered that their Git repository had accumulated more than 17.7 GB of data, with many files never converted to Git LFS. The large repository caused severe disk pressure and slow clone operations, prompting a full‑scale slimming effort.
1. Slimming Results
Repository size reduced from 9 GB to 350 MB.
Local clone size dropped from 17.7 GB to 12.2 GB.
Clone speed increased by 45 % on a MacBook M1 Pro and by 56 % on a Windows devcloud.
Full build pipeline checkout time fell from 16 minutes to under 5 minutes.
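To put the figures above on one scale, the reductions can be expressed as percentages. The sketch below only uses the numbers quoted in this section; treating 1 GB as 1024 MB is an assumption, and the checkout figure is a lower bound because the article says "under 5 minutes".

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Figures quoted above; 1 GB is taken as 1024 MB (an assumption).
repo_cut = percent_reduction(9 * 1024, 350)   # server-side repository, in MB
clone_cut = percent_reduction(17.7, 12.2)     # local clone, in GB
checkout_cut = percent_reduction(16, 5)       # pipeline checkout, in minutes

print(f"repository: {repo_cut:.1f}% smaller")
print(f"local clone: {clone_cut:.1f}% smaller")
print(f"checkout time: at least {checkout_cut:.1f}% shorter")
```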
2. Preparation Before Slimming
Use a wired network and a machine with at least several hundred gigabytes of free space (the team used a MacBook M1 Pro). Ensure remote access is set up if working from home.
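A quick pre-flight check for the disk-space requirement might look like the following. This is a minimal sketch using only the Python standard library; the function name and the 300 GiB threshold in the usage note are illustrative assumptions, since the article only says "several hundred gigabytes".

```python
import shutil


def has_free_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3
```

For example, has_free_space(".", 300) could be run in the intended working directory before cloning.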
2.1 Repository Locking
Lock the repository to prevent pushes during the slimming window (e.g., during a holiday). All developers must push pending changes beforehand.
2.2 Disable Third‑Party Sync
If the repository is synchronized with external tools (e.g., UGit), disable those sync tasks before proceeding.
3. Overall Slimming Plan
Create a new repository that retains the original project name, path, and ID, while keeping the old repository as a backup. This allows other platforms to continue using the same Git URL.
3.1 What to Remove
Delete history older than six months.
Convert large files that are not yet tracked by LFS.
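One way to find candidates for the second item is to list the largest blobs in history. LFS pointer files are tiny, so any large blob is content that was committed directly. The sketch below parses the output of a common Git idiom, git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'; the function name and the size threshold are assumptions, not part of the team's tooling.

```python
def large_blobs(batch_check_output: str, min_bytes: int) -> list[tuple[int, str, str]]:
    """Return (size, path, sha) for every blob of at least `min_bytes`, largest first.

    Each input line is expected to look like "<type> <sha> <size> <path>".
    """
    results = []
    for line in batch_check_output.splitlines():
        # Split at most three times so paths containing spaces stay intact.
        parts = line.split(" ", 3)
        if len(parts) < 4 or parts[0] != "blob":
            continue
        _type, sha, size, path = parts
        if int(size) >= min_bytes:
            results.append((int(size), path, sha))
    return sorted(results, reverse=True)
```

The paths it returns can then be turned into --include patterns for the LFS migration step.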
4. Detailed Commands
4.1 Clone Original Repository
git clone https://example.com/test.git
Copy the cloned directory to a sibling folder, copyForCompare, for later verification.
4.2 Increase Open‑File Limit
ulimit -n 9999999 # solve "too many open files" errors
4.3 Fetch All Branches and LFS Objects
git fetch --all
git lfs fetch --all
4.4 Truncate History with git filter-branch
Identify the commit to truncate (e.g., ff75cc5cdbf0423a24b4f5438e52683210813ba0) and its parent 7ffe6782272879056ca9618f1d85a5f9716f8e90. Then run:
git filter-branch --force --parent-filter '
  read parent
  if [ "$parent" = "-p 7ffe6782272879056ca9618f1d85a5f9716f8e90" ]
  then
    echo
  else
    echo "$parent"
  fi' --tag-name-filter cat -- --all
4.5 Verify Truncation
git log --all --pretty=oneline | grep ff75cc5cdbf0
Ensure the truncated commit no longer has a parent.
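Because changing a commit's parents gives the rewritten commit a new ID, a complementary check is to confirm that the rewritten branch now starts at a parentless (root) commit. The sketch below parses the output of git rev-list --parents <branch>, where each line is a commit ID followed by its parent IDs; the function name is an assumption.

```python
def root_commits(rev_list_parents_output: str) -> list[str]:
    """Return the commits that have no parents.

    Input is the text printed by `git rev-list --parents <branch>`:
    one line per commit, "<commit> <parent1> <parent2> ...".
    """
    roots = []
    for line in rev_list_parents_output.splitlines():
        fields = line.split()
        if len(fields) == 1:  # a commit ID with no parent IDs after it
            roots.append(fields[0])
    return roots
```

git rev-list --max-parents=0 <branch> prints the same root commits directly; the parser is only useful when the full parent listing is already at hand.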
4.6 Clean Up References
rm -Rf .git/refs/original
rm -Rf .git/logs
4.7 Migrate Large Files to LFS
git lfs migrate import --everything --include="*.jar,tool/ATG/index.js,xxx"
After migration, verify the .gitattributes file contains the new LFS entries.
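The .gitattributes check can be scripted. The following is a minimal sketch that lists the patterns carrying the filter=lfs attribute; the function name is an assumption, and it does not handle quoted patterns.

```python
def lfs_patterns(gitattributes_text: str) -> list[str]:
    """Return the path patterns in a .gitattributes file that are routed through LFS."""
    patterns = []
    for raw in gitattributes_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns
```

Comparing the returned list with the patterns passed to --include shows whether anything was missed.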
4.8 Push to New Repository
git remote remove origin
git remote add origin https://example.com/test_backup.git
git push origin --no-verify --all
git push origin --no-verify --tags
5. New Repository Validation
Clone the new repo and run git lfs pull to fetch all LFS objects. Use a diff tool (e.g., Beyond Compare) to compare the new clone with the original copyForCompare directory, ensuring no file changes.
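Where a graphical diff tool is not available, the same comparison can be approximated with a script. This is a sketch under stated assumptions: it hashes every file outside .git in both working trees and reports paths that differ or exist on one side only; the function names are illustrative, and symlinks are not treated specially.

```python
import hashlib
import os


def tree_digest(root: str, ignore: tuple[str, ...] = (".git",)) -> dict[str, str]:
    """Map each file path (relative to `root`) to the SHA-256 of its content."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in ignore]
        for name in filenames:
            full = os.path.join(dirpath, name)
            sha = hashlib.sha256()
            with open(full, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    sha.update(chunk)
            digests[os.path.relpath(full, root)] = sha.hexdigest()
    return digests


def diff_trees(a: str, b: str) -> list[str]:
    """Return the sorted relative paths whose content differs between `a` and `b`."""
    da, db = tree_digest(a), tree_digest(b)
    return sorted(p for p in da.keys() | db.keys() if da.get(p) != db.get(p))
```

Since the LFS migration adds entries to .gitattributes, that file may legitimately appear in the result.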
6. Handling Special Branches
For branches that existed before the truncation point, adjust the parent‑filter script to keep their specific parent IDs, and modify the Python callback used by git‑filter‑repo to delete history only up to each branch’s cut‑off date.
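The per-branch cut-off idea can be sketched as a lookup table keyed by ref name. This is an illustration, not the team's script: the table and function names are hypothetical, and only the master timestamp (1654038307) comes from the article. It assumes git-filter-repo's raw date format, bytes of the form b"<unix timestamp> <tz offset>".

```python
from datetime import datetime, timezone

# Hypothetical cut-off table; only the master value appears in the article.
BRANCH_CUT_DATES = {
    b"refs/heads/master": 1654038307,
}


def is_before_cutoff(branch: bytes, committer_date: bytes) -> bool:
    """True if the commit is older than the cut-off configured for its branch."""
    cutoff = BRANCH_CUT_DATES.get(branch)
    if cutoff is None:
        return False  # no cut-off configured: keep the commit untouched
    timestamp = int(committer_date.split()[0])
    return timestamp < cutoff


def cutoff_as_utc(branch: bytes) -> str:
    """Render a branch's cut-off as an ISO 8601 UTC string for easier review."""
    return datetime.fromtimestamp(BRANCH_CUT_DATES[branch], tz=timezone.utc).isoformat()
```

Comparing integers rather than raw byte strings avoids depending on every timestamp having the same number of digits.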
6.1 Example Python Callback
#!/usr/bin/env python3
import os
import git_filter_repo as fr

k_work_dir = "/path/to/repo"
k_master_cut_date = 1654038307  # Unix timestamp of the master cut-off
# ... other branch cut dates ...

def commitCallBackFun(commit, metadata):
    # committer_date is bytes of the form b"<unix timestamp> <tz offset>"
    ts = int(commit.committer_date.split()[0])
    if ts >= k_master_cut_date and commit.author_name == b"author1":
        commit.message = b"Repository slimming: history trimmed"
    # Drop the file changes of master commits older than the cut-off
    if commit.branch.endswith(b"refs/heads/master") and ts < k_master_cut_date:
        commit.file_changes = []
    # handle other branches similarly

if __name__ == '__main__':
    os.chdir(k_work_dir)
    args = fr.FilteringOptions.parse_args(['--force', '--debug'])
    fr.RepoFilter(args, commit_callback=commitCallBackFun).run()
7. Platform Adaptation
On the code‑management platform (e.g., Tencent's 工蜂), swap the project IDs and names so that the slimmed repository (pushed as test_backup) becomes test and the original repository becomes the backup. Then synchronize permissions, branch protection rules, and CI pipelines.
7.1 Third‑Party Tools
Notify owners of third‑party tools (e.g., UGit) to remove old workspaces and re‑clone the new repository.
7.2 Build Machines
Because commit IDs change, CI agents must delete their cached clones and perform a fresh clone of the new repository.
8. Final Verification Checklist
Can the repo be cloned locally?
Do the latest main‑branch files match the backup?
Randomly compare other branches for file count and content.
Build the project and run the main workflow.
Verify CI pipelines trigger correctly and MR operations work.
Ensure write permissions are restored on the slimmed repo while the backup remains read‑only.
9. Common Pitfalls & Solutions
9.1 LFS "User is null or anonymous" Error
git config lfs.https://example.com/test_backup.git/info/lfs.access basic
Run git lfs env to confirm the access mode is set to basic.
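The configuration key in that command is built from the LFS endpoint, which Git LFS derives from the remote URL by appending /info/lfs (adding .git first if the URL lacks it). A small sketch of that derivation follows; the function name is an assumption.

```python
def lfs_access_key(remote_url: str) -> str:
    """Build the `lfs.<endpoint>.access` config key for an HTTPS remote URL."""
    base = remote_url.rstrip("/")
    if not base.endswith(".git"):
        base += ".git"  # Git LFS appends .git when deriving the default endpoint
    return f"lfs.{base}/info/lfs.access"
```

The result can be passed to git config together with the value basic, for example from a setup script that handles several remotes.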
9.2 Push Errors After URL Change
git remote set-url origin https://example.com/test_backup.git
git remote -v
If ~/.gitconfig contains url.<base>.insteadOf=... rewrite entries, comment them out.
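To see whether a URL rewrite rule is silently redirecting the remote, the rules can be listed with git config --global --get-regexp 'url\..*\.insteadof' and applied by hand. The sketch below parses that output and mimics Git's longest-prefix rewrite; the function names and the host git.example.com are hypothetical.

```python
def parse_insteadof(config_output: str) -> list[tuple[str, str]]:
    """Parse `git config --get-regexp` lines into (base, prefix) rewrite rules.

    A rule "url.<base>.insteadof <prefix>" makes Git replace <prefix> with <base>.
    """
    rules = []
    for line in config_output.splitlines():
        key, _, value = line.partition(" ")
        if key.lower().startswith("url.") and key.lower().endswith(".insteadof"):
            base = key[len("url."):-len(".insteadof")]
            rules.append((base, value))
    return rules


def rewrite_url(url: str, rules: list[tuple[str, str]]) -> str:
    """Apply the rule with the longest matching prefix, as Git does."""
    best = None
    for base, prefix in rules:
        if url.startswith(prefix) and (best is None or len(prefix) > len(best[1])):
            best = (base, prefix)
    if best is None:
        return url
    return best[0] + url[len(best[1]):]
```

If rewrite_url returns something other than the URL shown by git remote -v, a rule in the global configuration is the likely cause of the push error.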
9.3 "Too many open files" During git lfs fetch
ulimit -n 9999999
9.4 Git/LFS Version Issues
Upgrade Git and Git‑LFS to the latest stable versions to avoid checkout hangs and other errors.
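A version check can be automated by parsing the output of git --version and git lfs version. The sketch below extracts the first three-part version number from either string; the function name is an assumption, and the article does not state which minimum versions are required, so any threshold compared against is a placeholder.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first "major.minor.patch" number found in `text`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.groups())
```

Tuples compare element by element, so parse_version(output) >= (2, 30, 0) works as a gate, with (2, 30, 0) standing in for whatever minimum the team settles on.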
10. Rollback Plan
If any step fails, the original repository remains untouched as a backup. The code‑management platform can revert the project ID swap and restore the old repo.
11. Closing Remarks
The repository slimming process was meticulous and time‑consuming, but it cut the repository from 9 GB to 350 MB and significantly improved developer productivity and CI performance.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
