
Understanding How Git Works: Objects, Branches, Merges, and the Myers Diff Algorithm

This article explains the internal mechanics of Git, covering its distributed nature, object model (commits, trees, blobs), branch creation, merging strategies, conflict resolution, and the Myers algorithm used for diff operations, illustrated with command‑line examples and diagrams.

政采云技术

Introduction

Git is a distributed version control system: every developer works with a complete local repository, and a central server is optional. Although many developers use Git without knowing its internals, understanding how Git manages repositories can broaden one's perspective on source control.

Git Features

Differences

SVN is a centralized VCS: the repository lives on a single server, and developers check out working copies, update them from the server, and commit their changes directly back to it.

Git is distributed: every clone is a full repository with the complete history, so no central server is required.

Advantages

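Git records the complete content of a changed file as a new snapshot rather than storing line-by-line differences between versions (packfiles may later delta-compress objects for storage, but the object model itself is snapshot-based).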

Each commit stores a snapshot of the project, organized like a miniature file system; files that did not change are referenced through their existing stored copy rather than duplicated.

Almost all Git operations are local, making them extremely fast compared to centralized systems.
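To make the snapshot-and-reference idea concrete, here is a minimal Python sketch of a content-addressed store. It is a simplified model, not Git's code: `blob_id` follows Git's blob naming rule (SHA-1 over a `blob <size>` header, a NUL byte, and the content), while `store_snapshot` and the dictionary standing in for the object database are hypothetical.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Git-style blob id: SHA-1 over "blob <size>", a NUL byte, then the content."""
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()

def store_snapshot(store: dict, files: dict) -> dict:
    """Add every file to a content-addressed store; return path -> blob id.

    `store` is a hypothetical in-memory stand-in for .git/objects.
    Identical content hashes to the same id, so an unchanged file is
    referenced again instead of being stored a second time.
    """
    snapshot = {}
    for path, content in files.items():
        oid = blob_id(content)
        store.setdefault(oid, content)  # no-op if this content is already stored
        snapshot[path] = oid
    return snapshot

store = {}
v1 = store_snapshot(store, {"README.md": b"# demo\n", "main.js": b"add(4,8)\n"})
v2 = store_snapshot(store, {"README.md": b"# demo\n", "main.js": b"add(4,9)\n"})
print(len(store))                          # 3 blobs for 4 file versions
print(v1["README.md"] == v2["README.md"])  # True: README.md is shared
```

Only `main.js` changed between the two snapshots, so the store grows by one blob rather than two.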

How Git Actually Works

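To explore how Git works, we look inside the hidden .git directory. Its most relevant subdirectory is objects, which stores the three object types covered here (Git also has a fourth type, the annotated tag):

Commit: points to the tree of the project's top-level directory, references its parent commit(s) to form the history, and records metadata such as the author, committer, and message.

Tree: represents a directory, recording each entry's name and mode along with the blob (file) or subtree (subdirectory) it points to.

Blob: holds the raw content of a file, a snapshot with no file name or metadata (those are stored in the tree).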

Commit Objects

Listing the objects directory shows many two-character folders. Git names each object by the 40-character SHA-1 hash of its type, size, and content: the first two hex characters become the folder name, and the remaining 38 characters become the file name inside it.
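The naming scheme can be sketched in Python. The hypothetical `write_loose_object` below assumes the loose-object format (a `<type> <size>` header, a NUL byte, then the content, hashed with SHA-1 and stored zlib-compressed); it skips details real Git handles, such as writing through a temporary file and marking objects read-only.

```python
import hashlib
import os
import tempfile
import zlib

def write_loose_object(git_dir: str, obj_type: str, content: bytes) -> str:
    """Store content as a loose object and return its 40-character id (hypothetical helper)."""
    data = f"{obj_type} {len(content)}".encode() + b"\x00" + content
    oid = hashlib.sha1(data).hexdigest()
    folder = os.path.join(git_dir, "objects", oid[:2])  # first 2 hex chars: folder
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, oid[2:])                # remaining 38 chars: file
    if not os.path.exists(path):                        # objects are immutable
        with open(path, "wb") as f:
            f.write(zlib.compress(data))                # loose objects are zlib-compressed
    return oid

with tempfile.TemporaryDirectory() as git_dir:
    oid = write_loose_object(git_dir, "blob", b"test content\n")
    print(oid)                                            # d670460b4b4aece5915caf5c68d12f560a9fe3e4
    print(os.listdir(os.path.join(git_dir, "objects")))   # ['d6']
```

Because the id depends only on type and content, writing the same content twice lands on the same path, which is what makes deduplication automatic.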

objects
├── 0c
│   ├── 8867d7e175f46d4bcd66698ac13f4ca00cf592
│   └── c8002da0403724dfaa6792885eaa97faa71bcf
├── 1b
│   └── 716fafdd3aeb3636222a0026d1d4971078db05
…

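Running git log -4 --oneline shows the latest four commits with abbreviated hashes. git rev-parse expands a short hash to the full 40-character ID, which is also the name under which the object is stored in the objects directory (folder 9a and file 5bf367f10390c64a3f7b3e738b78bd78a3d781 for the example below).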

git rev-parse 9a5bf36
# => 9a5bf367f10390c64a3f7b3e738b78bd78a3d781
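Under the simplifying assumption that every object is stored loose (real repositories also keep objects in packfiles, which git rev-parse searches as well), expanding a short hash amounts to listing one folder. `expand_short_id` is a hypothetical helper, not a Git API.

```python
import os
import tempfile

def expand_short_id(git_dir: str, prefix: str) -> str:
    """Expand an abbreviated object id by scanning loose objects only (hypothetical)."""
    prefix = prefix.lower()
    if len(prefix) < 2:
        raise ValueError("need at least the two folder characters")
    folder = os.path.join(git_dir, "objects", prefix[:2])
    matches = []
    if os.path.isdir(folder):
        matches = [prefix[:2] + name for name in os.listdir(folder)
                   if name.startswith(prefix[2:])]
    if len(matches) != 1:  # zero matches, or an ambiguous prefix
        raise LookupError(f"{prefix}: {len(matches)} matching loose objects")
    return matches[0]

with tempfile.TemporaryDirectory() as git_dir:
    full = "9a5bf367f10390c64a3f7b3e738b78bd78a3d781"  # id from the rev-parse example
    os.makedirs(os.path.join(git_dir, "objects", full[:2]))
    open(os.path.join(git_dir, "objects", full[:2], full[2:]), "wb").close()
    print(expand_short_id(git_dir, "9a5bf36"))  # 9a5bf367f10390c64a3f7b3e738b78bd78a3d781
```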

Running git cat-file -p 9a5bf36 (abbreviated hashes are accepted) pretty-prints the object: a commit containing the tree hash, the parent commit hash, the author, the committer, and the message.
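For a loose object, git cat-file essentially decompresses the file and formats its content. A rough Python equivalent, using hypothetical helper names, inflates the data, checks the `<type> <size>` header, and splits a commit into its header fields and message. The commit ids and author in the demo are placeholders.

```python
import zlib

def parse_loose_object(raw: bytes):
    """Inflate a loose object and split it into (type, body)."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size.decode()) != len(body):
        raise ValueError("corrupt object: size mismatch")
    return obj_type.decode(), body

def parse_commit(body: bytes):
    """Split a commit body into header fields and the message.

    Headers come first, one per line ("tree", "parent", "author", ...);
    a blank line separates them from the message. A merge commit has
    several "parent" lines, so every field maps to a list.
    """
    head, _, message = body.decode().partition("\n\n")
    fields, last_key = {}, None
    for line in head.splitlines():
        if line.startswith(" ") and last_key:  # continuation of a multi-line header
            fields[last_key][-1] += "\n" + line[1:]
            continue
        key, _, value = line.partition(" ")
        fields.setdefault(key, []).append(value)
        last_key = key
    return fields, message

tree_id, parent_id = "1" * 40, "2" * 40  # placeholder ids for the demo
body = (f"tree {tree_id}\n"
        f"parent {parent_id}\n"
        "author Jane Doe <jane@example.com> 1700000000 +0800\n"
        "committer Jane Doe <jane@example.com> 1700000000 +0800\n"
        "\n"
        "Add README\n").encode()
raw = zlib.compress(b"commit %d\x00" % len(body) + body)
obj_type, payload = parse_loose_object(raw)
fields, message = parse_commit(payload)
print(obj_type, fields["tree"], fields["parent"], message.strip())
```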

Tree Objects

Running git cat-file -p <tree-hash> with the tree hash from the commit shows entries such as:

100644 blob 0cc8002da0403724dfaa6792885eaa97faa71bcf    README.md
040000 tree 3c121291ffc25ce6792f9350883b77cea2633048    src

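Each entry lists a mode (100644 for a regular file, 040000 for a directory), the object type, the object's hash, and the name. A tree can therefore contain both blobs (files) and nested trees (subdirectories), mirroring the project's directory structure.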
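Tree objects are stored in a binary form that git cat-file -p pretty-prints. Assuming the raw layout of `<mode> <name>`, a NUL byte, then the 20-byte binary SHA-1 for each entry, a hypothetical `parse_tree` can rebuild the two-entry listing above:

```python
def parse_tree(body: bytes):
    """Decode a raw tree body into (mode, type, id, name) tuples (hypothetical helper).

    Raw trees store the directory mode as "40000"; `git cat-file -p`
    pads it to six digits ("040000") when printing.
    """
    entries, i = [], 0
    while i < len(body):
        nul = body.index(b"\x00", i)
        mode, name = body[i:nul].split(b" ", 1)
        oid = body[nul + 1:nul + 21].hex()  # 20 raw bytes -> 40 hex chars
        mode = mode.decode().zfill(6)
        if mode == "040000":
            obj_type = "tree"
        elif mode == "160000":              # submodule entry points to a commit
            obj_type = "commit"
        else:
            obj_type = "blob"
        entries.append((mode, obj_type, oid, name.decode()))
        i = nul + 21
    return entries

raw = (b"100644 README.md\x00" + bytes.fromhex("0cc8002da0403724dfaa6792885eaa97faa71bcf")
       + b"40000 src\x00" + bytes.fromhex("3c121291ffc25ce6792f9350883b77cea2633048"))
for mode, obj_type, oid, name in parse_tree(raw):
    print(mode, obj_type, oid, name)
```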

Blob Objects

Running git cat-file -p <blob-hash> on a blob prints the raw file content, for example a LICENSE file:

MIT License

Copyright (c) 2019

Permission is hereby granted, free of charge, to any person obtaining a copy …

Branch Creation and Merging

A branch in Git is simply a movable pointer to a commit. The default branch is traditionally named master (configurable with init.defaultBranch). Creating a branch only writes a new pointer to the current commit; no files are copied, so it is nearly instantaneous.

The special HEAD pointer records the currently checked-out branch. Switching branches makes HEAD point to the other branch, and Git updates the working tree to match that branch's commit.
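Because a branch is only a pointer, it can be modelled as a small file. This sketch assumes loose refs (one `refs/heads/<name>` file per branch) and ignores `packed-refs`; `create_branch`, `switch_branch`, and `resolve_head` are hypothetical helpers, and updating the working tree is omitted.

```python
import os
import tempfile

def create_branch(git_dir: str, name: str, commit_id: str) -> None:
    """A branch is a file under refs/heads holding a commit id."""
    path = os.path.join(git_dir, "refs", "heads", name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(commit_id + "\n")

def switch_branch(git_dir: str, name: str) -> None:
    """Make HEAD a symbolic ref to the branch (working-tree update omitted)."""
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write(f"ref: refs/heads/{name}\n")

def resolve_head(git_dir: str) -> str:
    """Follow HEAD to the commit id it ultimately names."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):  # attached HEAD: read the branch file
        with open(os.path.join(git_dir, head[5:])) as f:
            return f.read().strip()
    return head                   # detached HEAD stores a commit id directly

with tempfile.TemporaryDirectory() as git_dir:
    c1 = "9a5bf367f10390c64a3f7b3e738b78bd78a3d781"  # commit id from the earlier example
    create_branch(git_dir, "master", c1)
    create_branch(git_dir, "feature", c1)  # one more tiny file, nothing copied
    switch_branch(git_dir, "feature")
    print(resolve_head(git_dir))           # 9a5bf367f10390c64a3f7b3e738b78bd78a3d781
```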

Code Merge and Conflicts

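When both branches have new commits, Git performs a three-way merge using the two branch tips and their common ancestor (the merge base), and records the result as a new commit with two parents. If the current branch is itself an ancestor of the branch being merged, Git simply moves the branch pointer forward instead (a fast-forward merge).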
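Finding the common ancestor is a search over parent links. The sketch below uses a hypothetical commit graph given as a dict of parent lists and returns the first common ancestor found while walking back from one tip. git merge-base is more careful (it selects best common ancestors and can report several), so treat this as an illustration only.

```python
from collections import deque

def ancestors(parents: dict, start: str) -> set:
    """All commits reachable from start (inclusive) via parent links."""
    seen, queue = {start}, deque([start])
    while queue:
        for p in parents.get(queue.popleft(), []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen

def merge_base(parents: dict, a: str, b: str):
    """First common ancestor found walking back from b breadth-first (simplified)."""
    reachable_from_a = ancestors(parents, a)
    seen, queue = {b}, deque([b])
    while queue:
        c = queue.popleft()
        if c in reachable_from_a:
            return c
        for p in parents.get(c, []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return None

# Hypothetical history: A <- B <- C (master) and B <- D <- E (feature)
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}
print(merge_base(parents, "C", "E"))  # B: the base for a three-way merge of C and E
```

If the base equals one of the tips, as with `merge_base(parents, "C", "B")`, there is nothing to combine and the branch can be fast-forwarded.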

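If both branches change the same lines differently, Git cannot choose automatically and reports a conflict. It marks each conflicting section with <<<<<<< HEAD (your version), ======= (the separator), and >>>>>>> followed by the other branch's name (their version). You resolve the conflict by editing the file, staging it with git add, and committing.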

Example output from a merge that stops on a conflict:
Auto-merging index.html
CONFLICT (content): Merge conflict in index.html
Automatic merge failed; fix conflicts and then commit the result.
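The core decision of a three-way merge can be illustrated with a toy version. To stay short, it assumes the three versions have the same number of lines (edits in place only) and compares them line by line; real Git first diffs the base against each side to align changed hunks, and it groups adjacent conflicting lines into one marked section. `merge3_aligned` and the `feature` label are hypothetical.

```python
def merge3_aligned(base, ours, theirs, ours_label="HEAD", theirs_label="feature"):
    """Toy three-way merge for line lists of equal length (hypothetical helper).

    Per line: keep a change made on only one side; if both sides made the
    same change, take it; if they changed the line differently, emit a conflict.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("toy version needs line-aligned inputs")
    merged, conflicts = [], 0
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:
            merged.append(o)   # identical on both sides, or only ours changed
        elif o == b:
            merged.append(t)   # only theirs changed
        else:                  # both changed the same line differently
            conflicts += 1
            merged += [f"<<<<<<< {ours_label}", o, "=======", t, f">>>>>>> {theirs_label}"]
    return merged, conflicts

base   = ["<h1>Title</h1>", "<p>Hello</p>"]
ours   = ["<h1>Home</h1>",  "<p>Hello</p>"]
theirs = ["<h1>Index</h1>", "<p>Hello, world</p>"]
merged, n = merge3_aligned(base, ours, theirs)
print("\n".join(merged))
```

The heading was changed on both sides and becomes a conflict, while the paragraph changed only on the other branch and merges cleanly.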

Diff Algorithm (Myers)

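Git's default diff algorithm is the Myers algorithm (alternatives such as patience and histogram can be selected with --diff-algorithm). It finds a shortest edit script between two sequences, that is, a diff with the fewest insertions and deletions. The algorithm models the problem as an edit graph with the old sequence along the x-axis and the new one along the y-axis: moving right deletes an element of the old sequence, moving down inserts an element of the new one, and moving diagonally, which is only possible where the two elements match, keeps an element at no cost. The shortest edit script is the path from the top-left to the bottom-right corner with the fewest horizontal and vertical moves.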

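For example, diffing the old sequence ABCABBA against the new sequence CBABAC, the optimal path yields this five-edit script (- deletes, + inserts, unmarked lines are kept):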

- A
- B
  C
+ B
  A
  B
- B
  A
+ C
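The greedy Myers search fits in a few dozen lines. This Python sketch follows the forward-pass-then-backtrack formulation from the Myers diff article listed in the references; it is a teaching version, not Git's source, and Git's bundled xdiff library is considerably more elaborate. The hypothetical `shortest_edit` records, for each edit count d, the furthest point reached on every diagonal k = x - y, and `diff` walks back through those snapshots. Run on ABCABBA and CBABAC, it reproduces the nine-line script above.

```python
def shortest_edit(a, b):
    """Forward pass: snapshot of v (furthest x per diagonal k) before each round d."""
    n, m = len(a), len(b)
    v = {1: 0}
    trace = []
    for d in range(n + m + 1):
        trace.append(dict(v))
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]              # step down from diagonal k+1 (insertion)
            else:
                x = v[k - 1] + 1          # step right from diagonal k-1 (deletion)
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1       # follow the free diagonal "snake" of matches
            v[k] = x
            if x >= n and y >= m:
                return trace
    return trace

def diff(a, b):
    """Backtrack through the trace and return the edit script, first edit first."""
    x, y = len(a), len(b)
    lines = []
    for d, v in reversed(list(enumerate(shortest_edit(a, b)))):
        k = x - y
        if k == -d or (k != d and v[k - 1] < v[k + 1]):
            prev_k = k + 1
        else:
            prev_k = k - 1
        prev_x = v[prev_k]
        prev_y = prev_x - prev_k
        while x > prev_x and y > prev_y:  # walk back along the snake: kept elements
            lines.append(f"  {a[x - 1]}")
            x, y = x - 1, y - 1
        if d > 0:
            if x == prev_x:               # vertical step: b[prev_y] was inserted
                lines.append(f"+ {b[prev_y]}")
            else:                         # horizontal step: a[prev_x] was deleted
                lines.append(f"- {a[prev_x]}")
        x, y = prev_x, prev_y
    return lines[::-1]

print("\n".join(diff("ABCABBA", "CBABAC")))
```

The functions only compare elements with `==`, so they work unchanged on lists of lines, which is how a line-based diff would use them.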

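Git prints real diffs in unified format, grouping changes into hunks. Each hunk starts with an @@ header of the form @@ -start,count +start,count @@, giving the first line and the number of lines the hunk spans in the old (-) and new (+) file; inside the hunk, removed lines start with -, added lines with +, and unchanged context lines with a space.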

@@ -1,15 +1,5 @@
-  console.log('watch')
-  const add = (a,c) => { … }
+  const add = (a,b) => { … }
   add(4,8)
-  console.log(reduce(-2,-9))
-  console.log(new Date().getDate(),'second commit')
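Reading a header is simple arithmetic: @@ -1,15 +1,5 @@ says the hunk covers 15 lines starting at line 1 of the old file and 5 lines starting at line 1 of the new file. A small, hypothetical parser in Python, assuming the unified-diff convention that an omitted count means 1:

```python
import re

# "@@ -<old_start>[,<old_count>] +<new_start>[,<new_count>] @@", optionally followed by context text
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def parse_hunk_header(line: str):
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return (int(old_start), int(old_count or 1),  # omitted count defaults to 1
            int(new_start), int(new_count or 1))

print(parse_hunk_header("@@ -1,15 +1,5 @@"))  # (1, 15, 1, 5)
```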

Conclusion

The article provides a brief overview of Git’s internal mechanisms, including objects, branching, merging, conflict resolution, and the Myers diff algorithm. Readers are encouraged to explore further to deepen their understanding of this powerful tool.

References

Pro Git

Advanced Git

The Myers diff algorithm: part 1 (https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/)

Written by

政采云技术

ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.
