Understanding Git's Use of SHA-1
Explore how Git uses SHA-1 hashes for content-addressable storage, commit integrity, and distributed version control.
Why Git Uses Hashes
Git is fundamentally a content-addressable filesystem. Every object (commit, tree, blob, tag) is identified by the SHA-1 hash of its content:
- Content-addressable: Objects are stored and retrieved by their hash, not by filename
- Integrity: Corruption is detectable because the stored content no longer matches its hash
- Deduplication: Identical content has the same hash, so it is stored only once
- Distributed: Hashes are unique for all practical purposes, so independent repositories never need a central authority to assign IDs
Linus Torvalds designed Git so that corruption cannot go unnoticed. If a single bit flips in any object, the recomputed hash no longer matches the object's name, and Git reports the object as corrupt. This is what makes Git dependable for distributed development.
Git Object Types
Git stores four types of objects, each identified by a SHA-1 hash:
Blob (Binary Large Object)
Stores file content. The hash is computed from the file data plus a header.
SHA-1("blob " + filesize + "\0" + file_content)
Tree
Stores directory structure. Lists filenames, permissions, and blob/tree hashes.
100644 blob a906cb2a4a904a152e80877d4088654daad0c859 README.md
040000 tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0 src/
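The listing above is the human-readable view. As a rough sketch of how a tree's own hash could be computed, the snippet below assumes the raw tree format: each entry is the mode, a space, the name, a null byte, and the 20 raw bytes of the entry's hash, with directories using mode 40000 (no leading zero). The function name tree_object_id is hypothetical, and the sort is simplified (Git compares directory names as if they ended in "/").

```python
import hashlib

def tree_object_id(entries):
    """Return the SHA-1 object ID of a tree built from (mode, name, hex_id) entries."""
    body = b""
    # Simplified ordering: sort by raw name bytes.
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode()):
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
    header = f"tree {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# Entries taken from the example listing above
entries = [
    ("100644", "README.md", "a906cb2a4a904a152e80877d4088654daad0c859"),
    ("40000", "src", "99f1a6d12cb4b6f19c8655fca46c3ecf317074e0"),
]
print(tree_object_id(entries))
```

Because the tree's hash covers the hashes of its entries, changing any file changes the hash of every tree above it.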
Commit
Stores metadata: tree hash, parent commit(s), author, committer, message, timestamp.
tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0
parent 0d1d7fc32e5a0204bd39d46d4c8a4e9a8b5c6e7f
author Alice <alice@example.com> 1715097600 +0000
committer Alice <alice@example.com> 1715097600 +0000

Add user authentication feature
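A commit is hashed the same way as a blob: a header, then the commit text. The sketch below assumes the usual layout (header lines, a blank line, then the message ending in a newline) and ignores optional fields such as signatures. The helper name commit_object_id is hypothetical.

```python
import hashlib

def commit_object_id(tree, parents, author, committer, message):
    """Return the SHA-1 object ID of a commit assembled from its fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero, one, or several parents
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    header = f"commit {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

who = "Alice <alice@example.com> 1715097600 +0000"
print(commit_object_id(
    "99f1a6d12cb4b6f19c8655fca46c3ecf317074e0",
    ["0d1d7fc32e5a0204bd39d46d4c8a4e9a8b5c6e7f"],
    who,
    who,
    "Add user authentication feature",
))
```

Since the timestamp is part of the author and committer lines, the same change committed a second later produces a different hash.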
Tag
Stores annotated tag information: object hash, type, tagger, message.
object 0d1d7fc32e5a0204bd39d46d4c8a4e9a8b5c6e7f
type commit
tag v1.0.0
tagger Alice <alice@example.com> 1715097600 +0000

Release version 1.0.0
How Git Computes Hashes
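Let's compute a Git blob hash manually to understand the process. Take a file containing the following line, terminated by a newline (14 bytes in total):
Hello, World!
Git prepends a header (type + space + size + null byte) to the content, so the bytes that are actually hashed are:
blob 14\0Hello, World!\n
The SHA-1 of those bytes is the object ID:
8ab686eafeb1f44702738c8b0f24f2567c36da6d
Git reports the same value (echo supplies the trailing newline):
echo "Hello, World!" | git hash-object --stdin
8ab686eafeb1f44702738c8b0f24f2567c36da6d
The header can also be built by hand and hashed with OpenSSL:
(printf "blob 14\0"; echo "Hello, World!") | openssl sha1
The same computation in Python:
import hashlib

content = b"Hello, World!\n"  # include the trailing newline that echo adds
header = f"blob {len(content)}\0".encode()  # "blob 14" followed by a null byte
store = header + content
hash_val = hashlib.sha1(store).hexdigest()
print(hash_val)  # 8ab686eafeb1f44702738c8b0f24f2567c36da6d
Content-Addressable Storage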
Git stores objects in .git/objects/ using the hash as the path:
.git/objects/8a/b686eafeb1f44702738c8b0f24f2567c36da6d
             ↑↑ ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
             dir (first 2 characters) / filename (remaining 38 characters)
Commit Hash Integrity
Every commit hash depends on its entire history, creating a tamper-evident chain:
Commit C = SHA-1(tree + parent_B + author + message)
    ↓
Commit B = SHA-1(tree + parent_A + author + message)
    ↓
Commit A = SHA-1(tree + author + message)
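Because each commit's hash covers its parent's hash, changing any commit changes the hash of every descendant. Commands like git rebase or git commit --amend therefore create new commits with different hashes. The old commits still exist (until garbage collected), but the branch now points to the new commits.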
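The chain effect can be illustrated with a simplified model. The sketch below hashes commits that contain only a tree, an optional parent, and a message (real commits also carry author and committer lines), and the helper names commit_id and build_chain are hypothetical. Amending the first commit's message changes every hash after it.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of Git's empty tree

def commit_id(tree, parent, message):
    """Hash a simplified commit (no author or committer lines)."""
    text = f"tree {tree}\n"
    if parent:
        text += f"parent {parent}\n"
    text += f"\n{message}\n"
    body = text.encode()
    header = f"commit {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def build_chain(messages, tree=EMPTY_TREE):
    """Return the IDs of a linear history, oldest first."""
    ids, parent = [], None
    for message in messages:
        parent = commit_id(tree, parent, message)
        ids.append(parent)
    return ids

original = build_chain(["A", "B", "C"])
amended = build_chain(["A (amended)", "B", "C"])
for old, new in zip(original, amended):
    print(old[:8], "->", new[:8])  # every pair differs
```

Commits B and C have unchanged messages and trees, yet their IDs change because each one embeds the ID of its parent.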
Practical Git Commands
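git cat-file -t 8ab686ea      # show the object's type
git cat-file -p 8ab686ea      # pretty-print the object's content
git hash-object README.md     # compute the blob hash of a file without storing it
git fsck --full               # verify the integrity of the object database
git show --format=raw HEAD    # show HEAD with its raw commit header (tree, parent, author, committer)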
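To see what git cat-file does under the hood for a loose object, here is a minimal sketch. It assumes the object is stored zlib-compressed under .git/objects and has not been packed, and it requires the full 40-character ID (git cat-file also accepts abbreviations and reads packfiles). The function name read_loose_object is hypothetical.

```python
import zlib
from pathlib import Path

def read_loose_object(git_dir, object_id):
    """Return (type, content) for a loose object identified by its full hex ID."""
    path = Path(git_dir) / "objects" / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())
    header, _, content = raw.partition(b"\0")  # header looks like b"blob 14"
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content
```

Inside a repository, read_loose_object(".git", full_id) would return roughly what git cat-file -t and git cat-file -p print, as long as the object is still loose.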
SHA-1 Collision Concerns
In 2017, researchers at Google and CWI Amsterdam demonstrated the first practical SHA-1 collision (the SHAttered attack). Git's response:
Git now includes collision detection. Since version 2.13 it uses a hardened SHA-1 implementation by default, which recognizes inputs crafted with this class of attack and refuses them. This prevents SHAttered-style files from being added to Git repositories.
Git is transitioning to SHA-256. Git 2.29 and later can create SHA-256 repositories; the feature was introduced as experimental.
Command: git init --object-format=sha256
Creating a SHA-1 collision requires massive computational resources (the SHAttered computation was estimated at roughly $110,000 of cloud compute). For most projects, SHA-1 as hardened in Git remains secure enough. Critical infrastructure should plan a migration to SHA-256.
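As far as the object format is concerned, a SHA-256 repository frames blobs the same way and only swaps the hash function. The sketch below assumes that framing; the helper name object_id is hypothetical.

```python
import hashlib

def object_id(obj_type, content, algorithm="sha1"):
    """Hash "<type> <size>\\0<content>" with the chosen algorithm."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.new(algorithm, header + content).hexdigest()

data = b"Hello, Git!\n"
print(len(object_id("blob", data, "sha1")))    # 40 hex characters
print(len(object_id("blob", data, "sha256")))  # 64 hex characters
```

The longer IDs are the most visible difference: every object name in a SHA-256 repository is 64 hex characters instead of 40.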
Real-World Applications
Distributed Collaboration
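Developers can work offline, create commits, and later exchange them without object IDs clashing, because a hash is derived from content rather than assigned by a central server. Merge conflicts in file content can still occur, but two repositories never disagree about what a given hash refers to.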
Efficient Storage
Identical files across branches are stored only once. Git deduplicates automatically using content hashes.
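Deduplication falls out of content addressing almost for free. The toy store below is a hypothetical in-memory model, not Git's on-disk format: it keys each blob by its hash, so storing the same content twice leaves a single entry.

```python
import hashlib

class ObjectStore:
    """Toy in-memory content-addressable store (illustrative only)."""

    def __init__(self):
        self.objects = {}

    def put(self, content):
        header = f"blob {len(content)}".encode() + b"\0"
        key = hashlib.sha1(header + content).hexdigest()
        self.objects.setdefault(key, content)  # identical content maps to the same key
        return key

store = ObjectStore()
main_id = store.put(b"shared config\n")     # file as it appears on one branch
feature_id = store.put(b"shared config\n")  # the same file on another branch
print(main_id == feature_id, len(store.objects))  # True 1
```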
Corruption Detection
If a disk error corrupts an object, Git detects it the next time the object is read or checked, because the recomputed hash won't match the object's name. Run git fsck to verify the integrity of the whole repository.
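The core of that check can be sketched in a few lines. This is only an approximation of one part of git fsck: it assumes loose, zlib-compressed objects and ignores packfiles and connectivity checks. The function name find_corrupt_objects is hypothetical, and the demo runs against a throwaway directory laid out like .git/objects.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def find_corrupt_objects(objects_dir):
    """Return IDs of loose objects whose content no longer hashes to their path."""
    corrupt = []
    for path in sorted(Path(objects_dir).glob("[0-9a-f][0-9a-f]/*")):
        expected = path.parent.name + path.name
        try:
            actual = hashlib.sha1(zlib.decompress(path.read_bytes())).hexdigest()
        except zlib.error:
            actual = None  # too damaged to decompress
        if actual != expected:
            corrupt.append(expected)
    return corrupt

with tempfile.TemporaryDirectory() as tmp:
    raw = b"blob 12\0Hello, Git!\n"
    oid = hashlib.sha1(raw).hexdigest()
    obj_path = Path(tmp) / oid[:2] / oid[2:]
    obj_path.parent.mkdir()
    obj_path.write_bytes(zlib.compress(raw))
    print(find_corrupt_objects(tmp))  # [] while the object is intact
    # Simulate corruption by changing one character of the stored content
    obj_path.write_bytes(zlib.compress(raw.replace(b"Git", b"git")))
    print(find_corrupt_objects(tmp))  # now lists the damaged object's ID
```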
Reproducible Builds
Commit hashes uniquely identify code state. CI/CD systems use hashes to ensure they're building the exact code that was tested.
Try It Yourself
Experiment with Git hashes using our Hash Calculator:
1. Create a file: echo "Hello, Git!" > test.txt
2. Get Git's hash: git hash-object test.txt
3. Go to the Hash Calculator
4. Select the SHA-1 algorithm
5. Enter: blob 12\0Hello, Git!\n (with an actual null byte for \0 and an actual newline for \n, since echo appends one)
6. Compare the hashes; the result should match Git's output