MVCS (Minimal Version Control System) is an educational implementation of a content-addressable version control system inspired by Git. It demonstrates the core concepts of modern VCS without the complexity of a production system.
- Content-Addressable Storage: All objects are stored by their SHA-256 hash
- Immutability: Once created, objects never change
- Three Object Types: Blobs (files), Trees (directories), and Commits (snapshots)
- Directed Acyclic Graph: Commits form a DAG structure through parent relationships
.mvcs/
├── objects/            # Content-addressable object store
│   └── ab/             # First 2 chars of hash as subdirectory
│       └── cdef...     # Remaining hash chars as filename
├── refs/
│   └── heads/
│       └── master      # Branch pointer (contains commit hash)
├── HEAD                # Symbolic ref to current branch
└── index               # Staging area
- objects/: Split into subdirectories to avoid filesystem limits on files per directory
- refs/heads/: Separates branches from tags and other refs (though we only implement branches)
- HEAD: Points to the current branch; this indirection is what will allow branch switching in a future implementation
- index: Simple text file for staging area
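Because every object path is derived from a hash, the sharding rule is a pure string transformation. A minimal sketch, assuming hashes are 64 lowercase hex characters as described for refs and the index; object_path and is_valid_hash are hypothetical names, not necessarily what object.c exports:

```c
#include <stdio.h>
#include <string.h>

#define HASH_HEX_LEN 64

/* Returns 1 if s is exactly 64 lowercase hex characters, otherwise 0. */
static int is_valid_hash(const char *s) {
    size_t len = strlen(s);
    if (len != HASH_HEX_LEN) return 0;
    for (size_t i = 0; i < len; i++) {
        char c = s[i];
        if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f'))) return 0;
    }
    return 1;
}

/* Writes ".mvcs/objects/<first 2 chars>/<remaining 62 chars>" into out.
   Returns 0 on success, -1 for an invalid hash or a buffer that is too small. */
static int object_path(const char *hash, char *out, size_t out_size) {
    if (!is_valid_hash(hash)) return -1;
    int n = snprintf(out, out_size, ".mvcs/objects/%.2s/%s", hash, hash + 2);
    if (n < 0 || (size_t)n >= out_size) return -1;
    return 0;
}
```

Validating the hash before building the path keeps arbitrary strings out of object paths: anything other than 64 hex digits (for example ../../etc/passwd) is rejected before it can reach the filesystem.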
All objects follow the same storage format:
<type> <size>\0<content>
Example:
blob 13\0Hello, world!
The SHA-256 hash is computed over the entire object (header + content).
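The header construction can be sketched as follows. This is a minimal illustration, assuming the size field is the content length in decimal bytes (which matches the blob 13 example); build_object is a hypothetical helper that follows the malloc-and-caller-frees convention this codebase uses:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Builds "<type> <size>\0<content>" in a malloc'd buffer and stores the
   total length in *out_len. Caller frees. Returns NULL on error. */
static unsigned char *build_object(const char *type, const unsigned char *content,
                                   size_t content_len, size_t *out_len) {
    char header[64];
    int hlen = snprintf(header, sizeof header, "%s %zu", type, content_len);
    if (hlen < 0 || (size_t)hlen >= sizeof header) return NULL;

    size_t total = (size_t)hlen + 1 + content_len;   /* +1 for the NUL separator */
    unsigned char *buf = malloc(total);
    if (buf == NULL) return NULL;

    memcpy(buf, header, (size_t)hlen);
    buf[hlen] = '\0';                                  /* header/content separator */
    if (content_len > 0) memcpy(buf + hlen + 1, content, content_len);
    *out_len = total;
    return buf;
}
```

The whole returned buffer, header included, is what gets hashed, so two files with identical bytes always produce the same blob hash.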
A blob stores file contents exactly as-is:
blob <size>\0<file content>
A tree represents a directory and contains entries:
tree <size>\0<mode> <name>\0<hash><mode> <name>\0<hash>...
Each entry:
- mode: Octal file mode (e.g., 100644 for a regular file)
- name: File or directory name
- \0: Null separator
- hash: 32-byte SHA-256 hash (raw binary)
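One tree entry can be serialized like this. A sketch, assuming the mode is written as ASCII octal digits without a leading zero (so 0100644 becomes 100644, matching the example above) and that the caller supplies the raw 32-byte hash; append_tree_entry is a hypothetical name:

```c
#include <stdio.h>
#include <string.h>

#define HASH_BIN_LEN 32

/* Appends one "<mode> <name>\0<32-byte hash>" entry at buf + *pos and
   advances *pos. The mode is printed as octal digits (0100644 -> "100644").
   Returns 0 on success, -1 if the entry does not fit in cap bytes. */
static int append_tree_entry(unsigned char *buf, size_t cap, size_t *pos,
                             unsigned int mode, const char *name,
                             const unsigned char hash[HASH_BIN_LEN]) {
    char mode_str[16];
    int mlen = snprintf(mode_str, sizeof mode_str, "%o", mode);
    if (mlen < 0) return -1;
    size_t nlen = strlen(name);
    size_t need = (size_t)mlen + 1 + nlen + 1 + HASH_BIN_LEN;
    if (*pos > cap || cap - *pos < need) return -1;

    unsigned char *p = buf + *pos;
    memcpy(p, mode_str, (size_t)mlen);
    p += mlen;
    *p++ = ' ';
    memcpy(p, name, nlen);
    p += nlen;
    *p++ = '\0';                      /* separates the name from the hash */
    memcpy(p, hash, HASH_BIN_LEN);    /* raw bytes, not hex */
    *pos += need;
    return 0;
}
```

Entries are appended back to back with nothing after the hash: because the hash is always exactly 32 bytes, a reader knows where the next entry's mode begins.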
A commit represents a snapshot of the repository:
commit <size>\0tree <tree-hash>
parent <parent-hash>
author <author>
time <timestamp>

<commit message>
Fields:
- tree: Root tree hash
- parent: Parent commit hash (optional; omitted for the first commit)
- author: Author name/email
- time: Unix timestamp
- The message follows a blank line
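A sketch of assembling the commit body (everything after the commit <size>\0 header), assuming the field order shown above and a trailing newline after the message, which the format does not specify; format_commit_body is a hypothetical helper:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define COMMIT_FMT "tree %s\n%sauthor %s\ntime %lld\n\n%s\n"

/* Builds the text that follows the "commit <size>\0" header.
   parent may be NULL for the first commit. Returns a malloc'd string
   (caller frees) or NULL on error. */
static char *format_commit_body(const char *tree, const char *parent,
                                const char *author, long long timestamp,
                                const char *message) {
    char parent_line[80] = "";   /* "parent <64 hex>\n" needs 73 bytes */
    if (parent != NULL) {
        int n = snprintf(parent_line, sizeof parent_line, "parent %s\n", parent);
        if (n < 0 || (size_t)n >= sizeof parent_line) return NULL;
    }
    /* First pass measures the output; second pass writes it. */
    int len = snprintf(NULL, 0, COMMIT_FMT, tree, parent_line, author, timestamp, message);
    if (len < 0) return NULL;
    char *out = malloc((size_t)len + 1);
    if (out == NULL) return NULL;
    snprintf(out, (size_t)len + 1, COMMIT_FMT, tree, parent_line, author, timestamp, message);
    return out;
}
```

The resulting text would then be wrapped with a commit header and hashed like any other object, so changing any field (including the parent) changes the commit hash.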
The index (staging area) uses a simple text format:
<hash> <mode> <path>
<hash> <mode> <path>
...
Each line represents a staged file with its hash, mode, and path.
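Parsing one index line might look like this. A sketch, assuming fields are separated by single spaces and the mode is stored as octal text; IndexEntry and parse_index_line are hypothetical names, and MAX_PATH_LEN mirrors the 4096-character path limit documented for MVCS:

```c
#include <stdlib.h>
#include <string.h>

#define MAX_PATH_LEN 4096

typedef struct {
    char hash[65];                 /* 64 hex characters + NUL */
    unsigned int mode;             /* stored as octal text, e.g. 100644 */
    char path[MAX_PATH_LEN + 1];
} IndexEntry;

/* Parses one "<hash> <mode> <path>" line; a trailing '\n' is optional.
   Returns 0 on success, -1 if the line is malformed. */
static int parse_index_line(const char *line, IndexEntry *e) {
    const char *sp = strchr(line, ' ');
    if (sp == NULL || sp - line != 64) return -1;
    memcpy(e->hash, line, 64);
    e->hash[64] = '\0';

    char *end;
    unsigned long mode = strtoul(sp + 1, &end, 8);
    if (end == sp + 1 || *end != ' ') return -1;
    e->mode = (unsigned int)mode;

    /* The path is everything after the second space, up to end of line. */
    const char *path = end + 1;
    size_t len = strcspn(path, "\n");
    if (len == 0 || len > MAX_PATH_LEN) return -1;
    memcpy(e->path, path, len);
    e->path[len] = '\0';
    return 0;
}
```

Because the path is the last field and runs to the end of the line, paths containing spaces parse correctly; paths containing newlines cannot be represented in this format.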
References (branches) are simple text files containing a commit hash:
<commit-hash>\n
HEAD can be:
- Symbolic: ref: refs/heads/master\n
- Detached: <commit-hash>\n
Currently only symbolic refs are used.
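Reading HEAD then reduces to checking for the ref: prefix. A minimal sketch; parse_head is a hypothetical name, and the detached case is handled only to show how the documented format distinguishes the two forms:

```c
#include <string.h>

/* Interprets the contents of .mvcs/HEAD.
   Symbolic ("ref: refs/heads/master\n"): copies the ref name into out, returns 1.
   Detached ("<commit-hash>\n"): copies the hash into out, returns 0.
   Returns -1 if the value is empty or does not fit in out_size bytes. */
static int parse_head(const char *contents, char *out, size_t out_size) {
    static const char prefix[] = "ref: ";
    size_t plen = sizeof prefix - 1;
    int symbolic = strncmp(contents, prefix, plen) == 0;
    const char *value = symbolic ? contents + plen : contents;
    size_t len = strcspn(value, "\n");   /* drop the trailing newline */
    if (len == 0 || len >= out_size) return -1;
    memcpy(out, value, len);
    out[len] = '\0';
    return symbolic;
}
```

For a symbolic HEAD, the returned ref name is a path relative to .mvcs/, so the current commit is read from .mvcs/refs/heads/master in the default case.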
Adding a file (add):
- Read file contents
- Create blob object with header
- Compute SHA-256 hash
- Write object to .mvcs/objects/<ab>/<cdef...>
- Update index entry with hash, mode, and path
- Write updated index
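The hashing step needs a SHA-256 implementation. The document does not say which one MVCS uses (a real project might link a library such as OpenSSL instead), so the following is a self-contained, one-shot sketch of the standard algorithm from FIPS 180-4; sha256 and sha256_block are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* Round constants from FIPS 180-4, section 4.2.2. */
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};

/* Processes one 64-byte block, updating the eight state words in h. */
static void sha256_block(uint32_t h[8], const uint8_t *p) {
    uint32_t w[64];
    for (int i = 0; i < 16; i++)   /* message words are big-endian */
        w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
               (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
    for (int i = 16; i < 64; i++) {
        uint32_t s0 = ROTR(w[i - 15], 7) ^ ROTR(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = ROTR(w[i - 2], 17) ^ ROTR(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3];
    uint32_t e = h[4], f = h[5], g = h[6], hh = h[7];
    for (int i = 0; i < 64; i++) {
        uint32_t t1 = hh + (ROTR(e, 6) ^ ROTR(e, 11) ^ ROTR(e, 25)) +
                      ((e & f) ^ (~e & g)) + K[i] + w[i];
        uint32_t t2 = (ROTR(a, 2) ^ ROTR(a, 13) ^ ROTR(a, 22)) +
                      ((a & b) ^ (a & c) ^ (b & c));
        hh = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d;
    h[4] += e; h[5] += f; h[6] += g; h[7] += hh;
}

/* One-shot SHA-256 of len bytes at data; writes the 32-byte digest to out. */
static void sha256(const uint8_t *data, size_t len, uint8_t out[32]) {
    uint32_t h[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                     0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    size_t i = 0;
    for (; i + 64 <= len; i += 64)
        sha256_block(h, data + i);

    /* Padding: 0x80, zeros, then the message length in bits (big-endian). */
    uint8_t tail[128] = {0};
    size_t rem = len - i;
    if (rem > 0) memcpy(tail, data + i, rem);
    tail[rem] = 0x80;
    size_t tail_len = rem < 56 ? 64 : 128;   /* length needs 8 free bytes */
    uint64_t bits = (uint64_t)len * 8;
    for (int k = 0; k < 8; k++)
        tail[tail_len - 1 - k] = (uint8_t)(bits >> (8 * k));
    sha256_block(h, tail);
    if (tail_len == 128) sha256_block(h, tail + 64);

    for (int k = 0; k < 8; k++) {
        out[4 * k] = (uint8_t)(h[k] >> 24);
        out[4 * k + 1] = (uint8_t)(h[k] >> 16);
        out[4 * k + 2] = (uint8_t)(h[k] >> 8);
        out[4 * k + 3] = (uint8_t)h[k];
    }
}
```

For add, the input would be the full object buffer (header plus file contents), not the raw file bytes, so that the digest matches the object's storage name. Reading the whole object into memory first is consistent with the no-streaming design noted for this implementation.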
Creating a commit (commit):
- Read index (staging area)
- Create tree entries from index
- Write tree object
- Get current HEAD commit (if exists)
- Create commit object with:
  - Tree hash
  - Parent hash (if not first commit)
  - Author from environment
  - Current timestamp
  - User's message
- Write commit object
- Update current branch reference
Showing history (log):
1. Read HEAD to get current commit
2. Read commit object
3. Display commit info
4. If commit has parent:
   - Load parent commit
   - Repeat from step 3
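The "If commit has parent" step needs the parent hash from the commit body. A sketch, assuming the text commit format described earlier; commit_parent is a hypothetical name. It only inspects header lines before the first blank line, so the word parent appearing inside a commit message is not mistaken for a header:

```c
#include <string.h>

/* Extracts the parent hash from a commit body (the text after the header).
   Only header lines before the first blank line are examined.
   Returns 1 and fills out (65 bytes) if a parent exists, 0 for a root
   commit, or -1 if the parent line is malformed. */
static int commit_parent(const char *body, char out[65]) {
    const char *line = body;
    while (*line != '\0' && *line != '\n') {     /* blank line ends headers */
        if (strncmp(line, "parent ", 7) == 0) {
            const char *h = line + 7;
            if (strcspn(h, "\n") != 64) return -1;
            memcpy(out, h, 64);
            out[64] = '\0';
            return 1;
        }
        const char *nl = strchr(line, '\n');
        if (nl == NULL) break;
        line = nl + 1;
    }
    return 0;
}
```

A log loop would call this after displaying each commit, load the returned hash, and stop when it returns 0 (a root commit); the number of iterations equals the history depth, which is the O(d) cost listed for log.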
SHA256(header + content) -> 32-byte hash
- Stored as a hex string in refs and the index: 64 hex characters (0-9, a-f)
- Stored in binary in tree entries: 32 bytes of raw hash
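Converting between the two representations takes a small pair of helpers. A sketch with hypothetical names hash_to_hex and hex_to_hash; the decoder accepts only lowercase digits, matching the 0-9, a-f alphabet above, and stops at the first invalid character so it never reads past a short string:

```c
#include <stddef.h>

/* Converts a 32-byte binary hash to 64 lowercase hex characters plus NUL. */
static void hash_to_hex(const unsigned char bin[32], char hex[65]) {
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < 32; i++) {
        hex[2 * i] = digits[bin[i] >> 4];        /* high nibble */
        hex[2 * i + 1] = digits[bin[i] & 0x0f];  /* low nibble */
    }
    hex[64] = '\0';
}

/* Returns the value of one lowercase hex digit, or -1 if c is not one. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Converts exactly 64 lowercase hex characters to 32 raw bytes.
   Returns 0 on success, -1 on a bad character or wrong length. */
static int hex_to_hash(const char *hex, unsigned char bin[32]) {
    for (size_t i = 0; i < 32; i++) {
        int hi = hex_value(hex[2 * i]);
        if (hi < 0) return -1;                   /* also catches an early NUL */
        int lo = hex_value(hex[2 * i + 1]);
        if (lo < 0) return -1;
        bin[i] = (unsigned char)(hi << 4 | lo);
    }
    return hex[64] == '\0' ? 0 : -1;             /* reject trailing characters */
}
```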
Memory management:
- All object data is malloc'd
- Caller responsible for freeing returned data
- Objects are read entirely into memory (no streaming)
- Maximum file size: 10MB (configurable via MAX_CONTENT_SIZE)
Error handling:
- Functions return 0 on success, -1 on error
- Errors printed to stderr
- No exceptions (C doesn't have them)
- Simple errno-style error propagation
Current limitations:
- No branching/merging
- No diff computation
- No compression (could add zlib)
- No network operations
- No garbage collection
- No file permissions beyond basic mode
- No symbolic links
- No large file support
- Single-threaded only
- Maximum 1000 tree entries per tree
- Maximum 1000 index entries
- Maximum 10MB file size
- Maximum 4096 character paths
- No Unicode normalization
- No line-ending conversion
Time complexity:
- Init: O(1) - just create directories
- Add: O(s + n) - hashing is linear in the file size s, and rewriting the index is linear in its n entries
- Commit: O(n) where n = files in index
- Log: O(d) where d = depth of history
- Hash Lookup: O(1) - direct file access by hash
Using SHA-256 (not SHA-1) provides:
- 2^256 possible hashes
- Collision resistance: a brute-force (birthday) search for a collision needs on the order of 2^128 hash computations
- Industry standard cryptographic hash
Safety measures:
- Objects stored in subdirectories (first 2 chars)
- No user input in file paths (hashes only)
- No symlink following
- Simple permission model (0755 for dirs)
- Hash strings validated (64 hex chars)
- File size limits enforced
- Path length limits enforced
- Array bounds checked
Similarities to Git:
- Content-addressable storage
- Same object types (blob/tree/commit)
- DAG structure for commits
- Staging area concept
- Similar directory layout
Differences from Git:
- Git uses SHA-1 by default (SHA-256 is available as an option)
- Git compresses objects with zlib
- Git uses pack files for efficiency
- Git has complex branch/merge
- Git has network protocols
- Git's index caches file metadata (such as timestamps and sizes) to detect changes quickly
- Git has extensive optimization
Possible additions while maintaining simplicity:
- Compression: Add zlib compression to objects
- Branches: Implement branch creation/switching
- Tags: Add lightweight/annotated tags
- Diff: Basic file diff computation
- Status: Show working directory status
- Reset: Reset HEAD to specific commit
- Checkout: Restore files from commits
- Ignore: Support .mvcsignore file
Source files:
- mvcs.h: All type definitions and function declarations
- repository.c: Init and repository checks
- object.c: Low-level object read/write
- tree.c: Tree object operations
- commit.c: Commit object operations
- index.c: Staging area management
- refs.c: Reference operations
- commands.c: High-level command implementations
- main.c: CLI and command dispatch
Manual testing focuses on:
- Creating repositories
- Adding various file types
- Multiple commits
- History traversal
- Object inspection
- Edge cases (empty files, large files, special characters)
This implementation is designed to teach:
- How content-addressable storage works
- Why hashing provides data integrity
- How commits form a graph structure
- What a staging area does
- How references point to commits
- Filesystem-based storage design
Students can:
- Read the entire codebase (~1000 LOC)
- Understand all data structures
- Trace command execution
- Modify and extend functionality
- Build from scratch
References:
- Pro Git Book: https://git-scm.com/book
- Git Internals: https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain
- SHA-256: https://en.wikipedia.org/wiki/SHA-2