
Automattic/chunkhash


chunkhash

Hash a file in parts — across separate process runs or machines — and get a digest identical to hashing the whole file at once. Supports md5, sha1, sha256, and sha512.

Useful when a file is too large to hold on disk at once, or arrives as a chunked upload, but you still need the checksum of the complete file.

How it works

Go's standard-library hashers (md5.New, sha1.New, sha256.New, sha512.New) implement encoding.BinaryMarshaler / BinaryUnmarshaler. The marshaled blob captures the hasher's full intermediate state — running registers, total bytes seen, and any buffered tail bytes that don't yet fill a block. Restoring that state into a fresh hasher resumes exactly where the previous chunk left off, even when a chunk boundary falls mid-block.

An interim digest alone is not enough to resume; you must carry the marshaled state. Chunks must also be supplied in order, because hashing chunk1 followed by chunk2 gives a different digest from chunk2 followed by chunk1.
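Both constraints can be checked with `crypto/sha256` directly rather than through chunkhash. A finished digest is rejected by `UnmarshalBinary`, because the standard library expects an algorithm identifier and an exact state length, neither of which a 32-byte digest provides. Reversing chunk order changes the result. The helper names are illustrative only.

```go
package main

import (
	"crypto/sha256"
	"encoding"
	"fmt"
)

// digestIsNotState reports whether a finished digest is rejected when fed back
// into UnmarshalBinary. The stdlib checks for an algorithm identifier and an
// exact blob length, so a 32-byte digest does not pass.
func digestIsNotState(chunk []byte) bool {
	digest := sha256.Sum256(chunk)
	err := sha256.New().(encoding.BinaryUnmarshaler).UnmarshalBinary(digest[:])
	return err != nil
}

// orderMatters reports whether hashing a then b differs from hashing b then a.
func orderMatters(a, b []byte) bool {
	ab := sha256.Sum256(append(append([]byte{}, a...), b...))
	ba := sha256.Sum256(append(append([]byte{}, b...), a...))
	return ab != ba
}

func main() {
	fmt.Println(digestIsNotState([]byte("chunk one")))                  // true
	fmt.Println(orderMatters([]byte("chunk one"), []byte("chunk two"))) // true
}
```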

Library

h, _ := chunkhash.New(chunkhash.SHA256)
h.Write(chunk1)
state, _ := h.State()        // persist this blob between chunks

// later, possibly another process:
h2, _ := chunkhash.Resume(state)
h2.Write(chunk2)
fmt.Println(h2.SumHex())     // == sha256(chunk1 || chunk2)

Sum/SumHex are non-destructive, so you can record an interim checkpoint digest and keep writing.

CLI

The CLI writes no state files to disk. Each invocation reads one chunk from stdin and prints two tab-separated fields, the running digest and an opaque state token:

chunkhash [-algo <md5|sha1|sha256|sha512>]   # start fresh (defaults to md5)
chunkhash -state <token>                     # resume; -algo ignored

stdout:  <hex-digest>\t<state-token>

The state token is a single shell-safe word; capture it and pass it back via -state for the next chunk.

Example — three uneven chunks, final digest matches sha256sum:

S=$(head -c 1234 big.bin                 | chunkhash -algo sha256 | cut -f2)
S=$(tail -c +1235 big.bin | head -c 6000 | chunkhash -state "$S"  | cut -f2)
        tail -c +7235 big.bin            | chunkhash -state "$S"  | cut -f1
sha256sum big.bin   # same digest

Build & test

go build ./cmd/chunkhash
go test ./...

About

Calculate hashes for data in chunks, using a state string to ensure continuity.
