@masak
Last active November 11, 2025 13:51
How is git commit sha1 formed

(The below text is licensed with CC0, which means that if you want to use or translate it, that is OK by me.)

Ok, I geeked out, and this is probably more information than you need. But it completely answers the question. Sorry. ☺

Locally, I'm at this commit:

$ git show
commit d6cd1e2bd19e03a81132a23b2025920577f84e37
Author: jnthn <[email protected]>
Date:   Sun Apr 15 16:35:03 2012 +0200

    When I added FIRST/NEXT/LAST, it was idiomatic but not quite so fast. This makes it faster. Another little bit of masak++'s program.

So that's the sha1 I want to reproduce: d6cd1e2bd19e03a81132a23b2025920577f84e37

When I started investigating, I assumed that what went into a commit was something like this:

$ git --no-replace-objects cat-file commit HEAD
tree 9bedf67800b2923982bdf60c89c57ce6fd2d9a1c
parent de1eaf515ebea46dedea7b3ae0e5ebe3e1818971
author jnthn <[email protected]> 1334500503 +0200
committer jnthn <[email protected]> 1334500545 +0200

When I added FIRST/NEXT/LAST, it was idiomatic but not quite so fast. This makes it faster. Another little bit of masak++'s program.

That is

  • The source tree of the commit (which unravels to all the subtrees and blobs)
  • The parent commit sha1
  • The author info
  • The committer info (right, those are different!)
  • The commit message
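To make the layout concrete, here is a minimal sketch that assembles a commit body by hand from those five pieces. The identity and the message are hypothetical placeholders (the real email address is not shown above), so the byte count will not match the real commit; only the tree and parent ids and the timestamps are taken from the output above.

```shell
# Hypothetical identity, for illustration only; NOT the real author's address.
ident='A U Thor <author@example.com>'

commit_body=$(
    printf 'tree %s\n'         9bedf67800b2923982bdf60c89c57ce6fd2d9a1c
    printf 'parent %s\n'       de1eaf515ebea46dedea7b3ae0e5ebe3e1818971
    printf 'author %s %s\n'    "$ident" '1334500503 +0200'
    printf 'committer %s %s\n' "$ident" '1334500545 +0200'
    printf '\n'                # blank line separates the headers from the message
    printf '%s\n' 'Placeholder commit message.'
)

# Command substitution strips the trailing newline, so add it back when printing.
printf '%s\n' "$commit_body"
```

Each header is one line terminated by a single LF, and the timestamps are seconds since the epoch followed by a timezone offset, which is why the author and committer lines can differ even when the identity is the same.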

But it turns out there is also a NUL-terminated header that gets prepended to this, containing the word "commit" and the length in bytes of all of the above information:

$ printf "commit %s\0" $(git --no-replace-objects cat-file commit HEAD | wc -c)
commit 327

(No, you can't see the NUL byte.)
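If you want to see the NUL byte anyway, one option is to dump the header byte by byte. This is a small sketch using `od`; the size 327 is hardcoded from the example above, and the variable names are just for illustration.

```shell
# Dump the header one character at a time; the NUL shows up as \0 at the end.
header_dump=$(printf 'commit %s\0' 327 | od -An -c)
echo "$header_dump"

# Count the header bytes: "commit" (6) + space (1) + "327" (3) + NUL (1).
header_len=$(printf 'commit %s\0' 327 | wc -c | tr -d ' ')
echo "$header_len"
```

The count comes out one higher than the visible text, and that extra byte is the terminator.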

Put this header and the rest of the information together:

$ (printf "commit %s\0" $(git --no-replace-objects cat-file commit HEAD | wc -c); git --no-replace-objects cat-file commit HEAD)
commit 327tree 9bedf67800b2923982bdf60c89c57ce6fd2d9a1c
parent de1eaf515ebea46dedea7b3ae0e5ebe3e1818971
author jnthn <[email protected]> 1334500503 +0200
committer jnthn <[email protected]> 1334500545 +0200

When I added FIRST/NEXT/LAST, it was idiomatic but not quite so fast. This makes it faster. Another little bit of masak++'s program.

...and what you get hashes to the right sha1!

$ (printf "commit %s\0" $(git --no-replace-objects cat-file commit HEAD | wc -c); git --no-replace-objects cat-file commit HEAD) | sha1sum
d6cd1e2bd19e03a81132a23b2025920577f84e37  -
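
The recipe above can be wrapped up in a small helper. This is only a sketch: `git_object_sha1` is a hypothetical name, not a git command, and it assumes `sha1sum` and `mktemp` are available. It also relies on the fact that git uses the same "type, space, size, NUL" header for its other object types, which lets it be checked against the well-known id of the empty blob without needing a repository.

```shell
# Hypothetical helper: print the SHA-1 git would assign to an object of the
# given type whose body is read from stdin.
git_object_sha1() (
    obj_type=$1
    body=$(mktemp) || exit 1
    cat > "$body"                          # buffer stdin so it can be measured
    size=$(wc -c < "$body" | tr -d ' ')    # body length in bytes
    # Header first, then the body, then hash the whole stream.
    { printf '%s %s\0' "$obj_type" "$size"; cat "$body"; } | sha1sum | cut -d' ' -f1
    rm -f "$body"
)

# Sanity check that needs no repository: the empty blob.
printf '' | git_object_sha1 blob
# → e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Inside a repository, `git --no-replace-objects cat-file commit HEAD | git_object_sha1 commit` should print the same id as `git rev-parse HEAD`. The body goes through a temporary file rather than a shell variable because command substitution would strip trailing newlines and change the byte count.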
@lemanschik

@masak I went with my own meta-versioning system. I simply store the git-compatible information as additional metadata. I go for content unification and then SHA-512, and I do the same for large assets, since I can version blocks. I do not store the initial files; I store content blobs as fixed-size 20 MB blocks, as this looks like a magic number for performance at present. And since this does not depend on one block per file, it reduces space needs a lot.

I do it a bit like Docker's overlay file implementations, which also do content hashing at the block level but are less deterministic and predictable. But when you use Docker + BTRFS you come near to my feature set.

@NguyenTuKien

This is very useful! Would you mind if I translated it to share with my friends?

@masak

masak commented Oct 20, 2025

@NguyenTuKien Go right ahead! Also, for posterity, I slapped a CC0 license on the gist, which means that someone else with the same question will automatically get a "yes" answer via the license. 😄
