Last active April 18, 2025 13:30
Gist: vadimkantorov/1aecedbd1758010258020f34d75a95dd
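A simple Git LFS deduplication script. It uses hard links so that each LFS object's data is stored once instead of twice: once under .git/lfs/objects and once in the working tree. It suits read-only clones of model or dataset repositories from Hugging Face, but it leaves the repository in an invalid state.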
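The core idea works without Git at all. A hard link is a second directory entry for the same inode, so the file contents exist on disk only once. The minimal sketch below uses made-up names (objects/blob and model.safetensors) to stand in for an LFS object and its working-tree file. It also shows that a write through either name changes both, which is one reason the result only suits read-only clones.

```bash
#!/usr/bin/env bash
# Minimal demo with hypothetical file names: two names, one inode, data stored once.
set -euo pipefail
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

mkdir -p objects
printf 'model weights\n' > objects/blob        # stands in for an LFS object
ln objects/blob model.safetensors              # hard link: a second name for the same inode

# -ef is true when both paths refer to the same device and inode
[ objects/blob -ef model.safetensors ] && echo "same inode"

# Appending through the working-tree name also changes the "object",
# because there is only one copy of the data.
printf 'edited\n' >> model.safetensors
cat objects/blob
```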
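# Usage: bash git_lfs_clone_dedup.sh https://huggingface.co/deepseek-ai/DeepSeek-V3-0324 ~/DeepSeek-V3-0324
# Usage: bash git_lfs_clone_dedup.sh [email protected]:deepseek-ai/DeepSeek-V3-0324 ~/DeepSeek-V3-0324
# https://github.com/git-lfs/git-lfs/discussions/6029
# Stop on the first failure so a failed clone or cd never runs the rm/ln loop in the wrong directory
set -e
# Clone without downloading LFS content: LFS-tracked files are checked out as small pointer files
GIT_LFS_SKIP_SMUDGE=1 git clone "$1" "$2"
cd "$2"
# Download the LFS objects into .git/lfs/objects/<oid[0:2]>/<oid[2:4]>/<oid> without touching the working tree
git lfs fetch
# Each line is "<oid> <marker> <path>"; replace every pointer file with a hard link to its object
git lfs ls-files -l | while read -r SHA DASH FILEPATH; do rm "$FILEPATH" && ln ".git/lfs/objects/${SHA:0:2}/${SHA:2:2}/$SHA" "$FILEPATH"; done
# Alternative: move each object into the working tree instead (the object store no longer holds the data)
#git lfs ls-files -l | while read -r SHA DASH FILEPATH; do mv ".git/lfs/objects/${SHA:0:2}/${SHA:2:2}/$SHA" "$FILEPATH"; done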
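You can check the result by reading the same "<oid> <marker> <path>" lines the script parses from git lfs ls-files -l. Then confirm that each working-tree file and its object under .git/lfs/objects are the same inode. The function check_dedup below is a hypothetical helper and not part of the gist. It is a sketch that assumes it runs from the repository root and mirrors the object-path layout used by the script's ln line.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the gist): verify that each listed file
# is a hard link to its LFS object. Reads "<oid> <marker> <path>" lines on stdin.
# Prints "ok: <path>" for linked files and returns 1 if any file is not linked.
check_dedup() {
  local sha marker filepath object status=0
  while read -r sha marker filepath; do
    # Same layout the script uses: .git/lfs/objects/<oid[0:2]>/<oid[2:4]>/<oid>
    object=".git/lfs/objects/${sha:0:2}/${sha:2:2}/$sha"
    if [ ! -e "$object" ]; then
      echo "missing object: $object" >&2
      status=1
    elif [ "$object" -ef "$filepath" ]; then
      echo "ok: $filepath"
    else
      echo "not linked: $filepath" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, run cd ~/DeepSeek-V3-0324 && git lfs ls-files -l | check_dedup after the script finishes. The marker column is read only so that the remainder of the line, including any spaces, lands in the path variable. The loop in the script relies on the same behavior.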