To efficiently clone only a specific subset of a large Git repository, you can use sparse checkout. This approach is useful when you don't need the entire repository content, saving both time and storage. Below is a step-by-step guide to achieving this.
I had been trying to make sparse checkout work for a while, and the following solution finally did the trick: https://askubuntu.com/a/1464994/1928663
- Clone the Repository with Sparse Checkout
Use the --depth 1 flag to clone only the latest commit, --filter=blob:none to avoid downloading file contents initially, and --sparse to check out only the files in the top-level directory at first:
git clone --depth 1 --filter=blob:none https://github.com/danuw/azure-docs.git --sparse
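This resulted in the output below. Note how little is downloaded (2.56 MiB for the first fetch, then 307.16 KiB for the top-level file contents) even though the full repository is gigabytes in size: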
Cloning into 'azure-docs'...
remote: Enumerating objects: 8438, done.
remote: Counting objects: 100% (8438/8438), done.
remote: Compressing objects: 100% (7673/7673), done.
remote: Total 8438 (delta 51), reused 4682 (delta 25), pack-reused 0 (from 0)
Receiving objects: 100% (8438/8438), 2.56 MiB | 16.60 MiB/s, done.
Resolving deltas: 100% (51/51), done.
remote: Enumerating objects: 85, done.
remote: Counting objects: 100% (85/85), done.
remote: Compressing objects: 100% (81/81), done.
remote: Total 85 (delta 52), reused 16 (delta 4), pack-reused 0 (from 0)
Receiving objects: 100% (85/85), 307.16 KiB | 5.04 MiB/s, done.
Resolving deltas: 100% (52/52), done.
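To check what a clone like the one above actually stored locally, a small helper along these lines can be handy. This is only a sketch: the function name clone_footprint is made up for illustration, and it assumes the remote is called origin (the default for git clone).

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): clone_footprint [repo-dir]
# Prints whether the clone is shallow, which partial-clone filter was
# recorded, and how much space the local object store uses.
clone_footprint() {
  local repo=${1:-.}
  # "true" when history was truncated with --depth
  echo "shallow: $(git -C "$repo" rev-parse --is-shallow-repository)"
  # Filter recorded for the remote by a partial clone, e.g. blob:none
  echo "filter: $(git -C "$repo" config --get remote.origin.partialclonefilter || echo none)"
  # Object counts and on-disk sizes in human-readable units
  git -C "$repo" count-objects -vH
}
```

Calling clone_footprint azure-docs from the parent directory should report a shallow clone with the blob:none filter and a pack size in the low megabytes.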
- Navigate to the Cloned Repository
Change your directory to the cloned repository:
cd azure-docs
- Initialize Sparse Checkout Mode
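Enable sparse checkout in cone mode, which restricts the selection to whole directories and so simplifies choosing them. Cloning with --sparse has already turned sparse checkout on, so this step mainly makes sure cone mode is active on Git versions where it is not the default (recent Git documentation marks init as deprecated in favor of set, but it still works):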
git sparse-checkout init --cone
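To confirm that the step above took effect, you can read back the two configuration values that sparse checkout relies on. The helper below is a minimal sketch; the name sparse_settings is hypothetical, and it simply prints whatever Git has recorded.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): sparse_settings [repo-dir]
# Shows whether sparse checkout and cone mode are switched on.
sparse_settings() {
  local repo=${1:-.}
  # core.sparseCheckout is "true" once sparse checkout is enabled
  echo "sparseCheckout: $(git -C "$repo" config --get core.sparseCheckout || echo unset)"
  # core.sparseCheckoutCone is "true" when cone mode was requested
  echo "sparseCheckoutCone: $(git -C "$repo" config --get core.sparseCheckoutCone || echo unset)"
}
```

Running sparse_settings inside azure-docs after the init command should print true for both settings.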
- Set the Directory to be Checked Out
Specify the directory you want to check out from the repository. In this example, we are checking out the articles/iot-operations directory:
git sparse-checkout set articles/iot-operations
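After these steps, only the files within the articles/iot-operations directory are checked out into your working tree, along with the files that sit directly in the repository root and in the parent articles directory (cone mode always includes those), minimizing the data you download and store. Git fetches the missing file contents on demand: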
remote: Enumerating objects: 189, done.
remote: Counting objects: 100% (189/189), done.
remote: Compressing objects: 100% (182/182), done.
remote: Total 189 (delta 10), reused 106 (delta 7), pack-reused 0 (from 0)
Receiving objects: 100% (189/189), 10.96 MiB | 21.30 MiB/s, done.
Resolving deltas: 100% (10/10), done.
Updating files: 100% (190/190), done.
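The steps above can be wrapped into one reusable function. This is a sketch under a few assumptions: the name sparse_clone and its argument order are invented for illustration, and it uses exactly the same Git commands as the walkthrough rather than anything newer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and argument order are assumptions):
#   sparse_clone <repo-url> <target-dir> <directory>...
sparse_clone() {
  if [ "$#" -lt 3 ]; then
    echo "usage: sparse_clone <repo-url> <target-dir> <directory>..." >&2
    return 2
  fi
  local url=$1 target=$2
  shift 2
  # Shallow, blobless, sparse clone: only top-level files are checked out at first.
  git clone --depth 1 --filter=blob:none --sparse "$url" "$target" || return 1
  # Make sure cone mode is active, then select the requested directories.
  git -C "$target" sparse-checkout init --cone || return 1
  git -C "$target" sparse-checkout set "$@"
}
```

For the example in this post, the call would be sparse_clone https://github.com/danuw/azure-docs.git azure-docs articles/iot-operations. Several directories can be passed at once, and git sparse-checkout list shows the current selection afterwards.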
Hope that helps you too...