@lindacmsheard
Last active October 28, 2024 12:16
Fetch a git directory with the fsspec library

Using fsspec to fetch from public git repositories

Thanks to https://sebastianwallkoetter.wordpress.com/2022/01/30/copy-github-folders-using-python/ for a solution for getting hold of code from GitHub when GitHub integration is not available.

Adapting to add full recursion below:

Fetch without recursion

import fsspec
from pathlib import Path

destination = Path.home() / "test_folder_copy"
destination.mkdir(exist_ok=True, parents=True)
fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
fs.get(fs.ls("src/"), destination.as_posix())
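A listing can contain subfolders as well as files. Here is a minimal sketch of filtering a listing down to files only, assuming entries shaped like the dictionaries fsspec returns from `fs.ls(path, detail=True)` (with `name` and `type` keys). The helper name `files_only` and the sample entries are hypothetical, not part of the original gist.

```python
def files_only(entries):
    """Return the names of listing entries whose type is "file"."""
    return [e["name"] for e in entries if e.get("type") == "file"]


# Hypothetical sample, shaped like the output of fs.ls("src/", detail=True)
sample = [
    {"name": "src/Main.groovy", "type": "file", "size": 120},
    {"name": "src/lib", "type": "directory", "size": 0},
]
print(files_only(sample))
```

Against a real filesystem this could be used as `fs.get(files_only(fs.ls("src/", detail=True)), destination.as_posix())`.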

Fetch recursively and flatten into destination

import fsspec
from pathlib import Path

destination = Path.home() / "test_recursive_folder_copy"
destination.mkdir(exist_ok=True, parents=True)
fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
fs.get(fs.ls("src/"), destination.as_posix(), recursive=True)
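When the copy is flattened as described, files with the same name in different subfolders would end up on the same local path. A rough sketch for spotting such clashes before copying, assuming you already have a list of remote file paths (for example from `fs.find("src/")`). The helper name `flatten_collisions` and the sample paths are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def flatten_collisions(remote_paths):
    """Group remote paths by file name and keep names that occur more than once."""
    by_name = defaultdict(list)
    for p in remote_paths:
        by_name[PurePosixPath(p).name].append(p)
    return {name: ps for name, ps in by_name.items() if len(ps) > 1}


# Hypothetical sample paths
sample_paths = ["src/a/util.py", "src/b/util.py", "src/main.py"]
print(flatten_collisions(sample_paths))
```

An empty result suggests the flattened copy will not overwrite anything; otherwise the recursive-write variant is probably the safer choice.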

Fetch recursively and write recursively

import fsspec
from pathlib import Path

# set up destination as a subfolder in the current working directory
destination = Path.cwd() / "builtin" / "subdir"
destination.mkdir(exist_ok=True, parents=True)
fs = fsspec.filesystem("github", org="mygithub", repo="my_github_repo")
start = "folder/i/need/from/repo"
for p, d, f in fs.walk(start):
    # swap only the leading start prefix for "." to get the local subfolder
    relpath = p.replace(start, ".", 1)
    dest = destination / relpath
    fs.get(fs.ls(p), dest.as_posix())
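The loop above maps each walked remote folder to a local folder with a string replacement. As an alternative sketch, the same mapping can be done with `pathlib`, which strips the prefix only at the start of the path. This assumes the walked paths begin with `start` exactly, with no leading slash; otherwise `relative_to` raises `ValueError`. The helper name `local_target` is hypothetical.

```python
from pathlib import Path, PurePosixPath


def local_target(destination, start, remote_dir):
    """Map a remote folder under `start` to the matching folder under `destination`."""
    # relative_to removes `start` only as a leading prefix
    rel = PurePosixPath(remote_dir).relative_to(PurePosixPath(start))
    return Path(destination).joinpath(*rel.parts)


print(local_target(Path("builtin") / "subdir", "src", "src/lib/src"))
```

Inside the loop this would replace the two path lines with `dest = local_target(destination, start, p)`.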

Fetch recursively with a tqdm progress bar

import fsspec
from pathlib import Path
from tqdm import tqdm

destination = Path.cwd() / "builtin" / "azureml"
destination.mkdir(exist_ok=True, parents=True)
fs = fsspec.filesystem("github", org="azuregig", repo="work_with_weather_data")
start = "process_data/azureml_cli_v2"
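# walk the repo once and reuse the result, so tqdm knows the total up front
walked = list(fs.walk(start))
for p, s, f in tqdm(walked):
    # swap only the leading start prefix for "." to get the local subfolder
    relpath = p.replace(start, ".", 1)
    dest = destination / relpath
    fs.get(fs.ls(p), dest.as_posix())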