@devhero
Last active September 9, 2024 14:06
Python: download and extract a remote .tar.gz file
# Have the interpreter open a network request and get back an object representing the response. urllib.request can do this.
import urllib.request
import tarfile

url = "http://file.tar.gz"  # placeholder: replace with the real .tar.gz URL
with urllib.request.urlopen(url) as stream:
    with tarfile.open(fileobj=stream, mode="r|gz") as archive:
        archive.extractall()
# stream is a file-like object representing the connection to the server (HTTP here, not FTP), and tarfile reads from it directly. Because a network connection can't seek, the mode must be a stream mode ("r|..."); "gz" names the compression explicitly, though "r|*" would detect it from the data.
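A minimal sketch, using only the standard library, that checks the stream-mode behaviour offline: a non-seekable in-memory stream stands in for the HTTP response, and `"r|*"` detects gzip from the bytes themselves rather than from any filename. `OneWayStream`, `make_targz` and `extract_stream` are hypothetical helpers, not part of any library; the `filter="data"` argument is only passed on Pythons that ship `tarfile.data_filter`.

```python
import io
import tarfile
import tempfile


class OneWayStream(io.RawIOBase):
    """Read-only, non-seekable byte stream, standing in for an HTTP response body."""

    def __init__(self, data: bytes) -> None:
        super().__init__()
        self._src = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return False

    def readinto(self, buffer) -> int:
        chunk = self._src.read(len(buffer))
        buffer[: len(chunk)] = chunk
        return len(chunk)


def make_targz(files: dict) -> bytes:
    """Build a small .tar.gz archive in memory (demo fixture only)."""
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w:gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return out.getvalue()


def extract_stream(stream, dest: str) -> list:
    """Unpack a tar stream in one forward pass; "r|*" sniffs gzip/bz2/xz from the data."""
    # Use tarfile's "data" extraction filter where this Python has it (3.12+ and some backports).
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    names = []
    with tarfile.open(fileobj=stream, mode="r|*") as tar:
        for member in tar:
            names.append(member.name)
            tar.extract(member, dest, **extra)  # extract while the stream sits on this member
    return names


if __name__ == "__main__":
    archive = make_targz({"hello.txt": b"hi\n"})
    with tempfile.TemporaryDirectory() as dest:
        print(extract_stream(OneWayStream(archive), dest))  # ['hello.txt']
```

With a live download you would pass the response object itself, e.g. `extract_stream(urllib.request.urlopen(url), ".")`.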
@ozcanyarimdunya

ozcanyarimdunya commented Jun 10, 2021

In case you use Python's requests module:

import requests
import tarfile

url = ".tar.gz url here"
with requests.get(url, stream=True) as response:
    response.raise_for_status()  # stop on HTTP errors instead of handing an error page to tarfile
    with tarfile.open(fileobj=response.raw, mode="r|gz") as file:
        file.extractall(path=".")
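`extractall(path=".")` trusts the member names stored in the archive, so a crafted download can write outside the target directory (for example a member named `../x`). A minimal sketch of a guard, assuming the archive arrives as a stream such as `response.raw`; `extract_within` is a hypothetical helper name. On Pythons that have `tarfile.data_filter` (3.12+, plus some security backports) the built-in `filter="data"` covers this and more, so the sketch uses it when available.

```python
import io
import os
import tarfile
import tempfile


def extract_within(stream, dest: str = ".") -> None:
    """Stream-extract a .tar.gz, refusing members that would land outside dest."""
    root = os.path.realpath(dest)
    # Extra protection from tarfile's own "data" filter when this Python provides it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(fileobj=stream, mode="r|gz") as tar:
        for member in tar:
            # Checks member names only; it does not vet where symlink or hardlink targets point.
            target = os.path.realpath(os.path.join(root, member.name))
            if os.path.commonpath([root, target]) != root:
                raise ValueError(f"refusing to extract {member.name!r} outside {root}")
            tar.extract(member, root, **extra)


if __name__ == "__main__":
    # Demo: an archive whose only member tries to climb out of the target directory.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.addfile(tarfile.TarInfo("../escape.txt"), io.BytesIO(b""))
    buf.seek(0)
    with tempfile.TemporaryDirectory() as d:
        try:
            extract_within(buf, d)
        except ValueError as exc:
            print(exc)
```

In the requests snippet this would replace the `tarfile.open(...)`/`extractall` lines with `extract_within(response.raw, ".")`.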

@bhuiyanmobasshir94

bhuiyanmobasshir94 commented Nov 19, 2021
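import requests

url = ".tar.gz url here"        # placeholder URL
local_filename = "file.tar.gz"  # where to save the download

# Save the archive to disk in 1024-byte chunks, keeping the bytes exactly as sent (no Content-Encoding decoding).
with requests.get(url, stream=True) as r:
    r.raise_for_status()
    with open(local_filename, 'wb') as f:
        for chunk in r.raw.stream(1024, decode_content=False):
            if chunk:
                f.write(chunk)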
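Once the archive is saved to `local_filename`, the file is seekable, so the random-access mode `"r:gz"` works and the member list can be inspected before anything is written. A sketch under that assumption; `extract_saved` and its `wanted` parameter are made up for illustration.

```python
import io
import os
import tarfile
import tempfile


def extract_saved(local_filename: str, dest: str = ".", wanted=None) -> list:
    """Unpack an already-downloaded .tar.gz, optionally only the member names in `wanted`.

    Returns the names that were extracted.
    """
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    # "r:gz" (colon) is fine here: a file on disk can seek, so the member list
    # can be read first and members extracted in any order.
    with tarfile.open(local_filename, mode="r:gz") as tar:
        members = [m for m in tar.getmembers() if wanted is None or m.name in wanted]
        tar.extractall(dest, members=members, **extra)
    return [m.name for m in members]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file.tar.gz")
        with tarfile.open(path, "w:gz") as tar:
            for name in ("docs/readme.txt", "data/big.bin"):
                tar.addfile(tarfile.TarInfo(name), io.BytesIO(b""))  # empty demo members
        print(extract_saved(path, os.path.join(d, "out"), wanted={"docs/readme.txt"}))  # ['docs/readme.txt']
```

The trade-off versus streaming straight into tarfile is disk space and an extra pass, in exchange for being able to list, filter, or re-read members.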

@Jukoo

Jukoo commented Feb 8, 2022
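import requests
import tarfile

link = ".tar.gz url here"  # placeholder URL

# Both context managers close on exit; "r|gz" reads the raw HTTP body front to back, with no seeking.
with requests.get(link, stream=True) as rx, tarfile.open(fileobj=rx.raw, mode="r|gz") as tarobj:
    tarobj.extractall()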
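`rx.raw` is a plain network stream, and tarfile's stream mode `"r|gz"` gives exactly one forward pass over it. A sketch of reading member contents under that constraint; `read_all_members` is a hypothetical helper. Order matters: calling `getmembers()` first consumes the whole stream, after which reading a member would need a backward seek, and tarfile raises `tarfile.StreamError`.

```python
import io
import tarfile


def read_all_members(stream) -> dict:
    """Read every regular file of a gzip'd tar stream into memory in one forward pass."""
    contents = {}
    with tarfile.open(fileobj=stream, mode="r|gz") as tar:
        for member in tar:
            if member.isfile():
                # Read the data now, while the stream is positioned at this member.
                contents[member.name] = tar.extractfile(member).read()
    return contents


if __name__ == "__main__":
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo("hello.txt")
        info.size = 3
        tar.addfile(info, io.BytesIO(b"hi\n"))
    buf.seek(0)
    print(read_all_members(buf))  # {'hello.txt': b'hi\n'}
```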

@vsobolev

These scripts don't work for me. Do I need to log in before downloading? What do I do about authorization?

@devhero

devhero commented Oct 19, 2022
Author

> These scripts don't work for me. Do I need to log in before downloading? What do I do about authorization?

Look how easy it is with requests.
For the lazy ones, I'll transcribe it here:

import requests
# A (user, password) tuple makes requests send HTTP Basic auth.
r = requests.get('http://protected_file_url', auth=('user', 'pass'))

So the latest proposal could become:

import requests
import tarfile

link = ".tar.gz url here"  # placeholder URL

with requests.get(link, stream=True, auth=('user', 'pass')) as rx, tarfile.open(fileobj=rx.raw, mode="r|gz") as tarobj:
    tarobj.extractall()
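The same credentials can be used with the urllib version at the top of the gist. A rough sketch, assuming the server uses HTTP Basic auth (the scheme requests uses for an `auth=('user', 'pass')` tuple): build the `Authorization` header yourself. `basic_auth_request` and `download_and_extract` are hypothetical names. Setting the header directly mirrors requests, which sends it up front; urllib's `HTTPBasicAuthHandler` by default only answers a 401 challenge.

```python
import base64
import tarfile
import urllib.request


def basic_auth_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request that carries an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Basic {token}")
    return req


def download_and_extract(url: str, user: str, password: str, dest: str = ".") -> None:
    """Stream a password-protected .tar.gz and unpack it into dest."""
    with urllib.request.urlopen(basic_auth_request(url, user, password)) as stream:
        with tarfile.open(fileobj=stream, mode="r|gz") as archive:
            archive.extractall(dest)
```

Usage would look like `download_and_extract("http://protected_file_url", "user", "pass")`, with the placeholder URL and credentials swapped for real ones.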

@vrahikar

What if I already have the tar file on the remote server and just need to untar it there? I tried a couple of ways, but it takes more than 2 hours to untar 2 GB of contents.
For example:

    if source_filename.endswith('tar.gz'):
        cmd = f"unpigz -dc --fast -p 16 {source_filename} | (cd {destination_path} && tar xf -)"
        print(f"CMD: {cmd}")
        print("tar file extraction started...")
        output = conn.execute_command(cmd, shell=True)
        print("tarfile extraction success")
        return True

I also tried attaching more cores, but it didn't help.
Here conn is the remote connection handle.
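Two details of the snippet above are worth knowing. Per the pigz documentation, gzip decompression can't be parallelized, so `unpigz` decompresses on a single thread (using a few extra threads only for reading, writing and checksums); `-p 16` therefore won't speed it up, and `--fast` only sets the compression level. Also, the f-string pastes the paths into a shell command unquoted, which breaks on spaces or shell metacharacters. A sketch of building the same pipeline with `shlex.quote`; `remote_untar_command` is a hypothetical helper, and running the result through `conn.execute_command` is left as in the original.

```python
import shlex


def remote_untar_command(source_filename: str, destination_path: str) -> str:
    """Build the same unpigz | tar pipeline, with both paths shell-quoted."""
    src = shlex.quote(source_filename)
    dst = shlex.quote(destination_path)
    # No -p/--fast: unpigz decompression runs on one thread regardless.
    return f"unpigz -dc {src} | (cd {dst} && tar xf -)"
```

For example, `remote_untar_command("my archive.tar.gz", "/data/out")` yields `unpigz -dc 'my archive.tar.gz' | (cd /data/out && tar xf -)`.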