IsmailM/e929e91b06c892d3bfca65d537899245 (last active January 28, 2025 15:03)
Using the PGP API to get FASTQ URLs, MD5s and sizes
# The commands below use jq (https://stedolan.github.io/jq/) and
# the PGP API v1.2 (https://www.personalgenomes.org.uk/api/v1.2/)
# Whole-genome sequencing (WGS) datasets
# Columns: hex_id, fastq_ftp, fastq_md5, then one size in GiB per FASTQ file
curl -X GET "https://www.personalgenomes.org.uk/api/v1.2/all_wgs" -H "accept: application/json" | jq -r '
  .[] | [
    .hex_id,
    (.data[]?.fastq_ftp),
    (.data[]?.fastq_md5),
    (.data[]?.fastq_bytes | split(";") | .[] | tonumber | . / 1024 / 1024 / 1024)
  ] | flatten | @csv' > wgs_fastqs.csv
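# The last jq expression turns fastq_bytes (a ";"-joined string of byte counts,
# one per FASTQ file) into sizes in GiB, i.e. bytes / 1024^3. A minimal
# shell/awk sketch of the same arithmetic, handy for checking a value by hand.
# bytes_to_gib is a hypothetical helper (not part of the PGP API), the sample
# input is made up, and unlike the jq filter it rounds to two decimals.

```shell
# bytes_to_gib (hypothetical helper): split a ";"-joined list of byte counts
# and print each one in GiB with two decimals, one per line.
bytes_to_gib() {
  printf '%s\n' "$1" | tr ';' '\n' | awk '{ printf "%.2f\n", $1 / 1024 / 1024 / 1024 }'
}

# Made-up example value, not taken from the PGP API; prints 1.00 and 2.00
bytes_to_gib "1073741824;2147483648"
```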
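# Note: some of the records have three FASTQ files rather than two. fastq_ftp and
# fastq_md5 stay as single ";"-joined strings, but fastq_bytes is split into one
# column per file, so those rows get an extra column and the CSV does not fully line up :(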
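# To see how many rows are affected, count the columns per row. A rough sketch,
# assuming no field contains a comma (hex IDs, FTP paths and MD5s should not;
# @csv only wraps them in quotes). column_counts is a hypothetical helper name.

```shell
# column_counts (hypothetical helper): print "<columns> <rows>" for each
# distinct number of comma-separated columns found in a CSV file.
column_counts() {
  awk -F',' '{ print NF }' "$1" | sort -n | uniq -c | awk '{ print $2, $1 }'
}

# e.g. column_counts wgs_fastqs.csv
```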
# The three whole-exome sequencing (WXS) datasets
# Note: this endpoint is not documented, but it exists (sorry)
curl -X GET "https://www.personalgenomes.org.uk/api/v1.2/all_wxs" -H "accept: application/json" | jq -r '
  .[] | [
    .hex_id,
    (.data[]?.fastq_ftp),
    (.data[]?.fastq_md5),
    (.data[]?.fastq_bytes | split(";") | .[] | tonumber | . / 1024 / 1024 / 1024)
  ] | flatten | @csv' > wxs_fastqs.csv
# Note: in the above you can also split the fastq_ftp and fastq_md5 fields on ";"
# (one column per file) by replacing their two lines in the jq filter with:
#   (.data[]?.fastq_ftp | split(";")),
#   (.data[]?.fastq_md5 | split(";")),
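# Even with that split, a record with three files still gives a longer row than
# one with two. A different layout (swapped in here, not the approach used
# above) is one CSV line per FASTQ file. A minimal awk sketch of that reshaping
# for a single .data entry; one_row_per_file is a hypothetical helper, the
# sample values are made up, and unlike @csv it does not quote fields.

```shell
# one_row_per_file (hypothetical helper): given a hex_id and the ";"-joined
# fastq_ftp, fastq_md5 and fastq_bytes strings of one .data entry, print one
# CSV line per FASTQ file: hex_id,url,md5,size_in_GiB
one_row_per_file() {
  awk -v id="$1" -v ftp="$2" -v md5="$3" -v bytes="$4" 'BEGIN {
    n = split(ftp, f, ";")
    split(md5, m, ";")
    split(bytes, b, ";")
    for (i = 1; i <= n; i++)
      printf "%s,%s,%s,%.2f\n", id, f[i], m[i], b[i] / 1024 / 1024 / 1024
  }'
}

# Made-up values for illustration only
one_row_per_file uk000000 "host/a_1.fastq.gz;host/a_2.fastq.gz" "aaa;bbb" "1073741824;536870912"
```

# Every line then has exactly four columns, however many files a record has.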
# The FTP file can then be downloaded using curl,
# e.g. download the first file for uk35C650:
curl -X GET "https://www.personalgenomes.org.uk/api/v1.3/download_url/uk35C650" -H "accept: application/json" | jq '.[0].download_url'
#   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#                                  Dload  Upload   Total   Spent    Left  Speed
# 100  2157  100  2157    0     0  10722      0 --:--:-- --:--:-- --:--:-- 10731
# "ftp.sra.ebi.ac.uk/vol1/fastq/ERR172/004/ERR1726424/ERR1726424_1.fastq.gz"
curl -LO ftp.sra.ebi.ac.uk/vol1/fastq/ERR172/004/ERR1726424/ERR1726424_1.fastq.gz
#   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#                                  Dload  Upload   Total   Spent    Left  Speed
#  21 32.3G   21 7026M    0     0  90.5M      0  0:06:05  0:01:17  0:04:48  110M
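# Once a download finishes, the file can be checked against the matching
# fastq_md5 value from the CSVs above. A minimal sketch, assuming GNU coreutils
# md5sum is installed (on macOS, `md5 -q FILE` prints the bare hash instead);
# verify_md5 is a hypothetical helper name.

```shell
# verify_md5 (hypothetical helper): return 0 if FILE's MD5 equals EXPECTED,
# otherwise report the mismatch on stderr and return 1.
verify_md5() {
  file=$1
  expected=$2
  # md5sum prints "<hash>  <filename>"; keep only the hash
  actual=$(md5sum "$file" | awk '{ print $1 }')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MD5 mismatch for $file: expected $expected, got $actual" >&2
    return 1
  fi
}

# e.g. verify_md5 ERR1726424_1.fastq.gz <fastq_md5 value for that file>
```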