Created January 4, 2022 20:53
Convert GFF3 file to GTF2.5
#!/bin/bash
########## Anuj Sharma ##########
########## [email protected] ##########
########## github/rknx ##########
########## 2022/01/04 ##########
[[ -z "$1" ]] && echo "Usage: gff2gtf.sh in.gff > out.gtf" >&2 && exit 1
[[ ! -s "$1" ]] && echo "Provide a valid, non-empty input file" >&2 && exit 1
# Remove everything (the sequences) starting with ##FASTA
# Remove ## directive lines (e.g. ##sequence-region lines listing contig names)
# Simplify second column (keep only the text before the first ":")
# Rename tRNA to transcript in the 3rd column only
# Change ID to gene_id and format 9th column as GTF-style key "value"; pairs
# Only keep rows whose 3rd column matches a GTF feature type
sed -n '/^##FASTA/q;p' "$1" | \
grep -v "^##" | \
awk -vFS="\t" -vOFS="\t" '{split($2, a, ":"); $2=a[1]; if ($3 == "tRNA") $3="transcript"; print $0}' | \
sed 's/ID=/gene_id=/g; s/=/ "/g; s/;/"; /g; s/$/"/' | \
awk -vFS="\t" '$3 ~ /gene|transcript|exon|CDS|UTR|start_codon|stop_codon|Selenocysteine/'
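To see what the conversion stages produce, here is a minimal sketch that wraps them in a shell function and runs it on a tiny, made-up GFF3 file. The function name `gff3_to_gtf`, the helper `row`, the contig `ctg1`, the source strings (`predictorA:002006`, `trnaFinder:001002`, `repeatFinder:1.0`) and all feature IDs are hypothetical, chosen only for illustration. The sketch applies the tRNA rename and the feature-type filter to the 3rd column only.

```shell
#!/bin/bash
# Sketch only: GFF3 -> GTF stages wrapped in a hypothetical function and run
# on a made-up GFF3 snippet (all names and IDs below are illustrative).
set -euo pipefail

# gff3_to_gtf (hypothetical name): reads a GFF3 file path, writes GTF to stdout.
gff3_to_gtf() {
  # 1) stop at ##FASTA, 2) drop ## directives, 3) keep the source text before ":"
  #    and rename tRNA in column 3, 4) rewrite key=value; pairs as key "value";,
  # 5) keep rows whose column 3 matches a GTF feature type.
  sed -n '/^##FASTA/q;p' "$1" |
    grep -v '^##' |
    awk -vFS='\t' -vOFS='\t' '{split($2, a, ":"); $2 = a[1]; if ($3 == "tRNA") $3 = "transcript"; print}' |
    sed 's/ID=/gene_id=/g; s/=/ "/g; s/;/"; /g; s/$/"/' |
    awk -vFS='\t' '$3 ~ /gene|transcript|exon|CDS|UTR|start_codon|stop_codon|Selenocysteine/'
}

# row (hypothetical helper): joins its arguments with tabs to form one GFF3 line.
row() { local IFS=$'\t'; printf '%s\n' "$*"; }

in_gff=$(mktemp)
trap 'rm -f "$in_gff"' EXIT

# Made-up input: two directives, four features, and an embedded FASTA section.
{
  echo '##gff-version 3'
  echo '##sequence-region ctg1 1 5000'
  row ctg1 predictorA:002006 gene 100 400 . + . 'ID=gene1;Name=abcA'
  row ctg1 predictorA:002006 CDS 100 400 . + 0 'ID=cds1;Parent=gene1'
  row ctg1 trnaFinder:001002 tRNA 900 975 . - . 'ID=trna1;product=tRNA-Leu'
  row ctg1 repeatFinder:1.0 repeat_region 2000 2100 . + . 'ID=rep1'
  echo '##FASTA'
  echo '>ctg1'
  echo 'ACGTACGT'
} > "$in_gff"

gtf_out=$(gff3_to_gtf "$in_gff")
printf '%s\n' "$gtf_out"
```

With these inputs it prints three tab-separated lines. The gene and CDS rows end in `gene_id "gene1"; Name "abcA"` and `gene_id "cds1"; Parent "gene1"`. The tRNA row has `transcript` in column 3 while its `product "tRNA-Leu"` attribute is left as-is. The `repeat_region` row, the `##` directives and the FASTA section are dropped. Testing column 3 rather than the whole line matters here: once `ID=` has become `gene_id`, a whole-line match on `gene` would keep every row that had an ID.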