bcftools view -s NA20818,NA20819 filename.vcf.gz
bcftools view -f PASS filename.vcf.gz
bcftools view -H
bcftools view -r chr20:1-200000 -s NA20818,NA20819 filename.vcf.gz
bcftools view -t ^chr20:1-30000000 filename.vcf.gz > out.vcf
bcftools view -R 0002.vcf in.vcf.gz
bcftools view -R 0002.tsv in.vcf.gz
- The format of 0002.tsv (columns: chromosome, start, end):
20	79000	80000
20	90000	100000
Obs.: the file must be tab-separated and have no trailing whitespace.
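A minimal sketch of one way to create the regions file described above with real tab characters and check it before passing it to -R. The awk check is an illustrative helper, not part of bcftools:

```shell
# Write the two regions from the note, separated by real tab characters.
printf '20\t79000\t80000\n20\t90000\t100000\n' > 0002.tsv

# Count lines that do not have exactly three tab-separated fields
# or that end in a space or tab.
bad_lines=$(awk -F'\t' 'NF != 3 || /[ \t]$/ { n++ } END { print n + 0 }' 0002.tsv)
echo "malformed lines: $bad_lines"
```

If the count is not zero, the file was probably saved with spaces instead of tabs.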
bcftools view -v snps lc_bams.bcftools.20170319.NA12878.vcf.gz
bcftools view -c1 input.vcf
bcftools view -H -C0 concat.allchrs.sites.vcf.gz
bcftools query -f '%CHROM\t%POS\n' filename.vcf
bcftools query -f '%INFO/AF\n'
bcftools query -f '%QUAL\n' 0002.vcf
bcftools query -f '%CHROM\t%POS\t%INFO/DP\n' in.vcf.gz
bcftools query -f '%set\n' out_combine.vcf.gz |sort |uniq
bcftools query -l test.vcf
bcftools filter -r1,2 filename.vcf.gz
bcftools filter -sFilterName -e'IDV<5' input.vcf
bcftools filter -s FilterName -e'DP>50000 | IDV<9' input.vcf
bcftools filter -sFilterName -e'FORMAT/DP<5' input.vcf
bcftools filter -sFilterName -e'INFO/DP < 5' input.vcf
Obs.: A space between the INFO field and the operator (>, <, =, etc.), and between the operator and the number, is mandatory.
bcftools view -f.,PASS lc_bams.bcftools.20170411.exc.norm.SNP.filtered.vcf.gz
bcftools stats -f "PASS,." file.vcf
bcftools view -m2 -M2 -v snps input.vcf.gz
bcftools view -m3 -v snps input.vcf.gz
bcftools view -i 'set="freebayes_lcex"' combined.all.chr20.vcf.gz
bcftools view -i 'QUAL<10' in.vcf.gz
Obs.: It will remove all FORMAT annotations except the GT information
bcftools annotate -x FORMAT ifile.vcf.gz
bcftools annotate --remove INFO in.vcf.gz
Annotating a vcf file using the annotations from a different VCF (in this case we only annotate the INFO/DP)
bcftools annotate -c 'INFO/DP' -a annt.vcf.gz in.vcf.gz
see page https://github.com/samtools/bcftools/wiki/HOWTOs#annotate-from-bed
bcftools view -G input.vcf.gz
bcftools norm --check-ref ws -f ref.fa in.vcf.gz -o out.vcf.gz -Oz
The samplenames.txt file has the following format: oldsamplename newsamplename
bcftools reheader -s samplenames.txt NA12878.giab.SNP.chr20.non_valid.vcf.gz -o NA12878.giab.SNP.chr20.non_valid.reheaded.vcf.gz
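A hedged sketch of preparing the samplenames.txt file used above. The new name NA12878_giab is a hypothetical placeholder, and the awk check is only an illustrative helper:

```shell
# Hypothetical old/new sample names; replace them with the names in your VCF.
printf '%s %s\n' 'NA12878' 'NA12878_giab' > samplenames.txt

# Every line should hold exactly two whitespace-separated names.
bad_pairs=$(awk 'NF != 2 { n++ } END { print n + 0 }' samplenames.txt)
echo "lines without exactly two names: $bad_pairs"
```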
bcftools reheader -h newheader.txt filename.vcf.gz -o combined.vcf.gz
export BCFTOOLS_PLUGINS=~/bin/bcftools-1.6/plugins/
bcftools +tag2tag in.vcf -- -r --pl-to-gl
bcftools +fixref file.bcf -- -f ref.fa
With -m -any, multiallelic variants (SNPs+INDELs) are split into several records
bcftools norm -m -any in.vcf.gz -o out.norm.vcf.gz -Oz
For example:
chr20 60280 . TTTCCA TTTCCATTCCA,T 744 PASS .
Will be converted to:
chr20 60280 . TTTCCA TTTCCATTCCA 744 PASS .
chr20 60280 . TTTCCA T 744 PASS .
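The record-splitting shown above can be imitated on the ALT column alone with awk. This is only an illustration of the idea, not how bcftools implements it: bcftools norm also updates per-allele INFO/FORMAT values and genotypes, which this sketch ignores.

```shell
# The multiallelic example record, with tab-separated VCF columns.
record=$(printf 'chr20\t60280\t.\tTTTCCA\tTTTCCATTCCA,T\t744\tPASS\t.')

# Emit one copy of the record per ALT allele (column 5 is ALT).
split_records=$(printf '%s\n' "$record" | awk '
  BEGIN { FS = OFS = "\t" }
  {
    n = split($5, alts, ",")
    for (i = 1; i <= n; i++) { $5 = alts[i]; print }
  }')
printf '%s\n' "$split_records"
```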
bcftools view -u in.vcf.gz -o missing_genotypes.vcf.gz -Oz
Select sites with a particular genotype (0/1 or 1/1) from a VCF. In this case the sample is accessed by its index, 8 (subscripts are 0-based, so this is the ninth sample):
bcftools view -H combined.snps_indels_chr1.filt.vcf.gz.onlyvariants.vcf.gz.ensembl.vcf.gz.85706.vcf.gz -i 'GT[8]="het"'
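Sample subscripts in bcftools expressions are 0-based, and sample columns start at column 10 of the #CHROM header line. A small sketch, assuming a hypothetical header with samples S0, S1 and S2, of how to work out the subscript for a given sample name:

```shell
# Hypothetical #CHROM header line; in practice take it from your own VCF.
header=$(printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS0\tS1\tS2')

# Samples begin at column 10, so the 0-based subscript is column number minus 10.
sample_index=$(printf '%s\n' "$header" | awk -F'\t' -v name='S2' '
  { for (i = 10; i <= NF; i++) if ($i == name) print i - 10 }')
echo "subscript for S2: $sample_index"
```

The resulting number is what goes inside the brackets, as in 'GT[2]="het"' for this made-up header.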
bcftools view -i'AC=2' in.vcf.gz
If we have a Flag tag in a VCF defined in the header like:
##INFO=<ID=GRCH37_38_REF_STRING_MATCH,Number=0,Type=Flag,Description="Indicates reference allele in origin GRCh37 vcf string-matches reference allele in dbsnp GRCh38 vcf">
bcftools view -H -i'GRCH37_38_REF_STRING_MATCH=1' ALL.chr7_GRCh38.genotypes.20170504.ensembl.vcf.NA12878.biallelic.nonvariants.nonvalid.snps.vcf.gz |less
bcftools view -H -i'GRCH37_38_REF_STRING_MATCH=0' ALL.chr7_GRCh38.genotypes.20170504.ensembl.vcf.NA12878.biallelic.nonvariants.nonvalid.snps.vcf.gz |less
bcftools view -i 'INFO/AF > 0.8' z.vcf.gz
Left-align without correcting for the REF allele (the flag "w" only presents a warning when REF is not correct)
bcftools norm --fasta-ref Homo_sapiens_assembly38.fasta --check-ref w file.vcf.gz -Oz -o out.vcf.gz > bcftools_norm.log 2>&1