View

Printing the records for only two samples:

bcftools view -s NA20818,NA20819 filename.vcf.gz

Printing only the variants whose FILTER is PASS:

bcftools view -f PASS filename.vcf.gz

Printing variants without header:

bcftools view -H filename.vcf.gz

Printing variants in a particular region for specific samples (-r requires an indexed file):

bcftools view -r chr20:1-200000 -s NA20818,NA20819 filename.vcf.gz

Printing all variants except those falling within a region (the ^ prefix negates the targets given to -t):

bcftools view -t ^chr20:1-30000000 filename.vcf.gz > out.vcf

Viewing only the positions listed in a regions file (VCF, BED, or tab-delimited):

bcftools view -R 0002.vcf in.vcf.gz

Viewing the positions listed in a tab-delimited file:

bcftools view -R 0002.tsv in.vcf.gz

The format of 0002.tsv (CHROM, START, END):

20	79000	80000
20	90000	100000

Obs.: the file should be tab-separated and have no trailing whitespace at the end of the lines.
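A minimal shell sketch for writing 0002.tsv without stray whitespace, using the two example regions: printf expands \t into real tabs, and the two checks flag lines that do not have exactly three tab-separated columns or that end in whitespace (which also catches the \r of Windows line endings). It only validates the file; it does not call bcftools.

```shell
# Write the two example regions (CHROM, START, END) with literal tabs.
printf '20\t79000\t80000\n20\t90000\t100000\n' > 0002.tsv

# Check 1: every line has exactly 3 tab-separated fields.
awk -F'\t' 'NF != 3 { bad = 1 } END { exit bad + 0 }' 0002.tsv && echo "3 columns: ok"

# Check 2: no line ends in a space, tab, or carriage return.
if grep -q '[[:space:]]$' 0002.tsv; then
  echo "trailing whitespace found"
else
  echo "no trailing whitespace: ok"
fi
```

Once both checks pass, the file can be used as shown above with bcftools view -R 0002.tsv in.vcf.gz.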

Selecting SNPs from a file:

bcftools view -v snps lc_bams.bcftools.20170319.NA12878.vcf.gz

Selecting the variant sites from a VCF (at least one non-reference allele called, i.e. excluding sites where all genotypes are 0|0):

bcftools view -c1 input.vcf

Selecting the non-variant sites from a VCF (AC=0):

bcftools view -H -C0 concat.allchrs.sites.vcf.gz
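To make -c1 and -C0 concrete, the awk sketch below counts called non-reference alleles per record from the GT column of a few hypothetical, header-less VCF lines. It is an illustration of the allele count these options compare against, not a reimplementation of bcftools; the toy data and file names are assumptions.

```shell
# Toy VCF body without header (hypothetical data): 9 fixed columns + samples S1, S2
printf '20\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0|0\t0|1\n' > toy_gt.txt
printf '20\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0|0\t0|0\n' >> toy_gt.txt
printf '20\t300\t.\tG\tA\t50\tPASS\t.\tGT\t1|1\t./.\n' >> toy_gt.txt

# For each record, count called non-reference alleles across all samples.
# GT is taken as the first ':'-separated subfield of each sample column; '.' is missing.
awk -F'\t' '{
  ac = 0
  for (i = 10; i <= NF; i++) {
    split($i, f, ":")
    n = split(f[1], a, "[|/]")
    for (j = 1; j <= n; j++) if (a[j] != "0" && a[j] != ".") ac++
  }
  print $1, $2, ac, (ac >= 1 ? "passes -c1" : "passes -C0")
}' toy_gt.txt > toy_ac.txt
cat toy_ac.txt
```

Here the sites at 100 and 300 carry at least one non-reference allele and would be kept by -c1, while the all-0|0 site at 200 is the one -C0 selects.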

Query

Printing CHROM and POS, tab-separated:

bcftools query -f '%CHROM\t%POS\n' filename.vcf

Print out the AF INFO field

bcftools query -f '%INFO/AF\n' filename.vcf

Getting a particular column (here QUAL) from the VCF:

bcftools query -f '%QUAL\n' 0002.vcf

Printing CHROM, POS, and a particular annotation (here INFO/DP) from a VCF:

bcftools query -f '%CHROM\t%POS\t%INFO/DP\n' in.vcf.gz

Printing the distinct values of INFO/set assigned by GATK CombineVariants:

bcftools query -f '%set\n' out_combine.vcf.gz |sort |uniq
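To count how many records fall in each set, rather than only listing the distinct names, uniq -c can replace uniq. The sketch below feeds the pipeline a few hypothetical set values instead of real bcftools query output, so it runs with coreutils alone.

```shell
# Hypothetical stand-in for the output of: bcftools query -f '%set\n' out_combine.vcf.gz
printf 'Intersection\nfreebayes_lcex\nIntersection\nfiltered\n' > sets.txt

# sort groups identical values, uniq -c prefixes each distinct value with its count,
# and sort -rn lists the most frequent set first.
sort sets.txt | uniq -c | sort -rn > set_counts.txt
cat set_counts.txt
```

On real data the same pipeline would read from bcftools query -f '%set\n' out_combine.vcf.gz directly.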

Printing a list of samples from a VCF:

bcftools query -l test.vcf

Filtering:

Filtering variants by region (in this example, only variants on the chromosomes named 1 and 2 are printed; -r requires an indexed file):

bcftools filter -r1,2 filename.vcf.gz

Soft-filtering on one of the INFO annotations (IDV): records matching the -e expression get FilterName in their FILTER column:

bcftools filter -sFilterName -e'IDV<5' input.vcf

Combining conditions with the OR logical operator (|):

bcftools filter -s FilterName -e'DP>50000 | IDV<9' input.vcf

Filtering on FORMAT annotation:

bcftools filter -sFilterName -e'FORMAT/DP<5' input.vcf

Filtering on INFO annotation:

bcftools filter -sFilterName -e'INFO/DP < 5' input.vcf

Obs.: Spaces around the comparison operator (> < = etc.) are optional, so 'INFO/DP<5' and 'INFO/DP < 5' are equivalent; keep the whole expression in quotes so the shell does not treat < or > as redirections.

Printing out variants that pass the filter (FILTER is PASS or missing, "."):

bcftools view -f.,PASS lc_bams.bcftools.20170411.exc.norm.SNP.filtered.vcf.gz

Computing stats only for records that pass the filter (FILTER is PASS or missing, "."):

bcftools stats -f "PASS,." file.vcf

Selecting only biallelic SNPs (excluding multiallelic sites):

bcftools view -m2 -M2 -v snps input.vcf.gz

Selecting only the multiallelic SNPs (three or more alleles):

bcftools view -m3 -v snps input.vcf.gz
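The -m/-M options bound the number of alleles in a record, counting REF plus every comma-separated ALT allele. This awk sketch only illustrates that count on a few hypothetical sites-only lines; it does not apply the -v snps variant-type check.

```shell
# Hypothetical sites-only lines: CHROM POS ID REF ALT
printf '20\t100\t.\tA\tG\n20\t200\t.\tC\tT,G\n20\t300\t.\tG\tA,C,T\n' > toy_alt.txt

# Number of alleles = 1 (REF) + number of comma-separated ALT alleles.
awk -F'\t' '{
  n = 1 + split($5, alt, ",")
  print $1, $2, n, (n == 2 ? "biallelic (-m2 -M2)" : "multiallelic (-m3)")
}' toy_alt.txt > toy_alleles.txt
cat toy_alleles.txt
```

The record at 100 has two alleles and matches -m2 -M2; the records at 200 and 300 have three and four alleles and match -m3.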

Selecting the records whose INFO/set value is freebayes_lcex:

bcftools view -i 'set="freebayes_lcex"' combined.all.chr20.vcf.gz

Printing all entries having a quality <10

bcftools view -i 'QUAL<10' in.vcf.gz

Annotate

Removing FORMAT fields from the VCF

Obs.: this removes all FORMAT annotations except the GT information:

bcftools annotate -x FORMAT ifile.vcf.gz

Removing all INFO fields from the VCF:

bcftools annotate --remove INFO in.vcf.gz

Annotating a VCF file using the annotations from a different VCF (in this case only INFO/DP is transferred):

bcftools annotate -c 'INFO/DP' -a annt.vcf.gz in.vcf.gz

Annotating a VCF file with a tabular file:

See the bcftools HOWTO page: https://github.com/samtools/bcftools/wiki/HOWTOs#annotate-from-bed

Drop individual genotype information

bcftools view -G input.vcf.gz

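Correcting REF/ALT swaps against the reference FASTA (in --check-ref ws, w warns about REF mismatches and s fixes them where possible):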

bcftools norm --check-ref ws -f ref.fa in.vcf.gz -o out.vcf.gz -Oz

Changing the sample names in a VCF:

The samplenames.txt file has one line per sample, in the following format: oldsamplename newsamplename

bcftools reheader -s samplenames.txt NA12878.giab.SNP.chr20.non_valid.vcf.gz -o NA12878.giab.SNP.chr20.non_valid.reheaded.vcf.gz
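A small sketch for building and sanity-checking samplenames.txt before running reheader. The sample names are hypothetical, and the check only verifies that each line has exactly two whitespace-separated fields, so a name containing a space would be flagged.

```shell
# Hypothetical mapping file: old name, whitespace, new name, one sample per line.
printf 'NA12878_old\tNA12878\nsample2\tNA12891\n' > samplenames.txt

# Report any line that does not have exactly two fields; exit status 1 if one is found.
awk 'NF != 2 { printf "line %d: expected 2 fields, got %d\n", NR, NF; bad = 1 }
     END { exit bad + 0 }' samplenames.txt && echo "samplenames.txt looks well-formed"
```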

Changing the header:

bcftools reheader -h newheader.txt filename.vcf.gz -o combined.vcf.gz

Plugins

export BCFTOOLS_PLUGINS=~/bin/bcftools-1.6/plugins/

tag2tag:

Convert PL to GL

bcftools +tag2tag in.vcf -- -r --pl-to-gl

Getting stats on the number of REF/ALT swaps and other reference-allele issues:

bcftools +fixref file.bcf -- -f ref.fa

Normalizing the multiallelic variants:

With -m -any, multiallelic records (both SNPs and indels) are split into several biallelic records:

bcftools norm -m -any in.vcf.gz -o out.norm.vcf.gz -Oz

For example:

chr20 60280 . TTTCCA TTTCCATTCCA,T 744 PASS .

Will be converted to:

chr20 60280 . TTTCCA TTTCCATTCCA 744 PASS .
chr20 60280 . TTTCCA T 744 PASS .
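To make the split concrete, here is an awk sketch that does the same thing to the ALT column of the example record: one output line per ALT allele, with all other columns copied unchanged. It is only an illustration; bcftools norm also splits per-allele INFO/FORMAT values and recodes genotypes, which this sketch does not attempt.

```shell
# The example record from above as a sites-only, tab-separated line
printf 'chr20\t60280\t.\tTTTCCA\tTTTCCATTCCA,T\t744\tPASS\t.\n' > multi.txt

# Emit one line per ALT allele; OFS keeps the rebuilt record tab-separated.
awk -F'\t' 'BEGIN { OFS = "\t" } {
  n = split($5, alt, ",")
  for (i = 1; i <= n; i++) { $5 = alt[i]; print }
}' multi.txt > split.txt
cat split.txt
```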

Selecting the sites without any called genotype (all genotypes missing):

bcftools view -u in.vcf.gz -o missing_genotypes.vcf.gz -Oz

Selecting records where one sample has a particular genotype (here heterozygous, "het", e.g. 0/1). The sample is accessed by its 0-based index in the header, so index 8 is the ninth sample:

bcftools view -H combined.snps_indels_chr1.filt.vcf.gz.onlyvariants.vcf.gz.ensembl.vcf.gz.85706.vcf.gz -i 'GT[8]="het"'
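Sample subscripts such as GT[8] count from 0 in header order, which is the order printed by bcftools query -l. A sketch for looking up a sample's index by name, with a hypothetical sample list standing in for the real query -l output:

```shell
# Hypothetical stand-in for the output of: bcftools query -l in.vcf.gz
printf 'HG00096\nHG00097\nNA12878\n' > samples.txt

# awk's NR is 1-based, while bcftools sample subscripts are 0-based: hence NR - 1.
idx=$(awk -v name=NA12878 '$0 == name { print NR - 1 }' samples.txt)
echo "NA12878 is sample index $idx"

# The index can then be used in the expression, e.g.:
#   bcftools view -H -i "GT[$idx]=\"het\"" in.vcf.gz
```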

Select all lines having exactly AC=2

bcftools view -i'AC=2' in.vcf.gz

If a VCF has a Flag tag defined in the header, like:

##INFO=<ID=GRCH37_38_REF_STRING_MATCH,Number=0,Type=Flag,Description="Indicates reference allele in origin GRCh37 vcf string-matches reference allele in dbsnp GRCh38 vcf">

We can check for the records having this tag by doing:

bcftools view -H -i'GRCH37_38_REF_STRING_MATCH=1' ALL.chr7_GRCh38.genotypes.20170504.ensembl.vcf.NA12878.biallelic.nonvariants.nonvalid.snps.vcf.gz |less

And the records without this flag by doing:

bcftools view -H -i'GRCH37_38_REF_STRING_MATCH=0' ALL.chr7_GRCh38.genotypes.20170504.ensembl.vcf.NA12878.biallelic.nonvariants.nonvalid.snps.vcf.gz |less

Filtering a VCF depending on a certain Allele frequency:

bcftools view -i 'INFO/AF > 0.8' z.vcf.gz
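As an illustration of what the INFO/AF > 0.8 expression tests, the awk sketch below pulls the AF key out of the INFO column of two hypothetical records and keeps the one above 0.8. It assumes a single ALT allele per record; a multiallelic AF such as 0.1,0.9 would need a further split on commas, which bcftools handles itself.

```shell
# Hypothetical sites-only records: CHROM POS ID REF ALT QUAL FILTER INFO
printf '20\t100\t.\tA\tG\t50\tPASS\tAC=2;AF=0.9;AN=2\n' > toy_info.txt
printf '20\t200\t.\tC\tT\t50\tPASS\tAC=1;AF=0.5;AN=2\n' >> toy_info.txt

# Split INFO (column 8) on ';' and print records whose AF value exceeds 0.8.
awk -F'\t' '{
  n = split($8, kv, ";")
  for (i = 1; i <= n; i++)
    if (kv[i] ~ /^AF=/ && substr(kv[i], 4) + 0 > 0.8) { print; break }
}' toy_info.txt > af_pass.txt
cat af_pass.txt
```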

Left-aligning and normalizing without correcting the REF allele (with --check-ref w, bcftools only prints a warning when REF does not match the reference; the warnings go to stderr, captured here in the log):

bcftools norm --fasta-ref Homo_sapiens_assembly38.fasta --check-ref w file.vcf.gz -Oz -o out.vcf.gz 2> bcftools_norm.log

flaviaerius/bcftools.md