Alex Crits-Christoph alexcritschristoph

@alexcritschristoph
alexcritschristoph / articles.tsv
Last active March 20, 2025 22:20
"lab leak" articles
date	notes	Source	article
1/23/2020	China built a lab to study SARS and Ebola in Wuhan	Daily Mail	https://www.dailymail.co.uk/health/article-7922379/Chinas-lab-studying-SARS-Ebola-Wuhan-outbreaks-center.html
3/4/2020	Don’t buy China’s story: The coronavirus may have leaked from a lab	NY Post	https://nypost.com/2020/02/22/dont-buy-chinas-story-the-coronavirus-may-have-leaked-from-a-lab/
3/5/2020	Pompeo	France 24	https://www.france24.com/en/20200503-pompeo-says-enormous-evidence-coronavirus-originated-in-wuhan-lab
3/5/2020	Coronavirus Epidemic Draws Scrutiny to Labs Handling Deadly Pathogens	WSJ	https://www.wsj.com/articles/coronavirus-epidemic-draws-scrutiny-to-labs-handling-deadly-pathogens-11583349777
3/30/2020	Experts know the new coronavirus is not a bioweapon. They disagree on whether it could have leaked from a research lab	Bulletin of the Atomic Scientists	https://thebulletin.org/2020/03/experts-know-the-new-coronavirus-is-not-a-bioweapon-they-disagree-on-whether-it-could-have-leaked-from-a-researc
# no-fuss script to download an NCBI genome in ~1 second
# usage: download_ncbi.sh GCA_000330525.1
genome_accession="$1"
datasets download genome accession "${genome_accession}" --include gbff
unzip ncbi_dataset.zip
mv "ncbi_dataset/data/${genome_accession}/genomic.gbff" "${genome_accession}.gbff"
rm -rf ./ncbi_dataset*
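The unzip-and-move step of the script above can also be done without unpacking the whole archive. The following is a minimal Python sketch, not part of the original gist: the function name `extract_gbff` and its parameters are hypothetical, and it assumes the archive layout used by the script (`ncbi_dataset/data/<accession>/genomic.gbff`).

```python
import zipfile
from pathlib import Path


def extract_gbff(zip_path, accession, out_dir="."):
    """Copy genomic.gbff for one accession out of a downloaded zip archive.

    Hypothetical helper: the member path mirrors the layout the shell
    script relies on (ncbi_dataset/data/<accession>/genomic.gbff).
    """
    member = f"ncbi_dataset/data/{accession}/genomic.gbff"
    out_path = Path(out_dir) / f"{accession}.gbff"
    with zipfile.ZipFile(zip_path) as archive:
        # archive.read raises KeyError if the member is missing
        out_path.write_bytes(archive.read(member))
    return out_path
```

Reading the single member directly leaves no `ncbi_dataset/` directory to clean up afterwards; only the zip file itself remains.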
alexcritschristoph / get_consenus.py
Last active August 8, 2024 02:46
get the consensus sequence from a BAM
## Returns the consensus sequence of a BAM file.
## A minimum of 3x depth of coverage is required at each site.
import pysam
import sys
import pandas as pd
import argparse
from Bio import SeqIO
import numpy as np
from collections import defaultdict
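The preview stops at the imports, so the consensus logic itself is not shown. As a rough illustration of the idea stated in the header comments (a consensus call with a 3x minimum depth), here is a minimal sketch that works on plain lists of observed bases rather than on a BAM. The function name, the majority-vote rule, and the use of `N` for low-coverage sites are all assumptions, not the gist's actual behavior.

```python
from collections import Counter


def consensus_from_columns(columns, min_depth=3, fill="N"):
    """Build a consensus string from per-position lists of observed bases.

    Assumptions (not from the gist): the most common base wins, and
    positions covered by fewer than min_depth bases are written as `fill`.
    """
    consensus = []
    for bases in columns:
        if len(bases) < min_depth:
            consensus.append(fill)
            continue
        base, _count = Counter(bases).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)
```

With a real BAM, the per-position base lists would come from a pileup over the alignment; that part is left out here to keep the sketch free of external dependencies.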
alexcritschristoph / get_read_info.py
Created May 21, 2021 06:43
Percent Identity of reads from a BAM file
import pysam

bamfile = pysam.AlignmentFile('file.bam')
for read in bamfile.fetch():
    # NM is the edit distance to the reference (mismatches plus inserted and deleted bases)
    number_of_mismatches = read.get_tag("NM")
    read_length = read.infer_query_length()
    read_percent_id = (1 - float(number_of_mismatches) / float(read_length)) * 100
    # if you want a specific read
    if read.query_name == 'my_read':
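The percent identity formula used in the loop above can be isolated and checked on its own. This is a small sketch with a hypothetical function name; the guard for a missing length is an added assumption, included because a read without alignment information may not have an inferable length.

```python
def percent_identity(edit_distance, read_length):
    """Percent identity as (1 - NM / read length) * 100.

    Hypothetical helper mirroring the formula in the snippet above.
    Returns None when the read length is missing or zero (an added guard).
    """
    if not read_length:
        return None
    return (1 - edit_distance / read_length) * 100
```

For example, a 100-base read with an NM value of 2 comes out at roughly 98 percent identity.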
import pysam
import sys
import pandas as pd
import argparse
from Bio import SeqIO
import numpy as np
from collections import defaultdict

# Map each nucleotide to an integer index
P2C = {'A': 0, 'C': 1, 'T': 2, 'G': 3}
alexcritschristoph / parse_antismash_proteins.py
Created January 31, 2020 01:10
Parses all proteins from antismash and labels them by genome and cluster number
import glob
from Bio import SeqIO

for fn in glob.glob('./antiSMASH/*/*cluster*.gbk'):
    # Genome name is the parent directory; the cluster number follows "cluster" in the file name
    genome = fn.split("/")[-2]
    cluster_num = fn.split("cluster")[1].split(".")[0]
    i = 0
    for record in SeqIO.parse(fn, 'genbank'):
        for feature in record.features:
            if feature.type == 'CDS':
                print(">" + genome + "|" + cluster_num + "|" + str(i) + "|" + str(feature.location.start) + ":" + str(feature.location.end))
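The header-building logic above depends only on string operations, so it can be tried without any antiSMASH output. The sketch below wraps it in a hypothetical function; the example path in the usage note is made up for illustration.

```python
def cluster_header(path, index, start, end):
    """Build a FASTA header of the form >genome|cluster|index|start:end.

    Hypothetical wrapper around the same string splitting as the snippet
    above: genome is the parent directory, and the cluster number is the
    text between "cluster" and the next "." in the path.
    """
    genome = path.split("/")[-2]
    cluster_num = path.split("cluster")[1].split(".")[0]
    return f">{genome}|{cluster_num}|{index}|{start}:{end}"
```

Calling `cluster_header("./antiSMASH/genomeA/contig_1.cluster003.gbk", 0, 10, 250)` gives `>genomeA|003|0|10:250`. Note that the split on "cluster" would pick the wrong piece if a directory name also contained that word.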
alexcritschristoph / deseq2-analysis-template.R
Last active March 6, 2016 21:09 — forked from stephenturner/deseq2-analysis-template.R
Template for analysis with DESeq2
## DESeq2 made as easy as it should have always been, but for some reason isn't.
## Code based on: https://gist.github.com/stephenturner/f60c1934405c127f09a6
library('DESeq2')
'''
Our starting CSV/TSV looks like:
Sample Pairing1 Pairing2 Pairing3 Status KXB65094.1 KXB65950.1 KXB67202.1 ....
1 a Active Active Positive Active 0 0 1
2 b Active Active Positive Active 0 1 1
alexcritschristoph / gc_window.py
Created September 2, 2015 19:50
Calculates average GC for a window size over an entire genome
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
from random import randint
# Note: GC is unavailable in recent Biopython releases, which provide gc_fraction (a 0-1 fraction) instead
from Bio.SeqUtils import GC
import sys

# There should be one and only one record, the entire genome:
print("reading")
mito_record = SeqIO.read(sys.argv[1], "fasta")
gcs = []
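The windowing step is cut off in the preview. A dependency-free sketch of per-window GC content might look like the following; the function names are hypothetical, and the choice of consecutive, non-overlapping windows is an assumption, since the gist may slide or sample windows differently.

```python
def gc_percent(seq):
    """GC content of a sequence as a percentage (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_by_window(seq, window):
    """GC percentage for consecutive, non-overlapping windows of `window` bases.

    Assumption: windows are tiled end to end and the final window may be
    shorter than the rest.
    """
    return [gc_percent(seq[i:i + window]) for i in range(0, len(seq), window)]
```

The genome-wide average for a given window size is then the mean of the returned list.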
alexcritschristoph / random_forest.py
Created September 2, 2015 19:44
Basic usage of importing training data and predicting using sklearn
# Import the random forest package
from sklearn.ensemble import RandomForestClassifier
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn import model_selection as cross_validation
import numpy as np

dataset = np.loadtxt('training_data.csv', delimiter=",")
# Create the random forest object, which will include all the parameters
# for the fit
forest = RandomForestClassifier(n_estimators=100)
alexcritschristoph / simulate_assembly.py
Created September 2, 2015 19:43
Generates randomized contigs from a list of FASTA genomes (naive assembled metagenome simulator)
import os
import sys
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
from random import randint

j = 1
for root, subdirs, files in os.walk(sys.argv[1]):
    for f in files:
        seq = os.path.join(root, f)
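The preview ends before the contig generation itself. One naive way to turn a genome into randomized contigs, in the spirit of the description, is to cut it into consecutive pieces of random length. The sketch below is an assumption about the approach, not the gist's code; the function name and the `min_len`/`max_len` parameters are hypothetical.

```python
import random


def random_contigs(sequence, min_len, max_len, rng=random):
    """Cut a sequence into consecutive pieces of random length.

    Assumptions: pieces do not overlap, each length is drawn uniformly
    from [min_len, max_len], and the last piece may be shorter than
    min_len. min_len must be at least 1 so the loop always advances.
    """
    contigs = []
    pos = 0
    while pos < len(sequence):
        length = rng.randint(min_len, max_len)
        contigs.append(sequence[pos:pos + length])
        pos += length
    return contigs
```

Passing a seeded `random.Random` instance as `rng` makes the simulated assembly reproducible.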