Alex Lau alexllc

  • The University of Texas MD Anderson Cancer Center
  • Houston, TX
@alexllc
alexllc / auto_qc_pp.py
Last active October 29, 2024 17:37
Simple automatic Visium QC and preprocessing
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.signal import find_peaks
import matplotlib as mpl
import matplotlib.pyplot as plt
import scanpy as sc
from anndata import AnnData
alexllc / archgdrive.md
Created June 23, 2024 20:39
Setting up googledrive sync on Arch linux

The best way to set this up is to install opam (the OCaml package manager). For the current (2024-06-23) version of google-drive-ocamlfuse (0.7.32), you can't use the latest version of opam, because google-drive-ocamlfuse still depends on older packages.

  1. Install opam first (opam - Install)
sudo bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)"

Then run

alexllc / cca_st_tutorial.ipynb
Created June 14, 2024 22:19
Brief introduction to handling ST data in python with scanpy
chrom chromStart chromEnd name gieStain
chr1 0 2300000 p36.33 gneg
chr1 2300000 5300000 p36.32 gpos25
chr1 5300000 7100000 p36.31 gneg
chr1 7100000 9100000 p36.23 gpos25
chr1 9100000 12500000 p36.22 gneg
chr1 12500000 15900000 p36.21 gpos50
chr1 15900000 20100000 p36.13 gneg
chr1 20100000 23600000 p36.12 gpos25
chr1 23600000 27600000 p36.11 gneg
alexllc / convert_qptiff.ipynb
Last active June 8, 2025 21:02
Processing images from Perkin Elmer Vectra scanner (.qptiff) into OpenSlide compatible .tiff files
alexllc / add_metadata_to_tif.py
Created January 24, 2024 18:10
Bash and python script to use ImageMagick to convert non-pyramidal images to tiff pyramid images and add openslide compatible metadata
"""Python script to spoof metadata to converted tiff files
This script adds the necessary metadata to images that were converted from tif/jpg/png and are not compatible with the OpenSlide format. Using the tifftools library, we can spoof the metadata to make the images compatible with OpenSlide or other image processing libraries. This script was created following the guide at http://www.andrewjanowczyk.com/converting-an-existing-image-into-an-openslide-compatible-format/
This script requires the following packages:
- argparse
- tifftools
You need to specify the power and microns per pixel for all images; run the script separately for each group of images that has different specifications.
"""
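The docstring above says the power and microns per pixel are supplied once for all images. A minimal sketch of that command-line interface could look like the following. The flag names (`--power`, `--mpp`) and the Aperio-style description layout are assumptions for illustration, not taken from the gist, and the actual tifftools call that writes the tag is deliberately left out.

```python
import argparse


def build_description(power: float, mpp: float) -> str:
    """Build an Aperio-style ImageDescription string.

    Assumption: a pipe-separated "key = value" layout carrying AppMag
    (objective power) and MPP (microns per pixel).
    """
    return f"Aperio Fake |AppMag = {power:g}|MPP = {mpp:g}"


def parse_args(argv=None):
    """Parse the image paths plus one power/MPP pair shared by all images."""
    parser = argparse.ArgumentParser(
        description="Attach spoofed slide metadata to converted TIFF files."
    )
    parser.add_argument("images", nargs="+", help="TIFF files to annotate")
    parser.add_argument("--power", type=float, required=True,
                        help="objective power, e.g. 40")
    parser.add_argument("--mpp", type=float, required=True,
                        help="microns per pixel, e.g. 0.25")
    return parser.parse_args(argv)


# Example run with an explicit argument list instead of sys.argv.
args = parse_args(["slide_a.tif", "slide_b.tif", "--power", "40", "--mpp", "0.25"])
description = build_description(args.power, args.mpp)
for path in args.images:
    # A real script would write `description` into each file here.
    print(f"{path}: {description}")
```

Because one description string is built per run, images with different magnifications would need separate invocations, which matches the note in the docstring.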
alexllc / linux_school_vpn_connection.md
Last active February 12, 2023 07:33
How to connect to L2TP/IPsec VPN for linux users

Connecting to L2TP/IPsec VPN client with Linux

CUHK VPN

(Tested on Manjaro, Ubuntu 18.04, Ubuntu 20.04)

Our school uses L2TP/IPsec with a PSK, and I've read plenty of StackExchange and forum posts about problems setting up L2TP/IPsec on a Linux system. After much trial and error I managed to set it up, and I have since helped friends and colleagues who use Linux on their PCs. I haven't shared or replied to any threads about the solution I found, primarily because I don't really know why the other methods didn't work and why this setting in particular works 🤷 So here we go.

I am going to set this up with the NetworkManager GUI, which should be installed on your Manjaro distro by default. If it is not, run sudo pacman -S networkmanager

Next, open NetworkManager via System Settings > Network > Connections

# Function to retrieve KEGG pathway names with KEGGREST.
# For some reason, it won't take a list of queries, even though the query size was set to 100.
# However, when I execute the queries one pathway at a time, I get the error "Forbidden (HTTP 403)", hence the Sys.sleep break.
# I am guessing that sending more than 100 queries within a short period of time is not allowed either.
# I'm expecting KEGGgraph input with path IDs like "X00010", but feel free to remove the gsub call if you have already formatted your path IDs into the "map00010" form.
library(KEGGREST)
kegg_all_names <- function(pathID) {
repeats <- floor(length(pathID) / 100)
final_rep <- length(pathID) %% 100
pathNames <- NULL
for (i in 1:repeats) {
pathNames <- c(pathNames, unlist(lapply(gsub("X", "map", pathID[(1 + (i * 100) - 100):(i * 100)]), function(x) keggFind("pathway", x))))
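
The batching idea in the function above (look up at most 100 pathways, pause, then continue) can be sketched in Python as follows. This is only an illustration of the chunking logic: `lookup` is a hypothetical stand-in for the KEGGREST `keggFind("pathway", x)` call, and the one-second pause is an assumed value, not a documented KEGG limit.

```python
import time
from typing import Callable, Dict, Iterator, List, Sequence


def to_map_id(path_id: str) -> str:
    """Turn 'X00010' into 'map00010' (mirrors the gsub call in the R code)."""
    return path_id.replace("X", "map")


def batches(items: Sequence[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items, including the remainder."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def lookup_all(path_ids: Sequence[str],
               lookup: Callable[[str], str],
               batch_size: int = 100,
               pause: float = 1.0) -> Dict[str, str]:
    """Query one pathway at a time, sleeping between batches to avoid HTTP 403."""
    names: Dict[str, str] = {}
    for index, batch in enumerate(batches(path_ids, batch_size)):
        if index > 0:
            time.sleep(pause)  # assumed back-off between batches
        for path_id in batch:
            names[path_id] = lookup(to_map_id(path_id))
    return names


# Usage with a dummy lookup so the sketch runs without network access.
ids = [f"X{n:05d}" for n in range(250)]
result = lookup_all(ids, lookup=lambda map_id: f"name of {map_id}", pause=0.0)
print(len(result), result["X00010"])
```

Slicing with a step also covers the final partial batch and inputs shorter than 100 IDs. In R, `1:repeats` counts down through `1, 0` when `repeats` is 0, so a loop written that way may need a guard for short inputs.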

Strategies to deal with technical replicates in TCGA

Potential issue

There are many instances where more than one aliquot is provided by TCGAbiolinks datasets, but you will only need one of those. In this case, GDC has offered a standard set of Replicate Sample rules to select the most 'scientifically advantageous' aliquot for study.
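
As a rough illustration of that selection, the sketch below keeps one aliquot per sample. It encodes the replicate rules as I understand them from the Broad GDAC sample report (prefer analyte H over R over T for RNA, prefer D over G, W and X for DNA, then take the barcode that sorts highest), so treat the ranking table as an assumption to check against the official rules. Grouping by the first 15 characters of the barcode and passing barcodes of a single data type are also assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Lower rank is preferred. RNA: H > R > T. DNA: D > G, W, X.
# Assumption: all barcodes passed in belong to one data type.
ANALYTE_RANK = {"H": 0, "R": 1, "T": 2, "D": 0, "G": 1, "W": 1, "X": 1}


def analyte_rank(barcode: str) -> int:
    """Rank an aliquot by the analyte letter in the fifth barcode field."""
    analyte = barcode.split("-")[4][2:]
    return ANALYTE_RANK.get(analyte, len(ANALYTE_RANK))  # unknown analytes go last


def filter_replicates(barcodes: Iterable[str]) -> List[str]:
    """Keep one aliquot per sample (participant plus sample type)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for barcode in barcodes:
        groups[barcode[:15]].append(barcode)  # e.g. "TCGA-A1-0001-01"

    kept = []
    for sample in sorted(groups):
        candidates = groups[sample]
        best = min(analyte_rank(b) for b in candidates)
        preferred = [b for b in candidates if analyte_rank(b) == best]
        kept.append(max(preferred))  # highest lexicographic barcode wins ties
    return kept


aliquots = [
    "TCGA-A1-0001-01A-01R-0001-07",
    "TCGA-A1-0001-01A-01H-0002-07",
    "TCGA-A1-0001-01A-02H-0003-07",
    "TCGA-B2-0002-01A-01D-0004-01",
]
print(filter_replicates(aliquots))
```

The barcodes in the example are made up for illustration and do not refer to real TCGA aliquots.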

Barcode meanings

From the GDC Documentation Encyclopedia
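
For reference, a full aliquot barcode such as `TCGA-02-0001-01C-01D-0182-01` has seven hyphen-separated fields. The sketch below splits one into named parts; the field names are my own labels for illustration, so check them against the GDC encyclopedia entry before relying on them.

```python
from typing import NamedTuple


class Barcode(NamedTuple):
    project: str      # "TCGA"
    tss: str          # tissue source site
    participant: str  # study participant
    sample_type: str  # two digits, e.g. "01" for primary solid tumor
    vial: str         # letter following the sample type
    portion: str      # two digits
    analyte: str      # letter, e.g. "D" for DNA or "R" for RNA
    plate: str        # plate identifier
    center: str       # sequencing or characterization center


def parse_barcode(barcode: str) -> Barcode:
    """Split a full 28-character aliquot barcode into its fields."""
    parts = barcode.split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {barcode!r}")
    project, tss, participant, sample, portion, plate, center = parts
    return Barcode(project, tss, participant,
                   sample[:2], sample[2:],
                   portion[:2], portion[2:],
                   plate, center)


parsed = parse_barcode("TCGA-02-0001-01C-01D-0182-01")
print(parsed.sample_type, parsed.vial, parsed.analyte)
```

Shorter barcodes (participant or sample level) are rejected here on purpose; a more permissive parser would make the trailing fields optional.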

#' filter_replicate_samples
#'
#' Function to filter technical replicates in TCGA samples
#' Doc: https://gist.github.com/alexllc/8dcd229ed3ad7f069e92dc30d5eac83a
#'
#' @source \url{http://gdac.broadinstitute.org/runs/stddata__2014_01_15/samples_report/READ_Replicate_Samples.html}
#' @param bcr list of barcodes
#' @param verbose print out which barcodes are kept and which ones are filtered
#'
#' @return subset of the list of barcodes