This is my notebook about how to use the pipeline from nfcore-differentialabundance https://nf-co.re/differentialabundance with nfcode-atacseq https://nf-co.re/atacseq/

The feature file

The atacseq pipeline produced the following output: /results/bwa/merged_replicate/macs2/narrow_peak/consensus/consensus_peaks.mRp.clN.annotatePeaks.txt

PeakID (cmd=annotatePeaks.pl consensus_peaks.mRp.clN.bed genome.fa -gid -gtf genes.gtf -cpu 6)	Chr	Start	End	Strand	Peak Score	Focus Ratio/Region Size	Annotation (...)
Interval_332495	chr9	124851870	124853798	+	0	NA	promoter-TSS (WDR38) (...)
Interval_128586	chr17	26937319	26937399	+	0	NA	Intergenic (...)
Interval_331654	chr9	120441459	120442773	+	0	NA	intron (CDK5RAP2, intron 23 of 37) (...)
Interval_267936	chr6	5032815	5034691	+	0	NA	exon (LYRM4, exon 4 of 4) (...)
Interval_343213	chrX	130530694	130530770	+	0	NA	Intergenic (...)
Interval_214618	chr3	52490455	52491417	+	0	NA	intron (NISCH, intron 7 of 8) (...)
Interval_284446	chr6	159683511	159683887	+	0	NA	intron (SOD2, intron 3 of 4) (...)
Interval_219482	chr3	108008329	108008684	+	0	NA	Intergenic (...)
Interval_113578	chr16	1025293	1025709	+	0	NA	Intergenic (...)

sanitize the spaces, replace "PeakId" with "gene_id", using

sed 's/^PeakID[^\t]*/gene_id/'consensus_peaks.mRp.clN.annotatePeaks.txt |\
	sed -r '1 s/[^A-Za-z0-9_\t]+/_/g' |\
	sed 's/Gene_Name/gene_name/'  > differentialabundance.atacseq.GRCh38.annot.tsv

it now looks like:

gene_id	Chr	Start	End	Strand	Peak_Score	Focus_Ratio_Region_Size	Annotation (...)
Interval_332495	chr9	124851870	124853798	+	0	NA	promoter-TSS (WDR38) (...)
Interval_128586	chr17	26937319	26937399	+	0	NA	Intergenic (...)
Interval_331654	chr9	120441459	120442773	+	0	NA	intron (CDK5RAP2, intron 23 of 37) (...)
Interval_267936	chr6	5032815	5034691	+	0	NA	exon (LYRM4, exon 4 of 4) (...)
Interval_343213	chrX	130530694	130530770	+	0	NA	Intergenic (...)
Interval_214618	chr3	52490455	52491417	+	0	NA	intron (NISCH, intron 7 of 8) (...)
Interval_284446	chr6	159683511	159683887	+	0	NA	intron (SOD2, intron 3 of 4) (...)
Interval_219482	chr3	108008329	108008684	+	0	NA	Intergenic (...)
Interval_113578	chr16	1025293	1025709	+	0	NA	Intergenic (...)

The matrix file

The atacseq pipeline produced the following output: results/bwa/merged_replicate/macs2/narrow_peak/consensus/consensus_peaks.mRp.clN.featureCounts.txt

# Program:featureCounts v2.0.1; Command:"featureCounts" "-F" "SAF" "-O" "--fracOverlap" "0.2" "-p" "-T" "6" "-a" "consensus_peaks.mRp.clN.saf" "-s" "0" "-o" "consensus_peaks.mRp.clN.featureCounts.txt" "QX_F_SRS7605858_REP2.mLb.clN.sorted.bam" "QX_F_SRS7605858_REP1.mLb.clN.sorted.bam" "YR_F_SRS7606898_REP1.mLb.clN.sorted.bam" "YR_F_SRS7606898_REP2.mLb.clN.sorted.bam" "QX_M_SRS7604988_REP1.mLb.clN.sorted.bam" "QX_M_SRS7604988_REP2.mLb.clN.sorted.bam" "QX_M_SRS7755799_REP1.mLb.clN.sorted.bam" "QX_M_SRS7755799_REP2.mLb.clN.sorted.bam" "QX_F_SRS7604526_REP1.mLb.clN.sorted.bam" "QX_F_SRS7604526_REP2.mLb.clN.sorted.bam" "ZP_M_SRS7604190_REP2.mLb.clN.sorted.bam" "ZP_M_SRS7604190_REP1.mLb.clN.sorted.bam" "QX_F_SRS7756262_REP1.mLb.clN.sorted.bam" "QX_F_SRS7756262_REP2.mLb.clN.sorted.bam" "YR_F_SRS7606597_REP1.mLb.clN.sorted.bam" "YR_F_SRS7606597_REP2.mLb.clN.sorted.bam" "ZP_F_SRS7606383_REP1.mLb.clN.sorted.bam" "ZP_F_SRS7606383_REP2.mLb.clN.sorted.bam" "YR_M_SRS7605759_REP2.mLb.clN.sorted.bam" "YR_M_SRS7605759_REP1.mLb.clN.sorted.bam" "LA_M_SRS7604996_REP2.mLb.clN.sorted.bam" "LA_M_SRS7604996_REP1.mLb.clN.sorted.bam"  (...)
Geneid	Chr	Start	End	Strand	Length	QX_F_SRS7605858_REP2.mLb.clN.sorted.bam	QX_F_SRS7605858_REP1.mLb.clN.sorted.bam	YR_F_SRS7606898_REP1.mLb.clN.sorted.bam (...)
Interval_1	chr1	10009	10619	+	611	486	377	671 (...)
Interval_2	chr1	13043	13515	+	473	35	30	61 (...)
Interval_3	chr1	14396	14795	+	400	44	26	46 (...)
Interval_4	chr1	15629	17902	+	2274	124	112	155 (...)
Interval_5	chr1	28730	29603	+	874	76	80	136 (...)
Interval_6	chr1	136407	136812	+	406	28	20	34 (...)
Interval_7	chr1	137922	139478	+	1557	47	38	77 (...)
Interval_8	chr1	180747	184640	+	3894	886	760	1106 (...)

Warning

Question: is it the correct file I should use ? how about using a count from the bigwig or bam files ?

replace GenId with gene_id, remove some columns, sanitize the sample names:

tail -n +2 consensus_peaks.mRp.clN.featureCounts.txt |cut -f1,7- |\
	sed 's%.mLb.clN.sorted.bam%%g' |\
	sed 's/^Geneid/gene_id/' > differentialabundance.atacseq.GRCh38.featureCount.tsv

it now lows like:

gene_id	QX_F_SRS7605858_REP2	QX_F_SRS7605858_REP1	YR_F_SRS7606898_REP1	YR_F_SRS7606898_REP2	QX_M_SRS7604988_REP1	QX_M_SRS7604988_REP2	QX_M_SRS7755799_REP1
Interval_1	486	377	671	235	117	76	193
Interval_2	35	30	61	21	10	11	49
Interval_3	44	26	46	17	7	12	27
Interval_4	124	112	155	85	29	36	103
Interval_5	76	80	136	50	53	46	176
Interval_6	28	20	34	19	7	5	9
Interval_7	47	38	77	27	18	19	16
Interval_8	886	760	1106	409	183	118	368
Interval_9	108	85	142	68	25	28	57

The samplesheet

the file is generated using the header of differentialabundance.atacseq.GRCh38.featureCount.tsv (I can do this because the phenotype is in the sample names)

head -n 1 differentialabundance.atacseq.GRCh38.featureCount.tsv |\
	 cut -f2- | tr "\t" "\n" |\
   awk -F '_' 'BEGIN {printf("sample,tissue,sex,tissue_sex\n");} {printf("%s,%s,%s,%s_%s\n",$0,$1,$2,$1,$2);}' > samplesheet.csv

the samplesheet:

sample,tissue,sex,tissue_sex
QX_F_SRS7605858_REP2,QX,F,QX_F
QX_F_SRS7605858_REP1,QX,F,QX_F
YR_F_SRS7606898_REP1,YR,F,YR_F
(...)

The contrasts file

the file is generated manually:

id,variable,reference,target
YR_F_vs_M,tissue_sex,YR_F,YR_M
ZP_F_vs_M,tissue_sex,ZP_F,ZP_M
QX_F_vs_M,tissue_sex,QX_F,QX_M

invoking nextflow

nextflow run  nf-core/differentialabundance  \
  -c local.cfg \
	-profile singularity \
	-work-dir "WORK" \
	-resume \
	--outdir "results" \
	--input samplesheet.csv \
	--contrasts contrasts.csv \
	--features  differentialabundance.atacseq.GRCh38.annot.tsv \
	--matrix  differentialabundance.atacseq.GRCh38.featureCount.tsv \
	--features_metadata_cols "gene_id,gene_name" \
	--features_id_col gene_id \
	--features_name_col "gene_name" \
	--sizefactors_from_controls false \
	--shinyngs_build_app true

and it should work...

Output

Warning

the static html file in study is huge (>181M) and crashed firefox. I also tried to run the shiny app but my firefox freezes

Author

Pierre Lindenbaum 20240808

lindenb/README.md