This is my notebook about how to use the pipeline from nfcore-differentialabundance https://nf-co.re/differentialabundance with nfcode-atacseq https://nf-co.re/atacseq/
The atacseq pipeline produced the following output: /results/bwa/merged_replicate/macs2/narrow_peak/consensus/consensus_peaks.mRp.clN.annotatePeaks.txt
PeakID (cmd=annotatePeaks.pl consensus_peaks.mRp.clN.bed genome.fa -gid -gtf genes.gtf -cpu 6) Chr Start End Strand Peak Score Focus Ratio/Region Size Annotation (...)
Interval_332495 chr9 124851870 124853798 + 0 NA promoter-TSS (WDR38) (...)
Interval_128586 chr17 26937319 26937399 + 0 NA Intergenic (...)
Interval_331654 chr9 120441459 120442773 + 0 NA intron (CDK5RAP2, intron 23 of 37) (...)
Interval_267936 chr6 5032815 5034691 + 0 NA exon (LYRM4, exon 4 of 4) (...)
Interval_343213 chrX 130530694 130530770 + 0 NA Intergenic (...)
Interval_214618 chr3 52490455 52491417 + 0 NA intron (NISCH, intron 7 of 8) (...)
Interval_284446 chr6 159683511 159683887 + 0 NA intron (SOD2, intron 3 of 4) (...)
Interval_219482 chr3 108008329 108008684 + 0 NA Intergenic (...)
Interval_113578 chr16 1025293 1025709 + 0 NA Intergenic (...)
sanitize the spaces, replace "PeakId" with "gene_id", using
sed 's/^PeakID[^\t]*/gene_id/'consensus_peaks.mRp.clN.annotatePeaks.txt |\
sed -r '1 s/[^A-Za-z0-9_\t]+/_/g' |\
sed 's/Gene_Name/gene_name/' > differentialabundance.atacseq.GRCh38.annot.tsv
it now looks like:
gene_id Chr Start End Strand Peak_Score Focus_Ratio_Region_Size Annotation (...)
Interval_332495 chr9 124851870 124853798 + 0 NA promoter-TSS (WDR38) (...)
Interval_128586 chr17 26937319 26937399 + 0 NA Intergenic (...)
Interval_331654 chr9 120441459 120442773 + 0 NA intron (CDK5RAP2, intron 23 of 37) (...)
Interval_267936 chr6 5032815 5034691 + 0 NA exon (LYRM4, exon 4 of 4) (...)
Interval_343213 chrX 130530694 130530770 + 0 NA Intergenic (...)
Interval_214618 chr3 52490455 52491417 + 0 NA intron (NISCH, intron 7 of 8) (...)
Interval_284446 chr6 159683511 159683887 + 0 NA intron (SOD2, intron 3 of 4) (...)
Interval_219482 chr3 108008329 108008684 + 0 NA Intergenic (...)
Interval_113578 chr16 1025293 1025709 + 0 NA Intergenic (...)
The atacseq pipeline produced the following output: results/bwa/merged_replicate/macs2/narrow_peak/consensus/consensus_peaks.mRp.clN.featureCounts.txt
# Program:featureCounts v2.0.1; Command:"featureCounts" "-F" "SAF" "-O" "--fracOverlap" "0.2" "-p" "-T" "6" "-a" "consensus_peaks.mRp.clN.saf" "-s" "0" "-o" "consensus_peaks.mRp.clN.featureCounts.txt" "QX_F_SRS7605858_REP2.mLb.clN.sorted.bam" "QX_F_SRS7605858_REP1.mLb.clN.sorted.bam" "YR_F_SRS7606898_REP1.mLb.clN.sorted.bam" "YR_F_SRS7606898_REP2.mLb.clN.sorted.bam" "QX_M_SRS7604988_REP1.mLb.clN.sorted.bam" "QX_M_SRS7604988_REP2.mLb.clN.sorted.bam" "QX_M_SRS7755799_REP1.mLb.clN.sorted.bam" "QX_M_SRS7755799_REP2.mLb.clN.sorted.bam" "QX_F_SRS7604526_REP1.mLb.clN.sorted.bam" "QX_F_SRS7604526_REP2.mLb.clN.sorted.bam" "ZP_M_SRS7604190_REP2.mLb.clN.sorted.bam" "ZP_M_SRS7604190_REP1.mLb.clN.sorted.bam" "QX_F_SRS7756262_REP1.mLb.clN.sorted.bam" "QX_F_SRS7756262_REP2.mLb.clN.sorted.bam" "YR_F_SRS7606597_REP1.mLb.clN.sorted.bam" "YR_F_SRS7606597_REP2.mLb.clN.sorted.bam" "ZP_F_SRS7606383_REP1.mLb.clN.sorted.bam" "ZP_F_SRS7606383_REP2.mLb.clN.sorted.bam" "YR_M_SRS7605759_REP2.mLb.clN.sorted.bam" "YR_M_SRS7605759_REP1.mLb.clN.sorted.bam" "LA_M_SRS7604996_REP2.mLb.clN.sorted.bam" "LA_M_SRS7604996_REP1.mLb.clN.sorted.bam" (...)
Geneid Chr Start End Strand Length QX_F_SRS7605858_REP2.mLb.clN.sorted.bam QX_F_SRS7605858_REP1.mLb.clN.sorted.bam YR_F_SRS7606898_REP1.mLb.clN.sorted.bam (...)
Interval_1 chr1 10009 10619 + 611 486 377 671 (...)
Interval_2 chr1 13043 13515 + 473 35 30 61 (...)
Interval_3 chr1 14396 14795 + 400 44 26 46 (...)
Interval_4 chr1 15629 17902 + 2274 124 112 155 (...)
Interval_5 chr1 28730 29603 + 874 76 80 136 (...)
Interval_6 chr1 136407 136812 + 406 28 20 34 (...)
Interval_7 chr1 137922 139478 + 1557 47 38 77 (...)
Interval_8 chr1 180747 184640 + 3894 886 760 1106 (...)
Warning
Question: is it the correct file I should use ? how about using a count from the bigwig or bam files ?
replace GenId with gene_id, remove some columns, sanitize the sample names:
tail -n +2 consensus_peaks.mRp.clN.featureCounts.txt |cut -f1,7- |\
sed 's%.mLb.clN.sorted.bam%%g' |\
sed 's/^Geneid/gene_id/' > differentialabundance.atacseq.GRCh38.featureCount.tsv
it now lows like:
gene_id QX_F_SRS7605858_REP2 QX_F_SRS7605858_REP1 YR_F_SRS7606898_REP1 YR_F_SRS7606898_REP2 QX_M_SRS7604988_REP1 QX_M_SRS7604988_REP2 QX_M_SRS7755799_REP1
Interval_1 486 377 671 235 117 76 193
Interval_2 35 30 61 21 10 11 49
Interval_3 44 26 46 17 7 12 27
Interval_4 124 112 155 85 29 36 103
Interval_5 76 80 136 50 53 46 176
Interval_6 28 20 34 19 7 5 9
Interval_7 47 38 77 27 18 19 16
Interval_8 886 760 1106 409 183 118 368
Interval_9 108 85 142 68 25 28 57
the file is generated using the header of differentialabundance.atacseq.GRCh38.featureCount.tsv (I can do this because the phenotype is in the sample names)
head -n 1 differentialabundance.atacseq.GRCh38.featureCount.tsv |\
cut -f2- | tr "\t" "\n" |\
awk -F '_' 'BEGIN {printf("sample,tissue,sex,tissue_sex\n");} {printf("%s,%s,%s,%s_%s\n",$0,$1,$2,$1,$2);}' > samplesheet.csv
the samplesheet:
sample,tissue,sex,tissue_sex
QX_F_SRS7605858_REP2,QX,F,QX_F
QX_F_SRS7605858_REP1,QX,F,QX_F
YR_F_SRS7606898_REP1,YR,F,YR_F
(...)
the file is generated manually:
id,variable,reference,target
YR_F_vs_M,tissue_sex,YR_F,YR_M
ZP_F_vs_M,tissue_sex,ZP_F,ZP_M
QX_F_vs_M,tissue_sex,QX_F,QX_M
nextflow run nf-core/differentialabundance \
-c local.cfg \
-profile singularity \
-work-dir "WORK" \
-resume \
--outdir "results" \
--input samplesheet.csv \
--contrasts contrasts.csv \
--features differentialabundance.atacseq.GRCh38.annot.tsv \
--matrix differentialabundance.atacseq.GRCh38.featureCount.tsv \
--features_metadata_cols "gene_id,gene_name" \
--features_id_col gene_id \
--features_name_col "gene_name" \
--sizefactors_from_controls false \
--shinyngs_build_app true
and it should work...
Warning
the static html file in study is huge (>181M) and crashed firefox. I also tried to run the shiny app but my firefox freezes
Pierre Lindenbaum 20240808