Chapter 7 DNAseq alignment

The pipeline pl_bwaMRecal preprocesses the fastq files from DNA sequencing. As input it takes paired fastq files and their read groups, which may come from multiple batches.

bwaMRecal <- cwlLoad("pl_bwaMRecal")
## markdup loaded
inputs(bwaMRecal)
## inputs:
##   outBam (string):  
##   RG (string):  
##   threads (int):  
##   Ref (File):  
##   FQ1s (File):  
##   FQ2s (File):  
##   knowSites:
##     type: array
##     prefix:
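
As a minimal sketch of what "paired fastq files and read groups from multiple batches" can look like on the R side, the base-R snippet below builds one forward file, one reverse file, and one read group string per batch. The sample name, batch labels, directory, and read group fields are hypothetical placeholders. How these vectors are assigned to the FQ1s, FQ2s, and RG inputs is not shown in this section, so that step is left out.

```r
# Hypothetical sample sequenced in two batches (lanes); all names are placeholders.
sample_id <- "sample1"
batches <- c("L001", "L002")

# One forward and one reverse fastq file per batch, in matching order.
fq1s <- file.path("fastq", paste0(sample_id, "_", batches, "_R1.fq.gz"))
fq2s <- file.path("fastq", paste0(sample_id, "_", batches, "_R2.fq.gz"))

# One read group per batch. "\\t" keeps a literal backslash-t in the string,
# which is the escaped form that `bwa mem -R` accepts (assumed to apply here).
rgs <- paste("@RG",
             paste0("ID:", sample_id, ".", batches),
             paste0("SM:", sample_id),
             "PL:ILLUMINA",
             sep = "\\t")

# The three vectors must line up batch by batch.
stopifnot(length(fq1s) == length(fq2s), length(rgs) == length(fq1s))
```

Keeping the three vectors in the same batch order means that each read group stays attached to the pair of files it describes.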

The pipeline includes three steps: BWA alignment, duplicate marking, and base recalibration. Each step is either a single tool or a sub-pipeline that wraps several tools.

runs(bwaMRecal)
## List of length 3
## names(3): bwaAlign markdup BaseRecal
  1. bwaAlign: the BWA alignment step is a sub-pipeline that includes the following tools:
runs(runs(bwaMRecal)[[1]])
## List of length 4
## names(4): bwa sam2bam sortBam idxBam
  • bwa: to align the fastq reads to the reference genome with bwa, tagging them with their read groups.
  • sam2bam: to convert the alignments from “sam” to “bam” format with samtools.
  • sortBam: to sort the “bam” file by coordinates with samtools.
  • idxBam: to index the “bam” file with samtools.
  2. markdup: the duplicate marking step runs a single command-line tool, Picard MarkDuplicates, which identifies duplicate reads.
runs(bwaMRecal)[[2]]
## class: cwlProcess 
##  cwlClass: CommandLineTool 
##  cwlVersion: v1.0 
##  baseCommand: picard MarkDuplicates 
## requirements:
## - class: DockerRequirement
##   dockerPull: quay.io/biocontainers/picard:2.21.1--0
## inputs:
##   ibam (File): I= 
##   obam (string): O= 
##   matrix (string): M= 
## outputs:
## mBam:
##   type: File
##   outputBinding:
##     glob: $(inputs.obam)
## Mat:
##   type: File
##   outputBinding:
##     glob: $(inputs.matrix)
  3. BaseRecal: the base recalibration step is a sub-pipeline that runs several tools from GATK and samtools.
runs(runs(bwaMRecal)[[3]])
## List of length 5
## names(5): BaseRecalibrator ApplyBQSR samtools_index samtools_flagstat samtools_stats
  • BaseRecalibrator and ApplyBQSR: to recalibrate base quality scores with GATK.
  • samtools_index: to index the “bam” file with samtools.
  • samtools_flagstat and samtools_stats: to summarize the alignments with samtools.

The outputs of the bwaMRecal pipeline are the duplication metrics file from the markdup step (the output named matrix), and the final processed bam file together with the flagstat and stats summary files from the BaseRecal step.

outputs(bwaMRecal)
## outputs:
## BAM:
##   type: File
##   outputSource: BaseRecal/rcBam
## matrix:
##   type: File
##   outputSource: markdup/Mat
## flagstat:
##   type: File
##   outputSource: BaseRecal/flagstat
## stats:
##   type: File
##   outputSource: BaseRecal/stats