Chapter 4 Writing Pipeline

We can connect multiple tools together into a pipeline. Here is an example that uncompresses an R script and then executes it with Rscript.

Here we define a simple Rscript tool without using Docker.

d1 <- InputParam(id = "rfile", type = "File")
Rs <- cwlProcess(baseCommand = "Rscript",
                 inputs = InputParamList(d1))
Rs
## class: cwlProcess 
##  cwlClass: CommandLineTool 
##  cwlVersion: v1.0 
##  baseCommand: Rscript 
## inputs:
##   rfile (File):  
## outputs:
## output:
##   type: stdout

Test run:

Rs$rfile <- r4$output
tres <- runCWL(Rs, outdir = tempdir())
## INFO Final process status is success
readLines(tres$output)
## [1] "[1] 10  7  2  5  8"

The pipeline has two steps: decompressing with the predefined cwlProcess GZ, and executing the script with the cwlProcess Rs. The pipeline input is a compressed file, which is passed to the first step, “Uncomp”.

i1 <- InputParam(id = "cwl_zfile", type = "File")
s1 <- cwlStep(id = "Uncomp", run = GZ,
              In = list(zfile = "cwl_zfile"))
s2 <- cwlStep(id = "Compile", run = Rs,
              In = list(rfile = "Uncomp/rfile"))

In step 1 (‘s1’), the pipeline runs the cwlProcess GZ, whose input zfile is linked to the pipeline input ‘i1’ with the id “cwl_zfile”. In step 2 (‘s2’), the pipeline runs the cwlProcess Rs, whose input rfile is taken from the output of step 1 (“Uncomp/rfile”), using the format <step>/<output>.
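
To make the <step>/<output> convention concrete, here is a small base-R sketch that splits such a reference into its two parts. The helper parse_source is hypothetical and is not part of Rcwl; it only illustrates that a string such as "Uncomp/rfile" names an upstream step and one of its outputs, while a string without a slash (such as "cwl_zfile") refers to a workflow-level input.

```r
# Hypothetical helper (not part of Rcwl): split a source reference
# of the form "<step>/<output>" into its step and id parts.
parse_source <- function(src) {
  parts <- strsplit(src, "/", fixed = TRUE)[[1]]
  if (length(parts) == 2) {
    # Reference to an output of an upstream step
    list(step = parts[1], id = parts[2])
  } else {
    # No slash: the name refers to a workflow-level input
    list(step = NA_character_, id = src)
  }
}

parse_source("Uncomp/rfile")  # step "Uncomp", id "rfile"
parse_source("cwl_zfile")     # step NA, id "cwl_zfile"
```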

The pipeline output is defined as the output of step 2 (“Compile/output”), again using the format <step>/<output>, as shown below.

o1 <- OutputParam(id = "cwl_cout", type = "File",
                  outputSource = "Compile/output")

The cwlWorkflow function initializes the pipeline by specifying its inputs and outputs. We can then use + to add the steps and build the final pipeline.

cwl <- cwlWorkflow(inputs = InputParamList(i1),
                   outputs = OutputParamList(o1))
cwl <- cwl + s1 + s2
cwl
## class: cwlWorkflow 
##  cwlClass: Workflow 
##  cwlVersion: v1.0 
## inputs:
##   cwl_zfile (File):  
## outputs:
## cwl_cout:
##   type: File
##   outputSource: Compile/output
## steps:
## Uncomp:
##   run: Uncomp.cwl
##   in:
##     zfile: cwl_zfile
##   out:
##   - rfile
## Compile:
##   run: Compile.cwl
##   in:
##     rfile: Uncomp/rfile
##   out:
##   - output

Let’s run the pipeline.

cwl$cwl_zfile <- zzfil
r7 <- runCWL(cwl, outdir = tempdir())
## INFO Final process status is success
readLines(r7$output)
## [1] "[1] 7 4 6 8 2"

Tip: Sometimes we need to adjust the arguments of certain tools in a pipeline, in addition to their input parameters. The function arguments can be used to modify the arguments of a tool, a tool in a pipeline, or even a tool in a sub-workflow. For example,

arguments(cwl, step = "Uncomp") <- list("-d", "-c", "-f")
runs(cwl)$Uncomp
## class: cwlProcess 
##  cwlClass: CommandLineTool 
##  cwlVersion: v1.0 
##  baseCommand: gzip 
## arguments: -d -c -f 
## inputs:
##   zfile (File):  /private/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/RtmptDExzb/sample.R.gz
## outputs:
## rfile:
##   type: File
##   outputBinding:
##     glob: $(inputs.zfile.nameroot)
## stdout: $(inputs.zfile.nameroot)

4.1 Scattering pipeline

The scattering feature specifies that the associated workflow step or subworkflow should execute separately over a list of input elements. To use this feature, ScatterFeatureRequirement must be specified in the workflow requirements. When a step scatters over more than one input, a scatter method determines how the inputs are decomposed into a discrete set of jobs. More details can be found at: https://www.commonwl.org/v1.0/Workflow.html#WorkflowStep.
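
Conceptually, scattering resembles mapping a function over a list: each element becomes a separate job, and the results are gathered into an array in input order. The base-R sketch below is only an analogy and does not show how Rcwl or cwltool schedules jobs. The function run_step and the input values are hypothetical stand-ins; the method names dotproduct and flat_crossproduct come from the CWL specification.

```r
# Hypothetical stand-in for one step invocation: takes one input
# element and returns one output.
run_step <- function(rfile) paste0(rfile, ".out")

# Scatter over a single input: one job per element, results kept in
# input order (analogous to a step output of type File[]).
rfiles <- c("a.R", "b.R", "c.R")
single <- vapply(rfiles, run_step, character(1), USE.NAMES = FALSE)

# Scatter over two inputs of equal length.
xs <- c("a", "b")
ys <- c("1", "2")

# dotproduct: pair the inputs position by position.
dot <- mapply(function(x, y) paste0(x, y), xs, ys, USE.NAMES = FALSE)

# flat_crossproduct: every combination, flattened into one array,
# with the first listed input varying slowest.
grid <- expand.grid(y = ys, x = xs, stringsAsFactors = FALSE)
flat <- paste0(grid$x, grid$y)

single
dot
flat
```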

Here is an example that executes multiple R scripts. First, we set the workflow input and output types to arrays of “File” and add the scatter requirement. Then, in the “Compile” step, we name the input to scatter over with the scatter option.

i2 <- InputParam(id = "cwl_rfiles", type = "File[]")
o2 <- OutputParam(id = "cwl_couts", type = "File[]", outputSource = "Compile/output")
req1 <- requireScatter()
cwl2 <- cwlWorkflow(requirements = list(req1),
                    inputs = InputParamList(i2),
                    outputs = OutputParamList(o2))
s1 <- cwlStep(id = "Compile", run = Rs,
              In = list(rfile = "cwl_rfiles"),
              scatter = "rfile")
cwl2 <- cwl2 + s1
cwl2
## class: cwlWorkflow 
##  cwlClass: Workflow 
##  cwlVersion: v1.0 
## requirements:
## - class: ScatterFeatureRequirement
## inputs:
##   cwl_rfiles (File[]):  
## outputs:
## cwl_couts:
##   type: File[]
##   outputSource: Compile/output
## steps:
## Compile:
##   run: Compile.cwl
##   in:
##     rfile: cwl_rfiles
##   out:
##   - output
##   scatter: rfile

Multiple R scripts can be assigned to the workflow inputs and executed.

cwl2$cwl_rfiles <- c(r4b$output, r4b$output)
r8 <- runCWL(cwl2, outdir = tempdir())
## INFO Final process status is success
r8$output
## [1] "/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/RtmptDExzb/249fd9089dc572177a13e6917496877be4594046"  
## [2] "/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/RtmptDExzb/249fd9089dc572177a13e6917496877be4594046_2"
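
Because the scattered output is a character vector of file paths, the content of each file can be collected with lapply. The sketch below uses two temporary files as stand-ins for r8$output so that it runs without Rcwl; the file contents are invented for illustration only and are not results from the pipeline above.

```r
# Stand-ins for the paths returned in r8$output (hypothetical files
# created here so the example is self-contained).
paths <- c(tempfile(), tempfile())
writeLines("[1] 1 2 3", paths[1])
writeLines("[1] 4 5 6", paths[2])

# Read each output file; the result is a list with one element per job.
contents <- lapply(paths, readLines)
contents
```

With a real run, replacing paths with r8$output should give one list element per scattered job.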

4.2 Pipeline plot

The function plotCWL can be used to visualize the relationships among the inputs, outputs, and analysis steps of a tool or pipeline.

plotCWL(cwl)