Chapter 4 Writing Pipeline
We can connect multiple tools together into a pipeline. The example
below uncompresses an R script and executes it with Rscript.
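To follow the example, a compressed R script is needed. A minimal sketch of preparing one with base R is given here. The file name sample.R.gz matches the path that appears in later output, but the script content, sample(1:10, 5), is an assumption chosen to resemble the results printed in this chapter.

```r
# Write a one-line R script into a gzip-compressed file in the temp directory.
# The script content is an assumption, not taken from the chapter.
zzfil <- file.path(tempdir(), "sample.R.gz")
zz <- gzfile(zzfil, "w")
writeLines("sample(1:10, 5)", zz)
close(zz)

# Reading through gzfile() decompresses transparently, so the content can be checked.
con <- gzfile(zzfil, "r")
script_lines <- readLines(con)
close(con)
script_lines
```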
First we define a simple Rscript tool that does not use Docker.
d1 <- InputParam(id = "rfile", type = "File")
Rs <- cwlProcess(baseCommand = "Rscript",
                 inputs = InputParamList(d1))
Rs
## class: cwlProcess
## cwlClass: CommandLineTool
## cwlVersion: v1.0
## baseCommand: Rscript
## inputs:
## rfile (File):
## outputs:
## output:
## type: stdout
Test run:
## INFO Final process status is success
## [1] "[1] 10 7 2 5 8"
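The call behind this test run is not shown. A minimal sketch, assuming the Rcwl package and a working cwltool are available: an input value is assigned with $ and the tool is executed with runCWL. The script file test.R and its content are hypothetical.

```r
library(Rcwl)

# Re-create the Rscript tool defined above.
d1 <- InputParam(id = "rfile", type = "File")
Rs <- cwlProcess(baseCommand = "Rscript",
                 inputs = InputParamList(d1))

# A hypothetical one-line script used only for this test.
rscript <- file.path(tempdir(), "test.R")
writeLines("sample(1:10, 5)", rscript)

# Assign the input, run the tool with cwltool, and read the captured stdout.
Rs$rfile <- rscript
res <- runCWL(Rs, outdir = tempdir())
readLines(res$output)
```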
The pipeline has two steps: decompressing with the predefined
cwlProcess GZ, and executing the script with the cwlProcess Rs (the
“Compile” step). The input to the first step, “Uncomp”, is a
compressed file.
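GZ is defined elsewhere and is not repeated in this chapter. A sketch of how such a tool could be written follows. It is reconstructed from the fields printed for the “Uncomp” tool later in this chapter (gzip with -d -c, a File input zfile, and an output rfile captured from stdout), so the exact original definition is an assumption, as is the name o_gz.

```r
library(Rcwl)

# Input: the compressed file.
p1 <- InputParam(id = "zfile", type = "File")

# Output: the decompressed file, named after the input without its last extension.
o_gz <- OutputParam(id = "rfile", type = "File",
                    glob = "$(inputs.zfile.nameroot)")

# gzip -d -c writes the decompressed content to stdout,
# which CWL redirects to the file named by `stdout`.
GZ <- cwlProcess(baseCommand = "gzip",
                 arguments = list("-d", "-c"),
                 inputs = InputParamList(p1),
                 outputs = OutputParamList(o_gz),
                 stdout = "$(inputs.zfile.nameroot)")
GZ
```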
i1 <- InputParam(id = "cwl_zfile", type = "File")
s1 <- cwlStep(id = "Uncomp", run = GZ,
              In = list(zfile = "cwl_zfile"))
s2 <- cwlStep(id = "Compile", run = Rs,
              In = list(rfile = "Uncomp/rfile"))
In step 1 (‘s1’), the pipeline runs the cwlProcess GZ, where the
input zfile is defined in ‘i1’ with the id “cwl_zfile”. In step 2
(‘s2’), the pipeline runs the cwlProcess Rs, where the input rfile
comes from the output of step 1 (“Uncomp/rfile”), using the format
<step>/<output>.
The pipeline output is defined as the output of step 2
(“Compile/output”), again using the format <step>/<output>, as shown
below.
o1 <- OutputParam(id = "cwl_cout", type = "File", outputSource = "Compile/output")
The cwlWorkflow function initiates the pipeline by specifying its
inputs and outputs. We can then simply use + to connect all the steps
and build the final pipeline.
cwl <- cwlWorkflow(inputs = InputParamList(i1),
                   outputs = OutputParamList(o1))
cwl <- cwl + s1 + s2
cwl
## class: cwlWorkflow
## cwlClass: Workflow
## cwlVersion: v1.0
## inputs:
## cwl_zfile (File):
## outputs:
## cwl_cout:
## type: File
## outputSource: Compile/output
## steps:
## Uncomp:
## run: Uncomp.cwl
## in:
## zfile: cwl_zfile
## out:
## - rfile
## Compile:
## run: Compile.cwl
## in:
## rfile: Uncomp/rfile
## out:
## - output
Let’s run the pipeline.
## INFO Final process status is success
## [1] "[1] 7 4 6 8 2"
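The code that ran the pipeline is not shown. A self-contained sketch follows, assuming Rcwl and a working cwltool are available. The GZ definition is reconstructed from the printed fields, and the compressed input script is hypothetical; both are assumptions.

```r
library(Rcwl)

# Tools: GZ is a reconstruction (assumption); Rs is as defined in this chapter.
GZ <- cwlProcess(baseCommand = "gzip", arguments = list("-d", "-c"),
                 inputs = InputParamList(InputParam(id = "zfile", type = "File")),
                 outputs = OutputParamList(
                     OutputParam(id = "rfile", type = "File",
                                 glob = "$(inputs.zfile.nameroot)")),
                 stdout = "$(inputs.zfile.nameroot)")
Rs <- cwlProcess(baseCommand = "Rscript",
                 inputs = InputParamList(InputParam(id = "rfile", type = "File")))

# Workflow: the same two steps and output source as in the text.
i1 <- InputParam(id = "cwl_zfile", type = "File")
o1 <- OutputParam(id = "cwl_cout", type = "File", outputSource = "Compile/output")
s1 <- cwlStep(id = "Uncomp", run = GZ, In = list(zfile = "cwl_zfile"))
s2 <- cwlStep(id = "Compile", run = Rs, In = list(rfile = "Uncomp/rfile"))
cwl <- cwlWorkflow(inputs = InputParamList(i1),
                   outputs = OutputParamList(o1))
cwl <- cwl + s1 + s2

# Hypothetical compressed script used as the workflow input.
zzfil <- file.path(tempdir(), "sample.R.gz")
zz <- gzfile(zzfil, "w")
writeLines("sample(1:10, 5)", zz)
close(zz)

# Assign the workflow input, run it, and read the stdout of the last step.
cwl$cwl_zfile <- zzfil
res <- runCWL(cwl, outdir = tempdir())
readLines(res$output)
```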
Tip: Sometimes we need to adjust the arguments of certain tools in a
pipeline, in addition to their input parameters. The arguments
function can modify the arguments of a tool, of a tool in a pipeline,
or even of a tool in a sub-workflow. For example,
## class: cwlProcess
## cwlClass: CommandLineTool
## cwlVersion: v1.0
## baseCommand: gzip
## arguments: -d -c -f
## inputs:
## zfile (File): /private/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/RtmptDExzb/sample.R.gz
## outputs:
## rfile:
## type: File
## outputBinding:
## glob: $(inputs.zfile.nameroot)
## stdout: $(inputs.zfile.nameroot)
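The call that produced the listing above is not shown. A minimal sketch, assuming the replacement form of the arguments function: on a standalone tool it is applied directly, and for a tool inside a pipeline the step is named with the step argument (left as a comment here, since it needs the workflow cwl built earlier). The GZ definition is reconstructed from the printed fields and is an assumption.

```r
library(Rcwl)

# Reconstructed decompression tool (assumption), starting with -d -c only.
GZ <- cwlProcess(baseCommand = "gzip", arguments = list("-d", "-c"),
                 inputs = InputParamList(InputParam(id = "zfile", type = "File")),
                 outputs = OutputParamList(
                     OutputParam(id = "rfile", type = "File",
                                 glob = "$(inputs.zfile.nameroot)")),
                 stdout = "$(inputs.zfile.nameroot)")

# Replace the argument list of the standalone tool, adding -f (force).
arguments(GZ) <- list("-d", "-c", "-f")
arguments(GZ)

# For a tool inside a pipeline, name the step instead:
# arguments(cwl, step = "Uncomp") <- list("-d", "-c", "-f")
```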
4.1 Scattering pipeline
The scattering feature makes the associated workflow step or
subworkflow execute separately over a list of input elements. To use
this feature, ScatterFeatureRequirement must be specified in the
workflow requirements. Different scatter methods (dotproduct,
nested_crossproduct, or flat_crossproduct) can be used in the
associated step to decompose the input into a discrete set of jobs.
More details can be found at:
https://www.commonwl.org/v1.0/Workflow.html#WorkflowStep.
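To make the scatter methods concrete, here is a plain R illustration (not Rcwl or cwltool code) of how two scattered inputs would be decomposed into jobs. The input names a and b and their values are hypothetical. Under dotproduct the arrays are paired element by element; under the cross-product methods every combination becomes a job.

```r
# Two hypothetical scattered inputs of equal length.
a <- c("x1.R", "x2.R", "x3.R")
b <- c("p1", "p2", "p3")

# dotproduct: arrays must have equal length; job i takes a[i] and b[i].
dot_jobs <- Map(function(ai, bi) list(a = ai, b = bi), a, b)
length(dot_jobs)   # 3

# Cross product: one job per combination of elements.
# flat_crossproduct returns a flat list of results and nested_crossproduct
# returns results nested by input; the number of jobs is the same.
cross_jobs <- expand.grid(a = a, b = b, stringsAsFactors = FALSE)
nrow(cross_jobs)   # 9
```

With a single scattered input, as in the example that follows, one job is created per array element.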
The following example executes multiple R scripts. First, we set the
input and output types to arrays of “File” and add the requirement.
In the “Compile” step, the input to scatter over must be named with
the scatter option.
i2 <- InputParam(id = "cwl_rfiles", type = "File[]")
o2 <- OutputParam(id = "cwl_couts", type = "File[]", outputSource = "Compile/output")
req1 <- requireScatter()
cwl2 <- cwlWorkflow(requirements = list(req1),
                    inputs = InputParamList(i2),
                    outputs = OutputParamList(o2))
s1 <- cwlStep(id = "Compile", run = Rs,
              In = list(rfile = "cwl_rfiles"),
              scatter = "rfile")
cwl2 <- cwl2 + s1
cwl2
## class: cwlWorkflow
## cwlClass: Workflow
## cwlVersion: v1.0
## requirements:
## - class: ScatterFeatureRequirement
## inputs:
## cwl_rfiles (File[]):
## outputs:
## cwl_couts:
## type: File[]
## outputSource: Compile/output
## steps:
## Compile:
## run: Compile.cwl
## in:
## rfile: cwl_rfiles
## out:
## - output
## scatter: rfile
Multiple R scripts can be assigned to the workflow inputs and executed.
## INFO Final process status is success
## [1] "/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/RtmptDExzb/249fd9089dc572177a13e6917496877be4594046"
## [2] "/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/RtmptDExzb/249fd9089dc572177a13e6917496877be4594046_2"
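The assignment and run behind this output are not shown. A self-contained sketch follows, assuming Rcwl and a working cwltool are available. The two script files a.R and b.R and their content are hypothetical.

```r
library(Rcwl)

# The Rscript tool and the scattering workflow, as defined in this section.
Rs <- cwlProcess(baseCommand = "Rscript",
                 inputs = InputParamList(InputParam(id = "rfile", type = "File")))
i2 <- InputParam(id = "cwl_rfiles", type = "File[]")
o2 <- OutputParam(id = "cwl_couts", type = "File[]",
                  outputSource = "Compile/output")
cwl2 <- cwlWorkflow(requirements = list(requireScatter()),
                    inputs = InputParamList(i2),
                    outputs = OutputParamList(o2))
cwl2 <- cwl2 + cwlStep(id = "Compile", run = Rs,
                       In = list(rfile = "cwl_rfiles"),
                       scatter = "rfile")

# Two hypothetical scripts to scatter over.
rfiles <- file.path(tempdir(), c("a.R", "b.R"))
for (f in rfiles) writeLines("sample(1:10, 5)", f)

# Assign a vector of paths to the File[] input and run; one job per script.
cwl2$cwl_rfiles <- rfiles
res2 <- runCWL(cwl2, outdir = tempdir())
res2$output
```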