Chapter 3 Wrap command line tools
3.1 Input Parameters
3.1.1 Essential Input parameters
For the input parameters, three options need to be defined usually, id, type, and prefix. The type can be string, int, long, float, double, and so on. More detail can be found at: https://www.commonwl.org/v1.0/CommandLineTool.html#CWLType.
Here is an example from CWL user
guide. Here we defined
an echo
with different type of input parameters by InputParam
. The
stdout
option can be used to capture the standard output stream to a
file.
e1 <- InputParam(id = "flag", type = "boolean", prefix = "-f")
e2 <- InputParam(id = "string", type = "string", prefix = "-s")
e3 <- InputParam(id = "int", type = "int", prefix = "-i")
e4 <- InputParam(id = "file", type = "File", prefix = "--file=", separate = FALSE)
echoA <- cwlProcess(baseCommand = "echo",
inputs = InputParamList(e1, e2, e3, e4),
stdout = "output.txt")
Then we can assign values for the input parameters.
echoA$flag <- TRUE
echoA$string <- "Hello"
echoA$int <- 1
tmpfile <- tempfile()
write("World", tmpfile)
echoA$file <- tmpfile
## [1;30mINFO[0m Final process status is success
## [1] "\033[1;30mINFO\033[0m [job echoA.cwl] /private/tmp/docker_tmpxhmbf59f$ echo \\"
## [2] " --file=/private/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/tmp94ebi5hg/stge8b3fed3-6918-40e6-8f75-0f443460f4d0/file82097fdeaaa5 \\"
## [3] " -f \\"
## [4] " -i \\"
## [5] " 1 \\"
## [6] " -s \\"
## [7] " Hello > /private/tmp/docker_tmpxhmbf59f/output.txt"
The command shows the parameters work as we defined. The parameters
are in alphabetical orders by default, but can be modified by the
position
argument in InputParam
function.
3.1.2 Array Inputs
A similar example to CWL user guide. We can define three different type of array as inputs.
a1 <- InputParam(id = "A", type = "string[]", prefix = "-A")
a2 <- InputParam(id = "B",
type = InputArrayParam(items = "string",
prefix="-B=", separate = FALSE))
a3 <- InputParam(id = "C", type = "string[]", prefix = "-C=",
itemSeparator = ",", separate = FALSE)
echoB <- cwlProcess(baseCommand = "echo",
inputs = InputParamList(a1, a2, a3))
Then we can assign values for the three input parameters:
## class: cwlProcess
## cwlClass: CommandLineTool
## cwlVersion: v1.0
## baseCommand: echo
## inputs:
## A (string[]): -A a b c
## B:
## type: array
## prefix: -B= d e f
## C (string[]): -C= g h i
## outputs:
## output:
## type: stdout
Now we can check whether the command behaves as we expected.
## [1;30mINFO[0m Final process status is success
## [1] "\033[1;30mINFO\033[0m [job echoB.cwl] /private/tmp/docker_tmpn93lkfs0$ echo \\"
## [2] " -A \\"
## [3] " a \\"
## [4] " b \\"
## [5] " c \\"
## [6] " -B=d \\"
## [7] " -B=e \\"
## [8] " -B=f \\"
## [9] " -C=g,h,i > /private/tmp/docker_tmpn93lkfs0/d24089c4b17d2ef479f4cdeb037297507021ddb1"
3.2 Output Parameters
3.2.1 Capturing Output
Similar to the input parameters, the output is a list of output parameters. Three options id, type and glob can be defined. The glob option is used to define a pattern to find files relative to the output directory.
Here is an example to unzip a compressed gz
file. First, we generate
a compressed R script file.
zzfil <- file.path(tempdir(), "sample.R.gz")
zz <- gzfile(zzfil, "w")
cat("sample(1:10, 5)", file = zz, sep = "\n")
close(zz)
Then we build a tool called gz
(a cwlProcess
object) to uncompress
a input file using ‘gzip’.
ofile <- "sample.R"
z1 <- InputParam(id = "uncomp", type = "boolean", prefix = "-d")
z2 <- InputParam(id = "out", type = "boolean", prefix = "-c")
z3 <- InputParam(id = "zfile", type = "File")
o1 <- OutputParam(id = "rfile", type = "File", glob = ofile)
gz <- cwlProcess(baseCommand = "gzip",
inputs = InputParamList(z1, z2, z3),
outputs = OutputParamList(o1),
stdout = ofile)
Now the gz
is ready to uncompress the previous generated compressed
file.
## [1;30mINFO[0m Final process status is success
## [1] "/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/RtmptDExzb/sample.R"
We can use arguments
argument to modify some default parameters.
z1 <- InputParam(id = "zfile", type = "File")
o1 <- OutputParam(id = "rfile", type = "File", glob = ofile)
Gz <- cwlProcess(baseCommand = "gzip",
arguments = list("-d", "-c"),
inputs = InputParamList(z1),
outputs = OutputParamList(o1),
stdout = ofile)
Gz
## class: cwlProcess
## cwlClass: CommandLineTool
## cwlVersion: v1.0
## baseCommand: gzip
## arguments: -d -c
## inputs:
## zfile (File):
## outputs:
## rfile:
## type: File
## outputBinding:
## glob: sample.R
## stdout: sample.R
## [1;30mINFO[0m Final process status is success
To make it for general usage, we can define a pattern with javascript
to glob the output, which requires node
from ‘nodejs’ to be installed
in your system PATH.
Or we can use the CWL built-in file property, nameroot
, directly.
pfile <- "$(inputs.zfile.nameroot)"
o2 <- OutputParam(id = "rfile", type = "File", glob = pfile)
req1 <- requireJS()
GZ <- cwlProcess(baseCommand = "gzip",
arguments = list("-d", "-c"),
requirements = list(), ## assign list(req1) if node installed.
inputs = InputParamList(z1),
outputs = OutputParamList(o2),
stdout = pfile)
GZ$zfile <- zzfil
r4b <- runCWL(GZ, outdir = tempdir())
## [1;30mINFO[0m Final process status is success
3.2.2 Array Outputs
We can also capture multiple output files with glob
pattern.
a <- InputParam(id = "a", type = InputArrayParam(items = "string"))
b <- OutputParam(id = "b", type = OutputArrayParam(items = "File"),
glob = "*.txt")
touch <- cwlProcess(baseCommand = "touch", inputs = InputParamList(a),
outputs = OutputParamList(b))
touch$a <- c("a.txt", "b.log", "c.txt")
r5 <- runCWL(touch, outdir = tempdir())
## [1;30mINFO[0m Final process status is success
## [1] "/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/RtmptDExzb/a.txt"
## [2] "/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/RtmptDExzb/c.txt"
The ‘touch’ command generates three files, but the output only collects
two files with ‘.txt’ suffix as defined in the OutputParam
using the
‘glob’ option.
3.2.3 Standard output
Usually, the stdout
option is a string or an expression of output
file name from the command line tool. The command’s standard output
stream will be captured into a file written to the designated output
directory. When the stdout
field is defined, an output parameter
with the type of “stdout” should be also assigned with no
“outputBinding” set.
An example for command tool “cat” is defined with stdout
field in
the output, with the name passed from the input parameter “p2”:
## define Cat
p1 <- InputParam(id = "infiles", type = "File[]")
p2 <- InputParam(id = "outfile", type = "string",
default = "catout.txt", position = -1)
Cat <- cwlProcess(baseCommand = "cat",
inputs = InputParamList(p1, p2),
stdout = "$(inputs.outfile)")
## assign values to inputs
afile <- file.path(tempdir(), "a.txt")
bfile <- file.path(tempdir(), "b.txt")
write("a", afile)
write("b", bfile)
Cat$infiles <- list(afile, bfile)
## [1;30mINFO[0m Final process status is success
## [1] "\033[1;30mINFO\033[0m [job Cat.cwl] /private/tmp/docker_tmpt0o68s_7$ cat \\"
## [2] " /private/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/tmpzenr6hg0/stg3e2af0d4-9366-4269-8873-52956719a84f/a.txt \\"
## [3] " /private/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/tmpzenr6hg0/stg14e37ece-450b-4796-b431-1fb34a2eed1b/b.txt > /private/tmp/docker_tmpt0o68s_7/catout.txt"
In this example, we used the parameter “p2” to pass the name to the standard output.
In theInputParam
of “p2”, the position is assigned to a negative
value (-1), which means the parameters will not be used in the command
and only uses for passing variable. To write the “Cat” tool to a CWL
file, the “inputBinding” field will be skipped for this parameter.