Chapter 3 Wrap command line tools

3.1 Input Parameters

3.1.1 Essential Input parameters

An input parameter usually needs three options: id, type, and prefix. The type can be string, int, long, float, double, and so on. More detail can be found at: https://www.commonwl.org/v1.0/CommandLineTool.html#CWLType.

Here is an example from the CWL user guide. We define an echo tool whose input parameters, each created with InputParam, have different types. The stdout option can be used to capture the standard output stream to a file.

e1 <- InputParam(id = "flag", type = "boolean", prefix = "-f")
e2 <- InputParam(id = "string", type = "string", prefix = "-s")
e3 <- InputParam(id = "int", type = "int", prefix = "-i")
e4 <- InputParam(id = "file", type = "File", prefix = "--file=", separate = FALSE)
echoA <- cwlProcess(baseCommand = "echo",
                  inputs = InputParamList(e1, e2, e3, e4),
                  stdout = "output.txt")

Then we can assign values for the input parameters.

echoA$flag <- TRUE
echoA$string <- "Hello"
echoA$int <- 1

tmpfile <- tempfile()
write("World", tmpfile)
echoA$file <- tmpfile
r2 <- runCWL(echoA, outdir = tempdir())
## INFO Final process status is success
r2$command
## [1] "\033[1;30mINFO\033[0m [job echoA.cwl] /private/tmp/docker_tmpxhmbf59f$ echo \\"                                                             
## [2] "    --file=/private/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/tmp94ebi5hg/stge8b3fed3-6918-40e6-8f75-0f443460f4d0/file82097fdeaaa5 \\"
## [3] "    -f \\"                                                                                                                                  
## [4] "    -i \\"                                                                                                                                  
## [5] "    1 \\"                                                                                                                                   
## [6] "    -s \\"                                                                                                                                  
## [7] "    Hello > /private/tmp/docker_tmpxhmbf59f/output.txt"

The command shows that the parameters work as defined. By default the parameters appear in alphabetical order of their id, but the order can be changed with the position argument of the InputParam function.
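The ordering rule can be imitated in plain R to predict where a parameter will land. The sketch below only illustrates the sorting described above (position first, then id); it is not the implementation used by Rcwl or cwltool. The `params` data frame and the choice of position 1 for `file` are assumptions made for the example.

```r
# Illustration only: mimic how command-line bindings are ordered.
# Every parameter is assumed to share position 0 unless `position`
# is set in InputParam().
params <- data.frame(
    id = c("flag", "string", "int", "file"),
    position = c(0L, 0L, 0L, 0L),
    stringsAsFactors = FALSE
)

# Default: ties on position are broken alphabetically by id.
default_order <- params$id[order(params$position, params$id)]

# Hypothetical change: give "file" position 1 so that it moves to the end,
# as InputParam(id = "file", ..., position = 1L) would request.
params$position[params$id == "file"] <- 1L
custom_order <- params$id[order(params$position, params$id)]

print(default_order)
print(custom_order)
```

The default order (file, flag, int, string) matches the command printed by r2$command above.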

3.1.2 Array Inputs

This example is similar to the one in the CWL user guide. We can define three different types of array as inputs.

a1 <- InputParam(id = "A", type = "string[]", prefix = "-A")
a2 <- InputParam(id = "B",
                 type = InputArrayParam(items = "string",
                                        prefix="-B=", separate = FALSE))
a3 <- InputParam(id = "C", type = "string[]", prefix = "-C=",
                 itemSeparator = ",", separate = FALSE)
echoB <- cwlProcess(baseCommand = "echo",
                 inputs = InputParamList(a1, a2, a3))

Then we can assign values for the three input parameters:

echoB$A <- letters[1:3]
echoB$B <- letters[4:6]
echoB$C <- letters[7:9]
echoB
## class: cwlProcess 
##  cwlClass: CommandLineTool 
##  cwlVersion: v1.0 
##  baseCommand: echo 
## inputs:
##   A (string[]): -A a b c
##   B:
##     type: array
##     prefix: -B= d e f
##   C (string[]): -C= g h i
## outputs:
## output:
##   type: stdout

Now we can check whether the command behaves as we expected.

r3 <- runCWL(echoB, outdir = tempdir())
## INFO Final process status is success
r3$command
## [1] "\033[1;30mINFO\033[0m [job echoB.cwl] /private/tmp/docker_tmpn93lkfs0$ echo \\"         
## [2] "    -A \\"                                                                              
## [3] "    a \\"                                                                               
## [4] "    b \\"                                                                               
## [5] "    c \\"                                                                               
## [6] "    -B=d \\"                                                                            
## [7] "    -B=e \\"                                                                            
## [8] "    -B=f \\"                                                                            
## [9] "    -C=g,h,i > /private/tmp/docker_tmpn93lkfs0/d24089c4b17d2ef479f4cdeb037297507021ddb1"
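The three array styles differ only in where the prefix and the separator are applied. As a rough illustration, the plain R sketch below rebuilds the same argument vector by hand; the `render_*` helper names are hypothetical and are not part of Rcwl.

```r
# Illustration only: how the three array bindings are expected to render.

# "string[]" with a prefix on the parameter: one prefix, then every item.
render_A <- function(x) c("-A", x)

# Prefix defined on the array items (InputArrayParam): repeated per item.
render_B <- function(x) paste0("-B=", x)

# itemSeparator = ",": items are joined into a single argument.
render_C <- function(x) paste0("-C=", paste(x, collapse = ","))

cmd_args <- c(render_A(letters[1:3]),
              render_B(letters[4:6]),
              render_C(letters[7:9]))
cat("echo", cmd_args, "\n")
```

The result has the same arguments, in the same order, as r3$command.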

3.2 Output Parameters

3.2.1 Capturing Output

As with the inputs, the outputs are given as a list of output parameters. Each one can define three options: id, type, and glob. The glob option defines a pattern used to find files relative to the output directory.

Here is an example to unzip a compressed gz file. First, we generate a compressed R script file.

zzfil <- file.path(tempdir(), "sample.R.gz")
zz <- gzfile(zzfil, "w")
cat("sample(1:10, 5)", file = zz, sep = "\n")
close(zz)

Then we build a tool called gz (a cwlProcess object) to uncompress an input file using ‘gzip’.

ofile <- "sample.R"
z1 <- InputParam(id = "uncomp", type = "boolean", prefix = "-d")
z2 <- InputParam(id = "out", type = "boolean", prefix = "-c")
z3 <- InputParam(id = "zfile", type = "File")
o1 <- OutputParam(id = "rfile", type = "File", glob = ofile)
gz <- cwlProcess(baseCommand = "gzip",
               inputs = InputParamList(z1, z2, z3),
               outputs = OutputParamList(o1),
               stdout = ofile)

Now gz is ready to uncompress the compressed file generated earlier.

gz$uncomp <- TRUE
gz$out <- TRUE
gz$zfile <- zzfil
r4 <- runCWL(gz, outdir = tempdir())
## INFO Final process status is success
r4$output
## [1] "/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/RtmptDExzb/sample.R"

We can also use the arguments option to pass fixed flags such as -d and -c directly, instead of defining them as input parameters.

z1 <- InputParam(id = "zfile", type = "File")
o1 <- OutputParam(id = "rfile", type = "File", glob = ofile)
Gz <- cwlProcess(baseCommand = "gzip",
               arguments = list("-d", "-c"),
               inputs = InputParamList(z1),
               outputs = OutputParamList(o1),
               stdout = ofile)
Gz
## class: cwlProcess 
##  cwlClass: CommandLineTool 
##  cwlVersion: v1.0 
##  baseCommand: gzip 
## arguments: -d -c 
## inputs:
##   zfile (File):  
## outputs:
## rfile:
##   type: File
##   outputBinding:
##     glob: sample.R
## stdout: sample.R
Gz$zfile <- zzfil
r4a <- runCWL(Gz, outdir = tempdir())
## INFO Final process status is success

To make the tool more general, we can define a JavaScript expression that derives the glob pattern from the input file name. This requires node from ‘nodejs’ to be available on your system PATH.

pfile <- "$(inputs.zfile.path.split('/').slice(-1)[0].split('.').slice(0,-1).join('.'))"

Or we can use the CWL built-in file property, nameroot, directly.

pfile <- "$(inputs.zfile.nameroot)"
o2 <- OutputParam(id = "rfile", type = "File", glob = pfile)
req1 <- requireJS()
GZ <- cwlProcess(baseCommand = "gzip",
               arguments = list("-d", "-c"),
               requirements = list(), ## assign list(req1) if node installed.
               inputs = InputParamList(z1),
               outputs = OutputParamList(o2),
               stdout = pfile)
GZ$zfile <- zzfil
r4b <- runCWL(GZ, outdir = tempdir())
## INFO Final process status is success
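To check what both pfile expressions evaluate to for a given input, the same computation can be reproduced in plain R. This is a minimal sketch for file names that contain an extension; the `nameroot` helper below is a hypothetical R function named after the CWL property, not something provided by Rcwl.

```r
# Illustration only: drop the directory and the last extension of a path,
# as both `pfile` expressions do for inputs.zfile.
nameroot <- function(path) {
    sub("\\.[^.]*$", "", basename(path))
}

nameroot(file.path(tempdir(), "sample.R.gz"))
```

For the sample.R.gz file created earlier, this gives sample.R, which is the name used for both stdout and glob.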

3.2.2 Array Outputs

We can also capture multiple output files with a glob pattern.

a <- InputParam(id = "a", type = InputArrayParam(items = "string"))
b <- OutputParam(id = "b", type = OutputArrayParam(items = "File"),
                 glob = "*.txt")
touch <- cwlProcess(baseCommand = "touch", inputs = InputParamList(a),
                    outputs = OutputParamList(b))
touch$a <- c("a.txt", "b.log", "c.txt")
r5 <- runCWL(touch, outdir = tempdir())
## INFO Final process status is success
r5$output
## [1] "/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/RtmptDExzb/a.txt"
## [2] "/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/RtmptDExzb/c.txt"

The ‘touch’ command generates three files, but the output collects only the two files with the ‘.txt’ suffix, as defined by the ‘glob’ option in OutputParam.
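The effect of the pattern can be previewed in R before running the tool. The sketch below uses glob2rx from the utils package to approximate the match; the CWL runner applies its own glob implementation, so treat this as a quick check rather than an exact equivalent.

```r
# Illustration only: preview which file names a simple glob would select.
files <- c("a.txt", "b.log", "c.txt")

# glob2rx() converts a wildcard pattern into a regular expression.
matched <- files[grepl(glob2rx("*.txt"), files)]
print(matched)
```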

3.2.3 Standard output

Usually, the stdout option is a string or an expression that gives the name of the output file. The command’s standard output stream is captured into that file, which is written to the designated output directory. When the stdout field is defined, an output parameter of type “stdout” should also be defined, with no “outputBinding” set.

In the following example, a tool for the “cat” command is defined with the stdout field, and the output file name is passed in through the input parameter “p2”:

## define Cat
p1 <- InputParam(id = "infiles", type = "File[]")
p2 <- InputParam(id = "outfile", type = "string",
                 default = "catout.txt", position = -1)
Cat <- cwlProcess(baseCommand = "cat",
                  inputs = InputParamList(p1, p2),
                  stdout = "$(inputs.outfile)")
## assign values to inputs
afile <- file.path(tempdir(), "a.txt")
bfile <- file.path(tempdir(), "b.txt")
write("a", afile)
write("b", bfile)
Cat$infiles <- list(afile, bfile)
## run the tool
r6 <- runCWL(Cat, outdir = tempdir())
## INFO Final process status is success
r6$command
## [1] "\033[1;30mINFO\033[0m [job Cat.cwl] /private/tmp/docker_tmpt0o68s_7$ cat \\"                                                                                        
## [2] "    /private/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/tmpzenr6hg0/stg3e2af0d4-9366-4269-8873-52956719a84f/a.txt \\"                                          
## [3] "    /private/var/folders/7t/9l4kkf_j2sqbpn321y9g5558z96ck_/T/tmpzenr6hg0/stg14e37ece-450b-4796-b431-1fb34a2eed1b/b.txt > /private/tmp/docker_tmpt0o68s_7/catout.txt"

In this example, we used the parameter “p2” to pass the name to the standard output.

In the InputParam for “p2”, the position is set to a negative value (-1), which means the parameter is not added to the command line and is used only to pass a value. When the “Cat” tool is written to a CWL file, the “inputBinding” field is skipped for this parameter.
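One way to verify this is to write the tool to disk and read the generated CWL file. The sketch below rebuilds the Cat tool and assumes that writeCWL accepts the prefix and outdir arguments and returns the paths of the files it wrote; check ?writeCWL for the version of Rcwl you have installed.

```r
library(Rcwl)

# Rebuild the Cat tool; "outfile" has a negative position, so it only
# supplies the name of the stdout file.
p1 <- InputParam(id = "infiles", type = "File[]")
p2 <- InputParam(id = "outfile", type = "string",
                 default = "catout.txt", position = -1)
Cat <- cwlProcess(baseCommand = "cat",
                  inputs = InputParamList(p1, p2),
                  stdout = "$(inputs.outfile)")

# Assign input files, as in the example above.
afile <- file.path(tempdir(), "a.txt")
bfile <- file.path(tempdir(), "b.txt")
write("a", afile)
write("b", bfile)
Cat$infiles <- list(afile, bfile)

# Assumption: writeCWL() takes `prefix` and `outdir` and returns the paths
# of the written files, with the .cwl file first.
paths <- writeCWL(Cat, prefix = "Cat", outdir = tempdir())

# Print the CWL file to inspect the entry for "outfile".
cat(readLines(paths[1]), sep = "\n")
```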