Bioinformatics tools and pipelines using R and CWL
2021-02-27
Preface
This book introduces two R/Bioconductor packages, Rcwl and RcwlPipelines, which make it easier to build, manage, and run bioinformatics tools and pipelines within R.
The Rcwl package is built on top of the Common Workflow Language (CWL), and provides a simple and user-friendly way to wrap command line tools into data analysis pipelines in R. The RcwlPipelines package manages a collection of bioinformatics tools and pipelines based on Rcwl.
0.1 R package installation
The Rcwl and RcwlPipelines packages can be installed from Bioconductor or GitHub:
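# install BiocManager first if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("Rcwl", "RcwlPipelines"))
# or the development version from GitHub
BiocManager::install(c("rworkflow/Rcwl", "rworkflow/RcwlPipelines"))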
To load the packages into an R session:
library(Rcwl)
library(RcwlPipelines)
0.2 System requirements
In addition to the R packages, the following tools are required to run the tools and pipelines. If they are not available locally, they will be installed automatically through the basilisk package.
- python (>= 2.7)
- cwltool (>= 1.0.2018)
- nodejs
The cwltool runner is the reference implementation of the Common Workflow Language and is used to run the CWL scripts. Node.js (nodejs) is required when the CWL scripts use JavaScript. More details about these tools can be found here:
- https://github.com/common-workflow-language/cwltool
- https://nodejs.org
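Before relying on the automatic installation, it can be useful to see which of the tools listed above are already visible to R. The following is a minimal sketch using only base R. The helper name `check_tools` and the executable names `python`, `cwltool`, and `node` are assumptions; on some systems the binaries are called `python3` or `nodejs` instead.

```r
# Hypothetical helper (not part of Rcwl): report which command line
# tools can be found on the PATH of the current R session.
check_tools <- function(tools = c("python", "cwltool", "node")) {
    paths <- Sys.which(tools)  # returns "" for tools that are not found
    data.frame(tool = tools,
               found = unname(nzchar(paths)),
               path = unname(paths),
               stringsAsFactors = FALSE)
}

check_tools()
```

A tool reported as not found is not necessarily a problem, since the text above states that missing tools are installed automatically.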
0.3 Docker
Docker containers simplify software installation and management, especially for bioinformatics tools and pipelines that require different runtime environments and library dependencies. A CWL runner can handle this automatically by pulling the Docker containers and mounting the paths of the input files.
The Docker requirement is optional, as CWL scripts can also be run locally with all the dependencies pre-installed.
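To make the mounting step concrete, the sketch below assembles a `docker run` command that bind-mounts the directory of each input file read-only and rewrites the input paths to their locations inside the container. It only illustrates the idea and is not how cwltool builds its own command line. The function name `docker_cmd`, the image name, the tool name, and the `/data` mount prefix are all hypothetical, and the input paths are assumed to be absolute.

```r
# Hypothetical illustration: build (but do not run) a `docker run` command
# that mounts each input file's directory into the container read-only.
docker_cmd <- function(image, tool, inputs) {
    dirs <- unique(dirname(inputs))
    # Map each host directory to /data/1, /data/2, ... inside the container
    targets <- paste0("/data/", seq_along(dirs))
    mounts <- paste0("--volume=", dirs, ":", targets, ":ro")
    # Rewrite the input paths so they point at the mounted locations
    inside <- file.path(targets[match(dirname(inputs), dirs)],
                        basename(inputs))
    c("docker", "run", "--rm", mounts, image, tool, inside)
}

cmd <- docker_cmd("example/aligner:latest", "align",
                  c("/tmp/run1/a.fastq", "/tmp/run1/b.fastq"))
cat(paste(cmd, collapse = " "), "\n")
```

A real CWL runner also has to handle output directories, temporary directories, and environment variables, which this sketch leaves out.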
0.4 Structure of the book
- Introduction
- Get started
- Wrap command line tools
- Writing Pipeline
- Tool/pipeline execution
- RcwlPipelines
- DNAseq alignment
- DNAseq variant calling
- Bulk RNAseq
- Single cell RNAseq
- miRNA
0.5 R session information
The R session information for compiling this manual is shown below:
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS/LAPACK: /Users/qi31566/miniconda3/envs/r-base/lib/libopenblasp-r0.3.12.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] bookdown_0.21 DropletUtils_1.10.3
## [3] SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
## [5] Biobase_2.50.0 GenomicRanges_1.42.0
## [7] GenomeInfoDb_1.26.2 IRanges_2.24.1
## [9] MatrixGenerics_1.2.1 matrixStats_0.58.0
## [11] BiocStyle_2.18.1 BiocParallel_1.24.1
## [13] RcwlPipelines_1.7.7 BiocFileCache_1.14.0
## [15] dbplyr_2.1.0 Rcwl_1.7.12
## [17] S4Vectors_0.28.1 BiocGenerics_0.36.0
## [19] yaml_2.2.1
##
## loaded via a namespace (and not attached):
## [1] ellipsis_0.3.1 rprojroot_2.0.2
## [3] scuttle_1.0.4 XVector_0.30.0
## [5] fs_1.5.0 rstudioapi_0.13
## [7] remotes_2.2.0 bit64_4.0.5
## [9] fansi_0.4.2 sparseMatrixStats_1.2.1
## [11] codetools_0.2-18 R.methodsS3_1.8.1
## [13] cachem_1.0.4 knitr_1.31
## [15] pkgload_1.2.0 jsonlite_1.7.2
## [17] R.oo_1.24.0 HDF5Array_1.18.1
## [19] shiny_1.6.0 DiagrammeR_1.0.6.1
## [21] BiocManager_1.30.10 compiler_4.0.3
## [23] httr_1.4.2 dqrng_0.2.1
## [25] basilisk_1.2.1 backports_1.2.1
## [27] assertthat_0.2.1 Matrix_1.3-2
## [29] fastmap_1.1.0 limma_3.46.0
## [31] cli_2.3.1 later_1.1.0.1
## [33] visNetwork_2.0.9 htmltools_0.5.1.1
## [35] prettyunits_1.1.1 tools_4.0.3
## [37] igraph_1.2.6 glue_1.4.2
## [39] GenomeInfoDbData_1.2.4 dplyr_1.0.4
## [41] batchtools_0.9.15 rappdirs_0.3.3
## [43] tinytex_0.29 Rcpp_1.0.6
## [45] jquerylib_0.1.3 rhdf5filters_1.2.0
## [47] vctrs_0.3.6 DelayedMatrixStats_1.12.3
## [49] xfun_0.21 stringr_1.4.0
## [51] ps_1.5.0 beachmat_2.6.4
## [53] testthat_3.0.2 mime_0.10
## [55] lifecycle_1.0.0 devtools_2.3.2
## [57] edgeR_3.32.1 zlibbioc_1.36.0
## [59] basilisk.utils_1.2.2 hms_1.0.0
## [61] promises_1.2.0.1 rhdf5_2.34.0
## [63] RColorBrewer_1.1-2 curl_4.3
## [65] memoise_2.0.0 reticulate_1.18
## [67] sass_0.3.1 stringi_1.5.3
## [69] RSQLite_2.2.3 desc_1.2.0
## [71] checkmate_2.0.0 filelock_1.0.2
## [73] pkgbuild_1.2.0 rlang_0.4.10
## [75] pkgconfig_2.0.3 bitops_1.0-6
## [77] evaluate_0.14 lattice_0.20-41
## [79] Rhdf5lib_1.12.1 purrr_0.3.4
## [81] htmlwidgets_1.5.3 bit_4.0.4
## [83] processx_3.4.5 tidyselect_1.1.0
## [85] magrittr_2.0.1 R6_2.5.0
## [87] generics_0.1.0 base64url_1.4
## [89] DelayedArray_0.16.1 DBI_1.1.1
## [91] pillar_1.5.0 withr_2.4.1
## [93] RCurl_1.98-1.2 tibble_3.0.6
## [95] crayon_1.4.1 utf8_1.1.4
## [97] rmarkdown_2.7 progress_1.2.2
## [99] usethis_2.0.1 locfit_1.5-9.4
## [101] grid_4.0.3 data.table_1.14.0
## [103] blob_1.2.1 callr_3.5.1
## [105] git2r_0.28.0 digest_0.6.27
## [107] xtable_1.8-4 tidyr_1.1.2
## [109] httpuv_1.5.5 brew_1.0-6
## [111] R.utils_2.10.1 bslib_0.2.4
## [113] sessioninfo_1.1.1