Chapter 1 Introduction
The bioinformatics community increasingly relies on ‘workflow’ frameworks to manage large and complex biomedical data (Köster and Rahmann, 2012; Di Tommaso et al., 2017). One solution facilitating portable, reproducible, and scalable workflows across a variety of software and hardware environments is the Common Workflow Language (CWL) (Amstutz et al., 2016).
“The Common Workflow Language (CWL) is a specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments.”
CWL has been widely adopted by large biomedical projects such as The Cancer Genome Atlas (TCGA) (Weinstein et al., 2013) and Galaxy (Afgan et al., 2018). However, because CWL is a domain-specific language, writing CWL tools and workflows requires a level of expertise that is often beyond the reach of wet-lab researchers and even skilled data scientists. In addition, the impact of CWL pipelines is weakened by poor integration with downstream statistical analysis tools such as R and Bioconductor (Huber et al., 2015; Amezquita et al., 2020).
In this book, we introduce a Bioconductor toolchain for the use and development of reproducible bioinformatics pipelines in CWL, built on two packages: Rcwl and RcwlPipelines. Rcwl provides a familiar R interface to, and expands the scope of, CWL. It enables best practices and standardized data flow between different tools, and promotes modularization for easy sharing of established pipelines or critical steps. RcwlPipelines is a collection of commonly used bioinformatics tools and pipeline recipes based on Rcwl. It provides a community-driven platform for open source, open development, and open review of best-practice CWL bioinformatics pipelines.
Together, Rcwl and RcwlPipelines reduce the learning curve required to apply findable, accessible, interoperable, and reusable (FAIR) principles to the analysis of multi-omics biological experiments, and promote community-wide sharing of cloud-ready bioinformatics workflows.