Chapter 1 Introduction

The bioinformatics community increasingly relies on ‘workflow’ frameworks to manage large and complex biomedical data (Köster and Rahmann, 2012; Di Tommaso et al., 2017). One solution facilitating portable, reproducible, and scalable workflows across a variety of software and hardware environments is the Common Workflow Language (CWL) (Amstutz et al., 2016).

“The Common Workflow Language (CWL) is a specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments.”

CWL has been widely adopted by large biomedical projects such as The Cancer Genome Atlas (TCGA) (Weinstein et al., 2013) and Galaxy (Afgan et al., 2018). However, because CWL is a domain-specific language, writing it requires a level of expertise that is often beyond wet-lab researchers and even skilled data scientists. In addition, the impact of CWL pipelines is weakened by poor integration with downstream statistical analysis tools such as R and Bioconductor (Huber et al., 2015; Amezquita et al., 2020).

In this book, we introduce a Bioconductor toolchain, consisting of Rcwl and RcwlPipelines, for using and developing reproducible bioinformatics pipelines in CWL. Rcwl provides a familiar R interface to, and expands the scope of, CWL.

Rcwl enables best practices and standardized data flow between different tools, and promotes modularization for easy sharing of established pipelines or critical steps. RcwlPipelines is a collection of commonly used bioinformatics tools and pipeline recipes based on Rcwl. RcwlPipelines provides a community-driven platform for open source, open development, and open review of best-practice CWL bioinformatics pipelines.

Together, Rcwl and RcwlPipelines reduce the learning curve required to apply findable, accessible, interoperable, and reusable (FAIR) principles to the analysis of multi-omics biological experiments, and promote community-wide sharing of cloud-ready bioinformatics workflows.