# reference_genome

Downloads the reference genome (if it does not already exist as a local file), renames it to *.fa, and indexes it with samtools and bwa.
Recipe source code: https://github.com/rworkflow/ReUseDataRecipe/blob/master/reference_genome.R
Data source: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/; http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/; http://ftp.ensembl.org/pub/release-104/fasta/mus_musculus/dna/
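
The download, rename, and index steps described above can be sketched in R as follows. This is only an illustration under stated assumptions, not the recipe's actual script (which lives at the recipe source code link). The helper names `fa_name` and `index_commands` are hypothetical, the extension handling is a guess at what "rename as *.fa" covers, and the commands are built as strings rather than executed, so `samtools` and `bwa` do not need to be installed to try it.

```r
# Hypothetical helper: derive the local *.fa file name from a path or URL.
# Assumes the input ends in .fa, .fasta, or .fna, optionally gzipped.
fa_name <- function(fasta) {
  fn <- basename(fasta)                     # drop directories / URL prefix
  fn <- sub("\\.gz$", "", fn)               # drop a trailing .gz, if any
  fn <- sub("\\.(fa|fasta|fna)$", "", fn)   # drop the FASTA extension
  paste0(fn, ".fa")                         # normalise to *.fa
}

# Hypothetical helper: the indexing commands one would run on the renamed file.
index_commands <- function(fa) {
  c(paste("samtools faidx", shQuote(fa)),
    paste("bwa index", shQuote(fa)))
}

fa <- fa_name("http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.MT.fa.gz")
print(fa)
print(index_commands(fa))
```

Building the commands as strings keeps the sketch side-effect free; to actually run them, each string could be passed to `system()` once the FASTA file is in place.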

# Inputs

| | label | type | description |
|---|---|---|---|
| fasta | reference genome | string;File | Can be a file path (if locally available) or a URL as indicated in 'Data source' |

# Outputs

| | label | type | description |
|---|---|---|---|
| fa | indexed reference genome | File | *.fa, *.fai files, and some secondary files |
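
The source does not itemize the secondary files. As a rough guide, `samtools faidx` writes a `.fai` file and `bwa index` writes `.amb`, `.ann`, `.bwt`, `.pac`, and `.sa` files next to the FASTA. The sketch below, with the hypothetical helpers `expected_index_files` and `missing_index_files`, checks for those files after the recipe has run. The recipe may produce other secondary files that this list does not cover, and the example file path is an assumption based on the input file name and `outdir` used in the example.

```r
# Hypothetical helper: index files that samtools faidx and bwa index
# normally write next to a FASTA file.
expected_index_files <- function(fa) {
  c(paste0(fa, ".fai"),                                    # samtools faidx
    paste0(fa, c(".amb", ".ann", ".bwt", ".pac", ".sa")))  # bwa index
}

# Hypothetical helper: return the expected index files that are not on disk.
missing_index_files <- function(fa) {
  f <- expected_index_files(fa)
  f[!file.exists(f)]
}

# Assumed output location; adjust to the outdir passed to getData().
missing_index_files("data/folder/Homo_sapiens.GRCh38.dna.chromosome.MT.fa")
```

An empty result means every expected index file was found.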

# Example:

## Get data from evaluating recipe
library(ReUseData)
reference_genome <- recipeLoad('reference_genome', return = TRUE)
reference_genome$fasta = 'http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.MT.fa.gz'
getData(reference_genome, outdir = 'data/folder', notes = c('homo sapiens', 'grch38', 'ensembl'), conda = TRUE, docker = FALSE)

## Get data from Google bucket directly
dataUpdate('data/folder', cloud=TRUE)
dh <- dataSearch(c('homo sapiens', 'grch38', '1000 genomes'))
getCloudData(dh, outdir = 'data/folder')