Halvade on Docker

Setup

Halvade on Docker is intended to run on a single node; for multi-node usage, use a local YARN cluster. The Docker image already includes the required binaries, but a reference folder must be provided. Please look here for an overview of the required reference files. The Halvade Docker run script can be downloaded here:

wget https://bitbucket.org/dries_decap/halvade-docker/downloads/halvade-docker.sh

This script will automatically download the Docker image if it is not available on your machine yet.

Run

To run Halvade with the Docker image, provide the pipeline type (somatic or germline), the reference folder, and the inputs as arguments.

# SOMATIC
# FASTQ input, a folder with paired FASTQ files per read group
./halvade-docker.sh somatic /halvade/ref/ /halvade/input/tumor/ /halvade/input/normal/
# BAM input, already aligned reads with read groups added
./halvade-docker.sh somatic /halvade/ref/ /halvade/input/tumor.bam /halvade/input/normal.bam

# GERMLINE
# FASTQ input, a folder with paired FASTQ files per read group
./halvade-docker.sh germline /halvade/ref/ /halvade/input/germline
# BAM input, already aligned reads with read groups added
./halvade-docker.sh germline /halvade/ref/ /halvade/input/germline.bam

Running the script creates a Docker working directory in the current directory, which will contain the output files. After the job has finished, the script shows where to find the output.

Input

There are several valid inputs that Halvade accepts:

a directory with paired fastq|fq(.gz)? files per read group, or unaligned BAM files per read group. Paired files must end in _1.fastq(.gz)? or _1.fq(.gz)? for the first file and in _2.fastq(.gz)? or _2.fq(.gz)? for the second.
a directory that has already been preprocessed, containing one folder per read group with fastq|fq(.gz)? files in those folders
a single aligned BAM file containing all read groups of a sample, with read group information
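
The suffix convention for paired FASTQ files can be checked before starting a job. The following is a minimal sketch, not part of Halvade: check_fastq_pairs is a hypothetical helper that only looks at file names in a single directory and reports every first file that has no matching second file.

```shell
# Hypothetical helper, not part of Halvade: verify that every *_1 file in a
# directory has a matching *_2 file with the same extension.
# The body runs in a subshell, so its variables do not leak to the caller.
check_fastq_pairs() (
    dir="$1"
    status=0
    for first in "$dir"/*_1.fastq "$dir"/*_1.fastq.gz "$dir"/*_1.fq "$dir"/*_1.fq.gz; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$first" ] || continue
        # Replace the trailing "_1.<ext>" with "_2.<ext>".
        second=$(printf '%s\n' "$first" | sed -E 's/_1\.(fastq|fq)(\.gz)?$/_2.\1\2/')
        if [ ! -e "$second" ]; then
            echo "missing mate for $first" >&2
            status=1
        fi
    done
    exit "$status"
)
```

Calling check_fastq_pairs /halvade/input/germline returns a non-zero status if any pair is incomplete. The check does not inspect file contents or read group information.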

Other Options

The script supports these options:

--exome: run the exome pipeline
--tmpdir <string>: set the folder for temporary files
--germlineSM <string>: germline sample name
--tumorSM <string>: tumor sample name
--normalSM <string>: normal sample name
--partitions <int>: override default number of partitions

To override automatically detected resources:

--memory <int>: sets the available memory in GB
--cpus <int>: sets the number of available CPUs
--executor_memory <int>: sets the memory in MB per executor
--executor_cpus <int>: sets the number of CPUs per executor
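
Before overriding these values, it can help to see what the host reports. The sketch below is an assumption-laden illustration and not necessarily how the script detects resources: it is Linux-only, reads the CPU count from nproc and the total memory from /proc/meminfo, and prints them in the units that --cpus and --memory expect. kb_to_gb is a hypothetical helper.

```shell
# Hypothetical helper: convert a MemTotal value in kB to whole GB, rounding down.
kb_to_gb() {
    echo $(( $1 / 1024 / 1024 ))
}

# Linux-only: read the CPU count and the total memory of the host.
cpus=$(nproc)
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)

# Print the values in the units that --cpus (count) and --memory (GB) expect.
echo "--cpus $cpus --memory $(kb_to_gb "$mem_kb")"
```

Note the unit difference between the options: --memory takes GB, while --executor_memory takes MB.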

Additional Halvade options can be set with the variable HALVADE_OPTS:

HALVADE_OPTS="--variant_caller both"
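
As a sketch of how this might be combined with a run, assuming that halvade-docker.sh reads HALVADE_OPTS from its environment (this is not confirmed here), the variable can be exported before calling the script. The paths are the example paths from the Run section.

```shell
# Assumption: halvade-docker.sh reads HALVADE_OPTS from its environment.
export HALVADE_OPTS="--variant_caller both"

# Only launch when the run script is present and executable in this directory.
if [ -x ./halvade-docker.sh ]; then
    ./halvade-docker.sh germline /halvade/ref/ /halvade/input/germline.bam
else
    echo "halvade-docker.sh not found or not executable (try: chmod +x halvade-docker.sh)" >&2
fi
```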

An overview of additional Halvade options can be found here.