Welcome to Halvade’s documentation!

Halvade implements germline and somatic variant calling pipelines with the Apache Spark framework, based on the best-practices pipelines from the Broad Institute. Halvade produces a VCF output file that contains the single nucleotide polymorphisms (SNPs). Short insertions and deletions (indels) are also included when the tools used support them. Halvade requires Hadoop YARN and Apache Spark to run. Because both are typically installed on Linux clusters, this documentation only covers a Linux setup. This guide uses GATK 4.1.2.0.

New users with access to a local Spark cluster are advised to start here. If you do not have access to an existing local Spark cluster, you can run Halvade either on Docker or in the cloud with Amazon EMR or Google Cloud.

Note

Halvade is available under the GNU license and relies on open source tools, which must be added to every node in the cluster in a specified directory.