Halvade synopsis

Germline pipeline

The class used to run this tool is be.ugent.intec.halvade.job.GermlinePipeline.

Required options

--germline STR Input. This gives the absolute path of the input. The input can either be an aligned BAM file or a folder with preprocessed data.
-o, --output STR
 Output. The output VCF file; this file is gzipped automatically.
-s, --knownSites STR
 Known sites VCF. This gives the absolute path to the VCF file containing known sites, e.g. dbSNP.
-m, --memory DBL
 Memory. This is the memory available to tools that are run in an executor, in gigabytes.
-r, --reference STR
 Reference. This gives the absolute path of the reference FASTA file.
-b, --bindir STR
 Binaries folder. This provides the absolute path to the folder containing the necessary binaries: samtools, bwa, GATK.

Optional options

--keep_files Keep intermediate files. This will not remove the intermediate files. Use this for debugging purposes.
--just_align Only align. Runs only the alignment step of the pipeline and writes the resulting Spark RDD to the output folder. The folder contains one SAM file per partition, sorted and grouped per genomic region.
--filter_non_primary_chr
 Filter non-primary chr. Filters chromosomes to keep only the primary ones: (chr)1-22, X, Y and M(T).
--prepped_sam Use input from Halvade. This indicates that the input folder contains SAM files that were aligned with Halvade using the --just_align argument in a previous run.
--no_gvcf No GVCF. Get the normal VCF file instead of the GVCF file.
--samplename STR
 Sample name. This gives the sample name for the tool; it is used if the input is FASTQ. If the input is an aligned BAM, the information is extracted from the file by default.
--bwa_reproducable INT
 Fixed chunk size in BWA. This option lets BWA use a fixed chunk size independent of the number of threads. This leads to reproducible results and might use less memory.
--ref_dict Reference DICT. Provide a reference dictionary other than the default detected .dict file.
--java_serializer
 Java serializer. Use the Java serializer instead of the default Kryo serializer; this might improve performance on certain systems.
--tmp STR Temp directory. This is the directory where all intermediate and temporary files will be stored. The default is /tmp/.
--get_regions Only get the regions. This will run Halvade up to the sorting and splitting of genomic regions and give a list of genomic regions as output. Halvade is halted after this.
--use_elprep Use Elprep. With this option Halvade uses elprep for the mark duplicates and/or BQSR step. Use this when a lot of memory is available per executor to increase performance. To let Halvade also do the BQSR step with elprep, the dbsnp file must be the one processed by elprep and must have the elsites extension.
--log STR Log level. Gives the log level of Halvade. Possible values are ERROR, WARN, INFO and DEBUG; the default is ERROR.
--persist STR Persist level. Sets the storage level used to persist Spark RDDs. Possible options are mem, mem_ser, mem_disk, mem_disk_ser and disk; the default is disk.
--partitions INT
 Partitions. This sets the number of partitions to use during the pipeline.
--overwrite Overwrite. This allows the tool to automatically overwrite the output directory/files if they already exist.
--help Help. Displays the list of arguments for this tool.
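
As a rough illustration of how the required germline options fit together, the Python sketch below assembles an argument list for the GermlinePipeline class. Only the class name, the option names and the allowed --log and --persist values come from the lists above; the spark-submit launcher, the jar name halvade.jar and all paths are assumptions made for the example.

```python
import shlex

# Class name as given in this synopsis.
GERMLINE_CLASS = "be.ugent.intec.halvade.job.GermlinePipeline"

# Values documented for --log and --persist.
LOG_LEVELS = {"ERROR", "WARN", "INFO", "DEBUG"}
PERSIST_LEVELS = {"mem", "mem_ser", "mem_disk", "mem_disk_ser", "disk"}


def germline_command(bam, output, known_sites, memory_gb, reference, bindir,
                     log="ERROR", persist="disk", overwrite=False,
                     jar="halvade.jar"):
    """Build an argument list for the germline pipeline.

    Assumption: the job is launched with spark-submit and a jar named
    halvade.jar; adapt both to the actual installation.
    """
    if log not in LOG_LEVELS:
        raise ValueError(f"unknown log level: {log}")
    if persist not in PERSIST_LEVELS:
        raise ValueError(f"unknown persist level: {persist}")
    cmd = [
        "spark-submit", "--class", GERMLINE_CLASS, jar,
        "--germline", bam,
        "-o", output,
        "-s", known_sites,
        "-m", str(memory_gb),
        "-r", reference,
        "-b", bindir,
        "--log", log,
        "--persist", persist,
    ]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    example = germline_command(
        "/data/sample.bam", "/data/out.vcf", "/data/dbsnp.vcf",
        16, "/data/ref.fasta", "/opt/bin", log="INFO")
    print(shlex.join(example))
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.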

Somatic pipeline

The class used to run this tool is be.ugent.intec.halvade.job.SomaticPipeline.

Required options

--normal STR Normal input. This gives the absolute path of the normal input. The input can either be an aligned BAM file or a folder with preprocessed data.
--tumor STR Tumor input. This gives the absolute path of the tumor input. The input can either be an aligned BAM file or a folder with preprocessed data.
-o, --output STR
 Output. The output VCF file; this file is gzipped automatically.
-s, --knownSites STR
 Known sites VCF. This gives the absolute path to the VCF file containing known sites, e.g. dbSNP.
-m, --memory DBL
 Memory. This is the memory available to tools that are run in an executor, in gigabytes.
-r, --reference STR
 Reference. This gives the absolute path of the reference FASTA file.
-b, --bindir STR
 Binaries folder. This provides the absolute path to the folder containing the necessary binaries: samtools, bwa, GATK, and optionally Strelka2.

Optional options

--exome Exome pipeline. Run the pipeline on a WXS/exome sample.
--filter_non_primary_chr
 Filter non-primary chr. Filters chromosomes to keep only the primary ones: (chr)1-22, X, Y and M(T).
--tumorsm STR Tumor sample name. This gives the tumor sample name for the tool; it is used if the input is FASTQ. If the input is an aligned BAM, the information is extracted from the file by default.
--normalsm STR Normal sample name. This gives the normal sample name for the tool; it is used if the input is FASTQ. If the input is an aligned BAM, the information is extracted from the file by default.
--tmp STR Temp directory. This is the directory where all intermediate and temporary files will be stored. The default is /tmp/.
--java_serializer
 Java serializer. Use the Java serializer instead of the default Kryo serializer; this might improve performance on certain systems.
--log STR Log level. Gives the log level of Halvade. Possible values are ERROR, WARN, INFO and DEBUG; the default is ERROR.
--partitions INT
 Partitions. This sets the number of partitions to use during the pipeline.
--overwrite Overwrite. This allows the tool to automatically overwrite the output directory/files if they already exist.
--variant_caller
 Variant caller. Sets the variant caller to use. Valid options are mutect2 [default], strelka2 and both.
--help Help. Displays the list of arguments for this tool.
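
The somatic options can be collected in the same way. The sketch below is a minimal Python helper that builds the tool arguments for a normal/tumor pair and checks the variant caller against the three documented values. It assumes that --variant_caller takes its value as the next argument, which the list above does not state explicitly; the function name and the paths are hypothetical.

```python
# Class name as given in this synopsis.
SOMATIC_CLASS = "be.ugent.intec.halvade.job.SomaticPipeline"

# Values documented for --variant_caller; mutect2 is the default.
VARIANT_CALLERS = ("mutect2", "strelka2", "both")


def somatic_args(normal, tumor, output, known_sites, memory_gb, reference,
                 bindir, variant_caller="mutect2", exome=False):
    """Return the argument list for the somatic pipeline (tool options only).

    Assumption: --variant_caller is followed by its value as a separate
    argument.
    """
    if variant_caller not in VARIANT_CALLERS:
        raise ValueError(f"unknown variant caller: {variant_caller}")
    args = [
        "--normal", normal,
        "--tumor", tumor,
        "-o", output,
        "-s", known_sites,
        "-m", str(memory_gb),
        "-r", reference,
        "-b", bindir,
        "--variant_caller", variant_caller,
    ]
    if exome:
        args.append("--exome")
    return args


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(somatic_args("/data/normal.bam", "/data/tumor.bam", "/data/out.vcf",
                       "/data/dbsnp.vcf", 32, "/data/ref.fasta", "/opt/bin",
                       variant_caller="both", exome=True))
```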

Preprocess

The class used to run this tool is be.ugent.intec.halvade.job.Preprocess.

Required options

--manifest STR Manifest file. This manifest file contains the absolute paths of the input files. Each line contains either the location of a BAM file or a tab-separated pair of FASTQ (possibly gzipped) files with an optional readgroup name. If no readgroup name is provided, a random readgroup id is assigned per FASTQ pair.
-o, --output STR
 Output. The output directory; a subfolder is created per readgroup, containing the interleaved paired-end reads in small chunks.

Optional options

--overwrite Overwrite. This allows the tool to automatically overwrite the output directory/files if they already exist.
--help Help. Displays the list of arguments for this tool.
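
The manifest format described for --manifest can be sketched as follows. This Python helper produces one line per BAM file and one tab-separated line per FASTQ pair. It assumes that the optional readgroup name is written as a third tab-separated column, which is an interpretation of the description above rather than something it states; the function name and file names are hypothetical.

```python
from pathlib import PurePosixPath


def manifest_lines(bams=(), fastq_pairs=()):
    """Return manifest lines for BAM files and FASTQ pairs.

    Each entry of fastq_pairs is (fastq1, fastq2) or
    (fastq1, fastq2, readgroup). Assumption: the readgroup name is the
    third tab-separated column.
    """
    lines = []
    for bam in bams:
        _require_absolute(bam)
        lines.append(bam)
    for pair in fastq_pairs:
        if len(pair) not in (2, 3):
            raise ValueError("expected (fastq1, fastq2[, readgroup])")
        # Only the two file paths must be absolute, not the readgroup name.
        for path in pair[:2]:
            _require_absolute(path)
        lines.append("\t".join(pair))
    return lines


def _require_absolute(path):
    # The manifest is documented as containing absolute paths.
    if not PurePosixPath(path).is_absolute():
        raise ValueError(f"not an absolute path: {path}")


if __name__ == "__main__":
    # Hypothetical input files, for illustration only.
    for line in manifest_lines(
            bams=["/data/sample1.bam"],
            fastq_pairs=[("/data/s2_R1.fq.gz", "/data/s2_R2.fq.gz", "rg2")]):
        print(line)
```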

Merge Vcf

The class used to run this tool is be.ugent.intec.halvade.job.MergeVcf.

Required options

-i, --input STR
 Input. The input is either a directory containing only VCF files that need to be merged, e.g. the output of a Spark job, or a comma-separated list of VCF files.
-o, --output STR
 Output. The output VCF file; this file is gzipped automatically.
-r, --reference STR
 Reference. This gives the absolute path of the reference FASTA file.

Optional options

-h, --header STR
 Header. This is the absolute path to a VCF file whose header is used for the merged VCF file.
--overwrite Overwrite. This allows the tool to automatically overwrite the output directory/files if they already exist.
--help Help. Displays the list of arguments for this tool.
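
Since --input accepts either a directory or a comma-separated list of VCF files, a small helper can build the list form. The Python sketch below is illustrative only: the function names and paths are hypothetical, and it rejects file names that contain a comma on the assumption that such names could not be told apart from list separators.

```python
def merge_input_value(vcf_files):
    """Join VCF paths into the comma-separated form accepted by --input."""
    files = list(vcf_files)
    if not files:
        raise ValueError("at least one VCF file is required")
    for path in files:
        # Assumption: a comma inside a path would be read as a separator.
        if "," in path:
            raise ValueError(f"path contains a comma: {path}")
    return ",".join(files)


def merge_vcf_args(input_value, output, reference, header=None):
    """Return the argument list for MergeVcf (tool options only)."""
    args = ["-i", input_value, "-o", output, "-r", reference]
    if header is not None:
        # Optional VCF file whose header is used for the merged file.
        args += ["--header", header]
    return args


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    value = merge_input_value(["/data/part-0.vcf", "/data/part-1.vcf"])
    print(merge_vcf_args(value, "/data/merged.vcf", "/data/ref.fasta"))
```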