Halvade synopsis
Germline pipeline
The class used to run this tool is be.ugent.intec.halvade.job.GermlinePipeline.
Required options
| Option | Description |
|---|---|
| --germline STR | Input. The absolute path of the input, which can be either an aligned BAM file or a folder with preprocessed data. |
| -o, --output STR | Output. The output VCF file; this file is automatically gzipped. |
| -s, --knownSites STR | Known sites VCF. The absolute path to the VCF file containing known sites, e.g. dbSNP. |
| -m, --memory DBL | Memory. The memory available to the tools that run in an executor, in gigabytes. |
| -r, --reference STR | Reference. The absolute path of the reference FASTA file. |
| -b, --bindir STR | Binaries folder. The absolute path to the folder containing the necessary binaries: samtools, bwa and GATK. |
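As a minimal sketch, the required options above can be assembled into a spark-submit call that launches the class named at the top of this section. The jar name halvade.jar, the helper name germline_command and every path in the example are hypothetical placeholders, and Spark resource settings (executors, cores, memory) are deliberately left out.

```python
import shlex

# Class name as given in this documentation.
GERMLINE_CLASS = "be.ugent.intec.halvade.job.GermlinePipeline"


def germline_command(jar, germline_input, output, known_sites, memory_gb,
                     reference, bindir, extra_args=()):
    """Build a spark-submit argument list for the germline pipeline.

    Hypothetical helper: the jar path is a placeholder, and only the options
    from the required-options table are filled in.
    """
    args = [
        "spark-submit",
        "--class", GERMLINE_CLASS,
        jar,                      # application arguments follow the jar
        "--germline", germline_input,
        "-o", output,
        "-s", known_sites,
        "-m", str(memory_gb),     # DBL, in gigabytes
        "-r", reference,
        "-b", bindir,
    ]
    args.extend(extra_args)       # optional flags, e.g. ["--no_gvcf"]
    return args


if __name__ == "__main__":
    cmd = germline_command(
        jar="halvade.jar",                   # hypothetical jar name
        germline_input="/data/sample.bam",   # hypothetical paths below
        output="/data/out/sample.g.vcf",
        known_sites="/ref/dbsnp.vcf",
        memory_gb=30.0,
        reference="/ref/hg38.fasta",
        bindir="/opt/halvade/bin",
        extra_args=["--log", "INFO"],
    )
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps each argument intact when paths contain spaces; shlex.join is only used to print a copy-pasteable command.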
Optional options
| Option | Description |
|---|---|
| --keep_files | Keep intermediate files. Intermediate files are not removed; use this for debugging. |
| --just_align | Only align. Runs only the alignment step and writes the resulting Spark RDD to the output folder as one SAM file per partition, sorted and grouped per genomic region. |
| --filter_non_primary_chr | Filter non-primary chromosomes. Keeps only the primary chromosomes: (chr)1-22, X, Y and M(T). |
| --prepped_sam | Use input from Halvade. Indicates that the input folder contains SAM files that were aligned by Halvade with the --just_align argument in a previous run. |
| --no_gvcf | No GVCF. Outputs a regular VCF file instead of a GVCF file. |
| --samplename STR | Sample name. The sample name to use when the input is FASTQ. If the input is an aligned BAM file, the sample name is extracted from the file by default. |
| --bwa_reproducable INT | Fixed chunk size in BWA. Makes BWA use a fixed chunk size, independent of the number of threads. This gives reproducible results and might use less memory. |
| --ref_dict | Reference dictionary. Uses a reference dictionary other than the automatically detected .dict file. |
| --java_serializer | Java serializer. Uses the Java serializer instead of the default Kryo serializer, which might improve performance on some systems. |
| --tmp STR | Temp directory. The directory where all intermediate and temporary files are stored; /tmp/ by default. |
| --get_regions | Only get the regions. Runs Halvade up to the sorting and splitting into genomic regions, outputs the list of genomic regions and then stops. |
| --use_elprep | Use elPrep. Halvade uses elPrep for the mark duplicates and/or BQSR step. Use this when a lot of memory is available per executor, to increase performance. For Halvade to also run the BQSR step with elPrep, the dbSNP file must be one processed by elPrep, with the .elsites extension. |
| --log STR | Log level. Sets the log level of Halvade; possible values are ERROR, WARN, INFO and DEBUG (default ERROR). |
| --persist STR | Persist level. Sets the storage level used to persist Spark RDDs; possible values are mem, mem_ser, mem_disk, mem_disk_ser and disk (default disk). |
| --partitions INT | Partitions. Sets the number of partitions used throughout the pipeline. |
| --overwrite | Overwrite. Allows the tool to automatically overwrite the output directory or files if they already exist. |
| --help | Help. Displays the list of arguments for this tool. |
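The primary-chromosome rule behind --filter_non_primary_chr can be expressed as a contig-name check. This is an illustration of the rule as documented, not Halvade's implementation; accepting an optional "chr" prefix and both "M" and "MT" is an assumption based on the "(chr)" and "M(T)" notation.

```python
import re

# Contigs kept by the documented rule: (chr)1-22, X, Y and M(T).
# Optional "chr" prefix and "M"/"MT" are our reading of the notation.
PRIMARY_CONTIG = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|MT?)$")


def is_primary(contig):
    """Return True if the contig name is a primary chromosome."""
    return PRIMARY_CONTIG.match(contig) is not None


def filter_primary(contigs):
    """Keep only primary chromosomes, preserving input order."""
    return [c for c in contigs if is_primary(c)]


if __name__ == "__main__":
    names = ["chr1", "chr22", "chr23", "chrX", "chrM", "MT",
             "chr1_KI270706v1_random", "chrUn_GL000195v1", "GL000192.1"]
    print(filter_primary(names))
    # prints ['chr1', 'chr22', 'chrX', 'chrM', 'MT']
```

Anchoring the pattern at both ends is what rejects alternate and unplaced contigs such as chr1_KI270706v1_random, which share a primary prefix.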
Somatic pipeline
The class used to run this tool is be.ugent.intec.halvade.job.SomaticPipeline.
Required options
| Option | Description |
|---|---|
| --normal STR | Normal input. The absolute path of the normal input, which can be either an aligned BAM file or a folder with preprocessed data. |
| --tumor STR | Tumor input. The absolute path of the tumor input, which can be either an aligned BAM file or a folder with preprocessed data. |
| -o, --output STR | Output. The output VCF file; this file is automatically gzipped. |
| -s, --knownSites STR | Known sites VCF. The absolute path to the VCF file containing known sites, e.g. dbSNP. |
| -m, --memory DBL | Memory. The memory available to the tools that run in an executor, in gigabytes. |
| -r, --reference STR | Reference. The absolute path of the reference FASTA file. |
| -b, --bindir STR | Binaries folder. The absolute path to the folder containing the necessary binaries: samtools, bwa, GATK and, optionally, Strelka2. |
Optional options
| Option | Description |
|---|---|
| --exome | Exome pipeline. Runs the pipeline on a WXS (exome) sample. |
| --filter_non_primary_chr | Filter non-primary chromosomes. Keeps only the primary chromosomes: (chr)1-22, X, Y and M(T). |
| --tumorsm STR | Tumor sample name. The tumor sample name to use when the input is FASTQ. If the input is an aligned BAM file, the sample name is extracted from the file by default. |
| --normalsm STR | Normal sample name. The normal sample name to use when the input is FASTQ. If the input is an aligned BAM file, the sample name is extracted from the file by default. |
| --tmp STR | Temp directory. The directory where all intermediate and temporary files are stored; /tmp/ by default. |
| --java_serializer | Java serializer. Uses the Java serializer instead of the default Kryo serializer, which might improve performance on some systems. |
| --log STR | Log level. Sets the log level of Halvade; possible values are ERROR, WARN, INFO and DEBUG (default ERROR). |
| --partitions INT | Partitions. Sets the number of partitions used throughout the pipeline. |
| --overwrite | Overwrite. Allows the tool to automatically overwrite the output directory or files if they already exist. |
| --variant_caller | Variant caller. Sets the variant caller to use; valid values are mutect2 (default), strelka2 and both. |
| --help | Help. Displays the list of arguments for this tool. |
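A small validation helper makes the --variant_caller choices explicit. This is a hypothetical sketch: the function names are illustrative and not part of Halvade, and tying the optional Strelka2 binaries in --bindir to the selected caller is our inference from the option descriptions.

```python
# Values documented for --variant_caller; mutect2 is the default.
VALID_CALLERS = {"mutect2", "strelka2", "both"}


def callers_to_run(value=None):
    """Map a --variant_caller value to the set of callers it selects.

    Hypothetical helper: "both" expands to mutect2 and strelka2, and a
    missing value falls back to the documented default, mutect2.
    """
    choice = value or "mutect2"
    if choice not in VALID_CALLERS:
        raise ValueError(
            f"invalid --variant_caller {value!r}; expected one of "
            f"{sorted(VALID_CALLERS)}")
    if choice == "both":
        return {"mutect2", "strelka2"}
    return {choice}


def needs_strelka2(value=None):
    """Assumption: Strelka2 binaries are only required when strelka2 runs."""
    return "strelka2" in callers_to_run(value)


if __name__ == "__main__":
    for v in (None, "strelka2", "both"):
        print(v, sorted(callers_to_run(v)), needs_strelka2(v))
```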
Preprocess
The class used to run this tool is be.ugent.intec.halvade.job.Preprocess.
Required options
| Option | Description |
|---|---|
| --manifest STR | Manifest file. A file listing the absolute paths of the input files. Each line contains either the path to a BAM file or a tab-separated pair of FASTQ files (possibly gzipped) with an optional read group name. If no read group name is given, a random read group ID is assigned to that FASTQ pair. |
| -o, --output STR | Output. The output directory. A subfolder is created per read group, containing the interleaved paired-end reads in small chunks. |
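The manifest format can be read with a short parser. This sketch follows the description of --manifest (one field for a BAM path, two for a FASTQ pair, three for a FASTQ pair plus read group name); skipping blank lines and using uuid4 for the random read group ID are assumptions, not Halvade's actual behaviour.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ManifestEntry:
    bam: Optional[str] = None
    fastq1: Optional[str] = None
    fastq2: Optional[str] = None
    read_group: Optional[str] = None


def parse_manifest(text: str) -> List[ManifestEntry]:
    """Parse manifest lines into BAM or FASTQ-pair entries (sketch)."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        fields = line.split("\t")
        if len(fields) == 1:
            entries.append(ManifestEntry(bam=fields[0]))
        elif len(fields) in (2, 3):
            # Assumption: a random hex UUID stands in for the random read group ID.
            rg = fields[2] if len(fields) == 3 else uuid.uuid4().hex
            entries.append(ManifestEntry(fastq1=fields[0], fastq2=fields[1],
                                         read_group=rg))
        else:
            raise ValueError(f"line {lineno}: expected 1 to 3 tab-separated fields")
    return entries


if __name__ == "__main__":
    manifest = (
        "/data/sampleA.bam\n"                                     # hypothetical paths
        "/data/s1_R1.fastq.gz\t/data/s1_R2.fastq.gz\tRG1\n"
        "/data/s2_R1.fastq\t/data/s2_R2.fastq\n"
    )
    for entry in parse_manifest(manifest):
        print(entry)
```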
Optional options
| Option | Description |
|---|---|
| --overwrite | Overwrite. Allows the tool to automatically overwrite the output directory or files if they already exist. |
| --help | Help. Displays the list of arguments for this tool. |
Merge VCF
The class used to run this tool is be.ugent.intec.halvade.job.MergeVcf.
Required options
| Option | Description |
|---|---|
| -i, --input STR | Input. Either a directory containing only the VCF files to merge (e.g. the output of a Spark job) or a comma-separated list of VCF files. |
| -o, --output STR | Output. The output VCF file; this file is automatically gzipped. |
| -r, --reference STR | Reference. The absolute path of the reference FASTA file. |
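The two accepted forms of -i/--input can be resolved into a list of VCF paths as sketched below. The helper name resolve_vcf_inputs is hypothetical, and sorting the directory listing for a deterministic order is an assumption, not documented Halvade behaviour.

```python
import os
import tempfile
from typing import List


def resolve_vcf_inputs(value: str) -> List[str]:
    """Turn an -i/--input value into a list of VCF paths (sketch).

    A directory is expanded to the regular files it contains, since the
    documentation expects it to hold only VCF files; any other value is
    treated as a comma-separated list of paths.
    """
    if os.path.isdir(value):
        return sorted(  # assumption: sorted for a deterministic order
            os.path.join(value, name)
            for name in os.listdir(value)
            if os.path.isfile(os.path.join(value, name))
        )
    return [p.strip() for p in value.split(",") if p.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("part-00001.vcf", "part-00000.vcf"):
            open(os.path.join(d, name), "w").close()
        print(resolve_vcf_inputs(d))
    print(resolve_vcf_inputs("/data/a.vcf,/data/b.vcf"))  # hypothetical paths
```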
Optional options
| Option | Description |
|---|---|
| -h, --header STR | Header. The absolute path to a VCF file whose header is used for the merged VCF file. |
| --overwrite | Overwrite. Allows the tool to automatically overwrite the output directory or files if they already exist. |
| --help | Help. Displays the list of arguments for this tool. |