Required reference files
Halvade requires the human genome reference FASTA file, the additional index files created for BWA alignment, and the dbSNP database. The index used by the BWA aligner is created with the BWA tool itself. The FASTA index and dictionary files used by the GATK can be created with samtools and GATK, and are often available for download together with the reference. All additional files for the FASTA reference (except the dbSNP database) need to have the same prefix; the tools normally take care of this. Here we will download the files to /halvade/ref. BWA is used to create the BWA index; this page shows how to get this binary.
    halvaderef=/halvade/ref
    cd $halvaderef
Halvade uses the genome reference FASTA file and the corresponding dbSNP file from the GATK reference bundle project. Download and process as follows, assuming you are now in the $halvaderef directory:
    # download the reference files (assumes wget is installed)
    wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf
    wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.tbi
    wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta
    wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.fai
    wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dict

    # create the BWA indexes with BWA
    $halvadebin/bwa index Homo_sapiens_assembly38.fasta
You should now have the following reference files in your Halvade reference folder:
    Homo_sapiens_assembly38.dict
    Homo_sapiens_assembly38.fasta
    Homo_sapiens_assembly38.fasta.ann
    Homo_sapiens_assembly38.fasta.amb
    Homo_sapiens_assembly38.fasta.bwt
    Homo_sapiens_assembly38.fasta.fai
    Homo_sapiens_assembly38.fasta.pac
    Homo_sapiens_assembly38.fasta.sa
    Homo_sapiens_assembly38.dbsnp138.vcf
    Homo_sapiens_assembly38.dbsnp138.vcf.tbi
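A quick way to confirm that the download and indexing steps produced everything is to test for each file name in the list above. The sketch below assumes the hg38 file names used on this page; check_halvade_ref is a hypothetical helper written for illustration and is not part of Halvade.

```shell
# Hypothetical helper (not part of Halvade): report which of the expected
# reference files are missing from a folder.
#   $1 = reference folder
#   $2 = file prefix (defaults to the hg38 prefix used on this page)
check_halvade_ref() {
    dir=$1
    prefix=${2:-Homo_sapiens_assembly38}
    missing=0
    for suffix in dict fasta fasta.amb fasta.ann fasta.bwt fasta.fai \
                  fasta.pac fasta.sa dbsnp138.vcf dbsnp138.vcf.tbi; do
        if [ ! -e "$dir/$prefix.$suffix" ]; then
            echo "missing: $prefix.$suffix"
            missing=1
        fi
    done
    # 0 if every file was found, 1 otherwise
    return "$missing"
}
```

For example, check_halvade_ref "$halvaderef" prints one "missing: ..." line per absent file and returns a non-zero status if any file is missing.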
If a different reference is used, make sure that the .dict file and all .fasta.* files have exactly the same prefix. The VCF file and its tbi index can have a different prefix. All files listed above need to be available. The Halvade script detects a fasta|fa file and a vcf|vcf.gz file in the provided folder, so this folder should contain only a single fasta|fa file with its corresponding additional files and a single vcf|vcf.gz file with its corresponding index.
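For a different reference, the companion files can be generated along these lines. This is only a sketch: it assumes samtools, GATK 4 (the gatk wrapper) and bwa are on the PATH, and build_ref_indexes is a hypothetical helper name, not part of Halvade. The dictionary is written next to the FASTA file with the .fasta or .fa extension replaced by .dict, which matches the naming of the files listed above.

```shell
# Hypothetical helper (not part of Halvade): create the companion files
# for a custom FASTA reference.
# Assumes samtools, gatk (GATK 4) and bwa are available on the PATH.
#   $1 = path to the FASTA file, e.g. my_reference.fa
build_ref_indexes() {
    fasta=$1
    # FASTA index: my_reference.fa.fai
    samtools faidx "$fasta" || return 1
    # sequence dictionary: my_reference.dict (last extension replaced by .dict)
    gatk CreateSequenceDictionary -R "$fasta" -O "${fasta%.*}.dict" || return 1
    # BWA index: my_reference.fa.amb, .ann, .bwt, .pac and .sa
    bwa index "$fasta"
}
```

Running build_ref_indexes my_reference.fa should leave my_reference.dict, my_reference.fa.fai and the five my_reference.fa.* BWA index files in the same folder. The dbSNP VCF file and its index still have to be provided separately.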