Genome indexes for Mus musculus (mm39)

BUILDING HISAT2 INDEXES IN CSC

Here is the case for the house mouse genome (mm39). Genome indexing requires a large amount of memory and may not be feasible on a laptop. The genome indexes for Mus musculus (mm39) were created using HISAT2 v2.2.1 on CSC (IT Center for Science), thanks to CSC-Puhti.

1. Create a folder for the conda environment, install the required packages into it, and add its bin directory to the path.

    mkdir STRTN-env
    conda-containerize new --prefix STRTN-env STRTN-env.yml
    export PATH="<install_dir>/STRTN-env/bin:$PATH"

2. Load the required modules.

    module load tykky
    export PATH="<install_dir>/STRTN-env/bin:$PATH"
    module load r-env
    if test -f ~/.Renviron; then
        sed -i '/TMPDIR/d' ~/.Renviron
    fi
    echo "TMPDIR=${WorkingDir_PATH}" >> ~/.Renviron

3. Obtain the genome sequences of the reference and the ERCC spike-ins. You may also add the ribosomal DNA repetitive unit for human (U13369) and mouse (BK000964).

    wget https://hgdownload.soe.ucsc.edu/goldenPath/mm39/bigZips/mm39.fa.gz
    unpigz -c mm39.fa.gz | ruby -ne '$ok = $_ !~ /^>chrUn_/ if $_ =~ /^>/; puts $_ if $ok' > mouse_reference.fasta
    wget https://tsapps.nist.gov/srmext/certificates/documents/SRM2374_putative_T7_products_NoPolyA_v2.FASTA
    cat SRM2374_putative_T7_products_NoPolyA_v2.FASTA >> mouse_reference.fasta

4. Extract splice sites and exons from a GTF file. Here we used wgEncodeGencodeBasicVM30 as the annotation file. You may additionally run `hisat2_extract_snps_haplotypes_UCSC.py` to extract SNPs and haplotypes from a dbSNP file for human and mouse.

    wget https://hgdownload.soe.ucsc.edu/goldenPath/mm39/database/wgEncodeGencodeBasicVM30.txt.gz
    unpigz -c wgEncodeGencodeBasicVM30.txt.gz | hisat2_extract_splice_sites.py - | grep -v ^chrUn > splice_sites.txt
    unpigz -c wgEncodeGencodeBasicVM30.txt.gz | hisat2_extract_exons.py - | grep -v ^chrUn > exons.txt

5. Build the HISAT2 index. This outputs a set of files that share a basename and differ in their numbered suffixes. Here, `mouse_reference.1.ht2`, `mouse_reference.2.ht2`, ..., `mouse_reference.8.ht2` are generated in the `mouse_index` folder. In this case, `mouse_reference` is the basename used for `-i, --index`.

    mkdir -p mouse_index
    hisat2-build mouse_reference.fasta --ss splice_sites.txt --exon exons.txt mouse_index/mouse_reference

6. Create the sequence dictionary for the reference and spike-in sequences. This is required for the Picard MergeBamAlignment program. Note that the original FASTA file (`mouse_reference.fasta` here) is also required.

    picard CreateSequenceDictionary R=mouse_reference.fasta O=mouse_reference.dict

7. Put the genome indexes, the genome FASTA file, and the sequence dictionary in the same folder.

    mv mouse_reference.dict mouse_index/
    mv mouse_reference.fasta mouse_index/
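
As a quick sanity check after step 7, the following sketch reports which of the expected files are missing from the final folder. It assumes the eight `.ht2` files described in step 5 plus the `.fasta` and `.dict` files from steps 3 and 6 all share the same basename. `check_index_dir` is a hypothetical helper written for this illustration; it is not part of HISAT2 or Picard, and the folder name passed to it is whichever folder you used in step 7.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HISAT2 or Picard): list the expected
# files that are missing or empty in an index folder.
# Usage: check_index_dir <folder> <basename>
check_index_dir() {
    local dir="$1" base="$2" missing=0 f i ext

    # Index files written by hisat2-build: <basename>.1.ht2 ... <basename>.8.ht2
    for i in 1 2 3 4 5 6 7 8; do
        f="${dir}/${base}.${i}.ht2"
        if [ ! -s "$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done

    # Genome FASTA (step 3) and sequence dictionary (step 6)
    for ext in fasta dict; do
        f="${dir}/${base}.${ext}"
        if [ ! -s "$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done

    if [ "$missing" -gt 0 ]; then
        return 1
    fi
    echo "all expected files found in ${dir}"
}
```

For example, `check_index_dir mouse_index mouse_reference` checks the folder that `hisat2-build` wrote to in step 5. The function returns a non-zero status if any file is absent or empty, so it can gate later pipeline steps.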

References

Kim D., Paggi J.M., Park C., Bennett C., and Salzberg S.L. (2019). Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. Aug;37(8):907-915.

Related Organizations

University of Helsinki
Finland

Keywords

mm39, genome indexes, mouse
