Software – BAC: HPC and bioinformatics at SZN

The HPC cluster provides a collection of software, mainly for bioinformatics, together with a set of general-purpose computational libraries.

Falkor HPC cluster software list

Name	Category	Homepage	Description	Version	Prefix Path	Modulefile	Notes	Programming Language
Abyss	assembler	http://www.bcgsc.ca/platform/bioinfo/software/abyss	ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.	2.0.2 2.1.4	/opt/abyss /opt/abyss-2.1.4	module load abyss/2.0-openmpi module load abyss/2.1.4-openmpi
AdMixture		https://dalexander.github.io/admixture/	ADMIXTURE is a software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. It uses the same statistical model as STRUCTURE but calculates estimates much more rapidly using a fast numerical optimization algorithm.	1.3.0		module load admixture/1.3.0
AlignGraph	assembler	https://github.com/baoe/AlignGraph	Algorithm for secondary de novo genome assembly guided by closely related references		/opt/aligngraph	module load aligngraph/latest	this environment enables path to nucmer/pblat aligners too
ANGSD		http://www.popgen.dk/angsd/index.php/ANGSD	ANGSD is a software for analyzing next generation sequencing data. The software can handle a number of different input types from mapped reads to imputed genotype probabilities. Most methods take genotype uncertainty into account instead of basing the analysis on called genotypes. This is especially useful for low and medium depth data. The software is written in C++ and has been used on large sample sizes.	0.933-18	/opt/angsd	module load angsd/latest
AntiSmash		https://antismash.secondarymetabolites.org/#!/download		4.2.0 5		module load antismash/4.2.0 module load antismash/5
Augustus	sequence analysis	https://bioinf.uni-greifswald.de/augustus/	AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences. It is open source, so it can be compiled for other computing platforms.	3.3.2		module load augustus/3.3.2
bam-readcount		https://github.com/genome/bam-readcount		0.8.0		module load bam-readcount/0.8.0
bamtools	formats toolkit	https://github.com/pezmaster31/bamtools	BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.	2.5.1	/opt/bamtools/bin	module load bamtools/2.5.1
bayescan		http://cmpg.unibe.ch/software/BayeScan/		2.1		module load bayescan/2.1
bbmap/bbtools	tool suite	https://jgi.doe.gov/data-and-tools/bbtools/	BBTools is a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. BBTools can handle common sequencing file formats such as fastq, fasta, sam, scarf, fasta+qual, compressed or raw, with autodetection of quality encoding and interleaving.	38.08	/opt/bbmap	module load bbmap/38.08		Java
bcftools	data toolkit	https://samtools.github.io/bcftools/bcftools.html	BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.	1.7.1 1.9	/opt/bcftools/bin /opt/bcftools-1.9	module load bcftools/1.7.1 module load bcftools/1.9
bcl2fastq	sequence toolkit	https://emea.support.illumina.com/sequencing/sequencing_software/bcl2fastq-conversion-software.html		2.20.0		module load bcl2fastq/2.20.0
beast		https://beast.community	BEAST is a cross-platform program for Bayesian analysis of molecular sequences using MCMC. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology.	2.6.2		module load beast/2.6.2
bedtools	analysis toolkit	http://bedtools.readthedocs.io/en/latest/	Collectively, the bedtools utilities are a swiss-army knife of tools for a wide-range of genomics analysis tasks. The most widely-used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.	2.27.1	/opt/bedtools-2.2.27	module load bedtools/2.27.1
BLAST	sequence aligner	https://blast.ncbi.nlm.nih.gov/Blast.cgi		2.7.1 2.10.1		module load blast/2.7.1 module load blast/2.10.1
bloomtree	sequence alignment	http://www.cs.cmu.edu/~ckingsf/software/bloomtree/		0.3.5	/opt/bin/
bowtie1	sequence alignment	http://bowtie-bio.sourceforge.net/index.shtml	Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).	1.2.2	/opt/bowtie-1.2.2/	module load bowtie/1.2.2 loads correct PATH
bowtie2	sequence alignment	http://bowtie-bio.sourceforge.net/bowtie2/index.shtml	Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.	2.2.3 2.3.4.1	/opt/bowtie2-2.3.4.1/bin	module load bowtie/2.2.3 module load bowtie/2.3.4.1 loads correct PATH
burst	short reads aligner	https://github.com/knights-lab/BURST		0.99		module load burst/0.99
BUSCO		https://busco.ezlab.org		3		module load busco/3
bwa	sequence alignment	http://bio-bwa.sourceforge.net	BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the rest two for longer sequences ranged from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads.	0.7.10 0.7.15	/opt/bwa-0.7.17	module load bwa/0.7.17
canu	assembler	https://github.com/marbl/canu	Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION).	1.7.1		module load canu/1.7.1
CDHIT	sequence analysis	http://weizhongli-lab.org/cd-hit/	CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.	4.6.8	/opt/cdhit	module load cdhit/4.6.8
checkV		https://bitbucket.org/berkeleylab/checkv/src/master/	CheckV is a fully automated command-line pipeline for assessing the quality of single-contig viral genomes, including identification of host contamination for integrated proviruses, estimating completeness for genome fragments, and identification of closed genomes.	0.6.0		module load checkV/0.6.0
cutadapt	sequence toolkit	https://cutadapt.readthedocs.io/en/stable/		3.4		module load cutadapt/3.4
deeptools	sequence toolkit	https://github.com/deeptools/deepTools	User-friendly tools for exploring deep-sequencing data	3.1.3		module load deeptools/3.1.3
deepvirfinder	sequence prediction	https://github.com/jessieren/DeepVirFinder	DeepVirFinder predicts viral sequences using a deep learning method. The method has good prediction accuracy for short viral sequences, so it can be used to predict sequences from metagenomic data.	latest		module load deepvirfinder/latest
diamond	sequence alignment	https://github.com/bbuchfink/diamond	DIAMOND is a sequence aligner for protein and translated DNA searches and functions as a drop-in replacement for the NCBI BLAST software tools. It is suitable for protein-protein search as well as DNA-protein search on short reads and longer sequences including contigs and assemblies, providing a speedup of BLAST ranging up to x20,000.	0.9.22	/opt/diamond/	module load diamond/0.9.22
DRAM	annotation tool	https://github.com/WrightonLabCSU/DRAM	DRAM (Distilled and Refined Annotation of Metabolism) is a tool for annotating metagenomic assembled genomes and VirSorter identified viral contigs.	latest		module load DRAM/latest
dsuite		https://github.com/millanek/Dsuite		0.3 0.4		module load dsuite/0.3 module load dsuite/0.4
enrichm	comparative genomics toolkit	https://github.com/geronimp/enrichM	EnrichM is a set of comparative genomics tools for large sets of metagenome assembled genomes (MAGs).	0.5.0		module load enrichm/0.5.0
exabayes	phylogenetic toolkit	https://cme.h-its.org/exelixis/web/software/exabayes/manual/manual.html#sec-2	ExaBayes is a tool for Bayesian phylogenetic analyses.	1.5 1.5-mpi		module load exabayes/1.5 module load exabayes/1.5-mpi
express	rna-seq	https://bioinformaticshome.com/tools/rna-seq/descriptions/eXpress.html	eXpress is a tool to quantify RNA-seq data, but it is also applicable to ChIP-seq, metagenomics, and large-scale sequencing data in general.	1.5.1		module load express/1.5.1
FastTree	phylogenetic toolkit	http://www.microbesonline.org/fasttree/	FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences.	2.1.11		module load fasttree/2.1.11
FastX	sequence toolkit	http://hannonlab.cshl.edu/fastx_toolkit/	The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.	0.0.13		module load fastx/0.0.13
fiona	sequence toolkit	https://academic.oup.com/bioinformatics/article/30/17/i356/199558		0.2.10		module load fiona/0.2.10
flash	sequence toolkit	https://ccb.jhu.edu/software/FLASH/	FLASH (Fast Length Adjustment of SHort reads) is a very fast and accurate software tool to merge paired-end reads from next-generation sequencing experiments.	2.2		module load flash/2.2
FastQC	raw data analysis	https://www.bioinformatics.babraham.ac.uk/projects/fastqc/	A quality control tool for high throughput sequence data.	0.11.5	os package	os package
FreeBayes	alignment tool	https://github.com/ekg/freebayes	FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.	1.2.0 git branch	/opt/freebayes/bin	module load freebayes/1.2.0	Use --recursive when cloning from the git repo. The Makefile was modified manually to change the prefix path.
gapfiller	assembler	https://sourceforge.net/projects/gapfiller/	GapFiller is a seed-and-extend local assembler to fill the gap within paired reads. It can be used for both DNA and RNA and it has been tested on Illumina data. GapFiller can be used whenever a sequence is to be assembled starting from reads lying on its ends, provided a loose estimate of sequence length.	2.1.1	/opt/gapfiller	module load gapfiller/2.1.1
gapseq		https://github.com/jotech/gapseq	Informed prediction and analysis of bacterial metabolic pathways and genome-scale networks	1.1		module load gapseq/latest
GATK		https://gatk.broadinstitute.org/hc/en-us	Variant Discovery in High-Throughput Sequencing Data	3.8 4.1.3.0		module load gatk/3.8 module load gatk/4.1.3.0
GenomeThreader	gene prediction	https://genomethreader.org	GenomeThreader is a software tool to compute gene structure predictions.	1.6.6 1.7.0		module load genomethreader/1.6.6 module load genomethreader/1.7.0
Genrich		https://github.com/jsh58/Genrich	Genrich is a peak-caller for genomic enrichment assays (e.g. ChIP-seq, ATAC-seq). It analyzes alignment files generated following the assay and produces a file detailing peaks of significant enrichment.			module load genrich/latest
gffread		https://github.com/gpertea/gffread	GFF/GTF utility providing format conversions, filtering, FASTA sequence extraction and more.	0.11.4		module load gffread/0.11.4
gmap	alignment and mapping tool	http://research-pub.gene.com/gmap/	Gmap is a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models.	latest 01.2018	/opt/gmap/bin	module load gmap/latest
Go	programming language		The Go programming language	1.12.1		module load go/1.12.1
grinder		https://github.com/zyxue/biogrinder	Grinder is a versatile program to create random shotgun and amplicon sequence libraries based on DNA, RNA or proteic reference sequences provided in a FASTA file.	0.5.4		module load grinder/0.5.4
gromacs	molecular dynamics	http://www.gromacs.org/	GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.	5.1	/opt/gromacs	module load gromacs/5.1	MPI support compiled
hh-suite		https://github.com/soedinglab/hh-suite	The HH-suite is an open-source software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs).	3.1.0		module load hh-suite/latest
hic-pro		https://github.com/nservant/HiC-Pro	HiC-Pro was designed to process Hi-C data, from raw fastq files (paired-end Illumina data) to normalized contact maps. It supports the main Hi-C protocols, including digestion protocols as well as protocols that do not require restriction enzymes such as DNase Hi-C.	2.11.1		module load hic-pro/2.11.1
hicup		https://www.bioinformatics.babraham.ac.uk/projects/hicup/	A tool for mapping and performing quality control on Hi-C data	0.7.2		module load hicup/0.7.2
Hisat	sequence alignment/mapping	https://ccb.jhu.edu/software/hisat2/index.shtml	HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as to a single reference genome).	2.1.0		module load hisat/2.1.0
HMMER	sequence alignment	http://hmmer.org/	HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).	3.1b2	/opt/hmmer-3.1.2/	module load hmmer/3.1.2
Humann			HUMAnN is a method for efficiently and accurately profiling the abundance of microbial metabolic pathways and other molecular functions from metagenomic or metatranscriptomic sequencing data.	2 3		module load humann/2 module load humann/3
hyde		https://hybridization-detection.readthedocs.io	HyDe is a software package that detects hybridization in phylogenomic data sets using phylogenetic invariants.	0.4.3		module load hyde/latest
ima2p		https://github.com/arunsethuraman/ima2p	IMa2p is a parallel implementation of IMa2, using OpenMPI-C++ - a Bayesian MCMC based method for inferring population demography under the IM (Isolation with Migration) model. Please refer to Sethuraman and Hey (2015) for details of implementation.			module load ima2p/latest
ima3		https://github.com/jodyhey/IMa3
Infernal		http://eddylab.org/infernal/	Infernal ("INFERence of RNA ALignment") is for searching DNA sequence databases for RNA structure and sequence similarities.	1.1.4		module load infernal/1.1.4-openmpislurm module load infernal/1.1.4
Interproscan	sequence analysis	https://www.ebi.ac.uk/interpro/download.html	InterPro is a resource that provides functional analysis of protein sequences by classifying them into families and predicting the presence of domains and important sites. To classify proteins in this way, InterPro uses predictive models, known as signatures, provided by several different databases (referred to as member databases) that make up the InterPro consortium.	5.29 5.33	/opt/interproscan-5.29-68.0/	module load interproscan/5.29 module load interproscan/5.33	needs Java environment
iq-tree		http://www.iqtree.org	A fast and effective stochastic algorithm to infer phylogenetic trees by maximum likelihood. IQ-TREE compares favorably to RAxML and PhyML in terms of likelihoods with similar computing time	1.6.11 2.1.3		module load iq-tree/1.6.11 module load iq-tree/2.1.3
Jellyfish	sequence toolkit	https://github.com/gmarcais/Jellyfish	Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence.	2.2.10		module load jellyfish/2.2.10
MACS2		https://hbctraining.github.io/Intro-to-ChIPseq/lessons/05_peak_calling_macs.html	A commonly used tool for identifying transcription factor binding sites is named Model-based Analysis of ChIP-seq (MACS). The MACS algorithm captures the influence of genome complexity to evaluate the significance of enriched ChIP regions. Although it was developed for the detection of transcription factor binding sites it is also suited for larger regions.	2.1.2		module load MACS2/2.1.2
mafft	sequence alignment	https://mafft.cbrc.jp/alignment/software/source.html	Multiple alignment program for amino acid or nucleotide sequences	7.397	/opt/mafft/bin	module load mafft/7.397
malt		https://uni-tuebingen.de/it/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/algorithms-in-bioinformatics/software/malt/	MALT performs alignment of metagenomic reads against a database of reference sequences (such as NR, GenBank or Silva) and produces a MEGAN RMA file as output.	0.4.1 0.5		module load malt/0.4.1 module load malt/0.5
matam	assembler	https://github.com/bonsai-team/matam	MATAM, a software dedicated to the fast and accurate targeted assembly of short reads sequenced from a genomic marker of interest. The method implements a stepwise process based on construction and analysis of a read overlap graph.		/opt/matam/bin	module load matam/latest	installed under dedicated conda environment
maxquant		https://www.maxquant.org	MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. It is specifically aimed at high-resolution MS data.	1.6.3 1.6.5 1.6.10 1.6.17		module load maxquant/1.6.3 module load maxquant/1.6.5 module load maxquant/1.6.10 module load maxquant/1.6.17
MEGA		https://www.megasoftware.net/docs	MEGA analysis suite	10.0.5		module load mega/10.0.5
meme		https://meme-suite.org/meme/	The MEME Suite allows the biologist to discover novel motifs in collections of unaligned nucleotide or protein sequences, and to perform a wide variety of other motif-based analyses.	5.0.5 5.3.3		module load meme/5.0.5 module load meme/5.3.3
metawrap		https://github.com/bxlab/metaWRAP	MetaWRAP aims to be an easy-to-use metagenomic wrapper suite that accomplishes the core tasks of metagenomic analysis from start to finish: read quality control, assembly, visualization, taxonomic profiling, extracting draft genomes (binning), and functional annotation.	1.1.1		module load metawrap/1.1.1
mga	sequence aligner	https://bio.tools/mga	Multiple Genome Aligner computes multiple genome alignments of large, closely related DNA sequences. MGA is a software tool for efficiently aligning two or more sufficiently similar genomic sized sequences [HKO02]. It belongs to the category of anchor-based multiple alignment methods. mga uses multiMEMs (or MEMs, MUMs as special cases) to anchor the alignment.	latest		module load mga/latest
mira	assembler	http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html	MIRA is a multi-pass DNA sequence data assembler/mapper for whole genome and EST/RNASeq projects. Supports Sanger, Illumina, Ion Torrent, 454.	4.0.4	/opt/mira	module load mira/4.0.4	statically compiled binaries installed	C/C++
mitoz	assembler	https://academic.oup.com/nar/article/47/11/e63/5377471	MitoZ: a toolkit for animal mitochondrial genome assembly, annotation and visualization	2.3		module load mitoz/2.3
mmseqs	sequence toolkit	https://github.com/soedinglab/MMseqs2	MMseqs2: ultra fast and sensitive sequence search and clustering suite			module load mmseqs2/latest
mothur		https://mothur.org	This project seeks to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community.	1.4.3		module load mothur/1.4.3
MPICH2 (hydra process manager)	mpi library	https://www.mpich.org	MPICH is a high performance and widely portable implementation of the Message Passing Interface (MPI) standard.	3.2	/opt/mpi/mpich2/bin/	module load mpi/mpich2	This version can be used natively with the SLURM workload manager; see https://slurm.schedmd.com/mpi_guide.html#mpich2 (section "MPICH with MPIEXEC"). For information on the Hydra process manager, refer to https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager
MPICH2 (--with-pm=none --with-pmi=slurm)	mpi library	https://www.mpich.org	MPICH is a high performance and widely portable implementation of the Message Passing Interface (MPI) standard.	3.2	/opt/mpi/mpich2-slurm/bin/	module load mpi/mpich2-slurm	This version explicitly uses SLURM as its process manager and links against libpmi. For more information, see the MPICH FAQ: https://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Note_that_the_default_build_of_MPICH_will_work_fine_in_SLURM_environments._No_extra_steps_are_needed.
MrBayes	sequence analysis	http://mrbayes.sourceforge.net	MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.	3.2.6	/opt/bin/	module load mrbayes/3.2.7 module load mrbayes/3.2.7-openmpi	An mpich2-slurm enabled version is available in /opt/bin as mb_mpich. DEPRECATED: an MPI version is also available in /opt/bin as mb_mpi. To execute it, use: mpirun -np /opt/bin/mb_mpi
mummer	sequence alignment	http://mummer.sourceforge.net/	Ultra-fast alignment of large-scale DNA and protein sequences	4.0	/opt/mummer	module load mummer/4.0
muscle	sequence alignment	https://www.drive5.com/muscle/	MUSCLE is one of the most widely-used methods in biology. On average, MUSCLE is cited by ten new papers every day.	3.8.31		module load muscle/3.8.31
NCBI Blast	sequence alignment	https://blast.ncbi.nlm.nih.gov/Blast.cgi	Blast is an acronym for Basic Local Alignment Search Tool. BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance.	2.7.1	/opt/blast-2.7.1/bin/
OpenMPI	mpi library	https://www.open-mpi.org	The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available.	2.1	/opt/mpi/openmpi	module load mpi/openmpi
Pairagon	sequence alignment		A pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels.	1.1	/opt/pairagon	module load pairagon/1.1
Qiime2	sequence analysis	https://qiime2.org/	QIIME 2™ is a next-generation microbiome bioinformatics platform that is extensible, free, open source, and community developed.	2.0	installed via conda	module load qiime/2
R	programming language	https://www.r-project.org/	R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.	3.5.1	/opt/R-3.5.1	module load R/3.5.1
salmon	rna-seq	https://salmon.readthedocs.io/en/latest/salmon.html	Salmon is a tool for wicked-fast transcript quantification from RNA-seq data.	0.11.3	/opt/salmon-0.11.3	module load salmon/0.11.3
snowball	assembler	https://github.com/algbioi/snowball/wiki		1.2		module load snowball/1.2
Spades	sequence assembly toolkit	http://cab.spbu.ru/software/spades/	SPAdes – St. Petersburg genome assembler – is an assembly toolkit containing various assembly pipelines. The current version of SPAdes works with Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. You can also provide additional contigs that will be used as long reads.	3.12	/opt/spades-3.12/bin/	module load spades/3.12
STAR	sequence aligner	https://github.com/alexdobin/STAR	ultrafast universal RNA-seq aligner	2.6.0c	/opt/STAR/	module load STAR/2.6.0c
TensorFlow	machine learning framework	https://www.tensorflow.org/	Machine learning framework	1.8	Load the working environment with source /opt/tensorflow/virtpy/bin/activate, or load the vica environment with module load vica/latest			Python
TopHat	sequence aligner	https://ccb.jhu.edu/software/tophat/index.shtml	TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.	2.1.1		module load tophat/2.1.1
trf		https://bioinformaticshome.com/tools/DNA-sequence-analysis/descriptions/TRF.html#gsc.tab=0	Tandem Repeats Finder is a tool to find tandem repeats in DNA sequences. The Tandem Repeats Finder algorithm uses k-tuples for matching to speed up the computation and computes consensus sequences.	4.0.9		module load trf/4.0.9
trimal	alignment tool	https://bioweb.pasteur.fr/packages/pack@trimal@1.4.1	A tool for automated alignment trimming in large-scale phylogenetic analyses	1.4		module load trimal/1.4
Trimmomatic	sequence toolkit	http://www.usadellab.org/cms/?page=trimmomatic	A flexible read trimming tool for Illumina NGS data	0.38		module load trimmomatic/0.38
Trinity	assembler	https://github.com/trinityrnaseq/trinityrnaseq/wiki	Trinity assembles transcript sequences from Illumina RNA-Seq data.	2.8.4 2.11 2.15		module load trinity/2.8.4 module load trinity/2.11 module load trinity/2.15
trinotate	annotation tool	https://github.com/Trinotate/Trinotate	Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms.	3.2.0		module load trinotate/3.2.0
trnascan	sequence analysis	https://users.soe.ucsc.edu/~lowe/thesis/node20.html		1.4		module load trnascan/1.4
umap				1.1.1		module load umap/1.1.1
vcftools	data toolkit	https://vcftools.sourceforge.net	VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.	0.1.17		module load vcftools/latest
vcontact		https://bitbucket.org/MAVERICLab/vcontact2	vConTACT2 is a tool to perform guilt-by-contig-association classification of viral genomic sequence data. It's designed to cluster and provide taxonomic context of viral metagenomic sequencing data.	0.9.11		module load vcontact/0.9.11
velvet	assembler	https://www.ebi.ac.uk/~zerbino/velvet/	Sequence assembler for very short reads	1.2.10	/opt/velvet	module load velvet/1.2.10
vibrant	sequence annotation	https://github.com/AnantharamanLab/VIBRANT	Virus Identification By iteRative ANnoTation	1.0.1		module load vibrant/1.0.1
vica	sequence analysis	https://github.com/USDA-ARS-GBRU/vica	Software to identify highly divergent DNA and RNA viruses and phages in microbiomes		/opt/tensorflow/virtpy/bin	module load vica/latest		Python
virsorter	sequence analysis	https://github.com/simroux/VirSorter	VirSorter: mining viral signal from microbial genomic data	git master branch as of Nov. 2018		module load virsorter/latest
Vsearch	sequence alignment	https://github.com/torognes/vsearch	VSEARCH stands for vectorized search, as the tool takes advantage of parallelism in the form of SIMD vectorization as well as multiple threads to perform accurate alignments at high speed. VSEARCH uses an optimal global aligner (full dynamic programming Needleman-Wunsch), in contrast to USEARCH which by default uses a heuristic seed and extend aligner. This usually results in more accurate alignments and overall improved sensitivity (recall) with VSEARCH, especially for alignments with gaps.	2.6.2 2.21.1		module load vsearch/2.6.2 module load vsearch/2.21.1
whokaryote	sequence analysis	https://github.com/LottePronk/whokaryote	Whokaryote uses a random forest classifier that uses gene-structure based features and optionally Tiara (https://github.com/ibe-uw/tiara) predictions to predict whether a contig is from a eukaryote or from a prokaryote. You can use Whokaryote to determine which contigs need eukaryotic gene prediction and which need prokaryotic gene prediction.	git master branch		module load whokaryote/latest
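
The Modulefile column gives the command that makes each tool available in the current shell. The following is a minimal sketch of how such a command might be wrapped, assuming the cluster provides the standard `module` command (Environment Modules or Lmod). The helper name `load_tool` is hypothetical and not part of the cluster setup; `bwa/0.7.17` is a modulefile name taken from the table above.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not provided by the cluster):
# load a modulefile listed in the table, failing cleanly when the
# `module` command is not available in the current shell.
load_tool() {
    local modulefile="$1"
    if ! type module >/dev/null 2>&1; then
        echo "module command not found; cannot load ${modulefile}" >&2
        return 1
    fi
    module load "${modulefile}"
}

# Example call; on a machine without `module` it only prints a notice.
load_tool bwa/0.7.17 || echo "skipped: bwa/0.7.17 was not loaded"
```

On the cluster, `module avail` lists the modulefiles that can be loaded and `module list` shows those currently loaded. In a SLURM batch script, the `module load` line would typically come before the command that runs the tool.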