The HPC clusters provide a collection of software, mainly for bioinformatics, along with general-purpose computational libraries.

Kraken HPC cluster software list

| Name | Description | Version | Prefix Path | Modulefile |
| --- | --- | --- | --- | --- |
| Abyss | ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes. | 2.0.2 | /opt/abyss | module load abyss/2.0 |
| bamtools | BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files. | 2.4.1 | /opt/ | module load bamtools/2.4.1 |
| bbtools | BBTools is a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. BBTools can handle common sequencing file formats such as fastq, fasta, sam, scarf, fasta+qual, compressed or raw, with autodetection of quality encoding and interleaving. | | /opt/bbmap/ | module load bbmap/1.0 (loads correct PATH) |
| bedops | BEDOPS is an open-source command-line toolkit that performs highly efficient and scalable Boolean and other set operations, statistical calculations, archiving, conversion and other management of genomic data of arbitrary scale. Tasks can be easily split by chromosome for distributing whole-genome analyses across a computational cluster. | 2.4.26 | /opt/bedops | module load bedops/2.4.26 (loads correct PATH) |
| bedtools | Collectively, the bedtools utilities are a swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line. | 2.21.0 | /opt/bedtools2 | |
| bloomtree | | 0.3.5 | /opt/bin/ | |
| bowtie1 | Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end). | 1.1.1 | /opt/bowtie-1.1.1/ | module load bowtie/1.1.1 (loads correct PATH) |
| bowtie2 | Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes. | 2.3.1 (git master) | /opt/bowtie2/ | module load bowtie/2.3.1 (loads correct PATH) |
| bwa | BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are for longer sequences ranging from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads. | 0.7.10; 0.7.15 | /opt/bwa-0.7.10; /opt/bwa-0.7.15 | module load bwa/0.7.10; module load bwa/0.7.15 |
| CD-hit | CD-HIT is a very fast program for clustering and comparing protein or nucleotide sequences. | 4.7 | /opt/cdhit | module load cdhit/4.7 |
| diamond | DIAMOND is a sequence aligner for protein and translated DNA searches and functions as a drop-in replacement for the NCBI BLAST software tools. It is suitable for protein-protein search as well as DNA-protein search on short reads and longer sequences including contigs and assemblies, providing a speedup of BLAST ranging up to x20,000. | 0.8.22 | /opt/bin/ | |
| EMBOSS tools | EMBOSS is a collection made of hundreds of open, well documented applications for molecular sequence and other analyses. | 6.6.0 | /usr/bin/ | |
| FastQC | FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis. | 0.11.2 | /opt/FastQC/ | |
| FreeBayes | FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment. | 1.1.0 git branch | /opt/freebayes/bin | module load freebayes/1.1.0 |
| HAMMER | Hammer is a tool for error correction of short read datasets with non-uniform coverage, such as single-cell data. In particular, Hammer does not make any uniformity assumptions on the distribution of the reads along the genome. It is based on a combination of the Hamming graph built from the set of k-mers and a simple probabilistic model for sequencing errors. | | /usr/local/bin/ | |
| HMMER | HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs). | 3.1b2 | /opt/hmmer-3/ | module load hmmer/3.1b2 |
| HDF5 | HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format. | 1.10.1 | /opt/hdf5/serial; /opt/hdf5/mpich2; /opt/hdf5/mpich2-slurm; /opt/hdf5/openmpi | module load hdf5/mpich2 (parallel version, linked against /opt/mpich2/lib libraries); module load hdf5/serial; module load hdf5/openmpi; module load hdf5/mpich2-slurm |
| JAGS | JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation not wholly unlike BUGS. | 4.2.0 | /opt/JAGS | module load jags/4.2.0 |
| MagicBlast | Magic-BLAST is a new tool for mapping large sets of next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. | 1.0.0 | /opt/magicblast/bin | module load magicblast/1.0 |
| Malt | MALT performs alignment of metagenomic reads against a database of reference sequences (such as NR, GenBank or Silva) and produces a MEGAN RMA file as output. | 0.3.8 | /opt/malt/ | module load malt/0.3.8 |
| Meraculous | Meraculous is a whole genome assembler for Next Generation Sequencing data geared for large genomes. It is a hybrid k-mer/read-based assembler that capitalizes on the high accuracy of Illumina sequence by eschewing an explicit error correction step which we argue to be redundant with the assembly process. Meraculous achieves high performance with large datasets by utilizing lightweight data structures and multi-threaded parallelization, allowing to assemble human-sized genomes on commodity clusters in under a day. | 2.2.4 | /opt/meraculous/bin/ | module load meraculous/2.2.4 |
| Mothur | This project seeks to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community. In February 2009 we released the first version of mothur, which had accelerated versions of the popular DOTUR and SONS programs. Since then we have added the functionality of a number of other popular tools. mothur is currently the most cited bioinformatics tool for analyzing 16S rRNA gene sequences. | | | |
| MPICH2 (hydra process manager) | MPICH is a high performance and widely portable implementation of the Message Passing Interface (MPI) standard. | 3.2 | /opt/mpich2/bin/ | module load mpi/mpich2 |
| MPICH2 (--with-pm=none --with-pmi=slurm) | MPICH is a high performance and widely portable implementation of the Message Passing Interface (MPI) standard. | 3.2 | /opt/mpich2/mpich2-slurm/bin/ | module load mpi/mpich2-slurm |
| MrBayes | MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters. | 3.2.6 | /opt/bin/ | |
| NCBI Blast | Blast is an acronym for Basic Local Alignment Search Tool. BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance. | 2.2.30+ | /opt/ncbi-blast-2.2.30+/bin/ | |
| NetCDF | NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. | 4.4.1.1 | /opt/netcdf | module load netcdf/serial; module load netcdf/mpich2; module load netcdf/mpich2-slurm; module load netcdf/openmpi |
| OpenMPI | The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. | 2.1 | /opt/mpi/openmpi | module load mpi/openmpi |
| Piler | PILER is public domain software for analyzing repetitive DNA found in genome sequences. | 1.0 | /opt/piler | module load piler/1.0 |
| Python | Python programming language | 3.6.1 | /opt/python/3.6.1 | module load python/3.6.1 |
| qiime | QIIME is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. QIIME is designed to take users from raw sequencing data generated on the Illumina or other platforms through publication quality graphics and statistics. | 1.9.1 | via miniconda2 environment: source /opt/miniconda2/bin/activate qiime1 | module load miniconda2/qiime1 (loads correct python virtualenv for qiime) |
| qiime2 | QIIME is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. QIIME is designed to take users from raw sequencing data generated on the Illumina or other platforms through publication quality graphics and statistics. | 2.0 | via miniconda2 environment: source /opt/miniconda2/bin/activate qiime2 | module load miniconda2/qiime2 (loads correct python virtualenv for qiime) |
| R | R is a free software environment for statistical computing and graphics. | 3.4.1 | /opt/R/3.4.1 | module load R/3.4.1 |
| raxml | | 8.2.10 | /opt/raxml/pthreads/bin; /opt/raxml/mpich2/bin; /opt/raxml/mpich2-slurm/bin | module load raxml/pthreads; module load raxml/mpich2; module load raxml/mpich2-slurm |
| raxml-ng | | 8.2.10 | /opt/raxml-ng/pthreads/bin; /opt/raxml-ng/mpich2/bin; /opt/raxml-ng/mpich2-slurm/bin | module load raxml-ng/pthreads; module load raxml-ng/mpich2; module load raxml-ng/mpich2-slurm |
| RECON | The program RECON has been designed for constructing profiles of nucleosome potential, characterizing the probability of nucleosome formation along DNA sequences. The program is used for recognition of nucleosome formation sites in genomic DNA sequences. | 1.08 | /opt/RECON-1.08 | module load recon/1.0.8 |
| RepeatMasker | RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). Currently over 56% of human genomic sequence is identified and masked by the program. | 4.0.7 | /opt/RepeatMasker | module load repeatmasker/4.0.7 |
| RepeatModeler | RepeatModeler is a de-novo repeat family identification and modeling package. At the heart of RepeatModeler are two de-novo repeat finding programs (RECON and RepeatScout) which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data. RepeatModeler assists in automating the runs of RECON and RepeatScout given a genomic database and uses the output to build, refine and classify consensus models of putative interspersed repeats. | 1.0.10 | /opt/RepeatModeler-open-1.0.10/ | |
| RepeatScout | RepeatScout is a tool to discover repetitive substrings in DNA. | 1.0.5 | /opt/RepeatScout | |
| Repet | The REPET package integrates bioinformatics programs in order to tackle biological issues at the genomic scale. | 2.5 | /opt/repet2.5 | module load repet/global |
| RetroSeq | RetroSeq is a tool for discovery and genotyping of transposable element variants (TEVs) (also known as mobile element insertions) from next-gen sequencing reads aligned to a reference genome in BAM format. The goal is to call TEVs that are not present in the reference genome but present in the sample that has been sequenced. It should be noted that RetroSeq can be used to locate any class of viral insertion in any species where whole-genome sequencing data with a suitable reference genome is available. | 1.41 | /opt/RetroSeq/ | module load retroseq/1.41 |
| Roary | Roary is a high speed stand alone pan genome pipeline, which takes annotated assemblies in GFF3 format (produced by Prokka (Seemann, 2014)) and calculates the pan genome. Using a standard desktop PC, it can analyse datasets with thousands of samples, something which is computationally infeasible with existing methods, without compromising the quality of the results. | 3.6.1 | /opt/roary-3.6.1 | module load roary/3.6.1 + perlbrew env |
| SGA | SGA is a de novo genome assembler based on the concept of string graphs. The major goal of SGA is to be very memory efficient, which is achieved by using a compressed representation of DNA sequence reads. | 0.10.15 | /opt/sga/bin/ | module load sga/0.10.15 (loads correct binary path) |
| Spades | SPAdes – St. Petersburg genome assembler – is an assembly toolkit containing various assembly pipelines. The current version of SPAdes works with Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. You can also provide additional contigs that will be used as long reads. | 3.10.1 | /opt/spades/bin/ | |
| Trimmomatic | Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data. The selection of trimming steps and their associated parameters are supplied on the command line. | 0.32 | /opt/Trimmomatic-0.32/ | |
| Trinity | Trinity, developed at the Broad Institute and the [Hebrew University of Jerusalem](http://www.cs.huji.ac.il), represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. | r20140717 | /opt/trinityrnaseq_r20140717/ | |
| vcftools | VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files. | 0.1.15 | /opt/vcftools | module load vcftools/0.1.15 |
| Velvet | Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. Velvet currently takes in short read sequences, removes errors then produces high quality unique contigs. It then uses paired-end read and long read information, when available, to retrieve the repeated areas between contigs. | 1.2.10 | /opt/bin/ | |
| Vsearch | VSEARCH stands for vectorized search, as the tool takes advantage of parallelism in the form of SIMD vectorization as well as multiple threads to perform accurate alignments at high speed. VSEARCH uses an optimal global aligner (full dynamic programming Needleman-Wunsch), in contrast to USEARCH which by default uses a heuristic seed and extend aligner. This usually results in more accurate alignments and overall improved sensitivity (recall) with VSEARCH, especially for alignments with gaps. | 2.6.0 | /opt/vsearch/bin | module load vsearch/2.6.0 |
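
As a rough illustration of how the Modulefile column is meant to be used, the sketch below loads one modulefile from the Kraken list and checks that the tool ended up on PATH. It assumes the cluster runs Environment Modules or Lmod, so that a `module` shell function exists; the helper name `load_and_check` is made up for this example and is not part of the cluster setup.

```shell
#!/usr/bin/env bash
# Sketch only: load a modulefile and confirm that its binary is reachable.
# Assumption: a `module` command (Environment Modules or Lmod) is available.

load_and_check() {
    local modulefile="$1"   # e.g. bwa/0.7.15, as listed in the Modulefile column
    local binary="$2"       # executable expected on PATH after loading

    if ! command -v module >/dev/null 2>&1; then
        echo "module command not available in this shell"
        return 1
    fi

    module load "$modulefile" || return 1

    # Prints the resolved path; fails if the binary is still not on PATH.
    command -v "$binary"
}

load_and_check bwa/0.7.15 bwa || echo "bwa is not available"
```

For entries without a modulefile, the Prefix Path column suggests calling the tool by its full path or adding that directory to PATH by hand.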

Falkor HPC cluster software list

| Name | Category | Homepage | Description | Version | Modulefile |
| --- | --- | --- | --- | --- | --- |
| Abyss | assembler | http://www.bcgsc.ca/platform/bioinfo/software/abyss | ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes. | 2.0.2; 2.1.4 | module load abyss/2.0-openmpi; module load abyss/2.1.4-openmpi |
| AlignGraph | assembler | https://github.com/baoe/AlignGraph | Algorithm for secondary de novo genome assembly guided by closely related references | | module load aligngraph/latest |
| bamtools | formats toolkit | https://github.com/pezmaster31/bamtools | BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files. | 2.5.1 | module load bamtools/2.5.1 |
| bbmap/bbtools | tool suite | https://jgi.doe.gov/data-and-tools/bbtools/ | BBTools is a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. BBTools can handle common sequencing file formats such as fastq, fasta, sam, scarf, fasta+qual, compressed or raw, with autodetection of quality encoding and interleaving. | 38.08 | module load bbmap/38.08 |
| bcftools | data toolkit | https://samtools.github.io/bcftools/bcftools.html | BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. | 1.7.1 | module load bcftools/1.7.1 |
| bedtools | analysis toolkit | http://bedtools.readthedocs.io/en/latest/ | Collectively, the bedtools utilities are a swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line. | 2.27.1 | module load bedtools/2.27.1 |
| bloomtree | sequence alignment | http://www.cs.cmu.edu/~ckingsf/software/bloomtree/ | | 0.3.5 | |
| bowtie1 | sequence alignment | http://bowtie-bio.sourceforge.net/index.shtml | Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end). | 1.2.2 | module load bowtie/1.2.2 (loads correct PATH) |
| bowtie2 | sequence alignment | http://bowtie-bio.sourceforge.net/bowtie2/index.shtml | Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes. | 2.3.4.1 | module load bowtie/2.3.4.1 (loads correct PATH) |
| bwa | sequence alignment | http://bio-bwa.sourceforge.net | BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are for longer sequences ranging from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads. | 0.7.10; 0.7.15 | module load bwa/0.7.17 |
| canu | assembler | https://github.com/marbl/canu | Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION). | 1.7.1 | module load canu/1.7.1 |
| CDHIT | sequence analysis | http://weizhongli-lab.org/cd-hit/ | CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences. | 4.6.8 | module load cdhit/4.6.8 |
| diamond | sequence alignment | https://github.com/bbuchfink/diamond | DIAMOND is a sequence aligner for protein and translated DNA searches and functions as a drop-in replacement for the NCBI BLAST software tools. It is suitable for protein-protein search as well as DNA-protein search on short reads and longer sequences including contigs and assemblies, providing a speedup of BLAST ranging up to x20,000. | 0.9.22 | module load diamond/0.9.22 |
| FastQC | raw data analysis | https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ | A quality control tool for high throughput sequence data. | 0.11.5 | os package |
| FreeBayes | alignment tool | https://github.com/ekg/freebayes | FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment. | 1.1.0 git branch | module load freebayes/1.1.0 |
| gapfiller | assembler | https://sourceforge.net/projects/gapfiller/ | GapFiller is a seed-and-extend local assembler to fill the gap within paired reads. It can be used for both DNA and RNA and it has been tested on Illumina data. GapFiller can be used whenever a sequence is to be assembled starting from reads lying on its ends, provided a loose estimate of sequence length. | 2.1.1 | module load gapfiller/2.1.1 |
| gmap | alignment and mapping tool | http://research-pub.gene.com/gmap/ | Gmap is a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models. | latest 01.2018 | module load gmap/latest |
| gromacs | molecular dynamics | http://www.gromacs.org/ | GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. | 5.1 | module load gromacs/5.1 |
| HMMER | sequence alignment | http://hmmer.org/ | HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs). | 3.1b2 | module load hmmer/3.1.2 |
| Hisat | sequence alignment/mapping | https://ccb.jhu.edu/software/hisat2/index.shtml | HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as to a single reference genome). | 2.1.0 | module load hisat/2.1.0 |
| Interproscan | sequence analysis | https://www.ebi.ac.uk/interpro/download.html | InterPro is a resource that provides functional analysis of protein sequences by classifying them into families and predicting the presence of domains and important sites. To classify proteins in this way, InterPro uses predictive models, known as signatures, provided by several different databases (referred to as member databases) that make up the InterPro consortium. | 5.29 | module load interproscan/5.29 |
| mafft | sequence alignment | https://mafft.cbrc.jp/alignment/software/source.html | Multiple alignment program for amino acid or nucleotide sequences | 7.397 | module load mafft/7.397 |
| matam | assembler | https://github.com/bonsai-team/matam | MATAM, a software dedicated to the fast and accurate targeted assembly of short reads sequenced from a genomic marker of interest. The method implements a stepwise process based on construction and analysis of a read overlap graph. | | module load matam/latest |
| mira | assembler | http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html | MIRA is a multi-pass DNA sequence data assembler/mapper for whole genome and EST/RNASeq projects. Supports Sanger, Illumina, Ion Torrent, 454. | 4.0.4 | module load mira/4.0.4 |
| MPICH2 (hydra process manager) | mpi library | https://www.mpich.org | MPICH is a high performance and widely portable implementation of the Message Passing Interface (MPI) standard. | 3.2 | module load mpi/mpich2 |
| MPICH2 (--with-pm=none --with-pmi=slurm) | mpi library | https://www.mpich.org | MPICH is a high performance and widely portable implementation of the Message Passing Interface (MPI) standard. | 3.2 | module load mpi/mpich2-slurm |
| MrBayes | sequence analysis | http://mrbayes.sourceforge.net | MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters. | 3.2.6 | module load mrbayes/3.2.7; module load mrbayes/3.2.7-openmpi |
| mummer | sequence alignment | http://mummer.sourceforge.net/ | Ultra-fast alignment of large-scale DNA and protein sequences | 4.0 | module load mummer/4.0 |
| muscle | sequence alignment | https://www.drive5.com/muscle/ | MUSCLE is one of the most widely-used methods in biology. On average, MUSCLE is cited by ten new papers every day. | 3.8.31 | module load muscle/3.8.31 |
| NCBI Blast | sequence alignment | https://blast.ncbi.nlm.nih.gov/Blast.cgi | Blast is an acronym for Basic Local Alignment Search Tool. BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance. | 2.7.1 | |
| OpenMPI | mpi library | https://www.open-mpi.org | The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. | 2.1 | module load mpi/openmpi |
| Pairagon | sequence alignment | | A pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. | 1.1 | module load pairagon/1.1 |
| Qiime2 | sequence analysis | https://qiime2.org/ | QIIME 2™ is a next-generation microbiome bioinformatics platform that is extensible, free, open source, and community developed. | 2.0 | module load qiime/2 |
| R | programming language | https://www.r-project.org/ | R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. | 3.5.1 | module load R/3.5.1 |
| salmon | rna-seq | https://salmon.readthedocs.io/en/latest/salmon.html | Salmon is a tool for wicked-fast transcript quantification from RNA-seq data. | 0.11.3 | module load salmon/0.11.3 |
| snowball | assembler | https://github.com/algbioi/snowball/wiki | | 1.2 | module load snowball/1.2 |
| Spades | sequence assembly toolkit | http://cab.spbu.ru/software/spades/ | SPAdes – St. Petersburg genome assembler – is an assembly toolkit containing various assembly pipelines. The current version of SPAdes works with Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. You can also provide additional contigs that will be used as long reads. | 3.12 | module load spades/3.12 |
| STAR | sequence aligner | https://github.com/alexdobin/STAR | ultrafast universal RNA-seq aligner | 2.6.0c | module load STAR/2.6.0c |
| TensorFlow | machine learning framework | https://www.tensorflow.org/ | Machine learning framework | 1.8 | |
| TopHat | sequence aligner | https://ccb.jhu.edu/software/tophat/index.shtml | TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. | 2.1.1 | module load tophat/2.1.1 |
| Trimmomatic | sequence toolkit | http://www.usadellab.org/cms/?page=trimmomatic | A flexible read trimming tool for Illumina NGS data | 0.38 | module load trimmomatic/0.38 |
| velvet | assembler | https://www.ebi.ac.uk/~zerbino/velvet/ | Sequence assembler for very short reads | 1.2.10 | module load velvet/1.2.10 |
| vica | sequence analysis | https://github.com/USDA-ARS-GBRU/vica | Software to identify highly divergent DNA and RNA viruses and phages in microbiomes | | module load vica/latest |
| virsorter | sequence analysis | https://github.com/simroux/VirSorter | VirSorter: mining viral signal from microbial genomic data | git master branch at nov. 2018 | module load virsorter/latest |
| Vsearch | sequence alignment | https://github.com/torognes/vsearch | VSEARCH stands for vectorized search, as the tool takes advantage of parallelism in the form of SIMD vectorization as well as multiple threads to perform accurate alignments at high speed. VSEARCH uses an optimal global aligner (full dynamic programming Needleman-Wunsch), in contrast to USEARCH which by default uses a heuristic seed and extend aligner. This usually results in more accurate alignments and overall improved sensitivity (recall) with VSEARCH, especially for alignments with gaps. | 2.6.2 | module load vsearch/2.6.2 |
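
The list includes an MPICH build configured with `--with-pm=none --with-pmi=slurm`; such builds are normally started through SLURM's `srun` rather than `mpiexec`. The sketch below writes a small batch script that loads `mpi/mpich2-slurm` and launches a program that way. The job name, task count, time limit and `./my_mpi_program` are placeholders invented for this example, and the site's actual partition or account settings are not covered by this page.

```shell
#!/usr/bin/env bash
# Sketch only: generate a SLURM batch script that uses the SLURM-aware MPICH build.
# Assumptions: job name, task count, time limit and ./my_mpi_program are hypothetical.

cat > mpi_job.sh <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=mpi-test
#SBATCH --ntasks=4
#SBATCH --time=00:10:00

# Start from a clean environment, then load the MPI library from the list.
module purge
module load mpi/mpich2-slurm

# MPICH built with --with-pmi=slurm relies on srun to start the ranks.
srun ./my_mpi_program
EOF

echo "wrote mpi_job.sh; submit with: sbatch mpi_job.sh"
```

Writing the script to a file first makes it easy to inspect before it is submitted with `sbatch mpi_job.sh`.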