The HPC cluster provides a collection of software, mainly for bioinformatics, along with a set of general-purpose computational libraries.

Falkor HPC cluster software list

| Name | Category | Homepage | Description | Version | Modulefile |
|---|---|---|---|---|---|
| ABySS | assembler | http://www.bcgsc.ca/platform/bioinfo/software/abyss | ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes. | 2.0.2, 2.1.4 | module load abyss/2.0-openmpi, module load abyss/2.1.4-openmpi |
| AlignGraph | assembler | https://github.com/baoe/AlignGraph | Algorithm for secondary de novo genome assembly guided by closely related references. | | module load aligngraph/latest |
| bamtools | formats toolkit | https://github.com/pezmaster31/bamtools | BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files. | 2.5.1 | module load bamtools/2.5.1 |
| bbmap/bbtools | tool suite | https://jgi.doe.gov/data-and-tools/bbtools/ | BBTools is a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. BBTools can handle common sequencing file formats such as fastq, fasta, sam, scarf, fasta+qual, compressed or raw, with autodetection of quality encoding and interleaving. | 38.08 | module load bbmap/38.08 |
| bcftools | data toolkit | https://samtools.github.io/bcftools/bcftools.html | BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. | 1.7.1 | module load bcftools/1.7.1 |
| bedtools | analysis toolkit | http://bedtools.readthedocs.io/en/latest/ | Collectively, the bedtools utilities are a swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line. | 2.27.1 | module load bedtools/2.27.1 |
| bloomtree | sequence alignment | http://www.cs.cmu.edu/~ckingsf/software/bloomtree/ | | 0.3.5 | |
| bowtie1 | sequence alignment | http://bowtie-bio.sourceforge.net/index.shtml | Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end). | 1.2.2 | module load bowtie/1.2.2 (loads correct PATH) |
| bowtie2 | sequence alignment | http://bowtie-bio.sourceforge.net/bowtie2/index.shtml | Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes. | 2.3.4.1 | module load bowtie/2.3.4.1 (loads correct PATH) |
| bwa | sequence alignment | http://bio-bwa.sourceforge.net | BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are for longer sequences ranging from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads. | 0.7.10, 0.7.15 | module load bwa/0.7.17 |
| canu | assembler | https://github.com/marbl/canu | Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION). | 1.7.1 | module load canu/1.7.1 |
| CDHIT | sequence analysis | http://weizhongli-lab.org/cd-hit/ | CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences. | 4.6.8 | module load cdhit/4.6.8 |
| diamond | sequence alignment | https://github.com/bbuchfink/diamond | DIAMOND is a sequence aligner for protein and translated DNA searches and functions as a drop-in replacement for the NCBI BLAST software tools. It is suitable for protein-protein search as well as DNA-protein search on short reads and longer sequences including contigs and assemblies, providing a speedup over BLAST of up to 20,000x. | 0.9.22 | module load diamond/0.9.22 |
| FastQC | raw data analysis | https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ | A quality control tool for high throughput sequence data. | 0.11.5 | os package |
| FreeBayes | alignment tool | https://github.com/ekg/freebayes | FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment. | 1.1.0 git branch | module load freebayes/1.1.0 |
| gapfiller | assembler | https://sourceforge.net/projects/gapfiller/ | GapFiller is a seed-and-extend local assembler to fill the gap within paired reads. It can be used for both DNA and RNA and it has been tested on Illumina data. GapFiller can be used whenever a sequence is to be assembled starting from reads lying on its ends, provided a loose estimate of sequence length. | 2.1.1 | module load gapfiller/2.1.1 |
| gmap | alignment and mapping tool | http://research-pub.gene.com/gmap/ | Gmap is a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models. | latest 01.2018 | module load gmap/latest |
| gromacs | molecular dynamics | http://www.gromacs.org/ | GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. | 5.1 | module load gromacs/5.1 |
| HMMER | sequence alignment | http://hmmer.org/ | HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs). | 3.1b2 | module load hmmer/3.1.2 |
| Hisat | sequence alignment/mapping | https://ccb.jhu.edu/software/hisat2/index.shtml | HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as to a single reference genome). | 2.1.0 | module load hisat/2.1.0 |
| Interproscan | sequence analysis | https://www.ebi.ac.uk/interpro/download.html | InterPro is a resource that provides functional analysis of protein sequences by classifying them into families and predicting the presence of domains and important sites. To classify proteins in this way, InterPro uses predictive models, known as signatures, provided by several different databases (referred to as member databases) that make up the InterPro consortium. | 5.29 | module load interproscan/5.29 |
| mafft | sequence alignment | https://mafft.cbrc.jp/alignment/software/source.html | Multiple alignment program for amino acid or nucleotide sequences. | 7.397 | module load mafft/7.397 |
| matam | assembler | https://github.com/bonsai-team/matam | MATAM is software dedicated to the fast and accurate targeted assembly of short reads sequenced from a genomic marker of interest. The method implements a stepwise process based on construction and analysis of a read overlap graph. | | module load matam/latest |
| mira | assembler | http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html | MIRA is a multi-pass DNA sequence data assembler/mapper for whole genome and EST/RNASeq projects. Supports Sanger, Illumina, Ion Torrent, 454. | 4.0.4 | module load mira/4.0.4 |
| MPICH2 (hydra process manager) | mpi library | https://www.mpich.org | MPICH is a high performance and widely portable implementation of the Message Passing Interface (MPI) standard. | 3.2 | module load mpi/mpich2 |
| MPICH2 (--with-pm=none --with-pmi=slurm) | mpi library | https://www.mpich.org | MPICH is a high performance and widely portable implementation of the Message Passing Interface (MPI) standard. | 3.2 | module load mpi/mpich2-slurm |
| MrBayes | sequence analysis | http://mrbayes.sourceforge.net | MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters. | 3.2.6 | module load mrbayes/3.2.7, module load mrbayes/3.2.7-openmpi |
| mummer | sequence alignment | http://mummer.sourceforge.net/ | Ultra-fast alignment of large-scale DNA and protein sequences. | 4.0 | module load mummer/4.0 |
| muscle | sequence alignment | https://www.drive5.com/muscle/ | MUSCLE is one of the most widely used methods in biology. On average, MUSCLE is cited by ten new papers every day. | 3.8.31 | module load muscle/3.8.31 |
| NCBI Blast | sequence alignment | https://blast.ncbi.nlm.nih.gov/Blast.cgi | BLAST is an acronym for Basic Local Alignment Search Tool. BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance. | 2.7.1 | |
| OpenMPI | mpi library | https://www.open-mpi.org | The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. | 2.1 | module load mpi/openmpi |
| Pairagon | sequence alignment | | A pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. | 1.1 | module load pairagon/1.1 |
| Qiime2 | sequence analysis | https://qiime2.org/ | QIIME 2™ is a next-generation microbiome bioinformatics platform that is extensible, free, open source, and community developed. | 2.0 | module load qiime/2 |
| R | programming language | https://www.r-project.org/ | R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. | 3.5.1 | module load R/3.5.1 |
| salmon | rna-seq | https://salmon.readthedocs.io/en/latest/salmon.html | Salmon is a tool for wicked-fast transcript quantification from RNA-seq data. | 0.11.3 | module load salmon/0.11.3 |
| snowball | assembler | https://github.com/algbioi/snowball/wiki | | 1.2 | module load snowball/1.2 |
| Spades | sequence assembly toolkit | http://cab.spbu.ru/software/spades/ | SPAdes (St. Petersburg genome assembler) is an assembly toolkit containing various assembly pipelines. The current version of SPAdes works with Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. You can also provide additional contigs that will be used as long reads. | 3.12 | module load spades/3.12 |
| STAR | sequence aligner | https://github.com/alexdobin/STAR | Ultrafast universal RNA-seq aligner. | 2.6.0c | module load STAR/2.6.0c |
| TensorFlow | machine learning framework | https://www.tensorflow.org/ | Machine learning framework. | 1.8 | |
| TopHat | sequence aligner | https://ccb.jhu.edu/software/tophat/index.shtml | TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. | 2.1.1 | module load tophat/2.1.1 |
| Trimmomatic | sequence toolkit | http://www.usadellab.org/cms/?page=trimmomatic | A flexible read trimming tool for Illumina NGS data. | 0.38 | module load trimmomatic/0.38 |
| velvet | assembler | https://www.ebi.ac.uk/~zerbino/velvet/ | Sequence assembler for very short reads. | 1.2.10 | module load velvet/1.2.10 |
| vica | sequence analysis | https://github.com/USDA-ARS-GBRU/vica | Software to identify highly divergent DNA and RNA viruses and phages in microbiomes. | | module load vica/latest |
| virsorter | sequence analysis | https://github.com/simroux/VirSorter | VirSorter: mining viral signal from microbial genomic data. | git master branch as of Nov. 2018 | module load virsorter/latest |
| Vsearch | sequence alignment | https://github.com/torognes/vsearch | VSEARCH stands for vectorized search, as the tool takes advantage of parallelism in the form of SIMD vectorization as well as multiple threads to perform accurate alignments at high speed. VSEARCH uses an optimal global aligner (full dynamic programming Needleman-Wunsch), in contrast to USEARCH which by default uses a heuristic seed and extend aligner. This usually results in more accurate alignments and overall improved sensitivity (recall) with VSEARCH, especially for alignments with gaps. | 2.6.2 | module load vsearch/2.6.2 |
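
The module load commands listed above can be combined when a job needs several packages. The following is a minimal sketch, not a site-provided tool: the `load_modules` helper name is hypothetical, and it assumes that `module` is made available by the cluster's environment-modules setup and that `module load` returns a non-zero status on failure (some older releases may not).

```shell
# load_modules (hypothetical helper): load each modulefile given as an
# argument, stopping at the first one that fails.
# Assumption: `module` is provided by the cluster's environment-modules setup.
load_modules() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found (not on a cluster node?)" >&2
        return 1
    fi
    for modulefile in "$@"; do
        if ! module load "$modulefile"; then
            echo "could not load $modulefile" >&2
            return 1
        fi
    done
}
```

For example, calling `load_modules bwa/0.7.17 bcftools/1.7.1` near the top of a job script would stop early if either modulefile is unavailable. Running `module list` afterwards shows what is currently loaded.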