News

2010-02-01 - MOSAIK 1.0.1388 was released. This release recalculates the alignment qualities in MosaikSort for paired-end/mate-pair reads. A bug was fixed where the first mate or second mate flag was not set for orphaned paired-end reads when creating SAM and BAM files.

2010-01-24 - MOSAIK 1.0.1384 was released. More and more people are experimenting with mate-pair libraries on the Illumina platform (which traditionally uses a paired-end library protocol). MosaikSort has now been updated to autodetect whether a mate-pair or paired-end library is being used, regardless of the platform (Illumina, Roche 454, AB SOLiD, etc.). Enjoy!

2010-01-13 - MOSAIK 1.0.1370 was released. This version fixes a bug in the SAM file output where padded bases were incorrectly being written. It also fixes a bug where the NM flag in the BAM file output overflowed into negative numbers. SAM and BAM support now works correctly with samtools.

2009-11-19 - MOSAIK 1.0.1367 was released. This version contains quite a few bug fixes, but the colorspace translation is still being fixed.

2009-10-14 - MOSAIK 1.0.1307 was released. Our first open source release (the first closed source release was made way back in 2007).

Introduction

MOSAIK is a reference-guided assembler developed by Michael Strömberg, comprising four main modular programs:

MosaikBuild
MosaikAligner
MosaikSort
MosaikAssembler

MosaikBuild converts various sequence formats into Mosaik’s native read format. MosaikAligner pairwise aligns each read to a specified series of reference sequences. MosaikSort resolves paired-end reads and sorts the alignments by the reference sequence coordinates. Finally, MosaikAssembler parses the sorted alignment archive and produces a multiple sequence alignment that is then saved in an assembly file format. At this time, the workflow consists of supplying sequences in FASTA, FASTQ, Illumina Bustard & Gerald, or SRF file formats and producing assembly files (phrap ace and GigaBayes gig formats), which can be viewed with utilities such as consed or EagleView.

What's new?

Alignment is now much faster overall thanks to a banded Smith-Waterman algorithm implementation by Wan-Ping Lee. Longer Roche 454 reads in particular align much faster than before.

Alignment qualities have been studied heavily and two separate logistic regression models were created to increase the accuracy and usefulness of our alignment qualities.

A local alignment search option has been added to help rescue mates in paired-end/mate-pair reads that may be missing due to highly repetitive regions in the genome.

SOLiD support has finally come of age. MOSAIK imports and aligns SOLiD reads in colorspace, but now seamlessly converts the alignments back into basespace. No more downstream bioinformatics headaches.
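To illustrate what converting between colorspace and basespace involves, here is a minimal Python sketch of the standard two-base colour encoding, in which each colour is the XOR of the 2-bit codes of two adjacent bases. This is not MOSAIK's code: the function names are hypothetical, and the leading letter is treated as the first base of the sequence for simplicity (in real SOLiD reads it is the last base of the primer).

```python
# Illustrative sketch only; names and structure are assumptions, not MOSAIK's code.

# 2-bit base codes: the colour between two adjacent bases is the XOR of their codes.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colorspace(bases):
    """Return the first base followed by one colour digit per adjacent base pair."""
    colors = [
        str(BASE_TO_CODE[a] ^ BASE_TO_CODE[b])
        for a, b in zip(bases, bases[1:])
    ]
    return bases[0] + "".join(colors)


def decode_colorspace(read):
    """Walk the colours from the known leading base, recovering one base per colour."""
    code = BASE_TO_CODE[read[0]]
    bases = [read[0]]
    for color in read[1:]:
        code ^= int(color)  # the next base is the current base XOR the colour
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


if __name__ == "__main__":
    print(encode_colorspace("ATGGC"))  # A3103
    print(decode_colorspace("A3103"))  # ATGGC
    print(decode_colorspace("A3203"))  # ATCCG (one wrong colour)
```

Note that a single wrong colour (the second digit in the last call) changes every base downstream of it when the read is decoded naively. This is one reason to align in colorspace first and translate the finished alignment afterwards.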

Robust support for the SAM & BAM alignment file formats.

The command line parameters have been cleaned up and sensible default parameters have been chosen. This cuts down the ridiculously long command-lines to simply specifying an input file and an output file in most cases.

What makes MOSAIK different?

Unlike many current read aligners, MOSAIK produces gapped alignments using the Smith-Waterman algorithm. Additionally, our program goes beyond producing pairwise alignments and produces reference-guided assemblies with gapped alignments. These features make it ideal for downstream single nucleotide polymorphism (SNP) and short insertion/deletion (INDEL) discovery.
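As an illustration of gapped local alignment, the following is a minimal Python sketch of Smith-Waterman with an optional band around the main diagonal. It is not MOSAIK's implementation: the function name, the scoring values, the linear gap penalty, and the choice of centring the band on the main diagonal are all assumptions made for brevity.

```python
# Illustrative sketch only; scoring and band placement are assumptions.

def smith_waterman(query, ref, match=2, mismatch=-1, gap=-2, band=None):
    """Local alignment; if band is set, only cells with |i - j| <= band are filled."""

    def sub(i, j):
        # Substitution score for query[i-1] against ref[j-1].
        return match if query[i - 1] == ref[j - 1] else mismatch

    rows, cols = len(query) + 1, len(ref) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        lo, hi = 1, cols - 1
        if band is not None:
            lo, hi = max(1, i - band), min(cols - 1, i + band)
        for j in range(lo, hi + 1):
            score[i][j] = max(
                0,                          # start a new local alignment
                score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                score[i - 1][j] + gap,      # gap in the reference
                score[i][j - 1] + gap,      # gap in the query
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    aligned_query, aligned_ref = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            aligned_query.append(query[i - 1])
            aligned_ref.append(ref[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_query.append(query[i - 1])
            aligned_ref.append("-")
            i -= 1
        else:
            aligned_query.append("-")
            aligned_ref.append(ref[j - 1])
            j -= 1

    return best, "".join(reversed(aligned_query)), "".join(reversed(aligned_ref))


if __name__ == "__main__":
    # The read is missing one base relative to the reference.
    print(smith_waterman("GATTACGATTACA", "GATTACAGATTACA"))
```

With these assumed scores, the example read aligns as GATTAC-GATTACA against GATTACAGATTACA: the one-base deletion is reported as a gap, which an ungapped aligner would miss. Restricting the fill to a band reduces the work from the full matrix to roughly the band width times the read length, at the cost of missing alignments that stray outside the band.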

MOSAIK is written in highly portable C++ and currently targets the following platforms: Microsoft Windows, Apple Mac OS X, FreeBSD, and Linux. Other platforms can easily be supported upon request.

MOSAIK is multithreaded. If you have a machine with 8 processors, you can use all 8 processors to align reads faster while using the same memory footprint as when using one processor.

MOSAIK supports multiple sequencing technologies. In addition to legacy technologies such as Sanger capillary sequencing, our program supports next-generation technologies such as Roche 454, Illumina, and AB SOLiD, with experimental support for the Helicos Heliscope.

Features

aligns a large range of read lengths
from short Illumina reads to medium 454 reads to long legacy Sanger reads

can create an assembly with multiple sequencing technologies (Illumina, Roche 454, Helicos, and Sanger)

reference-guided aligner
use an entire genome as a reference when aligning reads

gapped alignment
especially useful for insertion / deletion (indel) detection

aligns 2 million Illumina 36 bp reads against the full human genome in 8 minutes using 8 processor cores

Finally Open Source!

By popular demand, MOSAIK is now dual licensed under the GPL 2.0+ and via a commercial license available from the Marth Lab.

We are also very eager to add contributors to the MOSAIK code base. Let us know if you're interested.


GNU General Public License v2.0 or later

