

Publicly Available Software
EvoPipes.net:  Evolutionary bioinformatic pipelines for all
NU-IN:  Nucleotide evolution and input module for EvolSimulator2.1.0
SnoWhite: Cleaning pipeline for next-generation cDNA sequences
SCARF:  Scaffolded and corrected assembly of Roche 454




EvoPipes.net

EvoPipes.net provides publicly available evolutionary bioinformatic pipelines to examine patterns of gene duplication, identify orthologs, test for positive selection, and more.  Much of my software listed here is also available there.  Check it out!




NU-IN

Nucleotide Evolution and Input Module for EvolSimulator2.1.0

NU-IN is an adaptation and expansion of the EvolSimulator 2.1.0 genome evolution simulation program by Beiko and Charlebois (2007, http://bioinformatics.org.au/evolsim/).  NU-IN was designed to expand EvolSimulator in two fundamental ways:  1) allow non-coding nucleotide evolution, and 2) permit input of genomes, gene family membership, and gene 'usefulness' (the selective retention of particular loci in particular environments).  With these changes, the user can use real genomic (coding) sequence data to initiate a simulation of one or more lineages, generate mutations through SNPs and copy number variation (as well as horizontal gene transfer), evolve genomes by drift and selection, and use the output of previous simulations as starting points for further evolution.

Download Source Code and Documentation for NU-IN
(written in C++ and Perl on Ubuntu Linux)

NU-IN is free (as in beer and speech) and is licensed under the terms of the GNU General Public License.

Citation for NU-IN
Dlugosch KM, Barker MS, Rieseberg LH. NU-IN: Nucleotide evolution and input module for the EvolSimulator genome simulation platform.  BMC Research Notes, In progress

Citation for EvolSimulator2.1.0 (be sure to cite EvolSimulator along with NU-IN!)
Beiko, R.G. and Charlebois, R.L. (2007). A simulation test bed for hypotheses of genome evolution. Bioinformatics 23:825-831.
Available at: http://bioinformatics.org.au/evolsim/





SnoWhite

A Cleaning Pipeline for cDNA Sequences

SnoWhite is a pipeline that calls existing programs as well as custom scripts to flexibly and aggressively clean EST reads prior to assembly.  It takes in and returns FASTA-formatted sequence files and (optionally) quality files.  It employs several steps:

1) Adapter Clipping:  SnoWhite can clip a user-specified number of bases or clip up to a user-specified sequence tag, from either end of each sequence.

2&4) Seqclean: SnoWhite passes files to TGI's Seqclean, a relatively old but still excellent tool for trimming polyA/T tails, primer contaminants, and uninformative sequences (Ns).

3) PolyA/T Trimming:  SnoWhite provides additional trimming governed by many tunable parameters.  In short, users can set tolerances for what constitutes a polyA/T, where to look for it in the sequence, and how much error to allow.

5) TagDust:  SnoWhite optionally implements TagDust, which is designed to find sequences that are composed almost entirely of primer/adapter fragments.  These primer 'multimers' or 'concatemers' are a persistent low-abundance feature of many datasets, and are extremely difficult to remove using traditional contaminant searches.
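
As a rough illustration of the clipping and polyA/T trimming ideas in the steps above, here is a minimal Python sketch. It is not SnoWhite's own Perl code. The function names (clip_fixed, clip_to_tag_5prime, trim_poly_tail), the parameters (min_len, max_mismatch), and the choice to remove the tag along with the bases before it are all assumptions made for illustration. See the SnoWhite documentation for the actual options and behavior.

```python
def clip_fixed(seq, n_5prime=0, n_3prime=0):
    """Clip a fixed number of bases from each end of a sequence."""
    # Guard against clipping more bases than the read contains.
    end = max(len(seq) - n_3prime, n_5prime)
    return seq[n_5prime:end]


def clip_to_tag_5prime(seq, tag):
    """Clip everything up to and including the first occurrence of `tag`.

    Assumption: the tag itself is discarded. Reads without the tag
    are returned unchanged.
    """
    idx = seq.find(tag)
    return seq if idx == -1 else seq[idx + len(tag):]


def trim_poly_tail(seq, base="A", min_len=6, max_mismatch=1):
    """Trim a 3' run of `base`, tolerating a few interrupting bases.

    Scans from the 3' end toward the 5' end and stops once more than
    `max_mismatch` non-`base` characters have been seen. The tail is
    removed only if it spans at least `min_len` positions.
    """
    mismatches = 0
    tail_start = len(seq)  # index where the candidate tail begins
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] == base:
            tail_start = i
        else:
            mismatches += 1
            if mismatches > max_mismatch:
                break
    if len(seq) - tail_start >= min_len:
        return seq[:tail_start]
    return seq


if __name__ == "__main__":
    read = "TTGACCACGTGCAAAATAAAA"          # hypothetical read
    read = clip_to_tag_5prime(read, "GACC")  # drop the 5' tag
    read = trim_poly_tail(read)              # drop the interrupted polyA tail
    print(read)                              # ACGTGC
```

The mismatch allowance is what lets a tail such as AAAATAAAA be treated as a single polyA run despite one sequencing error, which is the kind of tolerance the tunable parameters in step 3 are described as controlling.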

Download Program v1.1.4 and Documentation
(written in Perl on Ubuntu Linux)

SnoWhite is free (as in beer and speech) and is licensed under the terms of the GNU General Public License.

Data Types:
454:  SnoWhite was written for Roche 454 data, and is ideal for this.
Illumina & SOLiD:  May require large amounts of RAM (e.g. >32GB for a large dataset).
Sanger:  Note that TagDust evaluates only the first 999bp of sequence,
    and TagDust does not tolerate vector sequences >2000nt.

Improvements / Issues:
v1.1.4 Additional clipping options, including specification of adapter sequences
v1.1.3 Trims terminal 'X' characters that may remain from user pre-processing
    (e.g. with vector masking by cross_match, most of which is clipped by Seqclean).
v1.1.2 Works around the 999 bp limit inherent in TagDust (see Readme file).
v1.1.1 Addresses a bug in the quality file editing after the TagDust step in v1.1.0.





SCARF

Scaffolded and Corrected Assembly of Roche 454

A next-generation sequence assembly tool for evolutionary genomics, designed especially for assembling 454 EST sequences against high-quality reference sequences from related species.

SCARF was created in order to knit together low-coverage 454 contigs that do not assemble during traditional de novo assembly, using a reference sequence library to orient the 454 sequences. SCARF is especially well suited for non-contiguous or low depth data sets such as EST (expressed sequence tag) libraries. SCARF can also be used to sort and assemble a pool of 454 sequence data according to a set of reference sequences (e.g. for metagenomics). See the documentation for a full description of the methodology behind SCARF. 

Barker, M. S., K. M. Dlugosch, A. C. C. Reddy, S. N. Amyotte, and L. H. Rieseberg. 2009. SCARF: Maximizing next-generation EST assemblies for evolutionary and population genomic analyses. Bioinformatics 25(4): 535-536.

More Information and Downloads at http://evopipes.net



All contents © Copyright 2010 Katrina M Dlugosch