Publicly Available Software
EvoPipes.net: Evolutionary bioinformatic pipelines for all
NU-IN: Nucleotide evolution and input module for EvolSimulator2.1.0
SnoWhite: Cleaning pipeline for next-generation cDNA sequences
SCARF: Scaffolded and corrected assembly of Roche 454

EvoPipes.net provides publicly available evolutionary bioinformatic pipelines to
examine patterns of gene duplication, identify orthologs, test for
positive selection, and more. Much of my software listed here is also available there. Check it out!
Nucleotide Evolution and Input Module for EvolSimulator2.1.0
NU-IN is an adaptation and expansion of the EvolSimulator 2.1.0 genome evolution simulation program by Beiko and Charlebois (2007, http://bioinformatics.org.au/evolsim/). NU-IN was designed to expand EvolSimulator in two fundamental ways: 1) to allow non-coding nucleotide evolution, and 2) to permit input of genomes, gene family membership, and gene 'usefulness' (the selective retention of particular loci in particular environments). With these changes, users can initiate a simulation of one or more lineages from real genomic (coding) sequence data, generate mutations through SNPs and copy number variation (as well as horizontal gene transfer), evolve genomes by drift and selection, and use the output of previous simulations as starting points for further evolution.
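To make the mutation side of this concrete, here is a minimal Python sketch of forward simulation with SNPs and copy number variation (gene duplication) applied to an input genome. It is not NU-IN's or EvolSimulator's model: the uniform substitution scheme, the per-generation rates, and the names `mutate_snps`, `duplicate_loci`, and `evolve` are all hypothetical, and drift, selection, and horizontal gene transfer are left out. It does illustrate one feature described above: the output of one run can be fed back in as the starting genome of the next.

```python
import random
from typing import Dict

BASES = "ACGT"


def mutate_snps(seq: str, rate: float, rng: random.Random) -> str:
    """Hypothetical SNP model: each A/C/G/T site independently mutates with
    probability `rate` to one of the other three bases (uniform, Jukes-Cantor
    style). Non-ACGT characters are left untouched."""
    out = []
    for base in seq:
        if base in BASES and rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def duplicate_loci(genome: Dict[str, str], dup_rate: float,
                   rng: random.Random) -> Dict[str, str]:
    """Hypothetical copy number variation: each locus is duplicated with
    probability `dup_rate`; the copy receives a new, generated name."""
    new = dict(genome)
    for name, seq in genome.items():
        if rng.random() < dup_rate:
            new[f"{name}_dup{len(new)}"] = seq
    return new


def evolve(genome: Dict[str, str], generations: int, snp_rate: float,
           dup_rate: float = 0.0, seed: int = 0) -> Dict[str, str]:
    """Run one lineage forward in time. The returned genome can be passed
    back in as the starting point of a further simulation."""
    rng = random.Random(seed)
    current = dict(genome)
    for _ in range(generations):
        current = duplicate_loci(current, dup_rate, rng)
        current = {name: mutate_snps(s, snp_rate, rng)
                   for name, s in current.items()}
    return current
```

For example, `evolve(evolve(start, 100, 0.001, dup_rate=0.01, seed=1), 100, 0.001, seed=2)` continues a lineage from the genome produced by the first run, mirroring the "output as a new starting point" workflow.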
Download Source Code and Documentation for NU-IN (written in C++ and Perl on Ubuntu Linux)
NU-IN is free (as in beer and speech) and is licensed under the terms of the GNU General Public License.
Citation for NU-IN
Dlugosch KM, Barker MS, Rieseberg LH. NU-IN: Nucleotide evolution and input module for the EvolSimulator genome simulation platform. BMC Research Notes, in progress.
Citation for EvolSimulator2.1.0
Be sure to cite EvolSimulator along with NU-IN!
Beiko RG, Charlebois RL. 2007. A simulation test bed for hypotheses of genome evolution. Bioinformatics 23:825-831.
Available at: http://bioinformatics.org.au/evolsim/
SnoWhite
A Cleaning Pipeline for cDNA Sequences
SnoWhite is a pipeline that calls existing programs as well as custom scripts to flexibly and aggressively clean EST reads prior to assembly. It takes in and returns FASTA-formatted sequence files and, optionally, quality files. It employs several steps:
1) Adapter Clipping: SnoWhite can clip a user-specified number of bases, or clip up to a user-specified sequence tag, from either end of each sequence.
2&4) Seqclean: SnoWhite passes files to TGI's Seqclean, a relatively old but still excellent tool for trimming polyA/T tails, primer contaminants, and uninformative sequences (Ns).
3) PolyA/T Trimming: SnoWhite provides additional trimming governed by many tunable parameters. In short, users can set tolerances for what constitutes a polyA/T tail, where to look for it in the sequence, and how much error to allow.
5) TagDust: SnoWhite optionally runs TagDust, which is designed to find sequences composed almost entirely of primer/adapter fragments. These primer 'multimers' or 'concatemers' are a persistent, low-abundance feature of many datasets and are extremely difficult to remove using traditional contaminant searches.
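Two of the steps above, adapter clipping (step 1) and tunable polyA/T trimming (step 3), can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SnoWhite's Perl code: the parameter names (`tag`, `n5`, `n3`, `max_mismatch`, `min_len`) are hypothetical, the tag is assumed to sit toward the 5' end and is clipped together with everything before it, polyT is sought only at the 5' end and polyA only at the 3' end (the "where to look" option is not modeled), and quality scores are held as a list of integers trimmed in step with the sequence. Seqclean and TagDust are external programs and are not reproduced here.

```python
from typing import List, Optional, Tuple

Read = Tuple[str, List[int]]  # (sequence, per-base quality scores)


def clip_fixed(seq: str, qual: List[int], n5: int = 0, n3: int = 0) -> Read:
    """Clip a fixed number of bases from the 5' and/or 3' end."""
    end = max(n5, len(seq) - n3)
    return seq[n5:end], qual[n5:end]


def clip_through_tag(seq: str, qual: List[int], tag: str) -> Read:
    """Assumed behavior: drop everything up to and including the first exact
    match of `tag`. Reads without the tag are returned unchanged."""
    idx = seq.find(tag)
    if idx == -1:
        return seq, qual
    cut = idx + len(tag)
    return seq[cut:], qual[cut:]


def tail_length(seq: str, base: str, max_mismatch: int, min_len: int) -> int:
    """Length of the longest 3' suffix that starts on `base` and contains at
    most `max_mismatch` other characters; 0 if it is shorter than `min_len`."""
    mismatches = 0
    best = 0
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] == base:
            best = len(seq) - i  # suffix seq[i:] begins on a matching base
        else:
            mismatches += 1
            if mismatches > max_mismatch:
                break
    return best if best >= min_len else 0


def trim_poly_at(seq: str, qual: List[int], max_mismatch: int = 1,
                 min_len: int = 8) -> Read:
    """Trim a 5' polyT head and a 3' polyA tail, keeping qualities in step."""
    head = tail_length(seq[::-1], "T", max_mismatch, min_len)
    seq, qual = seq[head:], qual[head:]
    end = len(seq) - tail_length(seq, "A", max_mismatch, min_len)
    return seq[:end], qual[:end]


def clean_read(seq: str, qual: List[int], tag: Optional[str] = None,
               n5: int = 0, n3: int = 0, max_mismatch: int = 1,
               min_len: int = 8) -> Read:
    """Hypothetical per-read pass: tag clipping, fixed clipping, polyA/T trim."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if tag:
        seq, qual = clip_through_tag(seq, qual, tag)
    seq, qual = clip_fixed(seq, qual, n5, n3)
    return trim_poly_at(seq, qual, max_mismatch, min_len)
```

Counting mismatches outward from the read end lets a single stray base inside a tail be tolerated, while recording a tail length only when it starts on the target base keeps the trimmed region from beginning on a non-A/T character.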
Download Program v1.1.4 and Documentation (written in Perl on Ubuntu Linux)
SnoWhite is free (as in beer and speech) and is licensed under the terms of the GNU General Public License.
Data Types:
454: SnoWhite was written for Roche 454 data and is ideal for it.
Illumina & SOLiD: May require large amounts of RAM (e.g., >32 GB for a large dataset).
Sanger: Note that TagDust evaluates only the first 999 bp of each sequence and does not tolerate vector sequences >2000 nt.
Improvements / Issues:
v1.1.4: Additional clipping options, including specification of adapter sequences.
v1.1.3: Trims terminal 'X' characters that may remain from user pre-processing (e.g., vector masking by cross_match, most of which is clipped by Seqclean).
v1.1.2: Works around the 999 bp limit inherent in TagDust (see Readme file).
v1.1.1: Addresses a bug in quality-file editing after the TagDust step in v1.1.0.

SCARF
Scaffolded and Corrected Assembly of Roche 454
A next-generation sequence assembly tool for evolutionary genomics, designed especially for assembling 454 EST sequences against high-quality reference sequences from related species.
SCARF was created to knit together low-coverage 454 contigs that do not assemble during traditional de novo assembly, using a reference sequence library to orient the 454 sequences. SCARF is especially well suited to non-contiguous or low-depth data sets such as EST (expressed sequence tag) libraries. SCARF can also be used to sort and assemble a pool of 454 sequence data according to a set of reference sequences (e.g., for metagenomics). See the documentation for a full description of the methodology behind SCARF.
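The general idea of reference-guided knitting can be illustrated with a toy scaffolder. This is a minimal sketch, not SCARF's actual procedure: it assumes each contig has already been placed on a single reference by some alignment step (the hypothetical `Hit` records stand in for those results), that alignments are gapless, and that gaps between contigs are padded with `N`. Minus-strand contigs are reverse-complemented so everything reads in the reference orientation, and overlapping contigs are merged by dropping the already-covered prefix, which assumes the overlapping bases agree.

```python
from dataclasses import dataclass
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class Hit:
    """Hypothetical, precomputed placement of one contig on the reference."""
    contig_id: str
    ref_start: int  # 0-based reference position where the contig begins
    strand: str     # "+" or "-"


def scaffold(contigs: Dict[str, str], hits: List[Hit], gap_char: str = "N") -> str:
    """Orient contigs, order them by reference position, and join them."""
    result = ""
    end = None  # reference coordinate just past the last placed base
    for h in sorted(hits, key=lambda x: x.ref_start):
        seq = contigs[h.contig_id]
        if h.strand == "-":
            seq = revcomp(seq)
        if end is None:
            result = seq
        elif h.ref_start >= end:
            # Gap between contigs: pad to keep reference spacing.
            result += gap_char * (h.ref_start - end) + seq
        else:
            # Overlap: append only the part not already covered.
            result += seq[end - h.ref_start:]
        new_end = h.ref_start + len(seq)
        end = new_end if end is None else max(end, new_end)
    return result
```

Padding gaps with `N` preserves the spacing implied by the reference, so downstream tools can still see where sequence is missing rather than having contigs silently butted together.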
Citation for SCARF
Barker MS, Dlugosch KM, Reddy ACC, Amyotte SN, Rieseberg LH. 2009. SCARF: Maximizing next-generation EST assemblies for evolutionary and population genomic analyses. Bioinformatics 25(4):535-536.
More Information and Downloads at http://evopipes.net