Chalmers University of Technology, University of Gothenburg

PSTk-Classifier

PSTk-Classifier is software written in C++ for classifying DNA
sequences using a Bayesian approach. One of three underlying models
can be selected: Naive (Nk), Markov (Mk) or Variable Length Markov
(VLMK). The classifier first constructs a profile for each group
directly from FASTA files and stores the profiles in a directory.
Sample sequences (in a multi-FASTA file) can then be scored against
the profiles, and a high-score list is presented.
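
To make the profile-then-score idea concrete, here is a minimal sketch of a fixed-order Markov classifier. It is not taken from the PSTk-Classifier source. The names MarkovProfile, train, logLikelihood and classify are hypothetical, as are the add-one smoothing and the uniform prior over groups (under which the most probable group is simply the one with the highest log-likelihood). The sketch assumes upper-case sequences and skips any window containing a character other than A, C, G or T.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch only; NOT the PSTk-Classifier source code.
// A fixed-order Markov profile over the DNA alphabet {A, C, G, T}.
class MarkovProfile {
public:
    explicit MarkovProfile(std::size_t order) : order_(order) {}

    // Count each (context, next base) pair seen in a training sequence.
    void train(const std::string& seq) {
        for (std::size_t i = order_; i < seq.size(); ++i) {
            if (!valid(seq, i - order_, i)) continue;  // skip windows with N etc.
            const std::size_t b = baseIndex(seq[i]);
            assert(b < 4);
            counts_[seq.substr(i - order_, order_)][b] += 1.0;
        }
    }

    // Log-likelihood of a sequence under this profile, using add-one
    // smoothing (an assumption of this sketch) so unseen events get
    // a small non-zero probability.
    double logLikelihood(const std::string& seq) const {
        double ll = 0.0;
        for (std::size_t i = order_; i < seq.size(); ++i) {
            if (!valid(seq, i - order_, i)) continue;
            double hit = 0.0, total = 0.0;
            const auto it = counts_.find(seq.substr(i - order_, order_));
            if (it != counts_.end()) {
                hit = it->second[baseIndex(seq[i])];
                for (double c : it->second) total += c;
            }
            ll += std::log((hit + 1.0) / (total + 4.0));
        }
        return ll;
    }

private:
    // Map a base to 0..3; return 4 for anything else.
    static std::size_t baseIndex(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return 4;
        }
    }

    // True if every character in seq[first..last] is A, C, G or T.
    static bool valid(const std::string& seq, std::size_t first, std::size_t last) {
        for (std::size_t k = first; k <= last; ++k)
            if (baseIndex(seq[k]) == 4) return false;
        return true;
    }

    std::size_t order_;
    std::map<std::string, std::array<double, 4>> counts_;
};

// Return the name of the group whose profile gives the sample the highest
// log-likelihood (equal priors assumed); empty string if there are no groups.
std::string classify(const std::vector<std::pair<std::string, MarkovProfile>>& groups,
                     const std::string& sample) {
    std::string best;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (const auto& g : groups) {
        const double s = g.second.logLikelihood(sample);
        if (s > bestScore) { bestScore = s; best = g.first; }
    }
    return best;
}
```

In this sketch, one MarkovProfile would be trained per group and the resulting (name, profile) pairs passed to classify together with each sample sequence. Sorting the groups by log-likelihood instead of keeping only the best would give a ranked list analogous to the high-score list described above.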

Documentation
A short manual describing the basic syntax is available as manual.pdf.
There is also a small example that shows how to classify
a multi-FASTA file of entries into two groups built from two single
FASTA files.

Download
Download the zipped tar-archive here.

Known errors
A list of bugs that have been fixed can be found here.

Installation
tar xvfz classifier.tar.gz (or gunzip classifier.tar.gz; tar xvf classifier.tar)
cd classifier
make

Tested on the following compilers
gcc version 3.3.1 (SuSE Linux)
gcc version 3.4.4 20050721 (Red Hat Linux 3.4.4-2)
gcc version 2.95.3 20010315 (sparc-sun-solaris 2.8, release)
gcc version 3.4.4 (Cygwin; Windows XP) (gdc 0.12, using dmd 0.125)

Directories
bin - binaries are placed here after compilation
doc - short documentation of the software
lib - libraries
src - all source code

Binaries
classifier
seq_gen
polluter
linebreaker

Help
All binaries implement the "-h" flag, which prints help with options
and examples of syntax.

Warranty
PSTk-Classifier - Software for classifying DNA sequences.

Copyright (C) 2005

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

Reference
Please cite the following paper if you use this software in any research work:

D. Dalevi, D. Dubhashi and M. Hermansson (2006)
Bayesian Classifiers for Detecting HGT using Fixed and Variable Order Markov Models of Genomic Signatures.
Bioinformatics [medline]

Contact
For questions regarding this software, feel free to contact
Daniel Dalevi, dalevi@cs.chalmers.se

Last modified
23 January 2006

Computer Science and Engineering, Chalmers University of Technology, SE-412 96 Göteborg, Sweden
Telephone: +46-(0)31 772 1044; Fax: +46-(0)31 165655

Last Modified: 26 January 2006
dalevi@chalmers.se