Use Biopython for Bioinformatics computation
Biopython is an open-source project built on top of Python and dedicated to biology-related scientific computation (mainly bioinformatics). The project was formed in 1999. Its main contributors include Jeff Chang, Brad Chapman, Iddo Friedberg, Thomas Hamelryck, Michiel de Hoon, Peter Cock, Tiago Antao, and Eric Talevich.
Biopython also took part in the 2010 Google Summer of Code, with the following projects:
- Biopython and PyCogent interoperability
- Phylogenetics pipeline development in Galaxy
- Building Python APIs for R phylogenetic toolkits
The main features of Biopython are:
- The ability to parse bioinformatics files into data structures usable from Python, including support for the following formats:
  - BLAST output, both from standalone and WWW BLAST
  - PubMed and Medline
  - ExPASy files, like Enzyme and Prosite
  - SCOP, including ‘dom’ and ‘lin’ files
- Files in the supported formats can be iterated over record by record, or indexed and accessed via a dictionary interface.
- Code to deal with popular online bioinformatics destinations such as:
  - NCBI: BLAST, Entrez and PubMed services
  - ExPASy: Swiss-Prot and Prosite entries, as well as Prosite searches
- Interfaces to common bioinformatics programs such as:
  - Standalone BLAST from NCBI
  - The ClustalW alignment program
  - EMBOSS command-line tools
- A standard sequence class that deals with sequences, sequence IDs, and sequence features.
- Tools for performing common operations on sequences, such as translation, transcription and weight calculations.
- Code to perform classification of data using k-Nearest Neighbors, Naive Bayes or Support Vector Machines.
- Code for dealing with alignments, including a standard way to create and deal with substitution matrices.
- Code making it easy to split up parallelizable tasks into separate processes.
- GUI-based programs to do basic sequence manipulations, translations, BLASTing, etc.
- Extensive documentation and help with using the modules, including the tutorial, online wiki documentation, the web site, and the mailing list.
- Integration with BioSQL, a sequence database schema also supported by the BioPerl and BioJava projects.
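
To make a few of these features concrete, here is a minimal sketch of record-by-record parsing, dictionary-style access, and the common sequence operations (transcription, translation, weight calculation). It assumes a recent Biopython release is installed. The FASTA data, the record names `seq1` and `seq2`, and the variable names are made up for illustration; FASTA is used here simply as a convenient format and is not one of the formats named in the list above.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqUtils import molecular_weight

# Hypothetical FASTA data, kept in memory so the example needs no files.
FASTA_TEXT = """>seq1 made-up example record
ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG
>seq2 made-up example record
ATGAAATAG
"""

# Iterate over the records one at a time.
for record in SeqIO.parse(StringIO(FASTA_TEXT), "fasta"):
    print(record.id, len(record.seq))

# Or load all records into a dictionary keyed by record ID.
records = SeqIO.to_dict(SeqIO.parse(StringIO(FASTA_TEXT), "fasta"))
short_seq = records["seq2"].seq

# Common operations on a sequence object.
mrna = short_seq.transcribe()                # DNA -> RNA
protein = short_seq.translate(to_stop=True)  # standard genetic code, stop codon dropped
revcomp = short_seq.reverse_complement()
weight = molecular_weight(short_seq, seq_type="DNA")  # single-stranded DNA weight

print(mrna, protein, revcomp, weight)
```

To read from a real file instead, pass a file name in place of the `StringIO` handle and use the format string that matches the file.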