Computational Biology Tools and Databases
DATABASES
Microbial genome databases http://www.ncbi.nlm.nih.gov:80/PMGifs/Genomes/micr.html
Protein Information Resource http://www-nbrf.georgetown.edu/pir/genome.html
Comparative genome analysis in P. Bork laboratory http://www.bork.embl-heidelberg.de/Genome/
TIGR: The Comprehensive Microbial Resource Home Page—the omniome http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl
Genome databases other than NCBI http://www.unl.edu/stc-95/ResTools/biotools/biotools10.html
Genome list at NIH http://molbio.info.nih.gov/molbio/db.html
Mitochondrial DNA Database MitBASE http://www3.ebi.ac.uk/Research/Mitbase/mitbase.pl
E. coli genome project http://www.genome.wisc.edu/
E. coli genome and proteome database GenProtEC http://genprotec.mbl.edu/
E. coli index http://web.bham.ac.uk/bcm4ght6/res.html
Organelle genome sequences http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/organelles.html
Parasite genome databases and genome research resources http://www.ebi.ac.uk/parasites/parasite-genome.html
Retroviral genotyping and analysis site http://www.ncbi.nlm.nih.gov/retroviruses/
GenBank at the National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, accessible from: http://www.ncbi.nlm.nih.gov/Entrez
European Molecular Biology Laboratory (EMBL) Outstation at Hinxton, England http://www.ebi.ac.uk/embl/index.html
DNA DataBank of Japan (DDBJ) at Mishima, Japan http://www.ddbj.nig.ac.jp/
Protein Information Resource (PIR) database at the National Biomedical Research Foundation in Washington, DC http://www-nbrf.georgetown.edu/pirwww/
The SwissProt protein sequence database at ISREC, Swiss Institute for Experimental Cancer Research in Epalinges/Lausanne http://www.expasy.ch/cgi-bin/sprot-search-de
The Sequence Retrieval System (SRS) at the European Bioinformatics Institute allows both simple and complex concurrent searches of one or more sequence databases. The SRS system may also be used on a local machine to assist in the preparation of local sequence databases. http://srs6.ebi.ac.uk
Protein Data Bank (PDB) at Rutgers, the State University of New Jersey: atomic coordinates of structures as PDB files, models, viewers, and links to many other Web sites for structural analysis and classification http://www.rcsb.org/pdb
COG (Clusters of Orthologous Groups): http://www.ncbi.nlm.nih.gov/COG/
DOGS: Database of genome sizes http://www.cbs.dtu.dk/databases/DOGS/index.html
allgenes.org: A comprehensive gene index (catalog) derived from ESTs and predicted genes http://www.allgenes.org/
GeneCensus Genome Comparisons by encoded protein structures http://bioinfo.mbb.yale.edu/genome/
GeneQuiz: An integrated system for large-scale biological sequence analysis and data management (Andrade et al. 1999; Hoersch et al. 2000) http://jura.ebi.ac.uk:8765/ext-genequiz/
Genes and disease: Map location on human chromosomes http://www.ncbi.nlm.nih.gov/disease/
Genome channel at Oak Ridge National Laboratories http://compbio.ornl.gov/channel/
GOLD™: Genomes OnLine Database (Kyrpides 1999) http://wit.integratedgenomics.com/GOLD/
IMGT ImMunoGeneTics Database specializing in Immunoglobulins, T-cell receptors, and Major Histocompatibility Complex (MHC) of all vertebrate species http://www.ebi.ac.uk/imgt/index.html
KEGG: Kyoto Encyclopedia of Genes and Genomes (Kanehisa and Goto 2000) http://www.genome.ad.jp/kegg/
MIA Molecular Information Agent: A Web server that searches biological databases for information on a macromolecule http://mia.sdsc.edu/
Orthologous gene alignments at TIGR http://www.tigr.org/tdb/toga/toga.shtml
PEDANT: A protein extraction, description, and analysis tool http://pedant.mips.biochem.mpg.de/
STRING Search Tool for Recurring Instances of Neighboring Genes http://www.Bork.EMBL-Heidelberg.DE/STRING/
Taxonomy browser at the NCBI arranges genomes taxonomically for sequence retrieval http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/
UniGene System gene-oriented clusters of GenBank sequences useful for gene identification http://www.ncbi.nlm.nih.gov/UniGene/
2D gel analysis of proteins: List of organisms http://www.expasy.ch/ch2d/2d-index.html
AlignACE for promoter analysis of coordinately regulated genes (e.g., from microarrays) by Gibbs sampling (Roth et al. 1998; Hughes et al. 2000; McGuire et al. 2000) http://atlas.med.harvard.edu/download/
ArrayExpress database at European Bioinformatics Institute for microarray analysis http://www.ebi.ac.uk/arrayexpress/
BRITE: Database of protein-protein interactions and cross-reference links http://www.genome.ad.jp/brite/brite.html
EcoCyc electronic encyclopedia of genes and metabolism of E. coli (Karp et al. 2000) http://ecocyc.PangeaSystems.com/ecocyc/
Expression Profiler tools for analysis and clustering of gene expression and sequence data http://ep.ebi.ac.uk/
Functional genomics sites http://www.ornl.gov/hgmis/publicat/hgn/hgnarch.html#fg
GENECLUSTER (Tamayo et al. 1999) http://www.genome.wi.mit.edu/MPR/software.html
GeneX: A Collaborative Internet Database and Toolset for Gene Expression Data http://www.ncgr.org/genex/
MetaCyc metabolic encyclopedia (see EcoCyc) http://ecocyc.PangeaSystems.com/ecocyc/
Microarray guide: P. Brown lab http://cmgm.stanford.edu/pbrown/
Microarray project at NIH http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/
Microarray software http://rana.lbl.gov/
microarrays.org http://www.microarrays.org/
SMART: For the study of genetically mobile protein domains (Schultz et al. 2000) http://smart.embl-heidelberg.de/
SWISS-2DPAGE: Two-dimensional polyacrylamide gel electrophoresis database (Hoogland et al. 2000) http://www.expasy.ch/ch2d/
TIGR: Annotation and gene indexing resources, including analysis of the transcribed sequences represented in the public EST databases. http://www.tigr.org/tdb/tgi.shtml
WIT (What is there?): Interactive metabolic reconstruction on the Web (Overbeek et al. 2000) http://wit.mcs.anl.gov/WIT2/
GFF (Gene-Finding Features): Specification for describing genes and other features of genomics http://www.sanger.ac.uk/Software/GFF/
GO (gene ontology) controlled vocabulary http://genome-www.stanford.edu/GO/
MAGPIE: Multipurpose Automated Genome Project Investigation Environment http://www.rockefeller.edu/labheads/gaasterland/gaasterland.html, http://genomes.rockefeller.edu/research.shtml#magpie, http://magpie.genome.wisc.edu/tools.html, http://genomes.rockefeller.edu/research.shtml
TAMBIS: A conceptual model of molecular biology and bioinformatics and methods for querying the model (Baker et al. 1999) http://img.cs.man.ac.uk/tambis/
RDP: The Ribosomal Database Project (RDP) provides ribosome-related data services to the scientific community, including online data analysis, rRNA-derived phylogenetic trees, and aligned and annotated rRNA sequences http://rdp.cme.msu.edu/html/
GO: dynamic controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing http://www.geneontology.org/index.shtml
Miscellaneous Tools for Bioinformatics Analysis on the WWW
Pairwise Sequence Alignment
Global alignment programs (GAP, NAP) http://genome.cs.mtu.edu/align/align.html Huang (1994)
BLAST 2 sequence alignment (BLASTN, BLASTP) http://www.ncbi.nlm.nih.gov/gorf/bl2.html Altschul et al. (1990)
Bayes block aligner http://www.wadsworth.org/res&res/bioinfo
BCM Search Launcher: Pairwise sequence alignment http://searchlauncher.bcm.tmc.edu/seq-search/alignment.html
SIM—Local similarity program for finding alternative alignments http://www.expasy.ch/tools/sim.html
FASTA program suite http://fasta.bioch.virginia.edu/fasta/fasta_list.html Pearson and Miller (1992); Pearson (1996)
Likelihood-weighted sequence alignment (lwa) http://stateslab.bioinformatics.med.umich.edu/service/lwa.html
Multiple Sequence Alignment
CLUSTALW or CLUSTALX (the latter has a graphical interface) ftp://ftp.ebi.ac.uk/pub/software Thompson et al. (1994a, 1997); Higgins et al. (1996)
MSA http://www.psc.edu/, http://www.ibc.wustl.edu/ibc/msa.html, ftp://fastlink.nih.gov/pub/msa Lipman et al. (1989); Gupta et al. (1995)
PRALINE http://mathbio.nimr.mrc.ac.uk/~jhering/praline/ Heringa (1999)
DIALIGN segment alignment http://www.gsf.de/biodv/dialign.html Morgenstern et al. (1996)
MultAlin http://protein.toulouse.inra.fr/multalin.html Corpet (1988)
Parallel PRRN progressive global alignment http://prrn.ims.u-tokyo.ac.jp/ Gotoh (1996)
SAGA genetic algorithm http://igs-server.cnrs-mrs.fr/~cnotred/Projects_home_page/saga_home_page.html Notredame and Higgins (1996)
Protein Profile Generation Tools Based on MSA
Aligned Segment Statistical Evaluation Tool (Asset) ftp://ncbi.nlm.nih.gov/pub/neuwald/asset Neuwald and Green (1994)
BLOCKS Web site http://blocks.fhcrc.org/blocks/ Henikoff and Henikoff (1991, 1992)
eMOTIF Web server http://dna.Stanford.EDU/emotif/ Nevill-Manning et al. (1998)
GIBBS, the Gibbs sampler statistical method ftp://ncbi.nlm.nih.gov/pub/neuwald/gibbs9_95/ Lawrence et al. (1993); Liu et al. (1995); Neuwald et al. (1995)
HMMER hidden Markov model software http://hmmer.wustl.edu/ Eddy (1998)
MACAW, a workbench for multiple alignment construction and analysis ftp://ncbi.nlm.nih.gov/pub/macaw/ Schuler et al. (1991)
MEME Web site, expectation maximization method http://meme.sdsc.edu/meme/website/ Bailey and Elkan (1995); Grundy et al. (1996, 1997); Bailey and Gribskov (1998)
Profile analysis at UCSD http://www.sdsc.edu/projects/profile/ Gribskov and Veretnik (1996)
SAM hidden Markov model Web site http://www.cse.ucsc.edu/research/compbio/sam.html Krogh et al. (1994); Hughey and Krogh (1996)
RNA Tools
MFOLD minimum energy RNA configuration http://bioinfo.math.rpi.edu/~zukerm/rna/ Zuker et al. (1991)
RNA editing Web site, UCLA http://www.lifesci.ucla.edu/RNA/index.html Simpson et al. (1998)
RNA editing, uridine insertion/deletion http://www.lifesci.ucla.edu/RNA/trypanosome/ Simpson et al. (1998)
RNA modification database http://medlib.med.utah.edu/RNAmods/ Limbach et al. (1994); Rozenski et al. (1999)
RNA secondary structures, Group I introns, 16S rRNA, 23S rRNA http://www.rna.icmb.utexas.edu Gutell (1994); Schnare et al. (1996 and references therein)
tRNAscan-SE search server http://www.genetics.wustl.edu/eddy/tRNAscan-SE/ Lowe and Eddy (1997)
Vienna RNA package for RNA secondary structure prediction and comparison http://www.tbi.univie.ac.at/~ivo/RNA/ Hofacker et al. (1998); Wuchty et al. (1999)
DATABASE SEARCHES
Sequence similarity search with a query sequence. Target database: protein sequence database (or genomic sequences). Method: search for database sequences that can be aligned with the query sequence. Query: single sequence, e.g., DAHQSNGA. Programs:
BLAST SUITE http://www.ncbi.nlm.nih.gov/BLAST/
FASTA SUITE http://fasta.bioch.virginia.edu/fasta/
WU-BLAST http://blast.wustl.edu/
PROFILESEARCH ftp://ftp.sdsc.edu/pub/sdsc/biology Alignment search with a profile (scoring matrix with gap penalties). Target database: protein sequence database. Method: prepare a profile from a multiple sequence alignment (Profilemake) and align the profile with database sequences. Query: profile representing a gapped multiple sequence alignment, e.g., D-HQSNGA, ESHQ-YTM, EAHQSN-L, EGVQSYSL
MAST http://meme.sdsc.edu/meme/website/mast.html Search with a position-specific scoring matrix (PSSM) representing an ungapped sequence alignment (BLOCK). Target database: protein sequence database. Method: prepare a PSSM from an ungapped region of a multiple sequence alignment, or search for patterns of the same length in unaligned sequences, then use the PSSM for the database search. Query: PSSM representing an ungapped alignment, e.g., DAHQSN, ESHQSY, EAHQSN, EGVQSY
PSI-BLAST http://www.ncbi.nlm.nih.gov/BLAST/ Iterative alignment search for similar sequences that starts with a query sequence, builds a gapped multiple alignment, and then uses the alignment to augment the search. Method: uses initial matches to the query sequence to build a type of scoring matrix and searches for additional matches to the matrix by an iterative search method. Query: single sequence, e.g., DAHQSNGA; iteration 1 builds matches to the query sequence, e.g., H-SNGA, EAHQSN-L, followed by further iterations. PSI-BLAST finds a set of sequences related to each other by the presence of common patterns (not every sequence may have the same patterns).
PROSITE http://www.expasy.ch/prosite Search a query sequence for patterns representative of protein families. Target database: database of patterns found in protein families. Method: search for patterns represented by a scoring matrix or hidden Markov model (profile HMM). Query: single sequence, e.g., DAHQSNGA
INTERPRO http://www.ebi.ac.uk/interpro
PFAM http://www.sanger.ac.uk/Pfam
CDD/IMPALA http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
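The MAST entry above describes searching with a position-specific scoring matrix (PSSM) prepared from an ungapped block. As a rough illustration of that idea only (it is not MAST's actual algorithm or scoring scheme), the Python sketch below builds a log-odds PSSM from the example block DAHQSN, ESHQSY, EAHQSN, EGVQSY and slides it along a sequence. The uniform background frequency, the pseudocount of 1, the function names, and the test sequence are all assumptions made for this example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per column of the block."""
    length = len(block[0])
    if any(len(seq) != length for seq in block):
        raise ValueError("sequences in an ungapped block must have equal length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        column = [seq[col] for seq in block]
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def scan(pssm, sequence):
    """Score every window of the sequence; return (best_start, best_score)."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][aa] for i, aa in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Example block taken from the MAST entry; the target sequence is made up.
block = ["DAHQSN", "ESHQSY", "EAHQSN", "EGVQSY"]
pssm = build_pssm(block)
best_start, best_score = scan(pssm, "MKTEAHQSNGAL")
```

A real search tool would use observed background frequencies, more careful pseudocounts, and a statistical significance estimate for each hit; the sketch only shows how column counts become per-position scores.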
BCM Search Launcher (with programming links to several servers) http://searchlauncher.bcm.tmc.edu/seq-search/protein-search.html
bic-sw Bic server, European Bioinformatics Institute http://www.ebi.ac.uk/bic_sw/
MPsearch National Institute of Agrobiological Resources, Tsukuba, Japan http://www.dna.affrc.go.jp/htbin/mp_PP.pl
Scanps G. Barton, European Bioinformatics Institute http://barton.ebi.ac.uk
SSEARCH E-mail server, DNA Databank of Japan http://www.ddbj.nig.ac.jp/E-mail/homology.html
Swat Phil Green, University of Washington http://www.genome.washington.edu/UWGC/analysistools/swat.cfm
Programs and Web sites for database similarity searches with a regular expression, motif, block, or profile
Regular Expressions and Motifs
EMOTIF Scan SwissProt and Genpept http://motif.stanford.edu/emotif/emotif-scan.html
Prosite patterns SwissProt and TrEMBL http://www.expasy.ch/tools/scnpsit2.html
ISREC pattern-finding service SwissProt and non-redundant EMBL database http://www.isrec.isb-sib.ch/software/PATFND_form.html
fpat PDB, SwissProt, Genpept http://stateslab.bioinformatics.med.umich.edu/service/fpat/
PHI-BLAST BLAST databases http://www.ncbi.nlm.nih.gov/
MOTIF SwissProt, PDB, PIR, PRF, Genes http://www.motif.genome.ad.jp/MOTIF2.html
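Several of the servers listed above scan sequences with PROSITE-style patterns. As a hedged sketch of what such a scan involves, the Python below converts a pattern written in PROSITE syntax into a regular expression and reports match positions. The helper names and the test sequence are hypothetical. The pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern, used here only as a familiar example, and the conversion covers only the common syntax elements (x, [..], {..}, repeat counts, and the < and > anchors).

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):       # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):         # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                    # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        # [ABC] alternatives and single letters are already valid regex.
        count = "{" + repeat + "}" if repeat else ""
        parts.append(prefix + core + count + suffix)
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return 0-based start positions of all matches, including overlaps."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() for m in regex.finditer(sequence)]


# Hypothetical test sequence with two candidate sites.
sites = find_sites("N-{P}-[ST]-{P}.", "MNGTAANPSANKSW")
```

The lookahead wrapper lets overlapping sites be reported, which a plain left-to-right regular expression search would skip.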
BLOCKS
BLOCKS most databases http://www.blocks.fhcrc.org/blockmkr/make_blocks.html
MAST most databases http://meme.sdsc.edu/meme/website/
BLIMPS locally available databases anonymous FTP ftp://ncbi.nlm.nih.gov/repository/blocks/unix/blimps
Probe BLAST databases anonymous FTP ftp://ncbi.nlm.nih.gov/pub/neuwald/probe1.0
Genefind PIR http://pir.georgetown.edu/gfserver
PROFILE Programs
Profilesearch locally available databases anonymous FTP ftp://ftp.sdsc.edu/pub/sdsc/biology/profile_programs
Profile-SS most databases http://www.psc.edu/general/software/packages/profiless/profiless.html
Search Genes and Coding Regions
FGENES and related programs that use linear discriminant analysis or hidden Markov models http://genomic.sanger.ac.uk/gf/gf.shtml Solovyev et al. (1995)
GeneFinder access site at the Sanger Center http://genomic.sanger.ac.uk/gf/gf.html collection of methods
Genehacker for microbial genomes based on HMMs http://www-btls.jst.go.jp/GeneHacker/ Hirosawa et al. (1997)
GeneID-3 Web server using rule-based models, and GeneID+ http://www1.imim.es/geneid.html Mail server at geneid@darwin.bu.edu
GeneMark and GeneMark.hmm use hidden Markov models http://opal.biology.gatech.edu/GeneMark/
GeneParser Web page, uses combination of neural network and dynamic programming methods http://beagle.colorado.edu/~eesnyder/GeneParser.html Snyder and Stormo (1993, 1995)
Genescan using Fourier transform of DNA sequences to find characteristic patterns http://202.41.10.146/~sn055/DOC/gs.htm Tiwari et al. (1997)
Genetic code variations http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c
GenLang using linguistic methods http://www.cbil.upenn.edu/ Dong and Searls (1994)
GenScan based on probabilistic model of gene structure for vertebrate, Drosophila, and plant genes http://genes.mit.edu/GENSCAN.html Burge and Karlin (1998)
GeneSeqer for aligning genomic and EST sequences http://bioinformatics.iastate.edu/cgi-bin/gs.cgi Close to SplicePredictor
Glimmer uses interpolated Markov models for prokaryotic translation http://www.tigr.org/softlab/glimmer/ Salzberg et al. (1998)
GrailII prediction by neural networks based on scores of characteristic sequence patterns and composition http://compbio.ornl.gov/ Uberbacher and Mural (1991); Uberbacher et al. (1996)
Initiation codon analysis http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c
Microbial genome coding region identification based on Markov chains of order 5 http://igs-server.cnrs-mrs.fr/~audic/selfid.html Audic and Claverie (1998)
Procrustes based on comparison of related genomic sequences http://www-hto.usc.edu/software/procrustes/ Gelfand et al. (1996)
Push-button Gene Finder for gene identification using Markov and hidden Markov models http://www.cse.ucsc.edu/research/compbio/pgf/
Translate tool at ExPASy http://www.expasy.ch/tools/dna.html
Translation machine on the Web at EBI http://www2.ebi.ac.uk/translate/
Translation of large genome sequences on the Web http://alces.med.umn.edu/rawtrans.html
Veil (Viterbi exon-intron locator) uses hidden Markov models for vertebrate DNA http://www.cs.jhu.edu/labs/compbio/veil.html Henderson et al. (1997)
Webgene, a set of gene prediction tools and concurrent database similarity searches http://www.itba.mi.cnr.it/webgene/
Webgenemark and Webgenemark.hmm http://opal.biology.gatech.edu/GeneMark/ see GeneMark; Lukashin and Borodovsky (1998)
Promoter Prediction Program
ConsInspector (see Transfac database) http://www.gsf.de/biodv/consinspector.html
FastM for transcription factor binding sites http://transfac.gbf.de/cgi-bin/fastm/fastm.pl Klingenhoff et al. (1999)
GeneExpress analysis of transcriptional regulations with TRRD database http://wwwmgs.bionet.nsc.ru/systems/GeneExpress/ Kolchanov et al. (1999a, b)
Genome inspector for combined analysis of multiple signals in genomes http://www.gsf.de/biodv/genomeinspector.html Quandt et al. (1997)
GrailII prediction of TSS by neural networks based on scores of characteristic sequence patterns and composition
MAR-FINDER for finding matrix attachment regions http://www.futuresoft.org/MAR-Wiz/ Kramer et al. (1997); Singh et al. (1997)
MatInspector, Transfac database http://www.gsf.de/biodv/matinspector.html (for downloading)
Mirage (Molecular Informatics Resource for the Analysis of Gene Expression) http://www.ifti.org/
NNPP Promoter Prediction by Neural Network for prokaryotes or eukaryotes http://www.fruitfly.org/seq_tools/promoter.html Reese et al. (1996)
NSITE: search for TF binding sites or other consensus regulatory sequences http://genomic.sanger.ac.uk/gf/gf.shtml
OOTFD Object-Oriented Transcription Factor Database http://www.ifti.org/cgi-bin/ifti/ootfd.pl Ghosh (1998)
Pol3scan for RNAP III/tRNA promoter sequences using pattern scoring matrices http://irisbioc.bio.unipr.it/genomics.html Pavesi et al. (1994)
Promoter element weight matrices and HMMs http://www.epd.isb-sib.ch/promoter_elements/ Bucher (1990)
Promoter II for recognition of PolII sequences by neural networks http://www.cbs.dtu.dk/services/promoter/ Knudsen (1999)
PromoterScan http://bimas.dcrt.nih.gov/molbio/proscan/ Prestridge (1995) and see Web site
RegScan for promoter classification http://wwwmgs.bionet.nsc.ru/mgs/programs/classprom/ Babenko et al. (1999)
Sequence walkers for graphical viewing of the interaction of regulatory protein with DNA binding site http://www-lecb.ncifcrf.gov/~toms/walker/narcoverlogowalker.html Schneider (1997)
Signal scan for transcriptional elements http://bimas.dcrt.nih.gov:80/molbio/signal/ Prestridge (1991, 1996)
TargetFinder for promoter searching in selected annotated sequences http://www.tigem.it/ Lavorgna et al. (1999)
TESS for searching for transcription factor binding sites http://www.cbil.upenn.edu/tess/ Schug and Overton (1997a, b)
Tfbind for transcription factor binding sites http://tfbind.ims.u-tokyo.ac.jp Tsunoda and Takagi (1999)
Transfac programs providing search for TF binding sites. MatInd for making scoring matrices and MatInspector for searching for matches to matrices http://www.gsf.de/cgi-bin/matsearch.pl, http://www.gsf.de/biodv/staff_pub.html Knüppel et al. (1994); Quandt et al. (1995); Heinemeyer et al. (1999); Klingenhoff et al. (1999)
Wentian Li's Website for multiple analysis http://linkage.rockefeller.edu/wli/gene/programs.html
Protein Structure Analysis
The PredictProtein server at the European Molecular Biology Laboratory at Heidelberg, Germany: important site for secondary structure prediction by PHD, Predator, TOPITS, and Threader http://cubic.bioc.columbia.edu/predictprotein
Swiss Institute of Bioinformatics, Geneva: basic types of protein analysis, databases, the Swiss-Model resource for prediction of protein models, Swiss-PdbViewer http://www.expasy.ch/
Protein Structure Viewer
Chime http://www.umass.edu/microbio/chime/ A Web browser plug-in that can be used to display and manipulate structures inside a Web page. There are many mouse-driven controls. Excellent for lecture presentations.
Cn3D http://www.ncbi.nlm.nih.gov/Structure/ (Hogue 1997) Provides viewing of three-dimensional structures from Entrez and MMDB. Cn3D runs on Windows, MacOS, and Unix; simultaneously displays structural and sequence alignments; can show multiple superimposed images from NMR studies.
Mage http://kinemage.biochem.duke.edu (see Richardson and Richardson 1994) Standard molecular viewing features with animation and kaleidoscope effects.
RasMol http://www.umass.edu/microbio/rasmol/ (Sayle and Milner-White 1995) Most commonly used viewer for Windows, MacOS, UNIX, and VMS operating systems. Performs many functions.
Swiss 3D viewer, Spdbv http://www.expasy.ch/spdbv/mainpage.html (Guex and Peitsch 1997) Protein models can be built by structural alignments; calculates atomic angles and distances, threading, energy minimization, and interacts with the Swiss Model server.
Protein Secondary Structure Prediction
Modeller http://guitar.rockefeller.edu/modeller/modeller.html dynamic programming alignment of sequences and structures and molecular dynamics methods Sali et al. (1995)
Swiss-model http://www.expasy.ch/swissmod/SWISS-MODEL.html sequence alignment of query with sequences of known structure Peitsch (1996)
Whatif http://www.cmbi.kun.nl/whatif/ flexible molecular graphics rendering of models Rodriguez et al. (1998)
Baylor College of Medicine (BCM) http://searchlauncher.bcm.tmc.edu/seq-search/struc-predict.html collection of methods and linked to other servers
DSC http://www.bmm.icnet.uk/dsc/ linear discrimination King et al. (1997)
J-Pred structure prediction server http://jura.ebi.ac.uk:8888/ NNSSP, DSC, Predator, Mulpred, Zpred, Jnet, and PHD Cuff et al. (1998)
NNPRED http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html neural networks enhanced to detect sequence periodicity Kneller et al. (1990)
NPS@ server, MLR combination for secondary structure prediction http://pbil.ibcp.fr/NPSA/ combination of prediction methods using multivariate linear regression to optimize the predictions Guermeur et al. (1999)
Protein Sequence Analysis (PSA) System http://bmerc-www.bu.edu/psa/index.html discrete space models (hidden Markov models) for patterns of α helices, β strands, tight turns, and loops in specific structural classes Stultz et al. (1993, 1997); White et al. (1994)
PREDATOR http://www.embl-heidelberg.de/argos/predator/predator_info.html based on analysis of long- and short-range amino acid interactions and alignments of sequence pairs Frishman and Argos (1995, 1996, 1997)
PredictProtein server http://www.embl-heidelberg.de/predictprotein/predictprotein.html (see also mirror sites) neural networks applied to multiple sequence alignments Rost and Sander (1994); Rost (1996)
PSSP http://searchlauncher.bcm.tmc.edu/seq-search/struc-predict.html nearest neighbor enhanced by non-intersecting local and multiple sequence alignments Salamov and Solovyev (1995, 1997)
Simpa96 http://pbil.ibcp.fr/NPSA/ nearest-neighbor method Levin (1997)
SOPM, SOPMA http://pbil.ibcp.fr/NPSA/ nearest-neighbor method based on sequence alignments Geourjon and Deleage (1994, 1995)
SSP http://searchlauncher.bcm.tmc.edu/seq-search/struc-predict.html linear discriminant analysis based on amino acid composition of local and adjacent regions see H option for this program on Web page
UCLA-DOE structure prediction server http://www.doe-mbi.ucla.edu/people/frsvr/frsvr.html collection of methods and linked to other servers Fischer and Eisenberg (1996)
Threading servers and program
123D http://www-lmmb.ncifcrf.gov/~nicka/123D.html contact potentials between amino acid side groups Alexandrov et al. (1996)
3D-PSSM http://www.bmm.icnet.uk/~3dpssm sequence-structure using position-specific scoring matrices Russell et al. (1997)
Honig lab http://honiglab.cpmc.columbia.edu/ threading methods using biophysical properties
Libra I http://www.ddbj.nig.ac.jp/htmls/E-mail/libra/LIBRA_I.html target sequence and 3D profile are aligned by dynamic programming Ota and Nishikawa (1997)
NCBI structure site http://www.ncbi.nlm.nih.gov/Structure/RESEARCH/threading.html Gibbs sampling algorithm used to align sequence and structure Bryant (1996)
Profit http://lore.came.sbg.ac.at/home.html fold recognition by the contact potential method M. Sippl
Threader 2 http://insulin.brunel.ac.uk/threader/threader.html prediction by recognition of the correct fold from a library of alternatives Jones et al. (1995)
TOPITS http://www.embl-heidelberg.de/predictprotein/doc/help_05.html detects similar motifs of secondary structure and accessibility between a sequence of unknown structure and a known fold Rost (1995a,b)
UCLA-DOE structure prediction server http://www.doe-mbi.ucla.edu/people/frsvr/frsvr.html fold-recognition using 3D profiles and secondary structure prediction methods Fischer and Eisenberg (1996)
CASP http://predictioncenter.llnl.gov/ overall assessment of the methods
EMBOSS (http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps/index.html#embassy): downloadable source code.
PROGRAM FUNCTION AUTHOR
alignment consensus
cons Creates a consensus from multiple alignments HGMP
megamerger Merge two large overlapping nucleic acid sequences HGMP
merger Merge two overlapping sequences HGMP
alignment differences
diffseq Find differences between nearly identical sequences HGMP
alignment dot plots
dotmatcher Produces a dotplot of two sequences. Sanger
dotpath Displays a non-overlapping wordmatch dotplot of two sequences HGMP
dottup DNA sequence dot plot Sanger
polydot Multiple dotplot Sanger
alignment global
est2genome Align EST and genomic DNA sequences Sanger
needle Needleman-Wunsch global alignment. HGMP
stretcher Global alignment of two sequences. Sanger
alignment local
matcher Local alignment of two sequences Sanger
seqmatchall Does an all-against-all comparison of a set of sequences Sanger
supermatcher Finds a match of a large sequence against one or more sequences Sanger
water Smith-Waterman local alignment. HGMP
wordmatch Finds all exact matches of a given size between 2 sequences Sanger
alignment multiple
emma Multiple alignment program HGMP
infoalign Displays some simple information about sequences HGMP
plotcon Plots the quality of conservation of a sequence alignment HGMP
prettyplot Displays aligned sequences, with colouring and boxing. Sanger
showalign Display a multiple sequence alignment HGMP
tranalign Align nucleic coding regions given the aligned proteins HGMP
display
cirdna Draws circular maps of DNA constructs Norway
lindna Draws linear maps of DNA constructs Norway
pepnet Protein helical net plot HGMP
pepwheel Shows protein sequences as helices HGMP
prettyseq Output sequence with translated ranges HGMP
remap Display a sequence with restriction cut sites, translation etc. HGMP
seealso Finds programs sharing group names HGMP
showdb Displays information on the currently available databases HGMP
showfeat Show features of a sequence. HGMP
showseq Display a sequence with features, translation etc HGMP
sixpack Display a DNA sequence with 6-frame translation and ORFs LION
textsearch Search sequence documentation text. SRS and Entrez are faster! HGMP
edit
biosed Replace or delete sequence sections HGMP
cutseq Removes a specified section from a sequence. HGMP
degapseq Removes gap characters from sequences HGMP
descseq Alter the name or description of a sequence. HGMP
entret Reads and writes (returns) flatfile entries HGMP
extractfeat Extract features from a sequence HGMP
extractseq Extract regions from a sequence. HGMP
listor Writes a list file of the logical OR of two sets of sequences HGMP
maskfeat Mask off features of a sequence HGMP
maskseq Mask off regions of a sequence. HGMP
newseq Type in a short new sequence. HGMP
noreturn Removes carriage return from ASCII files HGMP
notseq Excludes a set of sequences and writes out the remaining ones HGMP
nthseq Writes one sequence from a multiple set of sequences HGMP
pasteseq Insert one sequence into another. HGMP
revseq Reverse and complement a sequence. HGMP
seqret Reads and writes (returns) a sequence. Sanger
seqretsplit Reads and writes (returns) sequences in individual files HGMP
skipseq Reads and writes (returns) sequences, skipping the first few HGMP
splitter Split a sequence into (overlapping) smaller sequences. HGMP
trimest Trim poly-A tails off EST sequences HGMP
trimseq Trim ambiguous bits off the ends of sequences HGMP
union Reads sequence fragments and builds one sequence LION
vectorstrip Strips out DNA between a pair of vector sequences HGMP
yank Reads a range from a sequence, appends the full USA to a list file LION
enzyme kinetics
findkm Calculates Km and Vmax for an enzyme reaction HGMP
feature tables
coderet Extract CDS, mRNA and translations from feature tables HGMP
twofeat Finds neighbouring pairs of features in sequences HGMP
information
infoseq Displays some simple information about sequences HGMP
tfm Displays a program's help documentation manual HGMP
whichdb Search all databases for an entry HGMP
wossname Finds programs by keywords in their one-line documentation. HGMP
nucleic codon usage
cai CAI codon usage statistic HGMP
chips Codon usage statistics HGMP
codcmp Codon usage table comparison HGMP
cusp Create a codon usage table HGMP
syco Synonymous codon usage Gribskov statistic plot HGMP
nucleic composition
banana Bending and Curvature Plot in B-DNA Sanger
btwisted Calculates the twisting in a B-DNA sequence HGMP
chaos Create a chaos plot for a sequence. Sanger
compseq Counts the composition of dimer/trimer/etc words in a sequence HGMP
dan Plot melting temperatures for DNA. HGMP
freak Residue/base frequency table or plot HGMP
isochore Plots isochores in large DNA sequences Sanger
sirna Finds siRNA duplexes in mRNA HGMP
wordcount Counts words of a specified size in a DNA sequence. Sanger
nucleic cpg islands
cpgplot Plot CpG rich areas HGMP
cpgreport Reports CpG rich regions HGMP
geecee Calculates the fractional GC content of nucleic acid sequences Sanger
newcpgreport Report CpG rich areas EBI
newcpgseek Reports CpG rich regions EBI
nucleic gene finding
getorf Finds and extracts open reading frames (ORFs) HGMP
marscan Finds MAR/SAR sites in nucleic sequences HGMP
plotorf Plot potential open reading frames HGMP
showorf Pretty output of DNA translations HGMP
wobble Wobble base plot HGMP
nucleic motifs
dreg Regular expression search of a nucleotide sequence Sanger
fuzznuc Nucleic acid pattern search HGMP
fuzztran Protein pattern search after translation HGMP
nucleic mutation
msbar Mutate sequence beyond all recognition HGMP
shuffleseq Shuffles a set of sequences maintaining composition HGMP
nucleic primers
eprimer3 Picks PCR primers and hybridization oligos HGMP
primersearch Searches DNA sequences for matches with primer pairs HGMP
stssearch Searches a DNA database for matches with a set of STS primers Sanger
nucleic profiles
profit Scan a sequence or database with a matrix or profile HGMP
prophecy Creates matrices/profiles from multiple alignments HGMP
prophet Gapped alignment for profiles HGMP
nucleic repeats
einverted Finds DNA inverted repeats Sanger
equicktandem Finds tandem repeats Sanger
etandem Looks for tandem repeats in a nucleotide sequence. Sanger
palindrome Looks for inverted repeats in a nucleotide sequence. HGMP
nucleic restriction
recoder Find and remove restriction sites but maintain the same translation HGMP
redata Isoschizomers, references and Suppliers for Restriction Enzymes HGMP
restover Finds restriction enzymes that produce a specific overhang Sloan-Kettering Cancer Center
restrict Finds Restriction Enzyme Cleavage Sites HGMP
silent Silent mutation restriction enzyme scan HGMP
nucleic transcription
tfscan Scans DNA sequences for transcription factors. HGMP
nucleic translation
backtranseq Back translate a protein sequence HGMP
transeq Translates nucleic acid sequences. HGMP
phylogeny
distmat Creates a distance matrix from multiple alignments HGMP
protein 2d structure
garnier Predicts protein secondary structure EBI
helixturnhelix Finds nucleic acid binding domains. HGMP
hmoment Hydrophobic moment calculation HGMP
pepcoil Predicts coiled coil regions HGMP
tmap Predict transmembrane proteins Sanger
protein composition
charge Protein charge plot HGMP
checktrans ORF property statistics EBI
emowse Protein identification by mass spectrometry HGMP
iep Calculates the isoelectric point of a protein HGMP
mwfilter Filter noisy molwts from mass spec output HGMP
octanol Displays protein hydropathy Sanger
pepinfo Plots simple amino acid properties in parallel HGMP
pepstats Protein statistics HGMP
pepwindow Displays protein hydropathy Sanger
pepwindowall Displays protein hydropathy of a set of sequences Sanger
protein motifs
antigenic Finds antigenic sites in proteins HGMP
digest Protein proteolytic enzyme or reagent cleavage digest HGMP
fuzzpro Protein pattern search HGMP
oddcomp Finds protein sequence regions with a biased composition. Norway
patmatdb Matching a Prosite motif against a Protein Sequence Database. HGMP
patmatmotifs Compares a protein sequence to the PROSITE motif database. HGMP
pestfind Finds PEST motifs as potential proteolytic cleavage sites Austria
preg Regular expression search of a protein sequence Sanger
pscan Locates fingerprints (multiple motif features) in a protein sequence. HGMP
sigcleave Predicts signal peptide cleavage sites HGMP
utils database creation
aaindexextract Extract data from AAINDEX HGMP
cutgextract CUTG: Codon Usage Tabulated from GenBank by organism HGMP
printsextract Preprocesses the PRINTS database for use with the program PSCAN HGMP
prosextract Extracts ID, AC, and PA lines from the PROSITE motif database. HGMP
rebaseextract Extract data from REBASE HGMP
tfextract Extract data from TRANSFAC HGMP
utils database indexing
dbiblast Database indexing for BLAST 1 and 2 indexed databases Sanger
dbifasta Index a fasta database HGMP
dbiflat Database indexing for flat file databases Sanger
dbigcg Database indexing for GCG formatted databases Sanger
utils misc
embossdata Finds or fetches the data files read in by the EMBOSS programs HGMP
embossversion Writes the current EMBOSS version number HGMP
PHYLIP TOOLS ( http://evolution.genetics.washington.edu/phylip/programs.html ) downloadable source codes.
Heuristic search for best tree
PROTPARS Estimates phylogenies from protein sequences (input using the standard one-letter code for amino acids) using the parsimony method, in a variant which counts only those nucleotide changes that change the amino acid, on the assumption that silent changes are more easily accomplished.
DNAPARS. Estimates phylogenies by the parsimony method using nucleic acid sequences. Allows use of the full IUB ambiguity codes, and estimates ancestral nucleotide states. Gaps treated as a fifth nucleotide state.
DNACOMP. Estimates phylogenies from nucleic acid sequence data using the compatibility criterion, which searches for the largest number of sites which could have all states (nucleotides) uniquely evolved on the same tree. Compatibility is particularly appropriate when sites vary greatly in their rates of evolution, but we do not know in advance which are the less reliable ones.
DNAML. Estimates phylogenies from nucleotide sequences by maximum likelihood. The model employed allows for unequal expected frequencies of the four nucleotides, for unequal rates of transitions and transversions, and for different (prespecified) rates of change in different categories of sites, with the program inferring which sites have which rates.
DNAMLK. Same as DNAML but assumes a molecular clock. The use of the two programs together permits a likelihood ratio test of the molecular clock hypothesis to be made.
RESTML. Estimation of phylogenies by maximum likelihood using restriction sites data (not restriction fragments but presence/absence of individual sites). It employs the Jukes-Cantor symmetrical model of nucleotide change, which does not allow for differences of rate between transitions and transversions. This program is VERY slow.
FITCH. Estimates phylogenies from distance matrix data under the "additive tree model" according to which the distances are expected to equal the sums of branch lengths between the species. Uses the Fitch-Margoliash criterion and some related least squares criteria. Does not assume an evolutionary clock. This program will be useful with distances computed from DNA sequences, with DNA hybridization measurements, and with genetic distances computed from gene frequencies.
KITSCH. Estimates phylogenies from distance matrix data under the "ultrametric" model which is the same as the additive tree model except that an evolutionary clock is assumed. The Fitch-Margoliash criterion and other least squares criteria are assumed. This program will be useful with distances computed from DNA sequences, with DNA hybridization measurements, and with genetic distances computed from gene frequencies.
NEIGHBOR An implementation by Mary Kuhner and John Yamato of Saitou and Nei's "Neighbor Joining Method," and of the UPGMA (Average Linkage clustering) method. Neighbor Joining is a distance matrix method producing an unrooted tree without the assumption of a clock. UPGMA does assume a clock. The branch lengths are not optimized by the least squares criterion but the methods are very fast and thus can handle much larger data sets.
CONTML. Estimates phylogenies from gene frequency data by maximum likelihood under a model in which all divergence is due to genetic drift in the absence of new mutations. Does not assume a molecular clock. An alternative method of analyzing this data is to compute Nei's genetic distance and use one of the distance matrix programs.
MIX. Estimates phylogenies by some parsimony methods for discrete character data with two states (0 and 1). Allows use of the Wagner parsimony method, the Camin-Sokal parsimony method, or arbitrary mixtures of these. Also reconstructs ancestral states and allows weighting of characters.
DOLLOP Estimates phylogenies by the Dollo or polymorphism parsimony criteria for discrete character data with two states (0 and 1). Also reconstructs ancestral states and allows weighting of characters. Dollo parsimony is particularly appropriate for restriction sites data; with ancestor states specified as unknown it may be appropriate for restriction fragments data.
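The UPGMA (average linkage) clustering mentioned under NEIGHBOR can be sketched in a few lines. This is a minimal illustration, not PHYLIP's implementation: the function name `upgma`, the input layout (a list of names plus a symmetric list-of-lists matrix) and the three-decimal branch-length format are all assumptions made for the example.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA and return a nested-parenthesis tree string.

    names: list of taxon labels
    dist:  symmetric matrix (list of lists) of pairwise distances
    """
    n = len(names)
    # Active clusters: id -> (tree string, number of taxa, node height)
    clusters = {i: (names[i], 1, 0.0) for i in range(n)}
    # Pairwise distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Join the closest pair of clusters
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        na, sa, ha = clusters.pop(a)
        nb, sb, hb = clusters.pop(b)
        h = dab / 2.0  # clock assumption: both sides are equally deep
        subtree = f"({na}:{h - ha:.3f},{nb}:{h - hb:.3f})"
        # Distance to every remaining cluster is the size-weighted average
        new_d = {}
        for k in clusters:
            dak = d[(min(a, k), max(a, k))]
            dbk = d[(min(b, k), max(b, k))]
            new_d[(k, next_id)] = (sa * dak + sb * dbk) / (sa + sb)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = (subtree, sa + sb, h)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

Because every join places the new node at half the cluster distance, the resulting tree is ultrametric, which is why UPGMA is said to assume a clock while Neighbor Joining does not.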
Branch-and-bound exact search for best tree
DNAPENNY. Finds all most parsimonious phylogenies for nucleic acid sequences by branch-and-bound search. This may not be practical (depending on the data) for more than 10 or 11 species.
PENNY. Finds all most parsimonious phylogenies for discrete-character data with two states, for the Wagner, Camin-Sokal, and mixed parsimony criteria using the branch-and-bound method of exact search. May be impractical (depending on the data) for more than 10-11 species.
DOLPENNY. Finds all most parsimonious phylogenies for discrete-character data with two states, for the Dollo or polymorphism parsimony criteria using the branch-and-bound method of exact search. May be impractical (depending on the data) for more than 10-11 species.
CLIQUE. Finds the largest clique of mutually compatible characters, and the phylogeny which they recommend, for discrete character data with two states. The largest clique (or all cliques within a given size range of the largest one) are found by a very fast branch and bound search method. The method does not allow for missing data. For such cases the T (Threshold) option of MIX may be a useful alternative. Compatibility methods are particularly useful when some characters are of poor quality and the rest of good quality, but when it is not known in advance which ones are which.
Distances or bootstrap samples
DNADIST Computes four different distances between species from nucleic acid sequences. The distances can then be used in the distance matrix programs. The distances are the Jukes-Cantor formula, one based on Kimura's 2-parameter method, Jin and Nei's distance which allows for rate variation from site to site, and a maximum likelihood method using the model employed in DNAML. The latter method of computing distances can be very slow.
PROTDIST Computes a distance measure for protein sequences, using maximum likelihood estimates based on the Dayhoff PAM matrix, Kimura's 1983 approximation to it, or a model based on the genetic code plus a constraint on changing to a different category of amino acid. The distances can then be used in the distance matrix programs.
SEQBOOT Reads in a data set, and produces multiple data sets from it by bootstrap resampling. Since most programs in the current version of the package allow processing of multiple data sets, this can be used together with the consensus tree program CONSENSE to do bootstrap (or delete-half-jackknife) analyses with most of the methods in this package. This program also allows the Archie/Faith technique of permutation of species within characters.
GENDIST Computes one of three different genetic distance formulas from gene frequency data. The formulas are Nei's genetic distance, the Cavalli-Sforza chord measure, and the genetic distance of Reynolds et al. The former is appropriate for data in which new mutations occur in an infinite isoalleles neutral mutation model, the latter two for a model without mutation and with pure genetic drift. The distances are written to a file in a format appropriate for input to the distance matrix programs.
FACTOR Takes discrete multistate data with character state trees and produces the corresponding data set with two states (0 and 1). Written by Christopher Meacham.
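Two of the distances named under DNADIST have simple closed forms, sketched below under the standard textbook formulas: Jukes-Cantor d = -(3/4) ln(1 - 4p/3), and Kimura 2-parameter d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where p is the proportion of differing sites, P the proportion of transitions and Q the proportion of transversions. The function names are hypothetical, and skipping sites with gaps or ambiguity codes is a simplifying assumption, not a description of how DNADIST treats them.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def count_differences(seq1, seq2):
    """Return (sites, transitions, transversions) for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = ts = tv = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in VALID or y not in VALID:
            continue  # assumption: ignore gaps and ambiguity codes
        sites += 1
        if x == y:
            continue
        same_class = ({x, y} <= PURINES) or ({x, y} <= PYRIMIDINES)
        if same_class:
            ts += 1  # A<->G or C<->T
        else:
            tv += 1  # purine <-> pyrimidine
    return sites, ts, tv


def jukes_cantor(seq1, seq2):
    """Jukes-Cantor distance; infinite when the sequences are saturated."""
    sites, ts, tv = count_differences(seq1, seq2)
    if sites == 0:
        raise ValueError("no comparable sites")
    p = (ts + tv) / sites
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(seq1, seq2):
    """Kimura 2-parameter distance from transition and transversion proportions."""
    sites, ts, tv = count_differences(seq1, seq2)
    if sites == 0:
        raise ValueError("no comparable sites")
    P, Q = ts / sites, tv / sites
    a, b = 1.0 - 2.0 * P - Q, 1.0 - 2.0 * Q
    if a <= 0.0 or b <= 0.0:
        return math.inf
    return -0.5 * math.log(a) - 0.25 * math.log(b)
```

A matrix of such pairwise values, written in the format the package expects, is the kind of input the distance matrix programs take.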
Tree manipulation, plotting, consensus
DRAWGRAM Plots rooted phylogenies, cladograms, and phenograms in a wide variety of user-controllable formats. The program is interactive and allows previewing of the tree on PC graphics screens, and Tektronix or DEC graphics terminals. Final output can be on a laser printer (such as the Apple Laserwriter or HP Laserjet), on graphics screens or terminals, in files readable by drawing programs such as PC Paintbrush, MacDraw, Idraw, and Xfig, on pen plotters (Hewlett-Packard or Houston Instruments) or on dot matrix printers capable of graphics.
DRAWTREE Similar to DRAWGRAM but plots unrooted phylogenies.
CONSENSE Computes consensus trees by the majority-rule consensus tree method, which also allows one to easily find the strict consensus tree. Does NOT compute the Adams consensus tree. Trees are input in a tree file in standard nested-parenthesis notation, which is produced by many of the tree estimation programs in the package. This program can be used as the final step in doing bootstrap analyses for many of the methods in the package.
RETREE Reads in a tree (with branch lengths if necessary) and allows you to reroot the tree, to flip branches, to change species names and branch lengths, and then write the result out. Can be used to convert between rooted and unrooted trees.
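The bootstrap workflow described for SEQBOOT and CONSENSE (resample the data, estimate a tree per replicate, then keep the groups found in most trees) can be sketched as follows. This is an illustration only: the function names are hypothetical, an alignment is assumed to be a dict of equal-length strings, and each tree is assumed to be already reduced to a set of clades (frozensets of taxon names) rather than parsed from a tree file.

```python
import random
from collections import Counter


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate).

    alignment: dict mapping taxon name -> aligned sequence (equal lengths)
    rng:       a random.Random instance, passed in for reproducibility
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    # The same column choices are applied to every taxon, so sites stay intact
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def majority_rule_clades(trees, threshold=0.5):
    """Return {clade: frequency} for clades present in more than `threshold` of trees.

    trees: list of trees, each given as a set of frozensets of taxon names
    """
    counts = Counter(clade for tree in trees for clade in set(tree))
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}
```

With the default threshold the retained clades are those of the majority-rule consensus; raising the threshold so that only frequency 1.0 passes leaves the clades of the strict consensus.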
Interactive tree manipulation
DNAMOVE Interactive construction of phylogenies from nucleic acid sequences, with their evaluation by parsimony and compatibility and the display of reconstructed ancestral bases. This can be used to find parsimony or compatibility estimates by hand.
MOVE Interactive construction of phylogenies from discrete character data with two states (0 and 1). Evaluates parsimony and compatibility criteria for those phylogenies and displays reconstructed states throughout the tree. This can be used to find parsimony or compatibility estimates by hand.
DOLMOVE Interactive construction of phylogenies from discrete character data with two states (0 and 1) using the Dollo or polymorphism parsimony criteria. Evaluates parsimony and compatibility criteria for those phylogenies and displays reconstructed states throughout the tree. This can be used to find parsimony or compatibility estimates by hand.
RETREE Reads in a tree (with branch lengths if necessary) and allows you to reroot the tree, to flip branches, to change species names and branch lengths, and then write the result out. Can be used to convert between rooted and unrooted trees. Does not refer to any data.
List of Other Phylogenetic Analysis Tools (http://evolution.genetics.washington.edu/phylip/software.html)
EBI Tools http://www.ebi.ac.uk/Tools/index.html
Homology & Similarity http://www.ebi.ac.uk/Tools/homology.html BLAST or Fasta programs can be used to look for sequence similarity
http://www.ebi.ac.uk/blast/index.html BLAST
http://www.ebi.ac.uk/fasta/index.html Fasta
Protein Functional Analysis http://www.ebi.ac.uk/Tools/protein.html can be used to search for motifs in your protein sequence
http://www.ebi.ac.uk/interpro/scan.html InterProScan
Structural Analysis http://www.ebi.ac.uk/Tools/structural.html can be used to query your protein structure and compare it to those in the Protein Data Bank (PDB)
http://www.ebi.ac.uk/msd-srv/ssm MSDfold
http://www.ebi.ac.uk/dali/ DALI
Sequence Analysis http://www.ebi.ac.uk/Tools/sequence.html
http://www.ebi.ac.uk/clustalw/index.html ClustalW, a sequence alignment tool
Miscellaneous Tools http://www.ebi.ac.uk/Tools/misc.html
http://www.ebi.ac.uk/microarray/ExpressionProfiler/ep.html Expression Profiler: A set of tools for clustering, analysis and visualization of gene expression and other genomic data
EXPASY TOOLS http://expasy.ch/
Proteomics and sequence analysis tools http://expasy.ch/tools/
Proteomics http://expasy.ch/tools/
http://expasy.ch/tools/peptident.html PeptIdent
http://expasy.ch/tools/peptide-mass.html PeptideMass
DNA -> Protein http://expasy.ch/tools/
http://expasy.ch/tools/dna.html Translate
Similarity searches http://expasy.ch/tools/
http://expasy.ch/tools/blast/ BLAST
Pattern and profile searches http://expasy.ch/tools/
http://expasy.ch/tools/scanprosite/ ScanProsite
Post-translational modification and topology prediction http://expasy.ch/tools/
Primary structure analysis http://expasy.ch/tools/
http://expasy.ch/tools/protparam.html ProtParam
http://expasy.ch/tools/pi_tool.html pI/MW
http://expasy.ch/cgi-bin/protscale.pl ProtScale
Secondary and tertiary structure prediction http://expasy.ch/tools/
http://expasy.ch/swissmod/SWISS-MODEL.html SWISS-MODEL
http://expasy.ch/spdbv/ Swiss-PdbViewer
Alignment http://expasy.ch/tools/
http://www.ch.embnet.org/software/TCoffee.html T-COFFEE
http://expasy.ch/tools/sim-prot.html SIM
Biological text analysis http://expasy.ch/tools/
http://expasy.ch/melanie/ Software for 2-D PAGE analysis
Roche Applied Science's Biochemical Pathways http://expasy.ch/cgi-bin/search-biochem-index
RCSB-Developed Software
mmCIF Resources
CIFTr http://pdb.rutgers.edu/mmcif/CIFTr/index.html
CIFLIB http://pdb.rutgers.edu/mmcif/CIFLIB/index.html C language application program interface
CIFOBJ http://pdb.rutgers.edu/mmcif/CIFOBJ/index.html A class library of mmCIF dictionary access tools
CIFPARSE http://pdb.rutgers.edu/mmcif/CIFPARSE/index.html A library of access tools for mmCIF
CIFPARSE-OBJ http://pdb.rutgers.edu/mmcif/CIFPARSE-OBJ/index.html A library of access tools for mmCIF in C++
CIFTABLE (SSTable) http://pdb.rutgers.edu/mmcif/SSTABLE/index.html A class library of table access tools (old version)
CIFTABLE (ISTable) http://pdb.rutgers.edu/mmcif/ISTABLE/index.html A class library of table access tools
mmCIF loader http://pdb.rutgers.edu/mmcif/MMCIF-LOADER/index.html An application to load mmCIF data into relational databases and XML
OpenMMS Toolkit http://openmms.sdsc.edu A suite of Java source code that includes an mmCIF parser, RDBMS loader, XML translator, and Corba server
STAR (CIF) parser http://pdb.sdsc.edu/index.html Several object-oriented Perl modules for parsing mmCIF files and other STAR-compliant files without nested loops
Deposition Resources
ADIT - Workstation Version (alpha release) http://pdb.rutgers.edu/mmcif/ADIT/index.html A package for editing and checking structure data entries
MAXIT http://pdb.rutgers.edu/mmcif/MAXIT/index.html An application for processing and curation of macromolecular structure data
PDB_EXTRACT http://pdb.rutgers.edu/mmcif/demo.tar.gz (download) Tools and examples for extracting mmCIF data from structure determination applications
PDB Validation Suite (beta version) http://pdb.rutgers.edu/mmcif/VAL/index.html A tool for processing and checking structure data
FTP Archive Resources
bnl2rcsb ftp://ftp.rcsb.org/pub/pdb/software/ Perl script to convert a BNL FTP directory structure to an RCSB FTP directory structure
getPdbUpdate ftp://ftp.rcsb.org/pub/pdb/software/ Perl script to retrieve files from any update found at
Other Software Links*
mmCIF software tools
CBFLib http://www.bernstein-plus-sons.com/software/CBF/
A library of ANSI-C functions providing a simple mechanism for accessing Crystallographic Binary Files (CBF files) and Image-supporting CIF (imgCIF) files
cif2pdb http://www.bernstein-plus-sons.com/software/cif2pdb/ Program to convert mmCIF to pseudo-PDB format
CIFtbx2 http://www.bernstein-plus-sons.com/software/ciftbx/
Extended CIF Tool Box (Fortran) with CYCLOPS and cif2cif
OOSTAR http://www.sdsc.edu/pb/cif/OOSTAR.html
Applications to manipulate STAR files (Objective-C)
pdb2cif http://www.bernstein-plus-sons.com/software/pdb2cif/
Scripts to filter a PDB entry and produce mmCIF
Crystallography
ARP/wARP http://www.embl-hamburg.de/ARP/ A system for the refinement of protein structures via automatic updating and re-building of the model and solvent structure
CCP4 http://www.dl.ac.uk/CCP/CCP4/main.html A suite of programs covering all aspects of crystallographic structure determination, refinement and analysis
CNS http://cns.csb.yale.edu/v1.0/ A system for structure determination from crystallographic or NMR data
MAIN http://www-bmb.ijs.si/doc/ An interactively driven suite of programs for molecular modeling, density modification, model refinement and structure analysis
O http://imsb.au.dk/~mok/o/ An interactive system for building and manipulating models in electron density maps
SHELX http://shelx.uni-ac.gwdg.de/SHELX/ A set of programs for direct structure solution and refinement with high resolution diffraction data
SOLVE http://www.solve.lanl.gov/ An automated system for phase determination from MIR and MAD data
X-PLOR 3.851 http://xplor.csb.yale.edu/xplor-info/xploronline.html A program for structure determination from crystallographic or NMR data (Yale version)
X-PLOR/CNX http://www.accelrys.com/products/cnx/ A program for structure determination from crystallographic or NMR data (Accelrys version)
XtalView http://www.scripps.edu/pub/dem-web/toc.html An interactive system for building and manipulating models in electron density maps and for phase determination from MIR or MAD data.
NMR
CNS http://cns.csb.yale.edu/v1.0/ A system for structure determination from crystallographic or NMR data
CYANA http://www.guentert.com/Cyana.html A program for the structure calculation of biological macromolecules on the basis of conformational constraints from NMR
Fantom http://www.scsb.utmb.edu/fantom/fm_home.html A program for structure calculation and refinement using torsion angle minimization with NMR data
X-PLOR 3.851 http://xplor.csb.yale.edu/xplor-info/xploronline.html A program for structure determination from crystallographic or NMR data (Yale version)
Structure Analysis and Verification
CE/CL http://cl.sdsc.edu/ Software for structure comparison by Combinatorial Extension (CE) and Compound Likeness (CL)
ENDscript http://genopole.toulouse.inra.fr/ENDscript
A Web server for searching homologous sequences and giving information on secondary structure elements, accessibility, hydropathy and protein-protein contacts
ESPript http://genopole.toulouse.inra.fr/ESPript Easy Sequencing in Postscript
Non-covalent bond finder http://www.umass.edu/microbio/chime/find-ncb/index.htm Software for finding non-covalent interactions for use with Chime 2 or higher
PASS http://www.delanet.com/~bradygp/pass A fast cavity-detection program for the identification and visualization of possible protein binding sites
Procheck http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html A program that checks the stereochemical quality of a protein structure
ProFit http://www.biochem.ucl.ac.uk/~martin/text/ProFit.readme A program for fitting protein structures on to each other
SARF2 http://123d.ncifcrf.gov/sarf2.html A program which searches for similar structural motifs (via an analysis of backbone fragments) in protein structures
Surface Racer http://monte.biochem.wisc.edu/~tsodikov/surface.html
A program that calculates exact accessible surface area, molecular surface area and average curvature of molecular surface, and analyzes cavities in the protein interior inaccessible from the outside.
SURFNET http://www.biochem.ucl.ac.uk/~roman/surfnet/surfnet.html A program which generates surfaces and void regions between molecular surfaces
WHAT_CHECK http://www.sander.embl-heidelberg.de/whatcheck/ A system for protein structure validation derived from the WHAT IF program
WHAT IF http://www.cmbi.kun.nl/whatif/ A protein structure analysis program that may be used for mutant prediction, structure verification and molecular graphics
Modeling and Simulation
ANALYZE http://www.tc.cornell.edu/reports/NIH/resource/CompBiologyTools/analyze/index.asp Cornell Theory Center program to classify and analyze conformations obtained from global searches; includes capability to compare NMR intensities and coupling constants to experimental data
AMBER http://www.amber.ucsf.edu/amber/amber.html Assisted Model Building with Energy Refinement - a molecular dynamics and energy minimization program
AutoDock3.0 http://www.scripps.edu/pub/olson-web/dock/autodock
A suite of automated docking tools designed to predict how small molecules, such as substrate or drug candidates, bind to a receptor of known 3D structure
CHARMM http://yuri.harvard.edu/ Chemistry at HARvard Molecular Mechanics - a molecular dynamics and energy minimization program
ECEPPAK http://www.tc.cornell.edu/reports/NIH/resource/CompBiologyTools/eceppak/index.asp Cornell Theory Center package to carry out global conformational searches using the ECEPP/3 force field
FTDOCK http://www.bmm.icnet.uk/docking/ A program for carrying out rigid-body docking between biomolecules
GROMOS http://www.igc.ethz.ch/gromos/ A general-purpose molecular dynamics computer simulation package for the study of biomolecular systems
GROMACS http://md.chem.rug.nl/~gmx Complete modelling package for proteins, membrane systems and more, including fast molecular dynamics, normal mode analysis, essential dynamics analysis and many trajectory analysis utilities
ICM http://www.molsoft.com/ MolSoft ICM programs and modules for applications including structure analysis, modeling, docking, homology modeling and virtual ligand screening
JACKAL http://trantor.bioc.columbia.edu/~xiang/jackal/
Suite of tools for model building, structure prediction and refinement, reconstruction, and minimization; for SGI, Linux, and Sun Solaris
LOOPP http://www.tc.cornell.edu/reports/NIH/resource/CompBiologyTools/loopp/index.asp Linear Optimization of Protein Potentials. Cornell Theory Center program for potential optimization and alignments of sequences and structures
MAMMOTH http://icb.mssm.edu/services/mammoth/mammoth
MAtching Molecular Models Obtained from THeory - a program for automated pairwise and multiple structural alignments; for SGI, Linux, and Sun Solaris
MidasPlus http://www.cgl.ucsf.edu/Outreach/midasplus/ A program for displaying, manipulating and analysing macromolecules
MODELLER http://guitar.rockefeller.edu/modeller/modeller.html A program for automated protein homology modeling
MOIL http://www.tc.cornell.edu/reports/NIH/resource/CompBiologyTools/moil/index.asp Cornell Theory Center package for molecular dynamics simulation of biological molecules
NAMD http://www.ks.uiuc.edu/Research/namd/ A parallel object-oriented molecular dynamics simulation program
WAM - Web Antibody Modelling http://antibody.bath.ac.uk A server for automated structure modeling from antibody Fv sequences
123D http://123d.ncifcrf.gov/123D+.html A program which threads a sequence through a set of structures using substitution matrix, secondary structure prediction and contact capacity potential
Molecular Graphics
BioEditor http://bioeditor.sdsc.edu/
A tool for creating and viewing dynamic, formatted structure annotations; for Windows
Shockwave 3D PDB Viewer http://www.candomultimedia.com/medical
Free, easy-to-use tool for viewing molecular structures through a Web page; streams data directly from PDB on PCs and Macs; developed in Ireland
Chemscape Chime http://www.mdlchime.com/chime/
From MDL Information Systems. This program allows visualisation of structures within WWW browser pages. For further information about Chime see the UMass Chime Resources Page http://www.umass.edu/microbio/chime/
Java3D Molecular Visualisation System http://www.adcworks.com/projects/jmvs
Free Java/Java3D program and source code
Mage and Kinemages http://kinemage.biochem.duke.edu/kinemage/kinemage.php Interactive molecular display for research and educational uses. Free, open source for Macintosh, PC, Unix, and Linux. A Java version does 3-D Web display without plug-ins.
MOLMOL http://www.mol.biol.ethz.ch/wuthrich/software/molmol/
A program for displaying, analyzing, and manipulating the 3-D structure of biological macromolecules, with special emphasis on the study of protein or DNA structures determined by NMR
RasMol http://www.bernstein-plus-sons.com/software/rasmol/ A free viewing system for PDB coordinate files that runs on Macintosh, PC and UNIX systems. Open source versions http://www.openrasmol.org/
Raster3D http://www.bmsc.washington.edu/raster3d/raster3d.html A set of tools for generating high quality raster images of proteins or other molecules. Freeware for UNIX, LINUX and PC.
RasTop (v. 2.0) http://www.geneinfinity.org/rastop A free user-friendly graphical interface to RasMol molecular visualization software (v. 2.7.2.1), available for Windows platforms
Ribbons http://sgce.cbse.uab.edu/ribbons/ A program for molecular illustration and error analysis
RmscopII http://rmscopii.sourceforge.net/
A Tcl/Tk script that redirects PDB files or RasMol scripts to multiple RasMol sessions; can be used as a Web browser helper application or as a standalone program.
Swiss PDB viewer available from Switzerland http://www.expasy.ch/spdbv/ | Australia
A 3D graphics and molecular modeling program for the simultaneous analysis of multiple models and for model-building into electron density maps. The software is available for Macintosh or PC
Uppsala Electron Density Server http://portray.bmc.uu.se/eds/ Generated density maps
MolScript http://www.avatar.se/molscript/ A program for displaying structures in both detailed and schematic formats and writing images in various formats
MolView and MolView Lite http://www.danforthcenter.org/smith/MolView/molview.html Free molecular visualization programs for the Macintosh
PDB2MGIF http://www.dkfz-heidelberg.de/spec/pdb2mgif/
Free, user-friendly server that converts PDB files to animated GIF files that can be used in Web pages and presentations. Simple step-by-step instructions can be found at http://www.rcsb.org/pdb/animation.html
PocketMol http://birg.cs.wright.edu/pocketmol/pocketmol.html
Program to view and manipulate PDB files on a PocketPC
ProteinScope http://www.proteinscope.com
Free viewer to display and manipulate PDB files and create animations and slides of proteins
PyMOL http://www.pymol.org
A free and open-source molecular graphics system for visualization, animation, editing, and publication-quality imagery. PyMOL is scriptable and can be extended using the Python language. Supports Windows, Mac OSX, and Unix
Qmol http://lancelot.bio.cornell.edu/jason/qmol.html
A lightweight OpenGL based molecular viewer for Windows 95/NT/00 and X Windows
ViewerLite and ViewerPro (Discovery Studio) http://www.accelrys.com/dstudio/ds_viewer/ Molecular visualization programs for Macintosh and PC from Accelrys
VMD http://www.ks.uiuc.edu/Research/vmd/ VMD (Visual Molecular Dynamics) runs on many platforms including MacOS X, and several versions of Unix and Windows. VMD provides visualization, analysis, and Tcl/Python scripting features, and has recently added sequence browsing and volumetric rendering features. VMD is distributed free of charge.
WebMol http://www.cmpharm.ucsf.edu/~walther/webmol.html A Java PDB Viewer. WebMol was designed to display and analyze structural information contained in the Protein Data Bank (PDB). It can be run as an applet or as a stand-alone application.
World Index of Molecular Visualization Resources http://molvis.sdsc.edu/visres/
A Visitor-Maintained Indices (VMI)TM Site by Eric Martz and Trevor D. Kramer. Contains many links to visualization tools, tutorials, and other resources.
TIGR Tools http://www.tigr.org/software/
Gene Finding/Annotation
MANATEE http://manatee.sourceforge.net/ is a web-based gene evaluation and genome annotation tool. Manatee can store and view annotation for prokaryotic and eukaryotic genomes. The Manatee interface allows biologists to quickly identify genes and make high quality functional assignments, such as GO classifications, using search data, paralogous families, and annotation suggestions generated from automated analysis.
GlimmerM http://www.tigr.org/software/glimmerm/ A gene finder derived from Glimmer, but developed specifically for eukaryotes. It is based on a dynamic programming algorithm that considers all combinations of possible exons for inclusion in a gene model and chooses the best of these combinations. The decision about what gene model is best is a combination of the strength of the splice sites and the score of the exons generated by an interpolated Markov model (IMM). The system has been trained for Arabidopsis thaliana, Oryza sativa (rice), and Plasmodium falciparum (the malaria parasite), and should work well on closely related organisms.
Glimmer http://www.tigr.org/software/glimmer/ A system for finding genes in microbial DNA, especially the genomes of bacteria and archaea. Glimmer (Gene Locator and Interpolated Markov Modeler) uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA.
GeneSplicer : A computational method for splice site prediction http://www.tigr.org/tdb/GeneSplicer/gene_spl.html A fast, flexible system for detecting splice sites in the genomic DNA of various eukaryotes. The system has been trained and tested successfully on Plasmodium falciparum (malaria), Arabidopsis thaliana and human genomes. Training data sets for Human and Arabidopsis thaliana are included. It is fully described in Pertea M, Lin X, Salzberg SL. GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res. 2001 Mar 1;29(5):1185-90.
TransTerm http://www.tigr.org/software/transterm.html is a program that finds rho-independent transcription terminators in bacterial genomes. Each terminator found by the program is assigned a confidence value that provides an estimate of its probability of being a true terminator. TransTerm has been published: Prediction of Transcription Terminators in Bacterial Genomes. Ermolaeva, M.D., Khalak, H.G., White, O., Smith, H.O., Salzberg, S.L. Journal of Molecular Biology 301, 27-33 (2000)
EXONomy http://www.tigr.org/software/Exonomy/index.shtml is a new gene finder based on the Generalized Hidden Markov Model (GHMM) framework, similar to Genscan and Genie. It is highly reconfigurable and includes software for retraining. The replaceable submodels of the GHMM include homogeneous and inhomogeneous Markov models of selectable order, nonstationary Markov chains, windowed and non-windowed Weight Array Matrices (WWAM/WAM/WMM), Maximal Dependence Decomposition (MDD) trees, and codon bias. An EXONomy Web Interface is available.
Unveil http://www.tigr.org/software/Unveil/index.shtml is a new gene finder based on a 283-state Hidden Markov Model (HMM) similar to that described in [Henderson,J., Salzberg,S., and Fasman,K.H. (1997) J. Comput. Biol. 4, 127-141]. An Unveil Web Interface is available.
ELPH http://www.tigr.org/software/ELPH/index.shtml is a general-purpose Gibbs sampler for finding motifs in a set of DNA or protein sequences. The program takes as input a set containing anywhere from a few dozen to thousands of sequences, and searches through them for the most common motif, assuming that each sequence contains one copy of the motif.
RepeatFinder ftp://ftp.tigr.org/pub/software/repeatFinder/ is a computational system for analysis of the repetitive structure of genomic sequences. The method uses suffix trees for efficient computation of exact repeats and organizes those repeats into classes. The method can be applied to individual genome sequences or sets of sequences. The output is a multi-FASTA file of the repeat sequences found, which can be used as the target of searches.
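To illustrate what "exact repeats" means in the RepeatFinder entry, here is a minimal Python sketch. It does not use a suffix tree and is not RepeatFinder's code: it swaps in a plain k-mer index, which is simpler but only reports repeats of one fixed length. The function name exact_repeats and the parameter k are hypothetical.

```python
from collections import defaultdict

def exact_repeats(seq, k):
    """Map each k-mer that occurs more than once to its 0-based start positions.

    Simplified stand-in for suffix-tree repeat detection: only repeats of
    exactly length k are reported, and no repeat classes are built.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    # Keep only k-mers seen at two or more positions
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}

repeats = exact_repeats("ACGTTTACGT", 4)
print(repeats)
```

A suffix tree finds repeats of every length in one pass, which is why a tool built for whole genomes would prefer it over this fixed-length index.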
RBSfinder ftp://ftp.tigr.org/pub/software/RBSfinder/ is a Perl script that implements an algorithm to find ribosome binding sites for genes in bacterial and archaeal genomes. It is normally run as a post-processor to the Glimmer gene finder or to other prokaryotic gene finders.
Combiner http://www.tigr.org/software/combiner/ is a program that predicts gene models using the output from other annotation software. It uses a statistical algorithm to identify patterns of evidence corresponding to gene models.
HBQCM: ftp://ftp.tigr.org/pub/software/qc/ Hexamer Based Quality Control Method as described in White O., Dunning T., Sutton G., Adams M., Venter J.C., and Fields C. (1993) A quality control algorithm for DNA sequencing projects. Nucleic Acids Research 21:3829-3838.
Alignment
MUMmer http://www.tigr.org/software/mummer/ A system for aligning whole genome sequences. Using an efficient data structure called a suffix tree, the system can rapidly align sequences containing millions of nucleotides. It is fully described in: A.L. Delcher, S. Kasif, R.D. Fleischmann, J. Peterson, O. White, and S.L. Salzberg. Alignment of whole genomes. Nucleic Acids Research, 27:11 (1999), 2369-2376. A graphical viewer for the MUMmer output is also available.
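The name MUMmer refers to maximal unique matches (MUMs): substrings that occur exactly once in each of two sequences and cannot be extended in either direction. The Python sketch below finds them by brute force, purely to make the definition concrete. It is not MUMmer's suffix-tree algorithm and would be far too slow for genomes; the function names and the min_len parameter are hypothetical.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count, pos = 0, text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count

def maximal_unique_matches(a, b, min_len=3):
    """Return (pos_in_a, pos_in_b, length) for every maximal unique match."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip pairs that could be extended to the left
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the sequences agree
            n = 0
            while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
                n += 1
            if n < min_len:
                continue
            match = a[i:i + n]
            # Keep the match only if it is unique in both sequences
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, n))
    return mums

print(maximal_unique_matches("TTACGTGG", "CCACGTAA"))
```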
AAT ftp://ftp.tigr.org/pub/software/AAT/: A tool for analyzing and annotating genomic sequences. Huang, X., Adams, M.D., Zhou, H. and Kerlavage, A.R. (1997) Genomics 46, 37-45. The AAT package includes two sets of programs, one set (DPS/NAP) for comparing the query sequence with a protein database, and the other (DDS/GAP2) for comparing the query with a cDNA database.
Sequencing/Finishing
Assembler: http://www.tigr.org/software/assembler/ A tool for assembly of large sets of overlapping sequence data such as ESTs, BACs, or small genomes. This updated assembly tool delivers better performance and results than the previous version, assembling EST, BAC, and genome data with greater care given to repeat detection and contig-level overlapping. TIGR Assembler has been published (Sutton G., White, O., Adams, M., and Kerlavage, A. (1995) TIGR Assembler: A new tool for assembling large shotgun sequencing projects. Genome Science & Technology 1:9-19). Also available, without a license, is the utility ta2ace for converting TIGR Assembler output into the "new" .ACE format used by Consed and other sequence assembly editors.
BAMBUS http://www.tigr.org/software/bambus/ is the first publicly available genome sequence scaffolding program. It orders and orients contigs into scaffolds based on various types of linking information. Additionally, BAMBUS allows users to build scaffolds in a hierarchical fashion by prioritizing the order in which links are used. BAMBUS runs on Unix systems.
Lucy: http://www.tigr.org/software/lucy/ A Sequence Cleanup Program. Lucy is a utility that prepares raw DNA sequence fragments for sequence assembly, possibly using the TIGR Assembler. The cleanup process includes quality assessment, confidence reassurance, vector trimming and vector removal. The primary advantage of Lucy over other similar utilities is that it is a fully integrated, stand-alone program. The Windows version of Lucy is available from Hui-Hsien Chou's webpage. Lucy is fully described in: DNA sequence quality trimming and vector removal. H.-H. Chou and M.H. Holmes. Bioinformatics, 17:12, pp. 1093-1104, 2001.
Microarray
TM4: A package of Open Source software programs for Microarray analysis http://www.tigr.org/software/tm4/
MADAM (MicroArray DAta Manager) Microarray experiments produce large amounts of data, even for the simplest experiments. To analyze data from many experiments, that data must be stored in an accessible form, such as a database. MADAM is a Java-based application designed to load and retrieve microarray data to and from a database (also supplied with the software). MADAM provides data entry forms, data report forms and additional applications necessary to maintain microarray data for further analysis. MADAM requires JRE 1.3.1.
TIGR Microarray Data Analysis System (MIDAS) is a microarray data quality filtering and normalization tool that allows raw experimental data to be processed through various data normalizations, filters, and transformations via a user-designed analysis pipeline. Currently implemented normalization and data analysis algorithms include total-intensity normalization, Lowess (Locfit) normalization, flip-dye consistency checking, replicates analysis, and intensity-dependent z-score filtering (slice analysis). MIDAS is implemented in Java and is therefore platform-independent. It requires JDK v1.3 or higher. Refer to the included manual for details.
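Of the normalization methods listed for MIDAS, total-intensity normalization is the simplest to show. The Python sketch below assumes a two-channel (Cy3/Cy5) array and rescales the Cy5 channel so that both channels sum to the same total. This is the usual textbook formulation, not MIDAS code; the function names and the sample values are made up for illustration.

```python
import math

def total_intensity_normalize(cy3, cy5):
    """Scale Cy5 intensities so that both channels have the same total."""
    factor = sum(cy3) / sum(cy5)
    return [value * factor for value in cy5]

def log2_ratios(cy3, cy5):
    """Per-spot log2(Cy5 / Cy3) expression ratios."""
    return [math.log2(red / green) for green, red in zip(cy3, cy5)]

# Hypothetical spot intensities; the Cy5 channel is uniformly twice as bright
cy3 = [100.0, 200.0, 300.0]
cy5 = [200.0, 400.0, 600.0]
cy5_norm = total_intensity_normalize(cy3, cy5)
print(log2_ratios(cy3, cy5_norm))
```

The method assumes that most genes are not differentially expressed, so any overall difference between channels is treated as a labeling or scanning artifact.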
TIGR MultiExperiment Viewer (MEV) is a Java application designed to allow the analysis of microarray data to identify patterns of gene expression and differentially expressed genes. Numerous normalization, clustering and distance algorithms have been implemented, along with a variety of graphical displays to best present the results. MEV was written to be flexible and expandable, and supports a variety of input and output formats. MEV requires version 1.2 or higher of Sun's JRE and J3D package.
TIGR Spotfinder is a software tool designed for Microarray image processing using the TIFF image files generated by most microarray scanners. TIGR Spotfinder was written in C/C++ for PCs running Windows NT/2000/ME/XP.
ArrayViewer http://www.tigr.org/tigr-scripts/license/new.pl?genre=soft&program=ArrayViewer is a software tool designed to facilitate the presentation and analysis of microarray expression data, leading to the identification of genes that are differentially expressed. It is written in Java for cross-platform compatibility and reads and writes data using flat files or a database through stored procedures. An ArrayViewer Overview is available as an Adobe Acrobat PDF file. Machines that lack the requirements for the MultiExperiment Viewer may use ArrayViewer for single experiment analysis.
TIGR McCoder ftp://ftp.tigr.org/pub/software/Microarray/McCoder/ is a software package designed for a portable scanner with Palm OS to collect bar codes and then transfer the bar codes to a PC as a plain text file. The package includes two programs: one that runs on the handheld scanner and one that runs on a regular PC with Windows 95/98/2000/NT. Once transferred to the PC, the scanned bar codes can be manipulated easily with McCoder.
Scheduler ftp://ftp.tigr.org/pub/software/Microarray/Scheduler/ is a web based tool that provides an efficient reservation method to manage lab instruments and office facilities. The Scheduler is designed as a two-tier system running on the Internet and can be configured to meet a variety of requirements.
NCBI Tools http://www.ncbi.nlm.nih.gov/
The Basic Local Alignment Search Tool (BLAST http://www.ncbi.nlm.nih.gov/BLAST/), for comparing gene and protein sequences against others in public databases, now comes in several flavors including PSI-BLAST, PHI-BLAST, and BLAST 2 sequences. Specialized BLASTs are also available for human, microbial, malaria, and other genomes, as well as for vector contamination, immunoglobulins, and tentative human consensus sequences.
Clusters of Orthologous Groups (COGs http://www.ncbi.nlm.nih.gov/COG/) currently covers 21 complete genomes from 17 major phylogenetic lineages. A COG is a cluster of very similar proteins found in at least three species. The presence or absence of a protein in different genomes can tell us about the evolution of the organisms, as well as point to new drug targets.
Map Viewer http://www.ncbi.nlm.nih.gov/mapview/static/MVstart.html shows integrated views of chromosome maps for 17 organisms. Used to view the NCBI assembly of complete genomes, including human, Map Viewer is a valuable tool for the identification and localization of genes, particularly those that contribute to diseases.
LocusLink http://www.ncbi.nlm.nih.gov/LocusLink/ combines descriptive and sequence information on genetic loci through a single query interface. LocusLink covers information on official nomenclature, aliases, sequence accessions, phenotypes, EC numbers, OMIM numbers, UniGene clusters, homology, map information, and related web sites.
UniGene http://www.ncbi.nlm.nih.gov/UniGene/ Each UniGene cluster is a non-redundant set of sequences that represents a unique gene. Well-characterized genes, as well as thousands of expressed sequence tag (EST) sequences, have been included. Each cluster record also contains information such as the tissue types in which the gene has been expressed and map location. UniGene can assist in gene discovery, gene mapping projects, and large-scale expression analysis.
ORF finder http://www.ncbi.nlm.nih.gov/gorf/gorf.html identifies all possible ORFs in a DNA sequence by locating the standard and alternative stop and start codons. The deduced amino acid sequences can then be used to BLAST against GenBank. ORF finder is also packaged in the sequence submission software Sequin.
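The idea behind an ORF scan can be sketched in a few lines of Python. This is a simplified illustration, not NCBI's implementation: it assumes the standard genetic code, treats only ATG as a start codon (alternative starts are left out), and the function names and min_codons parameter are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}   # standard stop codons
STARTS = {"ATG"}                # standard start only; alternative starts omitted

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq, min_codons=1):
    """Return (strand, frame, start, end) for each start-to-stop ORF.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    that was scanned. min_codons counts codons before the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq.upper()))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i                      # open a new ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None                   # close it at the stop
    return orfs

print(find_orfs("CCATGAAATAGCC"))
```

Each reported region can then be translated and used as a protein query, as the entry describes.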
Electronic PCR http://www.ncbi.nlm.nih.gov/genome/sts/epcr.cgi allows you to search your DNA sequence for sequence tagged sites (STSs), which have been used as landmarks in various types of genomic maps. It compares the query sequence against data in NCBI's UniSTS, a unified, non-redundant view of STSs from a wide range of sources.
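An STS is typically defined by a pair of primers and an expected product size, so an electronic PCR search amounts to finding both primers in the right orientation and at the right distance. The Python sketch below shows that logic with exact string matching only. It is an illustration under those assumptions, not the NCBI program; the function find_sts, its margin parameter, and the example sequences are all hypothetical.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sts(seq, fwd, rev, expected_size, margin=50):
    """Return (start, end) of regions that begin with the forward primer,
    end with the reverse complement of the reverse primer, and whose length
    is within `margin` bases of `expected_size`. Exact matches only."""
    seq, fwd, rev_rc = seq.upper(), fwd.upper(), revcomp(rev.upper())
    hits = []
    start = seq.find(fwd)
    while start != -1:
        end = seq.find(rev_rc, start + len(fwd))
        while end != -1:
            size = end + len(rev_rc) - start
            if abs(size - expected_size) <= margin:
                hits.append((start, end + len(rev_rc)))
            end = seq.find(rev_rc, end + 1)
        start = seq.find(fwd, start + 1)
    return hits

# Made-up query: forward primer at position 3, 22-base product
query = "TTT" + "ACGTAC" + "A" * 10 + "GGATCA" + "CC"
print(find_sts(query, "ACGTAC", "TGATCC", expected_size=22, margin=2))
```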
VAST Search http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html is a structure-structure similarity search service. It compares 3D coordinates of a newly determined protein structure to those in the MMDB/PDB database. VAST Search computes a list of similar structures that can be browsed interactively, using molecular graphics to view superimpositions and alignments.
The Cancer Chromosome Aberration Project (CCAP) http://www.ncbi.nlm.nih.gov/CCAP/ compiles information on the distinct chromosome aberrations that are associated with different cancers. The identification of chromosomal abnormalities by clinicians can enable the diagnosis of, classification of, and treatment selection for a given cancer.
Human-Mouse Homology Maps http://www.ncbi.nlm.nih.gov/Homology/ compare genes in homologous segments of DNA from human and mouse sources, sorted by position in each genome. A total of 1793 loci are presented, most of which are genes. This map should be interpreted as a reflection of probable, not confirmed, homology relationships because of the lack of further information available for about half the loci.
VecScreen http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html is a tool for identifying segments of a nucleic acid sequence that may be of vector, linker or adapter origin prior to sequence analysis or submission. VecScreen was developed to combat the problem of vector contamination in public sequence databases.
dbMHC provides an open, publicly accessible platform for DNA and clinical data related to the human Major Histocompatibility Complex (MHC). In addition, dbMHC will provide tools for further submission and analysis of research data linked to the MHC.
The Cancer Genome Anatomy Project (CGAP) http://www.ncbi.nlm.nih.gov/ncicgap/ aims to decipher the molecular anatomy of cancer cells. CGAP develops profiles of cancer cells by comparing gene expression in normal, precancerous, and malignant cells from a wide variety of tissues.
mRNA to Genomic Alignments: Spidey http://www.ncbi.nih.gov/IEB/Research/Ostell/Spidey aligns one or more mRNA sequences to a single genomic sequence. Spidey will try to determine the exon/intron structure, returning one or more models of the genomic structure, including the genomic/mRNA alignments for each exon.
Biology WorkBench http://biowb.sdsc.edu/
Protein Tools
Ndjinn Multiple Database Search
BL2SEQ Compare proteins to each other with BLAST
BL2SEQX Compare a protein to nucleotide sequences with BLAST
BLASTP Compare a PS to a PS DB
TBLASTN Compare a PS to a translated DB
PSIBLASTP Position Specific Iterative BLAST
FASTA Heuristic Sequence Similarity Search (PS Or DB)
TFASTA Compare a PS to a NS, PS DB
TFASTX Comp PS to Trans DNA (NS Or DB)
TFASTY Comp PS to Trans DNA (NS Or DB)
SSEARCH Smith Waterman Local Alignment of Proteins
CLUSTALW Multiple Sequence Alignment
CLUSTALWPROF Align Sequences to Existing Alignment (Profile)
ALIGN Optimal Global Alignment of Two PS
MSA Multiple Sequence Alignment (Sum of Pairs Criterion)
LALIGN Calculate N Best Local PS Alignments
LFASTA Local Alignment of Two PS
ROBUST Global alignment of Two PS (Show Robust Pairs)
SIM N Best Local Similarities Using Affine Weights
BESTSCOR Calculate the Best Self Comparison Score
CTREE Align protein sequences with confidence estimates
PRSS Compare a PS to a Shuffled PS
SAPS Statistical Analysis of PS
AASTATS Statistics Based on Amino Acid Abundance, including weight and specific volume
GREASE Kyte Doolittle Hydropathy Profile
RPSBLAST Compare a PS to a Conserved Domain DB
FINGERPRINTSCAN PRINTS fingerprint identification
PROSEARCH Search Prosite DB for Patterns in a PS
PPSEARCH Search Prosite DB for Patterns in a PS
PFSCAN Sequence Search Against a Set of Profiles (PROSITE and PFAM)
HMMPFAM Search against Pfam HMM database
BLIMPS Sequence Search Against a Set of Profiles (BLOCKS)
PATTERNMATCHDB Search for Regular Expressions (Patterns) in a protein sequence DB
PATTERNMATCH Search for Regular Expressions (Patterns) in a protein sequence
GOR4 Predict Secondary Structure of PS
RANDSEQ Randomize a Sequence
CHOFAS Predict Secondary Structure of PS(s) (Chou Fasman)
HTH Predict HTH Motifs in Protein Chains
PELE Protein Structure Prediction
DSSP Secondary Structure/Solvent Exposure of PDB Proteins
TMAP Prediction of Transmembrane Segments
TMHMM Predict location of transmembrane helices and location of intervening loop regions
EXTCOEF Extinction coefficient calculation
PI Isoelectric point determination
Nucleic Acid Tools
BL2SEQ Compare nucleotides to each other with BLAST
BL2SEQX Compare a nucleotide to protein sequences with BLAST
BLASTN Compare a NS to a NS DB
BLASTX Compare a PS Derived from NS to a PS DB
TBLASTX Compare a translated NS to a translated DB
FASTA Nucleic Acid Sequence Comparisons (NS or DB)
FASTX Compare Translated NS to PS DB
FASTY Compare Translated NS to PS DB
SSEARCH Smith Waterman Local Alignment
CLUSTALW Multiple Sequence Alignment
CLUSTALWPROF Align Sequences to Existing Alignment (Profile)
ALIGN Optimal Global Sequence Alignment
LALIGN Calculate Optimal Local Sequence Alignments
LFASTA Calculate Local Sequence Alignments (Heuristic)
PATTERNMATCHDB Search for Regular Expressions (Patterns) in a nucleic sequence DB
PATTERNMATCH Search for Regular Expressions (Patterns) in a nucleic sequence
TACG Analyze a NS for Restriction Enzyme Sites
PRIMER3 Design Primer Pairs and Probes
NASTATS Nucleic Acid Statistics
BESTSCOR Calculate the Best Self Comparison Score
PFSCAN Sequence Search Against a Set of Profiles (PROSITE)
PRIMERCHECK Calculates melting point, length, %GC for a primer sequence
PRIMERTM Designs end primers based on a minimum Tm
SIXFRAME Generate & Import 6 Frame Translations on a NS
REVCOM Generate Reverse Complement of NS
RANDSEQ Randomize a Sequence
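Several of the nucleic acid utilities listed above (REVCOM, PRIMERCHECK, SIXFRAME) perform small, well-defined computations. The Python sketch below shows what such computations look like. It is not the Workbench code: it assumes the standard genetic code, and the melting temperature uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is a rough estimate for short primers and may differ from the formula PRIMERCHECK actually uses. All function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def gc_percent(seq):
    """Percentage of G and C bases."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(primer):
    """Wallace-rule melting temperature in degrees C (assumed formula)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

def six_frames(seq):
    """Translate all six reading frames; unknown codons become 'X'."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq))):
        for offset in range(3):
            codons = (s[i:i + 3] for i in range(offset, len(s) - 2, 3))
            frames[f"{strand}{offset + 1}"] = "".join(
                CODON_TABLE.get(c, "X") for c in codons)
    return frames

print(revcomp("ATGC"), gc_percent("ATGC"), wallace_tm("ATGC"))
print(six_frames("ATGAAATAG"))
```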
Alignment Tools
Ndjinn Multiple Database Search
SPLIT Split Alignment Into Component Sequences
DEGAP_SPLIT Split Alignment Into Component Sequences and Remove Gap Characters
Download Aligned Sequences
TEXSHADE Color Coded Plots of Pre Aligned Sequences
BOXSHADE Color Coded Plots of Pre Aligned Sequences
CLUSTALWPROF Align Two Existing Alignments (Profiles)
TMAP Prediction of Transmembrane Segments
DRAWTREE Draw Unrooted Phylogenetic Tree from Alignment
DRAWGRAM Draw Rooted Phylogenetic Tree from Alignment
CLUSTALDIST Generate Distance Matrix with Clustal W
CLUSTALTREE Phylogenetic Analysis with Clustal W
DNADIST Compute Evolutionary Distance Matrix from NS Alignment
PROTDIST Compute Evolutionary Distance Matrix from PS Alignment
DNAPARS Infer an Unrooted Phylogeny from NS Alignment
PROTPARS Infer an Unrooted Phylogeny from PS Alignment
MVIEW Multiple Alignment Display
Structure Tools
PDF PDF Knowledge
CONVERT File format conversion utility
TNT Macromolecular Refinement Package