Computational Biology Tools and Databases

DATABASES

Microbial genome databases http://www.ncbi.nlm.nih.gov:80/PMGifs/Genomes/micr.html

Protein Information Resource http://www-nbrf.georgetown.edu/pir/genome.html

Comparative genome analysis in P. Bork laboratory http://www.bork.embl-heidelberg.de/Genome/

TIGR: The Comprehensive Microbial Resource Home Page—the omniome http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl

Genome databases other than NCBI http://www.unl.edu/stc-95/ResTools/biotools/biotools10.html

Genome list at NIH http://molbio.info.nih.gov/molbio/db.html

Mitochondrial DNA Database MitBASE http://www3.ebi.ac.uk/Research/Mitbase/mitbase.pl

E. coli genome project http://www.genome.wisc.edu/

E. coli genome and proteome database GenProtEC http://genprotec.mbl.edu/

E. coli index http://web.bham.ac.uk/bcm4ght6/res.html

Organelle genome sequences http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/organelles.html

Parasite genome databases and genome research resources http://www.ebi.ac.uk/parasites/parasite-genome.html

Retroviral genotyping and analysis site http://www.ncbi.nlm.nih.gov/retroviruses/

GenBank at the National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, accessible from: http://www.ncbi.nlm.nih.gov/Entrez

European Molecular Biology Laboratory (EMBL) Outstation at Hinxton, England http://www.ebi.ac.uk/embl/index.html

DNA DataBank of Japan (DDBJ) at Mishima, Japan http://www.ddbj.nig.ac.jp/

Protein Information Resource (PIR) database at the National Biomedical Research Foundation in Washington, DC http://www-nbrf.georgetown.edu/pirwww/

The SwissProt protein sequence database at ISREC, Swiss Institute for Experimental Cancer Research in Epalinges/Lausanne http://www.expasy.ch/cgi-bin/sprot-search-de

The Sequence Retrieval System (SRS) at the European Bioinformatics Institute allows both simple and complex concurrent searches of one or more sequence databases. The SRS system may also be used on a local machine to assist in the preparation of local sequence databases. http://srs6.ebi.ac.uk

Protein Data Bank (PDB) at Rutgers, the State University of New Jersey: atomic coordinates of structures as PDB files, models, viewers, and links to many other Web sites for structural analysis and classification http://www.rcsb.org/pdb

COG (Clusters of Orthologous Groups): http://www.ncbi.nlm.nih.gov/COG/

DOGS: Database of genome sizes http://www.cbs.dtu.dk/databases/DOGS/index.html

allgenes.org: A comprehensive gene index (catalog) derived from ESTs and predicted genes http://www.allgenes.org/

GeneCensus Genome Comparisons by encoded protein structures http://bioinfo.mbb.yale.edu/genome/

GeneQuiz: An integrated system for large-scale biological sequence analysis and data management (Andrade et al. 1999; Hoersch et al. 2000) http://jura.ebi.ac.uk:8765/ext-genequiz/

Genes and disease: Map location on human chromosomes http://www.ncbi.nlm.nih.gov/disease/

Genome channel at Oak Ridge National Laboratories http://compbio.ornl.gov/channel/

GOLD™: Genomes OnLine Database (Kyrpides 1999) http://wit.integratedgenomics.com/GOLD/

IMGT ImMunoGeneTics Database specializing in Immunoglobulins, T-cell receptors, and Major Histocompatibility Complex (MHC) of all vertebrate species http://www.ebi.ac.uk/imgt/index.html

KEGG: Kyoto Encyclopedia of Genes and Genomes (Kanehisa and Goto 2000) http://www.genome.ad.jp/kegg/

MIA Molecular Information Agent: A Web server that searches biological databases for information on a macromolecule http://mia.sdsc.edu/

Orthologous gene alignments at TIGR http://www.tigr.org/tdb/toga/toga.shtml

PEDANT: A protein extraction, description, and analysis tool http://pedant.mips.biochem.mpg.de/

STRING Search Tool for Recurring Instances of Neighboring Genes http://www.Bork.EMBL-Heidelberg.DE/STRING/

Taxonomy browser at the NCBI arranges genomes taxonomically for sequence retrieval http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/

UniGene System gene-oriented clusters of GenBank sequences useful for gene identification http://www.ncbi.nlm.nih.gov/UniGene/

2D gel analysis of proteins: List of organisms http://www.expasy.ch/ch2d/2d-index.html

AlignAce for promoter analysis of coordinately regulated genes, e.g., microarrays by Gibbs sampling (Roth et al. 1998; Hughes et al. 2000; McGuire et al. 2000) http://atlas.med.harvard.edu/download/

ArrayExpress database at European Bioinformatics Institute for microarray analysis http://www.ebi.ac.uk/arrayexpress/

BRITE: Database of protein-protein interactions and cross-reference links http://www.genome.ad.jp/brite/brite.html

EcoCyc electronic encyclopedia of genes and metabolism of E. coli (Karp et al. 2000) http://ecocyc.PangeaSystems.com/ecocyc/

Expression Profiler tools for analysis and clustering of gene expression and sequence data http://ep.ebi.ac.uk/

Functional genomics sites http://www.ornl.gov/hgmis/publicat/hgn/hgnarch.html#fg

GENECLUSTER; Tamayo et al. (1999) http://www.genome.wi.mit.edu/MPR/software.html

GeneX: A Collaborative Internet Database and Toolset for Gene Expression Data http://www.ncgr.org/genex/

MetaCyc metabolic encyclopedia (see EcoCyc) http://ecocyc.PangeaSystems.com/ecocyc/

Microarray guide: P. Brown lab http://cmgm.stanford.edu/pbrown/

Microarray project at NIH http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/

Microarray software http://rana.lbl.gov/

microarrays.org http://www.microarrays.org/

SMART: For the study of genetically mobile protein domains (Schultz et al. 2000) http://smart.embl-heidelberg.de/

SWISS-2DPAGE: Two-dimensional polyacrylamide gel electrophoresis database (Hoogland et al. 2000) http://www.expasy.ch/ch2d/

TIGR: Annotation and gene indexing resources, including analysis of the transcribed sequences represented in the public EST databases. http://www.tigr.org/tdb/tgi.shtml

WIT (What is there?): Interactive metabolic reconstruction on the Web (Overbeek et al. 2000) http://wit.mcs.anl.gov/WIT2/

GFF (Gene-Finding Features): Specification for describing genes and other features of genomics http://www.sanger.ac.uk/Software/GFF/

GO (gene ontology) controlled vocabulary http://genome-www.stanford.edu/GO/

MAGPIE: Multipurpose Automated Genome Project Investigation Environment http://www.rockefeller.edu/labheads/gaasterland/gaasterland.html, http://genomes.rockefeller.edu/research.shtml#magpie, http://magpie.genome.wisc.edu/tools.html, http://genomes.rockefeller.edu/research.shtml

TAMBIS: A conceptual model of molecular biology and bioinformatics and methods for querying the model (Baker et al. 1999) http://img.cs.man.ac.uk/tambis/

RDP: The Ribosomal Database Project (RDP) provides ribosome related data services to the scientific community, including online data analysis, rRNA derived phylogenetic trees, and aligned and annotated rRNA sequences http://rdp.cme.msu.edu/html/

GO: a dynamic controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing http://www.geneontology.org/index.shtml

Miscellaneous Tools For Bioinformatics Analysis On the WWW

Pairwise Sequence Alignment

Global alignment programs (GAP, NAP) http://genome.cs.mtu.edu/align/align.html Huang (1994)

BLAST 2 sequence alignment (BLASTN, BLASTP) http://www.ncbi.nlm.nih.gov/gorf/bl2.html Altschul et al. (1990)

Bayes block aligner http://www.wadsworth.org/res&res/bioinfo

BCM Search Launcher: Pairwise sequence alignment http://searchlauncher.bcm.tmc.edu/seq-search/alignment.html

SIM—Local similarity program for finding alternative alignments http://www.expasy.ch/tools/sim.html

FASTA program suite http://fasta.bioch.virginia.edu/fasta/fasta_list.html Pearson and Miller (1992); Pearson (1996)

Likelihood-weighted sequence alignment (lwa) http://stateslab.bioinformatics.med.umich.edu/service/lwa.html

Multiple Sequence Alignment

CLUSTALW or CLUSTALX (latter has graphical interface) ftp://ftp.ebi.ac.uk/pub/software Thompson et al. (1994a, 1997); Higgins et al. (1996)

MSA http://www.psc.edu/, http://www.ibc.wustl.edu/ibc/msa.html, ftp://fastlink.nih.gov/pub/msa Lipman et al. (1989); Gupta et al. (1995)

PRALINE http://mathbio.nimr.mrc.ac.uk/~jhering/praline/ Heringa (1999)

DIALIGN segment alignment http://www.gsf.de/biodv/dialign.html Morgenstern et al. (1996)

MultAlin http://protein.toulouse.inra.fr/multalin.html Corpet (1988)

Parallel PRRN progressive global alignment http://prrn.ims.u-tokyo.ac.jp/ Gotoh (1996)

SAGA genetic algorithm http://igs-server.cnrs-mrs.fr/~cnotred/Projects_home_page/saga_home_page.html Notredame and Higgins (1996)

Protein Profile Generation Tools Based on MSA

Aligned Segment Statistical Evaluation Tool (Asset) ftp://ncbi.nlm.nih.gov/pub/neuwald/asset Neuwald and Green (1994)

BLOCKS Web site http://blocks.fhcrc.org/blocks/ Henikoff and Henikoff (1991, 1992)

eMOTIF Web server http://dna.Stanford.EDU/emotif/ Nevill-Manning et al. (1998)

GIBBS, the Gibbs sampler statistical method ftp://ncbi.nlm.nih.gov/pub/neuwald/gibbs9_95/ Lawrence et al. (1993); Liu et al. (1995); Neuwald et al. (1995)

HMMER hidden Markov model software http://hmmer.wustl.edu/ Eddy (1998)

MACAW, a workbench for multiple alignment construction and analysis ftp://ncbi.nlm.nih.gov/pub/macaw/ Schuler et al. (1991)

MEME Web site, expectation maximization method http://meme.sdsc.edu/meme/website/ Bailey and Elkan (1995); Grundy et al. (1996, 1997); Bailey and Gribskov (1998)

Profile analysis at UCSD http://www.sdsc.edu/projects/profile/ Gribskov and Veretnik (1996)

SAM hidden Markov model Web site http://www.cse.ucsc.edu/research/compbio/sam.html Krogh et al. (1994); Hughey and Krogh (1996)

RNA Tools

MFOLD minimum energy RNA configuration http://bioinfo.math.rpi.edu/~zukerm/rna/ Zuker et al. (1991)

RNA editing Web site, UCLA http://www.lifesci.ucla.edu/RNA/index.html Simpson et al. (1998)

RNA editing, uridine insertion/deletion http://www.lifesci.ucla.edu/RNA/trypanosome/ Simpson et al. (1998)

RNA modification database http://medlib.med.utah.edu/RNAmods/ Limbach et al. (1994); Rozenski et al. (1999)

RNA secondary structures, Group I introns, 16S rRNA, 23S rRNA http://www.rna.icmb.utexas.edu Gutell (1994); Schnare et al. (1996 and references therein)

tRNAscan-SE search server http://www.genetics.wustl.edu/eddy/tRNAscan-SE/ Lowe and Eddy (1997)

Vienna RNA package for RNA secondary structure prediction and comparison http://www.tbi.univie.ac.at/~ivo/RNA/ Hofacker et al. (1998); Wuchty et al. (1999)

DATABASE SEARCHES

Sequence similarity search with a query sequence. Target: protein sequence database (or genomic sequences). Method: search for database sequences that can be aligned with the query sequence. Query: a single sequence, e.g., DAHQSNGA

BLAST SUITE http://www.ncbi.nlm.nih.gov/BLAST/

FASTA SUITE http://fasta.bioch.virginia.edu/fasta/

WU-BLAST http://blast.wustl.edu/

PROFILESEARCH ftp://ftp.sdsc.edu/pub/sdsc/biology Alignment search with a profile (scoring matrix with gap penalties). Target: protein sequence database. Method: prepare a profile from a multiple sequence alignment (Profilemake) and align the profile with database sequences. Query: a profile representing a gapped multiple sequence alignment, e.g., D-HQSNGA, ESHQ-YTM, EAHQSN-L, EGVQSYSL

MAST http://meme.sdsc.edu/meme/website/mast.html Search with a position-specific scoring matrix (PSSM) representing an ungapped sequence alignment (BLOCK). Target: protein sequence database. Method: prepare a PSSM from an ungapped region of a multiple sequence alignment, or search for patterns of the same length in unaligned sequences, then use the PSSM for the database search. Query: a PSSM representing an ungapped alignment, e.g., DAHQSN, ESHQSY, EAHQSN, EGVQSY

PSI-BLAST http://www.ncbi.nlm.nih.gov/BLAST/ Iterative alignment search for similar sequences that starts with a query sequence, builds a gapped multiple alignment, and then uses the alignment to augment the search. Method: uses initial matches to the query sequence to build a type of scoring matrix and searches for additional matches to the matrix by an iterative search method. Query: builds matches to the query sequence, e.g., DAHQSNGA, iteration 1 H-SNGA EAHQSN-L -> further iterations. PSI-BLAST finds a set of sequences related to each other by the presence of common patterns (not every sequence may have the same patterns).

PROSITE http://www.expasy.ch/prosite Search a query sequence for patterns representative of protein families. Target: database of patterns found in protein families. Method: search for patterns represented by a scoring matrix or hidden Markov model (profile HMM). Query: a single sequence, e.g., DAHQSNGA

INTERPRO http://www.ebi.ac.uk/interpro

PFAM http://www.sanger.ac.uk/Pfam

CDD/IMPALA http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml

BCM Search Launcher (with programming links to several servers) http://searchlauncher.bcm.tmc.edu/seq-search/protein-search.html

bic-sw Bic server, European Bioinformatics Institute http://www.ebi.ac.uk/bic_sw/

MPsearch, National Institute of Agrobiological Resources, Tsukuba, Japan http://www.dna.affrc.go.jp/htbin/mp_PP.pl

Scanps, G. Barton, European Bioinformatics Institute http://barton.ebi.ac.uk

SSEARCH E-mail server DNA Databank of Japan http://www.ddbj.nig.ac.jp/E-mail/homology.html

Swat, Phil Green, University of Washington http://www.genome.washington.edu/UWGC/analysistools/swat.cfm
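
The MAST entry above describes searching with a position-specific scoring matrix (PSSM) built from an ungapped block. The following is a minimal Python sketch of that general idea, not MAST's actual algorithm: it builds a log-odds matrix from the four example sequences quoted in the MAST entry and slides it along a query. The uniform background frequency, the pseudocount of 1, and the function names `build_pssm` and `best_window` are assumptions made for illustration only.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per column) from equal-length sequences.

    Assumes a uniform background (1/20 per residue) and a flat pseudocount;
    real tools use observed background frequencies and better priors.
    """
    width = len(block[0])
    if any(len(seq) != width for seq in block):
        raise ValueError("all sequences in an ungapped block must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(width):
        column = [seq[pos] for seq in block]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm, sequence):
    """Return (score, start) of the highest-scoring ungapped window.

    Residues outside the 20 standard amino acids contribute a score of 0.
    """
    width = len(pssm)
    scored = [
        (sum(pssm[k].get(sequence[start + k], 0.0) for k in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)


block = ["DAHQSN", "ESHQSY", "EAHQSN", "EGVQSY"]
pssm = build_pssm(block)
score, start = best_window(pssm, "DAHQSNGA")
print(f"best window starts at {start} with score {score:.2f}")
```

A database search would apply `best_window` to every database sequence and rank the hits; assessing statistical significance, which real tools do, is omitted here.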

Programs and Web sites for database similarity searches with a regular expression, motif, block, or profile

Regular Expressions and Motifs

EMOTIF Scan SwissProt and Genpept http://motif.stanford.edu/emotif/emotif-scan.html

Prosite patterns SwissProt and TrEMBL http://www.expasy.ch/tools/scnpsit2.html

ISREC pattern-finding service SwissProt and non-redundant EMBL database http://www.isrec.isb-sib.ch/software/PATFND_form.html

fpat PDB SwissProt Genpept http://stateslab.bioinformatics.med.umich.edu/service/fpat/

PHI-BLAST BLAST databases http://www.ncbi.nlm.nih.gov/

MOTIF SwissProt, PDB, PIR, PRF, Genes http://www.motif.genome.ad.jp/MOTIF2.html

BLOCKS

BLOCKS most databases http://www.blocks.fhcrc.org/blockmkr/make_blocks.html

MAST most databases http://meme.sdsc.edu/meme/website/

BLIMPS locally available databases anonymous FTP ftp://ncbi.nlm.nih.gov/repository/blocks/unix/blimps

Probe BLAST databases anonymous FTP ftp://ncbi.nlm.nih.gov/pub/neuwald/probe1.0

Genefind PIR http://pir.georgetown.edu/gfserver

PROFILE Programs

Profilesearch locally available databases anonymous FTP ftp://ftp.sdsc.edu/pub/sdsc/biology/profile_programs

Profile-SS most databases http://www.psc.edu/general/software/packages/profiless/profiless.html

Search Genes and Coding Regions

FGENES and related programs that use linear discriminant analysis or hidden Markov models http://genomic.sanger.ac.uk/gf/gf.shtml Solovyev et al. (1995)

GeneFinder access site at the Sanger Center http://genomic.sanger.ac.uk/gf/gf.html collection of methods

Genehacker for microbial genomes based on HMMs http://www-btls.jst.go.jp/GeneHacker/ Hirosawa et al. (1997)

GeneID-3 Web server using rule-based models, and GeneID+ http://www1.imim.es/geneid.html Mail server at geneid@darwin.bu.edu

GeneMark and GeneMark.hmm use hidden Markov models http://opal.biology.gatech.edu/GeneMark/

GeneParser Web page, uses combination of neural network and dynamic programming methods http://beagle.colorado.edu/~eesnyder/GeneParser.html Snyder and Stormo (1993, 1995)

Genescan using Fourier transform of DNA sequences to find characteristic patterns http://202.41.10.146/~sn055/DOC/gs.htm Tiwari et al. (1997)

Genetic code variations http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c

GenLang using linguistic methods http://www.cbil.upenn.edu/ Dong and Searls (1994)

GenScan based on probabilistic model of gene structure for vertebrate, Drosophila, and plant genes http://genes.mit.edu/GENSCAN.html Burge and Karlin (1998)

Genseqer for aligning genomic and EST sequences http://bioinformatics.iastate.edu/cgi-bin/gs.cgi Close to SplicePredictor

Glimmer uses interpolated Markov models for prokaryotic translation http://www.tigr.org/softlab/glimmer/ Salzberg et al. (1998)

GrailII prediction by neural networks based on scores of characteristic sequence patterns and composition http://compbio.ornl.gov/ Uberbacher and Mural (1991); Uberbacher et al. (1996)

Initiation codon analysis http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c

Microbial genome coding region identification based on Markov chains of order 5 http://igs-server.cnrs-mrs.fr/~audic/selfid.html Audic and Claverie (1998)

Procrustes based on comparison of related genomic sequences http://www-hto.usc.edu/software/procrustes/ Gelfand et al. (1996)

Push-button Gene Finder for gene identification using Markov and hidden Markov models http://www.cse.ucsc.edu/research/compbio/pgf/

Translate tool at ExPASy http://www.expasy.ch/tools/dna.html

Translation machine on the Web at EBI http://www2.ebi.ac.uk/translate/

Translation of large genome sequences on the Web http://alces.med.umn.edu/rawtrans.html

Veil (Viterbi exon-intron locator) uses hidden Markov models for vertebrate DNA http://www.cs.jhu.edu/labs/compbio/veil.html Henderson et al. (1997)

Webgene, a set of gene prediction tools and concurrent database similarity searches http://www.itba.mi.cnr.it/webgene/

Webgenemark and Webgenemark.hmm http://opal.biology.gatech.edu/GeneMark/ see GeneMark; Lukashin and Borodovsky (1998)

Promoter Prediction Program

ConsInspector: see Transfac database http://www.gsf.de/biodv/consinspector.html

FastM for transcription factor binding sites http://transfac.gbf.de/cgi-bin/fastm/fastm.pl Klingenhoff et al. (1999)

GeneExpress analysis of transcriptional regulations with TRRD database http://wwwmgs.bionet.nsc.ru/systems/GeneExpress/ Kolchanov et al. (1999a, b)

Genome inspector for combined analysis of multiple signals in genomes http://www.gsf.de/biodv/genomeinspector.html Quandt et al. (1997)

GrailII prediction of TSS by neural networks based on scores of characteristic sequence patterns and composition

MAR-FINDER for finding matrix attachment regions http://www.futuresoft.org/MAR-Wiz/ Kramer et al. (1997); Singh et al. (1997)

MatInspector: Transfac database http://www.gsf.de/biodv/matinspector.html (for downloading)

Mirage (Molecular Informatics Resource for the Analysis of Gene Expression) http://www.ifti.org/

NNPP Promoter Prediction by Neural Network for prokaryotes or eukaryotes http://www.fruitfly.org/seq_tools/promoter.html Reese et al. (1996)

NSITE–search for TF binding sites or other consensus regulatory sequences http://genomic.sanger.ac.uk/gf/gf.shtml

OOTFD Object-Oriented Transcription Factor Database http://www.ifti.org/cgi-bin/ifti/ootfd.pl Ghosh (1998)

Pol3scan for RNAP III/tRNA promoter sequences using pattern scoring matrices http://irisbioc.bio.unipr.it/genomics.html Pavesi et al. (1994)

Promoter element weight matrices and HMMs http://www.epd.isb-sib.ch/promoter_elements/ Bucher (1990)

Promoter II for recognition of PolII sequences by neural networks http://www.cbs.dtu.dk/services/promoter/ Knudsen (1999)

PromoterScan http://bimas.dcrt.nih.gov/molbio/proscan/ Prestridge (1995) and see Web site

RegScan for promoter classification http://wwwmgs.bionet.nsc.ru/mgs/programs/classprom/ Babenko et al. (1999)

Sequence walkers for graphical viewing of the interaction of regulatory protein with DNA binding site http://www-lecb.ncifcrf.gov/~toms/walker/narcoverlogowalker.html Schneider (1997)

Signal scan for transcriptional elements http://bimas.dcrt.nih.gov:80/molbio/signal/ Prestridge (1991, 1996)

TargetFinder for promoter searching in selected annotated sequences http://www.tigem.it/ Lavorgna et al. (1999)

TESS for searching for transcription factor binding sites http://www.cbil.upenn.edu/tess/ Schug and Overton (1997a, b)

Tfbind for transcription factor binding sites http://tfbind.ims.u-tokyo.ac.jp Tsunoda and Takagi (1999)

Transfac programs providing search for TF binding sites. MatInd for making scoring matrices and MatInspector for searching for matches to matrices http://www.gsf.de/cgi-bin/matsearch.pl, http://www.gsf.de/biodv/staff_pub.html Knüppel et al. (1994); Quandt et al. (1995); Heinemeyer et al. (1999); Klingenhoff et al. (1999)

Wentian Li's Website for multiple analysis http://linkage.rockefeller.edu/wli/gene/programs.html

Protein Structure Analysis

The PredictProtein server at the European Molecular Biology Laboratory at Heidelberg, Germany: important site for secondary structure prediction by PHD, Predator, TOPITS, and Threader http://cubic.bioc.columbia.edu/predictprotein

Swiss Institute of Bioinformatics, Geneva: basic types of protein analysis, databases, the Swiss-Model resource for prediction of protein models, and Swiss-PdbViewer http://www.expasy.ch/

Protein Structure Viewer

Chime http://www.umass.edu/microbio/chime/ A Web browser plug-in that can be used to display and manipulate structures inside a Web page. There are many mouse-driven controls. Excellent for lecture presentations.

Cn3D http://www.ncbi.nlm.nih.gov/Structure/ (Hogue 1997) Provides viewing of three-dimensional structures from Entrez and MMDB. Cn3D runs on Windows, MacOS, and Unix; simultaneously displays structural and sequence alignments; can show multiple superimposed images from NMR studies.

Mage http://kinemage.biochem.duke.edu (see Richardson and Richardson 1994) Standard molecular viewing features with animation and kaleidoscope effects.

RasMol http://www.umass.edu/microbio/rasmol/ (Sayle and Milner-White 1995) Most commonly used viewer for Windows, MacOS, UNIX, and VMS operating systems. Performs many functions.

Swiss 3D viewer, Spdbv http://www.expasy.ch/spdbv/mainpage.html (Guex and Peitsch 1997) Protein models can be built by structural alignments; calculates atomic angles and distances; supports threading and energy minimization; interacts with the Swiss Model server.

Protein Secondary Structure Prediction

Modeller http://guitar.rockefeller.edu/modeller/modeller.html dynamic programming alignment of sequences and structures and molecular dynamics methods Sali et al. (1995)

Swiss-model http://www.expasy.ch/swissmod/SWISS-MODEL.html sequence alignment of query with sequences of known structure Peitsch (1996)

Whatif http://www.cmbi.kun.nl/whatif/ flexible molecular graphics rendering of models Rodriguez et al. (1998)

Baylor College of Medicine (BCM) http://searchlauncher.bcm.tmc.edu/seq-search/struc-predict.html collection of methods and linked to other servers

DSC http://www.bmm.icnet.uk/dsc/ linear discrimination King et al. (1997)

J-Pred structure prediction server http://jura.ebi.ac.uk:8888/ NNSSP, DSC, Predator, Mulpred, Zpred, Jnet, and PHD Cuff et al. (1998)

NNPRED http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html neural networks enhanced to detect sequence periodicity Kneller et al. (1990)

NPS@ server, MLR combination for secondary structure prediction http://pbil.ibcp.fr/NPSA/ combination of prediction methods using multivariate linear regression to optimize the predictions Guermeur et al. (1999)

Protein Sequence Analysis (PSA) System http://bmerc-www.bu.edu/psa/index.html discrete space models (hidden Markov models) for patterns of α helices, β strands, tight turns, and loops in specific structural classes Stultz et al. (1993, 1997); White et al. (1994)

PREDATOR http://www.embl-heidelberg.de/argos/predator/predator_info.html based on analysis of long- and short-range amino acid interactions and alignments of sequence pairs Frishman and Argos (1995, 1996, 1997)

Predict Protein server http://www.embl-heidelberg.de/predictprotein/predictprotein.html ; see also mirror sites neural networks of multiple sequence alignment Rost and Sander (1994); Rost (1996)

PSSP http://searchlauncher.bcm.tmc.edu/seq-search/struc-predict.html nearest neighbor enhanced by non-intersecting local and multiple sequence alignments Salamov and Solovyev (1995, 1997)

Simpa96 http://pbil.ibcp.fr/NPSA/ nearest-neighbor method Levin (1997)

SOPM, SOPMA http://pbil.ibcp.fr/NPSA/ nearest-neighbor method based on sequence alignments Geourjon and Deleage (1994, 1995)

SSP http://searchlauncher.bcm.tmc.edu/seq-search/struc-predict.html linear discriminant analysis based on amino acid composition of local and adjacent regions see H option for this program on Web page

UCLA-DOE structure prediction server http://www.doe-mbi.ucla.edu/people/frsvr/frsvr.html collection of methods and linked to other servers Fischer and Eisenberg (1996)

Threading servers and program

123D http://www-lmmb.ncifcrf.gov/~nicka/123D.html contact potentials between amino acid side groups Alexandrov et al. (1996)

3D-PSSM http://www.bmm.icnet.uk/~3dpssm sequence-structure using position-specific scoring matrices Russell et al. (1997)

Honig lab http://honiglab.cpmc.columbia.edu/ threading methods using biophysical properties

Libra I http://www.ddbj.nig.ac.jp/htmls/E-mail/libra/LIBRA_I.html target sequence and 3D profile are aligned by dynamic programming Ota and Nishikawa (1997)

NCBI structure site http://www.ncbi.nlm.nih.gov/Structure/RESEARCH/threading.html Gibbs sampling algorithm used to align sequence and structure Bryant (1996)

Profit http://lore.came.sbg.ac.at/home.html fold recognition by the contact potential method M. Sippl

Threader 2 http://insulin.brunel.ac.uk/threader/threader.html prediction by recognition of the correct fold from a library of alternatives Jones et al. (1995)

TOPITS http://www.embl-heidelberg.de/predictprotein/doc/help_05.html detects similar motifs of secondary structure and accessibility between a sequence of unknown structure and a known fold Rost (1995a,b)

UCLA-DOE structure prediction server http://www.doe-mbi.ucla.edu/people/frsvr/frsvr.html fold-recognition using 3D profiles and secondary structure prediction methods Fischer and Eisenberg (1996)

CASP http://predictioncenter.llnl.gov/ overall assessment of the methods

EMBOSS (http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps/index.html#embassy): downloadable source code.

PROGRAM FUNCTION AUTHOR

alignment consensus

cons Creates a consensus from multiple alignments HGMP

megamerger Merge two large overlapping nucleic acid sequences HGMP

merger Merge two overlapping sequences HGMP

alignment differences

diffseq Find differences between nearly identical sequences HGMP

alignment dot plots

dotmatcher Produces a dotplot of two sequences. Sanger

dotpath Displays a non-overlapping wordmatch dotplot of two sequences HGMP

dottup DNA sequence dot plot Sanger

polydot Multiple dotplot Sanger

alignment global

est2genome Align EST and genomic DNA sequences Sanger

needle Needleman-Wunsch global alignment. HGMP

stretcher Global alignment of two sequences. Sanger
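
The `needle` entry above refers to Needleman-Wunsch global alignment. As a rough illustration of the dynamic programming behind it, here is a minimal Python sketch. It is not EMBOSS code: the simple match/mismatch scores, the linear gap penalty, and the function name `needleman_wunsch` are assumptions chosen for brevity, whereas production tools use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the matrix: each cell is the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


result = needleman_wunsch("ACGT", "AGT")
print(result)
```

The whole of both sequences is always aligned, which is what distinguishes this from the local methods listed in the next group.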

alignment local

matcher Local alignment of two sequences Sanger

seqmatchall Does an all-against-all comparison of a set of sequences Sanger

supermatcher Finds a match of a large sequence against one or more sequences Sanger

water Smith-Waterman local alignment. HGMP

wordmatch Finds all exact matches of a given size between 2 sequences Sanger
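
The `water` entry above refers to Smith-Waterman local alignment. The sketch below shows, under simplifying assumptions, how it differs from a global alignment: cell scores are floored at zero and the traceback starts at the highest-scoring cell rather than the corner. The scoring values and the function name `smith_waterman` are hypothetical choices, and this is not the EMBOSS implementation.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b) for one best-scoring local alignment.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The zero option lets an alignment restart anywhere.
            h[i][j] = max(
                0,
                h[i - 1][j - 1] + sub,
                h[i - 1][j] + gap,
                h[i][j - 1] + gap,
            )
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTTT", "GGGACGGGG"))
```

Only the shared `ACG` core is reported here; the unrelated flanking residues are left out of the alignment.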

alignment multiple

emma Multiple alignment program HGMP

infoalign Displays some simple information about sequences HGMP

plotcon Plots the quality of conservation of a sequence alignment HGMP

prettyplot Displays aligned sequences, with colouring and boxing. Sanger

showalign Display a multiple sequence alignment HGMP

tranalign Align nucleic coding regions given the aligned proteins HGMP

display

cirdna Draws circular maps of DNA constructs Norway

lindna Draws linear maps of DNA constructs Norway

pepnet Protein helical net plot HGMP

pepwheel Shows protein sequences as helices HGMP

prettyseq Output sequence with translated ranges HGMP

remap Display a sequence with restriction cut sites, translation etc. HGMP

seealso Finds programs sharing group names HGMP

showdb Displays information on the currently available databases HGMP

showfeat Show features of a sequence. HGMP

showseq Display a sequence with features, translation etc HGMP

sixpack Display a DNA sequence with 6-frame translation and ORFs LION

textsearch Search sequence documentation text. SRS and Entrez are faster! HGMP

edit

biosed Replace or delete sequence sections HGMP

cutseq Removes a specified section from a sequence. HGMP

degapseq Removes gap characters from sequences HGMP

descseq Alter the name or description of a sequence. HGMP

entret Reads and writes (returns) flatfile entries HGMP

extractfeat Extract features from a sequence HGMP

extractseq Extract regions from a sequence. HGMP

listor Writes a list file of the logical OR of two sets of sequences HGMP

maskfeat Mask off features of a sequence HGMP

maskseq Mask off regions of a sequence. HGMP

newseq Type in a short new sequence. HGMP

noreturn Removes carriage return from ASCII files HGMP

notseq Excludes a set of sequences and writes out the remaining ones HGMP

nthseq Writes one sequence from a multiple set of sequences HGMP

pasteseq Insert one sequence into another. HGMP

revseq Reverse and complement a sequence. HGMP

seqret Reads and writes (returns) a sequence. Sanger

seqretsplit Reads and writes (returns) sequences in individual files HGMP

skipseq Reads and writes (returns) sequences, skipping the first few HGMP

splitter Split a sequence into (overlapping) smaller sequences. HGMP

trimest Trim poly-A tails off EST sequences HGMP

trimseq Trim ambiguous bits off the ends of sequences HGMP

union Reads sequence fragments and builds one sequence LION

vectorstrip Strips out DNA between a pair of vector sequences HGMP

yank Reads a range from a sequence, appends the full USA to a list file LION

enzyme kinetics

findkm Calculates Km and Vmax for an enzyme reaction HGMP

feature tables

coderet Extract CDS, mRNA and translations from feature tables HGMP

twofeat Finds neighbouring pairs of features in sequences HGMP

information

infoseq Displays some simple information about sequences HGMP

tfm Displays a program's help documentation manual HGMP

whichdb Search all databases for an entry HGMP

wossname Finds programs by keywords in their one-line documentation. HGMP

nucleic codon usage

cai CAI codon usage statistic HGMP

chips Codon usage statistics HGMP

codcmp Codon usage table comparison HGMP

cusp Create a codon usage table HGMP

syco Synonymous codon usage Gribskov statistic plot HGMP

nucleic composition

banana Bending and Curvature Plot in B-DNA Sanger

btwisted Calculates the twisting in a B-DNA sequence HGMP

chaos Create a chaos plot for a sequence. Sanger

compseq Counts the composition of dimer/trimer/etc words in a sequence HGMP

dan Plot melting temperatures for DNA. HGMP

freak Residue/base frequency table or plot HGMP

isochore Plots isochores in large DNA sequences Sanger

sirna Finds siRNA duplexes in mRNA HGMP

wordcount Counts words of a specified size in a DNA sequence. Sanger

nucleic cpg islands

cpgplot Plot CpG rich areas HGMP

cpgreport Reports CpG rich regions HGMP

geecee Calculates the fractional GC content of nucleic acid sequences Sanger

newcpgreport Report CpG rich areas EBI

newcpgseek Reports CpG rich regions EBI
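
The programs in this group report GC content and CpG-rich regions. The sketch below illustrates the two basic quantities involved: the fractional GC content and the observed/expected CpG ratio, computed with the commonly used formula (number of CpG × N) / (number of C × number of G). The function names, and the choice to ignore any character other than A, C, G, or T, are assumptions for illustration and do not describe how the EMBOSS programs treat ambiguity codes or windows.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    bases = [base for base in seq.upper() if base in "ACGT"]
    if not bases:
        return 0.0
    return sum(base in "GC" for base in bases) / len(bases)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G).

    Returns 0.0 when the sequence has no C or no G, where the ratio is undefined.
    """
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


print(gc_content("ATGC"), cpg_obs_exp("CGCG"))
```

CpG island finders typically evaluate these values in a sliding window and apply thresholds to both; the window size and thresholds vary between tools and are not specified here.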

nucleic gene finding

getorf Finds and extracts open reading frames (ORFs) HGMP

marscan Finds MAR/SAR sites in nucleic sequences HGMP

plotorf Plot potential open reading frames HGMP

showorf Pretty output of DNA translations HGMP

wobble Wobble base plot HGMP

nucleic motifs

dreg Regular expression search of a nucleotide sequence Sanger

fuzznuc Nucleic acid pattern search HGMP

fuzztran Protein pattern search after translation HGMP

nucleic mutation

msbar Mutate sequence beyond all recognition HGMP

shuffleseq Shuffles a set of sequences maintaining composition HGMP

nucleic primers

eprimer3 Picks PCR primers and hybridization oligos HGMP

primersearch Searches DNA sequences for matches with primer pairs HGMP

stssearch Searches a DNA database for matches with a set of STS primers Sanger

nucleic profiles

profit Scan a sequence or database with a matrix or profile HGMP

prophecy Creates matrices/profiles from multiple alignments HGMP

prophet Gapped alignment for profiles HGMP

nucleic repeats

einverted Finds DNA inverted repeats Sanger

equicktandem Finds tandem repeats Sanger

etandem Looks for tandem repeats in a nucleotide sequence. Sanger

palindrome Looks for inverted repeats in a nucleotide sequence. HGMP

nucleic restriction

recoder Find and remove restriction sites but maintain the same translation HGMP

redata Isoschizomers, references and Suppliers for Restriction Enzymes HGMP

restover Finds restriction enzymes that produce a specific overhang Sloan-Kettering Cancer Center

restrict Finds Restriction Enzyme Cleavage Sites HGMP

silent Silent mutation restriction enzyme scan HGMP

nucleic transcription

tfscan Scans DNA sequences for transcription factors. HGMP

nucleic translation

backtranseq Back translate a protein sequence HGMP

transeq Translates nucleic acid sequences. HGMP
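As a minimal sketch of what a translation tool does, the following builds the standard genetic code table and translates one reading frame. It assumes the standard code only; the names are hypothetical, and transeq itself supports alternative code tables, multiple frames and other options not shown here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate one forward frame (0, 1 or 2); unknown codons become X."""
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


print(translate("ATGGCCTAA"))
```

Back-translation, as in backtranseq, is not unique because most amino acids have several codons, so a tool has to choose one, for example from a codon usage table.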

phylogeny

distmat Creates a distance matrix from multiple alignments HGMP

protein 2d structure

garnier Predicts protein secondary structure EBI

helixturnhelix Finds nucleic acid binding domains. HGMP

hmoment Hydrophobic moment calculation HGMP

pepcoil Predicts coiled coil regions HGMP

tmap Predict transmembrane proteins Sanger

protein composition

charge Protein charge plot HGMP

checktrans ORF property statistics EBI

emowse Protein identification by mass spectrometry HGMP

iep Calculates the isoelectric point of a protein HGMP

mwfilter Filter noisy molwts from mass spec output HGMP

octanol Displays protein hydropathy Sanger

pepinfo Plots simple amino acid properties in parallel HGMP

pepstats Protein statistics HGMP

pepwindow Displays protein hydropathy Sanger

pepwindowall Displays protein hydropathy of a set of sequences Sanger

protein motifs

antigenic Finds antigenic sites in proteins HGMP

digest Protein proteolytic enzyme or reagent cleavage digest HGMP

fuzzpro Protein pattern search HGMP

oddcomp Finds protein sequence regions with a biased composition. Norway

patmatdb Matching a Prosite motif against a Protein Sequence Database. HGMP

patmatmotifs Compares a protein sequence to the PROSITE motif database. HGMP

pestfind Finds PEST motifs as potential proteolytic cleavage sites Austria

preg Regular expression search of a protein sequence Sanger

pscan Locates fingerprints (multiple motif features) in a protein sequence. HGMP

sigcleave Predicts signal peptide cleavage sites HGMP

utils database creation

aaindexextract Extract data from AAINDEX HGMP

cutgextract CUTG: Codon Usage Tabulated from GenBank by organism HGMP

printsextract Preprocesses the PRINTS database for use with the program PSCAN HGMP

prosextract Extracts ID, AC, and PA lines from the PROSITE motif database. HGMP

rebaseextract Extract data from REBASE HGMP

tfextract Extract data from TRANSFAC HGMP

utils database indexing

dbiblast Database indexing for BLAST 1 and 2 indexed databases Sanger

dbifasta Index a fasta database HGMP

dbiflat Database indexing for flat file databases Sanger

dbigcg Database indexing for GCG formatted databases Sanger

utils misc

embossdata Finds or fetches the data files read in by the EMBOSS programs HGMP

embossversion Writes the current EMBOSS version number HGMP

PHYLIP TOOLS ( http://evolution.genetics.washington.edu/phylip/programs.html ) downloadable source codes.

Heuristic search for best tree

PROTPARS Estimates phylogenies from protein sequences (input using the standard one-letter code for amino acids) using the parsimony method, in a variant which counts only those nucleotide changes that change the amino acid, on the assumption that silent changes are more easily accomplished.

DNAPARS. Estimates phylogenies by the parsimony method using nucleic acid sequences. Allows use of the full IUB ambiguity codes, and estimates ancestral nucleotide states. Gaps are treated as a fifth nucleotide state.

DNACOMP. Estimates phylogenies from nucleic acid sequence data using the compatibility criterion, which searches for the largest number of sites which could have all states (nucleotides) uniquely evolved on the same tree. Compatibility is particularly appropriate when sites vary greatly in their rates of evolution, but we do not know in advance which are the less reliable ones.

DNAML. Estimates phylogenies from nucleotide sequences by maximum likelihood. The model employed allows for unequal expected frequencies of the four nucleotides, for unequal rates of transitions and transversions, and for different (prespecified) rates of change in different categories of sites, with the program inferring which sites have which rates.

DNAMLK. Same as DNAML but assumes a molecular clock. The use of the two programs together permits a likelihood ratio test of the molecular clock hypothesis to be made.

RESTML. Estimation of phylogenies by maximum likelihood using restriction sites data (not restriction fragments but presence/absence of individual sites). It employs the Jukes-Cantor symmetrical model of nucleotide change, which does not allow for differences of rate between transitions and transversions. This program is VERY slow.

FITCH. Estimates phylogenies from distance matrix data under the "additive tree model" according to which the distances are expected to equal the sums of branch lengths between the species. Uses the Fitch-Margoliash criterion and some related least squares criteria. Does not assume an evolutionary clock. This program will be useful with distances computed from DNA sequences, with DNA hybridization measurements, and with genetic distances computed from gene frequencies.

KITSCH. Estimates phylogenies from distance matrix data under the "ultrametric" model which is the same as the additive tree model except that an evolutionary clock is assumed. The Fitch-Margoliash criterion and other least squares criteria are assumed. This program will be useful with distances computed from DNA sequences, with DNA hybridization measurements, and with genetic distances computed from gene frequencies.

NEIGHBOR An implementation by Mary Kuhner and John Yamato of Saitou and Nei's "Neighbor Joining Method," and of the UPGMA (Average Linkage clustering) method. Neighbor Joining is a distance matrix method producing an unrooted tree without the assumption of a clock. UPGMA does assume a clock. The branch lengths are not optimized by the least squares criterion but the methods are very fast and thus can handle much larger data sets.

CONTML. Estimates phylogenies from gene frequency data by maximum likelihood under a model in which all divergence is due to genetic drift in the absence of new mutations. Does not assume a molecular clock. An alternative method of analyzing this data is to compute Nei's genetic distance and use one of the distance matrix programs.

MIX. Estimates phylogenies by some parsimony methods for discrete character data with two states (0 and 1). Allows use of the Wagner parsimony method, the Camin-Sokal parsimony method, or arbitrary mixtures of these. Also reconstructs ancestral states and allows weighting of characters.

DOLLOP Estimates phylogenies by the Dollo or polymorphism parsimony criteria for discrete character data with two states (0 and 1). Also reconstructs ancestral states and allows weighting of characters. Dollo parsimony is particularly appropriate for restriction sites data; with ancestor states specified as unknown it may be appropriate for restriction fragments data.
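The UPGMA clustering mentioned under NEIGHBOR above can be sketched briefly: repeatedly merge the two closest clusters, place the new node at half their distance, and average distances to the remaining clusters weighted by cluster size. This is an illustrative sketch only, not the NEIGHBOR implementation; the function name and the distance matrix are hypothetical. Neighbor Joining uses a different merging criterion and does not assume a clock.

```python
def upgma(names, dist):
    """Cluster by UPGMA.

    dist is a symmetric matrix (list of lists). Returns a nested tuple
    (left, right, node_height) with leaf names as plain strings.
    """
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}      # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        new_d = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            # size-weighted average distance to the merged cluster
            new_d[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)
        clusters[next_id] = ((ta, tb, dab / 2), na + nb)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical clock-like distances for three species
tree = upgma(["A", "B", "C"], [[0, 2, 6],
                               [2, 0, 6],
                               [6, 6, 0]])
print(tree)
```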

Branch-and-bound exact search for best tree

DNAPENNY. Finds all most parsimonious phylogenies for nucleic acid sequences by branch-and-bound search. This may not be practical (depending on the data) for more than 10 or 11 species.

PENNY. Finds all most parsimonious phylogenies for discrete-character data with two states, for the Wagner, Camin-Sokal, and mixed parsimony criteria using the branch-and-bound method of exact search. May be impractical (depending on the data) for more than 10-11 species.

DOLPENNY. Finds all most parsimonious phylogenies for discrete-character data with two states, for the Dollo or polymorphism parsimony criteria using the branch-and-bound method of exact search. May be impractical (depending on the data) for more than 10-11 species.

CLIQUE. Finds the largest clique of mutually compatible characters, and the phylogeny which they recommend, for discrete character data with two states. The largest clique (or all cliques within a given size range of the largest one) are found by a very fast branch and bound search method. The method does not allow for missing data. For such cases the T (Threshold) option of MIX may be a useful alternative. Compatibility methods are particularly useful when some characters are of poor quality and the rest of good quality, but when it is not known in advance which ones are which.
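The compatibility idea behind CLIQUE can be illustrated for two-state characters with unknown ancestral states: two characters are compatible unless all four state pairs (0,0), (0,1), (1,0) and (1,1) occur among the species. The sketch below pairs that test with a brute-force clique search. It is only a toy under those assumptions, with hypothetical names and data, and it does not use the branch and bound search that CLIQUE is described as using.

```python
from itertools import combinations


def compatible(char_a, char_b):
    """Pairwise test: incompatible only if all four state pairs occur."""
    return len(set(zip(char_a, char_b))) < 4


def largest_clique(characters):
    """Brute-force search for the largest mutually compatible subset.

    Returns the indices of the characters in the clique. Only suitable
    for a handful of characters.
    """
    indices = range(len(characters))
    for size in range(len(characters), 0, -1):
        for subset in combinations(indices, size):
            if all(compatible(characters[i], characters[j])
                   for i, j in combinations(subset, 2)):
                return subset
    return ()


# Hypothetical data: each string is one character scored in five species
chars = ["11000", "00011", "11100", "10100"]
print(largest_clique(chars))
```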

Distances or bootstrap samples

DNADIST Computes four different distances between species from nucleic acid sequences. The distances can then be used in the distance matrix programs. The distances are the Jukes-Cantor formula, one based on Kimura's 2-parameter method, Jin and Nei's distance which allows for rate variation from site to site, and a maximum likelihood method using the model employed in DNAML. The latter method of computing distances can be very slow.

PROTDIST Computes a distance measure for protein sequences, using maximum likelihood estimates based on the Dayhoff PAM matrix, Kimura's 1983 approximation to it, or a model based on the genetic code plus a constraint on changing to a different category of amino acid. The distances can then be used in the distance matrix programs

SEQBOOT Reads in a data set, and produces multiple data sets from it by bootstrap resampling. Since most programs in the current version of the package allow processing of multiple data sets, this can be used together with the consensus tree program CONSENSE to do bootstrap (or delete-half-jackknife) analyses with most of the methods in this package. This program also allows the Archie/Faith technique of permutation of species within characters.

GENDIST Computes one of three different genetic distance formulas from gene frequency data. The formulas are Nei's genetic distance, the Cavalli-Sforza chord measure, and the genetic distance of Reynolds et al. The former is appropriate for data in which new mutations occur in an infinite isoalleles neutral mutation model, the latter two for a model without mutation and with pure genetic drift. The distances are written to a file in a format appropriate for input to the distance matrix programs.

FACTOR Takes discrete multistate data with character state trees and produces the corresponding data set with two states (0 and 1). Written by Christopher Meacham
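Two ideas from this group, the Jukes-Cantor distance listed under DNADIST and the bootstrap resampling performed by SEQBOOT, can be sketched together. The Jukes-Cantor formula is d = -(3/4) ln(1 - (4/3) p), where p is the fraction of sites that differ; a bootstrap replicate draws alignment columns with replacement. The function names and the toy alignment are hypothetical, and neither function reproduces the PHYLIP programs or their file formats.

```python
import math
import random


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned DNA sequences."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]   # skip gaps and ambiguity codes
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf                        # formula undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement.

    alignment is a list of equal-length strings; rng is a random.Random.
    """
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


alignment = ["ACGTACGTAC", "ACGTACGTTT"]       # hypothetical toy alignment
print(jukes_cantor(*alignment))

rng = random.Random(0)                         # fixed seed for repeatability
replicate = bootstrap_columns(alignment, rng)
print(replicate, jukes_cantor(*replicate))
```

In a full analysis each replicate would be passed through a distance and tree-building step, and the resulting trees summarised with a consensus method.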

Tree manipulation, plotting, consensus

DRAWGRAM Plots rooted phylogenies, cladograms, and phenograms in a wide variety of user-controllable formats. The program is interactive and allows previewing of the tree on PC graphics screens, and Tektronix or DEC graphics terminals. Final output can be on a laser printer (such as the Apple Laserwriter or HP Laserjet), on graphics screens or terminals, in files readable by drawing programs such as PC Paintbrush, MacDraw, Idraw, and Xfig, on pen plotters (Hewlett-Packard or Houston Instruments) or on dot matrix printers capable of graphics

DRAWTREE Similar to DRAWGRAM but plots unrooted phylogenies

CONSENSE Computes consensus trees by the majority-rule consensus tree method, which also allows one to easily find the strict consensus tree. Does NOT compute the Adams consensus tree. Trees are input in a tree file in standard nested-parenthesis notation, which is produced by many of the tree estimation programs in the package. This program can be used as the final step in doing bootstrap analyses for many of the methods in the package

RETREE Reads in a tree (with branch lengths if necessary) and allows you to reroot the tree, to flip branches, to change species names and branch lengths, and then write the result out. Can be used to convert between rooted and unrooted trees.

Interactive tree manipulation

DNAMOVE Interactive construction of phylogenies from nucleic acid sequences, with their evaluation by parsimony and compatibility and the display of reconstructed ancestral bases. This can be used to find parsimony or compatibility estimates by hand.

MOVE Interactive construction of phylogenies from discrete character data with two states (0 and 1). Evaluates parsimony and compatibility criteria for those phylogenies and displays reconstructed states throughout the tree. This can be used to find parsimony or compatibility estimates by hand.

DOLMOVE Interactive construction of phylogenies from discrete character data with two states (0 and 1) using the Dollo or polymorphism parsimony criteria. Evaluates parsimony and compatibility criteria for those phylogenies and displays reconstructed states throughout the tree. This can be used to find parsimony or compatibility estimates by hand.

List of Other Phylogenetic Analysis Tools (http://evolution.genetics.washington.edu/phylip/software.html)

EBI Tools http://www.ebi.ac.uk/Tools/index.html

Homology & Similarity http://www.ebi.ac.uk/Tools/homology.html The BLAST or Fasta programs can be used to look for sequence similarity

http://www.ebi.ac.uk/blast/index.html BLAST

http://www.ebi.ac.uk/fasta/index.html Fasta

Protein Functional Analysis http://www.ebi.ac.uk/Tools/protein.html InterProScan can be used to search for motifs in your protein sequence

http://www.ebi.ac.uk/interpro/scan.html InterProScan

Structural Analysis http://www.ebi.ac.uk/Tools/structural.html MSDfold or DALI can be used to query your protein structure and compare it to those in the Protein Data Bank (PDB)

http://www.ebi.ac.uk/msd-srv/ssm MSDfold

http://www.ebi.ac.uk/dali/ DALI

Sequence Analysis http://www.ebi.ac.uk/Tools/sequence.html ClustalW, a sequence alignment tool

http://www.ebi.ac.uk/clustalw/index.html ClustalW

Miscellaneous Tools http://www.ebi.ac.uk/Tools/misc.html

http://www.ebi.ac.uk/microarray/ExpressionProfiler/ep.html Expression Profiler: A set of tools for clustering, analysis and visualization of gene expression and other genomic data

EXPASY TOOLS http://expasy.ch/

Proteomics and sequence analysis tools http://expasy.ch/tools/

Proteomics http://expasy.ch/tools/

http://expasy.ch/tools/peptident.html PeptIdent

http://expasy.ch/tools/peptide-mass.html PeptideMass

DNA -> Protein http://expasy.ch/tools/

http://expasy.ch/tools/dna.html Translate

Similarity searches http://expasy.ch/tools/

http://expasy.ch/tools/blast/ BLAST

Pattern and profile searches http://expasy.ch/tools/

http://expasy.ch/tools/scanprosite/ ScanProsite

Post-translational modification and topology prediction http://expasy.ch/tools/

Primary structure analysis http://expasy.ch/tools/

http://expasy.ch/tools/protparam.html ProtParam

http://expasy.ch/tools/pi_tool.html pI/MW

http://expasy.ch/cgi-bin/protscale.pl ProtScale

Secondary and tertiary structure prediction http://expasy.ch/tools/

http://expasy.ch/swissmod/SWISS-MODEL.html SWISS-MODEL

http://expasy.ch/spdbv/ Swiss-PdbViewer

Alignment http://expasy.ch/tools/

http://www.ch.embnet.org/software/TCoffee.html T-COFFEE

http://expasy.ch/tools/sim-prot.html SIM

Biological text analysis http://expasy.ch/tools/

http://expasy.ch/melanie/ Software for 2-D PAGE analysis

Roche Applied Science's Biochemical Pathways http://expasy.ch/cgi-bin/search-biochem-index

RCSB-Developed Software

mmCIF Resources

CIFTr http://pdb.rutgers.edu/mmcif/CIFTr/index.html

CIFLIB http://pdb.rutgers.edu/mmcif/CIFLIB/index.html C language application program interface

CIFOBJ http://pdb.rutgers.edu/mmcif/CIFOBJ/index.html A class library of mmCIF dictionary access tools

CIFPARSE http://pdb.rutgers.edu/mmcif/CIFPARSE/index.html A library of access tools for mmCIF

CIFPARSE-OBJ http://pdb.rutgers.edu/mmcif/CIFPARSE-OBJ/index.html A library of access tools for mmCIF in C++

CIFTABLE (SSTable) http://pdb.rutgers.edu/mmcif/SSTABLE/index.html A class library of table access tools (old version)

CIFTABLE (ISTable) http://pdb.rutgers.edu/mmcif/ISTABLE/index.html A class library of table access tools

mmCIF loader http://pdb.rutgers.edu/mmcif/MMCIF-LOADER/index.html An application to load mmCIF data into relational databases and XML

OpenMMS Toolkit http://openmms.sdsc.edu A suite of Java source code that includes an mmCIF parser, RDBMS loader, XML translator, and Corba server

STAR (CIF) parser http://pdb.sdsc.edu/index.html Several object-oriented Perl modules for parsing mmCIF files and other STAR-compliant files without nested loops

Deposition Resources

ADIT - Workstation Version (alpha release) http://pdb.rutgers.edu/mmcif/ADIT/index.html A package for editing and checking structure data entries

MAXIT http://pdb.rutgers.edu/mmcif/MAXIT/index.html An application for processing and curation of macromolecular structure data

PDB_EXTRACT http://pdb.rutgers.edu/mmcif/demo.tar.gz (download) Tools and examples for extracting mmCIF data from structure determination applications

PDB Validation Suite (beta version) http://pdb.rutgers.edu/mmcif/VAL/index.html A tool for processing and checking structure data

FTP Archive Resources

bnl2rcsb ftp://ftp.rcsb.org/pub/pdb/software/ Perl script to convert a BNL FTP directory structure to an RCSB FTP directory structure

getPdbUpdate ftp://ftp.rcsb.org/pub/pdb/software/ Perl script to retrieve files from any update found at

Other Software Links*

mmCIF software tools

CBFLib http://www.bernstein-plus-sons.com/software/CBF/

A library of ANSI-C functions providing a simple mechanism for accessing Crystallographic Binary Files (CBF files) and Image-supporting CIF (imgCIF) files

cif2pdb http://www.bernstein-plus-sons.com/software/cif2pdb/ Program to convert mmCIF to pseudo-PDB format

CIFtbx2 http://www.bernstein-plus-sons.com/software/ciftbx/

Extended CIF Tool Box (Fortran) with CYCLOPS and cif2cif

OOSTAR http://www.sdsc.edu/pb/cif/OOSTAR.html

Applications to manipulate STAR files (Objective-C)

pdb2cif http://www.bernstein-plus-sons.com/software/pdb2cif/

Scripts to filter a PDB entry and produce mmCIF

Crystallography

ARP/wARP http://www.embl-hamburg.de/ARP/ A system for the refinement of protein structures via automatic updating and re-building of the model and solvent structure

CCP4 http://www.dl.ac.uk/CCP/CCP4/main.html A suite of programs covering all aspects of crystallographic structure determination, refinement and analysis

CNS http://cns.csb.yale.edu/v1.0/ A system for structure determination from crystallographic or NMR data

MAIN http://www-bmb.ijs.si/doc/ An interactively driven suite of programs for molecular modeling, density modification, model refinement and structure analysis

O http://imsb.au.dk/~mok/o/ An interactive system for building and manipulating models in electron density maps

SHELX http://shelx.uni-ac.gwdg.de/SHELX/ A set of programs for direct structure solution and refinement with high resolution diffraction data

SOLVE http://www.solve.lanl.gov/ An automated system for phase determination from MIR and MAD data

X-PLOR 3.851 http://xplor.csb.yale.edu/xplor-info/xploronline.html A program for structure determination from crystallographic or NMR data (Yale version)

X-PLOR/CNX http://www.accelrys.com/products/cnx/ A program for structure determination from crystallographic or NMR data (Accelrys version)

XtalView http://www.scripps.edu/pub/dem-web/toc.html An interactive system for building and manipulating models in electron density maps and for phase determination from MIR or MAD data.

NMR

CNS http://cns.csb.yale.edu/v1.0/ A system for structure determination from crystallographic or NMR data

CYANA http://www.guentert.com/Cyana.html A program for the structure calculation of biological macromolecules on the basis of conformational constraints from NMR

Fantom http://www.scsb.utmb.edu/fantom/fm_home.html A program for structure calculation and refinement using torsion angle minimization with NMR data

X-PLOR 3.851 http://xplor.csb.yale.edu/xplor-info/xploronline.html A program for structure determination from crystallographic or NMR data (Yale version)

Structure Analysis and Verification

CE/CL http://cl.sdsc.edu/ Software for structure comparison by Combinatorial Extension (CE) and Compound Likeness (CL)

ENDscript http://genopole.toulouse.inra.fr/ENDscript

A Web server for searching homologous sequences and giving information on secondary structure elements, accessibility, hydropathy and protein-protein contacts

ESPript http://genopole.toulouse.inra.fr/ESPript Easy Sequencing in Postscript

Non-covalent bond finder http://www.umass.edu/microbio/chime/find-ncb/index.htm Software for finding non-covalent interactions for use with Chime 2 or higher

PASS http://www.delanet.com/~bradygp/pass A fast cavity-detection program for the identification and visualization of possible protein binding sites

Procheck http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html A program that checks the stereochemical quality of a protein structure

ProFit http://www.biochem.ucl.ac.uk/~martin/text/ProFit.readme A program for fitting protein structures on to each other

SARF2 http://123d.ncifcrf.gov/sarf2.html A program which searches for similar structural motifs (via an analysis of backbone fragments) in protein structures

Surface Racer http://monte.biochem.wisc.edu/~tsodikov/surface.html

A program that calculates exact accessible surface area, molecular surface area and average curvature of molecular surface, and analyzes cavities in the protein interior inaccessible from the outside.

SURFNET http://www.biochem.ucl.ac.uk/~roman/surfnet/surfnet.html A program which generates surfaces and void regions between molecular surfaces

WHAT_CHECK http://www.sander.embl-heidelberg.de/whatcheck/ A system for protein structure validation derived from the WHAT IF program

WHAT IF http://www.cmbi.kun.nl/whatif/ A protein structure analysis program that may be used for mutant prediction, structure verification and molecular graphics

Modeling and Simulation

ANALYZE http://www.tc.cornell.edu/reports/NIH/resource/CompBiologyTools/analyze/index.asp Cornell Theory Center program to classify and analyze conformations obtained from global searches; includes the capability to compare NMR intensities and coupling constants to experimental data

AMBER http://www.amber.ucsf.edu/amber/amber.html Assisted Model Building with Energy Refinement - a molecular dynamics and energy minimization program

AutoDock3.0 http://www.scripps.edu/pub/olson-web/dock/autodock

A suite of automated docking tools designed to predict how small molecules, such as substrate or drug candidates, bind to a receptor of known 3D structure

CHARMM http://yuri.harvard.edu/ Chemistry at HARvard Molecular Mechanics - a molecular dynamics and energy minimization program

ECEPPAK http://www.tc.cornell.edu/reports/NIH/resource/CompBiologyTools/eceppak/index.asp Cornell Theory Center package to carry out global conformational searches using the ECEPP/3 force field

FTDOCK http://www.bmm.icnet.uk/docking/ A program for carrying out rigid-body docking between biomolecules

GROMOS http://www.igc.ethz.ch/gromos/ A general-purpose molecular dynamics computer simulation package for the study of biomolecular systems

GROMACS http://md.chem.rug.nl/~gmx Complete modelling package for proteins, membrane systems and more, including fast molecular dynamics, normal mode analysis, essential dynamics analysis and many trajectory analysis utilities

ICM http://www.molsoft.com/ MolSoft ICM programs and modules for applications including structure analysis, modeling, docking, homology modeling and virtual ligand screening

JACKAL http://trantor.bioc.columbia.edu/~xiang/jackal/

Suite of tools for model building, structure prediction and refinement, reconstruction, and minimization; for SGI, Linux, and Sun Solaris

LOOPP http://www.tc.cornell.edu/reports/NIH/resource/CompBiologyTools/loopp/index.asp Linear Optimization of Protein Potentials. Cornell Theory Center program for potential optimization and alignments of sequences and structures

MAMMOTH http://icb.mssm.edu/services/mammoth/mammoth

MAtching Molecular Models Obtained from THeory - a program for automated pairwise and multiple structural alignments; for SGI, Linux, and Sun Solaris

MidasPlus http://www.cgl.ucsf.edu/Outreach/midasplus/ A program for displaying, manipulating and analysing macromolecules

MODELLER http://guitar.rockefeller.edu/modeller/modeller.html A program for automated protein homology modeling

MOIL http://www.tc.cornell.edu/reports/NIH/resource/CompBiologyTools/moil/index.asp Cornell Theory Center package for molecular dynamics simulation of biological molecules

NAMD http://www.ks.uiuc.edu/Research/namd/ A parallel object-oriented molecular dynamics simulation program

WAM - Web Antibody Modelling http://antibody.bath.ac.uk A server for automated structure modeling from antibody Fv sequences

123D http://123d.ncifcrf.gov/123D+.html A program which threads a sequence through a set of structures using substitution matrix, secondary structure prediction and contact capacity potential

Molecular Graphics

BioEditor http://bioeditor.sdsc.edu/ A tool for creating and viewing dynamic, formatted structure annotations; for Windows

Shockwave 3D PDB Viewer http://www.candomultimedia.com/medical

Free, easy to use tool for viewing molecular structures through a Web page; streams data directly from the PDB on PCs and Macs; developed in Ireland

Chemscape Chime http://www.mdlchime.com/chime/

From MDL Information Systems. This program allows visualisation of structures within WWW browser pages. For further information about Chime see the UMass Chime Resources Page http://www.umass.edu/microbio/chime/

Java3D Molecular Visualisation System http://www.adcworks.com/projects/jmvs

Free Java/Java3D program and source code

Mage and Kinemages http://kinemage.biochem.duke.edu/kinemage/kinemage.php Interactive molecular display for research and educational uses. Free, open source for Macintosh, PC, Unix, and Linux. A Java version does 3-D Web display without plug-ins.

MOLMOL http://www.mol.biol.ethz.ch/wuthrich/software/molmol/

A program for displaying, analyzing, and manipulating the 3-D structure of biological macromolecules, with special emphasis on the study of protein or DNA structures determined by NMR

RasMol http://www.bernstein-plus-sons.com/software/rasmol/ A free viewing system for PDB coordinate files that runs on Macintosh, PC and UNIX systems. Open source versions http://www.openrasmol.org/

Raster3D http://www.bmsc.washington.edu/raster3d/raster3d.html A set of tools for generating high quality raster images of proteins or other molecules. Freeware for UNIX, LINUX and PC.

RasTop (v. 2.0) http://www.geneinfinity.org/rastop A free user-friendly graphical interface to RasMol molecular visualization software (v. 2.7.2.1), available for Windows platforms

Ribbons http://sgce.cbse.uab.edu/ribbons/ A program for molecular illustration and error analysis

RmscopII http://rmscopii.sourceforge.net/

A Tcl/Tk script that redirects PDB files or RasMol scripts to multiple RasMol sessions; can be used as a Web browser helper application or as a standalone program.

Swiss PDB viewer, available from Switzerland http://www.expasy.ch/spdbv/

A 3D graphics and molecular modeling program for the simultaneous analysis of multiple models and for model-building into electron density maps. The software is available for Macintosh or PC

Uppsala Electron Density Server http://portray.bmc.uu.se/eds/ Generated density maps

MolScript http://www.avatar.se/molscript/ A program for displaying structures in both detailed and schematic formats and writing images in various formats

MolView and MolView Lite http://www.danforthcenter.org/smith/MolView/molview.html Free molecular visualization programs for the Macintosh

PDB2MGIF http://www.dkfz-heidelberg.de/spec/pdb2mgif/

Free, user-friendly server that converts PDB files to animated gif files that can be used in Web pages and presentations. Simple step-by-step instructions can be found at http://www.rcsb.org/pdb/animation.html

PocketMol http://birg.cs.wright.edu/pocketmol/pocketmol.html

Program to view and manipulate PDB files on a PocketPC

ProteinScope http://www.proteinscope.com

Free viewer to display and manipulate PDB files and create animations and slides of proteins

PyMOL http://www.pymol.org

A free and open-source molecular graphics system for visualization, animation, editing, and publication-quality imagery. PyMOL is scriptable and can be extended using the Python language. Supports Windows, Mac OSX, and Unix

Qmol http://lancelot.bio.cornell.edu/jason/qmol.html

A lightweight OpenGL based molecular viewer for Windows 95/NT/2000 and X Windows

ViewerLite and ViewerPro (Discovery Studio) http://www.accelrys.com/dstudio/ds_viewer/ Molecular visualization programs for Macintosh and PC from Accelrys

VMD http://www.ks.uiuc.edu/Research/vmd/ VMD (Visual Molecular Dynamics) runs on many platforms including MacOS X, and several versions of Unix and Windows. VMD provides visualization, analysis, and Tcl/Python scripting features, and has recently added sequence browsing and volumetric rendering features. VMD is distributed free of charge.

WebMol http://www.cmpharm.ucsf.edu/~walther/webmol.html A Java PDB Viewer. WebMol was designed to display and analyze structural information contained in the Protein Data Bank (PDB). It can be run as an applet or as a stand-alone application.

World Index of Molecular Visualization Resources http://molvis.sdsc.edu/visres/

A Visitor-Maintained Indices (VMI)TM Site by Eric Martz and Trevor D. Kramer. Contains many links to visualization tools, tutorials, and other resources.

TIGR Tools http://www.tigr.org/software/

Gene Finding/Annotation

MANATEE http://manatee.sourceforge.net/ is a web-based gene evaluation and genome annotation tool. Manatee can store and view annotation for prokaryotic and eukaryotic genomes. The Manatee interface allows biologists to quickly identify genes and make high quality functional assignments, such as GO classifications, using search data, paralogous families, and annotation suggestions generated from automated analysis.

GlimmerM http://www.tigr.org/software/glimmerm/.related organisms. A gene finder derived from Glimmer, but developed specifically for eukaryotes. It is based on a dynamic programing algorithm that considers all combinations of possible exons for inclusion in a gene model and chooses the best of these combinations. The decision about what gene model is best is a combination of the strength of the splice sites and the score of the exons generated by an interpolated Markov model (IMM). The system has been trained for Arabidopsis thaliana, Oryza sativa (rice), and Plasmodium falciparum (the malaria parasite), and should work well on closely

Glimmer http://www.tigr.org/software/glimmer/ A system for finding genes in microbial DNA, especially the genomes of bacteria and archaea. (Gene Locator and Interpolated Markov Modeler) uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA.

GeneSplicer: A computational method for splice site prediction http://www.tigr.org/tdb/GeneSplicer/gene_spl.html A fast, flexible system for detecting splice sites in the genomic DNA of various eukaryotes. The system has been trained and tested successfully on Plasmodium falciparum (malaria), Arabidopsis thaliana and human genomes. Training data sets for human and Arabidopsis thaliana are included. It is fully described in Pertea M, Lin X, Salzberg SL. GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res. 2001 Mar 1;29(5):1185-90.

TransTerm http://www.tigr.org/software/transterm.html is a program that finds rho-independent transcription terminators in bacterial genomes. Each terminator found by the program is assigned a confidence value that provides an estimate of its probability of being a true terminator. TransTerm has been published: Prediction of Transcription Terminators in Bacterial Genomes. Ermolaeva, M.D., Khalak, H.G., White, O., Smith, H.O., Salzberg, S.L. Journal of Molecular Biology 301, 27-33 (2000).

EXONomy http://www.tigr.org/software/Exonomy/index.shtml is a new gene finder based on the Generalized Hidden Markov Model (GHMM) framework, similar to Genscan and Genie. It is highly reconfigurable and includes software for retraining. The replaceable submodels of the GHMM include homogeneous and inhomogeneous Markov models of selectable order, nonstationary Markov chains, windowed and non-windowed Weight Array Matrices (WWAM/WAM/WMM), Maximal Dependence Decomposition (MDD) trees, and codon bias. An EXONomy Web Interface is available.

Unveil http://www.tigr.org/software/Unveil/index.shtml is a new gene finder based on a 283-state Hidden Markov Model (HMM) similar to that described in [Henderson,J., Salzberg,S., and Fasman,K.H. (1997) J. Comput. Biol. 4, 127-141]. An Unveil Web Interface is available.

ELPH http://www.tigr.org/software/ELPH/index.shtml is a general-purpose Gibbs sampler for finding motifs in a set of DNA or protein sequences. The program takes as input a set containing anywhere from a few dozen to thousands of sequences, and searches through them for the most common motif, assuming that each sequence contains one copy of the motif.

RepeatFinder ftp://ftp.tigr.org/pub/software/repeatFinder/ is a computational system for analysis of the repetitive structure of genomic sequences. The method uses suffix trees for efficient computation of exact repeats and organizes those repeats into classes. The method can be applied to individual genome sequences or sets of sequences. The output is a multi-FASTA file of the repeat sequences found, which can be used as the target of searches.
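To make the notion of exact repeats and multi-FASTA output concrete, here is a minimal sketch. It deliberately swaps the suffix tree for a plain dictionary of fixed-length k-mers, which is far less efficient and only finds repeats of one chosen length; it does not group repeats into classes. The function names and the header layout are assumptions made for illustration, not RepeatFinder's actual format.

```python
from collections import defaultdict


def exact_repeats(seq, k):
    """Map every k-mer that occurs more than once to its 0-based starts.

    Uses a hash index of k-mers, not a suffix tree.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def to_multifasta(repeats):
    """Write the repeats as multi-FASTA text (header layout is made up)."""
    lines = []
    for n, (kmer, pos) in enumerate(sorted(repeats.items()), start=1):
        starts = ",".join(str(p) for p in pos)
        lines.append(f">repeat_{n} copies={len(pos)} starts={starts}")
        lines.append(kmer)
    return "\n".join(lines)
```

For example, exact_repeats("ACGTTTACGTA", 4) reports the 4-mer ACGT at positions 0 and 6, and to_multifasta turns that into a two-line FASTA record.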

RBSfinder ftp://ftp.tigr.org/pub/software/RBSfinder/ is a Perl script that implements an algorithm to find ribosome binding sites for genes in bacterial and archaeal genomes. It is normally run as a post-processor to the Glimmer gene finder or to other prokaryotic gene finders.

Combiner http://www.tigr.org/software/combiner/ is a program that predicts gene models using the output from other annotation software. It uses a statistical algorithm to identify patterns of evidence corresponding to gene models.

HBQCM: ftp://ftp.tigr.org/pub/software/qc/ Hexamer Based Quality Control Method as described in White O., Dunning T., Sutton G., Adams M., Venter J.C., and Fields C. (1993) A quality control algorithm for DNA sequencing projects. Nucleic Acids Research 21:3829-3838.

Alignment

MUMmer http://www.tigr.org/software/mummer/ A system for aligning whole genome sequences. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides. It is fully described in: A.L. Delcher, S. Kasif, R.D. Fleischmann, J. Peterson, O. White, and S.L. Salzberg. Alignment of whole genomes. Nucleic Acids Research, 27:11 (1999), 2369-2376. A graphical viewer for the MUMmer output is also available.

AAT ftp://ftp.tigr.org/pub/software/AAT/: A tool for analyzing and annotating genomic sequences. Huang, X., Adams, M.D., Zhou, H. and Kerlavage, A.R. (1997) Genomics 46, 37-45. The AAT package includes two sets of programs, one set (DPS/NAP) for comparing the query sequence with a protein database, and the other (DDS/GAP2) for comparing the query with a cDNA database.

Sequencing/Finishing

Assembler: http://www.tigr.org/software/assembler/ A tool for assembly of large sets of overlapping sequence data such as ESTs, BACs, or small genomes. This updated assembly tool delivers better performance and results than the previous version, assembling EST, BAC, and genome data with greater care given to repeat detection and contig-level overlapping. TIGR Assembler has been published (Sutton G., White, O., Adams, M., and Kerlavage, A. (1995) TIGR Assembler: A new tool for assembling large shotgun sequencing projects. Genome Science & Technology 1:9-19). Also available, without a license, is the utility ta2ace for converting TIGR Assembler output into the "new" .ACE format used by Consed and other sequence assembly editors.

BAMBUS http://www.tigr.org/software/bambus/ is the first publicly available genome sequence scaffolding program. It orders and orients contigs into scaffolds based on various types of linking information. Additionally, BAMBUS allows users to build scaffolds in a hierarchical fashion by prioritizing the order in which links are used. BAMBUS runs on Unix systems.

Lucy: http://www.tigr.org/software/lucy/ A Sequence Cleanup Program. Lucy is a utility that prepares raw DNA sequence fragments for sequence assembly, possibly using the TIGR Assembler. The cleanup process includes quality assessment, confidence reassurance, vector trimming and vector removal. The primary advantage of Lucy over other similar utilities is that it is a fully integrated, stand-alone program. You can view the Program Requirements. The Windows version of Lucy is available from Hui-Hsien Chou's webpage. Lucy is fully described in: DNA sequence quality trimming and vector removal. H.-H. Chou and M.H. Holmes. Bioinformatics, 17:12, pp. 1093-1104, 2001.

Microarray

TM4: A package of Open Source software programs for Microarray analysis http://www.tigr.org/software/tm4/ TIGR Microarray Data Analysis System (MIDAS) is a microarray data quality filtering and normalization tool that allows raw experimental data to be processed through various data normalizations, filters, and transformations via a user-designed analysis pipeline. Currently implemented normalization and data analysis algorithms include total-intensity normalization, Lowess (Locfit) normalization, flip-dye consistency checking, replicates analysis, and intensity-dependent z-score filtering (slice analysis). MIDAS is implemented in Java and is therefore platform-independent. It requires JDK v1.3 or higher. Refer to the included manual for details.
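As an illustration of the simplest normalization named above, the sketch below shows one common form of total-intensity normalization for a two-channel array: the second channel is rescaled so that both channels have the same summed intensity before log ratios are taken. This is a generic textbook version, not MIDAS code; the channel names (cy3, cy5) and function names are assumptions.

```python
import math


def total_intensity_normalize(cy3, cy5):
    """Rescale cy5 so that its total intensity equals that of cy3.

    Returns the rescaled cy5 values and the scaling factor used.
    """
    factor = sum(cy3) / sum(cy5)
    return [value * factor for value in cy5], factor


def log2_ratios(cy3, cy5):
    """Per-spot log2(cy5 / cy3); intensities must be positive."""
    return [math.log2(r / g) for g, r in zip(cy3, cy5)]
```

After normalization, a spot with equal expression in both samples is expected to have a log2 ratio near zero, which is what makes the ratios comparable across arrays.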

MADAM (MicroArray DAta Manager) Microarray experiments produce large amounts of data for even the simplest of experiments. In order to analyze data from many experiments, that data must be stored in an accessible form, such as in a database. MADAM is a Java-based application designed to load and retrieve microarray data to and from a database (also supplied with the software). MADAM provides data entry forms, data report forms and additional applications necessary to maintain microarray data for further analysis. MADAM requires JRE 1.3.1.

TIGR MultiExperiment Viewer (MEV) is a Java application designed to allow the analysis of microarray data to identify patterns of gene expression and differentially expressed genes. Numerous normalization, clustering and distance algorithms have been implemented, along with a variety of graphical displays to best present the results. MEV was written to be flexible and expandable, and supports a variety of input and output formats. MEV requires version 1.2 or higher of Sun's JRE and J3D package.

TIGR Spotfinder is a software tool designed for Microarray image processing using the TIFF image files generated by most microarray scanners. TIGR Spotfinder was written in C/C++ for PCs running Windows NT/2000/ME/XP.

ArrayViewer http://www.tigr.org/tigr-scripts/license/new.pl?genre=soft&program=ArrayViewer is a software tool designed to facilitate the presentation and analysis of microarray expression data, leading to the identification of genes that are differentially expressed. It is written in Java for cross-platform compatibility and reads and writes data using flat files or a database through stored procedures. An ArrayViewer Overview is available as an Adobe Acrobat PDF file. Machines that lack the requirements for the MultiExperiment Viewer may use ArrayViewer for single experiment analysis.

TIGR McCoder ftp://ftp.tigr.org/pub/software/Microarray/McCoder/ is a software package designed for a portable scanner running Palm OS to collect bar codes and then transfer them to a PC as a plain text file. The package includes two programs: one that runs on the handheld scanner and one that runs on a regular PC with Windows 95/98/2000/NT. Once transferred to the PC, the scanned bar codes can be manipulated easily with McCoder.

Scheduler ftp://ftp.tigr.org/pub/software/Microarray/Scheduler/ is a web based tool that provides an efficient reservation method to manage lab instruments and office facilities. The Scheduler is designed as a two-tier system running on the Internet and can be configured to meet a variety of requirements.

NCBI Tools http://www.ncbi.nlm.nih.gov/

The Basic Local Alignment Search Tool (BLAST http://www.ncbi.nlm.nih.gov/BLAST/), for comparing gene and protein sequences against others in public databases, now comes in several flavors including PSI-BLAST, PHI-BLAST, and BLAST 2 sequences. Specialized BLASTs are also available for human, microbial, malaria, and other genomes, as well as for vector contamination, immunoglobulins, and tentative human consensus sequences.

Clusters of Orthologous Groups (COGs http://www.ncbi.nlm.nih.gov/COG/) currently covers 21 complete genomes from 17 major phylogenetic lineages. A COG is a cluster of very similar proteins found in at least three species. The presence or absence of a protein in different genomes can tell us about the evolution of the organisms, as well as point to new drug targets.

Map Viewer http://www.ncbi.nlm.nih.gov/mapview/static/MVstart.html shows integrated views of chromosome maps for 17 organisms. Used to view the NCBI assembly of complete genomes, including human, Map Viewer is a valuable tool for the identification and localization of genes, particularly those that contribute to diseases.

LocusLink http://www.ncbi.nlm.nih.gov/LocusLink/ combines descriptive and sequence information on genetic loci through a single query interface. LocusLink covers information on official nomenclature, aliases, sequence accessions, phenotypes, EC numbers, OMIM numbers, UniGene clusters, homology, map information, and related web sites.

UniGene http://www.ncbi.nlm.nih.gov/UniGene/ Each UniGene cluster is a non-redundant set of sequences that represents a unique gene. Well-characterized genes, as well as thousands of expressed sequence tag (EST) sequences, have been included. Each cluster record also contains information such as the tissue types in which the gene has been expressed and map location. UniGene can assist in gene discovery, gene mapping projects, and large-scale expression analysis.

ORF finder http://www.ncbi.nlm.nih.gov/gorf/gorf.html identifies all possible ORFs in a DNA sequence by locating the standard and alternative stop and start codons. The deduced amino acid sequences can then be used to BLAST against GenBank. ORF finder is also packaged in the sequence submission software Sequin.
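The scanning step behind an ORF search can be sketched in a few lines. The version below is a simplified illustration, not NCBI's implementation: it assumes only ATG as a start codon and the three standard stop codons (the real tool also supports alternative codons), and the function names and the min_codons parameter are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Scan all six reading frames for ATG ... stop stretches.

    Returns (strand, frame, start, end, orf) tuples. Coordinates are
    0-based, end-exclusive, and relative to the strand being scanned.
    min_codons counts the codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs
```

Each reported ORF could then be translated and used as a protein query, as the entry describes. ORFs that run off the end of the sequence without a stop codon are ignored in this sketch.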

Electronic PCR http://www.ncbi.nlm.nih.gov/genome/sts/epcr.cgi allows you to search your DNA sequence for sequence tagged sites (STSs), which have been used as landmarks in various types of genomic maps. It compares the query sequence against data in NCBI's UniSTS, a unified, non-redundant view of STSs from a wide range of sources.
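The core test behind an in-silico PCR search is whether a pair of primers occurs in the right orientation and at a plausible distance in the query. The sketch below is a naive exact-match version under stated assumptions: no mismatches are tolerated, only the forward strand is searched, and the function name find_sts and its parameters are hypothetical rather than taken from the NCBI tool.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sts(seq, fwd, rev, min_size, max_size):
    """Find exact-match primer pairs that bracket a product.

    fwd is matched as given; rev is matched as its reverse complement
    downstream of fwd. Returns (start, product_size) tuples for every
    pairing whose product size lies within [min_size, max_size].
    """
    seq = seq.upper()
    fwd = fwd.upper()
    rev_site = reverse_complement(rev)
    hits = []
    start = seq.find(fwd)
    while start != -1:
        end = seq.find(rev_site, start + len(fwd))
        while end != -1:
            size = end + len(rev_site) - start
            if min_size <= size <= max_size:
                hits.append((start, size))
            end = seq.find(rev_site, end + 1)
        start = seq.find(fwd, start + 1)
    return hits
```

A real search would loop this over every marker in an STS database and allow for mismatches and both strands; the sketch covers only the single-marker, exact-match case.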

VAST Search http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html is a structure-structure similarity search service. It compares 3D coordinates of a newly determined protein structure to those in the MMDB/PDB database. VAST Search computes a list of similar structures that can be browsed interactively, using molecular graphics to view superimpositions and alignments.

The Cancer Chromosome Aberration Project (CCAP) http://www.ncbi.nlm.nih.gov/CCAP/ compiles information on the distinct chromosome aberrations that are associated with different cancers. The identification of chromosomal abnormalities by clinicians can enable the diagnosis of, classification of, and treatment selection for a given cancer.

Human-Mouse Homology Maps http://www.ncbi.nlm.nih.gov/Homology/ compare genes in homologous segments of DNA from human and mouse sources, sorted by position in each genome. A total of 1793 loci are presented, most of which are genes. This map should be interpreted as a reflection of probable, not confirmed, homology relationships because of the lack of further information available for about half the loci.

VecScreen http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html is a tool for identifying segments of a nucleic acid sequence that may be of vector, linker or adapter origin prior to sequence analysis or submission. VecScreen was developed to combat the problem of vector contamination in public sequence databases.

dbMHC provides an open, publicly accessible platform for DNA and clinical data related to the human Major Histocompatibility Complex (MHC). In addition, dbMHC will provide tools for further submission and analysis of research data linked to the MHC.

The Cancer Genome Anatomy Project (CGAP) http://www.ncbi.nlm.nih.gov/ncicgap/ aims to decipher the molecular anatomy of cancer cells. CGAP develops profiles of cancer cells by comparing gene expression in normal, precancerous, and malignant cells from a wide variety of tissues.

mRNA to Genomic Alignments: Spidey http://www.ncbi.nih.gov/IEB/Research/Ostell/Spidey aligns one or more mRNA sequences to a single genomic sequence. Spidey will try to determine the exon/intron structure, returning one or more models of the genomic structure, including the genomic/mRNA alignments for each exon.

Biology WorkBench http://biowb.sdsc.edu/

Protein Tools

Ndjinn Multiple Database Search

BL2SEQ Compare proteins to each other with BLAST

BL2SEQX Compare a protein to nucleotide sequences with BLAST

BLASTP Compare a PS to a PS DB

TBLASTN Compare a PS to a translated DB

PSIBLASTP Position Specific Iterative BLAST

FASTA Heuristic Sequence Similarity Search (PS Or DB)

TFASTA Compare a PS to a NS, PS DB

TFASTX Comp PS to Trans DNA (NS Or DB)

TFASTY Comp PS to Trans DNA (NS Or DB)

SSEARCH Smith Waterman Local Alignment of Proteins

CLUSTALW Multiple Sequence Alignment

CLUSTALWPROF Align Sequences to Existing Alignment (Profile)

ALIGN Optimal Global Alignment of Two PS

MSA Multiple Sequence Alignment (Sum of Pairs Criterion)

LALIGN Calculate N Best Local PS Alignments

LFASTA Local Alignment of Two PS

ROBUST Global alignment of Two PS (Show Robust Pairs)

SIM N Best Local Similarities Using Affine Weights

BESTSCOR Calculate the Best Self Comparison Score

CTREE Align protein sequences with confidence estimates

PRSS Compare a PS to a Shuffled PS

SAPS Statistical Analysis of PS

AASTATS Statistics Based on Amino Acid Abundance, including weight and specific volume

GREASE Kyte Doolittle Hydropathy Profile

RPSBLAST Compare a PS to a Conserved Domain DB

FINGERPRINTSCAN PRINTS fingerprint identification

PROSEARCH Search Prosite DB for Patterns in a PS

PPSEARCH Search Prosite DB for Patterns in a PS

PFSCAN Sequence Search Against a Set of Profiles (PROSITE and PFAM)

HMMPFAM Search against Pfam HMM database

BLIMPS Sequence Search Against a Set of Profiles (BLOCKS)

PATTERNMATCHDB Search for Regular Expressions (Patterns) in a protein sequence DB

PATTERNMATCH Search for Regular Expressions (Patterns) in a protein sequence

GOR4 Predict Secondary Structure of PS

RANDSEQ Randomize a Sequence

CHOFAS Predict Secondary Structure of PS(s) (Chou Fasman)

HTH Predict HTH Motifs in Protein Chains

PELE Protein Structure Prediction

DSSP Secondary Structure/Solvent Exposure of PDB Proteins

TMAP Prediction of Transmembrane Segments

TMHMM Predict location of transmembrane helices and location of intervening loop regions

EXTCOEF Extinction coefficient calculation

PI Isoelectric point determination
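Several of the protein tools listed above compute simple per-residue statistics. As one example, a Kyte-Doolittle hydropathy profile (the calculation GREASE is described as performing) can be sketched as a sliding-window average over the published Kyte-Doolittle scale. The window size of 9 and the function name are assumptions for illustration; this is not the Workbench's code.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Average hydropathy over each full window along the sequence.

    Returns one value per window position; a sequence shorter than
    the window yields an empty list. Non-standard residue letters
    raise KeyError.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```

Stretches with high positive averages are hydrophobic and are often read as candidate membrane-spanning or buried regions, while strongly negative stretches are hydrophilic.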

Nucleic Acid Tools

BL2SEQ Compare nucleotides to each other with BLAST

BL2SEQX Compare a nucleotide to protein sequences with BLAST

BLASTN Compare a NS to a NS DB

BLASTX Compare a PS Derived from NS to a PS DB

TBLASTX Compare a translated NS to a translated DB

FASTA Nucleic Acid Sequence Comparisons (NS or DB)

FASTX Compare Translated NS to PS DB

FASTY Compare Translated NS to PS DB

SSEARCH Smith Waterman Local Alignment

CLUSTALW Multiple Sequence Alignment

CLUSTALWPROF Align Sequences to Existing Alignment (Profile)

ALIGN Optimal Global Sequence Alignment

LALIGN Calculate Optimal Local Sequence Alignments

LFASTA Calculate Local Sequence Alignments (Heuristic)

PATTERNMATCHDB Search for Regular Expressions (Patterns) in a nucleic sequence DB

PATTERNMATCH Search for Regular Expressions (Patterns) in a nucleic sequence

TACG Analyze a NS for Restriction Enzyme Sites

PRIMER3 Design Primer Pairs and Probes

NASTATS Nucleic Acid Statistics

BESTSCOR Calculate the Best Self Comparison Score

PFSCAN Sequence Search Against a Set of Profiles (PROSITE)

PRIMERCHECK Calculates melting point, length, %GC for a primer sequence

PRIMERTM Designs end primers based on a minimum Tm

SIXFRAME Generate & Import 6 Frame Translations on a NS

REVCOM Generate Reverse Complement of NS

RANDSEQ Randomize a Sequence
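A few of the nucleic acid utilities listed above are simple enough to sketch directly. The block below illustrates what REVCOM, SIXFRAME and PRIMERCHECK are described as doing: reverse complement, six-frame translation with the standard genetic code, and basic primer statistics. It is an illustrative sketch, not the Workbench's code. In particular, the melting temperature uses the Wallace rule (2 degrees per A/T, 4 degrees per G/C), which is an assumption chosen for simplicity and only a rough estimate for short oligos.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Translations of frames +1..+3 and -1..-3, keyed by frame name."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for k in range(3):
        frames[f"+{k + 1}"] = translate(seq[k:])
        frames[f"-{k + 1}"] = translate(rc[k:])
    return frames


def primer_stats(primer):
    """Length, %GC and Wallace-rule Tm (degrees C) for a primer."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    return {
        "length": len(primer),
        "gc_percent": 100.0 * gc / len(primer),
        "tm_wallace": 2 * at + 4 * gc,
    }
```

For the short sequence ATGGCCTAA, frame +1 translates to MA* (a stop is shown as an asterisk) and frame -1 to LGH.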

Alignment Tools

Ndjinn Multiple Database Search

SPLIT Split Alignment Into Component Sequences

DEGAP_SPLIT Split Alignment Into Component Sequences and Remove Gap Characters

Download Aligned Sequences

TEXSHADE Color Coded Plots of Pre Aligned Sequences

BOXSHADE Color Coded Plots of Pre Aligned Sequences

CLUSTALWPROF Align Two Existing Alignments (Profiles)

TMAP Prediction of Transmembrane Segments

DRAWTREE Draw Unrooted Phylogenetic Tree from Alignment

DRAWGRAM Draw Rooted Phylogenetic Tree from Alignment

CLUSTALDIST Generate Distance Matrix with Clustal W

CLUSTALTREE Phylogenetic Analysis with Clustal W

DNADIST Compute Evolutionary Distance Matrix from NS Alignment

PROTDIST Compute Evolutionary Distance Matrix from PS Alignment

DNAPARS Infer an Unrooted Phylogeny from NS Alignment

PROTPARS Infer an Unrooted Phylogeny from PS Alignment

MVIEW Multiple Alignment Display

Structure Tools

PDF PDF Knowledge

CONVERT File format conversion utility

TNT Macromolecular Refinement Package

sharma.animesh@gmail.com