Research unit
SIB
Project number
31-63879.00
Project title
Characterization of biomolecular sequence motifs by generalised profiles
Project title (English)
Characterization of biomolecular sequence motifs by generalised profiles


Recorded texts

Keywords
(English)
In silico gene discovery, protein homology domains, complete genome annotation, sequence motif databases, software development, generalized profiles, protein function prediction, metaprofiles
Short description
(English)

With the rapid growth of sequence databases, there is an increasing need for reliable functional annotation of newly predicted proteins. To transfer annotation efficiently from a biochemically characterized protein to an uncharacterised one, a two-step procedure is generally adopted.

The first step identifies similarity between a known protein and a new one. Various tools, which we shall collectively call predictors, perform this step.

The second step transfers the relevant functional information to the uncharacterised protein, using a specific parser and/or a set of rules.
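The two steps can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Protein` record, the toy identity-based predictor (real predictors are profiles, patterns or HMMs), the 0.8 threshold, and the list of transferable fields are hypothetical and are not part of the project's software.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    """Hypothetical record: a name, a sequence and free-form annotations."""
    name: str
    sequence: str
    annotations: dict = field(default_factory=dict)


def identity_predictor(known: Protein, new: Protein) -> float:
    """Step 1 (toy predictor): fraction of identical positions.

    Assumes both sequences are already aligned and of equal, non-zero length.
    """
    if not known.sequence or len(known.sequence) != len(new.sequence):
        raise ValueError("sequences must be pre-aligned, equal and non-empty")
    matches = sum(a == b for a, b in zip(known.sequence, new.sequence))
    return matches / len(known.sequence)


def transfer_annotation(known: Protein, new: Protein, score: float,
                        threshold: float = 0.8,
                        transferable=("function", "family")) -> Protein:
    """Step 2 (toy rule): copy selected fields only if the score is high enough."""
    if score >= threshold:
        for key in transferable:
            if key in known.annotations:
                new.annotations[key] = known.annotations[key]
    return new


known = Protein("known", "ACDEFGHIKL",
                {"function": "kinase", "organism": "E. coli"})
new = Protein("new", "ACDEFGHIKV")
score = identity_predictor(known, new)
transfer_annotation(known, new, score)
print(score, new.annotations)
```

The rule deliberately copies only fields that describe function; organism-specific fields stay with the characterized protein.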

Our group has developed the PROSITE database, which uses generalized profiles and pattern predictors to identify similarities. PROSITE has been widely used for the functional annotation of several genomes (1) and to automatically annotate certain lines in the Swiss-Prot database (2).

In this grant proposal we would like to address the functional annotation problem at three different levels:

· The family/subfamily classification.

· The architecture of proteins and domain arrangement.

· The identification of functional residues (active sites, disulfide bridges, post-translationally modified residues).

This approach will require the development of second-generation predictors that are able to integrate these types of information.

Multiple sequence alignments (MSA) are a key step in the construction of predictors. At present, the composition of the columns is the main information extracted from the MSA and stored in the predictor. A general syntax for annotating MSA with functional and structural information would greatly increase the amount of information transferred to the predictor. The first part of the project will be to develop such a syntax and use it for the construction of PROSITE profiles. The information annotated in the MSA will be transferred to the profiles and then to the matched sequences, which will greatly facilitate the identification of functional residues.
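The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alignment is a list of equal-length strings with `-` for gaps, the column labels are held in a plain dictionary (a hypothetical stand-in for the annotation syntax proposed here, which is not specified in this text), and the query is assumed to be already aligned to the same columns.

```python
from collections import Counter


def column_composition(msa):
    """Per-column residue frequencies of an alignment, ignoring gaps."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*msa):
        counts = Counter(residue for residue in column if residue != "-")
        total = sum(counts.values())
        profile.append({r: n / total for r, n in counts.items()} if total else {})
    return profile


def transfer_column_annotation(aligned_query, column_labels):
    """Map labels attached to alignment columns onto query residues.

    Returns {1-based position in the ungapped query: (residue, label)}.
    """
    features = {}
    position = 0
    for column, residue in enumerate(aligned_query):
        if residue == "-":
            continue                      # gap: no query residue in this column
        position += 1
        if column in column_labels:
            features[position] = (residue, column_labels[column])
    return features


msa = ["ACH-K", "ACHGK", "SCH-R"]
labels = {1: "disulfide", 2: "active site"}   # hypothetical column annotation
profile = column_composition(msa)
features = transfer_column_annotation("-CHGK", labels)
print(profile[0], features)
```

Because the labels are attached to columns rather than to individual sequences, any sequence that aligns to the annotated columns inherits them at its own residue numbering.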

The family/subfamily classification will be addressed with non-linear Hidden Markov models (HMMs) or branching HMMs. In this approach, different HMMs will compete, and the best one will be retained to assign a given protein to a specific subfamily.
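The competition between models can be sketched as follows. Note the substitution: this example uses small ordinary discrete HMMs scored with the standard forward algorithm, not the non-linear or branching HMMs proposed here. The two toy models, their state names, their probabilities, and the two-letter alphabet are all assumptions chosen only to keep the example short.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    return math.log(p) if p > 0 else NEG_INF


def _log_sum_exp(values):
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))


def forward_log_likelihood(seq, start, trans, emit):
    """log P(seq | model) for a discrete HMM, computed in log space.

    start[s], trans[s][t] and emit[s][symbol] are probabilities.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    states = list(start)
    alpha = {s: _log(start[s]) + _log(emit[s].get(seq[0], 0.0)) for s in states}
    for symbol in seq[1:]:
        alpha = {
            t: _log_sum_exp([alpha[s] + _log(trans[s].get(t, 0.0)) for s in states])
            + _log(emit[t].get(symbol, 0.0))
            for t in states
        }
    return _log_sum_exp(list(alpha.values()))


def assign_subfamily(seq, models):
    """Score the sequence with every model and keep the best-scoring one."""
    scores = {name: forward_log_likelihood(seq, *m) for name, m in models.items()}
    return max(scores, key=scores.get), scores


start = {"core": 0.5, "loop": 0.5}
trans = {"core": {"core": 0.9, "loop": 0.1}, "loop": {"core": 0.1, "loop": 0.9}}
models = {  # hypothetical subfamilies: A favours L, B favours K
    "A": (start, trans, {"core": {"L": 0.8, "K": 0.2}, "loop": {"L": 0.6, "K": 0.4}}),
    "B": (start, trans, {"core": {"L": 0.2, "K": 0.8}, "loop": {"L": 0.4, "K": 0.6}}),
}
best, scores = assign_subfamily("LLLLK", models)
print(best, scores)
```

In practice the winning score would also have to exceed a significance threshold, otherwise a sequence that belongs to none of the subfamilies would still be assigned to one.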

In parallel we will develop a validation procedure based on probabilistic models to evaluate the accuracy of the proposed functional annotation.

We will also make a systematic attempt to gather structural information in the vicinity of functionally important motifs, for example those described by PROSITE patterns.

Tools resulting from the proposed project will be made available to the scientific community through the PROSITE database, software packages released under the GNU General Public License, and the web servers of the Swiss Institute of Bioinformatics.