InterPro is an integrated documentation resource for protein families, domains and functional sites that combines the efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. The integrated resource contains over 3000 entries, each comprising a short essay on the corresponding protein domain or functional site, literature references, and links to the searchable descriptions (patterns, profiles, fingerprints and hidden Markov models) held in the member databases. InterPro played an instrumental role in the functional annotation of the complete Drosophila genome published this year.
Our group's contribution was to adapt the PROSITE protein domain profiles and the corresponding documentation entries to the requirements of InterPro. In addition, we have maintained the PFTOOLS software package and assisted the software integration team at the EBI in incorporating our profile search programs into the InterPro search engine. Furthermore, we have set up efficient procedures for generating complete hit lists of PROSITE patterns and profiles against the protein sequences in SWISS-PROT and TrEMBL, using the GeneMatcher (specialized hardware for fast sequence database searches) available at the Swiss Institute of Bioinformatics.
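To illustrate what a pattern hit list contains, a PROSITE pattern can be translated into a regular expression and run against a set of sequences. The sketch below is not the GeneMatcher or PFTOOLS procedure, and it covers patterns only (generalized profiles need a profile search program such as those in PFTOOLS). It assumes the common pattern syntax (`-` separators, `x`, `[..]`, `{..}`, `(n)`/`(n,m)` repeats, `<`/`>` anchors); the function names `prosite_to_regex` and `scan_hits`, the toy pattern and the sequences are all hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Covers the common syntax: '-' separators, 'x' (any residue), [..]
    (allowed residues), {..} (forbidden residues), (n) / (n,m) repeats,
    and the '<' / '>' terminal anchors. A '>' inside brackets (e.g. [G>])
    is NOT handled by this sketch.
    """
    regex = pattern.rstrip(".")                          # drop the terminating period
    regex = regex.replace("{", "[^").replace("}", "]")   # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")    # repeat counts
    regex = regex.replace("-", "")                       # element separators
    regex = regex.replace("x", ".")                      # any residue
    regex = regex.replace("<", "^").replace(">", "$")    # N-/C-terminal anchors
    return regex


def scan_hits(pattern: str, sequences: dict[str, str]) -> list[tuple[str, int, int]]:
    """Return (sequence_id, start, end) hits, 1-based and inclusive.

    re.finditer reports non-overlapping matches only.
    """
    compiled = re.compile(prosite_to_regex(pattern))
    hits = []
    for seq_id, seq in sequences.items():
        for match in compiled.finditer(seq):
            hits.append((seq_id, match.start() + 1, match.end()))
    return hits


if __name__ == "__main__":
    toy_sequences = {"P1": "MASTA", "P2": "MKSPA"}  # made-up sequences
    print(scan_hits("<M-x-[ST]-{P}.", toy_sequences))  # [('P1', 1, 4)]
```

Only `P1` matches here, because the fourth residue of `P2` is the forbidden `P`. A production scan would also need to handle overlapping matches and the rarer pattern syntax.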
The performance of the different search methods that InterPro provides for the same protein domain was compared systematically using rigorous benchmarking protocols. One conclusion from these studies is that generalized profiles are particularly well suited to searching hypothetical protein sequence fragments obtained by gene prediction from working-draft genome sequences. A database of such predicted sequences is maintained by our group and publicly distributed under the name TrGen.
We also revised several PROSITE profile entries to resolve inconsistencies between our hit lists and those obtained with the search methods developed by other groups of the consortium. The discussions held with our research partners on questionable matches to protein patterns or profiles also served as an important quality control mechanism for the SWISS-PROT database.
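The consortium's actual reconciliation procedure is not described here. The sketch below only shows the bookkeeping step of splitting two hit lists for the same domain into shared and method-specific matches, assuming each list has been reduced to a set of sequence identifiers. The function name `compare_hit_lists`, the agreement score (a simple Jaccard index) and the toy identifiers are hypothetical.

```python
def compare_hit_lists(ours, theirs):
    """Partition two hit lists (iterables of sequence IDs) for manual review.

    The one-sided lists are the candidate inconsistencies to re-examine;
    'agreement' is the Jaccard index |both| / |union| (1.0 if both are empty).
    """
    ours, theirs = set(ours), set(theirs)
    union = ours | theirs
    both = ours & theirs
    return {
        "both": sorted(both),
        "only_ours": sorted(ours - theirs),
        "only_theirs": sorted(theirs - ours),
        "agreement": len(both) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    profile_hits = ["P1", "P2", "P3"]  # hypothetical PROSITE profile hits
    other_hits = ["P2", "P3", "P4"]    # hypothetical hits from another member database
    result = compare_hit_lists(profile_hits, other_hits)
    print(result["only_ours"], result["only_theirs"], result["agreement"])  # ['P1'] ['P4'] 0.5
```

Keeping the one-sided lists sorted makes them easy to hand to curators, who can then decide whether the entry, the other method, or the sequence annotation needs revision.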
InterPro is accessible for text- and sequence-based searches at
http://www.ebi.ac.uk/interpro. The databases developed by our group can be downloaded from our ftp site:
ftp://ftp.isrec.isb-sib.ch/sib-isrec/.