

Research unit
EU FRP
Project number
97.0495
Project title
EUROSEARCH: Multilingual European federated search service
Project title (English)
EUROSEARCH: Multilingual European federated search service


Recorded texts


Category / Text
Keywords
(English)
Cross-language information retrieval; federated search engine; automatic categorization; multilingual web search
Alternative project numbers
(English)
EU project number: -LE-8303
Research programmes
(English)
EU programme: 4th Framework Research Programme - 1.1 Information technologies
Short description
(English)
See abstract
Further notes and information
(English)
Full name of research-institution/enterprise:
Eurospider Information Technology AG

Partners and international organisations
(English)
Italia Online (I) (coordinator), CNR (I), CINET (S), Universität Dortmund (D), Gruner + Jahr EMS (D) (admission requested)
Abstract
(English)
The Eurosearch project focused on two main areas: the linguistic area, covering cross-language (multilingual) web search among federated search engines, and the automatic categorization area for the Italian and German web document domains. The consortium produced a number of working prototypes and, as a direct exploitation of the project results, created new online web services for automatic categorization on the Arianna (IOL) and Fireball (EMS) search engines.
In the linguistic area, three sub-prototypes for Italian-English, Italian-Spanish (Lexicon-based) and Italian-German (Similarity Thesaurus based) query translations have been implemented and integrated in the 'Final Prototype of integrated Multilingual Services'.
The prototypes have made it possible to realise the concept of federated search engines. They have been implemented and tested on the Arianna and Eurospider search engines for the Italian and German web document domains and allow bi-directional query translation. An open cross-language architecture has been developed and successfully implemented for Eurosearch. This architecture is flexible enough to interact with multiple types and structures of search engines (Arianna, Eurospider, Altavista and Trovator) and to operate on different domains and with different indexing methods and query syntaxes. Different technologies have been integrated, developed and tested using the cross-language web search prototypes, namely lexicon-based, corpus-based and similarity thesaurus-based translation.
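The idea of one open architecture talking to engines with different query syntaxes can be illustrated with a minimal sketch. It is not the Eurosearch implementation: the engine names `engine_it` and `engine_de`, their query syntaxes, and the `EngineAdapter` and `federate` names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EngineAdapter:
    """Hypothetical adapter: one search engine, its language, its query syntax."""
    name: str
    language: str
    format_query: Callable[[List[str]], str]


def federate(terms_by_language: Dict[str, List[str]],
             adapters: List[EngineAdapter]) -> Dict[str, str]:
    """Build one engine-specific query string per engine that has terms available."""
    requests = {}
    for adapter in adapters:
        terms = terms_by_language.get(adapter.language)
        if terms:  # skip engines for which no translated query exists
            requests[adapter.name] = adapter.format_query(terms)
    return requests


# Two invented engines with invented syntaxes (assumptions, not real APIs).
ADAPTERS = [
    EngineAdapter("engine_it", "it", lambda ts: " AND ".join(ts)),
    EngineAdapter("engine_de", "de", lambda ts: " ".join("+" + t for t in ts)),
]

requests = federate(
    {"it": ["treno", "orario"], "de": ["zug", "fahrplan"]},
    ADAPTERS,
)
```

Keeping the syntax knowledge inside each adapter means that a further engine can be added without touching the translation step.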
The Similarity Thesaurus-based technology uses a data structure containing lists of terms based on their similarity. For use in Eurosearch, where the query has to be translated, multilingual similarity thesauri are employed. The multilingual variant connects words in the source language to similar terms in the target language. It has been extensively tested using TREC-style methods with encouraging results.
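A toy sketch can make the data structure concrete. It assumes a multilingual similarity thesaurus stored as a mapping from each source-language term to a list of target-language terms with similarity scores. The Italian and German entries, the scores, and the function name `translate_query` are invented for illustration. Summing the per-term similarities is only one plausible way to rank target terms and is not claimed to be the method used in the project.

```python
from collections import defaultdict

# Hypothetical Italian -> German thesaurus entries with made-up similarity scores.
THESAURUS = {
    "treno": [("zug", 0.9), ("bahn", 0.7), ("bahnhof", 0.4)],
    "orario": [("fahrplan", 0.8), ("zeit", 0.5)],
}


def translate_query(terms, thesaurus, top_k=2):
    """Return target-language terms ranked by accumulated similarity to the query."""
    scores = defaultdict(float)
    for term in terms:
        # Most similar target terms first; unknown source terms contribute nothing.
        ranked = sorted(thesaurus.get(term, []), key=lambda pair: pair[1], reverse=True)
        for target, similarity in ranked[:top_k]:
            scores[target] += similarity
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


translated = translate_query(["treno", "orario"], THESAURUS)
```

The ranked target terms can then be submitted as a weighted query to a search engine in the target language.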
Implementing the lexicon-based technology required setting up a multilingual lexicon consisting of a set of bilingual Italian/English and Spanish/English dictionaries, with procedures that map between the English datasets in the different dictionaries, since English is used as an intermediate language to go from Italian to Spanish.
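Translation through an intermediate language can be sketched as follows. The dictionary entries and the names `invert` and `pivot_translate` are hypothetical. The sketch only shows the general pivot idea, assuming each bilingual dictionary maps a word to a list of English equivalents.

```python
# Hypothetical toy dictionaries: word -> list of English equivalents.
IT_EN = {"casa": ["house", "home"], "libro": ["book"]}
ES_EN = {"casa": ["house", "home"], "hogar": ["home"], "libro": ["book"]}


def invert(dictionary):
    """Turn a word -> English mapping into an English -> word mapping."""
    inverted = {}
    for source, targets in dictionary.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


def pivot_translate(word, source_to_pivot, pivot_to_target):
    """Translate via the pivot language, keeping first-seen order without duplicates."""
    results = []
    for pivot_word in source_to_pivot.get(word, []):
        for target in pivot_to_target.get(pivot_word, []):
            if target not in results:
                results.append(target)
    return results


EN_ES = invert(ES_EN)
spanish = pivot_translate("casa", IT_EN, EN_ES)
```

Every English sense of the Italian word contributes candidates, so pivoting tends to widen the set of translations. This is one reason a mapping step between the English entries of the two dictionaries is needed.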
Corpus-based enhancements to the lexicon-based technology have been introduced by developing an experimental prototype that uses data extracted from document archives consisting of comparable corpora to expand queries with a vocabulary of related terms.
In the area of automatic categorization of web documents (Catalogue Generator) two prototypes capable of identifying relevant web sites and extracting a summary of their content have been implemented for the Italian and German web document domains. The automatic categorization of web documents is based on a probabilistic description-oriented representation of web documents. The automatic categorization prototype has been integrated into Arianna as an online service and later on ported to the German web domain through Fireball by EMS.
Database references
(English)
Swiss Database: Euro-DB of the
State Secretariat for Education and Research
Hallwylstrasse 4
CH-3003 Berne, Switzerland
Tel. +41 31 322 74 82
Swiss Project-Number: 97.0495