

Research unit
EU RFP
Project number
97.0495
Project title
EUROSEARCH: Multilingual European federated search service

Inserted texts


Category / Text
Key words
(English)
Cross-language information retrieval; federated search engine; automatic categorization; multilingual web search
Alternative project number
(English)
EU project number: -LE-8303
Research programs
(English)
EU programme: 4th Framework Research Programme - 1.1 Information technologies
Short description
(English)
See abstract
Further information
(English)
Full name of research-institution/enterprise:
Eurospider Information Technology AG

Partners and International Organizations
(English)
Italia Online (I) (coordinator), CNR (I), CINET (S), Universität Dortmund (D), Gruner + Jahr EMS (D) (admission requested)
Abstract
(English)
The Eurosearch project focused on two main areas: the linguistic area, covering cross-language (multilingual) web search among federated search engines, and the automatic categorization area for the Italian and German web document domains. The consortium produced a number of working prototypes and also created new online web services for automatic categorization on the Arianna (IOL) and Fireball (EMS) search engines as a direct exploitation of the project results.
In the linguistic area, three sub-prototypes for Italian-English, Italian-Spanish (Lexicon-based) and Italian-German (Similarity Thesaurus based) query translations have been implemented and integrated in the 'Final Prototype of integrated Multilingual Services'.
The prototypes made it possible to realise the concept of federated search engines. They were implemented and tested on the Arianna and Eurospider search engines for the Italian and German web document domains and allow bi-directional query translation. An open cross-language architecture was developed and successfully implemented for Eurosearch. This architecture is flexible enough to interact with multiple types and structures of search engines (Arianna, Eurospider, Altavista and Trovator) and to operate on different domains and with different indexing methods and query syntaxes. Different translation technologies were integrated, developed and tested using the cross-language web search prototypes: Lexicon-based, Corpus-based and Similarity Thesaurus-based.
The Similarity Thesaurus-based technology uses a data structure containing lists of terms based on their similarity. For use in Eurosearch, where the query has to be translated, multilingual similarity thesauri are employed. The multilingual variant connects words in the source language to similar terms in the target language. It has been extensively tested using TREC-style methods with encouraging results.
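The idea of translating a query through a multilingual similarity thesaurus can be illustrated with a minimal sketch. It is not the project's implementation: the toy thesaurus entries, the similarity values, the function name `translate_query` and the additive scoring rule are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy thesaurus: Italian source term -> [(German target term, similarity)].
# The entries and weights are invented for illustration only.
THESAURUS = {
    "treno": [("zug", 0.875), ("bahn", 0.75), ("bahnhof", 0.25)],
    "orario": [("fahrplan", 0.8), ("zeit", 0.5), ("bahn", 0.5)],
}


def translate_query(terms, thesaurus, top_k=3):
    """Return the top_k target-language terms most similar to the whole query.

    Assumption: similarities of a target term to each query term are summed,
    so a target term related to several query terms ranks higher.
    """
    scores = defaultdict(float)
    for term in terms:
        # Source terms missing from the thesaurus contribute nothing.
        for target, similarity in thesaurus.get(term, []):
            scores[target] += similarity
    # Sort by descending score, then alphabetically to break ties deterministically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [target for target, _ in ranked[:top_k]]


if __name__ == "__main__":
    print(translate_query(["treno", "orario"], THESAURUS))
```

In this sketch "bahn" ranks first because it is similar to both query terms, which mirrors the way a similarity thesaurus returns related target terms rather than a single dictionary translation.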
Implementing the Lexicon-based technology required setting up a multilingual lexicon consisting of a set of bilingual Italian/English and Spanish/English dictionaries, together with procedures that map between the English datasets in the different dictionaries, since English is used as an intermediate language to go from Italian to Spanish.
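Using English as an intermediate language between two bilingual dictionaries can be sketched as follows. The dictionary contents and the helper names `invert` and `translate_via_pivot` are hypothetical; the project's actual mapping procedures are not described in this record.

```python
# Hypothetical toy dictionaries, invented for illustration only.
IT_EN = {"cane": ["dog", "hound"], "casa": ["house", "home"]}
ES_EN = {"perro": ["dog"], "casa": ["house", "home"], "hogar": ["home"]}


def invert(bilingual):
    """Turn a word -> [English] dictionary into an English -> [word] dictionary."""
    inverted = {}
    for word, english_list in bilingual.items():
        for english in english_list:
            inverted.setdefault(english, []).append(word)
    return inverted


def translate_via_pivot(word, source_to_pivot, pivot_to_target):
    """Translate a word by going source -> English -> target.

    Candidates are returned in the order found, without duplicates.
    """
    candidates = []
    for pivot in source_to_pivot.get(word, []):
        for target in pivot_to_target.get(pivot, []):
            if target not in candidates:
                candidates.append(target)
    return candidates


# English -> Spanish lookup derived from the Spanish/English dictionary.
EN_ES = invert(ES_EN)

if __name__ == "__main__":
    print(translate_via_pivot("cane", IT_EN, EN_ES))
    print(translate_via_pivot("casa", IT_EN, EN_ES))
```

The sketch also shows why the mapping between the English sides matters: an English entry present in one dictionary but absent from the other ("hound" here) yields no Spanish candidate.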
Corpus-based enhancements to the lexicon-based technology were introduced by developing an experimental prototype that uses data extracted from document archives consisting of comparable corpora to expand queries with a vocabulary of related terms.
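Query expansion from a document archive can be approximated with a simple co-occurrence count, as in the sketch below. This is a deliberately simplified stand-in, not the prototype's method: the toy documents, the function name `expand_query` and the ranking by document co-occurrence are assumptions.

```python
from collections import Counter

# Hypothetical toy archive: each document is a list of already tokenized terms.
DOCS = [
    ["treno", "stazione", "orario"],
    ["treno", "stazione", "biglietto"],
    ["volo", "aeroporto", "biglietto"],
]


def expand_query(query_terms, docs, n_extra=2):
    """Append the n_extra terms that most often co-occur with the query terms.

    Assumption: a term is "related" if it appears in a document that contains
    at least one query term; each such document counts once per term.
    """
    query = set(query_terms)
    related = Counter()
    for doc in docs:
        doc_terms = set(doc)
        if query & doc_terms:
            related.update(doc_terms - query)
    # Sort by descending count, then alphabetically to break ties deterministically.
    ranked = sorted(related.items(), key=lambda item: (-item[1], item[0]))
    return list(query_terms) + [term for term, _ in ranked[:n_extra]]


if __name__ == "__main__":
    print(expand_query(["treno"], DOCS))
```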
In the area of automatic categorization of web documents (Catalogue Generator), two prototypes capable of identifying relevant web sites and extracting a summary of their content were implemented for the Italian and German web document domains. The automatic categorization is based on a probabilistic, description-oriented representation of web documents. The automatic categorization prototype was integrated into Arianna as an online service and later ported to the German web domain through Fireball by EMS.
References in databases
(English)
Swiss Database: Euro-DB of the
State Secretariat for Education and Research
Hallwylstrasse 4
CH-3003 Berne, Switzerland
Tel. +41 31 322 74 82
Swiss Project-Number: 97.0495