Partners and International Organisations
(English)
|
AT, BE, CY, CZ, DK, FI, FR, DE, EL, HU, IT, LT, NL, NO, PT, SK, SI, ES, SE, CH, TR, UK
|
Abstract
(English)
|
This project concerns the development of new approaches to multimodal communication systems involving speech and visual processing. In the framework of COST 278, IDIAP mainly investigated new approaches to the processing and combination of non-stationary and non-synchronous data streams, typically resulting from the joint use of audio and visual information. More specifically, fusion algorithms based on entropy minimization have been further developed (see, e.g., (1)). New forms of hidden Markov models able to deal with correlated asynchronous information sequences have also been developed and were successfully tested on audio-visual speech recognition problems (see, e.g., (2)). Building upon these developments, new forms of truly multimodal user tracking algorithms have been developed, combining visual tracking (based on particle filtering) with audio localization (using microphone arrays, and used to initialize the visual tracker); see, e.g., (3). Finally, some of the resulting algorithms were also used to model multimodal human interaction (involving audio and video features) in meetings. In 2005, in addition to the above, we mainly focused on new approaches to the extraction of information from multimedia meeting collections and on hierarchical multi-channel processing, which further expands the possibilities for advanced multimodal communication (6).
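To give a flavour of the entropy-based fusion idea mentioned above, the following is a minimal illustrative sketch, not IDIAP's actual algorithm: each stream's class posterior is weighted inversely to its Shannon entropy, so a confident (low-entropy) stream dominates the combined decision. The function names and the example posteriors are hypothetical.

```python
import numpy as np

def entropy(p):
    # Shannon entropy (natural log) of a posterior distribution.
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def entropy_weighted_fusion(audio_post, video_post):
    # Weight each stream inversely to its posterior entropy:
    # the more confident stream contributes more to the fused posterior.
    w_a = 1.0 / (entropy(audio_post) + 1e-12)
    w_v = 1.0 / (entropy(video_post) + 1e-12)
    fused = w_a * audio_post + w_v * video_post
    return fused / fused.sum()

# Hypothetical example: a confident audio stream, a noisy video stream.
audio = np.array([0.8, 0.1, 0.1])
video = np.array([0.4, 0.3, 0.3])
fused = entropy_weighted_fusion(audio, video)
```

In this toy case the fused posterior leans towards the audio stream's decision, since its lower entropy earns it a larger weight.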
|