

Research unit
COST
Project number
C01.0063
Project title
Improved Signal Generation Techniques for Speech Synthesis
Project title (English)
Improved Signal Generation Techniques for Speech Synthesis


Entered texts

Category / Text
Keywords
(English)
Signal processing; speech synthesis; harmonics & noise modelling
Research programme
(English)
COST-Action 277 - Non-linear speech processing
Brief description
(English)
See abstract
Further information
(English)
Full name of research institution/enterprise: Université de Lausanne, Section d'informatique et méthodes mathématiques, Quartier UNIL-Dorigny
Partners and international organisations
(English)
AT, BE, CZ, FR, DE, EL, IE, IT, LT, PT, SK, SI, ES, SE, CH, UK
Summary of results (Abstract)
(English)
During 2005, research conducted in association with COST 277 addressed two issues in speech synthesis that showed promise for improving the naturalness of synthetic speech. In a project conducted by B. Zellner-Keller, distributional parameters in spectral and amplitude profiles were studied for 33 male speakers of the British MARSEC corpus (paper presented/published at NOLISP'05/COST 277 in Barcelona, and at PEVOC in London). Cluster analysis identified four distinct patterns in the use of fundamental frequency, but only a single undifferentiated pattern for amplitude. This result showed that the widely followed principle of 'wishing to generate a multiplicity of synthetic speakers from a single speaker database' will run into major limitations with respect to fundamental frequency: several databases will be needed.
In a second project, E. Keller obtained new data on regularity and variation in the temporal structure of speech. The evidence indicated that human speakers show very strong inter-speaker agreement in the placement of strong vowel onsets, but much weaker agreement for weak vowel onsets. Strength or weakness was determined automatically from the acoustic signal and was likely to correlate with both motor and perceptual saliency within the utterance. Simulations of speech synthesis that corrected for this differentiation led to perceptible improvements in synthetic naturalness. This suggests that current statistical or neural network models for the temporal structuring of speech may well be fundamentally flawed. Instead of a rigid, totally predictable structure, temporal prediction systems should probably provide a set of main 'temporal anchor points' within the utterance and introduce 'motivated variation' for the remaining aspects of temporal structure.
This research was presented at the COST 277 Training School in Heraklion, Crete, and at ESSP'05 in Prague, Czech Republic, both in September 2005. Altogether, the scientific advances made by this research group over the past four years can be said to have contributed substantially to an empirically closer and more precise characterization of natural models in speech synthesis, with the aim of representing the full diversity of human speech realizations. This work is expected to help establish a better empirical foundation for generating the large variety of speech synthesis models required in the next generation of speech synthesis devices.
Database references
(English)
Swiss Database: COST-DB of the State Secretariat for Education and Research, Hallwylstrasse 4, CH-3003 Berne, Switzerland. Tel. +41 31 322 74 82. Swiss Project-Number: C01.0063