A probabilistic approach to select units based on acoustic similarity for speech synthesis (MS)


dc.contributor.advisor Dr. Anil Kumar Sao
dc.contributor.author Babu, Anjana
dc.date.accessioned 2020-12-12T05:18:48Z
dc.date.available 2020-12-12T05:18:48Z
dc.date.issued 2015-10-03
dc.identifier.uri http://hdl.handle.net/123456789/341
dc.description A dissertation submitted for the award of the degree of Master of Science under the guidance of Dr. Anil Kumar Sao (Faculty, SCEE) en_US
dc.description.abstract One of the major challenges in text-to-speech (TTS) synthesis systems is incorporating prosody into the synthesised speech. Various techniques based on linguistic characteristics have been proposed in the literature for improving prosody, but prosody in TTS remains an open problem, which this thesis addresses. Approaches are proposed for improving the naturalness, intelligibility and prosody of synthesised speech. This is done by selecting the sequence of sound units from a large corpus in such a way that acoustic features are consistent at (a) the segmental level and (b) the supra-segmental level. At the segmental level, the differences in acoustic features of sound units are reduced over the entire utterance to improve naturalness and intelligibility. At the supra-segmental level, units are selected so that the differences in acoustic features of adjacent syllables are consistent at the phrase level; this method also maintains consistency of acoustic features at the utterance level. Unlike existing unit selection synthesis (USS) based TTS systems, which rely mostly on linguistic information for improving prosody, the proposed approach makes use of acoustic information. Probabilistic approaches are proposed for selecting units within an acoustic framework. Since the context is specified only by acoustic features, the proposed approaches can be applied to any language and perhaps even to multilingual synthesis. Experimental results are demonstrated for five Indian languages. Subjective evaluation tests showed that f0 contributed to the naturalness of the systems, whereas duration and energy helped improve their intelligibility. In addition, ensuring consistency of energy and f0 across the syllables of a phrase, and of duration across the syllables of an utterance, further improved the prosody. en_US
dc.language.iso en_US en_US
dc.publisher IITMandi en_US
dc.subject Energy Modification en_US
dc.subject Probabilistic Approach en_US
dc.title A probabilistic approach to select units based on acoustic similarity for speech synthesis (MS) en_US
dc.type Thesis en_US

