Text to Speech
Welcome to the Text to Speech Thesis Project home page!
I am currently a 4th year Computer Systems Engineering student creating a text-to-speech database for my thesis. The database will consist of diphones and will use the MBrola speech engine for text-to-speech conversion.
Spoken language can be broken into its constituents, much as a material can be broken down into atoms, its building blocks. In speech, the phoneme can be regarded as the fundamental building block of speech production. However, phoneme-based databases usually sound disjointed. This is because they only 'voice' distinct phonemes, whereas in human speech production the transition from one phoneme to the next is gradual. Diphones mimic this gradual transition, as each one is essentially the second half of one phoneme joined with the first half of the next. Thus each diphone produces a 'sound' that can be concatenated with others to create the appropriate, and often realistic-sounding, words.
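As a rough illustration of the idea (a sketch only, not the thesis code), the diphones needed for a word can be listed by pairing each phoneme with its neighbour. The function name `phonemes_to_diphones`, the `_` silence symbol and the `a-b` naming of units are assumptions made for this example; a real diphone database defines its own phoneme set and naming.

```python
def phonemes_to_diphones(phonemes, silence="_"):
    """List the diphone units needed to speak a phoneme sequence.

    Each diphone spans the transition from one phoneme to the next.
    The sequence is padded with a silence symbol so that the word's
    onset and ending are covered too. The "_" symbol and the "a-b"
    naming are illustrative assumptions, not a fixed standard.
    """
    padded = [silence] + list(phonemes) + [silence]
    # Pair every phoneme with the one that follows it.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# Hypothetical phoneme transcription of the word "cat".
print(phonemes_to_diphones(["k", "a", "t"]))
# prints ['_-k', 'k-a', 'a-t', 't-_']
```

A word of n phonemes therefore needs n + 1 diphones, and the database has to contain a recording for every pair that can occur in the target vocabulary.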
I hope to create a database that can generate about 500 words with good intelligibility. This database will then be processed through the MBrola speech engine to generate the required words. The next step is to compare the MBrola speech engine with one I produce myself, and to put up a front page for either MBrola or my own speech engine.
Last updated 13/03/97 (will be improved soon)
