The Impact Of Non-Standard Words And Pronunciations On Text-To-Speech Quality
Wednesday, May 23rd, 2007
In English news wire text, on average one word out of twenty is non-standard, i.e. not simply made up of letters from the English alphabet. Examples are abbreviations, numbers, dates, times, and other measures. On top of these non-standard words comes an endless list of (foreign) names of people, places, products or companies, whose pronunciation is unknown and potentially irregular. When confronted with unknown words, modern text-to-speech engines try to guess the correct pronunciation based on the written form of the word, sometimes with weird results.
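To make the notion of a non-standard word a little more concrete, here is a minimal Python sketch that flags tokens which are not made up of letters only. The regular expression, the function name and the sample sentence are our own assumptions for illustration; they are not taken from any particular text-to-speech engine. Note that letter-only abbreviations and unknown names pass this simple test, so a real system would also need a lexicon lookup.

```python
import re

# Assumption: a "standard" word consists only of ASCII letters, optionally
# joined by internal apostrophes or hyphens (e.g. "don't", "out-of-the-box").
STANDARD_WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def non_standard_tokens(text):
    """Return the whitespace-separated tokens that are not plain words."""
    flagged = []
    for raw in text.split():
        # Drop punctuation attached to either end of the token.
        token = raw.strip(".,;:!?\"()")
        if token and not STANDARD_WORD.fullmatch(token):
            flagged.append(token)
    return flagged


# Made-up sample sentence containing a date, a time and an amount.
print(non_standard_tokens("The meeting on 5/23/2007 starts at 10:30 and costs $20."))
```

Everything the function returns would have to be expanded into ordinary words before it can be read aloud.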
In this post we show two admittedly contrived example sentences, each with two pronunciations. The first is the engine's out-of-the-box pronunciation, whereas the second benefits from a manually edited pronunciation dictionary.
Here’s the first sentence:
Out of the box, it is pronounced like this: sentence1_orig
With the abbreviations ‘Maj.’ and ‘NYTimes’ expanded and phonetic transcriptions for ‘Netanyahu’, ‘Paypal’ and ‘GMail’, the sentence becomes much more comprehensible. Judge for yourself: sentence1_enhanced
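The idea of a manually edited pronunciation dictionary can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the dictionary entries, the informal respellings (which stand in for real phonetic transcriptions), the function name and the sample sentence are all hypothetical and do not reflect the actual dictionary format.

```python
# Hypothetical entries for illustration only.
LEXICON = {
    # Abbreviations expanded into full words.
    "Maj.": "Major",
    "NYTimes": "New York Times",
    # Informal respellings standing in for phonetic transcriptions.
    "Netanyahu": "net-ahn-YAH-hoo",
    "Paypal": "PAY-pal",
    "GMail": "JEE-mail",
}


def apply_dictionary(text):
    """Replace every token found in LEXICON before the text is synthesized."""
    words = []
    for token in text.split():
        # First try the token exactly as written, so "Maj." keeps its period.
        if token in LEXICON:
            words.append(LEXICON[token])
            continue
        # Otherwise look it up without trailing punctuation and re-attach it.
        core = token.rstrip(".,;:!?")
        tail = token[len(core):]
        words.append(LEXICON.get(core, core) + tail)
    return " ".join(words)


# Made-up sample input, not one of the example sentences from this post.
print(apply_dictionary("Maj. Smith reads the NYTimes, then checks GMail."))
```

Looking up the token as written before stripping punctuation is a deliberate choice: it lets an entry such as 'Maj.' match together with its period, so the period is not mistaken for the end of a sentence.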
Here’s a second sentence:
The out-of-the-box pronunciation goes like this: sentence2_orig
By expanding the abbreviation ‘Jlem’ (= ‘Jerusalem’) and pronouncing ‘odiogo’ as a word rather than as an abbreviation, the overall quality is greatly enhanced: sentence2_enhanced
Note that we decided not to expand the abbreviations ‘LLC.’ and ‘US’.
With this post we wanted to show you a few very simple examples of how Odiogo enhances the out-of-the-box quality of its speech synthesis engine. If you want to learn about more advanced ways of improving text-to-speech quality, download the white paper “Turning news & blog articles into high-fidelity computer-generated audio” from our download page.