New technology converts web text to sound Text To Speech

Author: Unknown on Friday, 7 June 2013 | 07:59

TTS stands for "Text To Speech", that is, "from text to speech". It is the part of human-machine dialogue that lets the machine speak.

TTS draws on both linguistics and psychology. With the support of a built-in chip and a neural-network design, text is intelligently converted into a natural speech stream. TTS converts text files in real time, and conversion times can be measured in seconds. Under its intelligent speech controller, the rhythm of the spoken output is smooth, so the listener finds it natural and does not hear the cold, jerky quality of machine speech. TTS speech synthesis covers the level-1 and level-2 Chinese characters of the GB national standard, has an English interface, recognizes Chinese and English automatically, and supports reading mixed Chinese and English text. All voices use recorded human Mandarin as the standard pronunciation. Synthesis runs at 120-150 characters per second and reading speed reaches 3-4 characters per second, so users hear clear sound quality and a fluent tone. A small number of MP3 players also include a TTS function.


TTS is a speech synthesis application: it converts files stored on a computer, such as help files or web pages, into natural speech output. TTS can help visually impaired people read information on a computer, or simply make text documents easier to take in. Applications driven by TTS now include voice mail and voice-response systems. TTS is often used together with speech recognition programs. There are many TTS products, including ReadPlease 2000, Proverbe Speech Unit, and TextAloud from NextUp Technologies. Lucent, Elan, and AT&T each have their own speech synthesis products.


In addition to TTS software, many companies also offer hardware products. These include the Quick Link Pen from Israel's WizCom Technologies, a pen-shaped device that can scan text and read it aloud; Road Runner from Ostrich Software, a handheld device that reads ASCII text; and DecTalk TTS from the American company DEC, an external hardware device that can replace a sound card and also contains internal software that can work together with a personal computer's own sound card.


TTS explained

TTS has a wide range of uses, including e-mail reading and IVR voice prompts. IVR systems are now widely used in many industries (such as telecommunications and transportation).
TTS is a key technology in speech synthesis (Speech Synthesis). Early TTS generally used a dedicated chip, such as the Texas Instruments TMS50C10/TMS50C57 or the Philips PH84H36, but these were mainly used in household appliances and children's toys.
PC-based TTS applications are generally implemented purely in software and include the following parts:
● Text analysis - linguistic analysis of the input text. Vocabulary, syntax, and semantics are analyzed sentence by sentence to determine the underlying structure of each sentence and the phonemes of each word. This includes handling punctuation, word segmentation, polyphonic characters, numbers, and acronyms.
● Speech synthesis - the characters or phrases corresponding to the processed text are extracted from the speech synthesis library, turning the linguistic description into a speech waveform.
● Prosody processing - the quality of synthetic speech (Quality of Synthetic Speech) means the quality of the speech that the synthesis system outputs. It is generally evaluated subjectively in terms of clarity (or intelligibility), naturalness, and coherence. Clarity is the percentage of meaningful words that are perceived correctly; naturalness evaluates whether the synthetic voice is close to a human speaking and whether the intonation of synthesized words is natural; coherence evaluates whether synthesized sentences are fluent.
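The clarity measure in the last item is simple enough to compute directly. Here is a minimal sketch in Python, assuming a listening test in which each listener writes down the words they heard, in order. The function name and the position-by-position comparison are assumptions made for illustration, not a standard scoring procedure.

```python
def clarity_percent(reference_words, heard_words):
    """Percentage of reference words that the listener reported correctly.

    Words are compared position by position; a missing word counts as wrong.
    """
    if not reference_words:
        raise ValueError("reference word list must not be empty")
    correct = sum(
        1 for ref, heard in zip(reference_words, heard_words) if ref == heard
    )
    return 100.0 * correct / len(reference_words)


if __name__ == "__main__":
    spoken = ["please", "enter", "your", "account"]
    written_down = ["please", "center", "your", "account"]
    print(clarity_percent(spoken, written_down))
```

Naturalness and coherence have no such formula; they are normally scored by listeners.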
Synthesizing high-quality speech takes extremely complicated algorithms, so the hardware requirements are high. The complexity of the algorithm determines how many concurrent TTS channels a current microcomputer can support.


Basic architecture of TTS in CTI applications

A typical CTI application system includes an IVR (Interactive Voice Response) system. IVR is an important part of a call center: through it, the user can enter information on a touch-tone phone keypad and get pre-recorded or synthesized voice information back from the system. An IVR with TTS can speed up service, cut costs, and offer callers service 24 hours a day, 7 days a week.
Most common IVR systems today consist of a generic industrial PC (IPC) platform with voice boards inserted, and they support Chinese TTS speech synthesis.
A typical telephone service flow that includes TTS can be divided into the following steps:
1. The user dials in; the IVR answers and collects the user's keypresses and other information.
2. Based on the keypresses, the IVR requests the relevant data from the database server.
3. The database server returns text data to the IVR.
4. Through its TCP communication interface, the IVR sends the text to be synthesized to the TTS server.
5. The TTS server synthesizes the text into voice data and sends it in segments to the IVR server over the TCP interface.
6. The IVR server assembles the voice data segments into separate voice files.
7. The IVR plays the corresponding voice file to the caller.
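The segmented transfer between the TTS server and the IVR can be sketched in Python as follows. The text above does not say how segments are framed on the TCP connection, so the framing here is purely an assumption: each segment carries a 4-byte big-endian length prefix, and a zero-length segment marks the end of one utterance. The functions work on any binary file-like object, for example the result of `socket.makefile("rwb")`.

```python
import io
import struct

# Assumed framing, not taken from any real product: 4-byte big-endian length.
HEADER = struct.Struct("!I")


def write_segments(stream, segments):
    """TTS server side: send each voice data segment, then an end marker."""
    for segment in segments:
        stream.write(HEADER.pack(len(segment)))
        stream.write(segment)
    stream.write(HEADER.pack(0))  # zero length = end of utterance


def read_utterance(stream):
    """IVR side: read segments until the end marker and join them."""
    parts = []
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            raise EOFError("stream ended before the end-of-utterance marker")
        (length,) = HEADER.unpack(header)
        if length == 0:
            return b"".join(parts)
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("truncated segment")
        parts.append(payload)


if __name__ == "__main__":
    # An in-memory buffer stands in for the TCP connection.
    wire = io.BytesIO()
    write_segments(wire, [b"\x01\x02\x03", b"\x04\x05"])
    wire.seek(0)
    print(len(read_utterance(wire)), "bytes of voice data assembled")
```

The IVR would then write the joined bytes to a voice file for playback.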
Typically the public-network access side (the IVR) uses an IPC plus voice boards, while the synthesized voice data is delivered to the IVR over the LAN. This structure is only suitable for simple applications....

TTS talk

TTS is Text To Speech: turning text into speech, reading text aloud, it all means much the same thing. It is used often in voice system development.
There are many TTS products on the market today, implemented in a wide range of ways. Some are very expensive, such as IFLYTEK, which is said to have been funded by the national 863 Program and to be technically advanced; some are relatively inexpensive, such as SinoVoice and InfoTalk; and some are free, such as Microsoft's TTS products.
Compared with ASR (Automatic Speech Recognition), building a TTS product is not technically difficult; in my opinion it is mostly grunt work.
Suppose we set out to build a TTS that can read a Chinese sentence aloud. How would we do it?
The simplest TTS just reads out each character in turn. You may ask: would we not have to record the sound of more than six thousand Chinese characters? Fortunately, Chinese has very few syllables and a great many homophones. At most we need to record: number of initials × number of finals × 4 tones. In practice many initial-final combinations do not occur and not every syllable has all four tones, so only a few hundred base syllables, plus their tone variants, need to be recorded.
At synthesis time you need a table mapping each Chinese character to its pinyin. Pinyin input methods rely on the same kind of table, and you can find one online, though usually without the four tones; at worst you add them yourself. Well, as I said, it is grunt work.
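Character-by-character synthesis then comes down to a table lookup. Here is a minimal sketch in Python, assuming a tiny hypothetical pinyin table (tone given as a trailing digit) and one recording per toned syllable, named after the syllable. The table contents and file names are illustrative only.

```python
# Hypothetical character-to-pinyin table; a real one covers thousands of
# characters. Homophones such as 京 and 经 map to the same syllable.
PINYIN = {
    "你": "ni3",
    "好": "hao3",
    "北": "bei3",
    "京": "jing1",
    "经": "jing1",
}


def units_for(text):
    """Return the syllable recordings to play, one per character."""
    units = []
    for ch in text:
        syllable = PINYIN.get(ch)
        if syllable is None:
            raise KeyError(f"no pinyin entry for {ch!r}")
        units.append(f"{syllable}.wav")  # assumed naming: one file per syllable
    return units


if __name__ == "__main__":
    print(units_for("你好北京"))
```

Because homophones share one recording, the number of files depends on the syllable inventory and not on the number of characters.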
A TTS built this way works tolerably well, especially for reading out content with little sentence-level meaning, such as names and home addresses; it sounds clear enough. This is thanks to our great mother tongue being essentially monosyllabic: since ancient times, each character has been a word expressing a meaning. English, by comparison, varies far more in intonation and rhythm across a sentence, so Chinese characters are much simpler to handle.
Of course, you still have to deal with some details, such as polyphonic characters: reading "银行" (bank) as "yin xing" instead of "yin hang" is wrong. Punctuation, numbers, and letters also need handling, but for anyone who writes programs these are not difficult.
Some domestic voice boards ship with TTS, whether paid or free. They generally work this way and give this kind of result.
If you want a TTS that sounds a bit better, do some more grunt work: record common words as whole units, such as common two-character words and four-character idioms, then build a lexicon and a table mapping it to the recordings, and look words up in the lexicon at synthesis time. With words as the unit, the result is much better than with single characters. There is still some technique involved, namely word segmentation: breaking a complicated sentence into a sensible sequence of words takes some skill. For this one can blame the pioneers of the New Culture Movement, who promoted vernacular writing and introduced Western horizontal layout and punctuation, but did not introduce spaces between words. Still, even if the segmentation algorithm is inefficient or inaccurate, it does not matter much. As mentioned, Chinese characters are monosyllabic words, and stringing their sounds together is generally not wrong.
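One common way to do this lookup is forward maximum matching: at each position, take the longest lexicon word that fits, and fall back to a single character otherwise. The Python sketch below assumes a tiny hypothetical lexicon and file-naming scheme; it is not claimed to be what any particular product does. Note that the word-level entry for 银行 also gives 行 its "hang" reading, which the character table alone would get wrong.

```python
# Hypothetical unit tables: whole-word recordings and per-character fallbacks.
WORD_UNITS = {"银行": "yin2hang2.wav", "你好": "ni3hao3.wav"}
CHAR_UNITS = {
    "银": "yin2.wav",
    "行": "xing2.wav",  # default reading of this polyphonic character
    "你": "ni3.wav",
    "好": "hao3.wav",
    "去": "qu4.wav",
}
MAX_WORD_LEN = max(len(word) for word in WORD_UNITS)


def select_units(text):
    """Pick recordings by forward maximum matching against the lexicon."""
    units = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, down to two characters.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 1, -1):
            word = text[i:i + size]
            if word in WORD_UNITS:
                units.append(WORD_UNITS[word])
                i += size
                break
        else:
            # No lexicon word starts here: fall back to the single character.
            units.append(CHAR_UNITS[text[i]])
            i += 1
    return units


if __name__ == "__main__":
    print(select_units("你好去银行"))
```

If segmentation misses a word, the output falls back to character recordings, which matches the point above that a wrong split is rarely fatal.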
Of course, IFLYTEK has taken the grunt work even further: it is said to have moved on to recording common sentences as units. One can imagine that this costs more effort in exchange for better results.
As for smoothing the joins between words or adjusting the pitch here and there, I think it matters little; the overall improvement is not large.
Commercial TTS products generally also support Cantonese: hire a Cantonese announcer to do the recordings and repeat the grunt work above.
As an aside, many people feel the recordings are best made by a radio or television announcer. In fact, you can just ask a female colleague to record them, as long as her articulation is clear. In some cases an ordinary voice is more endearing than the enunciation of a news anchor.
Next, marking up text. With complex text, the program cannot work out how to read certain content on its own, so you need to mark it up. For example, should the simple number "128" be read as "one hundred twenty-eight" or as "one two eight"? The usual solution is to add XML annotations. In Microsoft TTS, for example, `<context ID="number_cardinal">128</context>` is read as "one hundred twenty-eight", while `<context ID="number_digit">128</context>` is read as "one two eight". The TTS engine interprets these annotations. Unfortunately, voice XML annotation has not settled into a fully recognized standard, and basically every vendor has its own set.
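The effect of the two annotations can be sketched in Python. The context IDs `number_cardinal` and `number_digit` are the ones quoted above; everything else (the function names, the dispatch table, the 0-999 limit) is an assumption made for illustration and is not how SAPI is implemented.

```python
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty",
        "fifty", "sixty", "seventy", "eighty", "ninety"]


def read_digits(text):
    """Read a digit string one digit at a time."""
    return " ".join(ONES[int(d)] for d in text)


def read_cardinal(text):
    """Read a digit string as a whole number (0-999 only in this sketch)."""
    n = int(text)
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0-999")
    if n < 10:
        return ONES[n]
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        parts.append(TENS[tens] + (f"-{ONES[ones]}" if ones else ""))
    elif rest >= 10:
        parts.append(TEENS[rest - 10])
    elif rest > 0:
        parts.append(ONES[rest])
    return " ".join(parts)


# Hypothetical dispatch from context ID to reading rule.
READERS = {"number_cardinal": read_cardinal, "number_digit": read_digits}


def read_number(text, context_id):
    return READERS[context_id](text)


if __name__ == "__main__":
    print(read_number("128", "number_cardinal"))
    print(read_number("128", "number_digit"))
```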
Now for TTS application programming. Microsoft's TTS programming interface is called SAPI. It is a COM interface, which is somewhat cumbersome for developers, but fortunately the documentation on the MSDN website is very comprehensive. Microsoft TTS is free, but its only Chinese voice at present is a male one that sounds a bit muffled, and I find it unpleasant to listen to.
Domestic vendors generally provide API interfaces that are relatively simple to call and easy to embed in applications.
Commercial TTS products come with concurrency licence restrictions, which limit the number of threads that can synthesize at the same time. I think this restriction has little practical effect. Any TTS can convert text into an audio file for a voice card to play. Most applications use fairly short sentences, generally under 100 characters, so synthesis takes very little time; once one request is done, the synthesis thread can serve the requests of other application threads. If a sentence is long, break it into several short ones; playback is always slower than synthesis.
Many applications also synthesize offline with no real-time requirement, so there is even less need to buy multiple licences.
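Breaking a long text into short pieces might look like the following Python sketch. It splits after full-width Chinese punctuation and hard-wraps any piece still longer than the limit. The punctuation set and the 100-character default are assumptions based on the figure mentioned above.

```python
import re

# Split after these full-width marks, keeping each mark with its clause.
# ASCII punctuation is left out so that decimals like "3.5" are not split.
_BREAK_AFTER = re.compile(r"(?<=[。！？；，])")


def split_for_synthesis(text, max_len=100):
    """Return short chunks that can be synthesized one request at a time."""
    pieces = [p for p in _BREAK_AFTER.split(text) if p.strip()]
    chunks = []
    for piece in pieces:
        # Hard-wrap anything that is still too long for one request.
        for start in range(0, len(piece), max_len):
            chunks.append(piece[start:start + max_len])
    return chunks


if __name__ == "__main__":
    for chunk in split_for_synthesis("尊敬的客户，您好。您本月的话费是：212元。"):
        print(chunk)
```

Since each chunk synthesizes faster than it plays, a single synthesis thread can prepare the next chunk while the voice card is still playing the current one.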
More often, we do not even need to buy a TTS. Take a common phone-bill enquiry line that, once dialled, announces: "Dear customer, your charges this month are: 212 yuan." The first part is the same for every customer, so it can be recorded as one voice file, and synthesizing the number is very simple: record the ten digits, plus ten, hundred, thousand, and ten thousand, plus the currency unit "yuan".
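Assembling such an announcement from pre-recorded files can be sketched in Python as below. All file names are hypothetical, and the sketch is limited to whole amounts from 0 to 9999, so the "ten thousand" recording is not needed here. It follows the Chinese reading order (212 is read "two hundred one ten two") and inserts a single "zero" for interior zeros, as in 205.

```python
# Hypothetical file names for the pre-recorded unit words.
UNIT_FILES = {1: "ten.wav", 2: "hundred.wav", 3: "thousand.wav"}


def amount_prompts(n):
    """Recordings for a whole number 0-9999, in Chinese reading order."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only handles 0-9999")
    if n == 0:
        return ["0.wav"]
    prompts = []
    pending_zero = False
    digits = [int(d) for d in str(n)]
    for pos, d in enumerate(digits):
        power = len(digits) - 1 - pos
        if d == 0:
            pending_zero = True  # spoken only if a non-zero digit follows
            continue
        if pending_zero:
            prompts.append("0.wav")
            pending_zero = False
        prompts.append(f"{d}.wav")
        if power:
            prompts.append(UNIT_FILES[power])
    return prompts


def bill_prompts(n):
    """Fixed greeting, then the amount, then the currency unit."""
    return ["greeting.wav"] + amount_prompts(n) + ["yuan.wav"]


if __name__ == "__main__":
    print(bill_prompts(212))
```

The IVR then plays the listed files back to back. In this sketch 10 is read as "one ten"; a production system might special-case it to just "ten".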

