With a latency of just 0.5 seconds, smartphones can now convert text to speech in 21 languages, thanks to the National Institute of Information and Communications Technology (NICT).
This solution synthesizes one second of speech in just 0.1 seconds using a single CPU core, about eight times faster than conventional methods. This means that a typical mid-range smartphone will be able to handle the required processing on its own, without needing an internet connection or external resources.
This technology is publicly available and is built into VoiceTra, NICT’s multilingual speech translation app for smartphones. NICT also anticipates future applications in car navigation and other speech services through commercial licensing.
Additionally, NICT is working on multilingual simultaneous interpretation technology, in which translated speech is generated continuously, without waiting for the speaker to finish. This will require even faster text-to-speech technology to achieve real-time machine interpretation.
Submitted by Jane Gifford