How AT&T Can Translate Your Speech in Real Time
AT&T Translator, a
service on the company's teleconference system that translates speech between
languages in real time, is currently in pilot testing by some of the company's
biggest business customers. PopMech caught up with Mazin Gilbert, assistant
vice president for technical research at AT&T Labs–Research, to learn about
the challenges of teaching machines to understand human speech.
Q:
Machine-based language
translation has been a longtime dream of science-fiction authors. C-3PO, after
all, was fluent in more than 6 million forms of communication. What inspired
your researchers to develop AT&T Translator?
A:
Language is one of the
largest barriers to communication globally. In the 1980s, we produced a short
film of what communications would be like in the future. We had a vision that
at some point in our lifetime there would be some intelligence in the network
where you could pick up the phone and talk to anyone in the world regardless of
the language you spoke.
Q:
How did you turn that
vision into a reality?
A:
The technology is a
product of more than two decades of research at AT&T in speech recognition,
speech synthesis, and natural language processing. There's nothing like this in
the world for enabling multiple parties to converse in real time across languages.
It requires tremendous expertise in linguistics, machine learning, speech, and
signal processing that we have at AT&T.
We demonstrated the first prototype of English-to-Spanish translation in the lab in 1988 (and continued to research and refine the technology). But given that we're a communications company, it fits into our business nicely and that's why we're focused on pushing it out to the market.
Q:
What is the user
experience like?
A:
You call into a
conferencing service. You and your audience can be anywhere in the world.
You set your preference for native language (or languages), [and] what you hear
or read is that speaker in your native language. You can speak in your language
and they will receive it in their native language too. It's really very
transparent.
Q:
Which languages does the
translating system currently understand?
A:
English, French,
Italian, German, Spanish . . . and Chinese, Japanese [and] Korean, all from
speech in and out, [and] 12 other languages from text which we will roll out to
speech over time.
Q:
What happens when the
person talking in Spanish suddenly switches to English to read out loud a
street address?
A:
We can deal with that.
It's not a simple problem to identify what language a person is speaking. But
one of the technologies we have is identifying the language as you speak. So
when you change your language (which is not uncommon), then we are able to
detect that.
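The idea of detecting which language someone is speaking can be illustrated with a toy sketch. AT&T's production system works on the audio itself with acoustic models; the stand-in below is only a hypothetical illustration that guesses a language from a transcript by counting common stopwords.

```python
# Toy language identification: guess a language by counting how many of
# its common stopwords appear in the text. (Hypothetical illustration;
# real systems identify the language from the speech signal itself.)

STOPWORDS = {
    "english": {"the", "and", "is", "to", "of", "in"},
    "spanish": {"el", "la", "y", "es", "de", "en"},
}

def guess_language(text: str) -> str:
    """Return the language whose stopwords occur most often in `text`."""
    words = text.lower().split()
    scores = {
        lang: sum(w in stops for w in words)
        for lang, stops in STOPWORDS.items()
    }
    return max(scores, key=scores.get)

print(guess_language("the house is in the city"))  # english
print(guess_language("la casa es de madera"))      # spanish
```

Running the detector continuously over a conversation is what lets a system notice mid-sentence switches, such as an address read out in English during a Spanish call.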
Q:
How many steps does it
take to translate in real time?
A:
There are many
components to this problem. To do multiparty communication, you need people who
understand how to do really high-quality speech recognition. Then you need a
team to translate that to the target language. Since we don't know what the
conversation is going to be about, we have to worry about scale: unlimited
vocabulary, [and] the words may be in more than one language.
There are huge numbers of parts back and forth. And you need a team that can work on text-to-speech. Finally, they (the end user) have to hear it in a compelling voice that doesn't sound like a machine is talking.
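The three stages described above can be sketched as a chain. The functions below are hypothetical stand-ins, not AT&T's components: the "recognizer" pretends the audio payload is already a transcript, the "translator" is a word-for-word dictionary lookup, and the "synthesizer" just re-encodes text. The point is only to show how recognition, translation, and text-to-speech compose into one pipeline.

```python
# Hypothetical sketch of the pipeline the answer describes:
# speech recognition -> machine translation -> text-to-speech.

EN_TO_ES = {"hello": "hola", "world": "mundo", "friend": "amigo"}

def recognize_speech(audio: bytes) -> str:
    # Stand-in recognizer: pretend the audio is already a UTF-8 transcript.
    return audio.decode("utf-8")

def translate(text: str, lexicon: dict) -> str:
    # Stand-in translator: word-for-word lookup, passing unknown words
    # through unchanged (real MT must handle unlimited vocabulary).
    return " ".join(lexicon.get(w, w) for w in text.lower().split())

def synthesize(text: str) -> bytes:
    # Stand-in synthesizer: production TTS would render natural-sounding audio.
    return text.encode("utf-8")

def pipeline(audio: bytes) -> bytes:
    return synthesize(translate(recognize_speech(audio), EN_TO_ES))

print(pipeline(b"hello world"))  # b'hola mundo'
```

In the real service every stage must also run incrementally, while the person is still talking, which is what makes the latency and compute demands so severe.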
Q:
What's the hardest part?
A:
There are many, many
challenges. The hardest part is the real-time nature of this. You have to
recognize the language, transcribe the language, translate the language, and do
it while the person is talking. The processing power this takes is enormous. It's
(also) a very expensive endeavor.
Q:
How does this compare in
quality to other automated translators, like Google Translate, the service that
translates Web pages into other languages? Google's service doesn't always
deliver smooth-reading translations.
A:
A smooth delivery of
translation is certainly a challenge for many translation services, given the
complexity of language, and no system is perfect today. What makes (AT&T Translator) a strong competitor in speech and
language services is that it is powered through the cloud, which provides lower
latency and faster results for the end user.
It also uses machine-learning technology, meaning that its accuracy in speech and language services, including translation, improves every time the system is used. We've invested decades in research and development of speech technologies and have more than 600 U.S. patents and additional patent applications—in part to develop more natural-sounding speech that provides a smoother user experience.
Q:
How else is this product
different from the robotic voices we're used to hearing on airport shuttles,
GPS mapping devices, and the like?
A:
One of the things we're
working on is to make the (translated) voice sound like the speaker speaking in
a different language. Intonation [voice quality] carries a lot of information
. . . and you want to convey that to the listener. That is our goal:
that it will sound like you.
Q:
Where is this technology
headed? Might we see a real C-3PO one day?
A:
We envision a world where language is no longer
a barrier for communication, whether for entertainment, health care, language
learning, education, hospitality or just conferencing.
Communication will happen across any device. You will be seeing people interacting across languages, and speaking your language through the magic AT&T network where the intelligence resides. You will be able to watch any program on demand in your own native language, and do that anywhere in the world. The opportunities are endless.