March 11 2014, 11:41 PM   #27
Re: How does the universal translator work?

It seems that you are suggesting that in addition to its basic function, the UT is also able to draw on some dataset that has archived a representation of how the articulation of the user's native language should visually appear when rendered, and is able to simultaneously track and then overlay this sensory input alongside the translated speech.
No. What I mean is that the human brain will automatically choose to believe in lipsynch. We need no machinery for that, no artificial sensory input. All we need is a distraction to keep us from consciously realizing that this Japanese movie character's lips don't actually form the English words "Prepare to die, honored fiendish fiend - Ha JAH!" - and the UT can provide such a distraction simply by "tickling" the brain the right way.

If so, would you draw a distinction in these two capabilities?
I would. The UT would take what the ear hears, analyze it like today's translation software analyzes signals picked up by a microphone, and alter it, and then feed it forward to the brain - that's the basic function. If necessary, it would also tickle the brain so that it doesn't notice that the lipsynch isn't perfect, and that the grammar doesn't quite work, and that there are many words missing - that's the relatively trivial second function.

However seamlessly and without appreciable delay it is made available to the user, this output has presumably not been altered from how it is actually presented, including of course being heard in the other person's voice rather than a synthesized recreation. This does not seem different in substance than a super realized version of an old school Mission Impossible style gewgaw that might have portrayed a running instantaneous translation capability piped in through a cleverly hidden earpiece.
I don't see any need to assume that the UT, situated either between the ear and the brain or else between the incoming noise and the brain (in this case sidestepping the ear altogether), would have to shy away from altering the incoming speech rather fundamentally - changing the voice from deep male to shrill female, say. I'd actually expect the UT to produce a fairly generic and unconvincing voice for the translation, as this would make its task easier: there would be no effort put into voice imitation, because the brain would eventually deceive itself into hearing something suitable anyway. A little bit of "tickling" would again help there.

Of course, the UT could be feeding its signals to the language center of the brain exclusively, leaving the timbre-oriented areas of the brain starved and allowing the brain to insert its own preferred timbre into the voice it does not really hear. All the brain is getting is the language, not the voice. A slightly more elaborate machine could tickle the brain into believing in a soothing female voice or a harsh Hungarian accent - not by actually simulating such things (remember, there's no voice coming in, only pure language), but by evoking the brain's own recollections of such things. One could then buy a suitable package of recollections; nothing as elaborate and fantastic as "Total Recall", merely a set of sound tapes one listens to, while the UT observes the reactions and writes some notes about the connections to specific brain activity for later use.

This requires an action being produced that allows either individual to see something that is not really present and presumably would not be captured by an independent view that is not also utilizing the same technology.
Yes, this would definitely be true of the lipsynch: if B hears A say English words, thanks to a UT inserted in the auditory nerve of B, he will fool himself into seeing English lip movements on A, especially if "tickled" - but if C, standing nearby, only hears A's original Japanese words, he will not see English lip movements. But that already happens naturally if you watch sufficiently many lipsynched movies in a row: your brain wraps itself around lipsynch without the need for special machinery (although e.g. booze helps).

Anyway, this is a confirmed feature of the UT: it allows B, C, D and E to hear the speech of A in their own respective languages simultaneously, without any of B, C, D or E complaining about poor lipsynch. Explaining such things is best done if we ditch the idea of sound being manipulated (or else there'd be lots of overlapping and perhaps mutually destructive manipulation going on) and favor a model where the signal is being manipulated only after it has safely entered the user's noggin.

Timo Saloniemi