Hacker News

> the German translation produced was very poor quality compared to the English one

They would have selected the best possible language pair for this demo, so we should expect it to have done very poorly if they picked, say, English<->Mandarin.

What stands out is that they didn't pick Spanish as the other language.

It's pretty universally agreed that Spanish[1] is the easiest language for English speakers to learn, and Portuguese is in the same ballpark[2], but German is significantly harder[3]. (And Russian, Chinese, and Arabic would be way to the right on an exponential graph.)

I'm guessing that machine translation of English<->German, for some reason, must be easier than English<->Spanish.

[1] There's a fairly authoritative study on this, which I can't find immediately.

[2] Just from personal experience I find that English<->Portuguese with Google Translate is astonishingly good (in either direction): http://brazilsense.com/index.php?title=Getting_by_with_just_...

[3] The difficulty of German vs Spanish is confirmed by an NSA (!) document that says that "Next to Vietnamese, German may be the most difficult for English-speaking students to learn for German has a difficult syntactical feature, the discontinuity of the predicate, which the others lack. Among French, Italian, and Spanish, there also seems to be only a slight difference in difficulty. It appears that these three are the easiest languages for English-speaking students to learn": http://www.nsa.gov/public_info/_files/cryptologic_spectrum/f...



A little more than a year ago they did an impressive* demo from English -> Mandarin. There were slight errors that sometimes flipped the meaning (though they only slightly hindered understanding, which made it more believable). This seems to be a first step toward productizing that research.

http://blogs.technet.com/b/next/archive/2012/11/08/microsoft...

*Well, relatively. It would be super lame compared to the kind of tech one would find in a child's toy aboard a starship.


What you are discussing is called linguistic typology, and it is far from established fact that Spanish is the easiest. I studied the topic, but from the perspective of Arabic language instruction for native speakers of English and other languages.

It might be up there, but I have also heard languages like Indonesian suggested, since it has the simplest grammatical structure. In terms of morpho-syntax, Chinese is in some ways about as easy or difficult as English (tense, gender, and number are no more difficult than in English, in my opinion, having studied Arabic to fluency and Chinese at the beginner level).

To get back on topic, typologies like this are good for identifying which specific constructs will cause difficulty, but not for determining which language is easiest to learn.


Your NSA quote is a little out of context. The article you link to states that German is the second-hardest language in a list of five comparatively easy languages.

> Among Vietnamese, German, French, Italian and Spanish, Vietnamese may be the most difficult... Next to Vietnamese, German may be the most difficult

This doesn't amount to being significantly harder, particularly in light of the statement (that you quoted) that there is just one feature that makes it harder than the other three.

Speaking from personal experience (native English speaker, no prior difference in exposure to the two languages, simultaneous study of the two, similar teacher quality and curriculum), I found French harder than German. Although my single anecdote doesn't prove German to be easier, surely it suggests that the one I found easier couldn't be significantly harder.


While English <-> Spanish shares Romance-language cognates (generally speaking, higher-register English comes from French, which shares a common ancestry), English <-> German shares grammar.

I imagine the vocabulary isn't as difficult for machines to process as grammar.


>which shares a common ancestry

English shares no direct ancestry with French. However, in 1066 England was invaded by the Normans, leading to the entire aristocracy and upper classes speaking Norman French, which caused a lot of French vocabulary to enter the English language. Most of these words are for things in higher registers, though. The basic vocabulary of English is entirely Germanic (it's nigh-on impossible to write an English sentence using only French-derived words), while much of the more advanced or formal vocabulary is French (or Latin or Greek).


The FSI (Foreign Service Institute)[1] has ranked languages according to how easy they are for English speakers to learn. There are 10 languages in Category I (the easiest to learn)[2]:

Afrikaans, Danish, Dutch, French, Italian, Norwegian, Portuguese, Romanian, Spanish, Swedish

[1] http://www.state.gov/m/fsi/

[2] http://web.archive.org/web/20071014005901/http://www.nvtc.go...


Maybe because they didn't want to make it so obvious that the "other language" translation is much poorer than the English one. A lot more people would pick up on that if they used Spanish, since far more people know Spanish.


Completely guessing, but maybe the strict structure of German makes it easier for machines to translate? English and Spanish, on the other hand, break more of their rules than they stick to, which makes them more "human"/"natural" and easier to relate to?


There probably were too many people in the room who speak Spanish. The German part of the conversation is slightly scary, to some degree due to her facial expressions. I'd really like to know what somebody who doesn't know German thinks about the faces she makes.
