This paper is awful. They bizarrely present the fact that transformers are not very sensitive to word order as a point in transformers' favor, despite that not being how languages work. There's also this absurd passage.
>However, a closer look at the statistical structure of language use reveals that word order contains surprisingly little information over and above lexical information. To see this intuitively, imagine we give you a set of words {dogs, bones, eat} without telling you the original order of the words. You can still reconstruct the meaning based entirely on (1) the meanings of the words in isolation and (2) your knowledge of how the world works—dogs usually eat bones; bones rarely eat dogs. Indeed, many languages show a high level of nondeterminism in word order (Futrell et al., 2015b; Koplenig et al., 2017), and word order cues are often redundant with meaning or case markers (Pijpops and Zehentner, 2022; Mahowald et al., 2023). The fact that word order is relatively uninformative in usage also partly explains why bag-of-words methods dominated NLP tasks until around 2020, consistently outperforming much more sophisticated approaches: it turns out that most of the information in sentences is in fact present in the bag of words.
While it is certainly possible to guess that your interlocutor meant "dogs eat bones", the sentence "bones eat dogs" is entirely possible (if unlikely)! For example, imagine a moving skeleton in a video game or something. The idea that word order isn't vital to meaning is deeply unserious. (Of course there are languages where word order matters less, but there are still important rules about constituent structure, clausal embedding, etc., which constrain word order.)
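To make the point concrete: a bag-of-words representation literally cannot distinguish the two readings, since it discards order entirely. Here is a minimal sketch (the function name `bag_of_words` is mine, not from the paper):

```python
from collections import Counter

def bag_of_words(sentence):
    # Order-free representation: just a multiset of lowercased tokens.
    return Counter(sentence.lower().split())

# The two sentences differ only in word order...
a = bag_of_words("dogs eat bones")
b = bag_of_words("bones eat dogs")

# ...so their bag-of-words representations are identical.
print(a == b)  # True
```

Any model built on such a representation can only recover the intended meaning from world knowledge (dogs usually eat bones), which is exactly what both sides of this argument are disputing the significance of.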
I don't see how it's absurd. The point isn't that languages don't have word order constraints, it's that they're organized in such a way that word order is usually redundant with other things when it comes to expressing meaning. This redundancy is a real and nontrivial property of language which becomes obvious when you analyze it in a statistical and usage-based way, but which isn't so clear if you're focused only on the categorical formal structure.
The redundancy is not a property of language; it's a property of communication. This just speaks to the pseudoscientific nature of usage-based linguistics; it's as if physicists claimed that heavier objects fall faster than lighter ones because a piece of paper falls slower than a brick. You can't just look at the most frequent cases; you need to look at the actual edge cases to understand a phenomenon.
>The redundancy is not a property of language; it's a property of communication.
Meaningless semantics. Language is communication. It only exists, and only evolved, to allow for more complex communication. Thinking of them as two separate things is just nonsensical.