posted Aug 10, 2006

Den Beste at Chizumatic (likely permalink) is annoyed by this article about linguistics and pronunciation.

The researchers took the sounds of more than 3,000 words in English and subdivided each by its phonetic features — in other words, what a person does with his mouth, lungs and vocal cords to produce the sounds of each word.... About 65 percent of all nouns have another noun as its nearest neighbor, and about the same percentage of all verbs have another verb next door, Christiansen said.

I don't find this too surprising. Unfortunately, the full study is behind a subscription barrier, so I can't examine it too closely.
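For what it's worth, here is roughly the kind of measurement I imagine from that description, as a toy Python sketch. Everything in it is my own assumption: the six words, the three-number "feature" vectors (invented, not real phonetic features), and the use of plain Euclidean distance. The article doesn't say what features or distance measure the researchers actually used.

```python
from math import dist

# Hypothetical toy data: (word, part of speech, made-up feature vector).
# The numbers are invented for illustration; they are not real phonetic features.
WORDS = [
    ("table", "noun", (1.0, 0.0, 1.0)),
    ("window", "noun", (0.9, 0.1, 0.8)),
    ("river", "noun", (0.8, 0.2, 0.9)),
    ("run", "verb", (0.0, 1.0, 0.1)),
    ("give", "verb", (0.1, 0.9, 0.0)),
    ("bring", "verb", (0.7, 0.3, 0.7)),  # a verb placed among the nouns on purpose
]


def nearest_neighbor(index, words):
    """Return the index of the word closest to words[index], excluding itself."""
    target = words[index][2]
    others = [i for i in range(len(words)) if i != index]
    # Euclidean distance is an assumption; any distance measure could be swapped in.
    return min(others, key=lambda i: dist(target, words[i][2]))


def same_category_rate(category, words):
    """Fraction of words in `category` whose nearest neighbor is in the same category."""
    members = [i for i, (_, pos, _) in enumerate(words) if pos == category]
    hits = sum(
        1 for i in members if words[nearest_neighbor(i, words)][1] == category
    )
    return hits / len(members)
```

Calling `same_category_rate("noun", WORDS)` and `same_category_rate("verb", WORDS)` gives the toy equivalent of the "about 65 percent" figures in the quote: the share of nouns (or verbs) whose closest-sounding word is the same part of speech.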

However, if you had asked me in advance whether the nouns and verbs would bunch, I would have said yes. Without claiming a cause-and-effect relationship in either direction, human languages usually have a canonical sentence order, the largest aspect of which is the subject, verb, and object order. Languages also have a fairly regular cadence to them; I could never describe it in words, but you can identify Hindi, Spanish, Chinese, and a number of other languages just by their cadence if you've been exposed to them before, even without knowing a single word. Cadence is probably not enough to distinguish two closely related languages, but if you hear someone speaking, it's usually a good guess that it's one of the top 10 languages. I know I've heard seven of those, and I'm pretty sure I could identify each of them, despite speaking only one and having studied only one other (French).

Putting together the fact that there is a fairly regular cadence and a canonical sentence order, it is reasonable to expect that nouns and verbs will be acoustically clustered. If they weren't clustered, the language's cadence would be much more random, and it would be harder for listeners to break a sentence into its parts. This processing takes place well below conscious thought, so we can't observe it in ourselves. This would be far from the only way that human languages are surprisingly optimized for recognition in ways that no single human could ever hope to design into such a system.

Unfortunately, the study was conducted in only one language; I hope they will extend it to others. My theory would be that the mechanisms above cause acoustic clustering, but that there is no universal "verb" cluster or "noun" cluster, and any trends in the clustering will have more to do with what sounds humans can produce comfortably than with the type of word. In other words, it isn't that the type of sound helps you identify the meaning, but that the meaning helps drive the type of sound. The clusters probably wander over time, and as languages evolve there would eventually be no particular relationship between where the clusters sit in one language and where they sit in another. (I wouldn't even be surprised if they could trace evolution paths, similar to the way there seem to be defined patterns of language change.)

Since the actual study is locked behind a subscription barrier, and this is exactly the sort of thing I wouldn't expect a journalist to understand, I wouldn't be at all surprised if the paper just points out there's an interesting correlation and at most speculates about the cause. Without data from multiple languages, it's premature to claim anything in particular about what might be causing this.

As for Den Beste's specific objections: I'm not sure where he sees Chomskyism. I'd expect the clusters to move in different dialects, but neither strengthen nor weaken particularly. And given that we're just talking trends, and that verbs were closest to another verb only about 65% of the time, I'd expect the nasty words that are both verbs and nouns to be some of the exceptions. (Assuming they even used them; they may have tossed them for that reason.)

Update: I've sent him an email which might as well be posted here, too:

Actually, by "subject, verb, object" order, I meant that within a language there should be a canonical ordering, whatever it may be. And I use "canonical" because of course there are always exceptions. I'm not referring to SVO, but to the ordering of the S, the V, and the O.

I've actually been studying Japanese for a few months now too, so I'm aware of SOV differences. Also, French puts the adjectives after the nouns, which can change the order, and from what I gather this isn't unusual either, so there are variations beyond SVO/SOV and so on.

In fact, Japanese is what made me think this up in the first place, because with all the verbs ending in the same endings, you'd get *way* more clustering in Japanese than English.

And I wouldn't take it seriously either until they do it with more languages; that's why I offer an alternate interpretation of the same results, which does after all highlight by construction that their interpretation is not unique.

As for Chomskyism in general, the only thing that ever impressed me about it was some studies mentioned in Steven Pinker's "The Language Instinct" that suggested there might be a biological basis to the idea. That's the sole reason I have to think it's even remotely plausible, and I couldn't tell you if those studies have been followed up on or debunked since then. Otherwise, I find the argument that it makes no falsifiable claims pretty compelling. (It's not really news that you can take an SVO sentence and re-order it to SOV, especially when you seem to be able to freely delete or add any grammatical particles you need to "make it work", and it seems that if you can't find the grammar pieces you're looking for, you can simply assert they are there, just hiding.)


