Bayesian Filters

2004-01-14

Slashdot recently had an article on gibberish in spam. I posted a comment about my work on Bayesian filtering, which reminded me that I need to update it. This post will have to do.

First, you'll note a cranky tone in the Slashdot postings. Would you believe that Bayesian filtering has fanboys? A lot of people seem congenitally incapable of reading something about Bayesian filters once they get the faint idea that I may be a little critical of them. (Completely over their heads is the distinction that I'm not critical of Bayesian filtering per se, but of the idea that it will solve the spam problem once and for all.) Instead of reading the words, they seem to suffer from some strange vision ailment that renders them incapable of seeing anything but the phrase "Bayesian Filters are bad."

I used to blame my writing for any misunderstanding the reader may have had, but at some point, you just have to hold the reader accountable, you know? When they can't seem to read the plain English in front of them because they're too busy jumping to conclusions?

Speaking of "can't read plain English", I am both regretful and pleased to announce that I was probably wrong. Assuming you are willing to call Bayesian filters "widely deployed" (which I'm willing to stipulate, though it's a semantic issue and I can't claim to have deployment statistics; the filters are clearly having an effect on spammer countermeasures, so the spammers must be feeling it), they are still working, even after six months. So I was wrong about how quickly they would sink.

But I would note that I have yet to see the attacks I outlined used; instead I see only random word attacks, which really won't work. Now, I know at least a few spammers have read that piece, and I get a hit for "bypassing Bayesian filters" from Google at least once a week; surely at least a few of those are spammers with ill intent. Fortunately, to date, none of them has been bright enough to figure out what I was saying, in what I thought was plain English, and implement the attack.
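To make the "random words won't work" point concrete, here is a minimal sketch. It assumes a filter that scores a message by combining only the token probabilities farthest from neutral, which is one common design and not necessarily how any particular filter works. The token table, the score for unknown tokens, and the cutoff are all invented for illustration, and the padding words are assumed to be ones the filter has never seen.

```python
from math import prod

# Hypothetical per-token spam probabilities, invented for illustration only.
TOKEN_SPAM_PROB = {
    "viagra": 0.99,
    "mortgage": 0.95,
    "unsubscribe": 0.93,
    "click": 0.90,
    "meeting": 0.05,
}
UNKNOWN_PROB = 0.4  # assumed score for tokens the filter has never seen
TOP_N = 5           # assumed number of "most interesting" tokens to combine


def spam_score(tokens):
    """Combine the TOP_N token probabilities farthest from the neutral 0.5."""
    probs = [TOKEN_SPAM_PROB.get(t, UNKNOWN_PROB) for t in set(tokens)]
    # Most interesting first: largest distance from 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:TOP_N]
    spam = prod(top)
    ham = prod(1.0 - p for p in top)
    return spam / (spam + ham)


spam_words = ["viagra", "mortgage", "click", "unsubscribe"]
padding = ["zebra", "quixotic", "lantern", "orbit", "marble", "thistle"]

plain_score = spam_score(spam_words)
padded_score = spam_score(spam_words + padding)

print(f"plain:  {plain_score:.6f}")
print(f"padded: {padded_score:.6f}")
```

Under these assumptions the padded message still scores well above 0.99. Unseen words sit close to neutral, so at most one of them makes it into the handful of tokens that get combined, and the strongly spammy tokens dominate the result.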

I freely admit that I have seriously overestimated the intelligence of the spamming community. If there is an upper limit to spammers' intelligence, the anti-spam war may have some hope after all.

I need to update the piece with some of this information, but it may be a while; in the meantime, I think I'll just link to this post from there.