Spam Filtering's Last Stand, Part Two

Two people have now expressed the opinion that I am underestimating the advantage of personalization, and the power of statistics. It's worth replying to, I suppose, since that's two out of three. You can see why I left this out of an already-long weblog post before.

First, I have built Bayesian filters before, so I'm not completely ignorant about how they work. I'm not an expert, but then, right now I don't know that anyone is, since there's a lot of application-specific tuning that must be done. Still, I rest my argument on general principles, not specifics.

I think this was most clearly said by Jerry Kindall:

But at the root of it is the simple fact that Bayesian filters are by nature individualized. Just because a spammer manages to get a message past his own Bayesian filter doesn't mean squat about whether he'll be able to get it past mine. My corpus contains the names of my friends and topics we frequently talk about.

The problem is, with no disrespect intended to Jerry, that's just plain wrong. Like I said in the previous message, these filters are not independent; if they were, you could just string them together and the problem would have long since been solved. If a spammer builds a nice, generalized filter based on a broad cross-section of good and bad emails, and he crafts a message that gets past his filter, it will get past your filter too with a very high probability; I'd guesstimate 80%-90% at the lowest.
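To put rough numbers on the independence point, here is a minimal Python sketch. The miss rates are invented and both function names are mine, purely for illustration. If filters really were independent, chaining them would multiply their miss rates; the more their decisions overlap, the closer the chain gets to being no better than its single best member.

```python
def independent_miss_rate(miss_rates):
    """Chance a spam slips past every filter, ASSUMING the filters are independent."""
    p = 1.0
    for m in miss_rates:
        p *= m
    return p


def overlapping_miss_rate_bound(miss_rates):
    """Upper bound on the joint miss rate, reached when the filters' misses fully overlap."""
    return min(miss_rates)


# Hypothetical numbers: three filters that each miss 10% of spam.
rates = [0.10, 0.10, 0.10]
print("independent:", independent_miss_rate(rates))
print("fully overlapping:", overlapping_miss_rate_bound(rates))
```

Real filters trained on similar mail sit much nearer the second figure than the first, which is the whole point of the argument.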

Why? This will be a little involved, so let me start with a conceptual framework. If you take a hypothetical average of all reasonable spam filters written for people in the same language/dialect, there will be a common core that will eventually end up being considered neutral by the spam filter, as simply "using English" is not a good determiner of whether a message is spam or not.

In order for the filter to determine if something is spam, it must look for the distinguishing characteristics of the message. This is what sets it apart from "normal messages" in English. If you're talking about your family, it might be your family's names and an unusual concentration of relative designators (cousin, brother, aunt, etc.). (Of course your filter won't explicitly understand the categories, but it can get an idea.) On the other hand, if I sent you the admittedly silly message "The stuff which an the that I think.", the filter would have virtually nothing to make a decision with, because those words are common to everything. Unfortunately, Bayesian analysis isn't smart enough to determine the core perfectly, which would require understanding the text. Rather, the useless middle just sort of develops over time from the way the algorithm plays out.
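Here is a small Python sketch of how that "useless middle" falls out of the counting. It is a simplified per-word estimate with add-one smoothing, not the formula of any particular filter, and the corpus and the function name are made up for illustration.

```python
from collections import Counter


def token_spam_probs(spam_msgs, ham_msgs):
    """Estimate P(spam | word) from how many messages of each kind contain the word."""
    spam_counts = Counter(w for m in spam_msgs for w in set(m.lower().split()))
    ham_counts = Counter(w for m in ham_msgs for w in set(m.lower().split()))
    probs = {}
    for w in set(spam_counts) | set(ham_counts):
        # Add-one smoothing (an assumption of this sketch) avoids hard 0 or 1 values.
        s = (spam_counts[w] + 1) / (len(spam_msgs) + 2)
        h = (ham_counts[w] + 1) / (len(ham_msgs) + 2)
        probs[w] = s / (s + h)
    return probs


# Hypothetical toy corpus.
spam = ["buy the pills now", "the best pills for you"]
ham = ["the meeting is at noon", "call the cousin about the meeting"]
probs = token_spam_probs(spam, ham)
for word in ("the", "pills", "meeting"):
    print(word, probs[word])
```

A word like "the" shows up equally on both sides and lands at the neutral midpoint, while "pills" and "meeting" are pushed toward opposite ends. Only the words that land away from the middle give the filter anything to decide with.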

So you can consider how far a message "juts out" from that normal core of English words as what the Bayesian filter has to work with. (It actually works with phrases, but the same basic concepts hold there, too, with "core phrases" and distinctive ones.) And since this is a dumb computer algorithm and doesn't understand the semantic content of the message, this is really all the algorithm has to work with, whereas you, the human, understand the message.

Your "personalization" is a handful of things that stick out and often indicate good messages for you: Names, places, companies you work for, stuff like that. (Note this even assumes that you can tell your filter that a given message is not spam (or at least was misidentified as spam); if you only tell it what is spam, you'll never actually be able to personalize your filter significantly.) But it is only a few things, here and there, a vanishingly small part of the total knowledge the filter represents, unless you routinely use an odd dialect. (You could easily create a Bayesian filter that only allowed 'leetspeak, or Cockney, for instance.)

Key paragraph: With this formulation of the problem, you are now in the domain of statistical discrimination, with all the standard tradeoffs and benefits thereof. If spammers start sending messages that don't "stick out" much, as I was alluding to yesterday, yes, you can in theory catch them. But there is a standard price to pay for making a filter more sensitive: You must create more false positives. Not "may", not "might", "must". It is inevitable in the very nature of the way it works. Thus, if a message is marked as "good" by the average filter, your filter must also share many of the same decision characteristics with this average filter (or your filter is totally out of whack with the rest of the world), and so it is quite likely to get past yours. The alternative is that there is some characteristic you have weighted as excessively likely to be spam, in which case you must be kicking out other good messages that have those characteristics themselves.
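The tradeoff can be seen in a toy Python sketch. The scores below are invented, and `rates` is a hypothetical helper, not part of any real filter. As long as some good mail scores higher than some spam, lowering the cutoff to catch the quiet spam has to flag good mail along with it.

```python
def rates(ham_scores, spam_scores, threshold):
    """Return (false positive rate, false negative rate) for a given spam cutoff."""
    fp = sum(s >= threshold for s in ham_scores) / len(ham_scores)
    fn = sum(s < threshold for s in spam_scores) / len(spam_scores)
    return fp, fn


# Hypothetical spamminess scores; note that the two ranges overlap.
ham = [0.05, 0.10, 0.20, 0.35, 0.45]
spam = [0.30, 0.40, 0.60, 0.80, 0.95]

for t in (0.9, 0.5, 0.3):
    fp, fn = rates(ham, spam, t)
    print(f"threshold={t}: false positives={fp:.0%}, missed spam={fn:.0%}")
```

With these numbers, the only way to catch the spam scoring 0.30 and 0.40 is to also reject the good messages scoring 0.35 and 0.45.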

It is easy to see that the 99.9%+ accuracy that some early adopters are crowing about will not last long; it does not take much to get down to merely 90% accuracy, and then it's a pretty quick tumble to 50% (i.e., random chance) accuracy.

Your personalization in fact really only comes into play inasmuch as you prevent some false positives by the filter, but it does nothing whatsoever to filter spam that doesn't "jut out" very much; in fact it provides an avenue for the spammer to either get lucky, or exploit the personalization if you've been tracked online well enough to guess at some of the personalization words. (For one easy example, it's a good guess that your name will be a personalized word; if an email has your name in it, is that spam or not? Or your email address? And we're talking about walking the thin line between rejection and acceptance, so each little influence can count, unlike current spam where it is abundantly obvious it is spam.)

I would say that I do not underestimate statistical techniques; with all due respect, I have to return the charge and say that there's an awful lot of unjustified enthusiasm from people who don't understand that these statistical techniques are not magic. They're very easily characterizable mathematically, have known weaknesses, and are simply not powerful enough to work in an actively hostile environment. (No known AI technique is.) Note that I've explained this in colloquial terms because this isn't a mathematics conference, but "jutting out" (deviation of the message from the hypothetical average message vector) and "the average of all good filters" (an average of the vector that characterizes the Bayes-type filter across many real people's filters) are actually reflections of the real mathematics of how these things work. Unfortunately, you don't have to understand how these things work to snarl them up, only to build them in the first place.
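For anyone who wants the "jutting out" idea in something closer to code, here is a rough Python sketch. It assumes a plain word-frequency vector and Euclidean distance, which are my simplifications for illustration; a real filter's internal representation will differ, and every name and message below is hypothetical.

```python
import math
from collections import Counter


def freq_vector(text, vocab):
    """Relative frequency of each vocabulary word in the text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in vocab]


def average_vector(messages, vocab):
    """Component-wise mean of the messages' frequency vectors."""
    vectors = [freq_vector(m, vocab) for m in messages]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def jut_out(message, avg, vocab):
    """Euclidean distance between a message's vector and the average vector."""
    v = freq_vector(message, vocab)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, avg)))


# Hypothetical toy data.
vocab = ["the", "that", "i", "think", "viagra", "free"]
corpus = ["I think that the", "the that I think"]
avg = average_vector(corpus, vocab)
print(jut_out("The that I think", avg, vocab))
print(jut_out("free viagra free", avg, vocab))
```

The first message sits right on the average and gives the filter nothing to grab; the second sticks out a long way.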

There's just not enough data in words or short phrases for the filters to meaningfully separate "spam" from "non-spam", as the spam looks more and more like non-spam. Again, I'm willing to stand by this in the future.

I hate to do this, but let me give a concrete attack on a Bayes filter that I think can't be blocked by the filters without effectively tossing everything out as spam: For the math bunch in the crowd, consider creating an "averaged" Bayesian filter (any subtype you care for, though I assume phrases of 2 or 3 words are used, not single words) by running the algorithm over many thousands of common messages. Extract the probabilities the Bayesian filter creates for itself and place them in a Markov-type matrix, where the transition probabilities are the inverse of the Bayes filter's probabilities for the message being spam. Use that to assist the human spammer in creating a message by guiding the spammer in their next word or phrase choice to select a maximally "good" choice, or by giving them a high-quality skeleton to fill in. Because of the flexibility of the English language, it will still be possible to work out a pitch that will appeal to people, even out of this somewhat limited domain. If done correctly, it is entirely possible that this spam message will rate as "less spammy" than many or most real e-mails!
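A toy Python sketch of the matrix construction, to show how little machinery is involved. It assumes two-word phrases and reads "inverse" as one minus the spam probability, which is my interpretation; the probabilities and function names are invented for illustration.

```python
def transition_weights(bigram_spam_prob):
    """Turn P(spam | w1 w2) into per-word transition weights that favour 'good' phrases."""
    table = {}
    for (w1, w2), p in bigram_spam_prob.items():
        # Assumption of this sketch: "inverse" means the complement, 1 - P(spam).
        table.setdefault(w1, {})[w2] = 1.0 - p
    for row in table.values():
        total = sum(row.values())
        if total > 0:
            for w2 in row:
                row[w2] /= total  # normalize each row so it sums to 1
    return table


def suggest_next(table, word, k=3):
    """Rank candidate next words after `word`, least spammy first."""
    row = table.get(word, {})
    return sorted(row, key=row.get, reverse=True)[:k]


# Hypothetical probabilities, as if read out of an "averaged" filter.
averaged = {
    ("our", "meeting"): 0.10,
    ("our", "cousin"): 0.20,
    ("our", "offer"): 0.90,
}
table = transition_weights(averaged)
print(suggest_next(table, "our"))
```

The human still writes the pitch; the table only steers each next phrase toward whatever the averaged filter likes best.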

At this point, the recipient must either pass such a crafted message through, in which case the filter has failed, or mark it as spam, which will raise the false positive rate significantly each time that is done, until the user is too annoyed to use the filter any more and shuts it off, in which case the user gets the spam anyhow. Either the spam gets through or the filter is damaged. (And don't get any bright ideas about looking for exceptionally "non-spammy" messages either, because I can tune this to produce messages as spammy or as non-spammy as I want, on average, so they won't stick out for long.)

I could write this, right now, in maybe three solid weeks of work, and keep it flexible enough to stay up to date with anything the Bayes filter folks can throw at it. Faster than they can come up with it, actually, because I get to leverage their work, while they can't touch my work at all (because the Bayes filters can't start second-guessing themselves). It will always work, because it goes right at the Bayes filters' weakness that will always be there, Judo style. And your personalization will be useless, because I will preferentially hit the common denominator that everyone must let through, or else mark all real e-mails as spam.

In conclusion, yes, I understand the statistics at work here, and no, they cannot stand against spam. They are too weak. Everything is too weak against humans, let alone computer-assisted humans.

The other major criticism I've seen would go something like Jerry's "For example, a simple and obvious enhancement would be to weight the beginning of a message more strongly than the end."; note that if you're in any sort of arms race, you've gained little to nothing. That's the current status quo! Bayesian filtering is supposed to be an improvement, not just treading water.

In the meantime, there's time before this happens. Another person who linked my piece yesterday followed it immediately with the comment that they just started using one of these filters. I say, go ahead and enjoy them while you can! Any port in a storm.