posted May 07, 2001

AOL's New Filter on the Block
Censorship 5/7/2001; 11:43:28 AM

'America Online has begun using new filtering technology to power its "parental control" options for kids, young teens and older teens. The automated technology -- provided by filtering company RuleSpace -- recognizes eight languages and can analyze the content of 47 million webpages per day.'

'Because patents are pending on the context recognition technology, details are fuzzy. But the basic idea is that, rather than searching for objectionable keywords, it analyzes text and assigns it to a category of similar kinds of text. In this way, the program can supposedly distinguish between a lurid tale and a clinical discussion of STDs.'

'"This happens every year," said Chris Hunter, a civil liberties expert at the University of Pennsylvania's Annenberg Public Policy Center. "They say they've found new artificial intelligence that will be effective. Then it gets tested and the examples of over-blocking come out."'

I've seen what constitutes "state of the art" in text recognition; I just took the final exam in that class. Believe me, it was not hard to do better than the previous generation, which was typified by the most naive possible approach to the problem. Even the most untrained computer users know that filtering on the simple presence of key words won't work. Duh! (Yet the filter makers made millions.)

Now someone's finally commercializing techniques that have been developed in the last decade for text recognition, and yes, it's a lot better at automated classification. The problem is, while many distinct techniques for such classification have been developed, they all tend to plateau at the same level of accuracy. Accuracy depends on domain, of course, but we're usually talking around 80-90% accurate, hitting 95% if you're lucky...
that's as many as one in five documents incorrectly classified.

Even if we assume a groundbreaking, earth-shaking, award-winning accuracy of 95%, that's an error rate of one in twenty, and I don't know of any system that's ever come close to doing better, except of course a human. And the line between "indecent" and "decent" is one of the fuzzier ones, even by purely human standards, which makes the job even harder. I'm sure the demo of the product went just swimmingly, but automated classification is still not the answer.

To put it simply, the status quo remains. A lot of things will slip through, and a lot of things will be incorrectly banned. It still doesn't address one of my main concerns, which is the amount of power we're handing a corporation.

But worst of all, this system will be defeatable once it becomes common enough to make defeating it worth anybody's while. Put the phrases "safe sex" and "condom safety" a few times on your page, and you'll probably be able to pass anything through this filter. Humans who run these websites are smart, as humans tend to be.
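
For a sense of scale, here is a back-of-the-envelope sketch in Python that combines the article's throughput figure (47 million pages analyzed per day) with the accuracy levels I quoted above. It assumes every analyzed page gets a classification and that the accuracy figure applies uniformly across all of them; both are my simplifications, not anything RuleSpace or AOL has published. The function name is made up for the illustration.

```python
# Throughput figure quoted in the article: pages analyzed per day.
PAGES_PER_DAY = 47_000_000


def misclassified_per_day(accuracy_percent, pages=PAGES_PER_DAY):
    """Expected number of pages classified incorrectly in one day.

    accuracy_percent is a whole-number percentage (e.g. 95 for 95%).
    Assumes the accuracy applies uniformly to every page, which is a
    simplification for the sake of the estimate.
    """
    return pages * (100 - accuracy_percent) // 100


if __name__ == "__main__":
    for accuracy in (80, 90, 95):
        errors = misclassified_per_day(accuracy)
        print(f"{accuracy}% accurate: about {errors:,} pages wrong per day")
```

Under those assumptions, even the lucky 95% case leaves 2,350,000 pages a day either wrongly blocked or wrongly let through, and the 80% case leaves 9,400,000.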


