Referrer spam half my bandwidth??? [update - nope] [update - false positives addressed]

A few days ago Alwin posted about his referrer spam and a solution he is using to at least cut it down. (Once the referrer spammers move on to v14gr4.somescummyhost.com it'll stop working, but hopefully they don't get that desperate, because hopefully they'll realize the pressures that drove email spammers to do that don't apply to them. But expecting intelligence from spammers has proven a futile endeavor.)

I went ahead and implemented it. To my great surprise, it seems to have cut my website bandwidth usage approximately in half. Which bothers me; I had thought the spammers would use HEAD requests and not actually download the pages, as downloading would slow their referrer spamming way down. See parenthetical in paragraph one. I'll have to wait a few more days to be sure, but if so, this takes my bandwidth pressure off.

However, this measure is not without its casualties. The legitimate and non-porn weblog Caseyporn once linked to me, and now I've broken that link, because I've blocked "porn". (I tried a negative lookbehind assertion (?[less than]!casey)porn, but it looks like Apache doesn't support those, at least not the version I'm using. "Less-than" replaced to avoid screwing up double-unescaping aggregators.) But "porn" is in a lot of the garbage. Sorry, Casey, I'm not sure what to do.
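
For reference, here is a minimal sketch of what that negative lookbehind is meant to do, written in Python rather than as Apache configuration. It is an illustration only, not my server setup; the function name `is_spam_referrer` and the example URLs are made up for the demonstration.

```python
import re

# Match "porn" unless it is immediately preceded by "casey".
# (?<!casey) is a negative lookbehind; Python's re module supports it
# as long as the lookbehind pattern has a fixed width, which "casey" does.
BLOCKED = re.compile(r"(?<!casey)porn", re.IGNORECASE)


def is_spam_referrer(referrer: str) -> bool:
    """Return True if the referrer contains a blocked keyword (hypothetical helper)."""
    return BLOCKED.search(referrer) is not None


if __name__ == "__main__":
    # Hypothetical referrers, for illustration only.
    for ref in ("http://free-porn.example.com/", "http://caseyporn.example.com/"):
        print(ref, "->", "blocked" if is_spam_referrer(ref) else "allowed")
```

Note that a referrer containing "porn" a second time, somewhere other than right after "casey", would still be blocked.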

I'm not sure offhand what a better solution is, but if half my bandwidth is referrer spam, I've definitely got a real problem here, so I need to use something. On a larger weblog, the referrer spam may be a negligible percentage, but the average weblog is quite small in traffic.

Update: Nope, that wasn't it. Turns out huge chunks of my site were missing due to a botched upload for boring technical reasons. I have restored my site and my bandwidth usage is back where it was.

Update 2-4-2005: An Anonymous Coward on Slashdot points me at how to unblock certain referrers. Thanks. "caseyporn" is now allowed again. You can get my htaccess if you want; that's a symlink so it'll stay updated as I update my own server.
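
The general shape of that kind of fix, checking a list of allowed referrers before applying the keyword block, can be sketched in Python. This is an illustration under assumptions, not the contents of the htaccess file: the names `ALLOWED`, `BLOCKED`, and `should_block` are hypothetical, and only the "porn" and "caseyporn" entries come from this post.

```python
# Substrings that exempt a referrer from blocking (checked first).
ALLOWED = ("caseyporn",)

# Substrings that mark a referrer as spam. Only "porn" is taken from
# the post; a real list would have more entries.
BLOCKED = ("porn",)


def should_block(referrer: str) -> bool:
    """Return True if the referrer should be refused (hypothetical helper)."""
    ref = referrer.lower()
    # The allow list wins: an exempted referrer is never blocked.
    if any(word in ref for word in ALLOWED):
        return False
    return any(word in ref for word in BLOCKED)


if __name__ == "__main__":
    # Hypothetical referrers, for illustration only.
    for ref in ("http://caseyporn.example.com/", "http://cheap-porn.example.net/"):
        print(ref, "->", "blocked" if should_block(ref) else "allowed")
```

Unlike a lookbehind, this approach exempts the whole referrer once an allowed substring appears in it, so the allow list needs to stay narrow.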