posted Aug 01, 2008
in Administrative

By popular demand ("this one guy asked for it once"), I've added a post index to iRi. (Also, Google has lost a lot of my pages, which sucks mostly because I keep trying to use it on my own site.)

I tried to implement something like Amazon's Statistically Improbable Phrases, which pulls characteristic phrases out of books and works pretty well, but my corpus is too small. There are many words I use only once, and the vast majority of two-word phrases are entirely unique. Consequently, I use only single words, and even that works poorly.
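For the curious, here is a minimal sketch of the single-word idea in Python. It is an illustration, not the actual code behind the index and not Amazon's algorithm: the scoring formula (how much more often a word shows up in one post than in the corpus as a whole) and the names `tokenize` and `characteristic_words` are assumptions made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def characteristic_words(posts, top_n=3):
    """Map each post title to its top_n most 'improbable' words.

    posts is a dict of {title: text}. A word scores highly when its
    rate within one post is much higher than its rate across all posts.
    """
    counts = {title: Counter(tokenize(text)) for title, text in posts.items()}

    # Word frequencies over the whole corpus.
    corpus = Counter()
    for post_counts in counts.values():
        corpus.update(post_counts)
    corpus_total = sum(corpus.values())

    result = {}
    for title, post_counts in counts.items():
        post_total = sum(post_counts.values())
        scores = {}
        for word, n in post_counts.items():
            post_rate = n / post_total
            corpus_rate = corpus[word] / corpus_total
            # Weight the log-ratio by the count so repeated words beat one-offs.
            scores[word] = n * math.log(post_rate / corpus_rate)
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[title] = ranked[:top_n]
    return result


posts = {
    "cats": "the cat and the cat and the cat",
    "dogs": "the dog and the dog and the dog",
}
print(characteristic_words(posts, top_n=1))
```

With a corpus this small, any word that appears in only one post gets a high ratio, so a one-off typo can easily outrank the words that actually describe the post.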

While I failed in my goal, I decided the result was pleasantly surrealistic, so I kept it in anyhow. Word-based algorithms are so much fun sometimes, even when they completely fail. It's quirky and I know it. It really loves to highlight misspellings, for instance.

(Also, the process runs once a day, so for instance this post won't be there right away. That's OK.)

I originally had a calendar-based archiving system, but I just don't think that works very well for weblogs. Who wants to go to "March 20, 2002"? It certainly doesn't work well with less than a post per day.

