Feb 14, 2003

Algorithm for Building the Tree

For each processed word in the paragraph, each sense is taken in order and
added to the tree. A configurable discount factor is applied to each weight in
sequence. Setting the factor to 1 will of course turn the discounting off.

Each sense has the HYPERNYM relationship followed to the top, and that's stored
in a list. The list is used to build the tree from the root down to the sense,
and weight is added to each node according to some configurable scheme.
(Constant, linear, and exponential are currently implemented.) This is done for
each sense of each word.
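
As a rough illustration of the steps above, here is a minimal Python sketch. It is not the actual code: the names (Node, add_sense, add_word), the exact formulas behind the constant, linear, and exponential schemes, and the reading of the discount factor as a multiplier applied to each successive sense of a word are all assumptions made for the example. The hypernym chains are passed in as plain root-to-sense lists rather than looked up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    weight: float = 0.0
    children: dict = field(default_factory=dict)


# Hypothetical weighting schemes. Each returns the weight to add at a node,
# given its depth (0 = root) and the length of the root-to-sense chain.
def constant(depth, length):
    return 1.0


def linear(depth, length):
    # Grows linearly from 1/length at the root to 1.0 at the sense itself.
    return (depth + 1) / length


def exponential(depth, length, base=2.0):
    # 1.0 at the sense, shrinking by a factor of `base` per step toward the root.
    return base ** (depth - (length - 1))


def add_sense(root, chain, scheme, scale=1.0):
    """Add one sense to the tree. `chain` runs from the root down to the sense."""
    if chain[0] != root.name:
        raise ValueError("chain must start at the root of the tree")
    length = len(chain)
    node = root
    node.weight += scale * scheme(0, length)
    for depth, name in enumerate(chain[1:], start=1):
        # Create the child on first sight, then accumulate weight on it.
        node = node.children.setdefault(name, Node(name))
        node.weight += scale * scheme(depth, length)


def add_word(root, sense_chains, scheme, discount=1.0):
    """Add every sense of a word in order, discounting each later sense."""
    scale = 1.0
    for chain in sense_chains:
        add_sense(root, chain, scheme, scale)
        scale *= discount  # a discount of 1 leaves every sense at full weight


# Made-up hypernym chains for two senses of one word.
tree = Node("entity")
add_word(
    tree,
    [["entity", "animal", "dog"], ["entity", "person", "dog"]],
    linear,
    discount=0.5,
)
```

With these assumed schemes, constant pushes the most weight toward the root (every chain adds a full unit there), while linear and exponential taper off on the way up.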

Once done, each sense node has some number associated with it, the "weight".
When it comes time to join two trees, their union is taken (discarding nodes
that appear in only one of them), and a configurable function is applied to
each matching node pair to determine the weight of the node in the union tree.
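
A minimal Python sketch of that join step, using an explicit stack instead of recursion. Again, this is only an illustration under assumptions: the Node class, the join function, and the default of min as the combining function are hypothetical names and choices, not the real implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    weight: float = 0.0
    children: dict = field(default_factory=dict)


def join(a, b, combine=min):
    """Build a new tree holding only the nodes present in both a and b.

    `combine` is the configurable function applied to each matching pair of
    weights. Returns None if the two roots do not match.
    """
    if a.name != b.name:
        return None
    root = Node(a.name, combine(a.weight, b.weight))
    # Each stack entry pairs up matching nodes with their node in the result.
    stack = [(a, b, root)]
    while stack:
        left, right, out = stack.pop()
        for name, left_child in left.children.items():
            right_child = right.children.get(name)
            if right_child is None:
                continue  # only in one tree, so it is discarded
            merged = Node(name, combine(left_child.weight, right_child.weight))
            out.children[name] = merged
            stack.append((left_child, right_child, merged))
    return root


# Two small made-up trees to join.
first = Node("entity", 2.0, {
    "animal": Node("animal", 1.0, {"dog": Node("dog", 1.0)}),
    "plant": Node("plant", 0.5),
})
second = Node("entity", 3.0, {"animal": Node("animal", 4.0)})
joined = join(first, second)
```

Swapping in something like `lambda x, y: x * y` for `combine` changes how matching weights are merged without touching the traversal.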

I rewrote the code so this is much cleaner now, notably the union code, which
is no longer recursive.

The Idea

The idea is that this will allow us to use the weight of the node to determine
how strongly that sense/concept was represented in the paragraph, and we can
match things up with that later.

I need a week to experiment with good settings to get useful results out of
this scheme, if it is indeed possible. I also want to try to hook this up to my
hobby project so we can easily *see* these trees and the associated weights,
rather than trying to visualize them.

Right now everything I've tried is propagating too much weight up the tree, so
the top nodes get top-heavy, which isn't good because those are the least
specific nodes. Getting the balancing act right will be tricky.

Hilarious.

The Opera corporation also takes the opportunity in its press release to point out something very important:

"Hergee berger snooger bork," says Mary Lambert, product line manager desktop, Opera Software. "This is a joke. However, we are trying to make an important point. The MSN site is sending Opera users what appear to be intentionally distorted pages. The Bork edition illustrates how browsers could also distort content, as the Bork edition does. The real point here is that the success of the Web depends on software and Web site developers behaving well and rising above corporate rivalry."

In this case, it's just a matter of extreme impoliteness on the part of Microsoft, since it's their own content they are screwing up, which they have a theoretical right to do. And technically, Opera shouldn't be doing what they are doing, because it isn't their content to muck up. However, the poetic justice is undeniable, and the demonstration of the power (and corresponding responsibility) of browser makers (and browser plug-in makers) to make sure they don't corrupt messages is well-taken.

Columbia observation
Feb 10, 2003

Now that the public discussion of the Columbia disaster seems to have abated, I would like to point out that very, very few people used this as an opportunity to try to dismantle the space program. The vast majority used it as an opportunity to re-affirm our national commitment to space.

Even the starry-eyed "Let's [magically] solve all the problems on Earth before spending anything on space" contingent was quieter than usual. I wasn't very old during the Challenger disaster but I seem to remember a lot more fear and resistance than this prompted. (Perhaps Challenger prepared us for this a bit more?)

Now, the real question is will NASA and the Federal Government back off of space flight as they did after Challenger, despite overwhelming public support? I gotta say, given the amount of spine the average politician and NASA administrator possesses, my best guess is still "Yes." They'll simply assume that this will freak out America and back off, despite the fact that's not what we want.

I often wonder how we survive as a species when the net IQ drops as a group grows in size.

The Job Search
Feb 08, 2003

I'm about to embark on that most scary of college experiences, the Job Search. I'm waiting on one last permission from someone to use them as a reference, then off I go.

The scariest part is Where will I go? One thing is certain, and that's that I'm not staying here. Michigan's a nice enough place to live, if you don't mind going weeks at a time under cloud cover; temperatures don't get too extreme in either direction as a rule (though there are exceptions), the only natural disasters that we are even remotely affected by are tornadoes, and it's nice green land with trees & stuff... but a bustling hive of tech activity it is not.

The least scary part is definitely the transition to work life; I've been working in a professional environment since my second semester of school, and I've experienced several summers that were effectively a "real job". Frankly, I am looking forward to that part; I'm still enjoying school but I hate the way that homework is always hanging over your head. One of my priorities is to get a job that I can mostly leave at work. (I can tolerate some exceptions, and you couldn't stop me from thinking about design issues while eating dinner if you tried, but I do not want to be an on-call server admin.)

In the meantime, uncertainty. Will anyone even contact me? Will the further away jobs be willing to foot the transportation bill for the interview, or at least help out, if they're interested? (I'm in a good economic position compared to most students in my place, but I could still only swing one, maybe two trips to an interview on my own before I am completely out of money. What's common practice? I don't know!) Of all the things I have to do, this is both the most exciting, and in a strange way the one I am most reluctant to do. As much as I'm ready to leave the student lifestyle behind, it is how I've spent the majority of my life to this point. (It is my experience that it takes me about three months for the feeling that Something Is Due Tomorrow to go away. This is cruel because that's about how long a summer vacation is.)

Oh, and on the off chance you're just itching to hire me, speak now or hold your peace for a while. ;-) I have many "career goals" I'd like to pursue (I find the concept of a "career goal" placed on the resume stupid, because the career goal is generally a glorified version of "I want the job you're offering". But when in Rome...), but the top one is anything to do with helping people communicate with each other, which is probably the strongest theme weaving its way through my projects, both released (tcp.im and the requisite Jabber work, CustomBlogPost, my work with LON-CAPA since education is a form of communication) and otherwise (my outliner, Spam Writer (the name of the tool in the screenshot), assorted stuff not generally useful so not worth releasing). Here's the resume, though that will probably be updated in a day or two; the info is accurate but I've been polishing it now that it needs to be "production quality", instead of just "hobby code".

Feb 04, 2003

A preliminary version of my report on my attack on Bayesian spam filtering is now available. It's up for review, esp. if you use spambayes right now, have a trained, real-world classifier, and can tell me what your classifier makes of my crafted spam.

