Algorithm for Building the Tree, Version 1

2003-02-14

Algorithm for Building the Tree

For each processed word in the paragraph, each sense is taken in order and added to the tree. A configurable discount factor is applied to the weight of each successive sense. Setting the factor to 1 of course disables the discounting.
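As a rough sketch of the discounting step (not the actual code): the function name, the `base` parameter, and the geometric form of the discount are assumptions made for illustration.

```python
def sense_weights(num_senses, discount=0.5, base=1.0):
    """Return a starting weight for each sense of a word, in sense order.

    Assumption: sense i (0-based) gets base * discount**i, so later senses
    count for less. A discount of 1 leaves every sense at the base weight.
    """
    return [base * discount ** i for i in range(num_senses)]


# Three senses with the default discount, then with discounting disabled.
discounted = sense_weights(3, discount=0.5)   # [1.0, 0.5, 0.25]
undiscounted = sense_weights(3, discount=1)   # [1.0, 1.0, 1.0]
```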

For each sense, the HYPERNYM relationship is followed to the top, and the resulting chain is stored in a list. The list is used to build the tree from the root down to the sense, and weight is added to each node along the way according to a configurable scheme. (Constant, linear, and exponential are currently implemented.) This is done for each sense of each word.
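Here is a minimal sketch of that tree-building step. The `Node` class, the `add_path` helper, and the exact formulas behind the three schemes are all assumptions; only the scheme names (constant, linear, exponential) come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    weight: float = 0.0
    children: dict = field(default_factory=dict)


# Hypothetical weighting schemes. Each returns the weight to add to the node
# at position `depth` (0 = root) on a root-to-sense path of `length` nodes.
def constant(weight, depth, length):
    return weight


def linear(weight, depth, length):
    # Grows linearly from the root to the full weight at the sense node.
    return weight * (depth + 1) / length


def exponential(weight, depth, length, decay=0.5):
    # Shrinks by a factor of `decay` for every step away from the sense node.
    return weight * decay ** (length - 1 - depth)


def add_path(root, path, weight, scheme=constant):
    """Add one sense to the tree.

    `path` is the hypernym chain, root first and the sense itself last;
    path[0] is assumed to name the same concept as `root`.
    """
    node = root
    node.weight += scheme(weight, 0, len(path))
    for depth, name in enumerate(path[1:], start=1):
        node = node.children.setdefault(name, Node(name))
        node.weight += scheme(weight, depth, len(path))
    return root


# Two made-up senses that share the ancestors "entity" and "animal".
tree = Node("entity")
add_path(tree, ["entity", "animal", "dog"], 1.0)
add_path(tree, ["entity", "animal", "cat"], 0.5)
# With the constant scheme, "entity" and "animal" each end up with 1.5.
```

Note that under the constant scheme in this sketch, every shared ancestor receives the full weight of every sense below it, so the root always collects the total.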

Once that's done, each sense node has a number associated with it, the "weight". When it comes time to join two trees, their union is taken (discarding nodes that appear in only one of them, so strictly speaking it's an intersection), and a configurable function is applied to each matching node pair to determine the weight of the node in the union tree. I rewrote the code so this is much cleaner now, notably the union code, which is no longer recursive.
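A non-recursive join along these lines could look something like the sketch below. The `Node` class, the `join` name, and the use of `min` as the default combining function are assumptions for illustration, not the real code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    weight: float = 0.0
    children: dict = field(default_factory=dict)


def join(a, b, combine=min):
    """Join two trees, keeping only the nodes that appear in both.

    `combine` takes the two matching weights and returns the weight of the
    node in the joined tree. An explicit stack replaces recursion.
    """
    if a.name != b.name:
        return None  # different roots: nothing in common
    merged = Node(a.name, combine(a.weight, b.weight))
    stack = [(a, b, merged)]
    while stack:
        left, right, out = stack.pop()
        for name, left_child in left.children.items():
            right_child = right.children.get(name)
            if right_child is None:
                continue  # only in one tree, so it is discarded
            child = Node(name, combine(left_child.weight, right_child.weight))
            out.children[name] = child
            stack.append((left_child, right_child, child))
    return merged


# Two small made-up trees that share "entity" and "animal" only.
first = Node("entity", 2.0, {
    "animal": Node("animal", 1.5, {"dog": Node("dog", 1.0)}),
    "plant": Node("plant", 0.5),
})
second = Node("entity", 1.0, {
    "animal": Node("animal", 1.0, {"cat": Node("cat", 1.0)}),
})
joined = join(first, second)
```

Swapping `combine` for something like `lambda x, y: x * y` or `max` changes how the matching weights are merged without touching the traversal.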

The Idea

The idea is that the weight of a node tells us how strongly that sense/concept was represented in the paragraph, and we can match things up with that later. I need a week to experiment with settings to see whether this scheme can produce useful results, if that is indeed possible. I also want to try to hook this up to my hobby project so we can easily *see* these trees and the associated weights, rather than trying to visualize them in our heads.

Right now everything I've tried propagates too much weight up the tree, so the top nodes end up with most of the weight, which isn't good because those are the least specific nodes. Getting the balance right will be tricky.