On Holding Together This Structure

2004-03-02

In my previous posts, I've answered the question "What Is An Outline?" from the point of view of Iron Lute. The resulting data structures are somewhat complicated. These data structures are at the heart of Iron Lute; if they fail, the entire program can come crashing down. Moreover, if nobody is capable of correctly using the data structures, they are still useless. It's worth discussing

Theory to Practice: How Do We Hold This Together?

especially in light of the fact I'd like to see this become a self-sustaining open source project or potentially a commercially viable product; if I'm the only person who can write or maintain Iron Lute code then that's a hopeless endeavor. The answer to that comes in many parts.

Easy Interfaces For Easy Tasks

For all the complexity in the data structure itself, the vast majority of operations can be specified very well as a small local operation involving one, two, or at most three nodes. (An example of the latter is a "swap nodes" functionality, which involves the two nodes to be swapped and the parent; an example of this in use is the "Move Node Up" command.) And most of those operations do not intrinsically care whether or not the graph is a pure tree-based outline, a graph, or anything else.

For instance, adding a node, a common operation, doesn't mean you have to create a node, create a link, and connect the link manually. Assuming A is a node, it suffices to say

A.addNewChild()

By default, that method adds a new BasicNode (the simplest type) to the end of the children of A; various parameters allow one to override pretty much every clause in that sentence as desired. The "addNewChild" method handles all that.

In most common cases of traditional outline algorithms, the best way to use the outline is to grab a handle and do all access to the outline via that handle. The handle itself passes through node operations to the node that is the destination of the handled link, which means that one can often simply write algorithms in terms of node handles and have work without futzing around with anything else in the data structure. For instance, a simple find-and-replace algorithm to replace "Hello" with "Goodbye" in all nodes reachable from the current node, assuming handleA is a handle for node A, can be implemented as:

for handle in handleA.depthFirst():
    data = handle.getData()
    data = data.replace("Hello", "Goodbye")
    handle.setData(data)

Note the programmer doesn't need to concern themselves with the details of the data structure, including even the issues involved with handling true graphs, rather then outlines. (Depth-first of course refers to the depth-first traversal as also discussed in Storing my Outlines.)

Despite the complexity of the underlying data representation, the vast majority of operations a programmer may wish to perform are very simple; only a very few cases, mostly extending Iron Lute's commands in new ways, require mucking with the internals directly. Even more of those cases can and should end up being refactored into core capabilities, so that ideally nothing except the internals of Iron Lute ever need to much with those things.

The only moderately complicated thing is that you can not be sure if a given node will exist when you reach for it, if it's a remote node, but generally, this only occurs explicitly and people can explicitly plan for it. The Iron Lute core needs to be a bit more robust to handle these cases in general, but generally, people extending the core will not need to deal with that issue unless they deliberately choose to create a design where they might need to.

Thus, programming in Iron Lute is about as easy as it can be, and I do not think this will be a major concern. Keeping it this simple in principle, even when multiple people or processes are working on an outline at one time will be challenging but I am still optimistic it can be done.

Getting It Right, And Keeping It That Way

If the outline data structure has a flaw, the whole of Iron Lute has a flaw. Yet the number of possible interactions in the data structure is fairly large. How can I be sure I've caught the majority of the importent bugs?

The answer is Unit Testing, or more generally, a truly systematic testing framework, built in from Day One, that is specifically meant to cover all aspects of how outline nodes should behave in the system. For this outline data structure, I've written almost as much testing code as I have data structure code (with comments and documentation removed from both), and as bugs are found, that ratio will tip in favor of testing.

It sounds simple but it is not generally practiced, often because the benefits of pervasive, fine-grained testing is not understood in general. For instance, there are by my count 16 things you can do to a Link, just regarding the settings of the source and destination: For a link with 4 possible settings for whether it has a source or destination (no/no, yes/no, no/yes, yes/yes), you can either set a (new) destination, unset the destination, set a (new) source, or unset the source. By writing a test for those scenarios (which only required 4 loops in the testing code; there are some natural categories those break down into), I found a bug that only manifests itself in the yes/yes situation when resetting the destination. It failed to fire an event it should have, which itself is a subtle issue that would have taken a while to resolve, but is exactly the sort of thing that contributes generally to instability in the final product, when a couple hundred of these kinds of tiny bugs add up.

Even more interesting, my count was wrong. Creating a link and setting it, then unsetting the destination and unsetting the source is a distinct act vs. unsetting the source then the destination, as evidenced by the fact that I had a bug that only manifested itself when deleting an existing link by unsetting the destination and then the source. I found this bug not in the outline unit tests, but in the unit test that examined the Delete Node command. Without the unit tests, it would have been much harder to track that bug down, or even detect it, since as it happens the node still correctly deletes, it just leaves the link in an inconsistent state.

But the mere testing of correctness is not the only, or even the most interesting, benefit. As I mentioned at the outset, I've actually had various versions of Iron Lute already running with each of these four major versions of "outlines" at some point. Taking even a small program like Iron Lute through that kind of intensive refactoring tends to tear the program apart. It can be demoralizing to see your program reduced to a seething mass of unidentifiable bugs with no road plan on how to identify or fix them, except by groveling over the code by hand. For a hobby project, that means either death, or the sub-optimal choice of dodging the issues in the first place by leaving the poor design decisions in place.

The ability to test has helped hold the program together, even as the interfaces between the parts evolve. The basic commands, such as "Indent", have changed in implementation several times now, but since I have a unit test written for it, I can easily check whether it's still working with a single command. (In fact, the Unit Test for the Indent GUI command hasn't changed since I refactored the "indent" operation into the LinkHandle class around version 2, even as the "indent" method has undergone at least three major changes.) In a GUI-based program like Iron Lute, it's important to take the time to implement such things because even if you had the discipline to create a check-list to run through, it would still quickly fall into disuse, since a complete checking of the GUI functionality of the product would quickly grow to take hours, or even days. (The latter is not merely hyperbole; a full testing would need to test every node with every command in several circumstances in every time

Unit tests are almost more useful in their capacity to hold developing programs together, and thereby facilitate such development, then they are as correctness testers.

It boggles my mind how many people build large systems without this sort of testing, even though I was one of them just a couple of years ago. Having used them, I loath being forced to give them up at work. (I intend to bring them in someday but right now we are in the middle of a transition from an old system to a much newer version of it, so there's little point in writing the tests against a very different interface.) They actually accelerate development because of the power they give you to tear into "sacred" parts of the program and clear them up, secure in the knowlege that you stand a tolerable chance of not breaking anything. Experienced but non-test-using programmers tend to develop an instictive fear of touching any part of a program that is critical and embedded. Unit-testing support largely removes that fear. While there is still a degree of risk when refactoring code, the resulting gains from cleaning up the crufty parts of the code, and the subsequent more robust code that can then be extended even more, more then offsets that risk and the minimal cost of writing the testing code.

(After all, you will write testing code, and you will need to debug your code. Do you keep your testing code around in a systemized fasion, and use them to direct your debugging, or do you toss quick, unprincipled little test cases at your code, lean on debugging to "fix" the problems, and assume at the end on quite incomplete evidence that the code is "correct" without the ability to run those tests again?)

If you are a developer, this was my attempt to get you Test Infected. Please do not resist. Join us. Do not be afraid. It is so much happier over here...

Strong Code

I can't guarentee the code is bug free, even now. I can't guarentee that it won't need some even more fundamental extension later (though I'm pretty confident now the basic structure is correct). But I am confident that it handles a lot, handles it correctly, and is about as easy to use as you could ask for.

It isn't necessarily perfect code, but it's very strong code, and I'm confident it's strong enough to build Iron Lute on top of. Where it is not up to the task, I'm confident it can be improved on.

The next posting will explore the extensions of this data structure that will make Iron Lute the richest extensible outlining program around.

In other news, I continue to work on the serialization of the outlines. Work has been stressful so I've been doing more relaxing at home then coding. It's looking better so I hope to get more coding in.