Iron Lute

# Expanding Brain (ExBr), an outlining blog

Blog of interest: Expanding Brain (ExBr), a blog about "Windows outliners, software development and starting a new business". Also of note is that the author has set up a wiki for outliner topics.

The author and I have been exchanging emails, comparing development notes.

posted at 12:42:40, 2005-05-07

# I finally have a job, a real job, though I don't start until May 9th. For a while after that I will have an overlap between my real job and my last contract job, but after that, I hope to finally have some time to spend with a clear conscience on my open source projects that I want to work on, especially Iron Lute which has been seriously neglected (though not quite ignored).

In Iron Lute news, in the intervening years since I last analysed the graphical toolkits for whether I can build an outliner with them, GTK has become possible to use without too much hassle. Since this is about the fifth time I've switched major ways of doing business on the GUI front, I've got the independent stuff pretty well factored out, and it's going well. I got TK to a point that was almost satisfactory, but the graphical performance was poor; the node indicators (the thing you see to the left of each node) were getting painted in a seperate pass from the text, so while scrolling or doing other things, it looked horrible. GTK doesn't have that problem, and at this point I don't anticipate any other major problems.

XBLinJS is about ready for a 0.5 release pending me getting off my duff and testing and fixing it up on IE 6, which should happen soonish, but no promises.

Here's hoping the job lasts a while.

posted at 13:27:28, 2005-04-23

# Frontier Open Sourcing

Frontier will apparently be open source in the near future. I've actually spent a significant time with the product, and I have an interesting perspective on this issue as I have a project started in no small part because of the closed nature of the Frontier code. (A project, I might add, that I really, really, really wish I could get back to, but a guy's gotta eat.)

So I ask myself, as someone who might be interested in developing the product, what does an Open Source Frontier bring to the table?

First, the licensing issue has not been worked out. One thing that can wipe out all value up front is a restrictive license, such as one that ends up effectively only allowing Frontier to be used to run Manila.root or Radio.root. You've got to be able to rip out a chunk of Frontier and use it to run your own, independent application; I'd be perfectly fine with the LGPL or the GPL which both require source code to be provided to users on request, fully Free. I'd suggest the LGPL; dual-licensing GPL/commercial is another possibility but may or may not inhibit contributions.

What does the Frontier kernel bring to the table, and what value does it have?

The "Object" Database: (Since it was named, the word "Object" has acquired a more technical definition, hence the scare quotes.) The Object database is a neat piece of code but is somewhat less useful then it used to be. Metakit in particular seems to have left the ODB in the dust. I like the interface the ODB offers, but I could create a very, very thin layer on top of Metakit to provide that functionality if I cared (especially in pure Python).

I say the value depends on the quality of the code; get ready to hear this a lot. If it's high quality, it might be extensible to handle other datatypes, correct some of the worrisome bugs that sometimes pop up, and possibly support more metadata, which might allow for some more interesting program styles. If it's low quality, I'd say, the best thing to do is probably replace it with Metakit or a similar product underneath. You can easily match the interface that the rest of Frontier expects, you'll tap into a robust and current product, and hopefully the performance wouldn't be too bad... might even increase, I don't know. (The down side is the threading issue; metakit doesn't thread well. However, if you serialize access it's OK, and I wouldn't be surprised that's exactly what Frontier does internally anyhow.)

My call: The ODB's most interesting feature today is it's interface: You just can't beat root.value = 3 for convenience. The implementation is probably defunct. However, the good news is replacing the implementation should be easy, the internal interface should be clean. If it isn't, big strike against Frontier.
The library code: Reasonably solid. Letting third parties in to extend it is probably a good idea. It has a lot of little bugs and quirks which Userland simply hasn't had the man-power to fix; for instance, Frontier's XML support continues to be dangerously quirky and needs a standards-compliance work over. Most of the bugs are little, but need a lot of testing to validate correctness, and that some fix didn't break some other aspect. (Y'all can start by looking in my Jabber library; there's some tests there that are clearly labelled as breaking in the current Frontier, and last time I tried them around last February they were still broken.) The value here is again determined by the quality of the code, and how easy it will be to fix the bugs without breaking other things.

On that topic, I don't know what testing methodologies are used in Frontier, but I strongly suspect the addition of a unit-test-based test suite is called for. This would largely alleviate the fear Dave justifiably has of breaking the library code. It also provides a path for Userland to validate that the open-sourced Frontier continues to meet their needs; if the unit tests continue to pass, the project most likely hasn't broken their code.

My call: Cleaned up and properly tested, the library code has value. What I can't tell you is whether it any longer has a niche. There's hardly a function in Frontier I can think of that doesn't have a match in stock Python. (Excepting the outline functions, which I now have a match for in Iron Lute, so for me personally at least it's still a wash ;-) ). Or stock Perl, or easily acquired Java libraries.
Development environment: But one advantage I'd say Frontier still clearly holds over Python is the development environment. The debugger is superior to any Free product I'm aware of for Python, and the windowing system is nice. I intend to treat the outliner seperately, but the environment makes good use of it.
Cross-platform-ness: It's nice that it works in Windows and Mac, but I wonder what it would take to get it into Linux. My guess, honestly, is that it would be a hell of a lot of work, even leveraging Wine. My call: Marginal, but working on Windows is surprisingly rare, so may be useful.
Outliner: Ah, something near and dear to my heart, since the closed-source nature of Frontier is a major component of what drove me to create Iron Lute in the first place.

As I've pointed out elsewhere, there are no usable outliner widgets in the open source world. As a result, a lot of my Iron Lute time has been spent hacking and kludging a workable outline widget out of Tk. One of the things that will be most interesting to me personally is how flexible and salvagable the outline widget in Frontier will be.

On the other hand... the other design objections I've had to Frontier's outlines will remain intact, and if there's one thing I've learned as I wrote Iron Lute, it's that I have zero interest in trying to replicate my current outline.py in C. So its value to me personally is not as great as it would have been a year or two ago.

How far other people will be interested in taking it is another story. Depending on the timing of the Frontier source release and subsequent community formation, and of when I get a steady enough job that I feel like I can once against dedicate my time to a hobby rather then work, the story may change. One interesting possibility is if the Frontier crowd makes it easier to reach in; perhaps Iron Lute could become an alternate interface to the Frontier Core, perhaps even eventually replacing the current one for most people. Or perhaps they'll make enough progress that I'll feel compelled to work with the Frontier project to improve the current outliner up to the specs I think a modern outliner should have. Who knows?

My call: Once again, it depends on the quality and extensibility of the current code, whether it's at its end-of-life or not, and how much interest in shown in the ideas I've put forth on my Iron Lute blog. We'll have to see.

I think that largely covers the Frontier Core as it is being released. My final conclusion: I am positive about its release, but only cautiously optimistic about its value in the current environment. If it had been open sourced several years ago, it would have much more value to attract developers. It's bringing value to the table, but also a lot of bugs and work that needs to be done. Quality of the code is paramount; if it's undocumented and opaque, I don't see it ever collecting enough developers to matter.

As for what it means to me personally w.r.t. Iron Lute: Iron Lute is in stasis anyhow, so I can afford to wait and see. However, I don't expect that it will be worthwhile for me to try to merge my code back into an existing project, code that would require a deeper integration with Python on the Frontier side before it mattered anyhow. I expect that at best, a look at the Frontier code might teach me something, but I don't expect to be terribly surprised by what I see in there. One of the reasons for starting Iron Lute may be defunct by now but I'm far enough in that it's worth just continuing on. Assuming I can ever get back to it, Iron Lute will continue to offer a lot that even a retro-fitted Frontier won't be able to match for a while.

Addendum to initial post: Looking around, there are two things I forgot to talk about in the original post.

First, Frontier does have good support for non-programmers, especially in the Macintosh world. This is a very legitimate offering and would be an excellent niche for the Frontier kernel to deliberately pursue. Scripting in the Microsoft Windows environment is also largely a wide-open field; I forgot about this one because I've been living in the Linux world for so long now, where scripting is largely a given; in fact one must choose from an embarassing profusion of languages.

Second, the technology in Frontier, as useful as it is, is at the end of its life, I believe. You can see a hint of that in my ODB description above, where the technology is obsoleted and I think the best course of action would be to replace it with something else. In fact, I think that one of the best things an Open Source Frontier could do is take the plunge, sidle up to Python, and work on becoming a Python application that also happens to run Usertalk.

Why? First, Python is a sophisticated and developing language that allows for a lot of things that UserTalk doesn't, but really should. There are a lot of features in Python that would allow for even simpler APIs for beginning scripters. Second, it opens up a whole new technology world for Frontier with minimal investment; database APIs, real GUIs, and yes, even the possibility of integrated Iron Lute ;-)... but seriously, Frontier would probably be best served by this style of integration because it would benefit from the work of a much larger community. Nobody in the Frontier world is going to tie into GTK or QT, nobody's going to integrate with Linux stuff otherwise.

Frontier's best hope for continuing relevance is to focus on what it does better then anybody else, being a powerful scripting environment for beginners, and migrating off of its internal proprietary technology as quickly as possible. Honestly, I deeply respect Frontier, because it was literally a decade ahead of its time when it first came out. However, it is now about five years behind the time, and rather then trying to "catch up" with a small community, it would be better served by trying to connect to a modern one.

posted at 04:25:36, 2004-05-18

# Iron Lute back burnered

Well, Iron Lute is getting firmly placed on the back burner, 'cause I've been laid off my job and I need to concentrate on things that will make money. I do intend to get back to it but I have no idea when that will be.

posted at 16:39:28, 2004-04-06

# XML Mapping to and from Objects

Today's post is about one of the little libraries I'm developing as part of Iron Lute. In the previous posts, I've laid out what I believe were successes; today I highlight what is so far a failure for a change of pace.

Please note that today's post is really more of an XML post then an outliner post; if you're interested in programming with XML, stay tuned. If you're only interested in outlining, you should probably move on. I've tried in the previous posts to be accessible to interested laymen, but this one may also be only useful to programmers. Consider yourselves fairly warned.

XML Marshalling

Assume that we actually know how to use Object Orientation to our advantage. Suppose we want to transfer data via XML, without restriction importing from or exporting data to XML documents.

As anyone who has worked with this knows, objects do not automatically map out to XML. There are libraries that help take XML into objects, like DOM, but you do not get to choose the objects, or if you do, you are still constrained to the XML format. In the other direction, you get things like this module for Python that will implement the "pickle" module in XML, but good luck using such language-specific marshalling in any other environment.

In OO terms, both styles of library end up causing excessive coupling, either couple the XML to the object model (to the detriment of others who would like to read the XML), or coupling the object model to the XML (which almost inevitably results in serious compromises to the power, flexibility, and usefulness of the object model).

If you want to retain flexibility, you generally need to write a seperate marshalling layer for your objects, which will manage both the translation in both directions. Then, your XML format and data structure may both be optimized for their respective uses with no compromises, but at the cost of maintaining this extra layer.

In Iron Lute, which will deal with both XML and non-XML formats, I've implemented this as a "filetype" abstraction, which is a Python module that keeps track of some metadata and some classes which are responsible for knowing how to converting the target format to and from the outline nodes in the memory format I've discussed earlier.

Quite a lot of those file types are of course XML or XML-like. I began to conciously recognize something that had been percolating in my subconcious already: In that middle layer, there is a lot of redundency. It is a peculiar kind of "paired" redundency, where the same pairs of operations are repeated, over and over again.

For one common example, it is quite typical to have something like the HTML element <ul<, which may contain <li>'s. From the XML into the Object Model, you have code that switches into some sort of "ul" mode, and reads the subsequent li's into a container object. Going the other way, you iterate over the container object and spit out each of the li's into the XML stream.

Because of this similarity, I began the process of factoring this similarity out. Of all the aspects of Iron Lute, this is by far the most experimental. Outliners have been done before; in the end the only truly novel bullet-point feature Iron Lute may end up with is that it is "quality open source Python". To my knowlege, the only other attempt at this sort of XML library is a Java library called JAXB, and truthfully I'm not certain that that library is aiming for 100% indepedence of the XML and custom objects like I am; rather it still seems to defer a lot of the structure of the data objects on to the XML format itself.

Theory: Why Is This A Good Idea?

XMLMapping (the current name of the library, which is horrid) is built on the idea that you may have a pre-existing XML format, and a pre-existing Object Model, and you may not want or even be able to bend one to suit the other.

In theoretical terms, this means that XMLMapping completely decouples the XML format from the data format. When you put it that way, I'm surprised that there hasn't been more work in this area.

The ideal I'm shooting for is to be able to blur the distinction between the XML representation and the object, with no compromises on either the XML representation or the object design. This flexibility opens a lot of doors, like being able to throw objects into a Jabber stream, or directly over a socket, with very little difference in code.

Theoretically, you get the benefits of a native OO representation and a native XML representation at the price of writing a description of a translation layer, which at least in theory should be fairly redundant between various mappings, and thus I should be able to provide significant chunks of the transform as library code that can be pieced together into what you need. (For instance, the aforementioned Container behavior is very popular, and easy to abstract into a component, even though that generalized component is moderately complex.)

Another way of looking at this is that it is a library for creating modular and extensible SAX-based XML parsers and a generalized XML output system, all at once.

Practice: The failure

In practice, there are two major problems:

Even the simplest transforms require a lot of information. It's "declarative" information, not programming per se, but it's still a lot of typing in Python.
Getting the library code correct is challenging.

The first can (and I believe should) be solved by created a "mini-language" that allows the use of non-Python syntax to concisely and efficiently express the transform. The second is only solvable by continuing to pound on the problem. ;-)

The second one has been the killer. Basically, you need to have some sort of concept of "piece of XML" that maps to "piece of an object". Perhaps surprisingly, it is the first part that has been giving me fits. A "piece of an object" can be relatively easily specified in by a function or pair of functions, so if I want to for example map the "header" member of some object to the XML, I can specify the 'header' member quite simply as:

lambda obj: obj.header

and for convenience it is trivial to define well-named helper functions that make this look ok:

def member(memberName):
    return lambda o: getattr(o, memberName)
member('header')

It is defining a "piece of XML" that has been hard. This "piece of XML" concept needs to work in two contexts:

When XML is being read in, the XMLMapping interpreter needs to process each "piece of XML" in sequence.
When XML is being generated, each mapping element outputs a "piece of XML".

The tension between these two uses is what is driving me nuts. For the second concern, the answer is obvious: Just output chunks of XML as needed, precisely as any conventional XML output system does. The downside is that that's not useful to the reader; as one small example, in isolation, </integer> is useless. To understand that chunk of XML, you need the whole thing: <integer>54</integer> still needs some context to be understood but it is certainly plausible that the Python expression 54 adequately captures the meaning of that chunk.

(I know there are some evident solutions to the problem as described above, but remember that the preceding paragraph is in some sense a transliteration of a problem in the code; an English solution to the English formulation of the problem is not helpful. All of the simple English solutions I've come up with don't translate into code, since they all require human discretion to work correctly.)

I initially started with deciding that a "tag" is the root unit of XML, but I have encountered several difficulties. First, consider the OPML format: The body has only one tag name, outline, but we almost always want to switch based on the nodetype, which is actually an attribute. So the tag name isn't sufficient, we need to match on the whole tag as well.

One can equally well imagine wanting to switch off of contents of a tag, too.

I've also encountered a wide variety of other potential issues as I try to use this library. For instance, I found myself wanting to have a "null wrapper"; an Iron Lute document is currently just a stream of nodes and links, so to bundle it into an XML document I just need to wrap them in a root element. The code as written couldn't handle a "do nothing" element; it wanted to parse and understand every single tag. Coding around this required several lines of icky, fiddly code.

One can imagine other circumstances where you might want to ignore entire chunks of a document. For instance, someday one might want to specify something by an XPath expression and parse just that chunk of XML.

I'm still optimistic I can make this work usably, and if I do I still think it will be quite impressive, and all the more satisfying for the challenge it was to create. I can see why previous attempts haven't been as ambitious though ;-)

Well, you know what they say, you learn a lot more from failure then success.

In development news, I'm getting close to at least a prototype of the Iron Lute file output format being usable. Some other touch-ups on the xml mapping code here need to be made, which fortunately will carry over directly into any future version of the library anyhow so that's not time lost. After that we get the GUI going, then just a couple more pieces of developer support before we get going that I don't want to release without and I'll have myself an alpha release.

Unfortunately, due to unavoidable life issues development may be slowing down on Iron Lute for a long period of time; my development time needs to go elsewhere. I'll work what I can in, but it's no longer my highest development priority. The new #1 priority is much less fun, but, well, life doesn't always go as we plan.

posted at 12:42:40, 2004-04-05

# "Distractions" post on Iron Lute

Dotan Dimet has some comments on Iron Lute, which I think contains some misunderstandings about Iron Lute that mostly stem from the developer-centric view of Iron Lute y'all have seen so far. I posted a comment containing some follow-up on his post, and I replicate it here for posterity and RSS readers. Note it contains links to some of the actual code which I posted to provide evidence of how hard it will be to provide bi-directional text support in Iron Lute, so if you want to see some of my actual code, now's your chance. Comment I posted follows:

"Iron Lute, on the other hand, looks like another case of someone falling in love with the technology...."
I strongly disagree. Iron Lute is being created because there is no good platform to create LEO on. LEO is, with all due respect, a hack with regard to outlines, in the sense I can't build anything interesting on top of that, other then LEO. I don't care about outlines intrinsically and I look forward to the day I can actually do things with them. If a good platform existed (in a reasonable, non-Java language) I would have used it in a heartbeat. Unfortunately, no such open platform exists.

I think you'll see that even in the initial alpha release, the user focus will be on "what you can do" with Iron Lute, and not just "the outliner" aspect of it. Remember what you're reading now is developer documentation, not user documentation. That's waiting until closer to the release.

"Pity he's making the same mistake (well, "short sighted decision") made in Leo by choosing a GUI toolkit that doesn't support BiDi."

Only Tk has had the flexibility to do what I need. Text widgets that support BiDi don't support other things I absolutely need (like "how big is this text, wrapped?") so unfortunately they are not on the table. Tk would have been nearly my last choice, but it's the only one that worked.

(At least Tk supports Unicode!)

Even the Tk widgets aren't ideal and need a lot of massaging to work right. How much massaging? Let me show you ;-) Here's the code for each node in the outliner, and here's code for the outline frame that ties them together into an outline. That's 70 KB of Python code (using a conservative *3 for C/C++ code, that's 210KB, bigger then many novels), and it's still incomplete.

The GUI is abstracted from the code so if GTK or QT ever get their act together relative to my needs it would be relatively little work to get the support in there. (I actually started some wxWidgets work on Windows but I couldn't make it work without intolerable flashing, even after I hacked on the C++ code of wxWindows itself. But it was far enough to show the separation works.)
The other problem is that I have to do all the geometry management of the nodes myself, so BiDi and other issues are going to require a lot more code then in most projects. A lot of very, very sensitive code. See that stuff in outlineframe.py, like the syncWithDocument method or moveCursorToWidget? All very touchy, all needing to be modified for BiDi. Unfortunately, and I really mean that, it's not just flipping a switch in the code, even if GTK gives me what I need :-(, it's a massive and bug-prone re-write of the GUI guts of the project.

The good news is that it should at least be possible someday.

I'm going to link to this on my 'blog, as long as I've decided to share some of the code might as well do it for everyone ;-)

Which I did.

Thanks for the commentary and feedback. I hope this doesn't sound hostile, because misunderstandings about the nature of Iron Lute are entirely my fault, until it is released and can be actually used, so I don't "blame" Dotan Dimet for anything and actually value the chance to explain myself more clearly.

A stronger focus on "what can we do with the outliner?" will become evident when I start talking about how I've abstracted saving and loading from various formats and various file-like things, and it will become more clear how Iron Lute's focus is on the user, not the technology. (It's just that you can't skip the technology step, unfortunately.) The power of loading and saving OPML(or any other supported format) to various sources, not just files, will be clear. (For example, I already support loading and saving OPML to and from Manila sites, just to make sure the support was working and useful.)

Minor update: It occurs to me that it is worth mentioning why I tried wxWidgets on Windows. It turns out that only Windows has a rich text widget that you can ask, "Given the width you currently have and the formatted text you currently have, how much vertical screen space do you need to completely display the text?" This is very useful in many circumstances. Unfortunately, the rich text widgets had too many other flaws for what I needed to be useful, including but not limited to excessive flashing that I couldn't stop (even after trying every permutation of "double buffering" flags the widget supports), fears of massive incompatibilities due to there being at least 4 distinct versions of the Rich Text widget with no way for me to ship the one I need, and inability to control or even detect certain keypresses that are built-in to the widget (for example, I couldn't detect a "CTRL-+", which increases the size of every font in the widget, which in the outliner would require re-sizing the text widget itself; I'd either want to catch that event or prevent it, but I couldn't do either). Any one of them was a stopper; together they led me to believe Iron Lute isn't ever going to be feasible in Windows without writing a complete text widget from scratch, which isn't worth it to me. (Note that Tk is a complete text widget from scratch, it doesn't use Windows code to draw except for font rendering, and I've had perfect luck so far programming in Linux and having it work perfectly in Windows. Windows even looks better since it gets anti-aliasing the Linux Tk does not.)

As far as I can see, neither the Windows rich text editor, nor the Macintosh rich text widget, nor QT, GTK, FLTK, wxWidgets, or the Fox toolkit can do what I need them to do. If Tk couldn't do it, there'd be no Iron Lute (or perhaps I would have dived into console-only, but that would not be fun). I've analysed each of them for what I need, even the very latest of QT and GTK, and they are all lacking in almost the exact same ways: I can't talk about formatted text in the widgets, I can't (usually) get the screen coordinates of the cursor, and there's no way to klude what I need.

(I need those things because I absolutely insist that when you press the "up" key on the top line of a given node, that you go to the bottom line of the next node up, and that you go to the roughly correct column in that widget... just like every other professional outliner I've ever seen. Anything less will not be usable. Without the ability to get the screen coordinates of the cursor, and set the cursor to other screen coordinates, this feature is impossible. And laying the nodes out on screen without being able to ask the toolkit "how big is this text" is really hard; I ended up re-implementing the Tk layout algorithm in Python, or at least something close to it, but if it changes or if I'm wrong, wierd and bad things happen on the screen. I'd rather not have to match it...)

posted at 11:44:48, 2004-03-24

# I have a case of the multi-disciplinary writer's block.

I'm having a hard time writing the outline saving code in Iron Lute. I have an essay I want to post here, but it is obstinately refusing to go into focus. (I may yet just post it in a nebulous state, as I'm not convinced it's ever going to focus, by its very nature, but I think it's important that I write it.) I also have a technical post on the inner workings of Iron Lute which I'm having a hard time writing; I've mostly written it, but it's quite dull.

Yes, that's right, even more dull then the existing postings! I figure someone ought to be interested in those because I would be, but I wouldn't want to read this posting as written.

Going backwards, I know why the Iron Lute essay is dull, though I am surprised. For a change of pace, I was going to show a failure I've experienced, to contrast the successes I've mentioned to date. I've been able to explain why it should have been a success, but explaining why it is a failure has proven difficult. (Perhaps this is because the only real failure is that it is too difficult to use, not that it "doesn't work", and it's hard to show in prose why that is. You really have to try to use it yourself to understand why it is hard to use.) I am intrigued by how this is almost exactly the opposite of conventional journalism, where disaster stories write themselves (assisted with a heavy helping of cliches and standard idioms) but good news is hard to write.

I haven't been able to work on the code out of lack of motivation. I'm sure it will come back, it always does, but that's one of the major downsides of Free Software; it gets released when I'm good and ready and if that means I want to take a month off to play video games, well, that's what I'm gonna do. ;-) (OK, it's only been the last week, but it has been quite relaxing and rejuvanating. I'm sure the code-monkey will get back on my back here soon.)

Finally, the essay has been just plain, ole' fashioned hard to write. It's about negativity in weblogs and journalism, and how it is affecting me, and I believe others, emotionally, but I've had a hard time keeping a strong thesis in it, mostly because the emotional effects are quite diverse. Hopefully it will come together, because I feel quite strongly we need to start critically exploring the new media world we are building to see how we fit in as humans, and indeed, whether we fit in as humans.

Here's hoping at least one of the blocks breaks soon.

posted at 10:19:28, 2004-03-23

# Iron Lute progress update

Just a quick note: I'm still working on the XML save format for Iron Lute outlines. I'm trying to use a library I've put together for XML serialization and it's not going so well right now. I think it still has potential but I may need to re-work it into a "version 2", because version 1 is sucking pretty badly.

posted at 16:32:48, 2004-03-15

# Outlines, Part 6

In my previous post, I discussed the practical matter of how to hold the outline structure we've built so far together. Having created a strong base, it is now fruitful to consider how to extend the data structure to handle the wide variety of outline structures I want Iron Lute to be able to manipulate.

Node Types

One of the most interesting possibilities inherent in this structure is to formally recognize that there are a lot of potential different types of nodes that can be built.

Every outliner I've seen has at least some informal idea that there are different types of nodes; in pre-Node Type Frontier, the menus were implemented as extensions of outlines, in many other outliners they use attributes to try to declare some limited idea of node type (like "isComment='1'" and such things). But formal recognition of the fact that we want lots of different types of nodes allows us to formally support them, define them, and allow paths to extend them.

As an existing example, implemented in Radio Userland and Iron Lute, is the OPML Transclusion node. In Radio Userland, it is expressed as a node type of "link", in Iron Lute, I have an "OPMLTranscluder" (to seperate the OPML-specific semantics from other potential transclusion types, and because OPML is just one particular format to Iron Lute, it's not built on it).

What does that mean? It means that the node can bundle certain actions along with it, and carry data with it. In other words, it almost exactly parallels the "class" idea in OO, so in Iron Lute we actually build it on classes.

An OPMLTranscluder node knows to load a remote OPML file if it is opened. Many other type of nodes can be imagined, each carrying their own verbs with them whereever they may go. A lot of things can be expressed as Node Types that you might not initially realize.

For instance, I envision a web site based on outlines, instead of files and database entries. (Kinda like Channel Z, but really deeply outline oriented; I've recently been impressed with how HTML::Mason works in Perl and I think it'll end up looking like that with "outlines" in place of "components".) "Comment Entry Field" can itself be a particular node. Expanding it might let you view the comments made to date. Each comment would itself be a Comment type node, and might know who made the comment. The comment would carry commands with it, so you might be able to right-click and select "Ban User/IP".

Removing a comment would be as easy as just deleting it. You could edit/censor the comment if you were the site owner, re-order the comments, or do any of the other outline manipulations and it would all work. If you decided you didn't want any comments at all, you could delete the entire Comment node from the posting, and all the comments, plus the ability to add new ones, would disappear. So you could easily allow comments on a post-by-post basis.

Moreover, it's just a node. You could allow multiple distinct comment fields per posting, or move the comment field to another post, move comments from post to post, have the exact same comment field show up in multiple places (different pages of one article), etc. I see a very flexible and interactive web site developing from this, with a different content philosophy then modern web sites.

The website itself will be viewed as HTML by running the engine part of Iron Lute on the web server and rendering things as HTML. Of course, templates and such can already be represented as outlines; that's proven technology.

This is just one small instance of how much fun it might be to have such a clear, consistent, and powerful interface to data available to us. Replicating any of this in current architectures is of course possible, but with outlines and an outliner, the capabilities are clean, cheap, and extend well.

There's also some internal uses; I've implemented the menus as outlines (though as of this writing they don't cascade, that's only because I haven't needed it yet), and later internal configuration will be available in an outline form, with nodes holding preferences.

Node types are, as I've mentioned earlier, the fundamental Good Idea that drove me to write Iron Lute. I'm still excited by the prospect of having this stuff to play with; I only wish I was already done with the "hard work" to get here.

I don't have access to all of the closed-source outliners so I'd like to explicitly disclaim the possibility that they may have a wonderfully clear abstraction, since many of them do seem to have a relatively rich set of things that look a lot like Node Types. Without a developer API that can tap into that power, though, who cares?

Document Types

Every document can have a document type, which can also define behaviors and commands for every node it contains. For instance, in the web system I allude to above, whenever you are editing a document, you'll have a command in the Document menu to insert a new comment area. (You can see the Document menu in screen shot, dimmed out because I haven't come up with any reasonable "document commands" for the default, neutral document that aren't really generally useful commands that belong in the main menus.)

Another useful aspect in the general case is "embedded document" or "composite document". Imagine editing a Perl program, and having an HTML "here" document in it. Iron Lute will be able to provide full HTML support inside of the "here" document while still providing full Perl support in the normal document, with no "hacks" necessary. This comes up surprisingly often, and the ability to do it cleanly will bring forth even more applications when one document is "embedded" in another.

One thing I'm planning on is to segregate the keyboard commands based on whether they are "node" or "document" commands; I'm planning on reserving "ALT" for document commands, whereas I want to leave CTRL (w/ or w/o SHIFT) for node commands. (The user can re-configure this away if they like but I wouldn't recommend it.)

More Structured Node Text

Most outliners have some capability of supporting at least formatted text. I want to take the next step, and support smarter formatting.

Radio Userland already supports at least one special case of this; IIRC correctly, you can click an HTML link and follow it. Most, if not all, outliners allow formatted text. I want to generalize this, so that you can declare "tags" and decorate text with them (just as you decorate text with "bold" and "italic"), and allow the text itself to carry any useful commands, formatting, intelligence, etc. that it needs to.

Of course you can build the standard "bold" and "italic" on top of this, but you would then easily be able to build other things on top of it too, like an "HTML link". An "HTML Link" is a tag that turns the tagged text blue, underlines it, and when right-clicked, presents (at least) two choices: Edit the link destination, or open a browser on that link. In some ways it's easier to define the generic capability then build the simple things on top of it.

The spell-checker will tap into this; a spellchecker watches the words in a node, and marks misspelled words with a tag that changes the formatting and when selected, gives the usual correction options.

As of this writing, I'm not entirely certain how this is going to be done, but I am fairly certain you will be able to define your own tag types and provide them with context-specific commands in a clean manner.

Synthesis

All added up, this provides a rich, robust system for representing a wide variety of outline type data structures, documents and otherwise.

It's not a panacea; perhaps in some other posting I'll talk about the indicators that an outliner is not an appropriate interface. For now, that's only written up in the developer's manual, and still a work very much in progress. It also depends on some of the precise aspects of the outline structure that I have not discussed, such as its interaction with the Commands the user runs on the outline, and the hitherto-unmentioned Undo/Redo system, so I can't meaningfully describe all of the indicators right now.

It's worth reviewing for a moment what I've done. Starting from a monolithic idea of what a node is, I broke the node up into:

A structural piece (original node class)
how the structural nodes are connected (links)
a 'projected' outline on top of a graph (handles)
Where the node gets its data from (data sources)
what kind of node/document it is (node types)
what document the node belongs to

that's six distinct "aspects" of a node that I can manipulate, or even replace, independently of the others. (This is why writing outline code in other outliners feel so constraining to me; things natural under this model are complex, error-prone, or even impossible under the "monolithic node" model.) Despite that, it's only marginally more difficult to manipulate, and often easier because it asks you to jump through fewer hoops.

For reference, the final Python code that implements this structure is significantly shorter then just this discussion. Approx. 50 kilobytes of code sustained about 90kilobytes of discussion; and this discussion was just the high-level discussion, and the 50 kilobyte count includes a lot of documentation. While not all code is written with this level of deliberation, a lot of it is.

For you non-developers who may have slogged through this, I hope this opens your eyes to the amount of work represented in every program you use, including huge quantities of code you aren't even aware of at the OS layer and other layers you can't see. And even in the case of Iron Lute, which remains a relatively small, unreleased program, this is only a fraction of the total code in that program; as of this writing there are 700 kilobytes of Python code, including testing, documentation, and at least one dead branch of code. (While this is a very carefully considered and designed part, there are at least four or five other things in the code worthy of equal care; some I may yet write about here, some I probably won't.)

This should be the last bit exclusively regarding outlines, though of course further discussion will all be predicated on the outline model for obvious reasons.

Last night, I 'finished' the part of Iron Lute that reconstructs outlines based on the pieces as described in this post; finished gets scare quotes because it needs more tests written for it, which it gets tonight. However, it already correctly handles the case of A->B, which pretty much is the fundamental thing it has to get right; everything else (except maybe A->A) is built on that. After that I need to write the XML converter for the parts, and start testing the dickens out of it, then I can swing back to the GUI.

After more thought, I'm strongly considering releasing an alpha version at that point, even though I'm not "done" yet, for testing and feedback. There are several large ideas still left to go in, but in the long run it might be better to have it out there, getting used and getting feedback, then to get all the ideas in there at first. My primary issue is I don't want developers to get too attached to the current way of doing things which may change later, but this sort of project generally takes time to come up to speed, if it indeed ever does, so perhaps that's a phantom concern.

posted at 11:23:28, 2004-03-08

# On Holding Together This Structure

In my previous posts, I've answered the question "What Is An Outline?" from the point of view of Iron Lute. The resulting data structures are somewhat complicated. These data structures are at the heart of Iron Lute; if they fail, the entire program can come crashing down. Moreover, if nobody is capable of correctly using the data structures, they are still useless. It's worth discussing

Theory to Practice: How Do We Hold This Together?

especially in light of the fact I'd like to see this become a self-sustaining open source project or potentially a commercially viable product; if I'm the only person who can write or maintain Iron Lute code then that's a hopeless endeavor. The answer to that comes in many parts.

Easy Interfaces For Easy Tasks

For all the complexity in the data structure itself, the vast majority of operations can be specified very well as a small local operation involving one, two, or at most three nodes. (An example of the latter is a "swap nodes" functionality, which involves the two nodes to be swapped and the parent; an example of this in use is the "Move Node Up" command.) And most of those operations do not intrinsically care whether or not the graph is a pure tree-based outline, a graph, or anything else.

For instance, adding a node, a common operation, doesn't mean you have to create a node, create a link, and connect the link manually. Assuming A is a node, it suffices to say

A.addNewChild()

By default, that method adds a new BasicNode (the simplest type) to the end of the children of A; various parameters allow one to override pretty much every clause in that sentence as desired. The "addNewChild" method handles all that.

In most common cases of traditional outline algorithms, the best way to use the outline is to grab a handle and do all access to the outline via that handle. The handle itself passes through node operations to the node that is the destination of the handled link, which means that one can often simply write algorithms in terms of node handles and have work without futzing around with anything else in the data structure. For instance, a simple find-and-replace algorithm to replace "Hello" with "Goodbye" in all nodes reachable from the current node, assuming handleA is a handle for node A, can be implemented as:

for handle in handleA.depthFirst():
    data = handle.getData()
    data = data.replace("Hello", "Goodbye")
    handle.setData(data)

Note the programmer doesn't need to concern themselves with the details of the data structure, including even the issues involved with handling true graphs, rather then outlines. (Depth-first of course refers to the depth-first traversal as also discussed in Storing my Outlines.)

Despite the complexity of the underlying data representation, the vast majority of operations a programmer may wish to perform are very simple; only a very few cases, mostly extending Iron Lute's commands in new ways, require mucking with the internals directly. Even more of those cases can and should end up being refactored into core capabilities, so that ideally nothing except the internals of Iron Lute ever need to much with those things.

The only moderately complicated thing is that you can not be sure if a given node will exist when you reach for it, if it's a remote node, but generally, this only occurs explicitly and people can explicitly plan for it. The Iron Lute core needs to be a bit more robust to handle these cases in general, but generally, people extending the core will not need to deal with that issue unless they deliberately choose to create a design where they might need to.

Thus, programming in Iron Lute is about as easy as it can be, and I do not think this will be a major concern. Keeping it this simple in principle, even when multiple people or processes are working on an outline at one time will be challenging but I am still optimistic it can be done.

Getting It Right, And Keeping It That Way

If the outline data structure has a flaw, the whole of Iron Lute has a flaw. Yet the number of possible interactions in the data structure is fairly large. How can I be sure I've caught the majority of the importent bugs?

The answer is Unit Testing, or more generally, a truly systematic testing framework, built in from Day One, that is specifically meant to cover all aspects of how outline nodes should behave in the system. For this outline data structure, I've written almost as much testing code as I have data structure code (with comments and documentation removed from both), and as bugs are found, that ratio will tip in favor of testing.

It sounds simple but it is not generally practiced, often because the benefits of pervasive, fine-grained testing is not understood in general. For instance, there are by my count 16 things you can do to a Link, just regarding the settings of the source and destination: For a link with 4 possible settings for whether it has a source or destination (no/no, yes/no, no/yes, yes/yes), you can either set a (new) destination, unset the destination, set a (new) source, or unset the source. By writing a test for those scenarios (which only required 4 loops in the testing code; there are some natural categories those break down into), I found a bug that only manifests itself in the yes/yes situation when resetting the destination. It failed to fire an event it should have, which itself is a subtle issue that would have taken a while to resolve, but is exactly the sort of thing that contributes generally to instability in the final product, when a couple hundred of these kinds of tiny bugs add up.

Even more interesting, my count was wrong. Creating a link and setting it, then unsetting the destination and unsetting the source is a distinct act vs. unsetting the source then the destination, as evidenced by the fact that I had a bug that only manifested itself when deleting an existing link by unsetting the destination and then the source. I found this bug not in the outline unit tests, but in the unit test that examined the Delete Node command. Without the unit tests, it would have been much harder to track that bug down, or even detect it, since as it happens the node still correctly deletes, it just leaves the link in an inconsistent state.

But the mere testing of correctness is not the only, or even the most interesting, benefit. As I mentioned at the outset, I've actually had various versions of Iron Lute already running with each of these four major versions of "outlines" at some point. Taking even a small program like Iron Lute through that kind of intensive refactoring tends to tear the program apart. It can be demoralizing to see your program reduced to a seething mass of unidentifiable bugs with no road plan on how to identify or fix them, except by groveling over the code by hand. For a hobby project, that means either death, or the sub-optimal choice of dodging the issues in the first place by leaving the poor design decisions in place.

The ability to test has helped hold the program together, even as the interfaces between the parts evolve. The basic commands, such as "Indent", have changed in implementation several times now, but since I have a unit test written for it, I can easily check whether it's still working with a single command. (In fact, the Unit Test for the Indent GUI command hasn't changed since I refactored the "indent" operation into the LinkHandle class around version 2, even as the "indent" method has undergone at least three major changes.) In a GUI-based program like Iron Lute, it's important to take the time to implement such things because even if you had the discipline to create a check-list to run through, it would still quickly fall into disuse, since a complete checking of the GUI functionality of the product would quickly grow to take hours, or even days. (The latter is not merely hyperbole; a full testing would need to test every node with every command in several circumstances in every time

Unit tests are almost more useful in their capacity to hold developing programs together, and thereby facilitate such development, then they are as correctness testers.

It boggles my mind how many people build large systems without this sort of testing, even though I was one of them just a couple of years ago. Having used them, I loath being forced to give them up at work. (I intend to bring them in someday but right now we are in the middle of a transition from an old system to a much newer version of it, so there's little point in writing the tests against a very different interface.) They actually accelerate development because of the power they give you to tear into "sacred" parts of the program and clear them up, secure in the knowlege that you stand a tolerable chance of not breaking anything. Experienced but non-test-using programmers tend to develop an instictive fear of touching any part of a program that is critical and embedded. Unit-testing support largely removes that fear. While there is still a degree of risk when refactoring code, the resulting gains from cleaning up the crufty parts of the code, and the subsequent more robust code that can then be extended even more, more then offsets that risk and the minimal cost of writing the testing code.

(After all, you will write testing code, and you will need to debug your code. Do you keep your testing code around in a systemized fasion, and use them to direct your debugging, or do you toss quick, unprincipled little test cases at your code, lean on debugging to "fix" the problems, and assume at the end on quite incomplete evidence that the code is "correct" without the ability to run those tests again?)

If you are a developer, this was my attempt to get you Test Infected. Please do not resist. Join us. Do not be afraid. It is so much happier over here...

Strong Code

I can't guarentee the code is bug free, even now. I can't guarentee that it won't need some even more fundamental extension later (though I'm pretty confident now the basic structure is correct). But I am confident that it handles a lot, handles it correctly, and is about as easy to use as you could ask for.

It isn't necessarily perfect code, but it's very strong code, and I'm confident it's strong enough to build Iron Lute on top of. Where it is not up to the task, I'm confident it can be improved on.

The next posting will explore the extensions of this data structure that will make Iron Lute the richest extensible outlining program around.

In other news, I continue to work on the serialization of the outlines. Work has been stressful so I've been doing more relaxing at home then coding. It's looking better so I hope to get more coding in.

posted at 10:28:00, 2004-03-02

# Storing My Outlines

One significant advantage a more conventional outline has over the outline structure I've built up here is that it is much easier to store the traditional outline in a file. Using XML, traditional outlines are almost trivial to store:

<node text="A">
  <node text="B" />
  <node text="C" attribute="D" />
  </node>

Even if you don't read XML, you can quickly learn to read this format. About the only difference between what I just wrote and the official OPML specification is that OPML adds a <head> section that includes some metadata about the outline, and uses "outline" instead of "node".

Right now I'm working on how I'm storing my outlines "natively" in Iron Lute. First of all, using that simple style is not an option, because it doesn't allow for anything but a strict tree. There are two basic other choices that I can see. Both require the addition of node ids, since both will require the ability to refer to previous nodes (and possibly nodes that come later in the file):

<node text="A" id="1">
  <node text="B" id="2" />
  <node text="C" attribute="D" id="CWQ!" />
  </node>

IDs

On the subject of IDs, for the IDs, I face two choices: Do I want them to be globally unique, or just unique within a given file?

Globally unique IDs have some advantages in that they make excellent link targets and having a GUID can preserve the identity of a node, even if you move it from one document to another. But I faced an issue with ensuring that the GUIDs would be truly globally unique, and I couldn't come up with a solution I found satisfactory. (I'm not overly concerned about speed, but all the attempts I tried really slowed Iron Lute down, even with the trivial outline sizes I'm currently dealing in.)

For the purposes I need, it suffices that IDs are unique within a file/stream, and that's much easier to guarentee. I only use these ids in the context of the file itself, so they don't even have to stay constant. Other 'anchors' will be created if we want to link into an outline later.

The Two Basic Choices

Now, let us say that I am representing an outline that looks like A->B->A. The two basic choices I see for representing that outline look like this:

<node text="A" id="1">
  <node text="B" id="2">
    <noderef targetid="1" />
    </node>
  </node>

<node text="A" id="1" root="true" />
<node text="B" id="2" />
<link from="1" to="2" />
<link from="2" to="1" />

The first one preserves the current natural representation of outlines, and uses some sort of placeholder element for nodes that already exist in the file and are re-appearing later. The second one gives up the natural hierarchy in XML and just lists the nodes, the links, and how they correspond; re-assembling them is still fairly easy, though slightly less intuitive.

The first example has the benefit of being more human readable... but since I anticipate supporting a lot of formats natively in Iron Lute, and already natively support OPML, for instance, I'm actually not so concerned about "human readability" for the "native" format of Iron Lute. Iron Lute's native format needs to be powerful enough to handle everything anybody can throw at it, and human readability just isn't in the cards for that.

The second format also has an intriguing possibility to it. Right now, I'm assuming that the order matters, so when the first <link> element comes along, the code that is re-assembling the outline(/graph) structure just "knows" that that is the first child of the node with id="1". If we were willing to specify the order explicitly, we get an interesting property:

<node text="A" id="1" root="true" />
<node text="B" id="2" />
<link from="1" to="2" child="0" />
<link from="2" to="1" child="0" />

What's interesting about this list is that the elements can be arbitrarily re-ordered, and the final assembled product (assuming a correctly written assembler) is the same. The above is exactly the same as

<link from="1" to="2" child="0" />
<node text="B" id="2" />
<link from="2" to="1" child="0" />
<node text="A" id="1" root="true" />

and is exactly the same as all the other 22 orderings of those four things.

(Actually, in the final Iron Lute file format, the nodes will carry a list of link ids in them, rather then the link specifying what child it is. The advantage is that once you have a node, you can know how many children it has, and by extension, whether it has children at all, but sticking it in the link has only the advantage of looking nicer. Showing the links in the node requires something like <links><link id="4"/><link id="5"/></links>. Since this is a discussion, not code, that's why I show the child indicators in the link right now.)

This intrinsically only has the "advantage" of making the re-assembly process a bit more complicated, since in addition to understanding backward references (references to already-existing nodes), you must now understand forward references (references to nodes or links that haven't arrived yet), but that is a relatively minor point in general, especially as this is code that will not be modified by many people.

What is really interesting is the secondary effects that you can get from playing with the order an outline is saved in. Let us imagine some relatively complicated outline with a few hundred nodes, but the last child of the root node has no children. Now, imagine what happens as the outliner reads in that outline, specified in format #1 above. The last child of the root node is the last node to be seen.

What the implies is that the outliner can never be sure that it has all the children of even the root node until the entire file is loaded. Therefore, the program really can't display the outline until the entire outline has been loaded. In theoretical terms, this is because the outline is stored by using a depth-first traversal. Re-ordering the file is not an option since the order in the file directly reflects the order in the outline.

But what if we're not tied to that order? What if I use the second style of file, and choose the order the nodes are dumped out to use breadth-first traversal? Then, we can know whether or not we've seen all the children of a given node. We could then go ahead and display the outline to the user, and allow it to keep loading in the background.

Breadth-first loading here has the advantage that unless the user immediately dives deeply into the outline, or immediately issues an "Expand All" command, we can most likely allow the user to immediately begin using the outline, even if only a fraction of it is loaded. (Iron Lute already weakens "Expand All" anyhow, because of the possibility of looping paths in the outline.)

This appeals to me because the most important thing about a program is not its raw speed, but its responsiveness to human demands. Even if I take ten times longer to read in an outline then some other program (which is not impossible, since they may be written in C and Python, whatever other virtues it has, is not the fastest language), if I display the outline and allow the user to use it ten times faster, my program looks and feels faster, and will make the user happier. This isn't an issue for most normal-sized outlines stored on disk, but if we want to pull outlines from a remote source over a slow link, this can become a huge advantage.

In the ultimate case, I visualize someday having full Co-Outlining implemented in Iron Lute, with Iron Lute connected to another instance of Iron Lute. Even at dial-up speeds, connecting to another Iron Lute outline would allow you to start manipulating the outline within seconds of the request to connect, even if the outline you've connected to is many megabytes in size (as long as it doesn't have too many top-level nodes, which if you want dial-up users to use your outline you will avoid). If you do open something that isn't loaded yet, your Iron Lute could send a request to the remote Iron Lute to start loading those nodes first, minimizing the time it takes to manipulate those nodes.

Another interesting but subtle property of this file format is that it turns the file format into a series of commands. In the above example, it is better to think of the <link> and <node> elements as commands to "add a link" or "add a node", rather then trying to puzzle them out as some kind of structure. Once you start thinking of it that way, you can later in the file add a new node with the same id, and interpret that as "replace the old node with this one", or you can add simple delete commands to remove old nodes or links.

Why is this interesting? For one thing, I intend to use this as Iron Lute's emergency backup technique; Iron Lute can always append to the file rather then re-write it from scratch, so you won't get that massive pause every time the program does an automatic backup, like I've seen in Microsoft Word. The backup time is proportional to the changes you made since the last save, not the size of the document itself. (Every once in a while, Iron Lute might save it from scratch to avoid filling the disk, but a human would not be able to generate too many events so in the common case it would be a long time before the user used a significant proportion of the disk.)

Another thing I'm not sure about yet, but... I'm thinking that the co-outlining will be implementable as exactly the same stream as the file storage system uses; as long as Iron Lute understands "stream still in progress" and has some system for generating stream updates, it should work. A central participant in the co-outlining will be responsible for collating the incoming events, and for resolving resulting conflicts (two people modifying the same node at roughly the same time, one person modifying a node the other has deleted, etc.), and I think that's nearly all that will be necessary for a first-cut, but functional, outline sharing. This is one design goal I haven't talked about but it is affecting my decisions; I don't want to lock myself out of this possibility inadvertantly.

My current plans for Iron Lute at the moment are that I want to get this file saving working correctly, then I want to swing back to the GUI and make it usable as at least a simple outliner. I would then have a project that could theoretically be released at any time, and the decision about when to release it would be driven by whether or not it did anything useful yet, and how much I would mind having others build on top of it yet. It is my guestimate that completing the file saving/loading and GUI work I want to do will be at least three months, so we'll see where to go from there.

posted at 20:33:52, 2004-02-25

# I just had pretty much the worst week ever at work, which accounts for my not posting anything this week. How bad? I just finally noticed today that I didn't post my Monday Iron Lute post.

I can only hope this week is better. Unfortunately, it doesn't show any particular signs of improving, so posts may be slowed up this week too.

This week instead of continuing on with my outline series of posts, I think I may jump ahead and discuss the file format for my outline model, which is what I'm working on right now. Designing a file format to hold the data hasn't been too hard, but it's been a bit of a brain bender trying to figure out the best way to implement it. There are a few interesting things you can do with an outline structure you can't do with a conventional flat file, and I want to make sure that those things are possible. What day this will get posted, I don't know, since I still have to write it.

posted at 13:16:32, 2004-02-22

# What is an Outline, Part 4

In my last post, I dissociated the concept of "outlineness" from a graph, and showed at least the skeleton of a data structure that allows the power of graphs while preserving the nature of an outline.

In this post, we will fix a flaw in the model built up to that point, which is that there is no way to obtain a list of parents, given a node, only a list of children. For various reasons, this is necessary to building an outliner, so this flaw must be fixed.

Terminology note: A "link" connects one node to another. In a graph, it's a "edge", but in my mind, it behaves more like a link, so that's the name it gets. Right now, we can only follow links from parents to children; we want to be able to follow them back, too.

Again the consequences are subtle, so I'll spread the Pros and Cons of various decisions out with the discussion. First:

Issue: How we will enable the ability to follow parent links of nodes while sacrificing as little as possible?

Once again, we will start out with...

The Obvious Solution: Record Incoming Links In Nodes

The obvious solution is to simply add a list of nodes that have incoming links to a given node, and update it as it changes.

One obvious Pro is that it is simple. And for once, the obvious simple solution has the virtue of not being completely wrong. However, it does have two Cons that are worth thinking about to see if we can do something about them:

Con: Suppose you have a node that has the same node twice as a child. An A node with two children, both the same B. What does in the incoming list look like? Well, if you have A's child list look like this: [B, B], and B's incoming link list look like this: [A, A], you have a problem: You can't disambiguate which link is which, since both are A->B. This causes some uncomfortable problems in code which tends to assume that they are distinct, such as link removal code. (For instance, if I have a child list of [B, C, B], and I am instructed to remove the link to B, which one is it? It matters.)
One viable solution is simply to ban this, since it's hard to think of a solution where this could be a good thing. It is often the case that these sort of edge cases are best handled by banning them. But let's consider the whole problem before committing to a solution...
Con: Done naively, this now means that whenever somebody's child list is manipulated, they must also manipulate the parent list for the appropriate node. Since this is going to be the same sort of thing every time we do it, it'll be best to centralize it.

Technical Python note: There is a third problem, which is that strong bi-directional links weaken the garbage collector; it can handle it but it's not as good as ref-count garbage collection. We'd prefer to do away with the strong child->parent references, but forcing everyone to maintain the weak links themselves is even more complexity (and potential errors by programmers who really shouldn't need to understand weak links to correctly use this data structure, this should be pushed under the hood).

We need to abstract this handling into some kind of easy functionality, preferably transparently integrated as appropriate (so ideally, nobody has to deal with this other then me). We could just create some functions, but we actually do benefit from creating a Link class, because it turns out to be really nice to refer to instances of the Link class ("links").

"Links" As A First-Order Concept

A "Link" can be though of as a staple, where the nodes are little pieces of paper. Each link can be attached to two nodes, a "source" and a "destination", where the source is the parent and the destination is the child. (It is not technically correct to refer to a Link's "parent" or "child", because a link does not have either of those things. It has a source and a destination, so that's the terms I use. A link defines a parent-child relationship, it does not have one itself.)

We can actually instantiate a link without knowing either the link or the destination right away. You can envision that as a staple that hasn't been used yet, so both ends are unbent. As we connect it to nodes, we wrap the ends around the nodes. Of course, since this is a software staple, we can also "unbend" the links and reconnect them to different nodes without penalty. In my implementation, you can accomplish this with code like the following. If A and B are nodes, that we're going to connect like A->B, you can do the following:

l = Link()
l.source = A
l.dest = B

(As a shortcut, you could also do l = Link(A, B), but I wanted to emphasize how you could manipulate the ends of the link indepedently.)

As a practical matter, the link only actually connects when you give it both a valid source and destination, so technically, in the code example, nothing happens to either A or B until the last line executes. It will also disconnect from both nodes immediately if you invalidate one of the ends. (In Python terms, you rebind either source or dest to None, although source and dest are actually properties so that's not quite right terminology either... you can del them too if you prefer.)

We bundle into the Link class the intelligence necessary to update the child and parent lists of the node appropriately. Instead of links to the nodes, we actually store the relevant link objects. Of course, we still present easy access to the programmer by following the link as appropriate, so if they write code with A[0], they still get B without having to manually write A.data.outgoing[0].dest. (data is the data source for A; remember that? outgoing is the list of outgoing links. [0] gets the actual link, and finally, dest returns the actual node the link points to.)

This is a minor OO violation, in that there are data structures in the Node class (the incoming list of links) and the DataSource class (the children list) that the Link class is responsible for handling. Factoring this out isn't worth the pain right now, although it will be later; in the current-as-of-this-writing version of this datastructure, there actually is a special IncomingLink data structure that mediates between the link and the node for the incoming links.

While I have no use for it at the moment, we can actually now load other data onto the link as appropriate. Generally, in documents, this is not useful, but there are some data structures that this may be useful with. We're not paying anything for this, or letting it affect the design, so it's not a YAGNI violation.

I think there may be later uses for this link as well, but as I've not thought through this part of the design yet I can't say much about it. Luckily, it's a cleanly-seperated part of the data model, so I can replace how links work later and it will have minimal (ideally, no) impact on the rest of Iron Lute.

(For instance, I visualize a web template system that tracks links made to other internal documents and thus tracks which parts of which documents are dependent on others, re-rendering things only when necessary. If we replaced all Links in such documents with Links that also reported to the central repository when they link to external documents, then we could easily implement this repository with very little code, and the Iron Lute GUI would need no changes at all.)

Interconnecting Outlines

Up to this point, I have gone quite far in resolving the "What Is An Outline?" question in the conventional case. It's an outline tree structure overlaid on top of a graph structure. In a conventional outliner, the underlying data structure is never allowed to be anything but a tree, which makes it even easier. If we only opened one document at a time and never allowed them to interlink directly, we'd effectively be done now.

But earlier, I mentioned that one of fundamental design goals is to have multiple documents truly interacting in the outliner. When I'm in one document and I choose "Save", I need to save only that document, not all the nodes I have in memory.

Terminology and Assumptions

For our representational convenience, let us define a node T that is a "transcluder"; it has a data source that loads an outline from the web. The document it loads will use numbers for the nodes, as in: 1->2->3, which allows us to easily distinguish those other nodes. The number document may also contain a #T, which transcludes back to the original document.

So imagine that we have two documents, A->B->T and 1->2->#T. When we load the first, we only display A. When we expand B and then T, we then load the other document. Finally, when we expand #T, we get the original document transcluded, so the display now shows A->B->T->1->2->#T->A. This has a cycle if we have both a T and a #T node.

For the moment, let's suppose we have write access to both the documents and that Iron Lute knows how to save to even the remote document, so we have total freedom to manipulate both documents. (How this is accomplished cleanly internally is the subject of a later posting.)

What is an Outline Document?

It's time to ask ourselves what it means to have an "outline document", with the emphasis placed on the word "document". Why do we care about what constitutes a "document"? Well, abstractly, we're just dealing with a seething mass of nodes and links, existing with no particular relation to each other beyond their links. This is the traditional conception of a 'graph'. However, this program exists in the real world, where users like to do things like hit the "Save" button and close the program, expecting that their data has been saved.

In a strict tree outline, it's easy to declare the root node of the tree as corresponding to the "document". Saving a representation of that particular node, along with all of its children, will result in the entire document being saved; starting from any other node is guarenteed to miss something.

For our purposes, the first attribute is worth saving, while the second attribute is to be discarded. There must be a path to the root node, but it may not be unique and it may require careful choice of which parent links are followed. We can specify that from a "root node" of a document, all children that we consider to "belong to" that document should be reachable by following the children. In a full graph, therefore, there may be multiple nodes that are a candidate for being the "document".

We're going to use what document a node is in all over the place in Iron Lute. For instance, if the node is part of an HTML document, we will present the user with HTML-based commands for manipulating the text, whereas if we're in a LaTeX document, we'll present LaTeX-based commands. Given a node, we need to be able to determine a unique document that that node belongs to for these purposes. Also, for cases like OPML transclusion, we need to be able to distinguish between a "link node" and its "transcluded document"; since the link node will belong to the original document, but the children will belong to the newly transcluded document, we know not to insert the transcluded nodes into a saved representation of the original document.

At the moment, the data we have to work with is:

There exists some nodes that are defined to be the Document nodes. For instance, if we load in an OPML file, the "root node" of the OPML is defined as the Document for that OPML file.
Each node has a list of incoming links.
Each node has a list of outgoing links.

And let me give a hypothetical that seems to largely capture the spirit of the problems we need to address:

Suppose you have three documents, A->B->C, D->E->F, and G->H->I. The first is completely out of your control and is dynamically changing. You load them all into your outliner, and you copy C and paste it as a link under both F and I. C now has three incoming links, one from each document. Now, suppose the remote user deletes C from the original document. What should happen?

You have three basic choices:

Delete the node from all documents. (Or "invalidate the link", which has much the same effect for any reasonable definition of "invalidate" I can think of.)
Leave the C as one node for both remaining documents (i.e., a change to one continues to be a change to the other), and pick a new "home document" for the node.
Allow the original node to be be destroyed, and put two need copies of C into the two remaining documents.

It is not obvious what to do from here; there are pros and cons to each. However, one thing is clear: In the general case, only the first one is practical. To consider an analogous case, consider linking to a remote web page on the web. If a link is broken, it is broken, and you can do nothing to fix the brokeness on the target end, since you only control the source end.

If you control all the documents, you may be able to do something more clever. This says to me that there may be room to change which behavior is used depending on circumstances, but that we must be prepared to implement and handle the first case if necessary. Sometimes we may not even be aware the link is broken until transclusion time. Therefore, that is what we will start with, adding other behaviors later as they are called for in the UI.

This is sad, in that we'd like to maintain integrity of the node structure, but if we learn anything from the World Wide Web and Xanadu, the gyrations necessary to try to do so are not worth it in the general case.

With this discussion in mind, let's get more detailed about what it means to determine what document a node belongs to.

What Document Is A Node In?

Suppose I give you a node, which as described above has some incoming links, some outgoing links, and some data. How can you tell me what document this belongs to?

I'd like to try to do this without extra information, to save memory and to prevent situations where the extra information gets out of sync with the rest of the structure of the outline.

Can you do it?

Almost. The reason why not is so subtle it took me a couple of weeks to notice it.

"Primary" Links?

An equivalent definition of a directed tree (given that the graph is fully connected) is that each node has precisely one parent, except one root node that has 0. Any (fully-connected) graph that meets this constraint is a tree; the proof is trivial enough to be assigned as a very simple homework problem in a graph theory course. (It is sometimes even used as the definition; there are several different definitions of "tree" that look different on paper but are equivalent mathematically.)

We have an ordered list of incoming links; we can give the first one special prominence and call it the "primary link". It is also easy to show that if you enforce a couple of simple restrictions, that you can always reach a Document node by following the primary parent. Thus, we can define a node's document as "The document I find if I follow the primary links until I reach the first document."

Conceptually, we are projecting a tree on top of our graph. We end up with a tree, as defined by the first incoming link for a given node, that is extendable into a graph by adding more primary links.

This answers the questions posed about about what happens if a node is deleted from a document when there's another document that has a link to it; the "primary link" is removed, and the next link in line automatically becomes the next "primary link" with no fuss.

This is very simple, and you can actually get quite far with it. It is conceptually well-defined and you don't seem to hit any problems with ambiguities; the moving of a node from one document into another can be handled with a bit of work, mostly just notifying the new parent that it has been changed.

Unfortunately, as appealingly simple (in execution, if not in concept if you're not used to thinking in graphs) as this is, it does not work.

Conceptual Impurity

There are two equally valid ways of expressing why this doesn't work, one couched in theoretical terms and one in purely practical terms.

In theoretical terms, this violates the Conceptual Integrity of the design. We now have two distinct "outline" overlays on top of our graph structure, the "Link Handles" and the "Primary Link". Each of these overlays behaves very differently, and have different purposes in life. This is a bad thing.

Practically, this results in a lot of problems that stem from the fact that we are highly dependent on both the order and the existance of the incoming links. This also reveals a theoretical flaw, our dependence on knowing all of the incoming links, even though we can't necessarily know them.

When we remove a primary link, we need to know what the next link is in order to know what will be promoted. In order to maintain behavior between saves, we need to save all of the incoming links, in order; if you miss even one link, that might have been the one that should have promoted. Since we can't even assume we know what all of them are, this is impractical.

A lot of practical problems arise as a result. We can't save documents. We can't load documents. We need the entirity of the Outline Web accessible to determine what will happen if we delete a primary link. We need access to the foreign documents to move copies of the node into them if they are the next "primary link".

Basically, the "primary link" abstraction just works as long as the outline nodes are all in memory, and they can all be saved and loaded like a memory snapshot, not like files. This is an unacceptable limitation.

The solution is for nodes to actually keep track of what document they belong to, and for the nodes to keep track of what incoming links are coming in on a document-by-document basis. (Actually, it suffices to track "from my document" vs. "not from my document", but it's convenient and cheap to partition them based on document; that way, if the node changes documents (cut & paste), we don't need to recompute anything.)

Interestingly, just as much as we need to keep track of the order of the outgoing links, we do not want to give order to the incoming links.

The End - For Now

We've now built an outline data structure, that with one small penalty (remote links may be broken, which is something we can't do much about anyhow), allows the flexibility of graphs with the ease-of-programming of outlines. On top of this structure, we can build an outliner that will be robust and flexible.

Next, I will explore some misc. practical issues brought up by this discussion.

posted at 13:08:00, 2004-02-10

# Leo and Iron Lute

Rand Anderson blogs about Iron Lute and his preferred outliner Leo:

Seems he's aware of Leo (my outliner of choice) but wants an outliner that allows the use of the outline structure for all content (as opposed to having a body for free text associated with each tree node, as Leo does).
I'm pretty committed to Leo and it's author's vision for Leo going forward, but Iron Lute is worth keeping an eye on.

First, thanks for the nice comment ("worth keeping an eye on").

Since someone's interested, I thought I'd talk more about why I didn't start by extending Leo.

(Protocol note: When starting an Open Source project, the first responsibility of the starter is to determine why existing projects aren't worth building off of, so I believe it is necessary for me to justify why I don't start with existing projects.)

Basically, my "problem" with Leo (as in, "the reason I didn't just build on that instead of writing my own stuff", not an actual "problem") is Leo is really an application of an outliner, not an outliner itself. Leo is using an outliner to track the little fiddly bits that constitute the pieces of whatever it is building (program, website, etc), but you can't actually use the outliner to write text in the "outliner" itself.

By "outliner" here I specifically mean just the part of Leo in the upper left window, not the program as a whole; you might think of it as the outline widget only. I think the reason for this is the aforementioned difficulty of writing an outline widget with open source widget sets, and Leo's author is more interested in Leo qua Leo then the outliner itself (which I respect completely), so Leo's outliner is highly focused, gets the job done with relatively minimal attention to outlining (as compared to the gyrations I'm undertaking, not saying that it was trivial to write per se), and not usable in the general case. For instance, try to write an essay in the upper-left window alone; it's not convenient.

In programmer parlance, Iron Lute and Leo are almost completely orthogonal. Despite the surface similarity they are actually quite different projects that share little in common.

Forgive my splitting hairs like this ("outliner" vs. "using an outliner") but it's an important ability for a programmer.

As I further develop the outline model you'll see why it's a very different beast then Leo. In the far future (year+), I hope someday that Leo can use Iron Lute as a display library for their outlines, as I mentioned on that page. Then Leo can get for free/cheap all the other stuff I'm talking about, and other exciting things I will talk about later that I'm sure Leo would just love to have in place.

It should be very easy to put them together; both are Python/Tk. Leo's use of Tk didn't really enter into my decision to use Tk, but I consider it a happy side-effect. Ideally, you need not lose your "commitment" to Leo to continue to be interested in Iron Lute. It is an explicit design goal to make anything doable with outlines possible with Iron Lute, so I will have no objection to Leo specializing Iron Lute in whatever manner is necessary to maintain Leo's conceptual integrity; I'm fairly confident that the structure I'm building will be flexible enough to accomodate what I've seen of Leo with no compromises on Leo's side.

I haven't contacted Leo's author about any of this because let's face it, right now Iron Lute is vaporware to everyone but me. When I have an actual release it might be worth discussing. In the meantime it's not worth bothering him with unless he finds it himself.

Disclaimer: This may sound presumptuous, as if I'm "planning" something for somebody elses project. First, I'd like to disclaim that, this is just something I see that could be beneficial for both parties. Secondly, integration with Leo would constitute a "significant contribution" to Iron Lute that would allow it to get to Open Source status under the license rules I posted earlier, and as both would then be Open Source projects under compatible licenses, anybody could do it without "permission".

posted at 17:15:28, 2004-02-04

# Iron Lute License

I want to be able to discuss Iron Lute's planned license in a posting in response to somebody's blog here, but first I need to discuss it. So here are the licensing plans I have for Iron Lute. I'm very interested in feedback on this.

I'd would like Iron Lute to be open source, likely GPL with explicit exclusion of plug-ins (i.e., you are explicitly allowed to create new Node Types, etc., and release them under any license you like). (This is not quite the LGPL because you still can not take Iron Lute and build another application around it; limiting as I will means you can "connect" Iron Lute to your application but that you can not just pick up Iron Lute and make it be your application.) But that depends on the usage and support pattern it receives.

If nobody uses it and nobody helps me with it, then eventually, I'll stop supporting it. (No surprise there.) I'll keep it online and quite possibly continue using it in my own personal projects, but unless somebody else shows an interest, no public sign of development will continue.

(Truthfully, call me arrogant but I do not consider this likely. I think there's a definate niche in the Open Source world for what I am creating here, partially because of the lack of an outlining widget that I mentioned in my original post.)

If I get significant contributions from other developers and other people (including documentation, help, community helping each other out a lot, etc.), I'll make it the aforementioned limited GPL. The threshold for this is "if I had to remove all contributions other then mine to date, how painful would that be?" If it can be done swiftly and painlessly, then I may go to the third option.

The third option is "a lot of people use it and like it, but nobody is helping with the development". In that case I'll do an analysis, and either stop development, or try to take it commercial, depending on feedback.

I'm thinking of a cut-off date of a year after the initial "stable" release, which would be some months after the first public release, of course, so we're talking about at least a year and a half window, practically.

I'd like to say up front that I want this to be open source. I want to experience new features of Iron Lute that I didn't have to write myself. However, I do not want to tie my hands by committing to a license "come hell or high water"; the condition above is not intended as a "loophole I can jump through" so much as a way of pointing out that two pages of docs and a node type do not constitute "significant contributions" in the sense I am mentioning. Believe me, I will err on the side of "open source" if anybody gives me the chance.

So, what license this is released under is at least partially up to you.

I will make this commitment though: I will not pull the rug out from underneath you. Should I end up going commercial, the last "free" version will be clearly marked, never crippled, and will never expire. You just won't get updates and improvements.

posted at 17:09:04, 2004-02-04

October 2006
Mo	Tu	We	Th	Fr	Sa	Su
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31
May 2005					Nov 2006