Postel's Law

posted Jan 12, 2004

There’s just no nice way to say this: Anyone who can’t make a syndication feed that’s well-formed XML is an incompetent fool. - Tim Bray on ongoing

I'd like to chip in on this with a couple of hard numbers. Depending on your library support, it is possible to write a basic OPML parser in around half an hour using an existing XML parser. I've done it twice now, once directly with a SAX-like XML parser and once with a home-rolled XML<->object library; it's easy. It's even a good way to learn how to parse XML, because of OPML's essential simplicity.
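To give a sense of scale, here's a sketch of what that half-hour parser looks like with Python's standard-library ElementTree; the sample OPML document is invented for illustration:

```python
# Minimal OPML outline extractor: walk the document and collect each
# <outline>'s text and feed URL. Sample data is made up for this sketch.
import xml.etree.ElementTree as ET

SAMPLE_OPML = """<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.1">
  <head><title>Subscriptions</title></head>
  <body>
    <outline text="Example Feed" xmlUrl="http://example.com/rss.xml"/>
    <outline text="Folder">
      <outline text="Nested Feed" xmlUrl="http://example.org/feed.xml"/>
    </outline>
  </body>
</opml>"""

def parse_opml(source):
    """Return a flat list of (text, xmlUrl) pairs in document order."""
    root = ET.fromstring(source)
    # <outline> elements nest arbitrarily, so iterate the whole tree
    # rather than assuming any particular depth.
    return [(node.get("text"), node.get("xmlUrl"))
            for node in root.iter("outline")]

for text, url in parse_opml(SAMPLE_OPML):
    print(text, url)
```

The whole format is essentially one recursive element with attributes, which is why the exercise fits in an afternoon coffee break.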

On the other hand, there are OPML files in the wild that are illegal, such as this one; it contains an illegal &uuml; in the file, which needs to be declared as an entity or escaped. (It may be a fluke, perhaps generated by hand-editing, because the other entities in that file are handled correctly.) I've spent around three hours trying to handle this in some reasonable fashion, and I've had to give up; yes, I found some ways to kludge around it, but there's no kludge safe enough for real-world use without introducing other bugs into the parsing. From the nature of XML parsing, I've become convinced that no such kludge exists.
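The failure mode is easy to reproduce: any conforming XML parser must reject an undeclared entity like &uuml; outright. The two one-line documents below are invented to mirror the broken feed:

```python
# A strict XML parser rejects an undeclared entity such as &uuml;.
# Only the five predefined entities (&amp; &lt; &gt; &quot; &apos;)
# are legal without a declaration; &#252; is the escaped alternative.
import xml.etree.ElementTree as ET

BROKEN = '<opml version="1.1"><body><outline text="M&uuml;nchen"/></body></opml>'
VALID  = '<opml version="1.1"><body><outline text="M&#252;nchen"/></body></opml>'

try:
    ET.fromstring(BROKEN)
except ET.ParseError as err:
    # expat reports something like "undefined entity"
    print("parse failed:", err)

# The numeric character reference makes the same content well-formed:
root = ET.fromstring(VALID)
print(root.find(".//outline").get("text"))  # München
```

The catch is that by the time the parser throws, it has usually discarded its state, so "recovering" means abandoning the XML parser and hand-rolling something more forgiving, which is exactly the kludge spiral described below.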

The fundamental problem is that the kludges multiply, and for every "forgiving" kludge allowed into the system, the difficulty of implementation increases geometrically. It does not take long at all before a complete implementation is out of the reach of one person (a critical cutoff point for technology implementation, even in corporate contexts), and it's not much further before the implementations are inherently buggy because no conceivable group of people can cope with all the issues at once.

One simple error in an OPML file has already cost me three hours, and I can't say I've "fixed" it; I just punted. OPML is so simple it can't harbor very many edge cases of this kind; imagine a real spec with tens or hundreds of potential edge cases, any one of which might cost hours to find, "repair", and verify that the "repair" did not introduce new errors of its own, which bug "fixes" intended to "correct" validity problems almost invariably do by their very nature.

HTML is an example of something that has been on the verge of crossing this boundary; were it not for the push-back about writing "standards-compliant" HTML, it probably would have crossed over to the point where it would have become impossible to write HTML with complicated JavaScript behaviors for anything but Internet Explorer. It is still not easy, but at least it is possible.

Also, I'd note that in the above quote, by "Anyone" I expect that Tim Bray mostly (though not entirely) means "programmers". I would expect the number of people writing Atom feeds by hand to be roughly the same as the number of people writing RSS feeds by hand, which is nearly zero. Unlike HTML, there's so little benefit to a "static RSS file" that nobody bothers with it, so almost all RSS is generated. So I don't see any justification for making room in the spec for people writing Atom feeds by hand; it ignores the nature of the technology.

Assuming that Tim Bray meant what I think he meant, I agree; outputting proper XML is pretty easy, and if you can't meet that basic level of functionality in three or four tries, you probably shouldn't be programming things that you expect other people to be able to use.

Finally, I find it ironic that someone who helped start a spec because RSS was allegedly "underspecified" is now pushing for formalized underspecification. To put it bluntly, WTF?
