Reverse Compatibility as an Insidious Software Trap

I'm a little late to this party; only today and to a lesser degree yesterday is my typing speed high enough that I feel I can "afford" a weblog post... and even this otherwise pointless paragraph :-) (My previous post on switching to Dvorak was justified as a typing exercise and easily took over two hours to type. This one did too in rough draft but you see how much better I'm doing :-) )

Joel on Software recently posted How Microsoft Lost the API War. If you haven't already read Joel's piece, odds are you won't care about the rest of this post either, which will be of a technical nature; I'll assume from here on that you have read it.

I posted a comment on Slashdot in reply to the Slashdot story, one centering on the observation that, at its root, the problem is that backwards compatibility is not free. The costs grow exponentially. Eventually, "exponentially" defeats any real-world entity.

This garnered several replies which mostly focused on trying to shift the costs onto somebody else without addressing the problem of the total effort continuing to grow exponentially. Several people claimed emulators solve the problem, without taking into account the cost of the emulators, or of certifying the functionality of the software in what is, technically, an entirely new environment, or of the new problem of maintaining the emulators! Sure, the problem of exponentially growing costs is easy to solve if you start with "Assume infinite supplies of free labor...", but such a solution lacks a certain practicality.

How am I sure this is exponential? Imagine a software release every year, starting in 1980. In 1980, you have a blank slate and your reverse-compatibility costs are zero. In 1981, you must support 1980 as well, but with only one other year, the costs are minimal and certainly justified. 1982 is a little hairier, but still doable. By 1983 it's really starting to annoy the project's lead developer, but nobody is too worried.

An inexperienced developer, which I strongly suspect is exactly what I was hearing from on Slashdot, has only this much experience and thinks this can go on forever. Part of the insidious nature of this trap is that the exponential nature of the problem hasn't even surfaced yet! So far, the effort is only growing linearly: in 1980, you supported and developed only the 1980 version; in 1981, two versions; in 1982, three. Obviously a single person would eventually jam up just supporting the old versions, but a corporate superhuman can grow its capabilities to meet this challenge.

In 1984, let's see, you buy into the neural-network-crap hype and recast the core of your product with neural networks for some marketing reason. Now supporting the past is becoming a real problem. Under the new architecture some of the old algorithms are too slow, so you add in some speed hacks. These speed hacks are, of course, among the hardest parts of the system to build and debug, and they will be a real albatross in coming years.

In 1985, you push the system beyond any sane idea of what neural nets can do. Your customers rebel because your product is, despite the hype, so artificially stupid and slow that your customers' users actually start to prefer working out their problems on paper... or, more realistically, with old, already-paid-for versions of your product. You fire the lead architect and put a new guy in.

In 1986, your new lead architect gets the object orientation religion, and you switch languages from, oh, PASCAL to C++. Oops. You learn two things: One, that may pay off eventually, but your company may not survive to see the payoff. (And note, Mozilla may be strong, but Netscape indeed never saw the payoff.) And two, your second- and third-biggest customers have business-critical systems that depend on the neural nets, so you still need to support them or lose 40% of your business.

Your 1987 release is mostly unremarkable... except that in the process of re-architecting the system, the previously mentioned performance hack for the 1983 code, which of course you have no tests for (TDD being a ways off in the future, since it really hasn't arrived even now), turned out to have a critical, data-destroying pointer error in it, which your largest and oldest customer discovered firsthand during a demo to their largest customer. They are now your largest and newest ex-customer, and you thank your lucky stars that you didn't get sued.

And now we begin to see the problem. All that old code interacts; the reverse compatibility fixes start blocking each other, and the fixes start interacting with the fixes. Not only that, but the need to ensure reverse compatibility critically and negatively impacts your architecture decisions in the first place, which is a weakness all its own. That's exponential problem growth: any "fix" can interact with any other fix, in arbitrary combinations of any size, so what you have to reason about is not the list of fixes but the set of their possible combinations.
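To make that concrete, here is a back-of-the-envelope sketch in Python. It is my own toy model, not anything from Joel's piece: assume one compatibility fix per yearly release, and assume the worst case, where any subset of two or more accumulated fixes can interact and therefore has to be accounted for.

    # Toy model (hypothetical numbers): one compatibility fix per yearly release,
    # worst case where any subset of two or more fixes can interact.

    def versions_to_support(years):
        # One release per year, so the plain support burden grows linearly.
        return years

    def potential_fix_interactions(fixes):
        # Subsets of size >= 2 drawn from n fixes: 2**n - n - 1 of them.
        return 2 ** fixes - fixes - 1

    for year in range(1, 11):
        print(f"year {year:2d}: versions to support = {versions_to_support(year):2d}, "
              f"potential fix interactions = {potential_fix_interactions(year):4d}")

The left column is the merely linear burden the inexperienced developer sees; the right column is the combinatorial one that eventually eats you.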

You can't even imagine the 2004 monstrosity that gets released, supporting PASCAL, neural nets, a bad C++ design, a better one, a later complete re-architecting of the C++ for network functionality, another one to support a custom scripting language, and two iterations of a Java design. Except you don't need to imagine it, because odds are the very machine you are using to read this runs Windows, and it's hundreds of times worse than this little, trivial example!

Why did I use the word "insidious" in my title? Because while the problem is firmly exponential in nature, the exponent is small. And of course there are benefits to reverse compatibility, which in the early days easily overwhelm the compatibility costs. So you choose the compatibility route.

And now you are trapped. Every year you produce a reverse compatible product, you are training your customers to expect that they will always be compatible. Every year, the stresses build... and eventually, the code will win. Joel's article, in a nutshell, says that Microsoft has until now built its strategy on compatibility, and it is finally abandoning that strategy. Joel doesn't say whether the change has been forced on them or not, and I don't know either. All I do know is that they had to do it, sooner or later, probably sooner.

Now, the really interesting thing to my eye is that we have an alternative out there in the real world, one that has been running for as long or longer. The Microsoft/IBM world has banked on 100% reverse binary compatibility, where "binary" is the hardest kind of all. The UNIX world, by contrast, is too diverse to say it ever had one strategy, but a significant subset of it gave up reverse compatibility and ported things periodically, which is to say, it kept the source around and updated it as needed. Unused capabilities would fall by the wayside, which of course sometimes caused pain. Programs frequently leapt from one processor architecture to a totally unrelated one.

Each transition hurts more than the equivalent Microsoft transition, whose cost is never quite zero but is still relatively low. But the pain is limited. Your vendor fixes the code and deploys it; it may take a couple of iterations to hammer out whether they are going to support some obscure old feature, or whether you're going to code around it or live without it, but you take the pain over time.

Interestingly, not only do you never suffer the One Big Jolt that rocks your entire world, you suffer less pain in total. The Microsoft approach hits you with exponential pain in one huge blow. The UNIX approach hits you with only linear pain in total. In the long term, this is preferable.
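If you accept the same toy model as before, the comparison is easy to make concrete; the constants below are invented purely for illustration, and the point is the shape of the totals, not the particular numbers.

    # Invented constants for illustration only: each yearly port costs a flat
    # 1 unit of pain, while a single deferred migration has to untangle every
    # accumulated fix interaction at once (the 2**n - n - 1 figure from before).

    YEARS = 20

    yearly_ports_total = YEARS * 1            # pay a little, every year
    big_bang_total = 2 ** YEARS - YEARS - 1   # pay for everything at once

    print(f"steady yearly ports over {YEARS} years: {yearly_ports_total} units of pain")
    print(f"one deferred migration in year {YEARS}: {big_bang_total} units of pain")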

The optimal solution to this problem, viewed globally as the sum of the costs to the vendor and the customer (and note that not all costs can be passed to the customer; opportunity costs, for instance, cannot), is something like the following: provide deprecation warnings, ensure a clean upgrade path from the previous version, and discover through feedback which capabilities are still in use so you can target them in the next revision. Eventually, you must cut loose the people who won't upgrade (and they must be happy with the old software anyhow).
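As a small sketch of the first step, here is roughly what a deprecation warning looks like in Python; legacy_neural_net_solver and its suggested replacement are hypothetical names invented to fit the story above, not real APIs.

    import warnings

    def legacy_neural_net_solver(data):
        """Hypothetical old code path, kept only for compatibility."""
        # Warn the caller now so its removal in a later release is no surprise.
        warnings.warn(
            "legacy_neural_net_solver() is deprecated and will be removed in a "
            "future release; use the current solver instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this shim
        )
        # Placeholder standing in for the real legacy behavior.
        return sorted(data)

Warnings like this, plus whatever feedback channel you have, are how you find out which old capabilities people actually still use before you cut anyone loose.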

This is expressed today in its purest form in the Open Source world, of course, and it ends up working fairly well because it has a tight feedback loop. It will naturally deprecate old code unless a user cares enough to maintain it, which is a good place to put that responsibility. It is almost inconceivable that Open Source could work any other way.

I don't think the association between mega-commercial closed source in the IBM world and backwards compatibility is a coincidence. I think that they naturally fall into this insidious trap because there is no commercial alternative: If Editor A1 and Editor B1 are splitting the market 50-50, and A2 is reverse compatible with A1 while B2 isn't compatible with B1, A2 is going to win. Open Source avoids this trap and ironically can achieve superior value commercially in the long term as a result.

I think the "long term" is now here. This is a golden opportunity for open source to exploit its natural long-term viability.

Is there anything Microsoft can do? Yes. They need to abandon some reverse compatibility at the API level. But they don't need to abandon all of it, and certainly not twice in less than four years! They need to provide bridges for the recent stuff, and convey to developers that they will still have to meet Microsoft halfway.

(Even so, is that enough? Probably not, because while that at least partially addresses Joel's complaint about cost, it still doesn't address Joel's complaint that the conversion would gain him nothing over the status quo. That is an even more fundamental Microsoft problem that is technically independent of the point I am making here; reducing costs is still necessary but not sufficient.)

What lesson can we carry away from this? Unless we hold a program stagnant, it is absolutely impossible to retain complete reverse compatibility forever and ever, amen. It is better to take the pain in small, regular doses, and by acknowledging this need, to make it possible to plan for and minimize the pain. There is no escape from the pain, and by pretending there is, you only make the day of reckoning that much worse.