Programming Wisdom

A BlogBook is a collection of blog posts intended to be weaved into a book.

This page is the BlogBook I called “Programming Wisdom”. It is incomplete and will probably remain so for a long time. I’m still grappling with what I want to write here even years later.

Nevertheless, here is what I wrote:

What is Programming Wisdom?

This is the first post of my new BlogBook, Programming Wisdom.

This initial post starts off by discussing exactly what I mean by "wisdom" in the context of programming. Non-programmers may still be interested as it is more about wisdom than programming.

Wisdom is now one of those Old English words that you're probably only likely to encounter in a church nowadays, like Atonement or Reconciliation. In the era of exponentially-exploding knowledge, wisdom has been just plain crowded out.

For a long time, my favorite definition of wisdom was: "Knowledge is the knowing of a fact; wisdom is understanding how to apply the knowledge." I think there's still some truth to that, but I've come to prefer this formulation:

Knowledge is when you know some fact about something.

Wisdom is when you have a sense of the whole system, and can take actions based on the whole of the system, not just a part.

I think here are two basic kinds of wisdom:

Received wisdom: Hard and fast rules generalizations, often passed down the generations and now phrased in absolutes, because after every person passing along the wisdom is done simplifying and distorting it, like in the classic Telephone game, what's left is incomprehensible and worthless. This is best demonstrated by the well-known joke/parable about the mother who cut of the end of her pot roast for reasons unknown; what started as a perfectly reasonable practice was brittle and useless "received wisdom" within two generations.
Learned wisdom: The wisdom you have put together yourself over the years, from your own experiences.

Every generation of recorded history has had to re-examine its received wisdom, and replace some of it with learned wisdom. This is both a personal process and a societal process, especially important in a democratic government where "prevailing wisdom" can have an impact on the actions of the society.

Learned wisdom can not be taught, because it is too complicated; the full totality of a system of understanding can not be expressed in words. The wisest solution for any given problem changes radically depending upon every aspect of the specific situation. Everything from the smallest technical consideration to the largest political consideration matters.

Received wisdom may be inferior to true wisdom, but it is also simpler, and there are times when that is enough. One of the better ways to conceptualize parenting is providing the child with a good enough "received wisdom" scaffolding so that they can survive long enough to develop their own true wisdom.

There are many things like this in life, including life as a whole. You can read all about football strategy all you want, but until you're out there on the field you can't understand the full depth of the situation, and once you are out there, you realize that the theory only scratches the surface. You can't become a master chess player by reading about chess.

It is easy to conclude from this that reading about chess or football is a waste of time, but that's untrue. What can be done for wisdom is to sensitize the student to the structure of the problems they will encounter, to offer them initial tools to approach the problem with, and (ideally) to encourage the student to eventually work out their own understanding of the topic, and fly free. You may obtain mastery without ever learning the theory, but done properly, you'll attain mastery faster if you bootstrap off of the theory. Most likely, you'll learn it better in the end too, and the theory can provide you with new words and concepts you can use to communicate with other people who share your understanding of the theory.

The purpose of this book is to so sensitize you, to speed wisdom and to a lesser degree, to offer you some initial scaffolding you can build around. I consider the latter a lesser goal because you can find many places online and in the book store that could help with the scaffolding; it is the other ideas I chose to write about here because I do not know of a single place to find them. Some received wisdom is inevitable, because without discussing real problems this already-abstract work would become too abstract to be useful. I also hope to help you avoid the common error of prematurely mistaking some particular bit of received wisdom or dogma as Absolute Truth; you must learn to make your own judgments.

Much of what I say here is said elsewhere of course, but I believe the consistency with which I apply my unifying principle (about which more later) may be unique. Some of the perspectives I have here are also at the very least unusual and possibly unique among the set of people who write about programming. I am directly targeting programmers first starting out on their way, perhaps a sophomore or junior in college, or someone who has been programming for a year or two. Those whom have been actively working on building their own wisdom will probably find somewhat less value here.

In fact, they will certainly find things they disagree with; I'd even go so far as to call it a necessary (but not sufficient) condition for being wise. One of the characteristics of the rich problem space that so many things in life present is that due to our different experiences and perspectives, no two of us ever come to the exact same understanding. But even though you will eventually end up disagreeing with me, you will hopefully develop faster for having been exposed to the right kind of wrong idea.

## Why Do We Care What's Special About Programming?

The first step to attaining wisdom is to understand why special "programming wisdom" is needed in the first place.

It's easy to get the idea that software is easy to create, because it is partially true. Computers get more powerful every year, and we trade in on that power to make programming easier. Every year results in more and better libraries. Changing software is very easy, and it's relatively easy to test compared to an equivalently-complex real-world object. J. Random User can write an Excel macro with a reasonable amount of effort that saves him a lot of time, and early programmers can become excited about what amazing things they can do just by assembling existing libraries and frameworks together which makes everything seem so easy.

This rosy picture is brought to you by confirmation bias, paying attention only to the uniquely easy characteristics and ignoring the things that make it uniquely challenging. Poke past the surface and you find a strange, complicated, chaotic beast. Learning to tame the power requires a lot of experience and wisdom.

### Programming is Uniquely Difficult

Engineers of other disciplines often take offense at the claim that software is uniquely difficult. They do have a point. As pointed out by Fred Brooks in hyper-classic The Mythical Man Month, one reason software is hard because software is so uniquely easy.

We fundamentally built on top of components that have reliability literally in the 99.999999999% range and beyond; a slow 2GHz CPU that "merely" failed once every trillion operations would still fail on average in eight hours at full load, which would be considered highly unreliable in a server room. Physical engineers would kill for this sort of reliability in their products. Or an equivalent to our ability to re-use libraries. Or how easily we can test most of our functionality with the ability to replicate the tests 100% accurately. Or any number of other very nice things we get in the software domain. Our job is far easier in some ways than any discipline concerned with the physical world, where nothing ever works 100%.

Every library, every new computer, every new programming paradigm, and every other such new thing is designed to make programming easier. Some significant fraction of these things actually do make programming easier, though it can be surprisingly difficult to figure out exactly which. And with every task made easier, we face a choice: We can do the task in less time and then be done, or we can do the task in less time, then take on some other task.

Almost without fail, we choose the latter. This is not unique to software by any means; humans have been making this choice for centuries across a wide variety of fields. What is unique is that this interacts with the unique reliability of software; we can, and therefore do, create huge stacks of software for various purposes. The amount of software involved in running even the simplest of static web sites is well beyond what one human could fully understand. (Full understanding here means the ability to describe the reason for absolutely every design decision, and the ability to then make informed decisions about changes in such a way that minimal adverse affects occur. By this standard, it's likely nobody even fully understands things like the Linux Kernel; even the more central people in kernel development have sometimes made incorrect decisions about core components.)

It's because it's easy to build and build on top of software that the "simple" task of web development requires understanding at least three languages (one a full-fledged programming language), and then at least two more languages (another full-fledged programming language and SQL for the database) on the server, and the server code itself may be even more complicated than that. It's because it's easy to build and build on top of software that this language count is going up, not down, in the future. It's because of this that Windows development tends to get more complicated over time, as more and more abstractions and layers are created.

If these layers could perfectly seal off the layers below them, this wouldn't be so bad, because what really matters is the set of knowledge you have to have in order to do useful work. If the abstractions were perfect, you'd only need to understand the top layer, and that is much simpler than having to understand the whole stack. Unfortunately, since all abstractions leak, the result is increasing complexity over time.

We make these trades for good reasons. I would not trade my software stack for a Commodore 64. I look forward to the next iteration as much as the next programmer. But modern software development is complicated beyond belief, beyond comprehension. Where once a Renaissance Man might know "all there is to know" about the whole of science, today you are an above-average developer if you can stay fully competent in more than one language of the many tens of languages that are viable for doing large projects... and that's just the mainstream general-purpose language count. Go beyond the mainstream or into specialized languages and the count goes into the high hundreds or low thousands.

Ironically, it is exactly the unique ease of software development that ends up making it uniquely complicated.

#### The Programming Construction Metaphor

I've gone on before about how distrustful of metaphors I am, and it seems like every year I'm getting more distrustful of them. Either deal with the thing as it is, or just give up understanding it. Metaphors lead to the beginning of understanding, but no farther.

Programmers aren't immune to the metaphor sickness, and if there's one metaphor you can expect to see trotted out at the earliest available opportunity, it's the "programming as construction" metaphor. This metaphor has been skillfully deconstructed many many times before, but I'm going to deconstruct it from the opposite angle... what if construction was like software engineering?

If we built buildings the way we wrote software, we wouldn't even call a contractor; we'd go down to our local hardware store and pick up a copy of Microsoft House. We'd poke the parameters into Microsoft House, push a button, and our house would be templated within seconds. Rather than mucking about with blueprints and plans, we'd walk through an actual physical house, and customize it in real time, because there's nothing they're going to ask for that Microsoft hasn't already heard and incorporated into Microsoft House. Building a house has been done.

Microsoft House costs $59.95 and is certified to comply with the building and housing code in every jurisdiction in North America. You could also use GnuHouse, which is Free and has a few more features but a bit less style, since nobody could afford to hire the best designers.

Anyone who has even built a shed in the back yard knows this is not how construction works. Construction is nothing like software engineering. The difficulty of doing what has been done before is nearly zero, and as a result we don't spend much time on that. If you have to draw an analogy with something, "engineering" in software is more like "research" in any other field; you can't know exactly how long something will take, even if you have a good idea about where you're going and how to get there, because at any moment something new and surprising may jump out at you and change everything.... and I'm not even considering the possibility of "changing requirements" when I say that.

### Software is Uniquely Complicated

In 2007, with a well-loaded Linux desktop installation, my /usr/bin is 257 megabytes, with debugging off and dynamically-linked libraries not contributing to that count. My particular copy of the Linux kernel 2.6.19 with certain Gentoo patches has 202,381,268 bytes of C code alone. If I'm computing this correctly, at a constant 100 words per minute (5 chars/word), that's 281 24-hour days just to re-type the C code in the kernel.

One of the projects I was able to work on during my schooling years was a relatively obscure Learning Content Management System with over a decade of history behind it. At the moment, that project contains roughly 3000 files in its CVS repository, nearly 300,000 lines of Perl code in just under 9 megabytes, and still going. One rule of thumb says multiplying by five converts a line count from Perl to something like Java, which would be 1.5 million lines of code. And this is just the project-specific code; it is layered on top of also-complex tools like the Perl implementation, the Apache webserver, the Linux kernel, and numerous other libraries and frameworks of all shapes and sizes. Some of these things, like the library used to support internationalization, are tiny. Others like the Linux kernel or the Apache webserver dwarf this single project.

No matter how you slice it, software has a lot of moving parts, but there's no obvious way to compare source code complexity to mechanical complexity. Trying to do a straight part-count comparison is probably therefore disingenuous, so we can't make a straight quantitative comparison. I'd assert that even relatively simple pieces of software have more parts in them than even relatively complex machines like modern automobiles (minus their software), but I have no way to prove this.

There is a qualitative distinction we can draw between the physical world and the world of software, though: the interaction of the parts of a program qualitatively differ from a real-world device. A real world device's connectivity between parts is limited by physical three-dimensional space; with rare exceptions, parts that are interacting must be physically next to each other. In software, any part can interact with any other part at any time. It's as if they are all right next to each other, a physically untenable situation, the equivalent of zero-dimensional space. (There are some exceptions to physical proximity, like process boundaries, but these are often crossed as well.) Software can also include as a critical component arbitrary pieces from any place in the world, thanks to network communications; the Internet as a machine is the size of the planet. The software stack to run a Google search on the client side is already complex (web browser, OS, graphics driver, text services, graphics renderers, and more), but add in the need for Google's server system to be functioning correctly with it's own mind-boggling complexity, and you start to see why it's a miracle software ever works at all.

It's tempting to dismiss this as hyperbole, but the effects routinely manifest in real life. I've experienced many errors at the highest layer ultimately being traceable back to a bug in something several layers deeper, sometimes all the way down to the kernel. Even the simplest act, like typing a character into a text box in a web browser, will in a fraction of a second call tens or hundreds of software "pieces" into action, starting at the operating system handling the keypress, up through the userspace, into the windowing system, through the layers the window system may have in place to modify the keyboard (such as code for handling international layouts), passing through the code to decide which window gets the keypress, into the program's instance of the widget library it uses, which routes the keypress into an event loop with correct source annotations, into the application code, which then itself creates a Javascript-layer event which may then be hooked into by arbitrary Javascript on the web page which can proceed to do any number of other things that may trigger another cascade that is equally complicated, or perhaps even more complicated (like loading a new image from the network). And that's the simplified version of the simple process of handling a keypress. Everywhere you look in software, you get these sorts of interactions routinely, and a single subtle flaw at any layer can have odd effects at any time.

Fortunately, we can also harness some characteristics of software to reduce the complication that any one person needs to worry about at any one point in time to a reasonable level, or it really would be impossible to write a program that can be counted on to work. Again, it is the unique ease of software that makes all this complexity possible; part of the reason other fields don't deal with the kind of complexity that software can deal with is because they lack the reliability of the basic components, testability, and other such aspects of software. They are forced to keep it simple by the nature of the physical world. The only thing stopping medical doctors from dealing with equal or greater complexity is that they can't see into biological processes as well as we can see into software processes, so they are forced to deal with a simplified model of the human body. As we continue to master the physical world, physical engineers and biologists will begin to experience this complexity too. Programmers may be blazing a trail, and it may be unique today, but someday everybody will get to deal with the complexity of software.

(Mu-hu-ha-ha-ha!)

### Software is Uniquely Chaotic

In my continuing series on why software is special, motivating writing a book about it, this post discusses how software is chaotic.

Here I refer to the mathematical definition of chaos, which I will define as: "A chaotic system is one in which small changes in the initial conditions can cause large and unpredictable changes in the system's iterated state." This is based on the mathematical definition(s), but simplified for our purposes. It's not just a word, it's a quasi-formal concept.

Every clause of the definition is important. In particular, people often leave out the "unpredictable" part of the definition of chaos, but you do not have chaos if everything is predictable. If you are at the top of a very round, smooth hill with a heavy ball, the final destination of the ball is predictably determined by the initial direction you give it when you drop it. This is what physicists would call "unstable", but it is not chaotic.

Every computer science curriculum worth anything will talk about the fundamental limits of computing, such as the halting problem in all of its guises. One of the most important things to carry away from that seemingly-academic discussion, even if you have no interest in pursuing further academics, is that unpredictability is fundamental to the building blocks of software. Once you start using Turing Machines, you have entered the realm of chaos. A single bit changed in the data or the program can have arbitrarily large effects, and in the general case, you can not predict what those effects are, not even in theory.

Software is almost the canonical embodiment of mathematical chaos. You can control and limit the chaos to some degree, but there is a boundary beyond which you fundamentally may not pass, and the reality of this boundary is so thoroughly embedded in the way we program that it is almost invisible. (The people who can best help you see this boundary are those who are studying ways to prove correctness in programs. They push the frontier a little further back with great effort and cleverness, and for this I salute them, but they will never be able to completely remove the chaos.) Per my earlier discussion about the lack of spatial separation, the full state space of a system is inevitably incomprehensibly large, leading to a lot of "room" for chaos to manifest, more than we could ever hope to examine. ("Room" grows exponentially in the size of the computer's memory.) This makes the system more unpredictable in practice, even if in theory the full behavior of the program could be understood. And being discrete, even the smallest change of a single bit can have arbitrarily large changes in the evolution of a program's state space.

Many other engineering disciplines certainly encounter chaos, though most try to minimize it, because unpredictable systems are generally less preferable than predictable ones. Even those that embrace it try to contain and minimize it; studying chaos can help you build a better muffin batter mixer but you wouldn't build the entire bread factory's mechanisms to function mathematically chaotically. (If you did wish to invoke chaos for some reason, you'd do it with software managing the system. The machines would still be designed to function non-chaotically.)

It can be very valuable for a computer programmer to take some time to study some of the characteristics of chaotic systems; I don't think a truly mathematical approach is necessary, so an informal overview and some time with a fractal-generation program should suffice. Things like "attractors" have obvious correlations to how software functions. You'll get enough practical experience once you start coding on your own, once you know what you're looking for.

### [Cheap] Good Practice is Unusually Hard to Create

The most common complaint about software is that it is "too buggy". The question is, "What does too buggy mean?" People making this complaint are often holding software to absurdly high standards, even when making comparisions to other engineering disciplines. In fact, bridges do fall down. Architects fail; often the designs can be seen to fail and corrected or maintained before catastrophic collapse, but it happens. Software is no more likely to be absolutely perfect than any other human endeavor.

Software is an engineering concern, and one of the things that means is that you can't have anything for free. If faced with the choice between a $100 piece of buggy or incomplete software, and a $50,000 piece of production-quality bullet-proof highly-tested quality software, it's unfair to complain that the $100 piece of software is buggy and incomplete.

Because software exists as an amorphous collection of numbers, and is mostly concerned with the manipulation of other amorphous numbers, when it fails, it is on average not as big of a problem as when other engineering artifacts fail. Software generally can't kill someone. (To the extent that it can, more care needs to be taken.) Thus, given a choice between a program that occasionally sort of eats your data but mostly works for $50, or a solid program that never ever eats your data but costs X*$50, people will generally take the former. Even if it's a bad idea. Even if the program will end up eating more than (X-1)*$50 worth of data. I'm not saying it's rational, I'm just saying that's how people are. The more expensive, higher quality program often won't even get made because nobody will buy it.

How many of you out there in the audience have complained about Microsoft's OS products? How many of you have even seriously considered spending many thousands of dollars more on robust UNIX-based systems? A few hands, yes, but not many. (Note that the quoted price includes some estimated training costs and such.) How many of you would actually shell out $2000 for a hypothetical version of Windows that never crashed, but didn't actually have any more features than your current Windows OS? Not many, I see. What about during the Windows 3.1 days, back when Windows itself crashed more often? Ah, that's a few more, but most of you are still picking the cheap-but-crashy software. Don't lie, I can see it in your spending patterns.

Here lies the core problem with finding good practice for software engineering. We can adapt the same basic processes used in other engineering disciplines. We have the examples from NASA and select other applications to show that software can be created with extremely high reliability. However, in the "real world" people simply aren't willing to spend the money necessary to create software with these heavyweight good practices, because thanks to the previously mentioned unique aspects of software (the number of interacting parts, mathematical chaos), this sort of software is extremely expensive. People want cheaper software. This is perfectly rational; often the thing that costs $X and does 90% of what you need is honestly the better choice than the thing that costs $100*X and does everything you need perfectly; it all comes down to a complicated and situation-dependent set of calculations for each choice.

The other problem is that it's not necessarily clear what the best practice actually is after all. Non-software developers will often be seen accusing software developers as a whole of not caring about process, but the truth is almost the exact opposite: Software engineering as a whole is nearly obsessed with process. From the Agile Methodology proponents, to those pushing UML, to any number of management methodologies ranging from the heavy to the light and everything in between, everything has been tried at one point or another. Metrics? Tried 'em, from the simple ("lines of code") to the obscure and mathematical ("cyclomatic complexity"). None of them are worthwhile. Testing methodologies all fail in the face of exponential state space. Design methodologies have experienced some ups and downs, but still there's nothing like a "one true answer". It's not that software engineers haven't tried to produce good process, it's that it's really hard to create a good process that meets all the constraints placed on us by customers.

Research into better methodologies is an ongoing process. Progress is slow due to the near impossibility of doing true scientific research on the topic, but some progress is being made. It's actually an amazing accomplishment for a 2007 program to have the same number of apparent bugs as a 1987 program; the same number of apparent bugs is spread out over a much larger code base, which implies that code bases are in fact improving in quality. This quality improvement happens as we improve our libraries, as we improve our methodologies slowly but surely, and as we tune our tools and libraries for these improved methodologies.

"Cheap, good, soon - pick two." In engineering terms, we are in fact learning how to make things cheaply and well, just as critics want, but it's at the cost of "soon". It's an extremely hard problem, so it's taking a long time. There's a long way yet to go. The way people want software to be all of "cheap, good, and soon" isn't really unique, but the degree which software is affected by these pressures is unusal... and as far as I can tell, the sanctimonious pronouncements about how we should do our job "better" from non-programmers do seem to be unique.

(One note: Throughout this section, when I talk about the costs of software, I am mostly talking about production costs, not actually the cost to the user. Thus, "free" software is not an issue here, because there is no such thing as software that is free to produce. "Free" or "open source" software simply pays for production costs in ways other than directly charging users; the mechanisms of such production are way out of scope of this book.)

### Programming is not Uniquely Unique

I want to be clear about my purpose here. My point is not to claim that the uniqueness of programming is itself unique. Every interesting field is unique in its own special way. For each field, it is helpful to understand why it is unique if you wish to truly excel, or you may bring inappropriate concepts from other domains in, or export inappropriate programming concepts to other domains. I say that programming has several unique aspects and that these aspects are worth thinking about, but this does not mean that programming is privileged somehow.

In fact, that would be a very bad attitude to have since the very purpose of a professional programmer is to serve somebody else, and service workers don't succeed with a holier-than-thou attitude.

This chapter is intended both to combat the perception I have seen that programming is somehow equivalent to some other task, prompting bad suggestions and in the worst cases bad decisions, and to explicitly call out the things that are special about programming to encourage people to think clearly about them. None of this takes away from the specialness or uniqueness of anything else.

There is a delicate balance to be had here. There are powerful underlying similarities shared by many disciplines, but everything is also unique. Ideal skill development can only be had both truths are correctly balanced, when you learn how to correctly leverage your past experiences while at the same time seeing how the new task is different.

(This work is of course about programming because a programmer is what I am. I am not qualified to write Baseball Wisdom or Accounting Wisdom, presuming I'm even qualified to write Programming Wisdom. In a way, nobody ever really is, but it's better that somebody try.)

The unifying principle of this book is:

Everything costs something. Everything worth talking about has benefits. Nothing is free; nothing has infinite value.

This sounds very simple and unobjectionable, but experience shows people have a hard time putting it into practice and realizing how pervasive the principle is.

#### Compression

Below the fold, a discussion about compression, using this as a clear example of a principle I intend to relate to other programming principles, and indeed engineering principles in general.

The Pigeonhole principle is named for the fact that if you have five holes and six pigeons that need holes, no matter how you arrange the pigeons in the holes you have at least one hole with two pigeons in it.

A simple and air-tight proof that no algorithm can possible compress all sets of data is based on this principle. A real-world compression algorithm defines a reversible mapping of input bytes to output bytes. (It has to be "reversible", or you can't write a perfect decompresser.) Imagine starting with a compression algorithm that maps all input back to itself (the identity mapping). Now, if you map the five-byte input ABCDE to the four-byte output WXYZ (20% compression), the input string WXYZ now must go somewhere else. It can only increase in size in the putatively compressed output, because all outputs four bytes and smaller are now taken. In order for a compression algorithm to shrink some inputs, some others must grow.

This is not a rigorous proof, but the rigorous proof takes this form.

If this is true, than how can compression algorithms do useful work? Like many mathematical concepts, there are many equivalent ways of looking at the answer. The best formulation for my purpose is that algorithms can take advantage of the fact that not all data sets are equally probable.

The best way we can measure how much of the total possible space we will probably use is with a mathematical concept from information theory called entropy. This gives us an algorithm that can look at a data set and give us the number of bits it "really uses". Truly random data will tend to have an entropy measurement around 8 bits of entropy per byte, meaning that it is "really using" all 8 bits and is therefore incompressible; an endlessly repeating character will have an entropy very close to 0 bits per byte, meaning that it isn't "really" using those bits and is therefore compressible.

The entropy of English text is around 0.6 to 1.3 bits per character. Let's use 1 bit for our computing convenience. A modern ASCII-based encoding of this text uses 8 bits per character. Using these numbers, we can compare the total number of possible data strings for a given number of bytes, and the total amount of this possibility space that is "really consumed" by possible English phrases.

For a message of a mere 15 characters, the number of possible byte-based messages is 2^8*15, which is approximately 1.3*10^³⁶. The number of possible English messages using this measure is 2^1*15, or 32,768.

Don't lose sight of the point here by taking that number too literally; it's a probabilistic thing. Arguing about piddling differences here or there is unimportant when the gulf is already 31 orders of magnitude, and growing exponentially with every byte!

I've taken some liberties, especially around the scare-quoted bits, but I believe everything I said is true enough. For more real information, follow the links or consult your choice of sources.

Note: To anyone who wishes to argue about the compression points made here, I will accept absolutely nothing less than fully functional source code sent to me that implements your "universal compressor/decompressor".

If you take a moment to digest those numbers (10³⁶ is not an easy thing to wrap your mind around!), you might intuitively glimpse at how compression algorithms manages to work. The data that we are interested in is a vanishingly small island in a sea of useless random data. Any compression algorithm must expand inputs to shrink others, but we can let that useless random garbage "take" the expansion, while we concentrate our compression mojo on that small little island.

Nothing is free. If you want compression, you have to pay. But sometimes it's a really, really good deal.

#### Language Design Tradeoffs

Wherein the point I was setting up in Compression is made.

What was the point of that little discussion about compression? Talking about compression is a pure way to introduce a general principle: To make one thing easier, something else must be made harder.

Consider a C compiler, and the C language it defines. It makes "subroutines" much easier to deal with, but at the cost of not allowing you arbitrary sequences of machine code. You can embed assembler, but even then the assembler code can't be arbitrary. It must work in harmony with the rest of the C code.

Examples can be seen everywhere you look, once you know what you're looking for. Take two top languages from two different paradigms and pit the strengths of one against the weaknesses of the other. Watch a simple Prolog exercise devolve into a multi-thousand-line C program, because Prolog makes easy some things C makes hard. On the other hand, imagine trying to write the Linux kernel in Prolog; C makes many things easy that Prolog makes hard, or even impossible. The compilers that implement these languages had to make different tradeoffs; a correct choice for C may be horribly wrong in some other context.

Note that this section assumes a certain definition of power, in terms of what execution results you can obtain from the language. There's nothing wrong with this definition, but it is not the only one, nor is it the one I would normally use to discuss language power. It does have the virtue of being arguably the most concrete and fundamental definition.

Many people believe that the purpose of higher-level languages is to add power, but the best way to think about it is that they are taking away power, so they can trade that power in for some other benefit. No language, however clever, can actually add power to machine code; machine code is the very definition of the capability of the machine itself. A language can only manifest as limitations on the machine language the machine may actually execute. But in return, all the features that we normally think of as ``added'' features are actually built on these chosen limitations.

When learning a new language, the first thing to do is to seek out what the language forbids, and why it forbids them. The second is to find out what the language builds on top of those restrictions. For reasons that should be obvious, few languages loudly advertise the things they forbid, so this may take some research but you can usually find it. Beginner exercises: Find out what Java forbids, and why; find out why functional languages are so keen on immutability, and what restriction that names.

Nothing is free. If you want to make some compiled output or result easier to generate, you must make others harder or even impossible. If somebody is telling you something is free, that's just a sign to look even harder for the tradeoff; it must be a doozy if they're trying to hide it that hard. (Or maybe they're just ignorant. But one thing they aren't is right.)

#### Higher Level Tradeoffs

Expanding the points made in Compression and Compression in Languages up to the Methodology level, and beyond.

Above languages we have design methodologies and team management methodologies, and the trade offs continue. A methodology that works great with a team of four may crash and burn with a team of 400. A methodology that works well with that team of 400 may have horrible overhead for the team of four.

One common methodology trade off, not just in software, is to improve consistency at the cost of creativity and spontaneity. McDonalds uses this to their advantage, because they are all about consistency and not at all about creativity. Managing your programmers like a McDonalds is a recipe for disaster under most circumstances. Managing a McDonalds with an Agile methodology is little more than an amusing mental image:

Hello, welcome to McDonalds. We are currently not serving hot food while our chefs refactor the grill and deep-fat fryer to use the same heat source, and our drinks won't be available until Bob completes running the unit tests to verify that we haven't accidentally caused our fountain drink system to dispense boiling hot oil... again.

When I pointed out that software is unusually hard to find a single good methodology for, here is where the problem manifests; it isn't that you can't produce a good methodology, it's that there doesn't seem to be one methodology that works across the entire domain of project scale, resources available, and other business pressures. Usually when someone is rhapsodizing about what's wrong with software and how to fix it, they will propose extremely heavyweight methodologies involving vastly more design and testing than is typically used in a real software project, which is all find and dandy, but such things are very, very costly. Can a methodology that virtually guarantees that you will run out of money and go out of business before completing your perfect gem of a program actually be considered "the answer"? (Hint: No. Once "practicality" is discarded as a judgment criterion, who really cares about the rest of judgment?)

Every methodology's performance varies along many axes depending on the circumstances it is applied in, and the people who apply it. Bug rates, code output, code quality, cost, none of these are independent variables. If you want effectively bug-free code, you'll need a very expensive methodology. If you want a cheap methodology, prepare for either low quality or quantity of output. And so it goes for any number of different combinations of criteria.

Nothing is free. If a methodology makes it easier to obtain one type of result, it makes it harder to obtain another type of result with the same resources.

#### Judging Tradeoffs

The core point of the entire Programming Wisdom blogbook.

With every choice, no matter how large or small, we bring some things closer and push some things farther. Just as with compression, the idea is to make the trade offs that bring the things you want closer, and primarily make the things you don't care about harder, so in the end you come out ahead.

And also like compression, there is a large set of things that nobody really needs or wants, and the corresponding trade offs typically are as close as you can get to no-brainers in programming. If your program will ever be maintained at all, your variables shouldn't all be named with single characters, or in languages foreign to the primary maintainers, or with otherwise obviously-bad names. One can argue about how much care to put into naming, but clearly mario.save(thePrincess) is right out for a bit of code that downloads and verifies an XML file.

Long lists of such bad tradeoffs have been collected for your amusement and edification. Let those who have not read the list and pondered it beware.

To see some of these useless things in action, check out the surprisingly numerous novelty languages, like Whitespace, Brainfuck, or INTERCAL. Each is, in its own special way, a solid implementation of a certain programming philosophy. An insane philosophy, but a philosophy.

We can take these large sets of things nobody wants as our baseline. That means (more-or-less by definition) the low-hanging fruit in any methodology or language has already been picked, and in practice, whenever we allow something useful, we must be disallowing something else useful, since we've already kicked out the majority of useless things. This is why the highlighted paragraph I opened this chapter with includes the phrase everything worth talking about has benefits; sure, Whitespace theoretically makes some things easier, but who really cares? And sure, you can construct languages with obvious flaws that can then be simply corrected, but in practice, you are unlikely to run into such a beast on any real project, so who cares about arguing about such hypotheticals? (If you do encounter such a language in the real world, unless you are in the middle of creating a new environment for some sufficiently-good reason, it's probably proof you should run screaming.)

Here is the key sentence for this entire work: Programming wisdom is the ability to correctly determine the costs and benefits of a set of solutions and judge which solution is best.

Of course it sounds obvious when I put it that way, but clearly many people do not think this way. Many people will read this and say Well, duh!, talking the talk about already knowing this, but I can see by actual actions taken that few exhibit a deep, internalized understanding of this principle. There are many subtleties involved with putting this into practice and shortcuts to avoid; people frequently fail to account for entire value categories, let alone judging them correctly.

## What Is Programming?

What is programming?

When you first start programming, the answer is painfully obvious: Programming is making the computer do what you want.

Duh, right?

However, if you have any aptitude for it at all, you will rapidly get to the point where making the computer do what you want really isn't that hard. Oh, you may be betrayed by your environment, your libraries, even your hardware sometimes, and you never get to the point where you are immune to the multi-day debugging sessions, but in general, getting the computer to do what you want ceases to be a challenge.

The true challenge of programming is learning to want the right things, and then how to obtain those things, beyond the mere first-order consideration of "does it run right now?"

When you work on the same product for three years, you will learn to want maintainable code. Writing code that works is easy; learning how to write maintainable code is a worthy challenge.

When you start work on a project that has fifty man-years already put into it, you will learn to want code that is properly documented. Learning exactly what "properly documented" truly entails is a worthy challenge.

When a project exceeds the size that one person can comfortably hold in their head, you will learn to want code that is conceptually clean and easy to come back to; a worthy challenge.

And so on, for a number of worthy challenges.

If only it were so easy as "making the computer do what we want"!

Programmers that can make computers do things are a dime a dozen. Programmers that have learned to want the right things are unusual.

### On Values

When we make a judgment, we are saying that one thing has a larger value than another. We have a value function in our brains that takes two arguments and returns whether the first is less than, equal to, or greater than the other. As cruel or as crazy as it may sound, that function can take any two things and compare them; we have to make decisions like Value(CoolJob, CloseToFamily) all the time.

For my purposes, I merely want a concrete term to talk about "value functions" with, but it's interesting to consider the characteristics of the function. Is it a full ordering, or a partial ordering? A partial ordering seems more likely, but then implies there may be cycles, which would have to be resolved by a higher-order value function if you ever faced the corresponding choice. Yet a total ordering doesn't seem likely either. So it's probably simply inconsistent, which if you know anything about humans isn't too surprising. Making sure to think about your choices explicitly may help you avoid or minimize the inconsistencies, which it is probably safe to say are bad in practice.

Many people have an instinctive revulsion to the idea that such a value function exists, but it is important to understand that it does, no matter how much you'd like to avoid it. If you are in an improbable situation where you are forced to chose which of your children lives, you will have to make a choice. Refusing to choose is itself a choice, and is a piece of your value function.

In programming, as in life, people often end up using a value function provided by somebody else, rather than actually deciding what they value and what gets them the most with their resources.

Many people are peddling value functions. When someone advocates a methodology, they are also selling you on the value function which their methodology theoretically maximizes. When someone advocates a language or platform, they are selling you the value function where their language or platform has more value than any other.

I glibly use the phrase ``your true value function'' here without defining it. What exactly is your true value function? Well, if you'll pardon yet more glibness, I think that's a bit beyond the scope of this work. Some things can only be discovered for yourself, no?

The correspondence of these value functions to your true value function varies widely. If you use a value function that is greatly at odds with your true value function, you will invariably end up with less true value than you could have gotten. Simple proof: If you choose something you accidentally overvalued, then you pay the opportunity cost on the additional value of what you should have chosen. If you undervalue something, you are likely to end up choosing something else incorrectly and getting less than you should. If you choose poorly enough, you may end up with negative value, even discounting opportunity cost.

It's one thing to choose Java because it's the best choice for you. Maybe you value the cross-platform support. Maybe there's a Java-only library you can buy that gets you closer to the goal in one step than you can get in any other language; that can be valuable. Maybe all of your developers already know Java and that outweighs the costs it has.

It's quite another to choose it because "everybody uses Java", without analysis. Java's got some serious disadvantages, too; are you so willing to accept them without thought?

As you might get from the somewhat slanted tone of the last two paragraphs, I think the "use it because everybody else is using it" heuristic is one of the worst value functions you can adopt. Life's just not that easy.

I can also now re-express the goal of this book more precisely. My goal is not to try to provide you a value function; my goal is to help you build your own. Step one is realizing you need one. Some elements of my personal value function will shine through in this work, but that is because it is unavoidable (and therefore I don't try to hard), not because I truly want you to adopt mine. One of the important ways I can do that is help you consciously think about your value functions.

#### Emotional Value

When I was a child, I wanted to be like Spock. For those few who do not know whom I mean, Spock was the science officer on the star ship Enterprise in the famous 1960's sci-fi television show Star Trek. His claim to fame was being half-human and half-Vulcan. Vulcans were an alien race who are so naturally violent that they felt themselves forced to renounce their emotions and turn to a life of pure logic, lest they extinguish themselves in endless war. A common misconception is that Vulcans have no emotions; they do, but they rigidly suppress them.

Spock's major character arc involved a conflict between his "human side" and his "Vulcan side", between "emotions" and "logic". During the television series, he had chosen to attempt being pure Vulcan/logical, but he met with less success than he would have liked. Something never made clear was whether this was purely a personal issue or if perhaps being only half-Vulcan made it somehow biologically more difficult to live with the Vulcan philosophies and disciplines. (Most likely even the writers themselves were conflicted over their interpretation of this.)

Spock's initial choice reflects a common view of emotions, that they are intrinsically opposed to logic, unpredictable and uncontrollable, that you are forced to choose either the cold, cruel world of logic, or the squishy, utterly irrational world of emotion and feeling, but that ne'er the twain shall meet. This is view can be seen in our most ancient literature, where the fiery passions of somebody's loins are routinely contrasted with their cold, austere logical mind.

What absolute garbage!

It is not true that logic is a cold, cruel discipline. Logic is nothing at all; it is merely a way of manipulating a set of statements with some truth value to obtain new statements with some truth value. What is cold and cruel is not logic itself, but the axioms fed to the logic system to get it started. Certainly if you start with axioms like the axiom scheme for universal instantiation, then you are going to end up with a logic that is not capable of dealing with emotion. But then, it's not capable of dealing with much of anything outside of pure math without fully specifying the universe, something generally considered impractical. (And possibly not even if you did provide such a specification; it is impossible to prove that the Universe works solely according to any given logical axiom system.)

On the other hand, feed logic what you know about emotions from life, like

If you prick us, do we not bleed? if you tickle us, do we not laugh? if you poison us, do we not die? and if you wrong us, shall we not revenge?

and use a form of probabilistic, experience-fed logic we call "common sense", and by golly, those emotions become highly tractable and reasonably predictable.

It is true that mere knowledge about emotions does not intrinsically allow you to manipulate them; the mere knowledge that you will be angry does not prevent you from becoming angry. However, it does allow you to indirectly manipulate your emotions, or the emotions of others, by giving you the insight to prevent the anger-inducing situation from arising in the first place, or if nothing else at least allowing you to prepare yourself for the anger and perhaps mentally rehearse your actions before your emotions determine them for you.

There are a lot of people who try to deal with their emotions as Spock did, by rigidly suppression or denial, and as a result, they assign no value to their emotional well-being, assuming it to be something they can simply ignore or change with raw willpower. But there is a value to your emotional well-being, a value greater than zero, and this should never be ignored or downplayed because of stupid ideas about emotions. You should not consider the value of loved one's emotional well-being to be zero either; this can be harder since you don't directly experience their emotions.

This concept will generally not come up directly throughout this book, but it is a constant background presence. Your mental state is always a consideration in programming decisions. On the trivial level, the answer to the tricky problem of weighing the costs and benefits of five competing solutions to your problem may be to give up for the night, call a couple of your friends, and hit the bar for the rest of the night. This is not a joke.

On a more serious level, if you are constantly being forced into sub-optimal solutions by an authority figure and the prospect of having to spend another month patching servers and restoring backups that could have been prevented if you had just been given the opportunity to take two days and write the input parsing routines correctly, when it comes time to decide whether or not to keep the job or look elsewhere, do not make the mistake of valuing your emotional state at zero. Only you can decide the exact value, but you should make that decision with the full awareness of the value of your emotions.

If you are making programming decisions that affect a team, you mustn't ignore these factors either. Working your team 80 hours a week isn't a neutral decision, because not only are they not really doing 80 hours of work anyhow (one of the few facts about productivity that has abundant scientific support), but you are wrecking their lives, burning them out, and, if you still can't find it in yourself to care about that, greatly increasing the chances that you will lose employees, which also represents a loss of knowledge and skills that can only be painfully replaced, if that.

Programming seems to attract a lot of people, both managers and managees, that somehow have come to the conclusion that being human is a failing that can be overcome merely with sufficient will power. Pervasive belief does not make it true.

You are not a Vulcan. You should not plan to live like one.

#### The Money Value Function

I've loosely defined the value function (link) to only compare two "things", without further specifying what "things" it can take, because some things we put in there (like CloseToFamily) are fundamentally non-numeric properties. But some people have their own specializations of this value function. One that almost nobody will admit to using, but a lot of people live by, is the Money value function. This function takes just one argument and returns a single concrete number with the unit "Dollars" (or relevant local currency).

I do not recommend using the Money value function personally, but probably not for the reasons you think. It's not that it is an intrinsically bad idea, it is that humans have a proven track record of being unable to use this metric rationally; the problem lies with us, not the metric. Part of what may help you to rationalize this is to recognize that while I may be willing to sell eight hours of my day to my employer for $XXX, it does not mean that I will sell the next eight for the same price. If you balance the need to be with your family against the more fundamental need to provide for them and recognize how values shift over time, you can "get away with" using this value function. Still, it's tricky; I don't do it. I think conventional wisdom on this topic is essentially correct, though as usual, wrong about why.

I do not recommend this function for personal use, but it is a value function you should become familiar with, because it is the one in use by businesses.

This sounds horrible, but the horror doesn't intrinsically come from using a dollar-based function. It comes from the valuation the business places upon the non-concrete factors like ``employee happiness'' and other such things. Contrary to popular belief, no company can afford to put a negatively-infinite valuation on employee happiness, nor can they afford a true zero (if nothing else local labor laws, unions, or if necessary vigilante justice will impose some lower limit), but certainly very low values can be used.

It is the nature of that function that differentiates companies, not whether or not they have one.