The unifying principle of this book is:
Everything costs something. Everything worth talking about has benefits. Nothing is free; nothing has infinite value.
This sounds very simple and unobjectionable, but experience shows people have a hard time putting it into practice and realizing how pervasive the principle is.
When we make a judgment, we are saying that one thing has a larger value than another. We have a value function in our brains that takes two arguments and returns whether the first is less than, equal to, or greater than the other. As cruel or as crazy as it may sound, that function can take any two things and compare them; we have to make decisions like Value(CoolJob, CloseToFamily) all the time.
Many people have an instinctive revulsion to the idea that such a value function exists, but it is important to understand that it does, no matter how much you'd like to avoid it. If you are in an improbable situation where you are forced to chose which of your children lives, you will have to make a choice. Refusing to choose is itself a choice, and is a piece of your value function.
In programming, as in life, people often end up using a value function provided by somebody else, rather than actually deciding what they value and what gets them the most with their resources.
Many people are peddling value functions. When someone advocates a methodology, they are also selling you on the value function which their methodology theoretically maximizes. When someone advocates a language or platform, they are selling you the value function where their language or platform has more value than any other.
The correspondence of these value functions to your true value function varies widely. If you use a value function that is greatly at odds with your true value function, you will invariably end up with less true value than you could have gotten. Simple proof: If you choose something you accidentally overvalued, then you pay the opportunity cost on the additional value of what you should have chosen. If you undervalue something, you are likely to end up choosing something else incorrectly and getting less than you should. If you choose poorly enough, you may end up with negative value, even discounting opportunity cost.
It's one thing to choose Java because it's the best choice for you. Maybe you value the cross-platform support. Maybe there's a Java-only library you can buy that gets you closer to the goal in one step than you can get in any other language; that can be valuable. Maybe all of your developers already know Java and that outweighs the costs it has.
It's quite another to choose it because "everybody uses Java", without analysis. Java's got some serious disadvantages, too; are you so willing to accept them without thought?
As you might get from the somewhat slanted tone of the last two paragraphs, I think the "use it because everybody else is using it" heuristic is one of the worst value functions you can adopt. Life's just not that easy.
I can also now re-express the goal of this book more precisely. My goal is not to try to provide you a value function; my goal is to help you build your own. Step one is realizing you need one. Some elements of my personal value function will shine through in this work, but that is because it is unavoidable (and therefore I don't try to hard), not because I truly want you to adopt mine. One of the important ways I can do that is help you consciously think about your value functions.
It is not true that logic is a cold, cruel discipline. Logic is nothing at all; it is merely a way of manipulating a set of statements with some truth value to obtain new statements with some truth value. What is cold and cruel is not logic itself, but the axioms fed to the logic system to get it started. Certainly if you start with axioms like the axiom scheme for universal instantiation, then you are going to end up with a logic that is not capable of dealing with emotion. But then, it's not capable of dealing with much of anything outside of pure math without fully specifying the universe, something generally considered impractical. (And possibly not even if you did provide such a specification; it is impossible to prove that the Universe works solely according to any given logical axiom system.)
On the other hand, feed logic what you know about emotions from life, like
If you prick us, do we not bleed? if you tickle us, do we not laugh? if you poison us, do we not die? and if you wrong us, shall we not revenge?
and use a form of probabilistic, experience-fed logic we call "common sense", and by golly, those emotions become highly tractable and reasonably predictable.
It is true that mere knowledge about emotions does not intrinsically allow you to manipulate them; the mere knowledge that you will be angry does not prevent you from becoming angry. However, it does allow you to indirectly manipulate your emotions, or the emotions of others, by giving you the insight to prevent the anger-inducing situation from arising in the first place, or if nothing else at least allowing you to prepare yourself for the anger and perhaps mentally rehearse your actions before your emotions determine them for you.
There are a lot of people who try to deal with their emotions as Spock did, by rigidly suppression or denial, and as a result, they assign no value to their emotional well-being, assuming it to be something they can simply ignore or change with raw willpower. But there is a value to your emotional well-being, a value greater than zero, and this should never be ignored or downplayed because of stupid ideas about emotions. You should not consider the value of loved one's emotional well-being to be zero either; this can be harder since you don't directly experience their emotions.
This concept will generally not come up directly throughout this book, but it is a constant background presence. Your mental state is always a consideration in programming decisions. On the trivial level, the answer to the tricky problem of weighing the costs and benefits of five competing solutions to your problem may be to give up for the night, call a couple of your friends, and hit the bar for the rest of the night. This is not a joke.
On a more serious level, if you are constantly being forced into sub-optimal solutions by an authority figure and the prospect of having to spend another month patching servers and restoring backups that could have been prevented if you had just been given the opportunity to take two days and write the input parsing routines correctly, when it comes time to decide whether or not to keep the job or look elsewhere, do not make the mistake of valuing your emotional state at zero. Only you can decide the exact value, but you should make that decision with the full awareness of the value of your emotions.
If you are making programming decisions that affect a team, you mustn't ignore these factors either. Working your team 80 hours a week isn't a neutral decision, because not only are they not really doing 80 hours of work anyhow (one of the few facts about productivity that has abundant scientific support), but you are wrecking their lives, burning them out, and, if you still can't find it in yourself to care about that, greatly increasing the chances that you will lose employees, which also represents a loss of knowledge and skills that can only be painfully replaced, if that.
Programming seems to attract a lot of people, both managers and managees, that somehow have come to the conclusion that being human is a failing that can be overcome merely with sufficient will power. Pervasive belief does not make it true.
You are not a Vulcan. You should not plan to live like one.
The Money Value Function
I do not recommend this function for personal use, but it is a value function you should become familiar with, because it is the one in use by businesses.
This sounds horrible, but the horror doesn't intrinsically come from using a dollar-based function. It comes from the valuation the business places upon the non-concrete factors like ``employee happiness'' and other such things. Contrary to popular belief, no company can afford to put a negatively-infinite valuation on employee happiness, nor can they afford a true zero (if nothing else local labor laws, unions, or if necessary vigilante justice will impose some lower limit), but certainly very low values can be used.
It is the nature of that function that differentiates companies, not whether or not they have one.
Tradeoffs in Language Design
What was the point of that little discussion about compression? Talking about compression is a pure way to introduce a general principle: To make one thing easier, something else must be made harder.
Consider a C compiler, and the C language it defines. It makes "subroutines" much easier to deal with, but at the cost of not allowing you arbitrary sequences of machine code. You can embed assembler, but even then the assembler code can't be arbitrary. It must work in harmony with the rest of the C code.
Examples can be seen everywhere you look, once you know what you're looking for. Take two top languages from two different paradigms and pit the strengths of one against the weaknesses of the other. Watch a simple Prolog exercise devolve into a multi-thousand-line C program, because Prolog makes easy some things C makes hard. On the other hand, imagine trying to write the Linux kernel in Prolog; C makes many things easy that Prolog makes hard, or even impossible. The compilers that implement these languages had to make different tradeoffs; a correct choice for C may be horribly wrong in some other context.
Many people believe that the purpose of higher-level languages is to add power, but the best way to think about it is that they are taking away power, so they can trade that power in for some other benefit. No language, however clever, can actually add power to machine code; machine code is the very definition of the capability of the machine itself. A language can only manifest as limitations on the machine language the machine may actually execute. But in return, all the features that we normally think of as ``added'' features are actually built on these chosen limitations.
When learning a new language, the first thing to do is to seek out what the language forbids, and why it forbids them. The second is to find out what the language builds on top of those restrictions. For reasons that should be obvious, few languages loudly advertise the things they forbid, so this may take some research but you can usually find it. Beginner exercises: Find out what Java forbids, and why; find out why functional languages are so keen on immutability, and what restriction that names.
Nothing is free. If you want to make some compiled output or result easier to generate, you must make others harder or even impossible. If somebody is telling you something is free, that's just a sign to look even harder for the tradeoff; it must be a doozy if they're trying to hide it that hard. (Or maybe they're just ignorant. But one thing they aren't is right.)
Tradeoffs in Higher Levels
Above languages we have design methodologies and team management methodologies, and the trade offs continue. A methodology that works great with a team of four may crash and burn with a team of 400. A methodology that works well with that team of 400 may have horrible overhead for the team of four.
One common methodology trade off, not just in software, is to improve consistency at the cost of creativity and spontaneity. McDonalds uses this to their advantage, because they are all about consistency and not at all about creativity. Managing your programmers like a McDonalds is a recipe for disaster under most circumstances. Managing a McDonalds with an Agile methodology is little more than an amusing mental image:
Hello, welcome to McDonalds. We are currently not serving hot food while our chefs refactor the grill and deep-fat fryer to use the same heat source, and our drinks won't be available until Bob completes running the unit tests to verify that we haven't accidentally caused our fountain drink system to dispense boiling hot oil... again.
When I pointed out that software is unusually hard to find a single good methodology for, here is where the problem manifests; it isn't that you can't produce a good methodology, it's that there doesn't seem to be one methodology that works across the entire domain of project scale, resources available, and other business pressures. Usually when someone is rhapsodizing about what's wrong with software and how to fix it, they will propose extremely heavyweight methodologies involving vastly more design and testing than is typically used in a real software project, which is all find and dandy, but such things are very, very costly. Can a methodology that virtually guarantees that you will run out of money and go out of business before completing your perfect gem of a program actually be considered "the answer"? (Hint: No. Once "practicality" is discarded as a judgment criterion, who really cares about the rest of judgment?)
Every methodology's performance varies along many axes depending on the circumstances it is applied in, and the people who apply it. Bug rates, code output, code quality, cost, none of these are independent variables. If you want effectively bug-free code, you'll need a very expensive methodology. If you want a cheap methodology, prepare for either low quality or quantity of output. And so it goes for any number of different combinations of criteria.
Nothing is free. If a methodology makes it easier to obtain one type of result, it makes it harder to obtain another type of result with the same resources.
With every choice, no matter how large or small, we bring some things closer and push some things farther. Just as with compression, the idea is to make the trade offs that bring the things you want closer, and primarily make the things you don't care about harder, so in the end you come out ahead.
And also like compression, there is a large set of things that nobody really needs or wants, and the corresponding trade offs typically are as close as you can get to no-brainers in programming. If your program will ever be maintained at all, your variables shouldn't all be named with single characters, or in languages foreign to the primary maintainers, or with otherwise obviously-bad names. One can argue about how much care to put into naming, but clearly mario.save(thePrincess) is right out for a bit of code that downloads and verifies an XML file.
Long lists of such bad tradeoffs have been collected for your amusement and edification. Let those who have not read the list and pondered it beware.
We can take these large sets of things nobody wants as our baseline. That means (more-or-less by definition) the low-hanging fruit in any methodology or language has already been picked, and in practice, whenever we allow something useful, we must be disallowing something else useful, since we've already kicked out the majority of useless things. This is why the highlighted paragraph I opened this chapter with includes the phrase everything worth talking about has benefits; sure, Whitespace theoretically makes some things easier, but who really cares? And sure, you can construct languages with obvious flaws that can then be simply corrected, but in practice, you are unlikely to run into such a beast on any real project, so who cares about arguing about such hypotheticals? (If you do encounter such a language in the real world, unless you are in the middle of creating a new environment for some sufficiently-good reason, it's probably proof you should run screaming.)
Here is the key sentence for this entire work: Programming wisdom is the ability to correctly determine the costs and benefits of a set of solutions and judge which solution is best.
Of course it sounds obvious when I put it that way, but clearly many people do not think this way. Many people will read this and say Well, duh!, talking the talk about already knowing this, but I can see by actual actions taken that few exhibit a deep, internalized understanding of this principle. There are many subtleties involved with putting this into practice and shortcuts to avoid; people frequently fail to account for entire value categories, let alone judging them correctly.
Tradeoffs in Compression
The Pigeonhole principle is named for the fact that if you have five holes and six pigeons that need holes, no matter how you arrange the pigeons in the holes you have at least one hole with two pigeons in it.
A simple and air-tight proof that no algorithm can possible compress all sets of data is based on this principle. A real-world compression algorithm defines a reversible mapping of input bytes to output bytes. (It has to be "reversible", or you can't write a perfect decompresser.) Imagine starting with a compression algorithm that maps all input back to itself (the identity mapping). Now, if you map the five-byte input ABCDE to the four-byte output WXYZ (20% compression), the input string WXYZ now must go somewhere else. It can only increase in size in the putatively compressed output, because all outputs four bytes and smaller are now taken. In order for a compression algorithm to shrink some inputs, some others must grow.
This is not a rigorous proof, but the rigorous proof takes this form.
If this is true, than how can compression algorithms do useful work? Like many mathematical concepts, there are many equivalent ways of looking at the answer. The best formulation for my purpose is that algorithms can take advantage of the fact that not all data sets are equally probable.
The best way we can measure how much of the total possible space we will probably use is with a mathematical concept from information theory called entropy. This gives us an algorithm that can look at a data set and give us the number of bits it "really uses". Truly random data will tend to have an entropy measurement around 8 bits of entropy per byte, meaning that it is "really using" all 8 bits and is therefore incompressible; an endlessly repeating character will have an entropy very close to 0 bits per byte, meaning that it isn't "really" using those bits and is therefore compressible.
The entropy of English text is around 0.6 to 1.3 bits per character. Let's use 1 bit for our computing convenience. A modern ASCII-based encoding of this text uses 8 bits per character. Using these numbers, we can compare the total number of possible data strings for a given number of bytes, and the total amount of this possibility space that is "really consumed" by possible English phrases.
For a message of a mere 15 characters, the number of possible byte-based messages is 28*15, which is approximately 1.3*10^36. The number of possible English messages using this measure is 21*15, or 32,768.
Don't lose sight of the point here by taking that number too literally; it's a probabilistic thing. Arguing about piddling differences here or there is unimportant when the gulf is already 31 orders of magnitude, and growing exponentially with every byte!
If you take a moment to digest those numbers (1036 is not an easy thing to wrap your mind around!), you might intuitively glimpse at how compression algorithms manages to work. The data that we are interested in is a vanishingly small island in a sea of useless random data. Any compression algorithm must expand inputs to shrink others, but we can let that useless random garbage "take" the expansion, while we concentrate our compression mojo on that small little island.
Nothing is free. If you want compression, you have to pay. But sometimes it's a really, really good deal.