Lazy Programmer's Guide

This essay was lightly updated in July 2006, but my final goal is to roll the disparate pieces described here into a unified book on programming. However, this will take years, so in the meantime I should at least not remove this from the web.

This essay shares some good tips and tricks I've picked up. The putative guiding principle was what I called "lazy programming", which I characterized as wanting to "minimize the time spent on all tasks... and is willing to go to great lengths to accomplish this goal. At all times, effort spent must be for the maximal gain."

At the time that I first wrote this, I believed the first bit of that was the important bit, hence the "Lazy Programmer" in the title. I have since come to believe that the last bit, a toss-off point at the time, is actually far more fundamental, and will serve as the unifying point of my book.

In the meantime, here are the various bits and pieces that I had here:

Friction Matters

In the real world, "friction" controls how easy it is to get from point A to point B. Here I use "friction" to include all forces that prevent you from sliding around, not just what we technically call "friction" in physics. Let's say you want to move a huge rock from your yard. If the friction is too high, perhaps because you have no tools to move it with, you simply won't bother. If you have the tools, like a backhoe, lowering the effective friction (because, as I said, I'm not talking strictly about the physics-class kind), you might move it somewhere where you can forget it. Mission accomplished.

Many programmers' imagination stops there. They did the job they wanted to do. But the story goes on. Suppose the effort involved in overcoming the friction dropped even farther. Now perhaps you'll move the rock somewhere decorative, knowing you can move it again later if it becomes a problem. If it's a huge effort to move, you won't do that. (Look around... how many decorative rocks do you see on people's lawns in the range of tens of pounds? How many do you see in the range of thousands?)

If it drops even further, until it becomes effectively zero, more possibilities emerge that you might not have even considered. Maybe you'll celebrate Halloween by recreating Stonehenge. After all, it's mere minutes' work to take it apart and re-arrange it.

As usual, trying to apply a physical metaphor to a programming topic ultimately confuses more than enlightens. In programming, you can reduce the friction of a task far, far beyond any reasonable physical metaphor allows. The metaphor is intended to highlight how the amount of effort something takes critically affects not only what you can do with a tool, but what you'll even consider... that's two separate effects, mind you, not just one.

The relationship is an inverse-type relationship. As friction halves, the possibilities double (or more). This is not amazing... what's amazing is that it's true, all the way down to the character level.

Even single characters can slow you down if they don't belong in the flow. I for one have a real problem with Perl's extensive use of characters like \$@#. They slow you down, and are really there because the language is ill-specified, not because they increase your convenience.

Of course most of the time you're not writing a language. The really practical thing to carry away from this is that when approaching a problem, you should use your imagination and think big for a moment. Step back from the problem, don't reach straight for a brute-force (backhoe) solution, and think about the other possibilities. One of the most amazing moments in computer science is when you discover that a powerful solution that "empowers" you to toss around boulders like so many marbles is actually easier to build and use than the brute-force solution! A moment spent thinking can literally be worth months spent coding. (I do not use the word "literally" to mean "figuratively", like many people. I mean "literally" here. The payoff is rarely that large, but it does happen!)

Speaking from experience, once you have that more general solution, you'll think of other things you can do with it that you wouldn't have even considered before. You can end up with a superior solution and still be ahead in time! Thinking can result in both less and better coding. Not only that, but thinking is portable and can be done anywhere; coding tends to require a computer.

The importance of this is magnified when you are working in a group situation; every person who does something that you have reduced the friction for benefits. The total effect adds up fast.

Small Components Mean Better Fluidity

You should always look to make small components, then hook them together to make larger components, and so on. Exactly what a "component" is will depend on your environment and your problems. In OO languages, a class is a component. In functional languages, functions and transforms may be components. In procedural languages (and in OO languages that allow it), functions are components.

Furthermore, it should be an explicit goal at all times that the components be as standalone as possible, even if you never intend to use them again. To make them standalone in theory, you must reduce the dependencies to the bare minimum. Frequently, programmer laziness (the bad kind) makes every component intimately aware of, and therefore dependent on, every other component.

This means that when (not if) the requirements change, any change to one component makes it that much more likely that all of them will need to change. Minimizing the dependencies makes it easier to rapidly re-combine the constituent pieces in any necessary configuration.
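As a rough sketch of what "minimal dependencies" can look like in practice, here are two small Python components. Everything in it (the names total_due and send_invoice, the invoice scenario, the deliver parameter) is invented for illustration. Each component receives only what it needs, rather than reaching into some application-wide object.

```python
from typing import Callable, Iterable


def total_due(prices: Iterable[float], tax_rate: float) -> float:
    """Needs only numbers, so it can be reused anywhere numbers are available."""
    return round(sum(prices) * (1 + tax_rate), 2)


def send_invoice(prices: Iterable[float], tax_rate: float,
                 deliver: Callable[[str], None]) -> None:
    """Delivery is passed in, so this component does not care whether the
    message goes to e-mail, a log file, or a list in a test."""
    deliver(f"Amount due: {total_due(prices, tax_rate):.2f}")


# Usage: any callable that accepts a string will do as the delivery mechanism.
outbox = []
send_invoice([10.0, 5.0], 0.1, outbox.append)
```

If the delivery requirement changes, only the callable handed to send_invoice changes; neither function needs to be touched.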

This is of course just the OO idea of encapsulation. The difference between my treatment and the standard treatment of the problem lies in motivation. Encapsulation is frequently cast as a method of defense against other programmers; this is rarely desirable in practice, unless you are building a class library that will become very popular and must be easy to use without mucking up the internal data structures. Many programming communities in fact reject the idea of enforcing encapsulation in the language, preferring to rely on inter-programmer communication to obtain all the supposed benefits of enforced encapsulation. It is much more often a better idea to practice encapsulation as defense against the future, with the idea that changes will probably not need to cross encapsulation boundaries; perhaps looking at it that way you will actually be motivated to practice it. Believe me, the benefits are worth it.

(I have expanded on this idea significantly in another essay.)

It is more difficult to make a truly stand-alone component than something that cheats a little, and it's not always worth it to have perfect purity. But if it does become worth it, it will be that much easier to do if you designed it with that goal in mind in the first place. A generally accepted figure in the software engineering community is that it is three times as hard to make a stand-alone, generally useful component out of something as it is to make a specialized component with extensive system knowledge. However, by keeping these principles in mind, sometimes you can get a stand-alone component "for free", as your component moves through several iterations, and perhaps gets embedded into other programs.

Always Check Validity, Never Check Lack of Invalidity

You will be writing data validation code until the day you stop writing code. In the vast majority of situations, there is a wrong way to check data validity/integrity and a right way.

Wrong Way

The rookie mistake is to check for the lack of invalidity. Watch the double negative! Having graded many freshman projects I find that beginning programmers seem to gravitate towards this. Resist.

The fundamental problem with this approach is that the universe is essentially out to get you: There are many more ways input can be bad than good. Good input is usually defined as "good" according to some small set of rules. Bad input can have illegal chars, be too long, be syntactically correct but logically wrong, contain data that will cause your processing routines to crash, or cause arbitrary effects in other (too-trusting) parts of your program, and those are just the vague effects. Ask me about a specific situation and I can spin even more detailed (and plausible) tales of bad-data woe.

The canonical example of the failure of this approach has to be the times when you want to constrain text input to certain chars, like alphanumeric chars for passwords or file names. The standard bad thing to do is to run through the text input and see if it has a dash, or an underscore, or an apostrophe, etc. (This is easier nowadays with regex libraries and defined character classes, but this used to be really hard.) This was ineffective even in the old ASCII world, frequently missing hundreds of "bad chars", many with exciting effects such as "Line Feed", but in this world of Unicode this approach is just a sick joke. Yet I still see it.

Right Way

The right way is to check for validity. If you wish to constrain the chars, check that each char in the input is permitted. If you're checking that a buffer won't overflow, check each char as it comes in to make sure it won't overflow the buffer. (When I was grading freshman projects, I saw a lot of code that attempted to check after the fact whether the input had overflowed the buffer. Close, no cigar.)
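To make the contrast concrete, here is a minimal Python sketch. The function names, the particular blacklist, the allowed set (ASCII letters and digits), and the 32-character limit are all assumptions chosen for illustration; substitute whatever your own rules are.

```python
import string

# Assumed policy for illustration: ASCII letters and digits only.
ALLOWED_CHARS = frozenset(string.ascii_letters + string.digits)

# A hypothetical blacklist of the kind described above; it can never be complete.
KNOWN_BAD_CHARS = frozenset("-_'")


def lacks_known_bad_chars(text: str) -> bool:
    """Wrong way: accept anything that avoids the characters we thought of."""
    return not any(ch in KNOWN_BAD_CHARS for ch in text)


def is_valid_name(text: str, max_length: int = 32) -> bool:
    """Right way: accept only non-empty, bounded input made of permitted chars."""
    if not 0 < len(text) <= max_length:
        return False
    return all(ch in ALLOWED_CHARS for ch in text)
```

A string such as "report\n2006" sails through lacks_known_bad_chars, because nobody remembered to list the line feed, but is_valid_name rejects it without ever having heard of line feeds. Note that the sketch deliberately avoids str.isalnum, which also accepts non-ASCII letters and digits and so would quietly widen the allowed set.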

A lesser-known corollary is that if the input fails, toss it. Do not try to salvage it unless you really know what you are doing, because the act of salvaging it is often a security hole itself.

If you absolutely must clean up your input, run your cleaning routine over the text until it converges (i.e., running it doesn't change the text). Watch for infinite loops.
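Here is a small Python sketch of why a single pass is not enough, and of one way to guard the loop. The cleaning rule (deleting the literal tag "<script>"), the function names, and the max_passes cap are illustrative assumptions, not a recommended sanitizer.

```python
import re

# Hypothetical cleaning rule for illustration: delete every "<script>" tag.
_SCRIPT_TAG = re.compile(r"<script>", re.IGNORECASE)


def strip_script_tags_once(text: str) -> str:
    """One cleaning pass; on its own this can leave a new tag behind."""
    return _SCRIPT_TAG.sub("", text)


def clean_until_stable(text: str, max_passes: int = 100) -> str:
    """Apply the cleaning pass until it stops changing the text."""
    for _ in range(max_passes):
        cleaned = strip_script_tags_once(text)
        if cleaned == text:
            return cleaned
        text = cleaned
    # Refuse rather than loop forever or hand back half-cleaned input.
    raise ValueError("input did not converge; rejecting it")
```

Given "<scr<script>ipt>alert(1)", one pass removes the inner tag and in doing so assembles a fresh "<script>" from the leftovers; only the second pass removes that one, and a third confirms nothing changes. The pass cap is the guard against an infinite loop, and hitting it is treated as a reason to toss the input.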

Try Other, Non-Mainstream Languages!

It's certainly marketable to learn Java, Visual Basic, C++, a smattering of SQL, some HTML, and maybe NT/2000 admin and call yourself a computing professional. But you'll really be missing out.

Learn some other languages. First, learning other languages in computer science has a very similar effect to learning multiple human languages. It can expand your thinking and show you new perspectives on things, in ways you can't even imagine until after you've done it. In fact, for this reason, I strongly suggest that you get at least one functional language under your belt, and make an effort to use it for at least one interesting program. . . especially if you never learned one in school. The functional way of thinking is invaluable, even if you never really get into the computer theoretical aspect of provability, closures, side-effect-free code, etc.

Second, learning multiple languages in computer science shares another aspect with learning multiple human languages: The first few are the toughest, after that it's not so bad. The more diverse your early languages, the more true this is. Not to downplay the accomplishments of human language polyglots, but after the first three or four languages, picking up the next five or six is relatively (relatively!) easy, especially if you grew up in an environment where many languages were spoken in your day-to-day life. Learn LISP/Smalltalk, C++/Java, C (as distinct from C++), and Perl/Python, and you're a long way towards understanding anything at all. It is quite likely in your career you will need to learn another language. It would be good for your career if it doesn't take you a year to become fluent in it, because you will be up against people who will pick up 99% of the language in two or three intensive self-training weeks.

(Note I said "language", not "API" or "library set". However, I also note that there are often significant commonalities between APIs, too, even across languages; for instance, there are two basic approaches to XML parsing, and once you learn them in any language, it's extremely likely to carry to another, even if they do it slightly differently.)
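The two approaches are not named above; assuming they are the familiar tree-building style and the event-driven (callback) style, here is a sketch of both using Python's standard library. The sample document and the function names are made up for illustration.

```python
import xml.sax
import xml.etree.ElementTree as ET

SAMPLE = b"<books><book title='A'/><book title='B'/></books>"  # made-up data


def titles_via_tree(document: bytes) -> list:
    """Tree style: parse everything into memory, then walk the result."""
    root = ET.fromstring(document)
    return [book.get("title") for book in root.iter("book")]


class _TitleCollector(xml.sax.ContentHandler):
    """Event style: the parser calls us back as it meets each element."""

    def __init__(self) -> None:
        super().__init__()
        self.titles = []

    def startElement(self, name, attrs):
        if name == "book":
            self.titles.append(attrs.get("title"))


def titles_via_events(document: bytes) -> list:
    collector = _TitleCollector()
    xml.sax.parseString(document, collector)
    return collector.titles
```

Both functions return the same titles for the sample. The shape of each (build a tree and query it, or register a handler and accumulate state) is the part that tends to carry over to other languages' XML libraries.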

Selective Verbosity Can Greatly Enhance Coding Clarity

Both of the following code segments are in Python, which you may or may not know. Which of the following is more understandable?

import operator
from functools import reduce  # reduce is not a builtin in Python 3

def exampleOne ( l ):
    return (reduce(operator.__add__, filter(lambda a: a % 2 == 0, l) ) -
            reduce(operator.__mul__, filter(lambda a: a % 2 == 1, l) ))

Or this:

def exampleTwo ( numList ):
    even = lambda a: a % 2 == 0
    odd = lambda a: a % 2 == 1

    evens = filter ( even, numList )
    odds = filter ( odd, numList )

    evenSum = reduce ( operator.__add__, evens )
    oddMult = reduce ( operator.__mul__, odds )

    return evenSum - oddMult

Of course they're the same code fragment, and to any programmer, it's obvious what I did to create that. But, which is more understandable?

The first is a code block that simply numbs the mind when you read it. It has no handle, no way to get a grasp on what it means.

The second has several variable names which provide valuable clues about what's going on. I've written it in such a way that if you simply read it out loud, you've come very close to understanding what it does. (Try it.)

What it does is take a list of numbers (a key fact in understanding the function that the first writeup does no more than hint at), separate the odd and even numbers, add up all the evens, multiply all the odds, then subtract the product of the odds from the sum of the evens.

The price? For the most part, it's only extra typing. In a dynamic language like Python, you will pay a vanishingly small price for the variable lookups, almost certainly swamped by the filtering and reducing functions. For a static language like C or C++, you might actually gain execution time from this; the optimizer doesn't have to search so hard for subexpressions to optimize, one of the most important optimizations, because you've already laid them out. (This is more of an issue with more complicated expressions, where you may be tempted to write things in such a way that the computer can't know whether the subexpression is a constant.)

Another benefit you might not expect until you start doing this habitually (I didn't) is that it really helps code reusability on the micro level (i.e., copy/paste level). Both functions filter out even numbers, but if you ever want to copy & paste that code somewhere else, the second form is much easier to deal with. Of course, you'll do that with slightly more complicated code, where it would actually be work to separate it out of the messy expression.

Once you start doing this sort of thing, the question of "what code to comment" goes away. My comment heuristic is simply that I comment when I can't be as clear as I am above. Sometimes ya just gotta work yer deep wizardry; then you comment. The rest of the time, your code ought to be virtually self-commenting.

Finally, perhaps the most important benefit of all, the second code is easier to manipulate in your head. The human brain can only process so much at a time. The first code block is more-or-less perceived as a single block, which makes piece-wise debugging difficult mentally, as well as understanding what it does. The second form fits our brains much more easily, since it only does one or two things at a time, and shows us how to assemble them in small steps. This is another benefit that you must see to believe. Because of this, you will actually write code faster, because understanding it is one of the hardest things.

For a little extra typing, you'll write more maintainable code, better code, more quickly. Sounds like a deal to me.

Selective Brevity Can Greatly Enhance Coding Clarity

This is the other side of the coin from the preceding section, but this side is much better known. If you have a twenty-line segment of code in the middle of a forty-line function, and it does something distinct, wrap it into a function and give that function a descriptive name.
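A compressed Python illustration of the idea, using an invented scenario (reading "key = value" configuration lines) and invented names: the distinct chunk of work gets its own function, and the caller shrinks to a summary of what happens.

```python
def parse_config_line(line: str):
    """Return (key, value) for a 'key = value' line, or None for blanks and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    key, separator, value = stripped.partition("=")
    if not separator:
        raise ValueError(f"expected 'key = value', got {line!r}")
    return key.strip(), value.strip()


def load_config(lines) -> dict:
    """With the parsing named and moved out, this loop says what, not how."""
    settings = {}
    for line in lines:
        entry = parse_config_line(line)
        if entry is not None:
            key, value = entry
            settings[key] = value
    return settings
```

As a side benefit, parse_config_line can now be tested and reused on its own, which ties back to the earlier point about small components.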

On the topic of brevity, I personally abhor Hungarian Notation. (Look for it on the web if you don't know what it is.) Hungarian notation dates from before C++, and the corresponding strengthening of the type system over C. It is almost always unnecessary, and turns your code into a Perl nightmare of meaningless gibberish. A case can be made that for common types, it can be helpful, but as soon as you can't immediately remember what the prefix means, you have done negative work. Sometimes I will append the type of a variable, if it will make things clear; this is especially helpful when writing code that maintains or reflects state across multiple representations, where there may be a UI and a data model value with the same name.

On a related note, I have not figured out why so much C and C++ code I see "in the wild" is so incredibly redundant. If you're going to use a particular window handle 10 times in a function, do you really need to fully write out window->parent.ChildRegistery("Something", &cPntr, s_State)->element(*cb_Checkbox1) every single time?
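The fix is the same in any language: look the thing up once and give it a short local name. Here is a sketch in Python rather than C++, to match the other code in this essay; the Window and Checkbox classes are hypothetical stand-ins invented for the example, not any real toolkit's API.

```python
from dataclasses import dataclass, field


@dataclass
class Checkbox:
    """Hypothetical stand-in for a UI element."""
    label: str = ""
    checked: bool = False
    enabled: bool = True


@dataclass
class Window:
    """Hypothetical stand-in for a window holding named widgets."""
    widgets: dict = field(default_factory=dict)


def reset_checkbox(window: Window, name: str) -> None:
    # Look the element up once and give it a short, meaningful name...
    checkbox = window.widgets[name]
    # ...instead of repeating window.widgets[name] on every line below.
    checkbox.label = name.title()
    checkbox.checked = False
    checkbox.enabled = True
```

The longer the lookup expression, the bigger the win, both for the reader and for anyone who later has to change how the element is found.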

There has been a traditional split between the "industry" and the "academic" worlds, with each side looking down its nose at the other. The fact of the matter is that both sides have excellent reason to think what they do about the other side.

The industry is in the trenches, and prides itself on getting things done. People in the industry often look down on the "ivory tower" freaks who spin complicated theories that are useless in the real world, because they never come down and live in the real world to see what it's like.

(Though it's worth remembering that some of those "complicated theories" and other trash academia produces are a side effect of the theses that master's and PhD students write to get their degrees; these people are quickly flushed from academia and go down to the "real world", where they usually promptly forget about their weird theories.)

These comments are not entirely unjustified, but it's worth remembering that the academic world is populated with ferociously smart people who have a lot more time to think about things, with a lot less economic pressure, than those in industry. Some of those fancy theories may in fact be highly useful or practical if you'd give them a chance.

In your personal life, the split is only as real as you allow it to be. Learn from the academics. In particular for programmers of all varieties, the academic discipline of "software engineering" is a highly practical field of study that can have significant benefits for you in your life. However, you will want to temper the academic ideas of software engineering with the experiences of real software engineers, which thanks to the internet are now readily available online. The best way to avoid falling into a mere fad is to read widely.

You ignore wisdom at your own peril. . . as true for programming as it is in all of life. Look at the design community. Learn what "design patterns" are. Examine your processes to see what can be improved. See what tools there may be to help you that you didn't even know existed. Don't let your own opinions prevent you from learning something very useful and applicable to the above rules.

Jerf.org : Lazy Programmer's Guide