Don't Repeat Yourself and the Strong Law of Small Numbers

2024-02-19

There aren’t enough small numbers to meet the many demands made of them.

What this means is that far more mathematical patterns involve small numbers than there are small numbers to go around, so coincidences are inevitable: two completely distinct patterns will share their first few terms, but that sharing is essentially false and meaningless.

3Blue1Brown goes over one of the Wikipedia page’s examples of a sequence of terms that reads 1, 2, 4, 8, and 16. The obvious guess for the next term is 32, but for this sequence, it is actually 31.
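Here is a minimal sketch of that sequence, assuming the example meant here is the circle-division count (Moser's circle problem): put n points on a circle, join every pair with a chord, and count the regions. The closed form C(n, 4) + C(n, 2) + 1 is the standard one for that problem; the function name is mine.

```python
from math import comb

def circle_regions(n: int) -> int:
    """Regions formed by joining n points on a circle with all chords,
    assuming no three chords meet at a single interior point."""
    return comb(n, 4) + comb(n, 2) + 1

# Compare the circle-division counts with the powers of two they imitate.
for n in range(1, 7):
    print(n, circle_regions(n), 2 ** (n - 1))
# The two columns agree for n = 1..5 and split at n = 6 (31 vs 32).
```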

An OEIS lookup yields plenty of other sequences that contain those same five terms. Some interesting ones:

A180414: Number of different resistances that can be obtained by combining n one-ohm resistors; next term is 36.
A067945: Numbers k that divide 3^k - 1; next term is 20.
A004000: RATS: Reverse Add Then Sort the digits applied to previous term, starting with 1; next term is 77.
A033496: Numbers m that are the largest number in their Collatz (3x+1) trajectory; next term is 20.
A018487: Divisors of 496; next term is 31.
A326081: Number of subsets of {1..n} containing the product of any set of distinct elements whose product is <= n; next term is actually 32, the one after that is 56.
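Four of these are simple enough to check by brute force. The sketch below is my own reading of the OEIS one-line descriptions quoted above, not OEIS reference code, and the function names are made up for illustration. The resistor and subset-counting sequences take more machinery, so they are left out.

```python
def divisors(n: int) -> list[int]:
    """All positive divisors of n, in increasing order (A018487 for n = 496)."""
    return [d for d in range(1, n + 1) if n % d == 0]

def rats(start: int, count: int) -> list[int]:
    """Reverse Add Then Sort: add a term to its digit reversal, then sort the digits."""
    terms = [start]
    while len(terms) < count:
        x = terms[-1]
        total = x + int(str(x)[::-1])
        terms.append(int("".join(sorted(str(total)))))
    return terms

def divides_three_power_minus_one(limit: int) -> list[int]:
    """Numbers k <= limit such that k divides 3**k - 1."""
    return [k for k in range(1, limit + 1) if (pow(3, k, k) - 1) % k == 0]

def is_collatz_peak(m: int) -> bool:
    """True if no term of m's Collatz trajectory exceeds m itself."""
    x = m
    while x != 1:
        x = x // 2 if x % 2 == 0 else 3 * x + 1
        if x > m:
            return False
    return True

print(divisors(496)[:6])                                # [1, 2, 4, 8, 16, 31]
print(rats(1, 6))                                       # [1, 2, 4, 8, 16, 77]
print(divides_three_power_minus_one(20))                # [1, 2, 4, 8, 16, 20]
print([m for m in range(1, 21) if is_collatz_peak(m)])  # [1, 2, 4, 8, 16, 20]
```

Four unrelated rules, one shared five-term prefix, and three different sixth terms.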

And I just skimmed through the first few pages and confined myself to patterns that started with 1, 2, 4 in the OEIS listing (so no arguments from the peanut gallery about things starting as 0, 0, 0, 0, 1, 2 or something). There are so many patterns that you can’t help but have 1, 2, 4, 8, 16 appear in many things other than 2^n.

Don’t Repeat Yourself is the principle in programming that a given concept should appear in only one location. There should be one canonical location for a given bit of configuration, one canonical location for how to convert a configuration file into that configuration, one canonical location that understands how to take a given product’s usage and convert it into billing information, etc. You should not, say, have twelve functions that sort arrays of integers in slightly different ways rather than having one sort function that can be parameterized for those various sorts.
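As a small illustration of the sorting example, here is how it might look in Python, where the built-in `sorted` already takes `key` and `reverse` parameters. The `readings` list is made-up data.

```python
readings = [3, -10, 7, -2]  # made-up sample data

# One sort routine, three orderings, selected by parameters rather than
# by writing three separate sorting functions.
ascending = sorted(readings)
descending = sorted(readings, reverse=True)
by_magnitude = sorted(readings, key=abs)

print(ascending)     # [-10, -2, 3, 7]
print(descending)    # [7, 3, -2, -10]
print(by_magnitude)  # [-2, 3, 7, -10]
```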

I still generally like DRY as a guiding principle, if for no other reason than the opposite is clearly silly. Most programmers have worked on code bases built on the principle that all concepts should be smeared all over the code base in every place that uses the concept and it doesn’t take a programming genius to work out that is a bad idea.

But my practice of DRY has shifted. I generally let things grow for longer and diverge farther before I try to DRY them out, because I think there’s a similar effect to the Strong Law of Small Numbers on code and data structures. The first hour of any new subsystem in a code base is very likely to look very similar to something else because there are only so many code and data organizations you can reach in that period of time. You’ve got a handful of structures and a few methods and maybe a new database table or something, and it superficially probably looks like any number of other things you’ve done before…

… at least in the short term. If you could holistically view that bit of code across its entire lifetime, and see what it would eventually develop into even at the beginning of its lifecycle, you’d see a lot more diversity. What may have looked like a simple CRUD update task may take on unique caching considerations, or acquire integration with a message bus, or be turned into a microservice, or any number of other things that weren’t obvious in the beginning.

All plants look similar when they are just little shoots stabbing up from the soil for the first time. But if you see a spruce tree and an oak tree sprouting at the same time and decide There Should Be Only One Concept Of Tree In My Code and tie them together, your trees are going to have a bad time. Especially if you keep coming back to them over time and giving them the Procrustes treatment, constantly jamming them into your initial understanding of their similarities, which depending on how badly you misinterpreted the situation back when they were just sprouts, may not even be a correct understanding in the first place. And you’re just going to kill everything if you also see a tomato plant growing up next to your trees and decide in the first day that they’re all the same thing and force all of them to live and die together. Die they will, and in sufficiently bad cases they’ll take your codebase with them.
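To make the sprouts concrete, here is a deliberately tiny, entirely hypothetical sketch. Every name in it is invented for illustration, and plain dicts and lists stand in for a database, a cache, and a message bus. It shows two update handlers that are identical on day one, what each might grow into if left alone, and what a day-one merge tends to turn into.

```python
# Day one: two hypothetical handlers that are line-for-line identical.
def save_profile(db: dict, key: str, fields: dict) -> None:
    db.setdefault(key, {}).update(fields)

def save_invoice(db: dict, key: str, fields: dict) -> None:
    db.setdefault(key, {}).update(fields)

# Months later, left alone, each one has grown its own concern.
def save_profile_later(db: dict, cache: dict, key: str, fields: dict) -> None:
    db.setdefault(key, {}).update(fields)
    cache.pop(key, None)  # profiles acquired a read cache to invalidate

def save_invoice_later(db: dict, outbox: list, key: str, fields: dict) -> None:
    db.setdefault(key, {}).update(fields)
    outbox.append(("invoice_saved", key))  # invoices acquired a bus event

# Merged on day one instead, the shared function collects a knob per caller.
def save_record(db, key, fields, *, cache=None, outbox=None, event=None):
    db.setdefault(key, {}).update(fields)
    if cache is not None:
        cache.pop(key, None)
    if outbox is not None and event is not None:
        outbox.append((event, key))

db, cache, outbox = {}, {"u1": "stale"}, []
save_profile_later(db, cache, "u1", {"name": "Ada"})
save_invoice_later(db, outbox, "i1", {"total": 5})
save_record(db, "i2", {"total": 1}, outbox=outbox, event="invoice_saved")
```

Nothing here is wrong on its own. The cost shows up in `save_record`, where every caller now has to know about options that exist only for some other caller.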

The practical upshot of all of this is that I am in a lot less of a hurry to DRY my code out than I used to be. There is always error in both directions: sometimes you prematurely DRY something you shouldn’t, and sometimes you fail to DRY something you should. It’s not a question of driving the error rate down to zero, it’s a question of minimizing the global costs of DRY errors, and I’ve found that in the past few years I’ve toned my zeal down a bit. I’m waiting longer for common bits of code to prove to me that they really are the same concept before I jam them into the same bed and start hacking limbs off. While this has indeed resulted in me sometimes putting it off for longer than I should have, on net I’ve come out ahead, given how many bits of code I’ve discovered weren’t as much of a Repeat as I thought.