Power-user Configuration Files Should Be Programs

2006-12-22

Every program of any complexity needs to be configured. The traditional answer is to put this in some sort of text file, or something else that is fundamentally static data that must be read in and parsed. (That is, for the purposes of this discussion the Windows Registry is a static text file.)

If you are programming in a language that can take a program in at run-time, like Python, Ruby, or Perl, and you are targeting an advanced audience, the configuration file should actually be a program. In the degenerate case, the programming language can easily look like a configuration file, but then, if you need the full power of programming, it's sitting right there.

Update Dec. 24th: Similar thoughts w.r.t. build tools. I think declaration-based systems tend towards "demoware"; any arbitrary task can be made to look good in the demo of a declarative system, but it seems inevitable that before you can even blink, you need a scripting language. And ultimately, as with ant vs. rake, it's simpler just to pull in the full scripting language that to fall prey to Greenspun's Tenth Rule in your "simpler" declarative language.

For instance, Django uses a normal Python module to specify its settings. Mostly it looks like X = "VALUE", so it shouldn't confuse newbies. But because it's actually a Python module, it turns out to be easy to override or change things programmatically. On my webserver I have a separate "production_settings.py" file, which contains all the values that are different on the production deployment vs. my development environment. Then, at the end of my settings file, I can just add:

try:
    from jerf.production_settings import *
except ImportError:
    pass

And all the settings defined in production_settings.py overrides the settings in my "base" settings.py. The Django developers didn't have to anticipate this solution; the fact that they use a Python module for configuration instead of a file left me the flexibility to craft my own solution.

If there was ever an argument in favor of allowing users to mutate the syntax of the language, by modifying the meaning of operators or even defining new ones, I think this is it. Writing the actual code should probably stick to a well-defined common subset of code, because it's very, very important that you be able to understand what the code is doing, but people should have much more flexibility when it comes to defining program-based declarations.

I think this is ultimately where something like Python's decorators are most useful, and you might even use it as a rule of thumb that all decorators should actually be able to be thought of as declaring something, not just as what decorators literally mean in the code sense. One of the standard examples, @synchronized, is a great example; it's implemented in pure Python code (as opposed to being compiler magic or something), but if you just think of it as declaring that the method is synchronized, you can pretty much ignore the code.

Which then leads me to the rule of thumb that register is about a good a name for a decorator as thing is for a class. I plead guilty to doing this myself. (I still need to learn to listen more to that nagging voice that said there will be other decorators named 'register', this is a bad name.) A decorator shouldn't be named after the verb it is performing (which is pretty much always either @register or @wrap), it should be named after the declaration it is making. My @register should be @object_maker or something. (I don't think that's a good name, and that leads me to the belief that I need to more carefully consider the nomenclature I introduce in that library. I ought to have a name that makes sense after you've read the docs. Not to mention I ought to have a name that makes sense to the module author!)

Anyhow, as long as you have something that is cleaner than the resulting method-call disaster that would result in Java if you tried this (especially pre-Java-decorators), and the resulting configuration file isn't a complete syntax atrocity, it's probably still a win.

An example of a situation where I didn't do this, and I now somewhat regret it: At work, we make it easy to rapidly create a development database for a given developer. The database connection requires an XML file to specify what database to connect to. I wish I'd done it in a programming language, because then I could ship one file that pulled the developer's user name from the environment and inserted that into the database name at run time. Instead, every developer currently has to customize their own copy of this XML file to insert their name.

See also: .emacs, which just goes to show there's nothing new under the sun.