"Give me a place to stand and a lever long enough and I will move the world."

I was grumbling under my breath at a configuration management system today, and reminded of this wonderful statement by Archimedes.

Configuration management is the discipline of building systems which manage other systems -- cfengine is a well-known open source example. I needed to reboot a few hosts on a regular schedule -- easily handled in 5 minutes with "vi /etc/crontab" on each, or an ssh loop to append to the crontab on each affected system. I was struck by how many levels of indirection I needed to traverse to get this done with configuration management. This in turn prompted some thought about why jumping through the various hoops was worthwhile.

There are many excellent reasons to use configuration management:

  • Time savings -- over repeating the same actions over and over; this increases with the number of hosts involved.
  • Consistency -- configuration management ensures that (portions of) systems which should be identical really are.
  • Reproducibility -- because CMS is naturally tied into version control, it is easy to either examine or recreate the state of affairs at an arbitrary time in the past.
  • Modeling -- a CMS ends encompasses a representation of all the systems it manages. This efficient representation of those systems is quite useful for examining and comparing them. It's especially useful with a large or dynamic population of administrators, as it provides a single place to learn about the whole constellation of systems, and enforces some consistency among the various ways admins can manage systems.

In the simplest case, to make a machine reboot once, I could pull the plug and put it back (assuming I was near, or could get to, the machine). In a non-CMS scenario, I would do it with ssh and the shutdown -r. In this case, it was considerably more involved:

  • Launch PuTTY.
  • Log into a system with a checkout of the CMS configuration files.
  • Find the appropriate file (non-trivial if the managed constellation is complicated).
  • Fetch the latest version of the file (with multiple users, it's unlikely my checkout is current).
  • Edit the file corresponding to /etc/crontab or /var/spool/cron/root (I used kate, as I don't enjoy either vi or emacs, and BBEdit wasn't available); kate popped back an X11 session tunneled through ssh.
  • Create a pair of local machine sets in the file (cfengine calls these 'aliases'), each including half the covered systems (the systems reboot at staggered times, so they're not all down at once).
  • Create the pair of crontab lines, one for each machine set, embedding the pair of different reboot times and the shutdown -r command.
  • Check the modified crontab file back into the version control system; enter a message for the change log.
  • In a distributed CMS, staging hosts pick up the changes from version control, either on a schedule or when manually kicked for emergency/rush changes.
  • The affected hosts pick up the change from the CMS, and implement the specified change.

The reason Archimedes' quote is apropos is that configuration management provides excellent leverage -- I can edit one file in one place, and easily affect several systems (potentially hundreds or thousands). Each hoop I have to jump through provides an additional fulcrum. I can sit at my desk and use PuTTY to log into dozens of systems, across the world -- without even knowing where they are. Each change I make to the version control system is automatically picked up by every host participating in the system, and available to every admin with a checkout. I don't have to log into 8 machines (even uninteractively) to make them reboot -- I can orchestrate it all from my local workstation.

Unfortunately, mistakes are leveraged too; there is often no good way to test changes to production systems during business hours. If the changes are restricted to non-production hours, when the admin might not be around to monitor them (and shouldn't have to -- it's an automated system, after all!), the window could be closed by the time the admin sees whether the change was successful. Missing a change window can easily defer a change 24 hours.