Extract - some devops yang

If you're a modern sysadmin, you've probably been sipping the devops Kool-Aid and trying out one or more of the current configuration management tools, like puppet or chef.

These tools are awesome - particularly for large-scale, homogeneous deployments of identical nodes.

In practice, though, things get messier in the enterprise. You may have legacy nodes that can't be puppetised because they are too sensitive or important; nodes unusual enough that putting them under configuration management doesn't repay the work; or simply systems you don't fully control.

We've been using a simple tool called extract in these kinds of environments, which pulls a given set of files from remote hosts and stores them under version control in a set of local per-host trees.

You can think of it as the yang to the yin of puppet or chef - instead of pushing configs onto remote nodes, it pulls configs off them and stores them for tracking and change control.
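To make the pull model concrete, here is a minimal sketch of one pass for a single host. It is not extract's actual code: the ssh-plus-tar transport, the `EXTRACT_RSH` override, and the helper names `extract_host` and `commit_host_tree` are assumptions for illustration. Each file keeps its absolute path under a per-host directory, and the host tree is then committed to git.

```shell
#!/bin/sh
# Hypothetical sketch of one extract pass for a single host; extract's
# real transport and layout may differ.
#   EXTRACT_RSH: remote-shell command, defaults to ssh (assumption)

# Copy the files named in a list file on the remote host into
# $root/$host, keeping absolute paths
# (e.g. /etc/hosts -> $root/$host/etc/hosts).
extract_host() {
    host=$1 root=$2 list=$3
    dest=$root/$host
    mkdir -p "$dest" || return 1
    # Remote side: tar up every path named in the list. tar strips the
    # leading "/" so the archive unpacks relative to $dest.
    "${EXTRACT_RSH:-ssh}" "$host" tar -cf - -T "$list" |
        tar -xf - -C "$dest"
}

# Record whatever changed in a host tree as a git commit.
commit_host_tree() {
    (
        cd "$1" || exit 1
        [ -d .git ] || git init -q || exit 1
        git add -A
        # Only commit when the index differs from the last commit.
        git diff --cached --quiet ||
            git commit -q -m "extract: $(date '+%Y-%m-%d %H:%M')"
    )
}
```

For example, `extract_host host1 /data/extract /var/cache/rpm-find-changes/etc.txt && commit_host_tree /data/extract/host1` would refresh and commit host1's tree. Note that this sketch never deletes local copies of files that drop off the remote list.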

We've mostly been using it in a Red Hat/CentOS environment, so we pair it with rpm-find-changes, which identifies all the config files under /etc that have changed from the versions their packages installed, as well as custom files that don't belong to any package.
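How rpm-find-changes works internally isn't covered here. As a rough approximation only, a similar list can be built from `rpm -Va`, whose flag column shows a `5` when a file's digest differs from the packaged one, plus `rpm -qf`, which exits non-zero for files no package owns. The function names and filtering rules below are hypothetical.

```shell
#!/bin/sh
# Hypothetical approximation of an rpm-find-changes style file list.
# NOT the real tool; names and filtering rules are assumptions.

# Read `rpm -Va` output on stdin and print files under /etc whose
# content digest differs from the packaged version ("5" in the flags).
# Example input lines:
#   S.5....T.  c /etc/ssh/sshd_config   -> printed
#   .......T.  c /etc/hosts             -> skipped (mtime only)
#   missing    c /etc/foo.conf          -> skipped (nothing to copy)
changed_etc_files() {
    awk '$1 ~ /5/ && $NF ~ /^\/etc\// { print $NF }'
}

# Print regular files under /etc that no installed package owns;
# `rpm -qf` exits non-zero for such files.
unowned_etc_files() {
    find /etc -type f | while IFS= read -r f; do
        rpm -qf "$f" >/dev/null 2>&1 || printf '%s\n' "$f"
    done
}
```

On an RPM-based host, `{ rpm -Va 2>/dev/null | changed_etc_files; unowned_etc_files; } | sort -u > etc.txt` would produce a combined list. Paths containing spaces are not handled by the awk filter.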

Extract doesn't care where its list of files comes from, so it should be easy to adapt to other environments.

It uses a simple extract.conf shell-variable-style config file, like this:

# Where extracted files are to be stored (in per-host trees)
EXTRACT_ROOT=/data/extract

# Hosts from which to extract (space separated)
EXTRACT_HOSTS=host1 host2 host3

# File containing list of files to extract (on the remote host, not locally)
EXTRACT_FILES_REMOTE=/var/cache/rpm-find-changes/etc.txt
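If you want to read this file from your own scripts, note that sourcing it with sh would fail on the unquoted `EXTRACT_HOSTS` value: the shell would try to run `host2` as a command. A minimal reader that avoids sourcing is sketched here; `conf_get` is a hypothetical helper, not part of extract, and extract's own parser may behave differently.

```shell
#!/bin/sh
# Hypothetical helper: read one value from a shell-variable-style
# config such as extract.conf without sourcing it.
#   $1: config file   $2: key
# Prints the value of the last uncommented KEY=... line (everything
# after the first '='), with one pair of surrounding double quotes
# removed if present. Commented-out keys ("#KEY=...") never match.
conf_get() {
    sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1 |
        sed 's/^"\(.*\)"$/\1/'
}
```

For example, `for host in $(conf_get extract.conf EXTRACT_HOSTS); do echo "$host"; done` relies on word splitting to iterate over the space-separated host list.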

Extract also lets you call arbitrary scripts at the beginning (setup) and end (teardown) of a run, and before and/or after each host. It ships with some example shell scripts for loading ssh keys and for checking extracted changes into git or bzr. These hooks are also configured in extract.conf, e.g.:

# Pre-process scripts
# PRE_EXTRACT_SETUP - run once only, before any extracts are done
PRE_EXTRACT_SETUP=pre_extract_load_ssh_keys
# PRE_EXTRACT_HOST - run before each host extraction
#PRE_EXTRACT_HOST=pre_extract_noop

# Post-process scripts
# POST_EXTRACT_HOST - run after each host extraction
POST_EXTRACT_HOST=post_extract_git
# POST_EXTRACT_TEARDOWN - run once only, after all extracts are completed
#POST_EXTRACT_TEARDOWN=post_extract_touch
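The order in which these hooks fire can be sketched as a small driver loop. This is an illustration under assumptions, not extract's implementation: each per-host hook is assumed to receive the host name as its only argument, a failing pre-host hook or extraction is assumed to skip that host, and `run_hook` and `run_extract` are hypothetical names. An empty hook variable, like the commented-out ones above, means no hook runs.

```shell
#!/bin/sh
# Hypothetical driver showing the hook order described above. The hook
# variable names match extract.conf; the arguments passed to hooks and
# the skip-on-failure behaviour are assumptions.

# Run a hook command if one is configured; empty means "no hook".
run_hook() {
    hook=$1; shift
    [ -n "$hook" ] || return 0
    "$hook" "$@"
}

# $1: command that extracts a single host (called with the host name)
# remaining args: host names
run_extract() {
    extract_cmd=$1; shift
    run_hook "$PRE_EXTRACT_SETUP" || return 1       # once, before all hosts
    for host in "$@"; do
        run_hook "$PRE_EXTRACT_HOST" "$host" || continue
        "$extract_cmd" "$host" || continue          # skip post hook on failure
        run_hook "$POST_EXTRACT_HOST" "$host"
    done
    run_hook "$POST_EXTRACT_TEARDOWN"               # once, after all hosts
}
```

With the settings above, a run would load ssh keys once, then extract and commit each host in turn, and finish without a teardown step.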

Extract is available on GitHub, and packages for RHEL/CentOS 5 and 6 are available from my repository.

Feedback/pull requests always welcome.