Missing Delicious Feeds

I've been playing with using delicious as a lightweight URL database lately, mostly for use by greasemonkey scripts of various kinds (e.g. squatter_redirect).

For this kind of use I really just need a lightweight anonymous http interface to the bookmarks, and delicious provides a number of nice lightweight RSS and JSON feeds suitable for this purpose.

But it turns out the feed I really need isn't currently available. I mostly want to be able to ask, "Give me the set of bookmarks stored for URL X by user Y", or even better, "Give me the set of bookmarks stored for URL X by users Y, Z, and A".

Delicious have a feed for recent bookmarks by URL:

http://feeds.delicious.com/v2/{format}/url/{url md5}

and a feed for all a user's bookmarks:

http://feeds.delicious.com/v2/{format}/{username}

and feeds for a user's bookmarks limited by tag(s):

http://feeds.delicious.com/v2/{format}/{username}/{tag[+tag+...+tag]}

but not one for a user limited by URL, or for URL limited by user.

Neither of the obvious alternatives really works: searching by URL only returns the most recent set of N bookmarks, and searching by user and then walking their entire (potentially large) set of bookmarks is just too slow.

So for now I'm having to work around the problem by adding a special hostname tag to my bookmarks (e.g. squatter_redirect=www.openfusion.net), and then using the username+tag feed as a proxy for my username+domain search.
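
For example, using the username+tag feed pattern above, the workaround lookup is a single anonymous GET (the username here is a placeholder - substitute your own account and hostname tag):

# All of one user's bookmarks carrying a given hostname tag,
# standing in for a proper username+URL feed
curl -s 'http://feeds.delicious.com/v2/json/myuser/squatter_redirect=www.openfusion.net'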

Any cluesticks out there? Any nice delicious folk want to whip up a shiny new feed for the adoring throngs? :-)

Squatter Domains, Tracked with Delicious

A few weeks ago I hit a couple of domain squatter sites in quick succession and got a bit annoyed. I asked on twitter/identi.ca whether anyone knew of any kind of domain squatter database on the web, perhaps along the lines of the email RBL lists, but got no replies.

I thought at the time that delicious might be useful for this, in much the same way that Jon Udell has been exploring using delicious for collaborative event curation.

So here are the results of some hacking time this weekend: Squatter Redirect, a greasemonkey script (i.e. firefox only, sorry) that checks whether the sites you visit have been tagged on delicious as squatter domains that should be redirected elsewhere, and if so, does the redirect in your browser.

Here's a few squatter domains to try it out on:

The script checks two delicious accounts - your personal account, so you can add your own domains without having to wait for them to be pulled into the 'official' squatter_redirect stream; and the official squatter_redirect delicious account, into which other people's tags are periodically pulled after checking.

Marking a new domain as a squatter simply involves creating a delicious bookmark for the squatter page with a few special tags:

  • squatter_redirect - flagging the bookmark for the attention of the squatter_redirect delicious user
  • squatter_redirect=www.realdomain.com - setting the real domain that you want to be redirected to
  • (optional) squatter_domain=www.baddomain.com - marker for the squatter domain itself (only required if you want to use it from your own delicious account)

So www.quagga.org above would be tagged:

squatter_redirect squatter_redirect=www.quagga.net
# or, optionally:
squatter_redirect squatter_redirect=www.quagga.net squatter_domain=www.quagga.org
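
If you want to sanity-check the tagging outside the browser, the delicious feeds described above make that easy enough. A rough sketch, with 'myuser' standing in for your own account (this just illustrates the tag scheme, not necessarily exactly what the script does internally):

# Bookmarks in my own account flagging www.quagga.org as a squatter domain
curl -s 'http://feeds.delicious.com/v2/json/myuser/squatter_domain=www.quagga.org'
# Everything the official account has flagged for redirection
curl -s 'http://feeds.delicious.com/v2/json/squatter_redirect/squatter_redirect'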

Feedback/comments welcome.

Quick Linux Box Hardware Overview

Note to self: here's how to get a quick overview of the hardware on a
linux box:
perl -F"\s*:\s*" -ane "chomp \$F[1];
  print qq/\$F[1] / if \$F[0] =~ m/^(model name|cpu MHz)/;
  print qq/\n/ if \$F[0] eq qq/\n/" /proc/cpuinfo
grep MemTotal /proc/meminfo
grep SwapTotal /proc/meminfo
fdisk -l /dev/[sh]d? 2>/dev/null | grep Disk

Particularly useful if you're auditing a bunch of machines (via an ssh loop or clusterssh or something) and want a quick 5000-foot view of what's there.
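
For instance, a bare-bones ssh loop over a list of hosts might look something like this (hostnames are placeholders, and it assumes key-based ssh access is already set up):

for h in host1 host2 host3; do
  echo "== $h =="
  ssh "$h" 'grep -m1 "model name" /proc/cpuinfo; grep MemTotal /proc/meminfo'
done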

ASX20 Announcements Review

Question: you're a small investor with a handful of small share
investments in Australian companies listed on the ASX. How do you
keep track of the announcements those companies make to the ASX?

There are a couple of manual methods you can use. You can bookmark the announcements page on the various company websites you're interested in and go and check them periodically, but that's obviously pretty slow and labour intensive.

Or you can go to a centralised point, the ASX Announcements Page, and search for all announcements from there. Unfortunately, the ASX only lets you search for one company at a time, so that's also pretty slow, and it still requires you to do all the checking manually - there's no automation available without subscribing to the ASX's expensive data feed services.

There are also other third-party subscription services you can pay for that will do this for you, but it's annoying to have to pay for what is rightly public information.

The better answer is for the company themselves to provide their announcements through some sort of push mechanism. The traditional method is via email, where you subscribe to company announcements, and they show up in your inbox shortly after they're released.

But the best and most modern solution is for companies to provide a syndication feed on their website in a format like RSS or Atom, which can be monitored and read using feed readers like Google Reader, Mozilla Thunderbird, or Omea Reader. Feeds are superior to email announcements in that they're centralised and lightweight (big companies don't have to send out thousands of emails, for instance), and they use a standardised format, so they can be remixed and repurposed in lots of interesting ways.

So out of interest I did a quick survey of the current ASX20 (the top 20 list of companies on the ASX according to Standard & Poor's) to see how many of them support syndicating their announcements either by email or by RSS/Atom. Here are the results:

Table: Company Announcement Availability, ASX20
Company                   via web   via email   via RSS/Atom
AMP                       yes       yes         RSS Feed
ANZ                       yes       -           -
BHP                       yes       yes         -
Brambles (BXB)            yes       yes         -
Commonwealth Bank (CBA)   yes       yes         -
CSL                       yes       yes         -
Fosters (FGL)             yes       -           -
Macquarie Group (MQG)     yes       yes         -
NAB                       yes       -           -
Newcrest Mining (NCM)     yes       -           -
Origin Energy (ORG)       yes       yes         RSS Feed
QBE Insurance (QBE)       yes       -           -
Rio Tinto (RIO)           yes       yes         RSS Feed
Suncorp Metway (SUN)      yes       yes         RSS Feed
Telstra (TLS)             yes       yes         RSS Feed
Wesfarmers (WES)          yes       yes         RSS Feed
Westfield (WDC)           yes       yes         -
Westpac (WBC)             yes       -           -
Woodside Petroleum (WPL)  yes       yes         -
Woolworths (WOW)          yes       yes         RSS Feed

Some summary ratings:

  • Grade: A - announcements available via web, email, and RSS: AMP, ORG, RIO, SUN, TLS, WES, WOW (7)
  • Grade: B - announcements available via web and email: BHP, BXB, CBA, CSL, MQG, WDC, WPL (7)
  • Grade: C - announcements available via web: ANZ, FGL, NAB, NCM, QBE, WBC (6)

Overall, I'm relatively impressed that 7 of the ASX20 do support RSS. On the down side, the fact that another 6 don't even provide an email announcements service is pretty ordinary, especially considering the number of retail shareholders who hold some of these stocks - three of the big four banks are in that group (bonus points to CBA, the standout exception).

Special bonus points go to:

  • Suncorp Metway and Wesfarmers, who also offer RSS feeds for upcoming calendar events;

  • Rio Tinto, who have their own announcements twitter account.

Corrections and updates are welcome in the comments.

The Joy of Scripting

Was going home on the train with Hannah (8) this afternoon, and she says, "Dad, what's the longest word you can make without using any letters with tails or stalks?". "Do you really want to know?", I asked, and whipping out the trusty laptop, we had an answer within a couple of train stops:

egrep -v '[A-Zbdfghjklpqty]' /usr/share/dict/words | \
perl -nle 'chomp; push @words, $_;
  END { @words = sort { length($b) <=> length($a) } @words;
        print join "\n", @words[0 .. 9] }'

noncarnivorousness
nonceremoniousness
overcensoriousness
carnivorousnesses
noncensoriousness
nonsuccessiveness
overconsciousness
semiconsciousness
unacrimoniousness
uncarnivorousness

Now I just need to teach her how to do that.

Cityrail Timetables Greasemonkey Script

I got sufficiently annoyed over last week's Cityrail Timetable fiasco that I thought I'd contribute something to the making-Cityrail-bearable software ecosystem.

So this post is to announce a new Greasemonkey script called Cityrail Timetables Reloaded [CTR], available at the standard Greasemonkey repository on userscripts.org, that cleans up and extensively refactors Cityrail's standard timetable pages.

Here's a screenshot of Cityrail's initial timetable page for the Northern line:

Cityrail standard timetable

and here's the same page with CTR loaded:

Cityrail timetable via CTR

CTR loads a configurable number of pages rather than forcing you to click through them one by one, and in fact will load the whole set if you tell it to.

It also lets you specify the 'from' and 'to' stations you're travelling between: it will highlight them for you, omit stations well before or well after yours, and drop any trains that don't actually stop at your stations. This can compress the output a lot, allowing you to fit more pages on your screen:

Cityrail timetable via CTR

I can't see Cityrail having a problem with this script since it's just taking their pages and cleaning them up, but we shall see.

If you're a firefox/greasemonkey user please try it out and post your comments/feedback here or on the userscripts site.

Enjoy!

Soul Communications FAIL!

What's a blog if not a vehicle for an occasional rant?

I used to have a mobile with Soul Communications, and recently changed to another provider because Soul cancelled the plan I'd been on with them for 3 or 4 years. I ported my number, and gathered that that would close the Soul account, and all would be good. Soul has a credit card on that account that they've billed for the last 3 years or so without problems. I've had nothing from them to indicate there are any issues.

And so today I get a Notice of Demand and Disconnection from Soul advising me that my account is overdue, charging me additional debt recovery fees, and advising that if I don't pay all outstanding amounts immediately it'll be referred to debt collectors.

Nice work Soul.

So let's recap. I've had no notices that my account is overdue, no contact from anyone from Soul, no indication that there are any issues, and then a Notice of Demand?

I go and check my email, in case I've missed something. Two emails from Soul since the beginning of the year, the most recent from a week ago. They're HTML-only, of course, and I use a text email client, but hey, I'll go the extra mile and fire up an HTML email client to work around the fact that multipart/alternative is a bit too hard.

The emails just say, "Your Soul Bill is Now Available", and point to the "MySoul Customer Portal". (Yes, it would be nice if it was a link to the actual bill, of course, rather than expecting me to navigate through their crappy navigation system, but that's clearly a bit too sophisticated as well; but I digress.) There's no indication in any of the emails that anything is amiss, like a "Your account is overdue" message or something. So no particular reason I would have bothered to go and actually login to their portal, find my bill, and review it, right? They've got the credit card, right?

So let's go and check the bill. Go to "MySoul Salvation Portal", or whatever it's called, dig out obscure customer number and sekrit password, and login. Except I can't. "This account is inactive."

Aaaargh!

So let's recap:

  • account has been cancelled due to move to another carrier (yippee!)

  • can't login to super-customer-portal to get bills

  • emails from Soul do not indicate there are any problems with the account

  • no other emails from Soul saying "we have a problem"

  • maybe they could, like, phone my mobile, since they do have the number - no, too hard!

Epic mega stupendous FAIL! What a bunch of lusers.

So now I've phoned Soul, had a rant, and been promised that they'll email me the outstanding accounts. That was half an hour ago, and nothing in the inbox yet. I get the feeling they don't really want to be paid.

And I feel so much better now. :-)

mod_auth_tkt 2.0.1

I'm happy to announce the release of mod_auth_tkt 2.0.1, the first full release of mod_auth_tkt in a couple of years. The 2.0.x release includes numerous enhancements and bugfixes, including guest login support and full support for apache 2.2.

mod_auth_tkt is a lightweight single sign-on authentication module for apache, supporting versions 1.3.x, 2.0.x, and 2.2.x. It uses secure cookie-based tickets to implement a single sign-on framework that works across multiple apache instances and servers. It's also completely repository-agnostic, relying on a user-supplied script to perform the actual authentication.

The release is available as a tarball and various RPMs from the mod_auth_tkt homepage.

Testing Disqus

I'm trying out disqus, since I like the idea of being able to track/collate my comments across multiple endpoints, rather than have them locked in to various blogging systems. So this is a test post to try out commenting. Please feel free to comment ad nauseam below (and sign up for a disqus account, if you don't already have one).

Open Fusion RPM Repository

Updated 2014-09-26 for CentOS 7.

Over the last few years I've built up quite a collection of packages for CentOS, and distribute them via a yum repository. They're typically packages that aren't included in DAG/RPMForge when I need them, so I just build them myself. In case they're useful to other people, this post documents the repository locations, and how you can get setup to make use of it yourself.

Obligatory Warning: this is a personal repository, so it's primarily for packages I want to use myself on a particular platform i.e. coverage is uneven, and packages won't be as well tested as a large repository like RPMForge. Also, I routinely build packages that replace core packages, so you'll want the repo disabled by default if that concerns you. Use at your own risk, packages may nuke your system and cause global warming, etc. etc.

Locations:

To add the Open Fusion repository to your yum configuration, just install the following 'openfusion-release' package:

# CentOS 5:
sudo rpm -Uvh http://repo.openfusion.net/centos5-x86_64/openfusion-release-0.7-1.of.el5.noarch.rpm
# CentOS 6:
sudo rpm -Uvh http://repo.openfusion.net/centos6-x86_64/openfusion-release-0.7-1.of.el6.noarch.rpm
# CentOS 7:
sudo rpm -Uvh http://repo.openfusion.net/centos7-x86_64/openfusion-release-0.7-1.of.el7.noarch.rpm
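
If you do keep the repo disabled by default, you can still pull individual packages from it on demand. Note that the repo id below is an assumption - check the .repo file that openfusion-release installs under /etc/yum.repos.d/ for the actual name:

# Enable the repo just for a single transaction
# ('openfusion' repo id assumed - see /etc/yum.repos.d/ for the real one)
sudo yum --enablerepo=openfusion install some-package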

And here are the openfusion-release packages as links:

Feedback and suggestions are welcome. Packaging requests are also welcome, particularly when they involve my wishlist. ;-)

Enjoy.

Questions That Cannot Be Answered

Was thinking this morning about my interactions with the web over the last couple of weeks, and how I've been frustrated with not being able to (simply) get answers to relatively straightforward questions from the automated web. This is late 2008, and Google and Google Maps and Wikipedia and Freebase etc. etc. have clearly pushed back the knowledge boundaries here hugely, but at the same time lots of relatively simple questions are as yet largely unanswerable.

By way of qualification, I mean they are not answerable in an automated fashion, not that they cannot be answered by asking the humans on the web (invoking the 'lazyweb'). I also don't mean that these questions are impossible to answer given the time and energy to collate the available results - I mean that they are not simply and more or less trivially answerable, without real work on my part. (e.g. "How do I get to address X?" was kind of answerable before Google Maps, but Google Maps arguably made it more-or-less trivial, and thereby really solved the problem.)

So in the interests of helping delineate some remaining frontiers, and challenging ourselves, here's my catalogue of questions from the last couple of weeks:

  • what indoor climbing gyms are there in Sydney?

  • where are the indoor climbing gyms in Sydney (on a map)?

  • what are the closest gyms to my house?

  • how much are the casual rates for adults and children for the gyms near my house?

  • what are the opening hours for the gyms near my house?

  • what shops near my house sell the Nintendo Wii?

  • what shops near my house have the Wii in stock?

  • what shops near my house are selling Wii bundles?

  • what is the pricing for the Wii and Wii bundles from shops near my house?

  • of the shops near my house that sell the Wii, who's open late on Thursdays?

  • of the shops near my house that sell the Wii, what has been the best pricing on bundles over the last 6 months?

  • trading off distance to travel against price, where should I buy a Wii?

  • what are the "specials" at the supermarkets near my house this week?

  • given our grocery shopping habits and the current specials, which supermarket should I shop at this week?

  • I need cereal X - do any of the supermarkets have it on special?

That's a useful starting set from the last two weeks. Anyone else? What are your recent questions-that-cannot-be-answered? (And if you blog, tag with #qtcba pretty please).

CSS and Javascript Minification

I've been playing with the very nice YSlow firefox plugin recently, while doing some front-end optimisation on a Catalyst web project.

Most of YSlow's tuning tips were reasonably straightforward, but I wasn't sure how to approach the concatenation and minification of CSS and javascript files that they recommend.

Turns out - as is often the case - there's a very nice packaged solution on CPAN.

The File::Assets module provides concatenation and minification for CSS and Javascript 'assets' for a web page, using the CSS::Minifier (::XS) and JavaScript::Minifier (::XS) modules for minification. To use it, you add a series of .css and .js files while building your page, and then 'export' them at the end, which generates a concatenated and minified version of each type in an export directory, and an appropriate link to the exported version. You can do separate exports for CSS and Javascript if you want to follow the Yahoo/YSlow recommendation of putting your stylesheets at the top and your scripts at the bottom.

There's also a Catalyst::Plugin::Assets module to facilitate using File::Assets from Catalyst.

I use Mason for my Catalyst views (I prefer using perl in my views rather than having another mini-language to learn) and so use this as follows.

First, you have to configure Catalyst::Plugin::Assets in your project config file (e.g. $PROJECT_HOME/project.yml):

Plugin::Assets:
    path: /static
    output_path: build/
    minify: 1

Next, I set the per-page javascript and css files I want to include as mason page attributes in my views (using an arrayref if there's more than one item of the given type) e.g.

%# in my person view
<%attr>
js => [ 'jquery.color.js', 'person.js' ]
css => 'person.css'
</%attr>

Then in my top-level autohandler, I include both global and per-page assets like this:

<%init>
# Asset collation, javascript (globals, then per-page)
$c->assets->include('js/jquery.min.js');
$c->assets->include('js/global.js');
if (my $js = $m->request_comp->attr_if_exists('js')) {
  if (ref $js && ref $js eq 'ARRAY') {
    $c->assets->include("js/$_") foreach @$js;
  } else {
    $c->assets->include("js/$js");
  }
}
# The CSS version is left as an exercise for the reader ...
# ...
</%init>

Then, elsewhere in the autohandler, you add an exported link at the appropriate point in the page:

<% $c->assets->export('text/javascript') %>

This generates a link something like the following (wrapped here):

<script src="http://www.example.com/static/build/assets-ec556d1e.js"
  type="text/javascript"></script>

Beautiful, easy, maintainable.

Basic KVM on CentOS 5

I've been using kvm for my virtualisation needs lately, instead of xen, and finding it great. Disadvantages are that it requires hardware virtualisation support, and so only works on newer Intel/AMD CPUs. Advantages are that it's baked into recent linux kernels, and so more or less Just Works out of the box, no magic kernels required.

There are some pretty useful resources covering this stuff out on the web - the following sites are particularly useful:

There's not too much specific to CentOS though, so here's the recipe I've been using for CentOS 5:

# Confirm your CPU has virtualisation support
egrep 'vmx|svm' /proc/cpuinfo

# Install the kvm and qemu packages you need
# From the CentOS Extras repository (older):
yum install --enablerepo=extras kvm kmod-kvm qemu
# OR from my repository (for most recent kernels only):
ARCH=$(uname -i)
OF_MREPO=http://www.openfusion.com.au/mrepo/centos5-$ARCH/RPMS.of/
rpm -Uvh $OF_MREPO/openfusion-release-0.3-1.of.noarch.rpm
yum install kvm kmod-kvm qemu

# Install the appropriate kernel module - either:
modprobe kvm-intel
# OR:
modprobe kvm-amd
lsmod | grep kvm

# Check the kvm device exists
ls -l /dev/kvm

# I like to run my virtual machines as a 'kvm' user, not as root
chgrp kvm /dev/kvm
chmod 660 /dev/kvm
ls -l /dev/kvm
useradd -r -g kvm kvm

# Create a disk image to use
cd /data/images
IMAGE=centos5x.img
# Note that the specified size is a maximum - the image only uses what it needs
qemu-img create -f qcow2 $IMAGE 10G
chown kvm $IMAGE
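
# Optional: check the image's virtual size vs actual disk usage as it grows
qemu-img info $IMAGE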

# Boot an install ISO on your image and do the install
MEM=1024
ISO=/path/to/CentOS-5.2-x86_64-bin-DVD.iso
# ISO=/path/to/WinXP.iso
qemu-kvm -hda $IMAGE -m ${MEM:-512} -cdrom $ISO -boot d
# I usually just do a minimal install with std defaults and dhcp, and configure later

# After your install has completed restart without the -boot parameter
# This should have outgoing networking working, but pings don't work (!)
qemu-kvm -hda $IMAGE -m ${MEM:-512} &

That should be sufficient to get you up and running with basic outgoing networking (for instance as a test desktop instance). In qemu terms this is using 'user mode' networking, which is easy but slow, so if you want better performance, or if you want to allow incoming connections (e.g. as a server), you need some extra magic, which I'll cover in a subsequent post ("Simple KVM Bridging", below).

Simple KVM Bridging

Following on from my post yesterday on "Basic KVM on CentOS 5", here's how to setup simple bridging to allow incoming network connections to your VM (and to get other standard network functionality like pings working). This is a simplified/tweaked version of Hadyn Solomon's bridging instructions.

Note that this is all done on your HOST machine, not your guest.

For CentOS:

# Install bridge-utils
yum install bridge-utils

# Add a bridge interface config file
vi /etc/sysconfig/network-scripts/ifcfg-br0
# DHCP version
ONBOOT=yes
TYPE=Bridge
DEVICE=br0
BOOTPROTO=dhcp
# OR, static version
ONBOOT=yes
TYPE=Bridge
DEVICE=br0
BOOTPROTO=static
IPADDR=xx.xx.xx.xx
NETMASK=255.255.255.0

# Make your primary interface part of this bridge e.g.
vi /etc/sysconfig/network-scripts/ifcfg-eth0
# Add:
BRIDGE=br0
# Optional: comment out BOOTPROTO/IPADDR lines, since they're
# no longer being used (the br0 takes precedence)

# Add a script to connect your guest instance to the bridge on guest boot
vi /etc/qemu-ifup
#!/bin/bash
BRIDGE=$(/sbin/ip route list | awk '/^default / { print $NF }')
/sbin/ifconfig $1 0.0.0.0 up
/usr/sbin/brctl addif $BRIDGE $1
# END OF SCRIPT
# Silence a qemu warning by creating a noop qemu-ifdown script
vi /etc/qemu-ifdown
#!/bin/bash
# END OF SCRIPT
chmod +x /etc/qemu-if*

# Test - bridged networking uses a 'tap' networking device
NAME=c5-1
qemu-kvm -hda $NAME.img -name $NAME -m ${MEM:-512} -net nic -net tap &

Done. This should give you VMs that are full network members, able to be pinged and accessed just like a regular host. Bear in mind that this means you'll want to setup firewalls etc. if you're not in a controlled environment.
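
To confirm that the tap device has actually joined the bridge once a guest is running, something like the following on the host is usually enough:

# The bridge should list both your physical NIC and a tapN device
brctl show br0
# And the guest should now answer pings from elsewhere on the LAN
GUEST_IP=xx.xx.xx.xx    # whatever address your guest picked up
ping -c 3 $GUEST_IP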

Notes:

  • If you want to run more than one VM on your LAN, you need to set the guest MAC address explicitly, since otherwise qemu uses a static default that will conflict with any other similar VM on the LAN. e.g. do something like:
# HOST_ID, identifying your host machine (2-digit hex)
HOST_ID=91
# INSTANCE, identifying the guest on this host (2-digit hex)
INSTANCE=01
# Startup, but with explicit macaddr
NAME=c5-1
qemu-kvm -hda $NAME.img -name $NAME -m ${MEM:-512} \
  -net nic,macaddr=00:16:3e:${HOST_ID}:${INSTANCE}:00 -net tap &
  • This doesn't use the paravirtual ('virtio') drivers that Hadyn mentions, as these aren't available until kernel 2.6.25, so they're not available to CentOS linux guests without a kernel upgrade.

Catalyst + Screen

I'm an old-school developer, doing all my hacking using terms, the command line, and vim, not a heavyweight IDE. Hacking perl Catalyst projects (and I imagine other MVC-type frameworks) can be slightly more challenging in this kind of environment because of the widely-branching directory structure. A single conceptual change can easily touch controller classes, model classes, view templates, and static javascript or css files, for instance.

I've found GNU screen to work really well in this environment. I use per-project screen sessions set up specifically for Catalyst - for my 'usercss' project, for instance, I have a ~/.screenrc-usercss config that looks like this:

source $HOME/.screenrc
setenv PROJDIR ~/work/usercss
setenv PROJ UserCSS
screen -t home
stuff "cd ~^Mclear^M"
screen -t top
stuff "cd $PROJDIR^Mclear^M"
screen -t lib
stuff "cd $PROJDIR/lib/$PROJ^Mclear^M"
screen -t controller
stuff "cd $PROJDIR/lib/Controller^Mclear^M"
screen -t schema
stuff "cd $PROJDIR/lib/$PROJ/Schema/Result^Mclear^M"
screen -t htdocs
stuff "cd $PROJDIR/root/htdocs^Mclear^M"
screen -t static
stuff "cd $PROJDIR/root/static^Mclear^M"
screen -t sql
stuff "cd $PROJDIR^Mclear^M"
select 0

(the ^M sequences there are actual Ctrl-M newline characters).

So a:

screen -c ~/.screenrc-usercss

will give me a set of eight labelled screen windows: home, top, lib, controller, schema, htdocs, static, and sql. I usually run a couple of these in separate terms, like this:

dual-screen screenshot

To make this completely brainless, I also have the following bash function defined in my ~/.bashrc file:

sc ()
{
  SC_SESSION=$(screen -ls | egrep -e "\.$1.*Detached" | \
    awk '{ print $1 }' | head -1);
  if [ -n "$SC_SESSION" ]; then
    xtitle $1;
    screen -R $SC_SESSION;
  elif [ -f ~/.screenrc-$1 ]; then
    xtitle $1;
    screen -S $1 -c ~/.screenrc-$1
  else
    echo "Unknown session type '$1'!"
  fi
}

which lets me just do sc usercss, which reattaches to the first detached 'usercss' screen session, if one is available, or starts up a new one.

Fast, flexible, lightweight. Choose any 3.

Brackup Tips and Tricks

Further to my earlier post, I've spent a good chunk of time implementing brackup over the last few weeks, both at home for my personal backups, and at $work on some really large trees. There are a few gotchas along the way, so I thought I'd document some of them here.

Active Filesystems

First, as soon as you start trying to brackup trees of any size you find that brackup aborts if it finds a file has changed between the time it initially walks the tree and when it comes to back it up. On an active filesystem this can happen pretty quickly.

This is arguably reasonable behaviour on brackup's part, but it gets annoying pretty fast. The cleanest solution is to use some kind of filesystem snapshot to ensure you're backing up a consistent view of your data and a quiescent filesystem.

I'm using linux and LVM, so I'm using LVM snapshots for this, using something like:

SIZE=250G
VG=VolGroup00
PART=${1:-export}

mkdir -p /${PART}_snap
lvcreate -L$SIZE --snapshot --permission r -n ${PART}_snap /dev/$VG/$PART && \
  mount -o ro /dev/$VG/${PART}_snap /${PART}_snap

which snapshots /dev/VolGroup00/export to /dev/VolGroup00/export_snap, and mounts the snapshot read-only on /export_snap.

The reverse, post-backup, is similar:

VG=VolGroup00
PART=${1:-export}

umount /${PART}_snap && \
  lvremove -f /dev/$VG/${PART}_snap

which unmounts the snapshot and then deletes it.

You can then do your backup using the /${PART}_snap tree instead of your original ${PART} one.
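
Putting the pieces together, a backup run then ends up as a simple wrapper. This is just a sketch - it assumes the two snippets above are saved as snapshot-create/snapshot-remove scripts, and that you have a brackup source named 'home' pointing at the snapshot path (as in the config below) and a target named 'server':

# Snapshot, back up from the snapshot, then drop the snapshot again
./snapshot-create export && \
  brackup -v --from=home --to=server
./snapshot-remove export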

Brackup Digests

So snapshots work nicely. The next wrinkle is that by default brackup writes its digest cache file to the root of your source tree, which in this case is read-only. So you want to tell brackup to put that in the original tree, not the snapshot, which you do in your ~/.brackup.conf file e.g.

[SOURCE:home]
path = /export_snap/home
digestdb_file = /exportb/home/.brackup-digest.db
ignore = \.brackup-digest.db$

I've also added an explicit ignore rule for these digest files here. You don't really need to back these up as they're just caches, and they can get pretty large. Brackup automatically skips the digestdb_file for you, but it doesn't skip any others you might have, if for instance you're backing up the same tree to multiple targets.

Synching Backups Between Targets

Another nice hack you can do with brackup is sync backups on filesystem-based targets (that is, Target::Filesystem, Target::Ftp, and Target::Sftp) between systems. For instance, I did my initial home directory backup of about 10GB onto my laptop, and then carried my laptop into where my server is located, and then rsync-ed the backup from my laptop to the server. Much faster than copying 10GB of data over an ADSL line!

Similarly, at $work I'm doing brackups onto a local backup server on the LAN, and then rsyncing the brackup tree to an offsite server for disaster recovery purposes.

There are a few gotchas when doing this, though. One is that Target::Filesystem backups default to using colons in their chunk file names on Unix-like filesystems (for backwards-compatibility reasons), while Target::Ftp and Target::Sftp ones don't. The safest thing to do is just to turn off colons altogether on Filesystem targets:

[TARGET:server_fs_home]
type = Filesystem
path = /export/brackup/nox/home
no_filename_colons = 1

Second, brackup uses a local inventory database to avoid some remote filesystem checks and improve performance, so if you replicate a backup onto another target you also need to copy the inventory database so that brackup knows which chunks already exist on your new target.

The inventory database defaults to $HOME/.brackup-target-TARGETNAME.invdb (see perldoc Brackup::InventoryDatabase), so something like the following is usually sufficient:

cp $HOME/.brackup-target-OLDTARGET.invdb $HOME/.brackup-target-NEWTARGET.invdb

Third, if you want to do a restore using a brackup file (the SOURCE-DATE.brackup output file brackup produces) from a different target, you typically need to make a copy and then update the header portion for the target type and host/path details of your new target. Assuming you do that and your new target has all the same chunks, though, restores work just fine.

Fun with brackup

I've been playing around with Brad Fitzpatrick's brackup for the last couple of weeks. It's a backup tool that "slices, dices, encrypts, and sprays across the net" - notably to Amazon S3, but also to filesystems (local or networked), FTP servers, or SSH/SFTP servers.

I'm using it to backup my home directories and all my image and music files both to a linux server I have available in a data centre (via SFTP) and to Amazon S3.

brackup's a bit rough around the edges and could do with some better documentation and some optimisation, but it's pretty useful as it stands. Here are a few notes and tips from my playing so far, to save others a bit of time.

Version: as I write the latest version on CPAN is 1.06, but that's pretty old - you really want to use the current subversion trunk instead. Installation is the standard perl module incantation e.g.

# Checkout from svn or whatever
cd brackup
perl Makefile.PL
make
make test
sudo make install

Basic usage is as follows:

# First-time through (on linux, in my case):
cd
mkdir brackup
cd brackup
brackup
Error: Your config file needs tweaking. I put a commented-out template at:
  /home/gavin/.brackup.conf

# Edit the vanilla .brackup.conf that was created for you.
# You want to setup at least one SOURCE and one TARGET section initially,
# and probably try something smallish i.e. not your 50GB music collection!
# The Filesystem target is probably the best one to try out first.
# See '`perldoc Brackup::Root`' and '`perldoc Brackup::Target`' for examples
$EDITOR ~/.brackup.conf

# Now run your first backup changing SOURCE and TARGET below to the names
# you used in your .brackup.conf file
brackup -v --from=SOURCE --to=TARGET

# You can also do a dry run to see what brackup's going to do (undocumented)
brackup -v --from=SOURCE --to=TARGET --dry-run

If all goes well you should get some fairly verbose output about all the files in your SOURCE tree that are being backed up for you, and finally a brackup output file (typically named SOURCE-DATE.brackup) should be written to your current directory. You'll need this brackup file to do your restores, but it's also stored on the target along with your backup, so you can also retrieve it from there (using brackup-target, below) if your local copy gets lost, or if you need to restore to somewhere else.

Restores reference that SOURCE-DATE.brackup file you just created:

# Create somewhere to restore to
mkdir -p /tmp/brackup-restore/full

# Restore the full tree you just backed up
brackup-restore -v --from=SOURCE-DATE.brackup --to=/tmp/brackup-restore/full --full

# Or restore just a subset of the tree
brackup-restore -v --from=SOURCE-DATE.brackup --to=/tmp/brackup-restore --just=DIR
brackup-restore -v --from=SOURCE-DATE.brackup --to=/tmp/brackup-restore --just=FILE

You can also use the brackup-target utility to query a target for the backups it has available, and do various kinds of cleanup:

# List the backups available on the given target
brackup-target TARGET list_backups

# Get the brackup output file for a specific backup (to restore)
brackup-target TARGET get_backup BACKUPFILE

# Delete a brackup file on the target
brackup-target TARGET delete_backup BACKUPFILE

# Prune the target to the most recent N backup files
brackup-target --keep-backups 15 TARGET prune

# Remove backup chunks no longer referenced by any backup file
brackup-target TARGET gc

That should be enough to get you up and running with brackup - I'll cover some additional tips and tricks in a subsequent post.

Banking for Geeks

Heard via @chieftech on twitter that the Banking Technology 2008 conference is on today. It's great to see the financial world engaging with developments online and thinking about new technologies and the Web 2.0 space, but the agenda strikes me as somewhat weird - perhaps driven mainly by which vendors they could get to come and spruik their wares?

How, for instance, can you have a "Banking Technology" conference and not have at least one session on 'online banking'? Isn't this the place where your technology interfaces with your customers? Weird.

My impression of the state of online banking in Australia is pretty underwhelming. As a geek who'd love to see some real technology innovation impact our online banking experiences, here are some wishlist items dedicated to the participants of Banking Technology 2008. I'd love to see the following:

  • Multiple logins to an account e.g. a readonly account for downloading things, a bill-paying account that can make payments to existing vendors, but not configure new ones, etc. This kind of differentiation would allow automation (scripts/services) using 'safe' accounts, without having to put your master online banking details at risk.

  • API access to certain functions e.g. balance checking, transaction downloads, bill payment to existing vendors, internal transfers, etc. Presumably dependent upon having multiple logins (previous), to help mitigate security issues.

  • Tagging functionality - the ability to interactively tag transactions (e.g. 'utilities', 'groceries', 'leisure', etc.), and to get those tags included in transaction reporting and/or downloading. Further, allow autotagging of transactions via descriptions/type/other party details etc.

  • Alert conditions - the ability to setup various kinds of alerts on various conditions, like low or negative balances, large withdrawals, payroll deposit, etc. I'm not so much thinking of plugging into particular alert channels here (email, SMS, IM, etc), just the ability to set 'flags' on conditions.

  • RSS support - the ability to configure various kinds of RSS feeds of 'interesting' data. Authenticated, of course. Examples: per-account transaction feeds, an alert condition feed (low balance, transaction bouncing/reversal, etc.), bill payment feed, etc. Supplying RSS feeds also means that such things can be plugged into other channels like email, IM, twitter, SMS, etc.

  • Web-friendly interfaces - as Eric Schmidt of Google says, "Don't fight the internet". In the online banking context, this means DON'T use technologies that work against the goodness of the web (e.g. frames, graphic-heavy design, Flash, RIA silos, etc.), and DO focus on simplicity, functionality, mobile clients, and web standards (HTML, CSS, REST, etc.).

  • Web 2.0 goodness - on the nice-to-have front (and with the proviso that it degrades nicely for non-javascript clients) it would be nice to see some ajax goodness allowing more friendly and usable interfaces and faster response times.

Other things I've missed? Are there banks out there already offering any of these?

Simple dual upstream gateways in CentOS

Had to setup some simple policy-based routing on CentOS again recently, and had forgotten the exact steps. So here's the simplest recipe for CentOS that seems to work. This assumes you have two upstream gateways (gw1 and gw2), and that your default route is gw1, so all you're trying to do is have packets that come in on gw2 go back out gw2.

1) Define an extra routing table e.g.

$ cat /etc/iproute2/rt_tables
#
# reserved values
#
255     local
254     main
253     default
0       unspec
#
# local tables
#
102     gw2
#

2) Add a default route via gw2 (here 172.16.2.254) to table gw2 on the appropriate interface (here eth1) e.g.

$ cat /etc/sysconfig/network-scripts/route-eth1
default table gw2 via 172.16.2.254

3) Add an ifup-local script to add a rule to use table gw2 for eth1 packets e.g.

$ cat /etc/sysconfig/network-scripts/ifup-local
#!/bin/bash
#
# Script to add/delete routing rules for gw2 devices
#

GW2_DEVICE=eth1
GW2_LOCAL_ADDR=172.16.2.1

if [ $(basename $0) = ifdown-local ]; then
  OP=del
else
  OP=add
fi

if [ "$1" = "$GW2_DEVICE" ]; then
  ip rule $OP from $GW2_LOCAL_ADDR table gw2
fi

4) Use the ifup-local script also as ifdown-local, to remove that rule

$ cd /etc/sysconfig/network-scripts
$ ln -s ifup-local ifdown-local

5) Restart networking, and you're done!

# service network restart
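
To confirm the extra rule and routing table are in place after the restart (addresses as per the example above):

# Should show a rule like 'from 172.16.2.1 lookup gw2'
ip rule list | grep gw2
# And the gw2 table should hold the default route via 172.16.2.254
ip route show table gw2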

For more, see:

Super Simple Public Location Broadcasting

I've been thinking about Yahoo's new fire eagle location-broking service over the last few days. I think it is a really exciting service - potentially a game changer - and has the potential to move publishing and using location data from a niche product to something really mainstream. Really good stuff.

But as I posted here, I also think fire eagle (at least as it's currently formulated) is probably only usable by a relatively small section of the web - roughly, the more sophisticated "web 2.0" sites that are comfortable with web services, API keys, and protocols like OAuth.

For the rest of the web - the long web 1.0 tail - the technical bar is simply too high for fire eagle as it stands to be useful and usable.

In addition, fire eagle as it currently stands is unicast, acting as a mediator between you and some particular app acting as a producer or a consumer of your location data. But, at least on the consumer side, I want some kind of broadcast service, not just a per-app unicast one. I want to be able to say "here's my current location for consumption by anyone", and allow that to be effectively broadcast to anyone I'm interacting with.

Clearly my granularity/privacy settings might be different for my public location, and I might want to be able to blacklist certain sites or parties if they prove to be abusers of my data, but for lots of uses a broadcast public location is exactly what I want.

How might this work in the web context? Say I'm interacting with an e-commerce site: if they had some broad idea of my location (say, postcode, state, country) they could default shipping addresses for me, and show me shipping costs earlier in the transaction (subject to change, of course, if I want to ship somewhere else). How can I communicate my public location data to this site?

So here's a crazy super-simple proposal: use Microformat HTTP Request Headers.

HTTP Request Headers are the only way the browser can pass information to a website (unless you consider cookies a separate mechanism, and they aren't really useful here because they're domain specific). The HTTP spec even carries over the "From" header from email, to allow browsers to communicate who the user is to the website, so there's some kind of precedent for using HTTP headers for user info.

Microformats are useful here because they're really simple, and they provide useful standardised vocabularies around addresses (adr) and geocoding (geo).

So how about (for example) we define a couple of custom HTTP request headers for public location data, and use some kind of microformat-inspired serialisation (like e.g. key-value pairs) for the location data? For instance:

X-Adr-Current: locality=Sydney; region=NSW; postal-code=2000; country-name=Australia
X-Geo-Current: latitude=-33.717718; longitude=151.117158
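
To see what this looks like on the wire, here's roughly how a client could attach those headers to an ordinary request (the URL is just a placeholder):

# Send the current public location alongside a normal page request
curl -H 'X-Adr-Current: locality=Sydney; region=NSW; postal-code=2000; country-name=Australia' \
     -H 'X-Geo-Current: latitude=-33.717718; longitude=151.117158' \
     http://shop.example.com/checkout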

For websites, the usage is then about as trivial as possible: check for the existence of the HTTP header, do some very simple parsing, and use the data to personalise the user experience in whatever ways are appropriate for the site.

On the browser side we'd need some kind of simple fire eagle client that would pull location updates from fire eagle and then publish them via these HTTP headers. A firefox plugin would probably be a good proof of concept.

I think this is simple, interesting and useful, though it obviously requires websites to make use of it before it's of much value in the real world.

So is this crazy, or interesting?