Comparing Directories

Saw this post fly past in the Twitter stream today:
http://linuxshellaccount.blogspot.com/2008/03/perl-directory-permissions-difference.html.
It's a script by Mike Golvach to do something like a `diff -r`, but also
showing differences in permissions and ownership, rather than just content.

I've written a CPAN module for this kind of thing - File::DirCompare - so I thought I'd check how straightforward the same job would be using it:

#!/usr/bin/perl

use strict;
use warnings;
use File::Basename;
use File::DirCompare;
use File::Compare qw(compare);
use File::stat;

die "Usage: " . basename($0) . " dir1 dir2\n" unless @ARGV == 2;

my ($dir1, $dir2) = @ARGV;

File::DirCompare->compare($dir1, $dir2, sub {
  my ($a, $b) = @_;
  if (! $b) {
    printf "Only in %s: %s\n", dirname($a), basename($a);
  } elsif (! $a) {
    printf "Only in %s: %s\n", dirname($b), basename($b);
  } else {
    my $stata = stat $a;
    my $statb = stat $b;

    # Return unless different
    return unless compare($a, $b) != 0 ||
      $stata->mode != $statb->mode ||
      $stata->uid  != $statb->uid  ||
      $stata->gid  != $statb->gid;

    # Report
    printf "%04o %s %s %s\t\t%04o %s %s %s\n",
      $stata->mode & 07777, basename($a),
        (getpwuid($stata->uid))[0], (getgrgid($stata->gid))[0],
      $statb->mode & 07777, basename($b),
        (getpwuid($statb->uid))[0], (getgrgid($statb->gid))[0];
  }
}, { ignore_cmp => 1 });

This reports all entries that differ in content, permissions, or ownership. For example, given a tree like this (slightly modified from Mike's example):

$ ls -lR scripts1 scripts2
scripts1:
total 28
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script1
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script1.bak
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script2
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script2.bak
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script3
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script3.bak
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:49 script4
scripts2:
total 28
-rw-r--r-- 1 gavin users 0 Mar 17 16:41 script1
-rw-r--r-- 1 gavin users 0 Mar 17 16:41 script1.bak
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script2
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:41 script2.bak
-rwxr-xr-x 1 gavin gavin 0 Mar 17 16:41 script3*
-rwxr-xr-x 1 gavin gavin 0 Mar 17 16:41 script3.bak*
-rw-r--r-- 1 gavin gavin 0 Mar 17 16:49 script5

it will give output like the following:

$ ./pdiff2 scripts1 scripts2
0644 script1 gavin gavin                0644 script1 gavin users
0644 script1.bak gavin gavin            0644 script1.bak gavin users
0644 script3 gavin gavin                0755 script3 gavin gavin
0644 script3.bak gavin gavin            0755 script3.bak gavin gavin
Only in scripts1: script4
Only in scripts2: script5

This obviously has dependencies that Mike's version doesn't have, but it comes out much shorter and clearer, I think. It also doesn't fork and parse an external ls, so it should be more portable and less fragile. I should probably be caching the getpwuid and getgrgid lookups too, but that would have made it 5 lines longer. ;-)
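
For what it's worth, the caching mentioned above could look something like the following. This is only a sketch: user_name and group_name are helper names made up for illustration (they are not part of File::DirCompare or the script above), the fallback to the numeric id is my own assumption about what you'd want for ids with no passwd or group entry, and the `//` operator needs Perl 5.10 or later.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helpers: remember each uid -> user name and
# gid -> group name lookup in a plain hash, so repeated ids
# don't hit getpwuid/getgrgid again.
my (%user_cache, %group_cache);

sub user_name {
  my ($uid) = @_;
  # Fall back to the numeric id if there is no passwd entry
  $user_cache{$uid} //= (getpwuid($uid))[0] // $uid;
  return $user_cache{$uid};
}

sub group_name {
  my ($gid) = @_;
  # Fall back to the numeric id if there is no group entry
  $group_cache{$gid} //= (getgrgid($gid))[0] // $gid;
  return $group_cache{$gid};
}

# Look up the current user twice; the second call is served
# from the cache.
my $uid = $<;
printf "%s (uid %d)\n", user_name($uid), $uid for 1 .. 2;
```

In the report section of the script, the `(getpwuid($stata->uid))[0]` and `(getgrgid($stata->gid))[0]` expressions would then become `user_name($stata->uid)` and `group_name($stata->gid)`, and likewise for `$statb`.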
