Subtracting Text Files

This has bitten me a couple of times now, and each time I've had to re-google the utility and figure out the appropriate incantation. So note to self: to subtract text files use comm(1).

Both input files have to be sorted (and sorted under the same locale that comm runs in), but comm accepts - in place of either file name to read stdin, so you can sort on the fly if you like.
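Here's a minimal sketch of sorting on the fly. The file names and contents are made up for illustration, and the temp directory is only there to keep the example self-contained:

```shell
# Hypothetical sample data; neither file is sorted.
tmp=$(mktemp -d)
printf '%s\n' pear apple fig > "$tmp/a.txt"
printf '%s\n' fig banana apple > "$tmp/b.txt"

# comm needs sorted input on both sides, so sort the first file up front...
sort "$tmp/a.txt" > "$tmp/a.sorted"

# ...and sort the second one on the fly, handing it to comm as stdin (-).
result=$(sort "$tmp/b.txt" | comm -23 "$tmp/a.sorted" -)
echo "$result"   # lines in a.txt that are not in b.txt
```

If you're in bash, process substitution should let you skip the intermediate file too: comm -23 <(sort a.txt) <(sort b.txt).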

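I also find the -1 -2 -3 options pretty counter-intuitive. By default comm prints three columns (lines only in the first file, lines only in the second file, lines in both), and the options say which columns to suppress, when what I seem to want is to say which ones to select. But whatever.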
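If the suppress logic bugs you as much as it does me, a tiny wrapper can flip it around. This is just a sketch: comm_select is a hypothetical helper name of my own, not something that ships with comm, and the sample files are made up.

```shell
# comm_select (hypothetical helper): name the columns to KEEP, e.g. "1" or "12".
# Columns: 1 = only in FILE1, 2 = only in FILE2, 3 = in both.
comm_select() {
    keep=$1; shift
    flags=
    for col in 1 2 3; do
        case $keep in
            *"$col"*) ;;                 # column requested: leave it alone
            *) flags="$flags$col" ;;     # not requested: suppress it
        esac
    done
    # ${flags:+...} expands to nothing when every column is kept.
    comm ${flags:+"-$flags"} "$@"
}

# Hypothetical sample data, already sorted.
tmp=$(mktemp -d)
printf '%s\n' apple fig pear > "$tmp/one.txt"
printf '%s\n' apple banana fig > "$tmp/two.txt"

only_first=$(comm_select 1 "$tmp/one.txt" "$tmp/two.txt")   # same as comm -23
in_both=$(comm_select 3 "$tmp/one.txt" "$tmp/two.txt")      # same as comm -12
echo "$only_first"
echo "$in_both"
```

So comm_select 1 is FILE1 - FILE2, comm_select 2 is FILE2 - FILE1, and comm_select 3 is the intersection.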

Here's the cheatsheet:

FILE1=one.txt
FILE2=two.txt

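# FILE1 - FILE2 (lines unique to FILE1)
comm -23 "$FILE1" "$FILE2"

# FILE2 - FILE1 (lines unique to FILE2)
comm -13 "$FILE1" "$FILE2"

# intersection (common lines)
comm -12 "$FILE1" "$FILE2"

# xor (non-common lines, either FILE; FILE2's lines come out tab-indented)
comm -3 "$FILE1" "$FILE2"
# or without the column delimiters (--output-delimiter is GNU comm only;
# the sed also strips any leading spaces that belong to the lines themselves):
comm -3 --output-delimiter=' ' "$FILE1" "$FILE2" | sed 's/^ *//'

# union (all lines, in three tab-indented columns)
comm "$FILE1" "$FILE2"
# or without the column delimiters (same caveats as above):
comm --output-delimiter=' ' "$FILE1" "$FILE2" | sed 's/^ *//'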