Sorting with “-k” on Unix and Linux

The “sort” command on Solaris has a “-k” switch for sorting by a particular field. For example, “sort -k 2” will sort by a key that starts at the second field on each line of input (and runs to the end of the line; “-k 2,2” limits the key to the second field alone). Parts of fields can be further specified with “-k n.m”, says the man page.

For example, “sort -k 2.3” should sort by the second field, starting with the third character in that field. But the man page isn’t the clearest, and getting the “-k x.y” notation to work is tricky. Tricky until you realize it never works unless you also supply the “-b” argument. Same on Linux.
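A minimal sketch of the effect, on a made-up two-line input (the data, the variable names and the LC_ALL=C setting are assumptions for illustration, not taken from the man page). With the default blank separators the second field includes its leading space, so without “-b” position 2.3 is really the second visible character:

```shell
# Hypothetical two-line input; LC_ALL=C is assumed so comparison is plain byte order.
input='x aab
y zza'

# Without -b: field 2 is " aab" / " zza" (leading space included),
# so 2.3 starts at the second letter and the keys are "ab" and "za".
without_b=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2.3)

# With -b: leading blanks are skipped, so 2.3 is the third letter
# and the keys are "b" and "a".
with_b=$(printf '%s\n' "$input" | LC_ALL=C sort -b -k 2.3)

printf 'without -b:\n%s\n\nwith -b:\n%s\n' "$without_b" "$with_b"
```

The two runs should come out in opposite orders, which is the whole problem in miniature.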

Take the following example:

jim@pluto # cat 1
0   2 Que  12/08/12  16:51
988 1 Act  10/29/12  03:51
49  6 Wri  07/30/12  20:01
49  3 Wri  07/25/12  20:01
988 8 Wri  08/09/12  06:45
988 5 Wri  09/23/11  06:45
988 7 Wri  02/09/11  06:45

We want to sort by the date in the 4th column (American format – mm/dd/yy). The following command should do it. Those three “-k” arguments should sort by the year, then the month, then the day, respectively, putting the lines in overall date order. But no, the output is jumbled:

jim@pluto # cat 1 | sort -k 4.7,4.8 -k 4.1,4.2 -k 4.4,4.5
49  6 Wri  07/30/12  20:01
988 5 Wri  09/23/11  06:45
49  3 Wri  07/25/12  20:01
0   2 Que  12/08/12  16:51
988 7 Wri  02/09/11  06:45
988 8 Wri  08/09/12  06:45
988 1 Act  10/29/12  03:51

Add the “-b” flag, and it all works as expected:

jim@pluto # cat 1 | sort -b -k 4.7,4.8 -k 4.1,4.2 -k 4.4,4.5
988 7 Wri  02/09/11  06:45
988 5 Wri  09/23/11  06:45
49  3 Wri  07/25/12  20:01
49  6 Wri  07/30/12  20:01
988 8 Wri  08/09/12  06:45
988 1 Act  10/29/12  03:51
0   2 Que  12/08/12  16:51

All in date order: from 9th February 2011 down to 8th December 2012. Those -k switches mean: sort by characters 7 to 8 of the 4th field (the year), then by characters 1 to 2 (the month) and finally by characters 4 to 5 (the day).
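As a hedged aside, the key syntax also accepts a “b” modifier on each position, which should behave like “-b” but only for that key. The sketch below assumes a POSIX-style sort (it was written with GNU sort in mind), holds the sample data in a shell variable instead of the file “1”, and forces LC_ALL=C; the variable names are made up for the example:

```shell
# The sample data from the file "1", held in a variable for a self-contained run.
data='0   2 Que  12/08/12  16:51
988 1 Act  10/29/12  03:51
49  6 Wri  07/30/12  20:01
49  3 Wri  07/25/12  20:01
988 8 Wri  08/09/12  06:45
988 5 Wri  09/23/11  06:45
988 7 Wri  02/09/11  06:45'

# Global -b, as used in the article.
global_b=$(printf '%s\n' "$data" | LC_ALL=C sort -b -k 4.7,4.8 -k 4.1,4.2 -k 4.4,4.5)

# Same keys, with the "b" modifier attached to each start and end position instead.
per_key_b=$(printf '%s\n' "$data" | LC_ALL=C sort -k 4.7b,4.8b -k 4.1b,4.2b -k 4.4b,4.5b)

printf '%s\n' "$per_key_b"

# Report whether the two forms agree on this data.
if [ "$global_b" = "$per_key_b" ]; then
    echo "both forms give the same order"
fi
```

The per-key form is mostly useful when some keys need blanks skipped and others do not; for the date sort here the global “-b” is simpler.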

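The Solaris man page says that -b “ignores leading blank characters”. On Linux it adds “If neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding whitespace”. That second sentence is the explanation, even if it is easy to miss: without “-b”, the two spaces in front of each date count as characters 1 and 2 of the 4th field, so “-k 4.7,4.8” lands on the second digit of the day and the slash after it, not on the year. That matches the jumbled output above, which is ordered by that digit (0, 3, 5, 8, 9, 9, 9). Just remember to always use “-b” with “-k”, and it works. Useful for sorting by dates, times, serial numbers, custom fields, whatever.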
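The Linux man page sentence quoted above can be checked with a small experiment. If characters really are counted from the start of the preceding whitespace, then on this particular data (exactly two spaces before every date) shifting each position by two should give the right answer even without “-b”. A sketch, assuming LC_ALL=C and the sample data held in a variable; the variable names are made up:

```shell
# Same sample data as the file "1".
data='0   2 Que  12/08/12  16:51
988 1 Act  10/29/12  03:51
49  6 Wri  07/30/12  20:01
49  3 Wri  07/25/12  20:01
988 8 Wri  08/09/12  06:45
988 5 Wri  09/23/11  06:45
988 7 Wri  02/09/11  06:45'

# With -b: positions count from the first non-blank character of the field.
with_b=$(printf '%s\n' "$data" | LC_ALL=C sort -b -k 4.7,4.8 -k 4.1,4.2 -k 4.4,4.5)

# Without -b: the two leading spaces are characters 1 and 2 of field 4,
# so the year sits at 9-10, the month at 3-4 and the day at 6-7.
shifted=$(printf '%s\n' "$data" | LC_ALL=C sort -k 4.9,4.10 -k 4.3,4.4 -k 4.6,4.7)

printf '%s\n' "$shifted"

# Report whether the shifted keys reproduce the -b result.
if [ "$with_b" = "$shifted" ]; then
    echo "shifted positions match the -b result"
fi
```

This only works because every date happens to be preceded by the same number of spaces. It is a way of seeing what sort is counting, not a replacement for “-b”.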

2 thoughts on “Sorting with “-k” on Unix and Linux”

  1. Thank you Mo, article corrected.

    Even with the incorrect “-k 4.3,4.4”, it still appeared to sort dates in the right order, at least with the file above. Picking some new, simpler data shows the difference though:

    bash-4.2$ cat 2
    49  3 Wri  07/31/12  20:01
    49  6 Wri  07/30/12  20:01
    bash-4.2$ cat 2 | sort -b -k 4.7,4.8 -k 4.1,4.2 -k 4.4,4.5   # Correct sort, by Mo
    49  6 Wri  07/30/12  20:01
    49  3 Wri  07/31/12  20:01
    bash-4.2$ cat 2 | sort -b -k 4.7,4.8 -k 4.1,4.2 -k 4.3,4.4  # Incorrect sort
    49  3 Wri  07/31/12  20:01
    49  6 Wri  07/30/12  20:01
    

    As expected, the incorrect sort key gets it wrong. But, if the “3” in column 2 is changed to a “6”, making it the same for both lines, things change…

    bash-4.2$ cat 2
    49  6 Wri  07/31/12  20:01
    49  6 Wri  07/30/12  20:01
    bash-4.2$ cat 2 | sort -b -k 4.7,4.8 -k 4.1,4.2 -k 4.4,4.5  # Correct sort again
    49  6 Wri  07/30/12  20:01
    49  6 Wri  07/31/12  20:01
    bash-4.2$ cat 2 | sort -b -k 4.7,4.8 -k 4.1,4.2 -k 4.3,4.4  # Incorrect sort
    49  6 Wri  07/30/12  20:01
    49  6 Wri  07/31/12  20:01
    

    This time, even the duff sort key got it right. In fact, things are still right even if you remove all of the keys except for “year”:

    bash-4.2$ cat 2 | sort -b -k 4.7,4.8
    49  6 Wri  07/30/12  20:01
    49  6 Wri  07/31/12  20:01
    

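    …which is not what I expected: with only the “year” key, and both years equal, I thought sort would leave the original order untouched. What it actually does is fall back: when every key compares equal, it runs a last-resort comparison on the whole line. GNU sort documents this, and its “-s” (stable) option switches the fallback off. That also explains the earlier results: the duff “-k 4.3,4.4” key ties on both lines, so the order was being decided by the whole line, which is where the “3” versus “6” in column 2 came in.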

    Jim
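A short sketch of that whole-line fallback, using the two-line data from the comment above. It assumes a sort that supports “-s” (GNU sort does; other implementations may not) and LC_ALL=C; the variable names are made up:

```shell
# The second version of the file "2": both lines have "6" in column 2.
data='49  6 Wri  07/31/12  20:01
49  6 Wri  07/30/12  20:01'

# Year key only: both keys are "12", so sort falls back to comparing
# whole lines, and 07/30 ends up ahead of 07/31.
default_order=$(printf '%s\n' "$data" | LC_ALL=C sort -b -k 4.7,4.8)

# -s (stable) disables the last-resort comparison, so lines whose keys
# tie keep their input order.
stable_order=$(printf '%s\n' "$data" | LC_ALL=C sort -s -b -k 4.7,4.8)

printf 'default:\n%s\n\nstable:\n%s\n' "$default_order" "$stable_order"
```

With “-s” the output should be the input unchanged, which is the behaviour the comment was expecting from the year-only key.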
