The “sort” command on Solaris has a “-k” switch for sorting by a particular field. For example, “sort -k 2” will sort by the second field on each line of input. Parts of fields can be further specified with “-k n.m“, says the man page.
For example, “sort -k 2.3” should sort by the second field, starting with the third character in that field. But the man page isn’t the clearest, and getting the “-k x.y” notation to work is tricky. Tricky until you realize it never works you also supply the “-b” argument. Same on Linux.
Take the following example:
jim@pluto # cat 1 0 2 Que 12/08/12 16:51 988 1 Act 10/29/12 03:51 49 6 Wri 07/30/12 20:01 49 3 Wri 07/25/12 20:01 988 8 Wri 08/09/12 06:45 988 5 Wri 09/23/11 06:45 988 7 Wri 02/09/11 06:45
We want to sort by the date in the 4th column (American format – mm/dd/yy). The following command should do it. Those three “-k” arguments should sort by the year, then the month, then the day, respectively, putting the lines in overall date order. But no, the output is jumbled:
jim@pluto # cat 1 | sort -k 4.7,4.8 -k 4.1,4.2 -k 4.4,4.5 49 6 Wri 07/30/12 20:01 988 5 Wri 09/23/11 06:45 49 3 Wri 07/25/12 20:01 0 2 Que 12/08/12 16:51 988 7 Wri 02/09/11 06:45 988 8 Wri 08/09/12 06:45 988 1 Act 10/29/12 03:51
Add the “-b” flag, and it all works as expected:
jim@pluto # cat 1 | sort -b -k 4.7,4.8 -k 4.1,4.2 -k 4.4,4.5 988 7 Wri 02/09/11 06:45 988 5 Wri 09/23/11 06:45 49 3 Wri 07/25/12 20:01 49 6 Wri 07/30/12 20:01 988 8 Wri 08/09/12 06:45 988 1 Act 10/29/12 03:51 0 2 Que 12/08/12 16:51
All in date order: from the 9th of Feb 2011, down to 8th December 2012. Those -k switches mean: sort by characters 7 to 8 of the 4th field (the year), then by characters 1 to 2 (the month) and finally by characters 4 to 5 (the day).
The Solaris man page says that -b “ignores leading blank characters“. On Linux it adds “If neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding whitespace“. None of which really explains what is happening on the command line. Whatever. Just remember to always use “-b” with “-k“, and it works. Useful for sorting by dates, times, serial numbers, custom fields, whatever.
I think better to change
cat 1 | sort -b -k 4.7,4.8 -k 4.1,4.2 -k 4.3,4.4
to
cat 1 | sort -b -k 4.7,4.8 -k 4.1,4.2 -k 4.4,4.5
Thank you Mo, article corrected.
Even with the incorrect “-k 4.3,4.4″, it still appeared to sort dates in the right order, at least with the file above. Picking some new, simpler data shows the difference though:
As expected, the incorrect sort key gets it wrong. But, if the “3″ in column 2 is changed to a “6″, making it the same for both lines, things change…
This time, even the duff sort key got it right. In fact, things are still right even if you remove all of the keys except for “year”:
…which is not expected. Who knows what sort is doing here. It should leave the original order untouched. Instead it seems to be sorting by year, finding nothing to do, then it does some kind of fall back and runs a sort on the whole line ? Beats me.
Jim