2016-03-06 06:27:31 -05:00
|
|
|
On the criteria for ordering
|
|
|
|
==============================
|
2016-03-31 14:36:31 -04:00
|
|
|
|
2016-03-31 14:42:54 -04:00
|
|
|
I was confused by the documentation for sort's "-d" flag. This confusion
|
|
|
|
relates to GNU coreutil's locale-specific sort. [^]
|
2016-03-31 14:36:31 -04:00
|
|
|
|
|
|
|
Below I discuss sort order differences between different implementations
|
2016-03-31 14:42:54 -04:00
|
|
|
of sort and of sh "*" for my particular environments.
|
2016-03-31 14:36:31 -04:00
|
|
|
|
|
|
|
Sorting with sort
|
|
|
|
------------
|
|
|
|
Consider the following two sort commands.
|
|
|
|
|
|
|
|
printf '@ b\n- d\n? a\n~ c\n! e\n' | sort
|
|
|
|
printf '@ b\n- d\n? a\n~ c\n! e\n' | sort -d
|
|
|
|
|
|
|
|
With BusyBox v1.23.2 on NixOS 15.09, the first of these commands returns
|
|
|
|
ASCIIbetical order,
|
|
|
|
|
|
|
|
! e
|
|
|
|
- d
|
|
|
|
? a
|
|
|
|
@ b
|
|
|
|
~ c
|
|
|
|
|
|
|
|
and the second returns dictionary order.
|
|
|
|
|
|
|
|
? a
|
|
|
|
@ b
|
|
|
|
~ c
|
|
|
|
- d
|
|
|
|
! e
|
|
|
|
|
|
|
|
With GNU coreutils version 8.24 on NixOS, both commands return
|
|
|
|
dictionary order. The same is true for GNU coreutils version 8.23 on
|
|
|
|
Debian Wheezy.
|
|
|
|
|
|
|
|
? a
|
|
|
|
@ b
|
|
|
|
~ c
|
|
|
|
- d
|
|
|
|
! e
|
|
|
|
|
2016-03-31 14:42:54 -04:00
|
|
|
IEEE Std 1003.1, 2013 Edition [^^] specifies that the "-d" flag should
|
2016-03-31 14:36:31 -04:00
|
|
|
enable dictionary order. All of these versions of sort have clear
|
|
|
|
documentation about the order that should be returned when the "-d" flag
|
|
|
|
is set, (See --help, man, or info.) and the implementations match the
|
|
|
|
documentation as far as I can tell.
|
|
|
|
|
|
|
|
I have found no explicit documentation from any relevant source as to
|
|
|
|
what the default sort order should be. On the other hand, they all
|
|
|
|
suggest that "-d" produces an order different from the default order.
|
|
|
|
|
|
|
|
In GNU coreutils 8.24, for example, "-d" is a direction to "consider
|
|
|
|
only blanks and alphanumeric characters". It lacks any mention that the
|
|
|
|
"-d" flag has no effect or that it is the default. Furthermore, on my
|
|
|
|
first reading, I took it to mean that the default is to consider all
|
|
|
|
characters and that "-d" limits the considered characters to blanks and
|
|
|
|
alphanumeric characters.
|
|
|
|
|
|
|
|
|
|
|
|
Sorting in *
|
|
|
|
-------------
|
|
|
|
I think this is related to the order returned by "*" in sh.
|
2016-03-06 06:27:31 -05:00
|
|
|
The following sh code creates several files in a directory and then
|
|
|
|
calls "*", listing them in order.
|
|
|
|
|
|
|
|
printf '@ b\n- d\n? a\n~ c\n! e\n' | while read line; do
|
|
|
|
touch -- "${line}"
|
|
|
|
done
|
|
|
|
for file in *; do echo "$file"; done
|
|
|
|
|
2016-03-31 14:20:54 -04:00
|
|
|
On one computer, running FreeBSD, the order is apparently
|
2016-03-31 14:36:31 -04:00
|
|
|
ASCIIbetical.
|
2016-03-06 06:27:31 -05:00
|
|
|
|
|
|
|
! e
|
|
|
|
- d
|
|
|
|
? a
|
|
|
|
@ b
|
|
|
|
~ c
|
|
|
|
|
2016-03-31 14:03:24 -04:00
|
|
|
On two GNU systems, running NixOS and Debian, respectively, output is
|
2016-03-06 06:27:31 -05:00
|
|
|
in dictionary order. I'm not exactly sure what dictionary order is, but
|
|
|
|
it is something like sorting on the alphabetical characters before
|
|
|
|
sorting on the rest of the line.
|
|
|
|
|
|
|
|
? a
|
|
|
|
@ b
|
|
|
|
~ c
|
|
|
|
- d
|
|
|
|
! e
|
|
|
|
|
2016-03-31 14:36:31 -04:00
|
|
|
(I don't really know what dictionary order is, I was able to determine
|
2016-03-06 06:27:31 -05:00
|
|
|
that the above results are in dictionary order because of my investigation of
|
2016-03-31 14:36:31 -04:00
|
|
|
incompatible implementations of sort.)
|
2016-03-31 14:20:54 -04:00
|
|
|
|
2016-03-31 14:42:54 -04:00
|
|
|
[^] https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
|
|
|
|
[^^] http://pubs.opengroup.org/onlinepubs/9699919799/
|