On the criteria for ordering
==============================
I was confused by the documentation for sort's "-d" flag. This confusion
relates to GNU coreutils' locale-specific sort. [^]
Below I discuss how the sort order differs between implementations
of sort and of sh "*" in my particular environments.
Sorting with sort
-----------------
Consider the following two sort commands.
    printf '@ b\n- d\n? a\n~ c\n! e\n' | sort
    printf '@ b\n- d\n? a\n~ c\n! e\n' | sort -d
With BusyBox v1.23.2 on NixOS 15.09, the first of these commands returns
ASCIIbetical order,
    ! e
    - d
    ? a
    @ b
    ~ c
and the second returns dictionary order.
    ? a
    @ b
    ~ c
    - d
    ! e
With GNU coreutils version 8.24 on NixOS, both commands return
dictionary order. The same is true for GNU coreutils version 8.23 on
Debian Wheezy.
    ? a
    @ b
    ~ c
    - d
    ! e
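The FAQ entry cited in the first footnote attributes this behaviour to the
locale's collation rules. A minimal sketch of that explanation follows; it
was not run on the systems listed above. The function name sort_bytewise is
made up for illustration. Forcing the C locale should make any of these
sort implementations compare raw bytes.

```shell
# Hypothetical helper: sort stdin with the locale forced to C, so that
# lines are compared byte by byte instead of by the locale's collation rules.
sort_bytewise() {
    LC_ALL=C sort
}

printf '@ b\n- d\n? a\n~ c\n! e\n' | sort_bytewise
# Prints the ASCIIbetical order:
# ! e
# - d
# ? a
# @ b
# ~ c
```

Setting LC_ALL only for the one command leaves the rest of the shell
session in its usual locale.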
IEEE Std 1003.1, 2013 Edition [^^] specifies that the "-d" flag should
enable dictionary order. All of these versions of sort have clear
documentation about the order that should be returned when the "-d" flag
is set (see --help, man, or info), and the implementations match the
documentation as far as I can tell.
I have found no explicit documentation from any relevant source as to
what the default sort order should be. On the other hand, they all
suggest that "-d" produces an order different from the default order.
In GNU coreutils 8.24, for example, "-d" is a direction to "consider
only blanks and alphanumeric characters". It lacks any mention that the
"-d" flag has no effect or that it is the default. Furthermore, on my
first reading, I took it to mean that the default is to consider all
characters and that "-d" limits the considered characters to blanks and
alphanumeric characters.
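One way to check that reading is to compare the two commands under the C
locale, where the default is byte order. This is only a sketch under that
assumption, and the function name sort_dictionary_bytewise is made up for
illustration. If "-d" really restricts the comparison to blanks and
alphanumeric characters, the punctuation at the start of each line should
stop mattering and the letters alone should decide the order.

```shell
# Hypothetical helper: dictionary-order sort with the locale forced to C.
# With -d, only blanks and alphanumeric characters take part in comparisons.
sort_dictionary_bytewise() {
    LC_ALL=C sort -d
}

printf '@ b\n- d\n? a\n~ c\n! e\n' | sort_dictionary_bytewise
# Prints the lines ordered by their letters:
# ? a
# @ b
# ~ c
# - d
# ! e
```

Running the same input through "LC_ALL=C sort" without "-d" gives the
ASCIIbetical order instead, so in the C locale the flag does change the
result.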
Sorting in *
-------------
I think this is related to the order returned by "*" in sh.
The following sh code creates several files in a directory and then
expands "*", listing them in order.
    printf '@ b\n- d\n? a\n~ c\n! e\n' | while IFS= read -r line; do
      touch -- "${line}"
    done
    for file in *; do echo "$file"; done
On one computer, running FreeBSD, the order is apparently
ASCIIbetical.
    ! e
    - d
    ? a
    @ b
    ~ c
On two GNU systems, running NixOS and Debian, respectively, output is
in dictionary order. I'm not exactly sure what dictionary order is, but
it is something like sorting on the alphabetical characters before
sorting on the rest of the line.
    ? a
    @ b
    ~ c
    - d
    ! e
(Although I don't really know what dictionary order is, I was able to
determine that the above results are in dictionary order because of my
investigation of incompatible implementations of sort.)
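If the order of "*" follows the locale's collation in the same way that
sort does, then expanding "*" in a shell started under the C locale should
give the ASCIIbetical order on every system. The sketch below assumes that
this is the cause, and that mktemp -d is available (it is not required by
POSIX). The function name demo_glob_order is made up for illustration.

```shell
# Hypothetical helper: create the test files in a scratch directory and
# list them by expanding "*" in a child sh whose locale is forced to C.
demo_glob_order() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    printf '@ b\n- d\n? a\n~ c\n! e\n' | while IFS= read -r line; do
        touch -- "$line"
    done
    # A fresh sh is used so that the locale is set before any expansion.
    LC_ALL=C sh -c 'for file in *; do printf "%s\n" "$file"; done'
    cd / && rm -r -- "$dir"
)

demo_glob_order
```

The body runs in a subshell, so the cd does not affect the calling shell.
Dropping "LC_ALL=C" from the inner command shows the order for the current
locale instead.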
[^] https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
[^^] http://pubs.opengroup.org/onlinepubs/9699919799/