Thing is, we don't yet have an updated utflen() ready for use. Although
prints the lines completely, it does not add a proper spacing between
each entry. This will be fixed later.
For sort(1) we need memmem(), which I imported from OpenBSD.
Inside sort(1), the changes involved working with the explicit lengths
given by getlines() earlier and rewriting some of the functions.
Now we can handle NUL-characters in the input just fine.
The assumption of NUL-terminated strings is actually quite a good one in
most cases. You don't have to worry about paths, because they may not
contain NUL.
Same applies to arguments passed to you. Unless you have to unescape,
there is no way for you to receive a NUL.
There are two important exceptions though, and it's important that we
address them, or else we get unexpected behaviour:
1) All tools using unescape() have to be strict about delimlen.
Else they end up for instance unescaping
'\\0abc'
to
'\0abc',
which in C's string-vision is an empty string.
2) All tools doing line wrenching and putting them out
again as lines again.
puts() will cut each line containing NULs off at the first
occurence.
-s strip binary
-d create directory
-D create missing directories
-t DIR target directory
-m MODE permission bits
-o USER set owner
-g GROUP set group
Installed files are copied, and default mode is 755.
Signed-off-by: Mattias Andrée <maandree@kth.se>
Looking at the old code, it became clear that the desired
functionality with the t-flag could not be added unless the
underlying data-structures were reworked.
Thus the only way to be successful was to rewrite the whole thing.
od(1) allows giving arbitrarily many type-specs per call, both via
-t x1o2... and -t x1 -t o2 and intermixed.
This fortunately is easy to parse.
Now, to be flexible, it should not only support types of integral
length. Erroring out like this is inacceptable:
$ echo -n "shrek"| od -t u3
od: invalid type string ‘u3’;
this system doesn't provide a 3-byte integral type
Thus, this new od(1) just collects the bytes until shortly before
printing, when the numbers are written into a long long with the
proper offset.
The bytes per line are just the lcm of all given type-lengths and >= 16.
They are equal to 16 for all types that are possible to print using
the old od(1)'s.
Endianness is of course also supported, needs some testing though,
especially on Big Endian systems.
The logic is simple, it's just a pain in the ass to fill the
data-structures.
Some lines had to be commented out, as glibc/musl apparently
have not fully implemented the mandatory variables for the
2013 corrigendum of POSIX 2008.
Also added a manpage and the necessary entries in README.
I also removed it from the TODO.
If this flag is not given, od(1) automatically replaces duplicate
adjacent lines with an '*' for each reoccurence.
If this flag is set, thus, no such filtering occurs.
In this case this would mean having to somehow keep the last printed
line in some backbuffer, building the next line and then doing the
necessary comparisons. This basically means that we duplicate the
functionality provided with uniq(1).
So instead of
$ od -t a > dump
you'd rather do
$ od -t a | uniq -f 1 -c > dump
Skipping the first field is necessary, as the addresses obviously differ.
Now, I was thinking hard why this flag even exists. If POSIX mandated
to add the address before the asterisk, so we know the offset of duplicate
occurrences, this would make sense. However, this is not the case.
Using uniq(1) also gives nicer output:
~ $ echo "111111111111111111111111111111111111111111111111" | od -t a -v | uniq -f 1 -c
3 0000000 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0000060 nl
1 0000061
in comparison to
$ echo "111111111111111111111111111111111111111111111111" | od -t a
0000000 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
*
0000060 nl
0000061
Before working on od(1), I didn't even know it would filter out
duplicate adjacent lines like that. This is also a matter of
predictability.
Concluding, the v-flag is implicitly set and users urged to just
use the existing tools provided by the system.
I don't think we would break scripts either. Firstly, it's rather
unlikely to have duplicate lines exactly matching the line-length of
od(1). Secondly, even if a script did that specifically, in the worst
case there would be a counting error or something.
Given od(1) is mostly used interactively, we can safely assume this
feature is for the benefit of the users.
Ditch this legacy POSIX crap!
Please enter the commit message for your changes. Lines starting
This client does not support the netascii mode. The default mode
is octet/binary and should be sufficient.
One thing left to do is to check the source port of the server
to make sure it doesn't change. If it does, we should ignore the
packet and send an error back without disturbing an existing
transfer.
1) Remove the function prototypes. No need for them, as the
functions are ordered.
2) Add fieldseplen, so the length of the field-separator is not
calculated nearly each time skipcolumn() is called.
3) rename next_col to skip_to_next_col so the purpose is clear,
also reorder the conditional accordingly.
4) Put parentheses around certain ternary expressions.
5) BUGFIX: Don't just exit() in check(), but make it return something,
so we can cleanly fshut() everything.
6) OFF-POSIX: Posix for no apparent reason does not allow more than
one file when the -c or -C flags are given.
This can be problematic when you want to check multiple files.
With the change 5), rewriting check() to return a value, I went
off-posix after discussing this with Dimitris to just allow
arbitrary numbers of files. Obviously, this does not break scripts
and is convenient for everybody who wants to quickly check a big
amount of files.
As soon as 1 file is "unsorted", the return value is 1, as expected.
For convenience reasons, check()'s warning now includes the filename.
7) BUGFIX: Set ret to 2 instead of 1 when the fshut(fp, *argv) fails.
8) BUGFIX: Don't forget to fshut stderr at the end. This would improperly
return 1 in the following case:
$ sort -c unsorted_file 2> /dev/full
9) Other style changes, line length, empty line before return.
Make it clear that <blank> characters just are spaces or tabs and
not a special group which needs special treatment for wide characters.
Also, and that was the only problem here, correctly calculate the
offset given by the key definitions for the start- and end-characters
using libutf-utility-functions.
Mark the progress in the README and put parentheses around the missing
flags which are insane to implement for no real gain.
Where should I start? It's a rather irrelevant tool and broken as is.
We'll re-add it as soon as the code has been fixed by the original
author.
Until then, better keep it out or some kids get hurt.
Use the *at functions instead of building paths manually. We do
still have path-building in recurse() and other areas, but the
long-term goal is to rid most interfaces of that for practical
and security reasons.
In this case, it's more or less trivial.
Also, refactor the manpage to be more consistent with the others.
BUGFIX: Return exit status 3 on error.
Mostly manpage-shuffling according to the changes in the corrigendum,
wording changes and more idiomatic expressions.
All this is finished up by marking the POSIX 2013 conformant tools
with
.St -p1003.1-2013
which is not available in older mandoc builds or nroff, but which
reflects what we actually did, so who cares?
This is a huge step and it's not far until we can release sbase 0.1.
I can't believe we've come this far! The idea is to look at the
2013 POSIX corrigendum for each tool and deep-test features before
making the first 0.1 release.
To keep the noise low, I'll do this in batches, not on a per-tool-
basis (as many of these are trivial to test).
In the meantime, I'll also think of a fitting STANDARDS section
for the non-POSIX tools. Now that the audits are pretty much done,
I can also have a more relaxed view on standards compliance instead
of having to dig through some uncleaned mess.
To mark this "new beginning", the README has gotten a liftover.
The POSIX 2008-column was more or less useless and as I expect the
checks to go along pretty quickly, I "reset" the compliance state
of all but the non-POSIX tools and will then go along and check every
single one of them in the next few days.
Apart from the few missing flags and audits, sbase should then be
ready to hit the world with the first release after 4 years of work.
Sort comes pretty much automatically, as no script relies on the
undefined behaviour of the input _not_ being sorted, we might as well
sort the sorted input already.
The only downside is memory usage, which can be an issue for large
files.
The o-flag was trivial to implement.
The flexible design already allowed to add these flags trivially.
Drop the -I and -L-flags, which are XSI-extensions.
The audit generally consisted of style-changes, dropping kitchen-
sink functions, updating the usage and using estrtonum instead of
strtol.
1) Refactor the manpage to use the num-syntax and concise wording.
2) Build format instead of having a list of static strings.
3) BUGFIX: if (!buf[0] || buf[0] == '\n') Process last-read-line
properly.
4) BUGFIX: In case we hit a formatting line, print a newline instead
of just dropping it.
5) Use a switch instead of having spaghetti-cases.
6) Don't use printf-magic but explicitly do a putchar(' ')-loop.
7) Update usage(), indent properly.
8) BUGFIX: strchr is not NULL when type[0] is \0. Check for \0
separately beforehand.
9) Reorder arg.h-cases for better readability.
No bugs found, but I changed intmax_t to long long to make it more
predictable and removed some of the kitchen-sinking.
Don't return structs themselves, as this is not very elegant.
Do it like functions like stat(), which take a pointer to a
struct to fill.
I've been wanting to do this for a while now, as tar(1) used to
be one of messiest and cruftiest tools.
First off, before walking through the audit, I'll talk about
what the DIRFIRST-flag for recurse() does.
It basically calls fn() on the first-level-dir before calling
it's subentries. It's necessary here, because else the order
of the tar-files would've been wrong (it would try to create
dir/file before creating dir/).
Now, to the audit:
1) Update manpage, fix mistake that compression is also available
for compressing. It's only available for extracting.
2) Define the major, minor and makedev macros from glibc by ourselves.
No need to rely on them, as they are common sense.
decomp()
3) Simple refactorization.
putoctal()
4) Add a truncation check for snprintf().
archive()
5) BUGFIX: Add checks to any checkable function, don't blindly call
them, this is harmful and there are 100 ways to exploit that.
6) Use estrlcpy() instead of snprintf() wherever possible, fix
alignment.
7) BUGFIX: Terminate the result-buffer of readlink(), check if
it even succeeded.
8) Fix sizeof()-formatting.
unarchive()
9) BUGFIX: Add checks to any checkable function, don't blindly call
them, this is harmful and there are 100 ways to exploit that.
10) BUGFIX: strtoul can happily return negative numbers. Add checks
for that and also if the full string has been processed.
11) Remove calls to perror(). We have eprintf, use it.
12) BUGFIX: "minor = strtoul(h->mode, 0, 8);". We need h->minor of
course.
13) Fix typo "usupported", remove fprintf-call.
print()
14) Check fread().
xt()
15) Get rid of snprintf-magic. Use estrlcat().
16) BUGFIX: check for ferror() on the tarfile.
usage()
17) Update it. The old usage() was like 1000 years old.
main()
18) Add DIRFIRST-flag to the recursor.
19) Don't print usage() when a mode is re-set. We allow this in
general.
20) Add function checks and fix error messages.
21) Add tarfilename-global for proper error-messages.
1) Properly document e, f and m-flags in the manpage.
2) Clear up the code for the m-flag-handling. Add idiomatic
'/'-path-traversal as already seen in mkdir(1).
3) Unwrap the SWAP_BUF()-macro.
4) BUGFIX: Actually handle the f-flag properly. Only resolve
the dirname and append the basename later.
5) Use fputs() instead of printf("%s", ...).
Add "none" to ls, as all pending flags are optional.
sed is feature-complete, so I marked it like that. It needs an audit
though.
seq is implicitly UTF-8-ready, will be audited later.