Commit Graph

49 Commits

Author SHA1 Message Date
FRIGN
b6b977f63d Audit tar(1), add DIRFIRST-flag to recurse()
I've been wanting to do this for a while now, as tar(1) used to
be one of messiest and cruftiest tools.
First off, before walking through the audit, I'll talk about
what the DIRFIRST-flag for recurse() does.
It basically calls fn() on the first-level-dir before calling
it's subentries. It's necessary here, because else the order
of the tar-files would've been wrong (it would try to create
dir/file before creating dir/).

Now, to the audit:
1)  Update manpage, fix mistake that compression is also available
    for compressing. It's only available for extracting.
2)  Define the major, minor and makedev macros from glibc by ourselves.
    No need to rely on them, as they are common sense.

decomp()
3)  Simple refactorization.

putoctal()
4)  Add a truncation check for snprintf().

archive()
5)  BUGFIX: Add checks to any checkable function, don't blindly call
    them, this is harmful and there are 100 ways to exploit that.
6)  Use estrlcpy() instead of snprintf() wherever possible, fix
    alignment.
7)  BUGFIX: Terminate the result-buffer of readlink(), check if
    it even succeeded.
8)  Fix sizeof()-formatting.

unarchive()
9)  BUGFIX: Add checks to any checkable function, don't blindly call
    them, this is harmful and there are 100 ways to exploit that.
10) BUGFIX: strtoul can happily return negative numbers. Add checks
    for that and also if the full string has been processed.
11) Remove calls to perror(). We have eprintf, use it.
12) BUGFIX: "minor = strtoul(h->mode, 0, 8);". We need h->minor of
    course.
13) Fix typo "usupported", remove fprintf-call.

print()
14) Check fread().

xt()
15) Get rid of snprintf-magic. Use estrlcat().
16) BUGFIX: check for ferror() on the tarfile.

usage()
17) Update it. The old usage() was like 1000 years old.

main()
18) Add DIRFIRST-flag to the recursor.
19) Don't print usage() when a mode is re-set. We allow this in
    general.
20) Add function checks and fix error messages.
21) Add tarfilename-global for proper error-messages.
2015-03-21 01:30:47 +01:00
FRIGN
58098575e7 Audit cp() in libutil
1) Rename cp_HLPflag -> cp_follow for consistency.
2) Use function-pointers for stat to clear up the code.
3) BUGFIX: TERMINATE THE RESULT BUFFER OF READLINK !!!
   It's something I noticed earlier and it actually lead to some
   pretty insane behaviour on our side using glibc (musl somehow
   magically solves this).
   Basically, symlinks used to contain the data of the file they
   pointed to. I wondered for weeks where this came from and now
   this has finally been solved.
4) BUGFIX: Do not unconditionally unlink target-files. Even GNU
   coreutils do it wrong.
   The basic idea is this:
   If fflag == 0 --> don't touch target files if they exist.
   If fflag == 1 --> unlink all and don't error out when we try
                     to unlink a file which doesn't exist.
5) Use estrlcpy and estrlcat instead of snprintf for path building.
6) Make it clearer what happens in preserve.
2015-03-19 17:57:12 +01:00
FRIGN
3111908b03 Refactor recurse() again
Okay, why yet another recurse()-refactor?
The last one added the recursor-struct, which simplified things
on the user-end, but there was still one thing that bugged me a lot:
Previously, all fn()'s were forced to (l)stat the paths themselves.
This does not work well when you try to keep up with H-, L- and P-
flags at the same time, as each utility-function would have to set
the right function-pointer for (l)stat every single time.

This is not desirable. Furthermore, recurse should be easy to use
and not involve trouble finding the right (l)stat-function to do it
right.
So, what we needed was a stat-argument for each fn(), so it is
directly accessible. This was impossible to do though when the
fn()'s are still directly called by the programs to "start" the
recurse.
Thus, the fundamental change is to make recurse() the function to
go, while designing the fn()'s in a way they can "live" with st
being NULL (we don't want a null-pointer-deref).

What you can see in this commit is the result of this work. Why
all this trouble instead of using nftw?
The special thing about recurse() is that you tell the function
when to recurse() in your fn(). You don't need special flags to
tell nftw() to skip the subtree, just to give an example.

The only single downside to this is that now, you are not allowed
to unconditionally call recurse() from your fn(). It has to be
a directory.
However, that is a cost I think is easily weighed up by the
advantages.

Another thing is the history: I added a procedure at the end of
the outmost recurse to free the history. This way we don't leak
memory.

A simple optimization on the side:

-		if (h->dev == st.st_dev && h->ino == st.st_ino)
+		if (h->ino == st.st_ino && h->dev == st.st_dev)

First compare the likely difference in inode-numbers instead of
checking the unlikely condition that the device-numbers are
different.
2015-03-19 01:08:19 +01:00
FRIGN
b3e8b17235 Audit concat() in libutil
Be more pedantic about the error-checking, fread can also return
values > 0 even though there has been a read-error.
We want to write the last incoming data and then bail.
2015-03-18 22:58:42 +01:00
FRIGN
a68c2a9e6e Remove apathmax() and implicitly agetcwd()
pathconf() is just an insane interface to use. All sane operating-
systems set sane values for PATH_MAX. Due to the by-runtime-nature of
pathconf(), it actually weakens the programs depending on its values.

Given over 3 years it has still not been possible to implement a sane
and easy to use apathmax()-utility-function, and after discussing this
on IRC, we'll dump this garbage.

We are careful enough not to overflow PATH_MAX and even if, any user
is able to set another limit in config.mk if he so desires.
2015-03-18 15:20:35 +01:00
FRIGN
93fd817536 Add estrlcat() and estrlcpy()
It has become a common idiom in sbase to check strlcat() and strlcpy()
using

if (strl{cat, cpy}(dst, src, siz) >= siz)
        eprintf("path too long\n");

However, this was not carried out consistently and to this very day,
some tools employed unchecked calls to these functions, effectively
allowing silent truncations to happen, which in turn may lead to
security issues.
To finally put an end to this, the e*-functions detect truncation
automatically and the caller can lean back and enjoy coding without
trouble. :)
2015-03-17 11:24:49 +01:00
FRIGN
9fd4a745f8 Add history and config-struct to recurse
For loop detection, a history is mandatory. In the process of also
adding a flexible struct to recurse, the recurse-definition was moved
to fs.h.
The motivation behind the struct is to allow easy extensions to the
recurse-function without having to change the prototypes of all
functions in the process.
Adding flags is really simple as well now.

Using the recursor-struct, it's also easier to see which defaults
apply to a program (for instance, which type of follow, ...).

Another change was to add proper stat-lstat-usage in recurse. It
was wrong before.
2015-03-13 00:29:48 +01:00
FRIGN
af61ba738c Refactor recurse()
Instead of allocating a buffer on each run, build a buf on the stack.
2015-03-12 13:22:37 +01:00
FRIGN
01de5df8e6 Audit du(1) and refactor recurse()
While auditing du(1) I realized that there's no way the over 100 lines
of procedures in du() would pass the audit.
Instead, I decided to rewrite this section using recurse() from libutil.
However, the issue was that you'd need some kind of payload to count
the number of bytes in the subdirectories and use them in the higher
hierarchies.
The solution is to add a "void *data" data pointer to each recurse-
function-prototype, which we might also be able to use in other
recurse-applications.
recurse() itself had to be augmented with a recurse_samedev-flag, which
basically prevents recurse from leaving the current device.

Now, let's take a closer look at the audit:
1) Removing the now unnecessary util-functions push, pop, xrealpath,
   rename print() to printpath(), localize some global variables.
2) Only pass the block count to nblks instead of the entire stat-
   pointer.
3) Fix estrtonum to use the minimum of LLONG_MAX and SIZE_MAX.
4) Use idiomatic argv+argc-loop
5) Report proper exit-status.
2015-03-11 23:21:52 +01:00
FRIGN
833c2aebb4 Remove mallocarray(...) and use reallocarray(NULL, ...)
After a short correspondence with Otto Moerbeek it turned out
mallocarray() is only in the OpenBSD-Kernel, because the kernel-
malloc doesn't have realloc.
Userspace applications should rather use reallocarray with an
explicit NULL-pointer.

Assuming reallocarray() will become available in c-stdlibs in the
next few years, we nip mallocarray() in the bud to allow an easy
transition to a system-provided version when the day comes.
2015-03-11 10:50:18 +01:00
FRIGN
3c33abc520 Implement mallocarray()
A function used only in the OpenBSD-Kernel as of now, but it surely
provides a helpful interface when you just don't want to make sure
the incoming pointer to erealloc() is really NULL so it behaves
like malloc, making it a bit more safer.

Talking about *allocarray(): It's definitely a major step in code-
hardening. Especially as a system administrator, you should be
able to trust your core tools without having to worry about segfaults
like this, which can easily lead to privilege escalation.

How do the GNU coreutils handle this?
$ strings -n 4611686018427387903
strings: invalid minimum string length -1
$ strings -n 4611686018427387904
strings: invalid minimum string length 0

They silently overflow...

In comparison, sbase:

$ strings -n 4611686018427387903
mallocarray: out of memory
$ strings -n 4611686018427387904
mallocarray: out of memory

The first out of memory is actually a true OOM returned by malloc,
whereas the second one is a detected overflow, which is not marked
in a special way.
Now tell me which diagnostic error-messages are easier to understand.
2015-03-10 22:19:19 +01:00
FRIGN
3b825735d8 Implement reallocarray()
Stateless and I stumbled upon this issue while discussing the
semantics of read, accepting a size_t but only being able to return
ssize_t, effectively lacking the ability to report successful
reads > SSIZE_MAX.
The discussion went along and we came to the topic of input-based
memory allocations. Basically, it was possible for the argument
to a memory-allocation-function to overflow, leading to a segfault
later.
The OpenBSD-guys came up with the ingenious reallocarray-function,
and I implemented it as ereallocarray, which automatically returns
on error.
Read more about it here[0].

A simple testcase is this (courtesy to stateless):
$ sbase-strings -n (2^(32|64) / 4)

This will segfault before this patch and properly return an OOM-
situation afterwards (thanks to the overflow-check in reallocarray).

[0]: http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/calloc.3
2015-03-10 21:23:36 +01:00
sin
7d36a35649 Fix off-by-one in apathmax() as the path is relative to "/"
1) Use size_t * instead of long *
2) Fallback to PATH_MAX instead of BUFSIZ
3) Header cleanup
2015-03-06 23:50:39 +00:00
FRIGN
0b9c02cd22 Use path[len] instead of *(path + len)
Maybe it's time to go to bed...
2015-03-03 00:31:27 +01:00
FRIGN
903d43bbb8 Use dynamic array in recurse() instead of PATH_MAX-array
Thanks Evan!
2015-03-03 00:11:41 +01:00
FRIGN
8dc92fbd6c Refactor enmasse() and recurse() to reflect depth
The HLP-changes to sbase have been a great addition of functionality,
but they kind of "polluted" the enmasse() and recurse() prototypes.
As this will come in handy in the future, knowing at which "depth"
you are inside a recursing function is an important functionality.

Instead of having a special HLP-flag passed to enmasse, each sub-
function needs to provide it on its own and can calculate results
based on the current depth (for instance, 'H' implies 'P' at
depth > 0).
A special case is recurse(), because it actually depends on the
follow-type. A new flag "recurse_follow" brings consistency into
what used to be spread across different naming conventions (fflag,
HLP_flag, ...).

This also fixes numerous bugs with the behaviour of HLP in the
tools using it.
2015-03-02 22:50:38 +01:00
FRIGN
933ed8c00b Rename unused flag in rm()
Before somebody gets the wrong idea again like I did.
2015-03-02 14:36:26 +01:00
FRIGN
286df29e7d Make already audited tools argv-centric instead of argc-centric
This has already been suggested by Evan Gates <evan.gates@gmail.com>
and he's totally right about it.
So, what's the problem?
I wrote a testing program asshole.c with

int
main(void)
{
        execl("/path/to/sbase/echo", "echo", "test");
        return 0;
}

and checked the results with glibc and musl. Note that the
sentinel NULL is missing from the end of the argument list.
glibc calculates an argc of 5, musl 4 (instead of 2) and thus
mess up things anyway.
The powerful arg.h also focuses on argv instead of argc as well,
but ignoring argc completely is also the wrong way to go.
Instead, a more idiomatic approach is to check *argv only and
decrement argc on the go.
While at it, I rewrote yes(1) in an argv-centric way as well.

All audited tools have been "fixed" and each following audited
tool will receive the same treatment.
2015-03-02 14:19:26 +01:00
FRIGN
5d6e609455 Do not mask previous return-values in libutil/rm.c
Thanks Michael Forney <mforney@mforney.org> for this observation!
2015-03-02 10:53:55 +01:00
FRIGN
48696d8c95 Fix exit status with -f for nonexistent paths
Thanks Michael Forney <mforney@mforney.org> for reporting this!
2015-03-01 23:48:50 +01:00
FRIGN
9b06720f62 Refactor cryptcheck() to allow multiple list-files and stdin
Previously, it was not possible to use

sha1sum test.c | sha1sum -c

because the program would not differenciate between an empty
argument and a non-specified argument.
Moreover, why not allow this?

sha1sum -c hashlist1 hashlist2

Digging deeper I found that using function pointers and a
modification in the crypt-backend might simplify the program
a lot by passing the argument-list to both cryptmain and
cryptcheck.
Allowing more than one list-file to be specified is also
consistent with what the other implementations support,
so we not only have simpler code, we also do not silently
break if there's a script around passing multiple files to
check.
2015-03-01 22:51:52 +01:00
sin
8f068589fb Fix recurse() prototype and convert char to int flags 2015-02-16 16:23:12 +00:00
Tai Chi Minh Ralph Eastwood
0cf6a18f6f recurse: change char follow to int follow 2015-02-16 15:53:58 +00:00
Tai Chi Minh Ralph Eastwood
82bc92da51 recurse: add symlink derefencing flags -H and -L 2015-02-16 15:53:55 +00:00
FRIGN
d7a438b2f8 Add \e, \", \' and hex-escapes (\xH[H]) to unescape()
So the users control the program, and the program doesn't
control the users.
2015-02-14 22:55:37 +01:00
sin
113caaf677 Make getlines() less verbose
Thanks Roberto for the suggestion.
2015-02-12 14:34:07 +00:00
Jakob Kramer
c0a3c66a84 add estrndup 2015-02-11 01:17:21 +00:00
Jakob Kramer
08e93dd4f5 add en*alloc functions 2015-02-11 01:17:21 +00:00
sin
51680535ce getlines: Style fix 2015-02-11 00:27:30 +00:00
Jakob Kramer
66a5ea722d getlines: last line of file should always have a newline
This is a useful behavior if you want to reorder the lines,
because otherwise you might end up with originally two lines
on one, e.g.

	$ echo -ne "foo\nbar" | sort
	barfoo
2015-02-11 00:25:48 +00:00
Tai Chi Minh Ralph Eastwood
af8be7f92c cp: add symlink deref flags -H and -L for cp and mv 2015-02-09 22:54:52 +00:00
FRIGN
360a63769c Use strtonum and libutf in test(1), refactor code and manpage
and mark it as finished in README.
2015-02-09 22:21:23 +01:00
sin
c0d36e0064 Switch concat() to use fread() and fwrite()
We should never mix FILE I/O with raw I/O.  Going from raw I/O
to FILE I/O is fine but doing the opposite is extremely tricky and
only works under certain conditions (unbuffered stream + no call
to ungetc()).
2015-02-09 15:24:03 +00:00
FRIGN
fd562481f3 Convert estrto{l, ul} to estrtonum
Enough with this insanity!
2015-01-30 16:52:44 +01:00
sin
ab149deebe Use errstr as filled by strtonum() because it is more informative 2015-01-30 13:59:43 +00:00
sin
e5c1f0f372 Add estrtonum() as well 2015-01-30 13:56:45 +00:00
sin
add25a464f Add strtonum() in preparation to nuking estrtol() and friends 2015-01-30 13:48:33 +00:00
sin
b90ca482a0 Add estrtoul() 2015-01-30 13:24:41 +00:00
FRIGN
e60885699c Fix return values in rm(1) and mv(1)
by setting rm_status to 1 if removing 1 file in the list fails.
Extend this to mv_status in mv(1).
2015-01-30 12:45:54 +01:00
FRIGN
38adcf0c08 Fix tabs in libutil/unescape.c 2015-01-29 21:59:27 +01:00
FRIGN
b8b9d983c8 Add unescape() to libutil
formerly known as resolveescapes(), it is of central use to numerous
programs.
This drops a lot of LOC.
2015-01-29 21:52:44 +01:00
sin
bc9c752df5 Import strsep() from musl libc 2015-01-25 17:48:11 +00:00
Michael Forney
e14e0becce cp: Rename -d option to -P
The -d option is a GNU extension and is equivalent to its "-P
--preserve=links" options.

Since we don't implement the --preserve=links functionality anyway (it
means preserve hard links between files), just call it -P, which is
specified by POSIX.

Additionally, there is no need to check for cp_Pflag again before
copying the symlink itself because the only way the mode in the stat
will indicate a symlink is if we used lstat (which we only do if -P is
specified).
2014-12-08 10:02:56 +00:00
sin
875f433666 Argh - include strings.h 2014-11-21 00:03:30 +00:00
sin
ce86a05f36 Import strcasestr() from musl and remove -D_GNU_SOURCE 2014-11-20 23:46:06 +00:00
FRIGN
1436518f9d Use < 0 instead of == -1 2014-11-19 20:09:29 +00:00
sin
9b38355ae8 Break out if stat fails on the source file in cp(1)
Save one level of indentation.
2014-11-19 15:08:57 +00:00
Evan Gates
84b08427a1 remove agetline 2014-11-18 21:05:28 +00:00
sin
027052f5e5 Rename util/ to libutil/ 2014-11-17 16:48:34 +00:00