sbase

Commit Graph

Author	SHA1	Message	Date
Pieter Kockx	6eec2eb3b4	tr: Fix infinite loop When `makeset` got a string containing square brackets followed by at least one extra character, e.g. "[abc]d", it entered an infinite loop because it was assumed `j` could not exceed `len` without having been equal to `len`. It can, however, when `m == len` and subsequently `j = m + 1`.	2017-10-21 12:44:09 -07:00
Laslo Hunhold	fb11173926	tr: Fix multiple ranges with different lengths (Michael Forney) See his description below. Thanks Michael! --- A bug was introduced in `bc4c293fe5` causing the range length for the next set to be used instead of the first one. This causes issues when choosing the replacement rune when the ranges are of different lengths. Current behavior: $ echo 1234 | tr 'a-f1-4' '1-6a-d' 56ab Correct behavior: $ echo 1234 | tr 'a-f1-4' '1-6a-d' abcd This also fixes range expressions in the form [a-z], which get encoded as four ranges '[', 'a'..'z', ']', causing all a-z characters to get mapped to ']'. This form is occasionally used in shell scripts, including the syscalltbl.sh script used to build linux. ---	2016-11-18 12:45:59 +01:00
Laslo Hunhold	b5ebd49dd3	tr: Provide a fallthrough case for single-arg -s Previously, this would not work properly and not be let through the sanity check. This is a dirty hack until the next iteration where I'll clean up the data structures and make this saner.	2016-10-06 02:00:25 +02:00
Laslo Hunhold	c154ef7a03	tr(1): Properly handle the -dc case for character classes I actually did that properly in the set-case but forgot to add the same logic to the character classes. Now it should work fine.	2016-10-06 00:16:30 +02:00
Laslo Hunhold	096c504d82	tr(1): Properly jump to output when inside set complement	2016-10-05 21:54:51 +02:00
FRIGN	bc4c293fe5	Revamp tr(1) set parsing and handling If you look at GNU coreutils, they do not support the mappings $ echo "1234abc" | tr "[:alnum:]" "[:upper:]" $ echo "ABCabc" | tr -c "[:upper:]" "[l*]" to only give a few examples. This commit broadens the scope of tr(1) as far as humanly possible to map between classes and non-classes, making tr a usable tool and actually fulfilling user expectations. Posix really is of no help here as it still kind of assumes the fixed ASCII table instead of complex Unicode code points or even Grapheme clusters.	2016-10-05 21:18:24 +02:00
FRIGN	9de401a495	Fix tr(1) squeezing Okay, it took me a while and another look at the Posix spec to see that I have been dealing with squeezing in a way too complicated way. What just needed to be done is before doing the final write to deploy the squeeze-check. We actually do not need this atomically complicated squeeze check in every single edge-case. Now it should work properly.	2016-10-05 19:31:50 +02:00
FRIGN	97ce9ea586	Fix -s in tr(1) Forgot that in case there is a second argument given with -s you probably want to have your characters substituted. I changed it so that shortly before "deploying" we check if the "to be written"-Rune is equal to the last Rune, and proceed as needed.	2016-03-02 09:31:11 +00:00
sin	2366164de7	No need for semicolon after ARGEND This is also the style used in Plan 9.	2015-11-01 10:18:55 +00:00
FRIGN	d23cc72490	Simplify return & fshut() logic Get rid of the !!()-constructs and use ret where available (or introduce it). In some cases, there would be an "abort" on the first fshut-error, but we want to close all files and report all warnings and then quit, not just the warning for the first file.	2015-05-26 16:41:43 +01:00
Michael Forney	035e14c516	tr: Fix -c option when translating	2015-04-27 17:16:37 +01:00
FRIGN	7be2449aa9	tr(1): Show an error when classes and non-classes are mixed	2015-04-20 20:41:11 +01:00
Hiltjo Posthuma	bfcf46ac5e	tr: fix "isdigit" check	2015-04-20 20:41:11 +01:00
FRIGN	11e2d472bf	Add *fshut() functions to properly flush file streams This has been a known issue for a long time. Example: printf "word" > /dev/full wouldn't report there's not enough space on the device. This is due to the fact that every libc has internal buffers for stdout which store fragments of written data until they reach a certain size or on some callback to flush them all at once to the kernel. You can force the libc to flush them with fflush(). In case flushing fails, you can check the return value of fflush() and report an error. However, previously, sbase didn't have such checks and without fflush(), the libc silently flushes the buffers on exit without checking the errors. No offense, but there's no way for the libc to report errors in the exit-condition. GNU coreutils solve this by having onexit-callbacks to handle the flushing and report issues, but they have obvious deficiencies. After long discussions on IRC, we came to the conclusion that checking the return value of every io-function would be a bit too much, and having a general-purpose fclose-wrapper would be the best way to go. It turned out that fclose() alone is not enough to detect errors. The right way to do it is to fflush() + check ferror on the fp and then to a fclose(). This is what fshut does and that's how it's done before each return. The return value is obviously affected, reporting an error in case a flush or close failed, but also when reading failed for some reason, the error-state is caught. the !!( ... + ...) construction is used to call all functions inside the brackets and not "terminating" on the first. We want errors to be reported, but there's no reason to stop flushing buffers when one other file buffer has issues. Obviously, functionales come before the flush and ret-logic comes after to prevent early exits as well without reporting warnings if there are any. One more advantage of fshut() is that it is even able to report errors on obscure NFS-setups which the other coreutils are unable to detect, because they only check the return-value of fflush() and fclose(), not ferror() as well.	2015-04-05 09:13:56 +01:00
Hiltjo Posthuma	d6aff89bbb	tail: allow tail -n 0 or tail -0 fix a crash, but allow this option.	2015-03-30 21:24:46 +02:00
FRIGN	f6dc69eca3	Audit tr(1) A tool of my own devising, except from a small style-fix this code has already been triple-checked.	2015-03-17 23:41:22 +01:00
FRIGN	833c2aebb4	Remove mallocarray(...) and use reallocarray(NULL, ...) After a short correspondence with Otto Moerbeek it turned out mallocarray() is only in the OpenBSD-Kernel, because the kernel-malloc doesn't have realloc. Userspace applications should rather use reallocarray with an explicit NULL-pointer. Assuming reallocarray() will become available in c-stdlibs in the next few years, we nip mallocarray() in the bud to allow an easy transition to a system-provided version when the day comes.	2015-03-11 10:50:18 +01:00
FRIGN	3c33abc520	Implement mallocarray() A function used only in the OpenBSD-Kernel as of now, but it surely provides a helpful interface when you just don't want to make sure the incoming pointer to erealloc() is really NULL so it behaves like malloc, making it a bit more safer. Talking about *allocarray(): It's definitely a major step in code-hardening. Especially as a system administrator, you should be able to trust your core tools without having to worry about segfaults like this, which can easily lead to privilege escalation. How do the GNU coreutils handle this? $ strings -n 4611686018427387903 strings: invalid minimum string length -1 $ strings -n 4611686018427387904 strings: invalid minimum string length 0 They silently overflow... In comparison, sbase: $ strings -n 4611686018427387903 mallocarray: out of memory $ strings -n 4611686018427387904 mallocarray: out of memory The first out of memory is actually a true OOM returned by malloc, whereas the second one is a detected overflow, which is not marked in a special way. Now tell me which diagnostic error-messages are easier to understand.	2015-03-10 22:19:19 +01:00
FRIGN	3b825735d8	Implement reallocarray() Stateless and I stumbled upon this issue while discussing the semantics of read, accepting a size_t but only being able to return ssize_t, effectively lacking the ability to report successful reads > SSIZE_MAX. The discussion went along and we came to the topic of input-based memory allocations. Basically, it was possible for the argument to a memory-allocation-function to overflow, leading to a segfault later. The OpenBSD-guys came up with the ingenious reallocarray-function, and I implemented it as ereallocarray, which automatically returns on error. Read more about it here[0]. A simple testcase is this (courtesy to stateless): $ sbase-strings -n (2^(32|64) / 4) This will segfault before this patch and properly return an OOM-situation afterwards (thanks to the overflow-check in reallocarray). [0]: http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/calloc.3	2015-03-10 21:23:36 +01:00
FRIGN	31572c8b0e	Clean up #includes	2015-02-14 21:12:23 +01:00
FRIGN	73577f10a0	Scrap chartorunearr(), introducing utftorunestr() Interface and function as proposed by cls. The reasoning behind this function is that cls expressed his interest to keep memory allocation out of libutf, which is a very good motive. This simplifies the function a lot and should also increase the speed a bit, but the most important factor here is that there's no malloc anywhere in libutf, making it a lot smaller and more robust with a smaller attack-surface. Look at the paste(1) and tr(1) changes for an idiomatic way to allocate the right amount of space for the Rune-array.	2015-02-11 21:32:09 +01:00
FRIGN	7c578bf5b0	Scrap writerune(), introducing fputrune() Interface and function as proposed by cls. Code is also shorter, everything else analogous to fgetrune().	2015-02-11 20:58:00 +01:00
FRIGN	a5ae899a48	Scrap readrune(), introducing fgetrune() Interface as proposed by cls, but internally rewritten after a few considerations. The code is much shorter and to the point, aligning itself with other standard functions. It should also be much faster, which is not bad.	2015-02-11 20:16:49 +01:00
FRIGN	5836ef72e3	Use runetypebody.h-functions in tr(1) That's one small step for a man, one giant leap for mankind.	2015-02-11 13:12:27 +01:00
FRIGN	d5d686e9f6	tr : Revert `97c5986146` This was no typo.	2015-02-07 18:09:04 +01:00
Hiltjo Posthuma	97c5986146	tr: small typo	2015-02-06 15:43:07 +01:00
FRIGN	ff0347e391	Fix tr(1) behaviour in special cases and be stricter about stuff	2015-02-02 19:59:41 +01:00
FRIGN	f9d9672326	Fix segmentation fault in tr(1) with -dc and one set	2015-02-02 17:58:16 +01:00
FRIGN	b8b9d983c8	Add unescape() to libutil formerly known as resolveescapes(), it is of central use to numerous programs. This drops a lot of LOC.	2015-01-29 21:52:44 +01:00
FRIGN	ee6f7d3fc0	Add trivial equivalence class support in tr(1) and update manpage Equivalence classes are a hard matter and there's still no "standard" way to solve the issue. Previously, tr would just skip those classes, but it's much better when it resolves a [=c=] to a normal c instead of treating it as a literal. Also, reflect recent changes in the manpage (octal escapes) and fix the markup in some areas.	2015-01-28 19:44:05 +01:00
FRIGN	ee843a2e09	Fix segmentation fault in tr(1) and make the parser stricter.	2015-01-24 23:00:34 +01:00
FRIGN	eb57becb38	Add octal sequence support to tr(1)	2015-01-24 22:43:46 +01:00
sin	98d759a274	Add license remark to tr.c	2015-01-20 15:26:08 +00:00
FRIGN	7d3e9c6e88	Resolve escape characters in tr(1) This is one aspect which I think has blown up the complexity of many tr-implementations around today. Instead of complicating the set-theory-based parser itself (he should still be relying on one rune per char, not multirunes), I added a preprocessor, which basically scans the code for upcoming '\'s, reads what he finds, substitutes the real character onto '\'s index and shifts the entire following array so there are no "holes". What is left to reflect on is what to do with octal sequences. I have a local implementation here, which works fine, but imho, given tr is already so focused on UTF-8, we might as well ignore POSIX at this point and rather implement the unicode UTF-8 code points, which are way more contemporary and future-proof. Reading in \uC3A4 as an array of 0xC3 and 0xA4 is not the issue, but I'm still struggling to find a way to turn it into a well-formed byte sequence. Hit me with a mail if you have a simple solution for that.	2015-01-15 11:01:52 +00:00
FRIGN	7a644aea7d	Fix mapping a class to a simple set and improve error-reporting It's standard behaviour to map a whole class of matched objects to the last element of a given simple set2 instead of just passing it through. Also, error out more strictly when the user gives us bogus sets.	2015-01-12 11:19:43 +00:00
FRIGN	0f90528df7	Add proper casts and fix a small error	2015-01-11 22:35:15 +00:00
FRIGN	09704afc24	Add Unicode character class support Thinking about it long enough, the solution seems almost trivial.	2015-01-11 22:35:15 +00:00
FRIGN	369bb01eb1	Prevail order	2015-01-10 19:56:34 +00:00
Hiltjo Posthuma	14c5ab48d5	tr: set2 must be set in some cases echo abc | tr 'a' '' would crash because of: m--; r = set2[m].start + (off1 - off2) / set2[m].quant; if set2ranges > 0 it's fine.	2015-01-10 18:16:43 +00:00
Hiltjo Posthuma	cf714e6edb	tr: fix signed/unsigned warnings	2015-01-10 17:00:01 +00:00
sin	1f3345b9e6	Staticise some symbols in tr(1)	2015-01-10 14:26:32 +00:00
FRIGN	a582cb8a2f	Rewrite tr(1) in a sane way tr(1) always used to be a saddening part of sbase, which was inherently broken and crufted. But to be fair, the POSIX-standard doesn't make it very simple. Given the current version was unfixable and broken by design, I sat down and rewrote tr(1) very close to the concept of set theory and the POSIX-standard with a few exceptions: - UTF-8: not allowed in POSIX, but in my opinion a must. This finally allows you to work with UTF-8 streams without problems or unexpected behaviour. - Equivalence classes: Left out, even GNU coreutils ignore them and depending on LC_COLLATE, which sucks. - Character classes: No experiments or environment-variable-trickery. Just plain definitions derived from the POSIX-standard, working as expected. I tested this thoroughly, but expect problems to show up in some way given the wide range of input this program has to handle. The only thing left on the TODO is to add support for literal expressions ('\n', '\t', '\001', ...) and probably rethinking the way [_*n] is unnecessarily restricted to string2.	2015-01-10 14:26:30 +00:00
Evan Gates	84b08427a1	remove agetline	2014-11-18 21:05:28 +00:00
FRIGN	ec8246bbc6	Un-boolify sbase It actually makes the binaries smaller, the code easier to read (gems like "val == true", "val == false" are gone) and actually predictable in the sense of that we actually know what we're working with (one bitwise operator was quite adventurous and should now be fixed). This is also more consistent with the other suckless projects around which don't use boolean types.	2014-11-14 10:54:20 +00:00
FRIGN	7d2683ddf2	Sort includes and more cleanup and fixes in util/	2014-11-14 10:54:10 +00:00
FRIGN	eee98ed3a4	Fix coding style It was about damn time. Consistency is very important in such a big codebase.	2014-11-13 18:08:43 +00:00
sin	0c5b7b9155	Stop using EXIT_{SUCCESS,FAILURE}	2014-10-02 23:46:59 +01:00
sin	ac402965d5	Fix comment style and nuke stray whitespace	2014-07-16 20:43:29 +01:00
Adria Garriga	b3a63a60e4	Improved tr - Added support for character ranges ( a-z ) - Added support for complementary charset ( -c ), only in delete mode - Added support for octal escape sequences - Unicode now only works when there are no octal escape sequences, otherwise behavior is not predictable at first sight. - tr now supports null characters in the input - Does not yet have support for character classes ( [:upper:] )	2014-07-16 20:40:54 +01:00
Hiltjo Posthuma	fab4b384e7	use agetline instead of agets also use agetline where fgets with a static buffer was used previously. Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>	2014-06-01 18:03:10 +01:00
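
The 11e2d472bf entry above describes a shutdown sequence for streams: flush, check the error indicator, then close. The following is a minimal sketch of that sequence under stated assumptions, not the sbase source. The name `sketch_fshut`, its return convention (0 for success, 1 for failure), and the warning text are hypothetical and chosen only for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the fshut() idea in the log: flush, check
 * the stream's error indicator, then close. Returns 0 on success and
 * 1 if any step reported a problem.
 */
static int
sketch_fshut(FILE *fp, const char *name)
{
	int ret = 0;

	/*
	 * Push buffered data to the kernel so write errors surface now.
	 * ferror() additionally catches failures from earlier reads or
	 * writes on the same stream.
	 */
	if (fflush(fp) != 0 || ferror(fp)) {
		fprintf(stderr, "%s: stream error\n", name);
		ret = 1;
	}

	/* Always close, even after an error, so no stream stays open. */
	if (fclose(fp) != 0) {
		fprintf(stderr, "%s: close failed\n", name);
		ret = 1;
	}

	return ret;
}
```

A caller with several streams could collect the results with `ret |= sketch_fshut(...)` for each one, so every stream is shut and every warning is printed before the program exits with `ret`.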

58 Commits
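
The 3b825735d8 and 3c33abc520 entries credit an overflow check on the element count times element size for turning a later segfault into an out-of-memory report. A minimal sketch of such a check follows, assuming the usual division-based test. The names `sketch_mul_overflows` and `sketch_reallocarray` are hypothetical and this is not the sbase or OpenBSD implementation.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: 1 if nmemb * size does not fit in size_t. */
static int
sketch_mul_overflows(size_t nmemb, size_t size)
{
	/* Dividing avoids performing the multiplication that may wrap. */
	return size != 0 && nmemb > SIZE_MAX / size;
}

/*
 * Hypothetical reallocarray-style wrapper: refuses requests whose
 * byte count would wrap around, otherwise defers to realloc().
 * With ptr == NULL it behaves like an overflow-checked malloc.
 */
static void *
sketch_reallocarray(void *ptr, size_t nmemb, size_t size)
{
	if (sketch_mul_overflows(nmemb, size)) {
		errno = ENOMEM;
		return NULL;
	}
	return realloc(ptr, nmemb * size);
}
```

Without the check, a count such as `SIZE_MAX / 4 + 1` with 4-byte elements would wrap to a tiny allocation, and later writes would run past it. With the check, the caller sees a NULL return and can report the failure.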