39 Commits

Author SHA1 Message Date
FRIGN 31572c8b0e Clean up #includes 2015-02-14 21:12:23 +01:00
FRIGN 73577f10a0 Scrap chartorunearr(), introducing utftorunestr()
Interface and function as proposed by cls.

The reasoning behind this function is that cls expressed his
interest in keeping memory allocation out of libutf, which is a
very good motive.
This simplifies the function a lot and should also increase the
speed a bit, but the most important factor here is that there's
no malloc anywhere in libutf, making it a lot smaller and more
robust with a smaller attack-surface.

Look at the paste(1) and tr(1) changes for an idiomatic way to
allocate the right amount of space for the Rune-array.
2015-02-11 21:32:09 +01:00
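A rough sketch of the caller-allocates pattern described in 73577f10a0, not the actual libutf code. The names sketch_utftorunes() and sketch_alloc_runes() and the Rune typedef are hypothetical stand-ins. The sketch assumes a conservative bound: a UTF-8 string of n bytes never holds more than n runes, so strlen(s) + 1 slots are always enough. The decoder is deliberately minimal and does not reject overlong encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t Rune; /* stand-in for a rune type, an assumption */

/*
 * Hypothetical decoder: converts NUL-terminated UTF-8 into a buffer
 * supplied by the caller. It never allocates. Returns the number of
 * runes written, not counting the terminating 0. Malformed input
 * yields U+FFFD.
 */
static size_t
sketch_utftorunes(const char *s, Rune *out)
{
	const unsigned char *p = (const unsigned char *)s;
	size_t n = 0;

	assert(s != NULL && out != NULL);
	while (*p) {
		Rune r;
		int extra;

		if (*p < 0x80) {
			r = *p; extra = 0;
		} else if ((*p & 0xE0) == 0xC0) {
			r = *p & 0x1F; extra = 1;
		} else if ((*p & 0xF0) == 0xE0) {
			r = *p & 0x0F; extra = 2;
		} else if ((*p & 0xF8) == 0xF0) {
			r = *p & 0x07; extra = 3;
		} else {
			r = 0xFFFD; extra = 0; /* stray continuation or invalid lead byte */
		}
		p++;
		/* consume continuation bytes, stopping early at NUL or a bad byte */
		while (extra > 0 && (*p & 0xC0) == 0x80) {
			r = (r << 6) | (*p & 0x3F);
			p++;
			extra--;
		}
		if (extra > 0)
			r = 0xFFFD; /* truncated sequence */
		out[n++] = r;
	}
	out[n] = 0;
	return n;
}

/* The allocation lives in the caller, sized by the byte length. */
static Rune *
sketch_alloc_runes(const char *s, size_t *len)
{
	Rune *buf = malloc((strlen(s) + 1) * sizeof(*buf));

	if (!buf)
		return NULL;
	*len = sketch_utftorunes(s, buf);
	return buf;
}
```

A tighter bound would count the runes first; the byte length is used here only because it keeps the sketch short.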
FRIGN 7c578bf5b0 Scrap writerune(), introducing fputrune()
Interface and function as proposed by cls.
Code is also shorter, everything else analogous to fgetrune().
2015-02-11 20:58:00 +01:00
FRIGN a5ae899a48 Scrap readrune(), introducing fgetrune()
Interface as proposed by cls, but internally rewritten after a few
considerations.
The code is much shorter and to the point, aligning itself with other
standard functions. It should also be much faster, which is not bad.
2015-02-11 20:16:49 +01:00
FRIGN 5836ef72e3 Use runetypebody.h-functions in tr(1)
That's one small step for a man, one giant leap for mankind.
2015-02-11 13:12:27 +01:00
FRIGN d5d686e9f6 tr : Revert 97c5986146
This was no typo.
2015-02-07 18:09:04 +01:00
Hiltjo Posthuma 97c5986146 tr: small typo 2015-02-06 15:43:07 +01:00
FRIGN ff0347e391 Fix tr(1) behaviour in special cases and be stricter about stuff 2015-02-02 19:59:41 +01:00
FRIGN f9d9672326 Fix segmentation fault in tr(1) with -dc and one set 2015-02-02 17:58:16 +01:00
FRIGN b8b9d983c8 Add unescape() to libutil
formerly known as resolveescapes(), it is of central use to numerous
programs.
This drops a lot of LOC.
2015-01-29 21:52:44 +01:00
FRIGN ee6f7d3fc0 Add trivial equivalence class support in tr(1) and update manpage
Equivalence classes are a hard matter and there's still no "standard"
way to solve the issue.
Previously, tr would just skip those classes, but it's much
better when it resolves a [=c=] to a normal c instead of treating
it as a literal.

Also, reflect recent changes in the manpage (octal escapes) and fix
the markup in some areas.
2015-01-28 19:44:05 +01:00
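A minimal sketch of the "trivial" equivalence class handling described in ee6f7d3fc0, assuming single-byte characters only. The function sketch_equiv() is a hypothetical name and is not taken from tr.c; it recognises the form [=c=] and yields the plain c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * If s begins with "[=c=]", store c in *out and return the number of
 * bytes consumed (5). Otherwise return 0 and leave *out untouched.
 * Single-byte characters only, which is an assumption of this sketch.
 */
static int
sketch_equiv(const char *s, char *out)
{
	assert(s != NULL && out != NULL);
	/* short-circuit evaluation keeps every read inside the string */
	if (s[0] == '[' && s[1] == '=' && s[2] != '\0' &&
	    s[3] == '=' && s[4] == ']') {
		*out = s[2];
		return 5;
	}
	return 0;
}
```

A set parser could call this at each '[' and fall back to literal handling when it returns 0.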
FRIGN ee843a2e09 Fix segmentation fault in tr(1)
and make the parser stricter.
2015-01-24 23:00:34 +01:00
FRIGN eb57becb38 Add octal sequence support to tr(1) 2015-01-24 22:43:46 +01:00
sin 98d759a274 Add license remark to tr.c 2015-01-20 15:26:08 +00:00
FRIGN 7d3e9c6e88 Resolve escape characters in tr(1)
This is one aspect which I think has blown up the complexity of many
tr-implementations around today.
Instead of complicating the set-theory-based parser itself (it should
still rely on one rune per char, not multirunes), I added a
preprocessor, which scans the input for '\'s, reads the escape it
finds, writes the real character at the index of the '\' and shifts
the rest of the array down so there are no "holes".

What is left to reflect on is what to do with octal sequences.
I have a local implementation here, which works fine, but imho,
given tr is already so focused on UTF-8, we might as well ignore
POSIX at this point and rather implement the unicode UTF-8 code points,
which are way more contemporary and future-proof.

Reading in \uC3A4 as an array of 0xC3 and 0xA4 is not the issue,
but I'm still struggling to find a way to turn it into a well-formed
byte sequence. Hit me with a mail if you have a simple solution for
that.
2015-01-15 11:01:52 +00:00
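The preprocessing step described in 7d3e9c6e88 could look roughly like the sketch below. It is not the code from the commit: sketch_unescape() is a hypothetical name, only a few escapes are handled, and instead of shifting the tail once per escape it uses a read index and a write index, which closes the "holes" in a single pass with the same result.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Resolve backslash escapes in place and return the new length.
 * Only \n, \t and \\ are translated; any other escaped character is
 * kept literally, and a trailing lone backslash is left as is. Both
 * choices are assumptions of this sketch.
 */
static size_t
sketch_unescape(char *s)
{
	size_t r = 0, w = 0; /* read and write positions */

	assert(s != NULL);
	while (s[r]) {
		if (s[r] == '\\' && s[r + 1]) {
			switch (s[r + 1]) {
			case 'n':  s[w++] = '\n'; break;
			case 't':  s[w++] = '\t'; break;
			case '\\': s[w++] = '\\'; break;
			default:   s[w++] = s[r + 1]; break;
			}
			r += 2; /* skip the backslash and the escaped character */
		} else {
			s[w++] = s[r++];
		}
	}
	s[w] = '\0';
	return w;
}
```

Because the write index never overtakes the read index, the string can be rewritten in its own buffer without extra memory.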
FRIGN 7a644aea7d Fix mapping a class to a simple set and improve error-reporting
It's standard behaviour to map a whole class of matched objects
to the last element of a given simple set2 instead of just passing
it through.
Also, error out more strictly when the user gives us bogus sets.
2015-01-12 11:19:43 +00:00
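The class-to-simple-set rule from 7a644aea7d can be illustrated with a small sketch, assuming single-byte input and a <ctype.h> predicate standing in for the character class. sketch_map_class() is a hypothetical name and does not appear in tr.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Replace every character of s that belongs to the class with the
 * last element of set2. set2 must not be empty; the assert states
 * that precondition explicitly.
 */
static void
sketch_map_class(char *s, int (*inclass)(int), const char *set2, size_t set2len)
{
	assert(s != NULL && set2 != NULL);
	assert(set2len > 0);
	for (; *s; s++)
		if (inclass((unsigned char)*s))
			*s = set2[set2len - 1];
}
```

With the digit class and a set2 of "xy", the string "Hello 42" would become "Hello yy" under this rule.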
FRIGN 0f90528df7 Add proper casts and fix a small error 2015-01-11 22:35:15 +00:00
FRIGN 09704afc24 Add Unicode character class support
Thinking about it long enough, the solution seems almost trivial.
2015-01-11 22:35:15 +00:00
FRIGN 369bb01eb1 Prevail order 2015-01-10 19:56:34 +00:00
Hiltjo Posthuma 14c5ab48d5 tr: set2 must be set in some cases
echo abc | tr 'a' '' would crash because of:

	m--;
	r = set2[m].start + (off1 - off2) / set2[m].quant;

if set2ranges > 0 it's fine.
2015-01-10 18:16:43 +00:00
Hiltjo Posthuma cf714e6edb tr: fix signed/unsigned warnings 2015-01-10 17:00:01 +00:00
sin 1f3345b9e6 Staticise some symbols in tr(1) 2015-01-10 14:26:32 +00:00
FRIGN a582cb8a2f Rewrite tr(1) in a sane way
tr(1) always used to be a saddening part of sbase, which was
inherently broken and crufted.
But to be fair, the POSIX-standard doesn't make it very simple.
Given the current version was unfixable and broken by design, I
sat down and rewrote tr(1) very close to the concept of set theory
and the POSIX-standard with a few exceptions:

 - UTF-8: not allowed in POSIX, but in my opinion a must. This
          finally allows you to work with UTF-8 streams without
          problems or unexpected behaviour.
 - Equivalence classes: Left out; even GNU coreutils ignores them,
                        and they depend on LC_COLLATE, which sucks.
 - Character classes: No experiments or environment-variable-trickery.
                      Just plain definitions derived from the POSIX-
                      standard, working as expected.

I tested this thoroughly, but expect problems to show up in some
way given the wide range of input this program has to handle.
The only things left on the TODO are adding support for literal
expressions ('\n', '\t', '\001', ...) and probably rethinking
the way [_*n] is unnecessarily restricted to string2.
2015-01-10 14:26:30 +00:00
Evan Gates 84b08427a1 remove agetline 2014-11-18 21:05:28 +00:00
FRIGN ec8246bbc6 Un-boolify sbase
It actually makes the binaries smaller, the code easier to read
(gems like "val == true", "val == false" are gone) and more
predictable, in the sense that we know what we're
working with (one bitwise operator was quite adventurous and
should now be fixed).

This is also more consistent with the other suckless projects
around which don't use boolean types.
2014-11-14 10:54:20 +00:00
FRIGN 7d2683ddf2 Sort includes and more cleanup and fixes in util/ 2014-11-14 10:54:10 +00:00
FRIGN eee98ed3a4 Fix coding style
It was about damn time. Consistency is very important in such a
big codebase.
2014-11-13 18:08:43 +00:00
sin 0c5b7b9155 Stop using EXIT_{SUCCESS,FAILURE} 2014-10-02 23:46:59 +01:00
sin ac402965d5 Fix comment style and nuke stray whitespace 2014-07-16 20:43:29 +01:00
Adria Garriga b3a63a60e4 Improved tr
- Added support for character ranges ( a-z )
- Added support for complementary charset ( -c ), only in delete mode
- Added support for octal escape sequences
- Unicode now only works when there are no octal escape sequences,
  otherwise behavior is not predictable at first sight.
- tr now supports null characters in the input
- Does not yet have support for character classes ( [:upper:] )
2014-07-16 20:40:54 +01:00
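Character range support as listed in b3a63a60e4 might be sketched as follows. This is an illustration only, with a hypothetical function name sketch_expand(); it works on bytes rather than runes and treats a reversed or incomplete range as literal characters, which is an assumption rather than documented behaviour.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Expand ranges such as "a-e" in spec into out (capacity cap, which
 * includes the terminating NUL). Returns the number of characters
 * written; output is silently truncated if out is too small.
 */
static size_t
sketch_expand(const char *spec, char *out, size_t cap)
{
	const unsigned char *p = (const unsigned char *)spec;
	size_t n = 0;

	assert(spec != NULL && out != NULL);
	assert(cap > 0);
	while (*p) {
		unsigned lo = *p, hi = *p;

		if (p[1] == '-' && p[2] != '\0' && p[2] >= p[0]) {
			hi = p[2]; /* well-formed range lo-hi */
			p += 3;
		} else {
			p++;       /* single literal character */
		}
		for (; lo <= hi; lo++) {
			if (n + 1 >= cap) { /* keep room for the NUL */
				out[n] = '\0';
				return n;
			}
			out[n++] = (char)lo;
		}
	}
	out[n] = '\0';
	return n;
}
```

Expanding the set once up front keeps the translation loop itself free of range logic.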
Hiltjo Posthuma fab4b384e7 use agetline instead of agets
also use agetline where fgets with a static buffer was used previously.

Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-06-01 18:03:10 +01:00
Silvan Jegen 4e13ff39c3 Wrap mbtowc to check for errors 2014-04-12 21:29:16 +01:00
sin bc13aa5960 No need to cast return value of mmap() in tr 2014-04-12 20:33:59 +01:00
Hiltjo Posthuma a8f45b4568 tr: change delete behaviour
when one argument is specified use delete behaviour again

Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-04-12 20:33:10 +01:00
Hiltjo Posthuma ff474a8cbc tr: add dflag, error with usage() on invalid flag combination
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-04-09 15:40:21 +01:00
Hiltjo Posthuma 3e49e946b7 tr: fix escape code handling in set2
Signed-off-by: Hiltjo Posthuma <hiltjo@codemadness.org>
2014-04-09 15:40:04 +01:00
sin e9a4af87bd Staticise functions in tr(1) 2014-01-25 22:07:40 +00:00
sin fe6144793f Check mmap() return value and unmap at the end 2014-01-20 11:28:21 +00:00
Silvan Jegen 38f429a3d2 Add the tr program including man page 2014-01-20 11:22:28 +00:00