12 Commits (master)

Author SHA1 Message Date
Richard Ipsum 2f0b15201d paste: Support -d '\0'
POSIX specifies that -d '\0' sets the delimiter to an empty string.
2020-04-15 16:11:12 -07:00
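
One way to read this rule: while the delimiter list is being expanded, the two-character escape \0 yields an empty delimiter rather than a NUL byte. The sketch below is a minimal illustration and not sbase's actual code. The helper name parse_delims() is hypothetical, the empty delimiter is represented as a 0 entry, and only single-byte delimiters are handled.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from sbase: expand the backslash
 * escapes of a paste -d list into out[] and return the number of
 * delimiters. A 0 entry stands for the empty string, i.e. "emit
 * nothing between these two columns". Single-byte delimiters only.
 * out must have room for one entry per byte of list. */
size_t
parse_delims(const char *list, char *out)
{
	size_t n = 0;

	for (; *list; list++) {
		/* ordinary character, or a trailing lone backslash */
		if (*list != '\\' || list[1] == '\0') {
			out[n++] = *list;
			continue;
		}
		switch (*++list) {
		case 'n':  out[n++] = '\n'; break;
		case 't':  out[n++] = '\t'; break;
		case '\\': out[n++] = '\\'; break;
		case '0':  out[n++] = 0;    break; /* empty delimiter */
		default:   out[n++] = *list; break; /* unspecified by POSIX */
		}
	}
	return n;
}
```

A caller that writes the joined columns would then skip the output of a delimiter whenever the current entry is 0.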
Michael Forney 28063c02f4 libutf: Change return type of utftorunestr to size_t
It returns the size of the rune array, so size_t is the right type
to use here.
2020-04-07 01:25:22 -07:00
Michael Forney 5d49332d4f libutf: Adjust runelen prototype to match definition
The `const` isn't useful here.
2019-04-16 17:27:38 -07:00
FRIGN 73577f10a0 Scrap chartorunearr(), introducing utftorunestr()
Interface and function as proposed by cls.

The reasoning behind this function is that cls expressed his
interest to keep memory allocation out of libutf, which is a
very good motive.
This simplifies the function a lot and should also increase the
speed a bit, but the most important factor here is that there's
no malloc anywhere in libutf, making it a lot smaller and more
robust with a smaller attack-surface.

Look at the paste(1) and tr(1) changes for an idiomatic way to
allocate the right amount of space for the Rune-array.
2015-02-11 21:32:09 +01:00
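
To illustrate what keeping allocation out of the library means for callers, here is one possible caller-side pattern. It is a sketch under stated assumptions: to_runes() is a hypothetical stand-in for utftorunestr() with an assumed signature, the Rune typedef is a placeholder, and the decoder assumes well-formed UTF-8. The paste(1) and tr(1) changes themselves are not shown on this page and may size the array differently.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int Rune; /* assumption: placeholder for libutf's Rune */

/* Hypothetical stand-in for utftorunestr(): decode the NUL-terminated
 * UTF-8 string s into the caller-supplied array r, append a terminating
 * 0 rune and return the number of runes written before it. Input is
 * assumed to be well-formed; nothing is allocated here. */
size_t
to_runes(const char *s, Rune *r)
{
	const unsigned char *p = (const unsigned char *)s;
	size_t n = 0;
	int extra;

	while (*p) {
		Rune c = *p++;

		if (c < 0x80) {
			extra = 0;            /* ASCII */
		} else if (c < 0xE0) {
			extra = 1;            /* 2-byte sequence */
			c &= 0x1F;
		} else if (c < 0xF0) {
			extra = 2;            /* 3-byte sequence */
			c &= 0x0F;
		} else {
			extra = 3;            /* 4-byte sequence */
			c &= 0x07;
		}
		/* fold in continuation bytes, stopping early at NUL */
		while (extra-- > 0 && (*p & 0xC0) == 0x80)
			c = (c << 6) | (*p++ & 0x3F);
		r[n++] = c;
	}
	r[n] = 0;
	return n;
}

/* Caller-side allocation: every rune consumes at least one byte, so
 * strlen(s) + 1 elements are always enough. Returns NULL on allocation
 * failure; the caller frees the result. */
Rune *
alloc_runes(const char *s, size_t *len)
{
	Rune *r = malloc((strlen(s) + 1) * sizeof(*r));

	if (r)
		*len = to_runes(s, r);
	return r;
}
```

Sizing by byte count may over-allocate for multi-byte text, but it needs no second pass over the string.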
FRIGN 7c578bf5b0 Scrap writerune(), introducing fputrune()
Interface and function as proposed by cls.
Code is also shorter, everything else analogous to fgetrune().
2015-02-11 20:58:00 +01:00
FRIGN a5ae899a48 Scrap readrune(), introducing fgetrune()
Interface as proposed by cls, but internally rewritten after a few
The code is much shorter and to the point, aligning itself with other
standard functions. It should also be much faster, which is not bad.
2015-02-11 20:16:49 +01:00
FRIGN f9846a9a6b Split up is*rune() and to*rune() functions into individual source files
This optimizes the binary size for each tool that uses these functions.
Previously, if a program just used one single function, maybe even a
one-liner, it would statically compile in all lookup-tables, bloating
the binary by up to 20K.
All these changes are derived from a local libutf where I do the
primary changes. So I hope that I can merge these things into libutf
sooner or later, as discussed on the ml.
2015-02-11 15:48:18 +01:00
FRIGN 02ec321419 Add missing is*rune() functions and tolowerrune() and toupperrune()
This basically means that we now have an autogenerating typecheck
and case-conversion tool.
Don't freak out when you see the added LOC. Given we now have
an additional mapping to the uppercase-characters, some ranges got
"lost" and have to be written literally by the generating awk-script.

The runetypebody.h was generated by myself using my modified version
of mkrunetype.awk and I'll push the changed version as soon as this
has been discussed on the ml.
If you worry about speed, consider that bsearch is just the right
tool for this job and can even handle a long array like this.
2015-02-11 13:12:27 +01:00
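
The bsearch remark can be made concrete with a small sketch: a classification function looks up a rune in a sorted table of inclusive ranges, so the cost grows only logarithmically with the table length. The table below is a tiny illustrative excerpt, not the generated runetypebody.h, and the names upper_ranges, rangecmp and is_upper_sketch are hypothetical.

```c
#include <stdlib.h>

typedef unsigned int Rune; /* assumption: placeholder for libutf's Rune */

/* Illustrative excerpt only: sorted, non-overlapping [lo, hi] ranges. */
static const Rune upper_ranges[][2] = {
	{ 0x0041, 0x005A }, /* A-Z */
	{ 0x00C0, 0x00D6 }, /* Latin-1 capitals before the multiplication sign */
	{ 0x00D8, 0x00DE }, /* Latin-1 capitals after it */
	{ 0x0391, 0x03A1 }, /* Greek capitals Alpha to Rho */
};

/* bsearch comparator: compare the key rune against one [lo, hi] range. */
static int
rangecmp(const void *key, const void *elem)
{
	Rune r = *(const Rune *)key;
	const Rune *range = elem;

	if (r < range[0])
		return -1;
	if (r > range[1])
		return 1;
	return 0;
}

/* Returns nonzero if r falls inside any range of the table. */
int
is_upper_sketch(Rune r)
{
	return bsearch(&r, upper_ranges,
	               sizeof(upper_ranges) / sizeof(upper_ranges[0]),
	               sizeof(upper_ranges[0]), rangecmp) != NULL;
}
```

Because the comparator treats a whole range as "equal" to any rune inside it, the table can stay compact even when a class covers thousands of code points.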
FRIGN a582cb8a2f Rewrite tr(1) in a sane way
tr(1) always used to be a saddening part of sbase, which was
inherently broken and crufted.
But to be fair, the POSIX-standard doesn't make it very simple.
Given the current version was unfixable and broken by design, I
sat down and rewrote tr(1) very close to the concept of set theory
and the POSIX-standard with a few exceptions:

 - UTF-8: not allowed in POSIX, but in my opinion a must. This
          finally allows you to work with UTF-8 streams without
          problems or unexpected behaviour.
 - Equivalence classes: Left out; even GNU coreutils ignores them,
                        and they depend on LC_COLLATE, which sucks.
 - Character classes: No experiments or environment-variable-trickery.
                      Just plain definitions derived from the POSIX-
                      standard, working as expected.

I tested this thoroughly, but expect problems to show up in some
way given the wide range of input this program has to handle.
The only thing left on the TODO is to add support for literal
expressions ('\n', '\t', '\001', ...) and probably rethinking
the way [_*n] is unnecessarily restricted to string2.
2015-01-10 14:26:30 +00:00
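
The set-based view described above can be sketched as a per-rune lookup: find the rune's position in set1 and emit the element at the same position in set2. This is an illustration only, not the rewritten tr(1). The names set_index() and tr_rune() are hypothetical, the sets are plain rune arrays that have already been expanded, and padding a short set2 with its last element is an assumption (POSIX leaves that case unspecified).

```c
#include <stddef.h>

typedef unsigned int Rune; /* assumption: placeholder for libutf's Rune */

/* Return the index of r in set[0..n-1], or n if r is not a member. */
static size_t
set_index(const Rune *set, size_t n, Rune r)
{
	size_t i;

	for (i = 0; i < n && set[i] != r; i++)
		;
	return i;
}

/* Hypothetical core of a set-based tr: returns 1 and stores the rune
 * to emit in *out, or returns 0 if r is to be dropped. An empty set2
 * (n2 == 0) means "delete members of set1". */
int
tr_rune(Rune r, const Rune *set1, size_t n1,
        const Rune *set2, size_t n2, Rune *out)
{
	size_t i = set_index(set1, n1, r);

	if (i == n1) {          /* not in set1: pass through unchanged */
		*out = r;
		return 1;
	}
	if (n2 == 0)            /* member of set1, deletion requested */
		return 0;
	/* assumption: a short set2 is padded with its last element */
	*out = set2[i < n2 ? i : n2 - 1];
	return 1;
}
```

Working on runes rather than bytes is what lets the same lookup handle UTF-8 input once it has been decoded.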
sin 18850f5dfa writerune() should operate on a FILE *
2014-11-21 16:34:57 +00:00
sin 5b5bb82ec0 Factor out readrune and writerune
2014-11-21 16:31:16 +00:00
sin 56709a2414 Import libutf from
2014-11-17 15:46:01 +00:00