Store the result in an int and do the comparison on that. This is
always safe and avoids strange constructs like "signed char".
On ARM, plain char is unsigned, so comparing it against a negative
sentinel like EOF can never succeed, which is why wc(1) would go
into an infinite loop when executed on an ARM system.
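For illustration (not the actual wc(1) code), a minimal sketch of the
pitfall, assuming the byte is read with getc(): on targets where plain
char is unsigned, stuffing the return value into a char makes the EOF
check unreachable.

    #include <stdio.h>

    int
    main(void)
    {
        /*
         * BROKEN where plain char is unsigned (e.g. ARM): getc()
         * returns EOF (-1), which becomes 255 in an unsigned char,
         * so the comparison is never true and the loop never ends:
         *
         *     char c;
         *     while ((c = getc(stdin)) != EOF)
         *             ...
         */

        /* Correct: keep the result in an int and compare that. */
        int c;
        size_t n = 0;

        while ((c = getc(stdin)) != EOF)
            n++;
        printf("%zu bytes\n", n);
        return 0;
    }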
This is a special third kind of structure found in Unicode, besides
singletons and ranges.
This dramatically reduces the number of explicit singletons in the
lookup tables.
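The message alone does not pin down which structure is meant, so purely
as an illustration, here is a hand-written sketch assuming it is a range
with a stride of 2, i.e. alternating upper/lower codepoints as in Latin
Extended-A (U+0100, U+0101, U+0102, ...); a single such entry stands in
for dozens of singletons:

    #include <stdio.h>
    #include <stdint.h>

    typedef uint32_t Rune; /* assumed typedef; the real libutf one may differ */

    /*
     * {first, last, stride, delta}: every stride-th rune in [first, last]
     * maps to rune + delta.  stride == 1 gives a plain range, first == last
     * a singleton, stride == 2 the alternating upper/lower pattern.
     */
    struct caserange {
        Rune first, last;
        int stride, delta;
    };

    static const struct caserange toupper_sketch[] = {
        { 0x0061, 0x007A, 1, -32 }, /* range:       a-z -> A-Z            */
        { 0x00FF, 0x00FF, 1, 121 }, /* singleton:   U+00FF -> U+0178      */
        { 0x0101, 0x0137, 2,  -1 }, /* alternating: U+0101 -> U+0100, ... */
    };

    static Rune
    toupper_sketch_rune(Rune r)
    {
        const struct caserange *c;
        size_t i;

        for (i = 0; i < sizeof(toupper_sketch) / sizeof(*toupper_sketch); i++) {
            c = &toupper_sketch[i];
            if (r >= c->first && r <= c->last &&
                (r - c->first) % c->stride == 0)
                return r + c->delta;
        }
        return r;
    }

    int
    main(void)
    {
        /* prints "0041 0178 0102" */
        printf("%04X %04X %04X\n",
            (unsigned)toupper_sketch_rune('a'),
            (unsigned)toupper_sketch_rune(0x00FF),
            (unsigned)toupper_sketch_rune(0x0103));
        return 0;
    }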
Also, I changed the awk-script so that it can sort trivial
translations as well, cutting down the LOC even more.
The binary size of tr dropped from 67K to 51K.
Previously, the to*rune functions would have to juggle two
arrays, and it somehow evaded me that it is actually way simpler
to just add another entry to the arrays if needed.
Binary size goes slightly down, e.g. tr statically linked against
musl: 68072 -> 67688
Behind the scenes, though, the conversion should be a bit faster and,
more importantly, the scary case-conversion function is simplified
and easier to understand.
It also drops nearly half the LOC in upperrune.c and lowerrune.c.
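The new code is not shown here, but the gist can be sketched as
follows: one sorted array per direction, with singletons as degenerate
one-element ranges, searched with bsearch(3). Names, types and table
contents below are illustrative, not the actual sbase code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    typedef uint32_t Rune; /* assumed typedef */

    /*
     * One sorted array of {first, last, delta} entries; a singleton is
     * simply an entry with first == last, so no second array is needed.
     */
    struct cmap {
        Rune first, last;
        int delta;
    };

    static const struct cmap lower_to_upper[] = {
        { 0x0061, 0x007A, -32 }, /* a-z -> A-Z                       */
        { 0x00E0, 0x00F6, -32 }, /* U+00E0..U+00F6 -> U+00C0..U+00D6 */
        { 0x00F8, 0x00FE, -32 }, /* U+00F8..U+00FE -> U+00D8..U+00DE */
        { 0x00FF, 0x00FF, 121 }, /* singleton: U+00FF -> U+0178      */
    };

    static int
    cmapcmp(const void *key, const void *elem)
    {
        Rune r = *(const Rune *)key;
        const struct cmap *c = elem;

        return r < c->first ? -1 : r > c->last ? 1 : 0;
    }

    static Rune
    toupperrune_sketch(Rune r)
    {
        const struct cmap *c = bsearch(&r, lower_to_upper,
            sizeof(lower_to_upper) / sizeof(*lower_to_upper),
            sizeof(*lower_to_upper), cmapcmp);

        return c ? r + c->delta : r;
    }

    int
    main(void)
    {
        /* prints "00C9 0178" */
        printf("%04X %04X\n", (unsigned)toupperrune_sketch(0x00E9),
            (unsigned)toupperrune_sketch(0x00FF));
        return 0;
    }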
Interface and function as proposed by cls.
The reasoning behind this function is that cls expressed his
interest in keeping memory allocation out of libutf, which is a
very good motive.
This simplifies the function a lot and should also increase the
speed a bit, but the most important factor here is that there's
no malloc anywhere in libutf, making it a lot smaller and more
robust with a smaller attack-surface.
Look at the paste(1) and tr(1) changes for an idiomatic way to
allocate the right amount of space for the Rune-array.
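The idiom referenced here boils down to a simple bound: a UTF-8 string
of k bytes decodes to at most k codepoints, so the caller can always
get away with k + 1 Runes. The sketch below illustrates only that
caller-side allocation; the function name, signature and Rune typedef
are assumptions, and the converter is an ASCII stand-in rather than the
libutf routine:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdint.h>

    typedef uint32_t Rune; /* assumed typedef */

    /*
     * Stand-in converter: ASCII only, one Rune per byte.  The real
     * library routine decodes full UTF-8 but keeps the same "caller
     * provides the buffer" shape, so it never calls malloc itself.
     */
    static size_t
    utftorunes_sketch(const char *s, Rune *r)
    {
        size_t n;

        for (n = 0; *s; n++)
            r[n] = (unsigned char)*s++;
        r[n] = 0;
        return n;
    }

    int
    main(void)
    {
        const char *s = "hello, world";
        /*
         * A UTF-8 string of k bytes holds at most k codepoints, so
         * k + 1 Runes (including a terminator) is always enough; the
         * allocation stays in the caller, not in the library.
         */
        Rune *r = malloc((strlen(s) + 1) * sizeof(*r));
        size_t n;

        if (!r) {
            perror("malloc");
            return 1;
        }
        n = utftorunes_sketch(s, r);
        printf("decoded %zu runes\n", n);
        free(r);
        return 0;
    }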
Interface as proposed by cls, but internally rewritten after a few
considerations.
The code is much shorter and to the point, aligning itself with other
standard functions. It should also be much faster, which is not bad.
This optimizes the binary size for each tool that uses these functions.
Previously, if a program just used one single function, maybe even a
one-liner, it would statically compile in all lookup-tables, bloating
the binary by up to 20K.
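Purely as an illustration of the layout idea (file names and tables
made up): when every predicate and its generated table live in their
own translation unit inside the static library, the linker only pulls
in the objects whose symbols a program actually references. Both
hypothetical files are pasted into one compilable sketch here:

    #include <stdio.h>
    #include <stdint.h>

    typedef uint32_t Rune; /* assumed typedef */

    /* isupperrune.c: the only object a program calling isupperrune() drags in */
    static const Rune upper_ranges[][2] = { { 0x41, 0x5A }, { 0xC0, 0xD6 } };

    int
    isupperrune(Rune r)
    {
        size_t i;

        for (i = 0; i < sizeof(upper_ranges) / sizeof(*upper_ranges); i++)
            if (r >= upper_ranges[i][0] && r <= upper_ranges[i][1])
                return 1;
        return 0;
    }

    /* islowerrune.c: stays out of that binary entirely, table and all */
    static const Rune lower_ranges[][2] = { { 0x61, 0x7A }, { 0xDF, 0xF6 } };

    int
    islowerrune(Rune r)
    {
        size_t i;

        for (i = 0; i < sizeof(lower_ranges) / sizeof(*lower_ranges); i++)
            if (r >= lower_ranges[i][0] && r <= lower_ranges[i][1])
                return 1;
        return 0;
    }

    int
    main(void)
    {
        printf("%d %d\n", isupperrune(0xC9), islowerrune(0xE9));
        return 0;
    }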
All these changes are derived from a local libutf where I do the
primary changes. So I hope that I can merge these things into libutf
sooner or later, as discussed on the ml.
tr(1) always used to be a saddening part of sbase, which was
inherently broken and crufted.
But to be fair, the POSIX-standard doesn't make it very simple.
Given the current version was unfixable and broken by design, I
sat down and rewrote tr(1) very close to the concept of set theory
and the POSIX-standard with a few exceptions:
- UTF-8: not allowed in POSIX, but in my opinion a must. This
  finally allows you to work with UTF-8 streams without
  problems or unexpected behaviour.
- Equivalence classes: Left out; they would depend on LC_COLLATE,
  which sucks, and even GNU coreutils ignore them.
- Character classes: No experiments or environment-variable-trickery.
  Just plain definitions derived from the POSIX-standard,
  working as expected.
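As a toy sketch of the set-theoretic core (not the sbase
implementation): string1 and string2 are expanded into two rune sets,
and an input rune found at position i in set1 is replaced by the rune
at position i in set2. Clamping to the last element of a shorter set2,
as done below, is just one common convention:

    #include <stdio.h>
    #include <stdint.h>

    typedef uint32_t Rune; /* assumed typedef */

    /*
     * Map r from set1 to the rune at the same index in set2; runes not
     * in set1 pass through unchanged.  If set2 is shorter, clamp to its
     * last element.
     */
    static Rune
    maprune(Rune r, const Rune *set1, size_t n1, const Rune *set2, size_t n2)
    {
        size_t i;

        for (i = 0; i < n1; i++)
            if (set1[i] == r)
                return set2[i < n2 ? i : n2 - 1];
        return r;
    }

    int
    main(void)
    {
        /* toy sets standing in for the expansion of string1/string2,
         * as in: tr 'abc' 'xyz' */
        const Rune set1[] = { 'a', 'b', 'c' };
        const Rune set2[] = { 'x', 'y', 'z' };
        const char *in = "a cab";
        size_t i;

        for (i = 0; in[i]; i++)
            putchar((int)maprune((unsigned char)in[i],
                set1, sizeof(set1) / sizeof(*set1),
                set2, sizeof(set2) / sizeof(*set2)));
        putchar('\n'); /* prints "x zxy" */
        return 0;
    }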
I tested this thoroughly, but expect problems to show up in some
way given the wide range of input this program has to handle.
The only thing left on the TODO is to add support for literal
expressions ('\n', '\t', '\001', ...) and probably to rethink
the way [_*n] is unnecessarily restricted to string2.