llength() is currently a 'short' which can overflow and result in signed
numbers if line lengths are larger than 32k. We'll fix the overflow
separately, but before we do that, just use a signed int to hold the
value so that we don't overrun memory allocations when we converted that
negative number to a large positive unsigned integer.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This re-introduces vtputc() as the way to show characters, which
reinstates the control character handing, and simplifies show_line() in
the process.
vtputc now takes an "int" that is either a unicode character or a signed
char (so negative values in the range [-1, -128] are considered to be
the same as [128, 255]). This allows us to use it regardless of what
the source of data is.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The TAB handling got broken by commit cee00b0efb ("Show UTF-8 input as
UTF-8 output") when it stopped doing things one byte at a time.
I'm sure the other special character cases are broken too.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ok, so it may do odd things if it's not truly utf-8, and when moving up
and down lines that have utf-8 the cursor moves oddly (because the byte
offset within the line stays constant, rather than the character
offset), but with this you can actually open the UTF8 example file and
move around it, and at least some of the movement makes sense.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
.. by doing the stupid "convert to unicode value and back" model.
This actually populates the 'struct video' array with the unicode
values, so UTF8 input actually shows correctly. In particular, the nice
test-file (UTF-8-demo.txt) shows up not as garbage, but as the UTF-8 it
is.
HOWEVER!
Since the *editing* doesn't know about UTF-8, and considers it just a
stream of bytes, the end result is not actually a usable utf-8 editor.
So don't get too excited yet: this is just a partial step to "actually
edit utf8 data"
NOTE NOTE NOTE! If the character buffer contains Latin1, we will
transform that Latin1 to unicode, and then output it as UTF8. And we
will edit it correctly as the character-by-character data. Also, we
still do the "UTF8 to Latin1" translation on *input*, so with this
commit we can actually continue to *edit* Latin1 text.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>