llength() is currently a 'short', which can overflow and yield negative
values if line lengths are larger than 32k. We'll fix the overflow
separately, but before we do that, just use a signed int to hold the
value so that we don't overrun memory allocations when that negative
number gets converted to a large positive unsigned integer.
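
To illustrate the failure mode, here is a minimal sketch. The helper
names alloc_line_text() and unchecked_size() are made up for this
example and are not functions in this tree. The point is only that a
length kept in a signed int can be checked for "negative" before it is
ever turned into an allocation size.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not code from this tree: allocate room for
 * 'len' bytes of line text plus a terminating NUL.  A negative
 * length (for example from a wrapped 16-bit counter) is rejected
 * instead of being converted to a huge size_t.
 */
static char *alloc_line_text(int len)
{
	if (len < 0)
		return NULL;
	return malloc((size_t)len + 1);
}

/* What an unchecked conversion does to a negative length. */
static size_t unchecked_size(int len)
{
	return (size_t)len;
}
```

With this, unchecked_size(-1) is the largest value a size_t can hold,
while alloc_line_text(-1) simply fails.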
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This re-introduces vtputc() as the way to show characters, which
reinstates the control character handling, and simplifies show_line() in
the process.
vtputc now takes an "int" that is either a unicode character or a signed
char (so negative values in the range [-1, -128] are considered to be
the same as [128, 255]). This allows us to use it regardless of what
the source of data is.
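
A minimal sketch of that folding, assuming a stand-alone helper
(normalize_char() is a made-up name, and this is not the actual body of
vtputc()):

```c
/*
 * Hypothetical helper: fold a value that came from a signed char
 * (-128..-1) into 128..255.  Non-negative values, including unicode
 * code points above 255, pass through unchanged.
 */
static unsigned int normalize_char(int c)
{
	if (c < 0)
		c += 256;
	return (unsigned int)c;
}
```

So normalize_char(-1) and normalize_char(255) give the same result, and
a caller reading from a plain 'char' buffer needs no cast of its own.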
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The TAB handling got broken by commit cee00b0efb ("Show UTF-8 input as
UTF-8 output") when it stopped doing things one byte at a time.
I'm sure the other special character cases are broken too.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ok, so it may do odd things if it's not truly utf-8, and when moving up
and down lines that have utf-8 the cursor moves oddly (because the byte
offset within the line stays constant, rather than the character
offset), but with this you can actually open the UTF8 example file and
move around it, and at least some of the movement makes sense.
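
To make the byte-vs-character mismatch concrete, here is a sketch of
how a character offset could be derived from a byte offset. The helper
char_offset() is hypothetical and nothing in this commit does this; it
assumes the text is well-formed utf-8.

```c
#include <stddef.h>

/*
 * Hypothetical helper: count the utf-8 characters that start within
 * the first 'byte_off' bytes of 'text'.  Continuation bytes have the
 * form 10xxxxxx and do not start a character, so they are skipped.
 */
static size_t char_offset(const unsigned char *text, size_t byte_off)
{
	size_t i, chars = 0;

	for (i = 0; i < byte_off; i++)
		if ((text[i] & 0xC0) != 0x80)
			chars++;
	return chars;
}
```

For the bytes of "héllo" (h, 0xC3, 0xA9, l, l, o), byte offset 3 is
character offset 2, while in plain "hello" byte offset 3 is character
offset 3. Keeping the byte offset constant between two such lines is
what makes the cursor jump.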
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
.. by doing the stupid "convert to unicode value and back" model.
This actually populates the 'struct video' array with the unicode
values, so UTF8 input actually shows correctly. In particular, the nice
test-file (UTF-8-demo.txt) shows up not as garbage, but as the UTF-8 it
is.
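
For reference, the "convert to unicode value" half could look roughly
like the sketch below. decode_utf8() is a made-up name and this is not
the conversion code from this commit; it handles sequences of up to
four bytes, does not reject overlong encodings, and (as an assumption
for this example) passes any byte that does not start a complete
sequence through as a single Latin1 value.

```c
#include <stddef.h>

/*
 * Hypothetical decoder: read one character from 's' ('len' bytes
 * available), store its value in '*out' and return the number of
 * bytes consumed (0 only when len is 0).
 */
static size_t decode_utf8(const unsigned char *s, size_t len,
			  unsigned int *out)
{
	unsigned int c, value;
	size_t need, i;

	if (len == 0) {
		*out = 0;
		return 0;
	}
	c = s[0];
	if (c < 0x80) {
		need = 0;
		value = c;
	} else if ((c & 0xE0) == 0xC0) {
		need = 1;
		value = c & 0x1F;
	} else if ((c & 0xF0) == 0xE0) {
		need = 2;
		value = c & 0x0F;
	} else if ((c & 0xF8) == 0xF0) {
		need = 3;
		value = c & 0x07;
	} else {
		/* Stray continuation byte or invalid lead byte. */
		*out = c;
		return 1;
	}
	if (need >= len) {
		/* Truncated sequence: treat the lead byte as Latin1. */
		*out = c;
		return 1;
	}
	for (i = 1; i <= need; i++) {
		if ((s[i] & 0xC0) != 0x80) {
			/* Not a continuation byte: fall back to Latin1. */
			*out = c;
			return 1;
		}
		value = (value << 6) | (s[i] & 0x3F);
	}
	*out = value;
	return need + 1;
}
```

The three bytes 0xE2 0x82 0xAC decode to 0x20AC in one step, while a
lone 0xE9 followed by ASCII comes out as the Latin1 value 0xE9.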
HOWEVER!
Since the *editing* doesn't know about UTF-8, and considers it just a
stream of bytes, the end result is not actually a usable utf-8 editor.
So don't get too excited yet: this is just a partial step towards
"actually edit utf8 data".
NOTE NOTE NOTE! If the character buffer contains Latin1, we will
transform that Latin1 to unicode, and then output it as UTF8. And we
will edit it correctly as character-by-character data. Also, we
still do the "UTF8 to Latin1" translation on *input*, so with this
commit we can actually continue to *edit* Latin1 text.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is disgusting. And quite frankly, it's debatable whether this will
ever work. The "line" structure is still just an array of characters,
so that has to work with utf-8.
But the 'struct video' thing is what represents the actual screen
rectangle, and is fixed-size by the size of the screen. So making it
contain actual 32-bit unicode characters *may* make sense.
Right now we translate things the same way we always used to, though, so
utf-8 in 'struct line' will not be translated to the proper unicode
array, but to the bytes of the utf-8 representation. So this really
doesn't improve anything per se yet, just expands the memory use of the
video array.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I want to see the difference between space and nbsp, and I consider nbsp
to be a control character, so show it as such. Even if it is
technically "printable".
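
As a rough sketch of the classification (shown_as_control() is a
made-up name, and the exact set of values is an assumption for this
example rather than a copy of the display code):

```c
/*
 * Hypothetical predicate: return 1 for values that should be drawn
 * in a visible "control character" form.  Covers the C0 controls,
 * DEL, and nbsp (0xA0).  TAB is below 0x20 too, so a real display
 * routine would have to handle it before calling this.
 */
static int shown_as_control(unsigned int c)
{
	return c < 0x20 || c == 0x7F || c == 0xA0;
}
```

A plain space (0x20) stays a normal character, which is what makes the
two distinguishable on screen.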
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch kills #ifdef'd code from display.c and file.c.
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The return statement is not a function, so remove the superfluous
parentheses.
Cc: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a basic usage() function to support the --help option.
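
A minimal sketch of the shape of such a function, with the option
check split out so it can be exercised on its own. The usage text here
is a placeholder and wants_help() is a made-up helper; neither is taken
from the patch.

```c
#include <stdio.h>
#include <string.h>

/* Print a short usage summary to 'out' (placeholder text). */
static void usage(FILE *out, const char *progname)
{
	fprintf(out, "Usage: %s [--help] [file ...]\n", progname);
}

/* Return 1 if any argument after argv[0] is exactly "--help". */
static int wants_help(int argc, char **argv)
{
	int i;

	for (i = 1; i < argc; i++)
		if (strcmp(argv[i], "--help") == 0)
			return 1;
	return 0;
}
```

A main() would then do something like "if (wants_help(argc, argv))
usage(stdout, argv[0]);" and exit before touching the terminal.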
Signed-off-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>