For some reason I had limited things to 0xffff; it really should be 0x10ffff.
We don't actually support a full 32-bit unicode model anyway, since we
use the high bits for the control/meta/^X/special bits, but there was no
reason to limit things to 16 bits when we had 28 bits available. And
the real limit for real Unicode characters is 0x10ffff.
Add a silly example character past the 16-bit range to the UTF8 demo
file:
'SMILING FACE WITH HALO' (U+1F607)
from the 'emoticons' block.
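
As a rough sketch of the range check described above (not the actual
uemacs code): assuming the top four bits of a 32-bit key code hold the
control/meta/^X/special flags, the low 28 bits are left for the
character, and real Unicode stops at 0x10ffff. The names KEY_FLAG_MASK,
KEY_CHAR_MASK, UNICODE_MAX and key_char_is_unicode() are made up for
illustration.

```c
#include <stdint.h>

/* Assumed layout: top 4 bits are modifier flags, low 28 bits the character. */
#define KEY_FLAG_MASK 0xF0000000u
#define KEY_CHAR_MASK 0x0FFFFFFFu
/* Highest code point Unicode defines. */
#define UNICODE_MAX   0x10FFFFu

/*
 * Return 1 if the character part of `key` (flags stripped) is within
 * the Unicode range, 0 otherwise.
 */
int key_char_is_unicode(uint32_t key)
{
	return (key & KEY_CHAR_MASK) <= UNICODE_MAX;
}
```

With this layout, U+1F607 passes the check with or without a flag bit
set, while 0x110000 is rejected even though it fits in 28 bits.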
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes actual basic editing work. Including things like
justify-paragraph etc, so lines get justified by number of UTF8
characters rather than bytes.
There is probably tons of broken stuff left, but this actually seems to
get the basics working right.
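
To illustrate the bytes-vs-characters distinction, here is a minimal
sketch (not the code from this commit) of counting UTF8 characters in a
byte buffer. It assumes the input is well-formed UTF8 and simply counts
every byte that is not a continuation byte (10xxxxxx). The name
utf8_char_count() is hypothetical.

```c
#include <stddef.h>

/*
 * Count characters in a UTF8 buffer of `len` bytes by counting the
 * bytes that start a sequence. Continuation bytes have the bit
 * pattern 10xxxxxx and are skipped. Assumes well-formed input.
 */
size_t utf8_char_count(const char *buf, size_t len)
{
	size_t count = 0;

	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)buf[i];

		if ((c & 0xC0) != 0x80)
			count++;
	}
	return count;
}
```

A justify routine that measures line width this way would treat the
four-byte sequence for U+1F607 as one character instead of four.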
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes it possible to cut-and-paste the UTF8 testfile into a new
buffer, and the end result looks correct.
NOTE! We still do various things wrong while editing. For example,
while the cursor movements were fixed, simple things like deleting a
character still work on single bytes, rather than utf8 characters.
So while this is getting much closer to actually editing UTF-8 data,
it's not there yet.
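
For the delete case, something along these lines would be needed: work
out how many bytes the character at the cursor occupies and remove all
of them. This is only a sketch under that assumption, not what the
editor currently does; utf8_seq_len() is a hypothetical helper, and it
falls back to a single byte for anything that does not look like a
valid sequence.

```c
#include <stddef.h>

/*
 * Return how many bytes make up the character starting at buf[pos],
 * i.e. how many bytes a "delete character" should remove.
 * Returns 0 if pos is past the end, and 1 for invalid or truncated
 * sequences so that broken data can still be deleted byte by byte.
 */
size_t utf8_seq_len(const char *buf, size_t len, size_t pos)
{
	unsigned char c;
	size_t n;

	if (pos >= len)
		return 0;

	c = (unsigned char)buf[pos];
	if (c < 0x80)
		n = 1;			/* plain ASCII */
	else if ((c & 0xE0) == 0xC0)
		n = 2;			/* 110xxxxx */
	else if ((c & 0xF0) == 0xE0)
		n = 3;			/* 1110xxxx */
	else if ((c & 0xF8) == 0xF0)
		n = 4;			/* 11110xxx */
	else
		return 1;		/* stray continuation or invalid lead byte */

	if (pos + n > len)
		return 1;		/* truncated sequence */

	/* Every following byte must be a continuation byte (10xxxxxx). */
	for (size_t i = 1; i < n; i++) {
		if (((unsigned char)buf[pos + i] & 0xC0) != 0x80)
			return 1;
	}
	return n;
}
```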
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While I'm here, improve the wording of the above two options.
Signed-off-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
uemacs is not a subprogram and doesn't seem likely to become one, so there
is no reason to keep this macro. The macro is also defined to 0, so the code
paths that test for it are never reached.
Signed-off-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>