The previous scheme incorrectly accepted 0xC1 0x80 as U+0040.
That could have been fixed by tweaking the loop, but the constant
array is surely easier to verify.
In the previous version, invalid UTF-8 from a terminal caused
UCS_NO_CHAR (0xFFFFFFFD) to be stored in a term_event_key_T, resulting
in -3 which was then incidentally treated as an unassigned special key.
Now, invalid UTF-8 is instead mapped to UCS_REPLACEMENT_CHARACTER
and treated as a character. The fact that handle_interlink_event
calls term_send_ucs when it receives invalid UTF-8 makes it pretty
clear that this is how it was intended.
src/viewer/text/link.c (not changed in this commit) already referred
to UCS_REPLACEMENT_CHARACTER in a comment even though it was not
previously defined.
Previously, ELinks used to silently discard the Alt modifier from
Alt-ö keystrokes when UTF-8 I/O was enabled. Now, separate actions
can be bound to ö and Alt-ö.
However, if CONFIG_UTF_8 is defined, then actions cannot be bound to
non-ASCII characters, regardless of modifiers. This is because the
code that handles names of keystrokes assumes a character can only be
a single byte. This commit does not change that.
Form fields and BFU text-input widgets then convert from UCS-4 to UTF-8.
If not all UTF-8 bytes fit, they don't insert anything. Thus it is no
longer possible to get invalid UTF-8 by hitting the length limit.
It is unclear to me which charset is supposed to be used for strings
in internal buffers. I made BFU insert UTF-8 whenever CONFIG_UTF_8,
but form fields use the charset of the terminal; that may have to be
changed.
As a side effect, this change should solve bug 782, because
term_send_ucs no longer encodes in UTF-8 if CONFIG_UTF_8 is defined.
I think the UTF-8 and codepage encoding calls I added are safe, too.
A similar bug may still surface somewhere else, but 782 could be
closed for now.
This change also lays the foundation for binding actions to non-ASCII
keys, but the keystroke name parser doesn't yet support that.
The CONFIG_UTF_8 mode does not currently support non-ASCII characters
in hot keys, either.
For instance, if Ctrl-F1 were pressed and src/terminal/kbd.c supported it,
then toupper(KBD_F1) would be called, resulting in undefined behaviour.
src/terminal/kbd.c does not support such combinations yet, but it is
safest to fix the bug already.