The numbering of document.dump.color_mode and terminal._template_.colors
is now the same regardless of compile-time options, unlike in previous
versions. Therefore this version of ELinks may interpret a configuration
file differently from previous versions even if compiled with the same
options. This is unfortunate but the alternatives (keeping the numbering
dependent on configuration options; defining separate options that use
the new numbering; starting the numbers from 10 or so and recognizing the
previous ones only for compatibility) seem even worse.
This does not conflict with querying the palette from xterm (bug 890)
because although those palettes would have to be modifiable, they
would be terminal-specific rather than global.
UCS_ORPHAN_CELL is currently defined as U+0020 SPACE, which was
already used before this macro, so the behaviour does not change,
but the code seems clearer now.
I searched for ' ' and 32 and 0x20 and \x20, and replaced with
UCS_ORPHAN_CELL wherever UCS_NO_CHAR was involved. However,
some BFU widgets first draw spaces and then overwrite with text;
those will require a more complex fix if UCS_ORPHAN_CELL is ever
changed to some other character.
The previous version assumed the first non-digit after the CSI was the
Final Byte, for example the first semicolon in the "\E[?1;2c" report.
It then treated all subsequent bytes as typed characters.
According to Standard ECMA-48 (Fifth Edition - June 1991), there may
be any number of Parameter Bytes in the range 0x30 to 0x3F, and then
any number of Intermediate Bytes in the range 0x20 to 0x2F, between
the CSI and the Final Byte.
This version still does not support control sequences longer than
ITRM_IN_QUEUE_SIZE bytes.
To reproduce:
- Start ELinks.
- Enable the ui.tabs.wraparound option.
- Press t to open a second tab.
- Go to http://elinks.cz/ in the second tab.
- Press 3< to step three tabs to the left.
In the statement "tab = tabs + tab % tabs;", tab == -2 and tabs == 2.
So tab % tabs == 0 and tab becomes 2, which is out of range.
The new version calls get_opt_bool even if the tab parameter is already in
range, but the cost should be negligible compared to the redraw_terminal()
call that follows.
The configure script no longer recognizes "CONFIG_UTF_8=yes" lines
in custom features.conf files. They will have to be changed to
"CONFIG_UTF8=yes". This incompatibility was deemed acceptable
because no released version of ELinks supports CONFIG_UTF_8.
The --enable-utf-8 option was not renamed.
Suggested by Miciah on #elinks.
What was renamed:
add_utf_8 => add_utf8
cp2utf_8 => cp2utf8
encode_utf_8 => encode_utf8
get_translation_table_to_utf_8 => get_translation_table_to_utf8
goto invalid_utf_8_start_byte => goto invalid_utf8_start_byte
goto utf_8 => goto utf8
goto utf_8_select => goto utf8_select
terminal_interlink.utf_8 => terminal_interlink.utf8
utf_8_to_unicode => utf8_to_unicode
What was not renamed:
terminal._template_.utf_8_io option, TERM_OPT_UTF_8_IO
Compatibility with existing elinks.conf files would require an alias.
--enable-utf-8
Because the name of the charset is UTF-8, --enable-utf-8 looks better
than --enable-utf8.
CONFIG_UTF_8
Will be renamed in a later commit.
Unicode/utf_8.cp, table_utf_8, aliases_utf_8
Will be renamed in a later commit.
Reported by Jonas Fonseca.
Also add an empty line above the label in init_tab; but there are
still several labels elsewhere that don't have empty lines above them.
The previous scheme incorrectly accepted 0xC1 0x80 as U+0040.
That could have been fixed by tweaking the loop, but the constant
array is surely easier to verify.
In the previous version, invalid UTF-8 from a terminal caused
UCS_NO_CHAR (0xFFFFFFFD) to be stored in a term_event_key_T, resulting
in -3 which was then incidentally treated as an unassigned special key.
Now, invalid UTF-8 is instead mapped to UCS_REPLACEMENT_CHARACTER
and treated as a character. The fact that handle_interlink_event
calls term_send_ucs when it receives invalid UTF-8 makes it pretty
clear that this is how it was intended.
src/viewer/text/link.c (not changed in this commit) already referred
to UCS_REPLACEMENT_CHARACTER in a comment even though it was not
previously defined.
Decrement term->current_tab before calling delete_window() instead of after
deleting all backgrounded tabs, so get_tab_by_number() will see a
consistent value.
Actions can now be bound to e.g. Ctrl-Alt-A. The keybinding code also
supports other combinations of modifiers, like Shift-Ctrl-Up, but the
escape sequence decoder doesn't yet.
Don't let Ctrl-Alt-letter combinations open menus.
This requires compiling cp2u() in even without CONFIG_UTF_8.
I also added an is_kbd_character macro to make try_document_key
more resilient to changes in the definition of term_event_key_T.
Previously, ELinks used to silently discard the Alt modifier from
Alt- keystrokes when UTF-8 I/O was enabled. Now, separate actions
can be bound to and Alt-.
However, if CONFIG_UTF_8 is defined, then actions cannot be bound to
non-ASCII characters, regardless of modifiers. This is because the
code that handles names of keystrokes assumes a character can only be
a single byte. This commit does not change that.
Form fields and BFU text-input widgets then convert from UCS-4 to UTF-8.
If not all UTF-8 bytes fit, they don't insert anything. Thus it is no
longer possible to get invalid UTF-8 by hitting the length limit.
It is unclear to me which charset is supposed to be used for strings
in internal buffers. I made BFU insert UTF-8 whenever CONFIG_UTF_8,
but form fields use the charset of the terminal; that may have to be
changed.
As a side effect, this change should solve bug 782, because
term_send_ucs no longer encodes in UTF-8 if CONFIG_UTF_8 is defined.
I think the UTF-8 and codepage encoding calls I added are safe, too.
A similar bug may still surface somewhere else, but 782 could be
closed for now.
This change also lays the foundation for binding actions to non-ASCII
keys, but the keystroke name parser doesn't yet support that.
The CONFIG_UTF_8 mode does not currently support non-ASCII characters
in hot keys, either.
There is no need to check whether ev->ev == EVENT_KBD;
if decode_terminal_escape_sequence called
decode_terminal_mouse_escape_sequence, then the former neither modified
kbd.key nor passed &kbd to the latter, so kbd.key remains KBD_UNDEF.
If ev->ev was not checked, then it should not be trusted either.
So reinitialize the whole *ev if a keyboard event was indeed found.
For instance, if Ctrl-F1 were pressed and src/terminal/kbd.c supported it,
then toupper(KBD_F1) would be called, resulting in undefined behaviour.
src/terminal/kbd.c does not support such combinations yet, but it is
safest to fix the bug already.
Also list the capnames with which the escape sequences could be
read from Terminfo, and the ECMA-48 interpretations of the bytes
(parenthesized if they seem unrelated to the keys). This is in
preparation for fixing bug 96.
decode_terminal_escape_sequence() used to handle both, but
there is now a separate decode_terminal_application_key()
for ESC O. I have not yet edited decode_terminal_escape_sequence();
there may be dead code in it.
If there is e.g. ESC [ in the input buffer, combine that to Alt-[.
Check the first character too; don't blindly assume it is ESC, as
it can be NUL as well. Note this means you can no longer activate
the main menu by pressing Ctrl-@ (or Ctrl-Space on some terminals).
Otherwise, the timeout could cause ELinks to resume reading from
the terminal device while another process is still using it.
This actually happened in a test.
On entry to some functions that could resume reading from the device,
assert that the terminal has not been blocked.