If CONFIG_UTF8 is not defined, then text_end is not used, and GCC
could warn about that. Because configure can add -Werror to CFLAGS,
the warning could then cause the whole build to fail.
If utf8_char2cells isn't told where the string that contains
the given UTF-8 character ends, it computes that itself. Two users
of utf8_char2cells, format_textutf8 and split_line, were calling
utf8_char2cells in a loop without providing the end of the string,
resulting in numerous calls by utf8_char2cells to strlen.
With this patch, format_textutf8 and split_line each find the end
of the string once and provide it to utf8_char2cells.
This particularly improves performance with textareas, since
format_textutf8 is called multiple times each time the user interacts
with the textarea and when it must be redrawn.
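The difference between the two calling patterns can be sketched as follows. This is only an illustration, not the ELinks code: sketch_char2cells, sketch_char_len, sketch_width_slow, sketch_width_fast and the sketch_strlen_calls counter are hypothetical names, and the stand-in counts every character as one cell.

```c
#include <stddef.h>
#include <string.h>

/* Counts how often the stand-in had to locate the end of the string itself. */
static unsigned long sketch_strlen_calls;

/* Byte length of the UTF-8 sequence starting at s, judged by the lead
 * byte and cut short if the string terminates early. */
static size_t
sketch_char_len(const unsigned char *s)
{
    size_t len = 1, i;

    if ((*s & 0xE0) == 0xC0) len = 2;
    else if ((*s & 0xF0) == 0xE0) len = 3;
    else if ((*s & 0xF8) == 0xF0) len = 4;

    for (i = 1; i < len; i++)
        if (!s[i]) return i;
    return len;
}

/* Hypothetical stand-in for utf8_char2cells(): if end is NULL, it has
 * to scan for the terminator on every call. Every character counts as
 * one cell here, which is a simplification. */
static int
sketch_char2cells(const unsigned char *s, const unsigned char *end)
{
    if (!end) {
        sketch_strlen_calls++;
        end = s + strlen((const char *) s);
    }
    return s < end ? 1 : 0;
}

/* Old pattern: one strlen() per character, so quadratic in the length. */
static int
sketch_width_slow(const unsigned char *text)
{
    int cells = 0;

    while (*text) {
        cells += sketch_char2cells(text, NULL);
        text += sketch_char_len(text);
    }
    return cells;
}

/* New pattern: find the end once and pass it to every call. */
static int
sketch_width_fast(const unsigned char *text)
{
    const unsigned char *end = text + strlen((const char *) text);
    int cells = 0;

    while (text < end) {
        cells += sketch_char2cells(text, end);
        text += sketch_char_len(text);
    }
    return cells;
}
```

Both loops return the same width; only the number of strlen calls differs, which is what matters for a large textarea.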
Closes: Bug 823 - Big textarea is too slow with CONFIG_UTF8
UCS_ORPHAN_CELL is currently defined as U+0020 SPACE, which was
already used before this macro, so the behaviour does not change,
but the code seems clearer now.
I searched for ' ' and 32 and 0x20 and \x20, and replaced them with
UCS_ORPHAN_CELL wherever UCS_NO_CHAR was involved. However,
some BFU widgets first draw spaces and then overwrite with text;
those will require a more complex fix if UCS_ORPHAN_CELL is ever
changed to some other character.
The current rules are:
                term.utf8
CONFIG_UTF8     UTF-8 I/O    widget_data.cdata
-----------     ---------    ------------------
undefined       disabled     charset of the terminal
undefined       enabled      charset of the terminal
defined         disabled     charset of the terminal (*)
defined         enabled      always UTF-8
(*) kbd_field was incorrectly assuming UTF-8 in this case.
Explicitly compare the value that is returned by the widget handler
against EVENT_NOT_PROCESSED rather than relying on the fact that
EVENT_NOT_PROCESSED is equal to 1.
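A minimal sketch of the explicit comparison, assuming a two-value status enum. The sketch_ names are hypothetical; only the fact that EVENT_NOT_PROCESSED equals 1 comes from the description above.

```c
/* Hypothetical mirror of the event status values. */
enum sketch_event_status {
    SKETCH_EVENT_PROCESSED = 0,
    SKETCH_EVENT_NOT_PROCESSED = 1
};

/* Returns nonzero if the caller still has to handle the event itself.
 * Writing "if (status)" would happen to work too, but only as long as
 * SKETCH_EVENT_NOT_PROCESSED stays the sole nonzero value. */
static int
sketch_needs_fallback(enum sketch_event_status status)
{
    return status == SKETCH_EVENT_NOT_PROCESSED;
}
```

The explicit form keeps working if further status values are added later.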
ffeedbdc5045a6a5db2bc75ecaab56bfe46c80ea
UCS_NO_CHAR here means the input was too short. Because the strings
generally come from the source code or from PO files, they should not
end in the middle of a character. However, the whole character may be
missing if the string is empty. So select_button_by_key() now checks
for that case separately.
UCS_NO_CHAR must not be passed to unicode_fold_label_case() because
the result is undefined.
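The separate check might look roughly like this. It is a sketch under stated assumptions: SKETCH_NO_CHAR, sketch_first_char, sketch_fold_case and sketch_label_matches_key are hypothetical stand-ins for UCS_NO_CHAR, the decoder, unicode_fold_label_case() and the comparison in select_button_by_key(), and the decoder only understands single-byte labels.

```c
#include <ctype.h>

/* Hypothetical sentinel meaning "no character could be read". */
#define SKETCH_NO_CHAR 0xFFFFFFFDUL

/* Simplified decoder: returns the first byte of the label, or
 * SKETCH_NO_CHAR if the label is empty. A real decoder would read a
 * whole multi-byte character. */
static unsigned long
sketch_first_char(const char *label)
{
    unsigned char c = (unsigned char) label[0];

    return c ? (unsigned long) c : SKETCH_NO_CHAR;
}

/* Simplified case folding (ASCII only). The caller must not pass
 * SKETCH_NO_CHAR. */
static unsigned long
sketch_fold_case(unsigned long c)
{
    return c < 0x80 ? (unsigned long) tolower((int) c) : c;
}

/* Returns 1 if the label begins with the given key, ignoring case. */
static int
sketch_label_matches_key(const char *label, unsigned long key)
{
    unsigned long c = sketch_first_char(label);

    /* An empty label has no character to fold, so reject it first. */
    if (c == SKETCH_NO_CHAR || key == SKETCH_NO_CHAR)
        return 0;
    return sketch_fold_case(c) == sketch_fold_case(key);
}
```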
The configure script no longer recognizes "CONFIG_UTF_8=yes" lines
in custom features.conf files. They will have to be changed to
"CONFIG_UTF8=yes". This incompatibility was deemed acceptable
because no released version of ELinks supports CONFIG_UTF_8.
The --enable-utf-8 option was not renamed.
Suggested by Miciah on #elinks.
What was renamed:
- add_utf_8 => add_utf8
- cp2utf_8 => cp2utf8
- encode_utf_8 => encode_utf8
- get_translation_table_to_utf_8 => get_translation_table_to_utf8
- goto invalid_utf_8_start_byte => goto invalid_utf8_start_byte
- goto utf_8 => goto utf8
- goto utf_8_select => goto utf8_select
- terminal_interlink.utf_8 => terminal_interlink.utf8
- utf_8_to_unicode => utf8_to_unicode
What was not renamed:
- terminal._template_.utf_8_io option, TERM_OPT_UTF_8_IO
  Compatibility with existing elinks.conf files would require an alias.
- --enable-utf-8
  Because the name of the charset is UTF-8, --enable-utf-8 looks better
  than --enable-utf8.
- CONFIG_UTF_8
  Will be renamed in a later commit.
- Unicode/utf_8.cp, table_utf_8, aliases_utf_8
  Will be renamed in a later commit.
This causes cp2u(), which is documented as slow, to be called in a loop,
although the loop fortunately has few iterations. If this is too slow,
then cp2u() can be rewritten, or the hotkeys can be cached in struct
widget or struct widget_data.
Note that check_kbd_label_key() does not yet allow non-ASCII
characters when CONFIG_UTF_8 is defined. Before they are allowed,
menu.c should also be updated.
To reproduce before this patch:
- Run ELinks with an 80x25 terminal.
- Set document.browse.forms.confirm_submit = 1.
- Go to <http://bugzilla.elinks.cz/query.cgi>.
- Click the [ Search ] submit button.
- ELinks asks "Do you want to post form data to URL".
Each line of the URL begins at the horizontal center of the dialog,
and bleeds outside the right border of the dialog. Also, the
[ Yes ] and [ No ] buttons appear to float below the dialog.
Form fields and BFU text-input widgets then convert from UCS-4 to UTF-8.
If not all UTF-8 bytes fit, they don't insert anything. Thus it is no
longer possible to get invalid UTF-8 by hitting the length limit.
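The all-or-nothing insertion can be sketched like this. The helper names sketch_encode_utf8 and sketch_append_char are hypothetical, and appending at the end of a NUL-terminated buffer is an assumption made for brevity; the real widgets insert at the cursor position.

```c
#include <stddef.h>
#include <string.h>

/* Encodes ucs as UTF-8 into out. Returns the number of bytes written,
 * or 0 if ucs is a surrogate or lies beyond U+10FFFF. */
static size_t
sketch_encode_utf8(unsigned long ucs, unsigned char out[4])
{
    if (ucs < 0x80) {
        out[0] = (unsigned char) ucs;
        return 1;
    }
    if (ucs < 0x800) {
        out[0] = (unsigned char) (0xC0 | (ucs >> 6));
        out[1] = (unsigned char) (0x80 | (ucs & 0x3F));
        return 2;
    }
    if (ucs < 0x10000) {
        if (ucs >= 0xD800 && ucs <= 0xDFFF) return 0;
        out[0] = (unsigned char) (0xE0 | (ucs >> 12));
        out[1] = (unsigned char) (0x80 | ((ucs >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (ucs & 0x3F));
        return 3;
    }
    if (ucs <= 0x10FFFF) {
        out[0] = (unsigned char) (0xF0 | (ucs >> 18));
        out[1] = (unsigned char) (0x80 | ((ucs >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((ucs >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (ucs & 0x3F));
        return 4;
    }
    return 0;
}

/* Appends ucs to the NUL-terminated string in buf, whose total
 * capacity is size bytes including the terminator. Returns 1 if the
 * whole character was inserted, or 0 if nothing was changed. */
static int
sketch_append_char(char *buf, size_t size, unsigned long ucs)
{
    unsigned char enc[4];
    size_t len = sketch_encode_utf8(ucs, enc);
    size_t used = strlen(buf);

    /* Either all bytes of the character fit, or none are written. */
    if (len == 0 || used + len + 1 > size)
        return 0;
    memcpy(buf + used, enc, len);
    buf[used + len] = '\0';
    return 1;
}
```

Because the length check happens before any byte is copied, the buffer never ends in a truncated multi-byte sequence.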
It is unclear to me which charset is supposed to be used for strings
in internal buffers. I made BFU insert UTF-8 whenever CONFIG_UTF_8 is
defined, but form fields use the charset of the terminal; that may
have to be changed.
As a side effect, this change should solve bug 782, because
term_send_ucs no longer encodes in UTF-8 if CONFIG_UTF_8 is defined.
I think the UTF-8 and codepage encoding calls I added are safe, too.
A similar bug may still surface somewhere else, but 782 could be
closed for now.
This change also lays the foundation for binding actions to non-ASCII
keys, but the keystroke name parser doesn't yet support that.
The CONFIG_UTF_8 mode does not currently support non-ASCII characters
in hot keys, either.
do_move_bookmark was only updating the selection in the bookmarks manager
window in which the Move button was pressed. Now all windows are updated.
This patch also prevents a crash when the first item that was displayed
in a box was the last child of a folder and was being moved. The comment
removed in this patch incorrectly assumed that bm->box->next must be
valid; it neglected to account for non-root children.
This change required that I move the definition of struct
hierbox_dialog_list_item from src/bfu/hierbox.c to src/bfu/hierbox.h.
Thanks to Kalle Olavi Niemitalo for finding both the update problem
and the crash.
With regular comments in the definition of the structure itself,
and with xgettext:c-format comments in constants of that type
where xgettext would otherwise guess wrong, so that translators
will know they have to double any percent signs they add.
I didn't regenerate PO files, though.
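As an illustration of the two kinds of comments, here is a small sketch. The structure, the constant, the message text and the N_() marker definition are all hypothetical; only the xgettext:c-format flag comment itself is standard xgettext syntax.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical no-op marker that lets xgettext extract the string. */
#define N_(s) s

struct sketch_msg {
    /* Format string: it is passed to a printf-style function with
     * one %s argument, so literal percent signs must be doubled. */
    const char *format;
};

static const struct sketch_msg sketch_progress = {
    /* xgettext:c-format */
    N_("Loaded 100%% of %s"),
};

/* Formats the message for the given name into buf. */
static void
sketch_format(char *buf, size_t size, const char *name)
{
    snprintf(buf, size, sketch_progress.format, name);
}
```

With the flag comment in place, the extracted PO entry is marked as c-format even when xgettext cannot infer that from how the string is used.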
src/bfu/menu.c (scroll_menu): Let neither menu->selected nor pos
become -2.
src/bfu/menu.c (menu_mouse_handler): Call set_menu_selection directly
rather than via scroll_menu, as sel is already known to be selectable.
(Not required for fixing the bug.)
src/bfu/menu.c (menu_search_handler): Break infinite loops also if
menu->selected is -1 initially.
src/bfu/menu.c (menu_handler): Instead of tweaking menu->selected
directly, let scroll_menu do it.
This fixes two bugs:
1. Pressing F9 did not make the main menu visible, but then pressing
e.g. Right made it visible.
2. Pressing F9 and then Down displayed the first submenu (File) at the
wrong position on the screen.
src/bfu/README: This new file currently contains a diagram of how the
various struct types of src/bfu/ and src/terminal/ relate to each
other. More documentation may be added later, although if it is
specific to a particular structure, then it should probably go in the
corresponding header file so that people will remember to update it.