If utf8_char2cells isn't told where the string that contains
the given UTF-8 character ends, it computes that itself. Two users
of utf8_char2cells, format_textutf8 and split_line, were calling
utf8_char2cells in a loop without providing the end of the string,
resulting in numerous calls by utf8_char2cells to strlen.
With this patch, format_textutf8 and split_line each find the end
of the string once and provide it to utf8_char2cells.
This particularly improves performance with textareas, since
format_textutf8 is called multiple times each time the user interacts
with the textarea and when it must be redrawn.
Closes: Bug 823 - Big textarea is too slow with CONFIG_UTF8
UCS_ORPHAN_CELL is currently defined as U+0020 SPACE, which was
already used before this macro, so the behaviour does not change,
but the code seems clearer now.
I searched for ' ' and 32 and 0x20 and \x20, and replaced with
UCS_ORPHAN_CELL wherever UCS_NO_CHAR was involved. However,
some BFU widgets first draw spaces and then overwrite with text;
those will require a more complex fix if UCS_ORPHAN_CELL is ever
changed to some other character.
The current rules are:
term.utf8
CONFIG_UTF8 UTF-8 I/O widget_data.cdata
----------- --------- ------------------
undefined disabled charset of the terminal
undefined enabled charset of the terminal
defined disabled charset of the terminal (*)
defined enabled always UTF-8
(*) kbd_field was incorrectly assuming UTF-8 in this case.
Explicitly compare the value that is returned by the widget handler
against EVENT_NOT_PROCESSED rather than relying on the fact that
EVENT_NOT_PROCESSED is equal to 1.
ffeedbdc5045a6a5db2bc75ecaab56bfe46c80ea
UCS_NO_CHAR here means the input was too short. Because the strings
generally come from the source code or from PO files, they should not
end in the middle of a character. However, the whole character may be
missing if the string is empty. So select_button_by_key() now checks
for that case separately.
UCS_NO_CHAR must not be passed to unicode_fold_label_case() because
the result is undefined.
The configure script no longer recognizes "CONFIG_UTF_8=yes" lines
in custom features.conf files. They will have to be changed to
"CONFIG_UTF8=yes". This incompatibility was deemed acceptable
because no released version of ELinks supports CONFIG_UTF_8.
The --enable-utf-8 option was not renamed.
Suggested by Miciah on #elinks.
What was renamed:
add_utf_8 => add_utf8
cp2utf_8 => cp2utf8
encode_utf_8 => encode_utf8
get_translation_table_to_utf_8 => get_translation_table_to_utf8
goto invalid_utf_8_start_byte => goto invalid_utf8_start_byte
goto utf_8 => goto utf8
goto utf_8_select => goto utf8_select
terminal_interlink.utf_8 => terminal_interlink.utf8
utf_8_to_unicode => utf8_to_unicode
What was not renamed:
terminal._template_.utf_8_io option, TERM_OPT_UTF_8_IO
Compatibility with existing elinks.conf files would require an alias.
--enable-utf-8
Because the name of the charset is UTF-8, --enable-utf-8 looks better
than --enable-utf8.
CONFIG_UTF_8
Will be renamed in a later commit.
Unicode/utf_8.cp, table_utf_8, aliases_utf_8
Will be renamed in a later commit.
This causes the documented-slow cp2u() to be called in a loop, which
fortunately doesn't have very many iterations. If this is too slow,
then cp2u() can be rewritten, or the hotkeys can be cached in struct
widget or struct widget_data.
Note that check_kbd_label_key() does not yet allow non-ASCII
characters when CONFIG_UTF_8 is defined. Before they are allowed,
menu.c should also be updated.
To reproduce before this patch:
- Run ELinks with an 80x25 terminal.
- Set document.browse.forms.confirm_submit = 1.
- Go to <http://bugzilla.elinks.cz/query.cgi>.
- Click the [ Search ] submit button.
- ELinks asks "Do you want to post form data to URL".
Each line of the URL begins at the horizontal center of the dialog,
and bleeds outside the right border of the dialog. Also, the
[ Yes ] and [ No ] buttons appear to float below the dialog.
Form fields and BFU text-input widgets then convert from UCS-4 to UTF-8.
If not all UTF-8 bytes fit, they don't insert anything. Thus it is no
longer possible to get invalid UTF-8 by hitting the length limit.
It is unclear to me which charset is supposed to be used for strings
in internal buffers. I made BFU insert UTF-8 whenever CONFIG_UTF_8,
but form fields use the charset of the terminal; that may have to be
changed.
As a side effect, this change should solve bug 782, because
term_send_ucs no longer encodes in UTF-8 if CONFIG_UTF_8 is defined.
I think the UTF-8 and codepage encoding calls I added are safe, too.
A similar bug may still surface somewhere else, but 782 could be
closed for now.
This change also lays the foundation for binding actions to non-ASCII
keys, but the keystroke name parser doesn't yet support that.
The CONFIG_UTF_8 mode does not currently support non-ASCII characters
in hot keys, either.
do_move_bookmark was only updating the selection in the bookmarks manager
window in which the Move button was pressed. Now all windows are updated.
This patch also prevents a crash when the first item that was displayed
in a box was the last child of a folder and was being moved (the comment
removed in this patch was incorrect in assuming that bm->box->next must
be valid because it neglected to account for non-root children).
This change required that I move the definition of struct
hierbox_dialog_list_item from src/bfu/hierbox.c to src/bfu/hierbox.h.
Thanks to Kalle Olavi Niemitalo for finding both the update problem
and the crash.
With regular comments in the definition of the structure itself,
and with xgettext:c-format comments in constants of that type,
if xgettext would otherwise guess wrong; so that translators
will know they'll have to double any percent signs they add.
I didn't regenerate PO files, though.
src/bfu/menu.c (scroll_menu): Let neither menu->selected nor pos
become -2.
src/bfu/menu.c (menu_mouse_handler): Call set_menu_selection directly
rather than via scroll_menu, as sel is already known to be selectable.
(Not required for fixing the bug.)
src/bfu/menu.c (menu_search_handler): Break infinite loops also if
menu->selected is -1 initially.
src/bfu/menu.c (menu_handler): Instead of tweaking menu->selected
directly, let scroll_menu do it.
This fixes two bugs:
1. Pressing F9 did not make the main menu visible, but then pressing
e.g. Right made it visible.
2. Pressing F9 and then Down displayed the first submenu (File) at the
wrong position on the screen.
src/bfu/README: This new file currently contains a diagram of how the
various struct types of src/bfu/ and src/terminal/ relate to each
other. More documentation may be added later, although if it is
specific to a particular structure, then it should probably go in the
corresponding header file so that people will remember to update it.
Note: there is ugly hack in ACT_EDIT_BACKSPACE where last byte of UTF-8
character is removed and then moving to left until complete UTF-8
character is found.
With UTF-8 support in terminal enabled it is possible to use double-width
UTF-8 strings as margins of buttons. Although they are displayed wrong when
UTF-8 support in terminal is disabled.
Preparation for using struct terminal in formating functions.
By now distinguish between formating widgets and formating widgets with
displaying was done with term == NULL and term != NULL. I hope I'am
not wrong.
Introduce the macros before_widgets and foreach_widget_back. Use the
latter in update_all_widgets instead of foreach_widget so that the
widgets are printed in reverse order, which means that any listbox is
drawn last, which allows it to grab the cursor from the selected button
when the dialogue box is initialised or redrawn.
Requested by Kirk Reiser for great usability with screen readers.
This changes the init target to be idempotent: most importantly it will now
never overwrite a Makefile if it exists. Additionally 'make init' will
generate the .vimrc files. Yay, no more stupid 'added fairies' commits! ;)
hierbox_ev_init and hierbox_ev_abort must return EVENT_NOT_PROCESSED
so that the generic dialog code runs and initialises the widgets and stuff.
This commit reverts commit f8310de64b to fix
a segfault and also adds comments to explain the unintuitive return value.
the same accelerator key to multiple buttons in a dialog box or
to multiple items in a menu. ELinks already has some support for
this but it requires the translator to run ELinks and manually
scan through all menus and dialogs. The attached changes make it
possible to quickly detect and list any conflicts, including ones
that can only occur on operating systems or configurations that
the translator is not currently using.
The changes have no immediate effect on the elinks executable or
the MO files. PO files become larger, however.
The scheme works like this:
- Like before, accelerator keys in translatable strings are
tagged with the tilde (~) character.
- Whenever a C source file defines an accelerator key, it must
assign one or more named "contexts" to it. The translations in
the PO files inherit these contexts. If multiple strings use
the same accelerator (case insensitive) in the same context,
that's a conflict and can be detected automatically.
- The contexts are defined with "gettext_accelerator_context"
comments in source files. These comments delimit regions where
all translatable strings containing tildes are given the same
contexts. There must be one special comment at the top of the
region; it lists the contexts assigned to that region. The
region automatically ends at the end of the function (found
with regexp /^\}/), but it can also be closed explicitly with
another special comment. The comments are formatted like this:
/* [gettext_accelerator_context(foo, bar, baz)]
begins a region that uses the contexts "foo", "bar", and "baz".
The comma is the delimiter; whitespace is optional.
[gettext_accelerator_context()]
ends the region. */
The scripts don't currently check whether this syntax occurs
inside or outside comments.
- The names of contexts consist of C identifiers delimited with
periods. I typically used the name of a function that sets
up a dialog, or the name of an array where the items of a
menu are listed. There is a special feature for static
functions: if the name begins with a period, then the period
will be replaced with the name of the source file and a colon.
- If a menu is programmatically generated from multiple parts,
of which some are never used together, so that it is safe to
use the same accelerators in them, then it is necessary to
define multiple contexts for the same menu. link_menu() in
src/viewer/text/link.c is the most complex example of this.
- During make update-po:
- A Perl script (po/gather-accelerator-contexts.pl) reads
po/elinks.pot, scans the source files listed in it for
"gettext_accelerator_context" comments, and rewrites
po/elinks.pot with "accelerator_context" comments that
indicate the contexts of each msgid: the union of all
contexts of all of its uses in the source files. It also
removes any "gettext_accelerator_context" comments that
xgettext --add-comments has copied to elinks.pot.
- If po/gather-accelerator-contexts.pl does not find any
contexts for some use of an msgid that seems to contain an
accelerator (because it contains a tilde), it warns. If the
tilde refers to e.g. "~/.elinks" and does not actually mark
an accelerator, the warning can be silenced by specifying the
special context "IGNORE", which the script otherwise ignores.
- msgmerge copies the "accelerator_context" comments from
po/elinks.pot to po/*.po. Translators do not edit those
comments.
- During make check-po:
- Another Perl script (po/check-accelerator-contexts.pl) reads
po/*.po and keeps track of which accelerators have been bound
in each context. It warns about any conflicts it finds.
This script does not access the C source files; thus it does
not matter if the line numbers in "#:" lines are out of date.
This implementation is not perfect and I am not proposing to
add it to the main source tree at this time. Specifically:
- It introduces compile-time dependencies on Perl and Locale::PO.
There should be a configure-time or compile-time check so that
the new features are skipped if the prerequisites are missing.
- When the scripts include msgstr strings in warnings, they
should transcode them from the charset of the PO file to the
one specified by the user's locale.
- It is not adequately documented (well, except perhaps here).
- po/check-accelerator-contexts.pl reports the same conflict
multiple times if it occurs in multiple contexts.
- The warning messages should include line numbers, so that users
of Emacs could conveniently edit the conflicting part of the PO
file. This is not feasible with the current version of
Locale::PO.
- Locale::PO does not understand #~ lines and spews warnings
about them. There is an ugly hack to hide these warnings.
- Jonas Fonseca suggested the script could propose accelerators
that are still available. This has not been implemented.
There are three files attached:
- po/gather-accelerator-contexts.pl: Augments elinks.pot with
context information.
- po/check-accelerator-contexts.pl: Checks conflicts.
- accelerator-contexts.diff: Makes po/Makefile run the scripts,
and adds special comments to source files.
The old behavior would lead screeanreaders to pronounce items like the
bookmarks as "unknown-unknown-<the link name>". That is the border
chars of the tree view would be renderered as "unknown".
Requested by Klaus Knopper.
Convert remaining conditional file building to use
OBJS-$(CONFIG_FOO) += foo.o
one problem with reverse meaining (in util/) fixed with local 'hack'.
Cleanup and remove stuff which is now default targets.
Ditch the building of an archive (.a) in favour of linking all objects in a
directory into a lib.o file. This makes it easy to link in subdirectories
and more importantly keeps the build logic in the local subdirectories.
Note: after updating you will have to rm **/*.a if you do not make clean
before updating.