Use it for the actual I/O only. Previously, defining CONFIG_UTF8 and
enabling UTF-8 used to force many strings to the UTF-8 charset
regardless of the terminal charset option. Now, those strings always
follow the terminal charset. This fixes bug 914 which was caused
because _() returned strings in the terminal charset and functions
then assumed they were in UTF-8. This reduction in the effects of
UTF-8 I/O may also simplify future testing.
Previously, html_special_form_control converted
form_control.default_value to the terminal charset, and init_form_state
then copied the value to form_state.value. However, when CONFIG_UTF8
is defined and UTF-8 I/O is enabled, form_state.value is supposed to
be in UTF-8, rather than in the terminal charset.
This mismatch could not be conveniently fixed in
html_special_form_control because that does not know which terminal is
being used and whether UTF-8 I/O is enabled there. Also, constructing
a conversion table from the document charset to form_state.value could
have ruined renderer_context.convert_table, because src/intl/charsets.c
does not support multiple concurrent conversion tables.
So instead, we now keep form_control.default_value in the document
charset, and convert it in the viewer each time it is needed. Because
the result of the conversion is kept in form_state.value between
incremental renderings, this shouldn't even slow things down too much.
I am not implementing the proper charset conversions for the DOM
defaultValue property yet, because the current code doesn't have
them for other string properties either, and bug 805 is already open
for that.
I am going make fc->default_value use the charset of the document, and
recoding the string from the form history to that might lose characters.
This change also affects what ECMAScript sees in the defaultValue property.
<http://www.w3.org/TR/2003/REC-DOM-Level-2-HTML-20030109/html.html#ID-26091157>
says it should represent the HTML "value" attribute, so changing it
based on form history is not appropriate.
straconcat reads the args with va_arg(ap, const unsigned char *),
and the NULL macro may have the wrong type (e.g. int).
Many places pass string literals of type char * to straconcat. This
is in principle also a violation, but I'm ignoring it for now because
if it becomes a problem with some C implementation, then so will the
use of unsigned char * with printf "%s", which is so widespread in
ELinks that I'm not going to try fixing it now.
In goto_mark, copy the current_link of the old view state to the
old_current_link of the new view state so that clear_link will properly
clear the highlight for that link.
This fixes a bug introduced with the removal of link_bg in commit
c91c763d49.
* Recompute the pos variable for each cell, rather than just once per line.
This fixes the bug that only the first cell was being examined.
* Moved the bulk of the code outside the "if (frame && data >= 176 &&
data < 224)" conditional. This fixes the bug that only frame
characters were being added to the string.
* If the cell has UCS_NO_CHAR in it, don't add that to the string.
* Call encode_utf8 even for characters that originated from a frame.
This does not matter yet but will be correct if the function is
later changed to use the Unicode line-drawing characters for frames.
This allows code to use document->cached instead of
find_in_cache(document->uri), thereby increasing the likelihood
of getting the correct cache entry.
This should fix Bug 756 - "assertion (cached)->object.refcount >= 0 failed"
after HTTP proxy was changed.
Patches for this were written by me and then later by Jonas.
This commit combines our independent implementations.
As draw_textarea_utf8 loops over each character of the textarea content, it
checks whether the character is on the screen; draws it if so; increments
the screen co-ordinate; and updates the position in the textarea text.
The last step was being skipped when the character was not on the line,
so a line would be drawn from the beginning, even if the left edge of the
textarea is off the screen.
Closes: Bug 835 - Text in textarea is unaffected by horizontal scrolling of
document in UTF-8 mode
If utf8_char2cells isn't told where the string that contains
the given UTF-8 character ends, it computes that itself. Two users
of utf8_char2cells, format_textutf8 and split_line, were calling
utf8_char2cells in a loop without providing the end of the string,
resulting in numerous calls by utf8_char2cells to strlen.
With this patch, format_textutf8 and split_line each find the end
of the string once and provide it to utf8_char2cells.
This particularly improves performance with textareas, since
format_textutf8 is called multiple times each time the user interacts
with the textarea and when it must be redrawn.
Closes: Bug 823 - Big textarea is too slow with CONFIG_UTF8
UCS_ORPHAN_CELL is currently defined as U+0020 SPACE, which was
already used before this macro, so the behaviour does not change,
but the code seems clearer now.
I searched for ' ' and 32 and 0x20 and \x20, and replaced with
UCS_ORPHAN_CELL wherever UCS_NO_CHAR was involved. However,
some BFU widgets first draw spaces and then overwrite with text;
those will require a more complex fix if UCS_ORPHAN_CELL is ever
changed to some other character.
The configure script no longer recognizes "CONFIG_UTF_8=yes" lines
in custom features.conf files. They will have to be changed to
"CONFIG_UTF8=yes". This incompatibility was deemed acceptable
because no released version of ELinks supports CONFIG_UTF_8.
The --enable-utf-8 option was not renamed.
Suggested by Miciah on #elinks.
What was renamed:
add_utf_8 => add_utf8
cp2utf_8 => cp2utf8
encode_utf_8 => encode_utf8
get_translation_table_to_utf_8 => get_translation_table_to_utf8
goto invalid_utf_8_start_byte => goto invalid_utf8_start_byte
goto utf_8 => goto utf8
goto utf_8_select => goto utf8_select
terminal_interlink.utf_8 => terminal_interlink.utf8
utf_8_to_unicode => utf8_to_unicode
What was not renamed:
terminal._template_.utf_8_io option, TERM_OPT_UTF_8_IO
Compatibility with existing elinks.conf files would require an alias.
--enable-utf-8
Because the name of the charset is UTF-8, --enable-utf-8 looks better
than --enable-utf8.
CONFIG_UTF_8
Will be renamed in a later commit.
Unicode/utf_8.cp, table_utf_8, aliases_utf_8
Will be renamed in a later commit.
form_state.state_cell is no longer used for FC_TEXT, FC_PASSWORD, nor FC_FILE.
Instead, get_link_cursor_offset() computes the cell with utf8_ptr2chars
(a new function) or utf8_ptr2cells. This shouldn't slow down ELinks too
much, as it's done only for the selected link and only once per redraw.
The left side of a scrolled input field is always aligned at a
character boundary. The right side might not be.
The new comments describe how the members were apparently intended to
be used. However, the implementation does not actually work when
CONFIG_UTF_8 is defined, and the current semantics do not even allow
an efficient implementation of long (mostly scrolled out) strings.
Restring the spaghetti in dump_to_file to fix a bug that was introduced
in commit 2a6125e3d0 whereby when
document.dump.codepage != "utf-8", the document itself was not output,
only the references list.
Actions can now be bound to e.g. Ctrl-Alt-A. The keybinding code also
supports other combinations of modifiers, like Shift-Ctrl-Up, but the
escape sequence decoder doesn't yet.
Don't let Ctrl-Alt-letter combinations open menus.
The cast is not necessary since we already check the bounds, but by using a
cast here, it hopefully makes it more obvious what the long comment above
is pointing out: namely that we put the value of a signed integer into an
unsigned char.