The current rules are:
term.utf8
CONFIG_UTF8 UTF-8 I/O widget_data.cdata
----------- --------- ------------------
undefined disabled charset of the terminal
undefined enabled charset of the terminal
defined disabled charset of the terminal (*)
defined enabled always UTF-8
(*) kbd_field was incorrectly assuming UTF-8 in this case.
This reverts the following commits:
- 86ed79deaf
Use wcwidth if available and applicable.
- 304f5fa1ea
comment fix (__STDC_ISO_10646__, not __STDC_ISO_10646)
- part of 71eebf1cc7
Compensate for glibc not defining wcwidth() when _XOPEN_SOURCE is not set
And adds a lengthy comment about LC_CTYPE problems.
Explicitly compare the value that is returned by the widget handler
against EVENT_NOT_PROCESSED rather than relying on the fact that
EVENT_NOT_PROCESSED is equal to 1.
ffeedbdc5045a6a5db2bc75ecaab56bfe46c80ea
The secure file saving code plays some shenanigans with the umask.
Previously, the code could fail to restore the old umask when certain libc
calls failed: malloc, mkstemp, fdopen, and fopen. This resulted in
unrelated code creating files with the wrong umode. Specifically, the
download code's automatic directory creation was creating directories
without the execute permission bit.
Thanks to Quiznos for reporting and helping to track the problem down.
To reproduce:
- Start ELinks.
- Enable the ui.tabs.wraparound option.
- Press t to open a second tab.
- Go to http://elinks.cz/ in the second tab.
- Press 3< to step three tabs to the left.
In the statement "tab = tabs + tab % tabs;", tab == -2 and tabs == 2.
So tab % tabs == 0 and tab becomes 2, which is out of range.
The new version calls get_opt_bool even if the tab parameter is already in
range, but the cost should be negligible compared to the redraw_terminal()
call that follows.
Previously, each mapping between a codepage byte and a Unicode
character was stored as a struct table_entry, which listed both the
byte and the character. This representation may be optimal for sparse
mappings, but codepages map almost every possible byte to a character,
so it is more efficient to just have an array that lists the Unicode
character corresponding to each byte from 0x80 to 0xFF. The bytes are
not stored but rather implied by the array index. The tcvn5712 and
viscii codepages have a total of four mappings that do not fit in the
arrays, so we still use struct table_entry for those.
This change also makes cp2u() operate in O(1) time and may speed up
other functions as well.
The "sed | while read" concoction in Unicode/gen-cp looks rather
unhealthy. It would probably be faster and more readable if rewritten
in Perl, but IMO that goes for the previous version as well, so I
suppose whoever wrote it had a reason not to use Perl here.
Before:
text data bss dec hex filename
38948 28528 3311 70787 11483 src/intl/charsets.o
500096 85568 82112 667776 a3080 src/elinks
After:
text data bss dec hex filename
31558 28528 3311 63397 f7a5 src/intl/charsets.o
492878 85568 82112 660558 a144e src/elinks
So the text section shrank by 7390 bytes.
Measured on i686-pc-linux-gnu with: --disable-xbel --disable-nls
--disable-cookies --disable-formhist --disable-globhist
--disable-mailcap --disable-mimetypes --disable-smb --disable-mouse
--disable-sysmouse --disable-leds --disable-marks --disable-css
--enable-small --enable-utf-8 --without-gpm --without-bzlib
--without-idn --without-spidermonkey --without-lua --without-gnutls
--without-openssl CFLAGS="-Os -ggdb -Wall"
Before:
text data bss dec hex filename
25726 62992 3343 92061 1679d src/intl/charsets.o
653856 120020 82144 856020 d0fd4 src/elinks
After:
text data bss dec hex filename
60190 28528 3311 92029 1677d src/intl/charsets.o
688320 85556 82112 855988 d0fb4 src/elinks
So 34464 bytes were moved from the data section to the text section
and should be more likely to get shared between ELinks processes.
Measured on i686-pc-linux-gnu with: --disable-xbel --disable-nls
--disable-cookies --disable-formhist --disable-globhist
--disable-mailcap --disable-mimetypes --disable-smb --disable-mouse
--disable-sysmouse --disable-leds --disable-marks --disable-css
--enable-small --enable-utf-8 --without-gpm --without-bzlib
--without-idn --without-spidermonkey --without-lua --without-gnutls
--without-openssl CFLAGS="-O2 -ggdb -Wall"
UCS_NO_CHAR here means the input was too short. Because the strings
generally come from the source code or from PO files, they should not
end in the middle of a character. However, the whole character may be
missing if the string is empty. So select_button_by_key() now checks
for that case separately.
UCS_NO_CHAR must not be passed to unicode_fold_label_case() because
the result is undefined.
The configure script no longer recognizes "CONFIG_UTF_8=yes" lines
in custom features.conf files. They will have to be changed to
"CONFIG_UTF8=yes". This incompatibility was deemed acceptable
because no released version of ELinks supports CONFIG_UTF_8.
The --enable-utf-8 option was not renamed.
Suggested by Miciah on #elinks.
What was renamed:
add_utf_8 => add_utf8
cp2utf_8 => cp2utf8
encode_utf_8 => encode_utf8
get_translation_table_to_utf_8 => get_translation_table_to_utf8
goto invalid_utf_8_start_byte => goto invalid_utf8_start_byte
goto utf_8 => goto utf8
goto utf_8_select => goto utf8_select
terminal_interlink.utf_8 => terminal_interlink.utf8
utf_8_to_unicode => utf8_to_unicode
What was not renamed:
terminal._template_.utf_8_io option, TERM_OPT_UTF_8_IO
Compatibility with existing elinks.conf files would require an alias.
--enable-utf-8
Because the name of the charset is UTF-8, --enable-utf-8 looks better
than --enable-utf8.
CONFIG_UTF_8
Will be renamed in a later commit.
Unicode/utf_8.cp, table_utf_8, aliases_utf_8
Will be renamed in a later commit.
Previously, the window sizes computed for xterm were a few pixels off,
and this could result in too few character cells being displayed.
This new version tries to read the window size increment from the
WM_NORMAL_HINTS property set by xterm, and base the computations on
that.
Previously, utf8_step_forward() and utf8_step_backward() left *count
unchanged if some parameter was invalid. Now, they properly store 0.
This flaw had no effect in practice, because all current callers pass
count=NULL, and invalid parameters shouldn't be used anyway.
form_state.state_cell is no longer used for FC_TEXT, FC_PASSWORD, nor FC_FILE.
Instead, get_link_cursor_offset() computes the cell with utf8_ptr2chars
(a new function) or utf8_ptr2cells. This shouldn't slow down ELinks too
much, as it's done only for the selected link and only once per redraw.
The left side of a scrolled input field is always aligned at a
character boundary. The right side might not be.
The new comments describe how the members were apparently intended to
be used. However, the implementation does not actually work when
CONFIG_UTF_8 is defined, and the current semantics do not even allow
an efficient implementation of long (mostly scrolled out) strings.
Surrogates are now treated the same way as out-of-range characters
like U+110000; if a link has such an access key, then the ECMAScript
accessKey property cannot be read. It seems currently impossible to
set such an access key though, because accesskey_string_to_unicode()
doesn't support multibyte characters yet.
Reported by Jonas Fonseca.
Also add an empty line above the label in init_tab; but there are
still several labels elsewhere that don't have empty lines above them.
The previous scheme incorrectly accepted 0xC1 0x80 as U+0040.
That could have been fixed by tweaking the loop, but the constant
array is surely easier to verify.
- Include arpa/inet.h to get hton* ntoh* functions.
- Use socklen_t instead of int.
- Try to define PF_INET to AF_INET if it doesn't exist.
Reported-by: Andy Tanenbaum <ast@cs.vu.nl>
In the previous version, invalid UTF-8 from a terminal caused
UCS_NO_CHAR (0xFFFFFFFD) to be stored in a term_event_key_T, resulting
in -3 which was then incidentally treated as an unassigned special key.
Now, invalid UTF-8 is instead mapped to UCS_REPLACEMENT_CHARACTER
and treated as a character. The fact that handle_interlink_event
calls term_send_ucs when it receives invalid UTF-8 makes it pretty
clear that this is how it was intended.
src/viewer/text/link.c (not changed in this commit) already referred
to UCS_REPLACEMENT_CHARACTER in a comment even though it was not
previously defined.
The message appears when the user has selected e.g. "Main mapping"
rather than an action inside it. Because the main mapping is a keymap,
ELinks must not tell the user to select a keymap.
whether we ought to add the conn->progress->start to
the conn->est_length. Currently displaying resuming works correctly with
ftp.task.gda.pl and ftp.pld-linux.org.
Restring the spaghetti in dump_to_file to fix a bug that was introduced
in commit 2a6125e3d0 whereby when
document.dump.codepage != "utf-8", the document itself was not output,
only the references list.
Decrement term->current_tab before calling delete_window() instead of after
deleting all backgrounded tabs, so get_tab_by_number() will see a
consistent value.
It cannot be restricted just to characters that have passed
check_kbd_label_key(), because hotkeys in strings received from
gettext must also be processed with it, and there we don't have
a struct term_event for check_kbd_label_key().
This causes the documented-slow cp2u() to be called in a loop, which
fortunately doesn't have very many iterations. If this is too slow,
then cp2u() can be rewritten, or the hotkeys can be cached in struct
widget or struct widget_data.
Note that check_kbd_label_key() does not yet allow non-ASCII
characters when CONFIG_UTF_8 is defined. Before they are allowed,
menu.c should also be updated.
According to Jonas Fonseca, if init_string(&canonical) fails, then it
anyway sets canonical.source = NULL and makes done_string(&canonical)
safe, even if canonical was previously uninitialized.
Actions can now be bound to e.g. Ctrl-Alt-A. The keybinding code also
supports other combinations of modifiers, like Shift-Ctrl-Up, but the
escape sequence decoder doesn't yet.
Don't let Ctrl-Alt-letter combinations open menus.
The cast is not necessary since we already check the bounds, but by using a
cast here, it hopefully makes it more obvious what the long comment above
is pointing out: namely that we put the value of a signed integer into an
unsigned char.
This fixes a bug: in the previous version, l_bind_key() modified the
buffer whose address lua_tostring() returned, even though that is not
allowed according to Lua documentation <http://www.lua.org/pil/24.2.2.html>.
The change affects the user interface: previously, if the user typed
"ctrl+cokebottle" in the "Add keybinding" dialog box, ELinks would
change the text in the widget to "Ctrl-cokebottle" before complaining
that the keystroke is invalid. Now, it leaves the widget unchanged.
This commit does not yet add const to parameters of parse_keystroke()
and related functions.
Before really_add_keybinding() is called, check_keystroke() calls
parse_keystroke(), which converts the modifier prefix to canonical
form: for example, "alt+f9" becomes "Alt-f9". This commit makes
really_add_keybinding() normally ignore that string and generate a
brand new one, e.g. "Alt-F9" (note the upper-case F), for its
"Keystroke already used" warning. Likewise, " " turns to "Space".
After this commit, it should be possible to change parse_keystroke()
to never write back into its input string.
If really_add_keybinding() cannot generate the string for some reason
(out of memory?), then it will use whatever parse_keystroke() has left
in the buffer. The alternatives would be to omit the keystroke name
from the warning or to reject the keybinding entirely; it isn't clear
what the best solution is here, but the one I implemented is closest
to the previous behaviour.
This requires compiling cp2u() in even without CONFIG_UTF_8.
I also added an is_kbd_character macro to make try_document_key
more resilient to changes in the definition of term_event_key_T.
The previous version used only the low 8 bits of the key code.
This one arranges for the whole key to be rejected if it's not ASCII.
Perhaps the modifiers should be checked too, but I'm not changing that now.
In revision 1.15 of dns.c (as it was called way back then), pasky
backported a fix from Links 0.97pre2 to try gethostbyaddr before
trying gethostbyname for DNS lookups:
MacOS address resolution fix (Aldy Hernandez) (from 0.97pre2)
However, that fix introduced a bug, because it was calling gethostbyaddr
on all addresses, not just IP addresses. Mikulas fixed that bug in Links
0.98:
Do not call gethostbyaddr when name is not ip address (it should avoid
some useless nameserver queries)'
This fix was never backported to ELinks. Until today.
This commit is functionally the same as the fix in Links 0.98, plus it uses
inet_aton for great correctness!
This fixes a bug reported in #elinks by tnks, whereby lookups for
yubnub.org resulted in 121.117.98.110 == 0x7975626E == 'y', 'u', 'b', 'n'.
I believe that it also fixes bug 691 (which is already closed with a
workaround).
To reproduce before this patch:
- Run ELinks with an 80x25 terminal.
- Set document.browse.forms.confirm_submit = 1.
- Go to <http://bugzilla.elinks.cz/query.cgi>.
- Click the [ Search ] submit button.
- ELinks asks "Do you want to post form data to URL".
Each line of the URL begins at the horizontal center of the dialog,
and bleeds outside the right border of the dialog. Also, the
[ Yes ] and [ No ] buttons appear to float below the dialog.
Previously, ELinks used to silently discard the Alt modifier from
Alt- keystrokes when UTF-8 I/O was enabled. Now, separate actions
can be bound to and Alt-.
However, if CONFIG_UTF_8 is defined, then actions cannot be bound to
non-ASCII characters, regardless of modifiers. This is because the
code that handles names of keystrokes assumes a character can only be
a single byte. This commit does not change that.
Form fields and BFU text-input widgets then convert from UCS-4 to UTF-8.
If not all UTF-8 bytes fit, they don't insert anything. Thus it is no
longer possible to get invalid UTF-8 by hitting the length limit.
It is unclear to me which charset is supposed to be used for strings
in internal buffers. I made BFU insert UTF-8 whenever CONFIG_UTF_8,
but form fields use the charset of the terminal; that may have to be
changed.
As a side effect, this change should solve bug 782, because
term_send_ucs no longer encodes in UTF-8 if CONFIG_UTF_8 is defined.
I think the UTF-8 and codepage encoding calls I added are safe, too.
A similar bug may still surface somewhere else, but 782 could be
closed for now.
This change also lays the foundation for binding actions to non-ASCII
keys, but the keystroke name parser doesn't yet support that.
The CONFIG_UTF_8 mode does not currently support non-ASCII characters
in hot keys, either.
There is no need to check whether ev->ev == EVENT_KBD;
if decode_terminal_escape_sequence called
decode_terminal_mouse_escape_sequence, then the former neither modified
kbd.key nor passed &kbd to the latter, so kbd.key remains KBD_UNDEF.
If ev->ev was not checked, then it should not be trusted either.
So reinitialize the whole *ev if a keyboard event was indeed found.
For instance, if Ctrl-F1 were pressed and src/terminal/kbd.c supported it,
then toupper(KBD_F1) would be called, resulting in undefined behaviour.
src/terminal/kbd.c does not support such combinations yet, but it is
safest to fix the bug already.
... since the latter is for printing int64_T and we don't check for that and
we use PRId64 only for printing values having the off_t types.
Besides off_t has it's own ELinks specific defaults so it should be safer
to use an internal format string. If off_t is 8 bytes use "lld" else use
"ld".
Reported-by: Andy Tanenbaum <ast@cs.vu.nl>
Drop some code for superscript and subscript handling that was deleted
in commit 65016cdca4, then added back
with the UTF-8 merge in commit 2a6125e3d0,
and then disabled in commit 1b653b9765.
Also list the capnames with which the escape sequences could be
read from Terminfo, and the ECMA-48 interpretations of the bytes
(parenthesized if they seem unrelated to the keys). This is in
preparation for fixing bug 96.
decode_terminal_escape_sequence() used to handle both, but
there is now a separate decode_terminal_application_key()
for ESC O. I have not yet edited decode_terminal_escape_sequence();
there may be dead code in it.
If there is e.g. ESC [ in the input buffer, combine that to Alt-[.
Check the first character too; don't blindly assume it is ESC, as
it can be NUL as well. Note this means you can no longer activate
the main menu by pressing Ctrl-@ (or Ctrl-Space on some terminals).
Otherwise, the timeout could cause ELinks to resume reading from
the terminal device while another process is still using it.
This actually happened in a test.
On entry to some functions that could resume reading from the device,
assert that the terminal has not been blocked.
To reproduce the bug before this patch:
Enable CONFIG_UTF_8, UTF-8 I/O, and UTF-8 charset.
Go to www.google.com and type "abc" in the text input field.
Then press Left. The cursor jumps to "a" when it should go to "c".
to encode it using utf_8_to_unicode. If every unicode_val_T value
could be a result of that function then one must add out param
to the utf_8_to_unicode signaling 'true' UCS_NO_CHAR.
GPM is awful, tho', and this is an ugly hack. Why can't it just report
buttons like everyone else?
Another nice thing is that we don't seem to get the wheel events more
frequently than once in a second or so. Therefore it will work properly
only when you scroll slooooowly. :^)
When the user presses enter during a text type-ahead search, simply cancel
the search without additionally following the current link. Link type-ahead
searching still will follow the active link on enter.
using magic ;)
Now in resume mode connection is always interrupted
and resumed. Even when all file is downloaded from beginning
conection will be resumed from old end of file. Feel free to fix it.
In protocol/common.c length of string is known, so pass it
instead of -1 to encode_uri_string.
Introduced encode_win32_uri_string, because there were problems
with : and \ in base href.
Don't call clear_dialog, which sets the focus to the listbox. Neither the
button widget nor the listbox widget has a clear callback, and the only
other thing that clear_dialog does is focus the first widget and redraw, so
call redraw_dialog instead.
Thanks to Kalle Olavi Niemitalo for noticing the issue.
Make move_up and move_down return no value. Instead, save the old y value
and compare it to the new after calling move_up or move_down in
move_page_up or move_page_down, respectively.
This fixes a bug where if given a prefix, if that prefix specified a number
of pages greater than move-page-up actually scrolled, there would be no
screen update, because the last call to move_up would return FRAME_EVENT_OK
which would be returned from move_page_up, even tho move_page_up would have
previously returned FRAME_EVENT_REFRESH.
Don't try the key as an accesskey if a menu was opened, whether it was just
the main menu or whether it was a submenu of the menu menu (we would try
the key as an accesskey in the latter case).
In send_kbd_event, replace the KBD_MOD_ALT modifier when trying the key as
an accesskey rather than when we don't.
Commit 3ce3f01f30 introduced a bug whereby
if a tab set the current position in the line to or greater than the number
of bytes remaining in the source, the line was split after the tab.