The current rules are:
             term.utf8
CONFIG_UTF8  UTF-8 I/O  widget_data.cdata
-----------  ---------  ---------------------------
undefined    disabled   charset of the terminal
undefined    enabled    charset of the terminal
defined      disabled   charset of the terminal (*)
defined      enabled    always UTF-8
(*) kbd_field was incorrectly assuming UTF-8 in this case.
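The rules above can be sketched as a single predicate. This is only an illustration: CONFIG_UTF8 is a compile-time macro in the real build, but it is modelled here as a run-time flag, and the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names; only the four rules come from the table above. */
enum cdata_charset {
	CDATA_TERMINAL_CHARSET,
	CDATA_UTF8
};

/* widget_data.cdata is UTF-8 only if UTF-8 support is compiled in
 * and UTF-8 I/O is enabled for the terminal; in every other case it
 * is in the charset of the terminal. */
enum cdata_charset
cdata_charset_for(bool config_utf8_defined, bool utf8_io_enabled)
{
	if (config_utf8_defined && utf8_io_enabled)
		return CDATA_UTF8;
	return CDATA_TERMINAL_CHARSET;
}
```

The row marked (*) corresponds to cdata_charset_for(true, false), which yields the charset of the terminal.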
This reverts the following commits:
- 86ed79deaf
  Use wcwidth if available and applicable.
- 304f5fa1ea
  comment fix (__STDC_ISO_10646__, not __STDC_ISO_10646)
- part of 71eebf1cc7
  Compensate for glibc not defining wcwidth() when _XOPEN_SOURCE is not set
It also adds a lengthy comment about LC_CTYPE problems.
Explicitly compare the value that is returned by the widget handler
against EVENT_NOT_PROCESSED rather than relying on the fact that
EVENT_NOT_PROCESSED is equal to 1.
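A minimal sketch of the difference, assuming a simplified status enum. The enum tag and both functions are hypothetical. Only the constant name and the fact that EVENT_NOT_PROCESSED equals 1 are taken from the text above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the widget handler status type. */
enum handler_status {
	EVENT_PROCESSED = 0,
	EVENT_NOT_PROCESSED = 1
};

/* Hypothetical handler that consumes 'q' and ignores everything else. */
enum handler_status
example_handler(int key)
{
	return key == 'q' ? EVENT_PROCESSED : EVENT_NOT_PROCESSED;
}

/* Fragile form: `if (example_handler(key))` is correct only while
 * EVENT_NOT_PROCESSED happens to be the sole nonzero value.
 * Explicit form: compare against the named constant. */
bool
needs_fallback(int key)
{
	return example_handler(key) == EVENT_NOT_PROCESSED;
}
```

The explicit comparison stays correct if further status values are added to the enum later.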
ffeedbdc5045a6a5db2bc75ecaab56bfe46c80ea
ELinks currently fails this test. Also, it does not support all the
DOM features used here. I don't know whether the scripts should be
simplified or ELinks should be enhanced to support them.
The secure file saving code plays some shenanigans with the umask.
Previously, the code could fail to restore the old umask when certain libc
calls failed: malloc, mkstemp, fdopen, and fopen. As a result,
unrelated code later created files with the wrong mode. Specifically, the
download code's automatic directory creation was creating directories
without the execute permission bit.
Thanks to Quiznos for reporting and helping to track the problem down.
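One way to avoid this class of bug is to funnel every failure through a single exit point that restores the umask. The following is a sketch under that assumption, not the ELinks code: open_private_temp() and current_umask() are hypothetical helpers, and the mask value 0177 is chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L	/* for mkstemp() and fdopen() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a temporary file that only the owner can
 * access.  `name_template` must be writable and end in "XXXXXX". */
FILE *
open_private_temp(char *name_template)
{
	mode_t saved_mask;
	FILE *file = NULL;
	int fd;

	assert(name_template != NULL);
	saved_mask = umask(0177);	/* new files get at most mode 0600 */

	fd = mkstemp(name_template);
	if (fd == -1)
		goto restore;		/* mkstemp failed */

	file = fdopen(fd, "w");
	if (!file)
		close(fd);		/* fdopen failed; do not leak fd */

restore:
	/* Single exit point: the caller's umask is restored on success
	 * and on every failure path. */
	umask(saved_mask);
	return file;
}

/* Read the process umask by setting it and immediately putting the
 * old value back. */
mode_t
current_umask(void)
{
	const mode_t mask = umask(0);

	umask(mask);
	return mask;
}
```

Calling open_private_temp() with a template inside a directory that does not exist makes mkstemp fail, and current_umask() should report the same value before and after the call.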
To reproduce:
- Start ELinks.
- Enable the ui.tabs.wraparound option.
- Press t to open a second tab.
- Go to http://elinks.cz/ in the second tab.
- Press 3< to step three tabs to the left.
In the statement "tab = tabs + tab % tabs;", tab == -2 and tabs == 2.
So tab % tabs == 0 and tab becomes 2, which is out of range.
The new version calls get_opt_bool even if the tab parameter is already in
range, but the cost should be negligible compared to the redraw_terminal()
call that follows.
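The arithmetic can be checked in isolation. The sketch below reproduces the faulty expression and shows one possible way to wrap a tab index into range. It is not necessarily the fix that was committed, and both function names are hypothetical.

```c
#include <assert.h>

/* The expression quoted above.  For tab == -2 and tabs == 2 it gives
 * 2 + (-2 % 2) == 2 + 0 == 2, which is out of range. */
int
wrap_tab_buggy(int tab, int tabs)
{
	return tabs + tab % tabs;
}

/* One possible correction: reduce first, then shift a negative
 * remainder into the range [0, tabs). */
int
wrap_tab(int tab, int tabs)
{
	assert(tabs > 0);
	tab %= tabs;		/* in C99 the result has the sign of `tab` */
	if (tab < 0)
		tab += tabs;
	return tab;
}
```

With two tabs open, wrap_tab(-2, 2) is 0 and wrap_tab(-1, 2) is 1, so stepping left from the first tab lands on the last one.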
Previously, each mapping between a codepage byte and a Unicode
character was stored as a struct table_entry, which listed both the
byte and the character. This representation may be optimal for sparse
mappings, but codepages map almost every possible byte to a character,
so it is more efficient to just have an array that lists the Unicode
character corresponding to each byte from 0x80 to 0xFF. The bytes are
not stored but rather implied by the array index. The tcvn5712 and
viscii codepages have a total of four mappings that do not fit in the
arrays, so we still use struct table_entry for those.
This change also makes cp2u() operate in O(1) time and may speed up
other functions as well.
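The two representations can be compared in a small sketch. The field names, the 16-bit element type, the function names, and the use of 0 for "not filled in" are assumptions made for illustration. The three sample mappings are from ISO-8859-2.

```c
#include <stddef.h>
#include <stdint.h>

/* Old representation: every entry stores the byte and the character. */
struct table_entry {
	unsigned char c;	/* codepage byte */
	uint16_t u;		/* Unicode character */
};

static const struct table_entry sparse_table[] = {
	{ 0xA1, 0x0104 },	/* LATIN CAPITAL LETTER A WITH OGONEK */
	{ 0xA3, 0x0141 },	/* LATIN CAPITAL LETTER L WITH STROKE */
	{ 0xB1, 0x0105 },	/* LATIN SMALL LETTER A WITH OGONEK */
};

/* New representation: element i is the character for byte 0x80 + i.
 * The byte itself is implied by the index.  Entries not listed here
 * are 0 in this sketch. */
static const uint16_t highhalf[128] = {
	[0xA1 - 0x80] = 0x0104,
	[0xA3 - 0x80] = 0x0141,
	[0xB1 - 0x80] = 0x0105,
};

/* Linear scan: time grows with the number of entries. */
uint16_t
lookup_sparse(unsigned char byte)
{
	size_t i;

	if (byte < 0x80)
		return byte;	/* ASCII maps to itself */
	for (i = 0; i < sizeof(sparse_table) / sizeof(sparse_table[0]); i++)
		if (sparse_table[i].c == byte)
			return sparse_table[i].u;
	return 0;
}

/* Direct index: constant time. */
uint16_t
lookup_dense(unsigned char byte)
{
	if (byte < 0x80)
		return byte;	/* ASCII maps to itself */
	return highhalf[byte - 0x80];
}
```

Each dense entry takes two bytes instead of a padded struct, which is where the size reduction for nearly full codepages comes from.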
The "sed | while read" concoction in Unicode/gen-cp looks rather
unhealthy. It would probably be faster and more readable if rewritten
in Perl, but IMO that goes for the previous version as well, so I
suppose whoever wrote it had a reason not to use Perl here.
Before:
   text    data     bss     dec     hex filename
  38948   28528    3311   70787   11483 src/intl/charsets.o
 500096   85568   82112  667776   a3080 src/elinks
After:
   text    data     bss     dec     hex filename
  31558   28528    3311   63397    f7a5 src/intl/charsets.o
 492878   85568   82112  660558   a144e src/elinks
So the text section of charsets.o shrank by 7390 bytes (7218 bytes in
the linked src/elinks binary).
Measured on i686-pc-linux-gnu with: --disable-xbel --disable-nls
--disable-cookies --disable-formhist --disable-globhist
--disable-mailcap --disable-mimetypes --disable-smb --disable-mouse
--disable-sysmouse --disable-leds --disable-marks --disable-css
--enable-small --enable-utf-8 --without-gpm --without-bzlib
--without-idn --without-spidermonkey --without-lua --without-gnutls
--without-openssl CFLAGS="-Os -ggdb -Wall"
Before:
   text    data     bss     dec     hex filename
  25726   62992    3343   92061   1679d src/intl/charsets.o
 653856  120020   82144  856020   d0fd4 src/elinks
After:
   text    data     bss     dec     hex filename
  60190   28528    3311   92029   1677d src/intl/charsets.o
 688320   85556   82112  855988   d0fb4 src/elinks
So 34464 bytes were moved from the data section to the text section
and should be more likely to get shared between ELinks processes.
Measured on i686-pc-linux-gnu with: --disable-xbel --disable-nls
--disable-cookies --disable-formhist --disable-globhist
--disable-mailcap --disable-mimetypes --disable-smb --disable-mouse
--disable-sysmouse --disable-leds --disable-marks --disable-css
--enable-small --enable-utf-8 --without-gpm --without-bzlib
--without-idn --without-spidermonkey --without-lua --without-gnutls
--without-openssl CFLAGS="-O2 -ggdb -Wall"
UCS_NO_CHAR here means the input was too short. Because the strings
generally come from the source code or from PO files, they should not
end in the middle of a character. However, the whole character may be
missing if the string is empty. So select_button_by_key() now checks
for that case separately.
UCS_NO_CHAR must not be passed to unicode_fold_label_case() because
the result is undefined.
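The shape of that check can be sketched as follows. Everything named sketch_* or SKETCH_* is hypothetical: the sentinel value, the reduced decoder, and the ASCII-only case folding stand in for the real ELinks functions, whose signatures are not shown in this message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sentinel; the value ELinks uses for UCS_NO_CHAR may differ. */
#define SKETCH_NO_CHAR UINT32_C(0xFFFFFFFD)

/* Hypothetical decoder: returns the first character of [s, end), or
 * SKETCH_NO_CHAR if the input is empty, truncated, or outside the
 * small subset handled here (ASCII and two-byte UTF-8 sequences). */
uint32_t
sketch_first_char(const unsigned char *s, const unsigned char *end)
{
	if (s >= end)
		return SKETCH_NO_CHAR;	/* empty input */
	if (s[0] < 0x80)
		return s[0];
	if ((s[0] & 0xE0) == 0xC0 && end - s >= 2 && (s[1] & 0xC0) == 0x80)
		return ((uint32_t) (s[0] & 0x1F) << 6) | (s[1] & 0x3F);
	return SKETCH_NO_CHAR;
}

/* Stand-in for case folding; like the real function, it must never be
 * given the sentinel. */
uint32_t
sketch_fold_case(uint32_t c)
{
	assert(c != SKETCH_NO_CHAR);
	return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Does the first character of `label` match `key`, ignoring case?
 * `key` is assumed to be a real character. */
int
sketch_label_matches_key(const char *label, uint32_t key)
{
	const unsigned char *s = (const unsigned char *) label;
	const uint32_t c = sketch_first_char(s, s + strlen(label));

	if (c == SKETCH_NO_CHAR)
		return 0;	/* empty label: checked before folding */
	return sketch_fold_case(c) == sketch_fold_case(key);
}
```

An empty label never matches, and the sentinel never reaches the folding function.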