When the user tells ELinks to search for a regexp, ELinks 0.11.0
passes the regexp to regcomp() and the formatted document to
regexec(), both in the terminal charset. This works OK for unibyte
ASCII-compatible charsets because the regexp metacharacters are all in
the ASCII range. And ELinks 0.11.0 doesn't support multibyte or
ASCII-incompatible (e.g. EBCDIC) charsets in terminals, so it is no
big deal if regexp searches fail in such locales.
ELinks 0.12pre1 attempts to support UTF-8 as the terminal charset if
CONFIG_UTF8 is defined. Then, struct search contains unicode_val_T c
rather than unsigned char c, and get_srch() and add_srch_chr()
together save UTF-32 values there if the terminal charset is UTF-8.
In plain-text searches, is_in_range_plain() compares those values
directly if the search is case sensitive, or folds them to lower case
if the search is case insensitive: with towlower() if the terminal
charset is UTF-8, or with tolower() otherwise. In regexp searches
however, get_search_region_from_search_nodes() still truncates all
values to 8 bits in order to generate the string that
search_for_pattern() then passes to regexec(). In UTF-8 locales,
regexec() expects this string to be in UTF-8 and can't make sense of
the truncated characters. There is also a possible conflict in
regcomp() if the locale is UTF-8 but the terminal charset is not, or
vice versa.
Rejected ways of fixing the charset mismatches:
* When the terminal charset is UTF-8, recode the formatted document
from UTF-32 to UTF-8 for regexp searching. This would work if the
terminal and the locale both use UTF-8, or if both use unibyte
ASCII-compatible charsets, but not if only one of them uses UTF-8.
* Convert both the regexp and the formatted document to the charset of
the locale, as that is what regcomp() and regexec() expect. ELinks
would have to somehow keep track of which bytes in the converted
string correspond to which characters in the document; not entirely
trivial because convert_string() can replace a single unconvertible
character with a string of ASCII characters. If ELinks were
eventually changed to use iconv() for unrecognized charsets, such
tracking would become even harder.
* Temporarily switch to a locale that uses the charset of the
terminal. Unfortunately, it seems there is no portable way to
construct a name for such a locale. It is also possible that no
suitable locale is available; especially on Windows, whose C library
defines MB_LEN_MAX as 2 and thus cannot support UTF-8 locales.
Instead, this commit makes ELinks do the regexp matching with regwcomp
and regwexec from the TRE library. This way, ELinks can losslessly
recode both the pattern and the document to Unicode and rely on the
regexp code in TRE decoding them properly, regardless of locale.
There are some possible problems though:
1. ELinks stores strings as UTF-32 in arrays of unicode_val_T, but TRE
uses wchar_t instead. If wchar_t is UTF-16, as it is on Microsoft
Windows, then TRE will misdecode the strings. It wouldn't be too
hard to make ELinks convert to UTF-16 in this case, but (a) TRE
doesn't currently support UTF-16 either, and it seems possible that
wchar_t-independent UTF-32 interfaces will be added to TRE; and (b)
there seems to be little interest on using ELinks on Windows anyway.
2. The Citrus Project apparently wanted BSD to use a locale-dependent
wchar_t: e.g. UTF-32 in some locales and an ISO 2022 derivative in
others. Regexp searches in ELinks now do not support the latter.
[ Adapted to elinks-0.12 from bug 1060 attachment 506.
Commit message by me. --KON ]
When ELinks runs in an X11 terminal emulator (e.g. xterm), or in GNU
Screen, it tries to update the title of the window to match the title
of the current document. To do this, ELinks sends an "OSC 1 ; Pt BEL"
sequence to the terminal. Unfortunately, xterm expects the Pt string
to be in the ISO-8859-1 charset, making it impossible to display e.g.
Cyrillic characters. In xterm patch #210 (2006-03-12) however, there
is a menu item and a resource that can make xterm take the Pt string
in UTF-8 instead, allowing characters from all around the world.
The downside is that ELinks apparently cannot ask xterm whether the
setting is on or off; so add a terminal._template_.latin1_title option
to ELinks and let the user edit that instead.
Complete list of changes:
- Add the terminal._template_.latin1_title option. But do not add
that to the terminal options window because it's already rather
crowded there.
- In set_window_title(), take a new codepage argument. Use it to
decode the title into Unicode characters, and remove only actual
control characters. For example, CP437 has graphical characters in
the 0x80...0x9F range, so don't remove those, even though ISO-8859-1
has control characters in the same range. Likewise, don't
misinterpret single bytes of UTF-8 characters as control characters.
- In set_window_title(), do not truncate the title to the width of the
window. The font is likely to be different and proportional anyway.
But do truncate before 1024 bytes, an xterm limit.
- In struct itrm, add a title_codepage member to remember which
charset the master said it was going to use in the terminal window
title. Initialize title_codepage in handle_trm(), update it in
dispatch_special() if the master sends the new request
TERM_FN_TITLE_CODEPAGE, and use it in most set_window_title() calls;
but not in the one that sets $TERM as the title, because that string
was not received from the master and should consist of ASCII
characters only.
- In set_terminal_title(), convert the caller-provided title to
ISO-8859-1 or UTF-8 if appropriate, and report the codepage to the
slave with the new TERM_FN_TITLE_CODEPAGE request. The conversion
can run out of memory, so return a success/error flag, rather than
void. In display_window_title(), check this result and don't update
caches on error.
- Add a NEWS entry for all of this.
src/config/kbdbind.c (parse_keystroke): If the user types "Ctrl-i",
it should mean "Ctrl-I" rather than "Ctrl-İ", because the Ctrl-
combinations are only well known for ASCII characters. This does not
matter in practice though, because src/terminal/kbd.c converts 0x09
to (KBD_MOD_NONE, KBD_TAB) and not to (KBD_MOD_CTRL, 'I').
src/osdep/beos/beos.c (get_system_env): Changing the locale does not
affect the TERM environment variable, I think, so it should not affect
the interpretation either.
Conflicts:
NEWS (bug 939 was listed twice)
doc/man/man5/elinks.conf.5 (regenerated)
po/fr.po (only in comments and such)
po/pl.po (only in comments and such)
src/protocol/fsp/fsp.c (the relevant changes were already here)
elinks --config-help used to sort options like this:
document.history
document.history.global
document.history.global.enable
document.history.global.max_items
document.history.global.display_type
document.history.keep_unhistory
Now it'll instead be:
document.history
document.history.keep_unhistory
document.history.global
document.history.global.enable
document.history.global.max_items
document.history.global.display_type
i.e. all the options listed under a subheading are children of the
tree named by it. This makes elinks.conf(5) look saner.
If a newline has a backslash in front of it, then str_rd replaces it
with a space. However, the newline was in the original config file,
so the line number must still be incremented.
Previously, it only pretended to rewrite the configuration file, so it
set or cleared OPT_MUST_SAVE but never changed or output any options.
Now, it actually sets the options when ELinks is loading the
configuration file. Also, when ELinks is rewriting the configuration
file, it now compares the values in the included file to the current
values of the options, and sets or clears OPT_MUST_SAVE accordingly.
So, if elinks.conf contains a "set" command for an alias and ELinks
updates that, it now knows it doesn't have to append another "set"
command for the underlying option.
So if ELinks is rewriting a configuration file that contains a "set"
command for a negated alias, then it properly writes the value of the
alias, rather than the value of the underlying option.
That is, let the setter function of the underlying option store the
negated value. Previously, redir_set used to tweak the value of the
option after it has already called the underlying setter.
Also, replace OPT_WATERMARK with OPT_MUST_SAVE, which has the opposite
meaning.
Watermarking of aliases does not yet work correctly in this version.
Neither does the "include" command.
Previously, they were reset by smart_config_string(), which was not
called if the value of the option was saved by rewriting an existing
command in elinks.conf. Also, it is better to reset the flags only
after the file operations have actually succeeded.
Previously, ELinks set the OPT_WATERMARK flag in all deleted options
when config.saving_style was 2, thus mostly preventing them from being
saved. This had the unfortunate consequence that if you started with
no elinks.conf, set config.saving_style = 2, deleted some built-in
option (e.g. a URL rewriting rule), saved the settings, and restarted
ELinks, then the built-in option would reappear.
get_keymap_id returns -1 when it can't find the keymap. Because the return
type of get_keymap_id is enum keymap_id and enum keymap_id did not have any
explicit values defined, it could be unsigned, which meant that when
get_keymap_id returned -1, it was really returning a huge positive number.
This meant that when callers checker whether the return value was negative,
they were essentially performing no check at all, so they might give
get_keymap_id an invalid keymap name, get back an invalid keymap_id, and
use that invalid keymap_id.
This commit adds KEYMAP_INVALID = -1 to enum keymap_id and makes all
functions that deal with the enumeration use that symbol.
get_keymap_id returns -1 when it can't find the keymap. Because the return
type of get_keymap_id is enum keymap_id and enum keymap_id did not have any
explicit values defined, it could be unsigned, which meant that when
get_keymap_id returned -1, it was really returning a huge positive number.
This meant that when callers checker whether the return value was negative,
they were essentially performing no check at all, so they might give
get_keymap_id an invalid keymap name, get back an invalid keymap_id, and
use that invalid keymap_id.
This commit adds KEYMAP_INVALID = -1 to enum keymap_id and makes all
functions that deal with the enumeration use that symbol.
move-link-down-line moves the cursor down to the line with a link.
move-link-up-line moves the cursor up to the line with a link.
move-link-prev-line moves to the previous link horizontally.
move-link-next-line moves to the next link horizontally.
(cherry picked from commit 8259a56e99)
(cherry picked from commit 2eb3532416)
move-link-down-line moves the cursor down to the line with a link.
move-link-up-line moves the cursor up to the line with a link.
move-link-prev-line moves to the previous link horizontally.
move-link-next-line moves to the next link horizontally.
(cherry picked from commit 8259a56e99)
Note that this is the infrastructure, but all relevant get_opt_* calls must be changed to pass the session so that the domain-specific options are looked up.
Add @want_domain parameter to parse_set_common and read in the domain-name if the flag is set.
Add parse_set_domain wrapper for parse_set_common.
Add "set_domain" configuration directive with the following syntax: set_domain domain option = value
Modify create_config_string and smart_config_output_fn to spit out domain-specific option trees.
Add @smart_config_output_fn_domain global variable to facilitate this.
Define structure domain_tree and define list @domain_trees.
Add routine get_domain_tree to find or, if necessary, create the shadow tree for the given domain name.
Add routine get_domain_option search for an option in all domain shadow-trees and return the option in the best matching domain tree.
Modify get_opt_ to use get_domain_option to check for domain-specific options.
Add clean-up routine done_domain_trees, called on exit, to free any domain trees.
change_hook_active_link: pass update_cache_document_options @ses.
Now when changing the global settings, it will not simply copy the new values for the global settings to the document-options cache, but also check session-specific settings. This doesn't really matter yet, since the options in question can't be set on a per-session basis, but is in preparation for future changes.
The parsing in parse_set and parse_unset saves, overwrites with a NUL, and restores a character in the string that is being parsed. If there is a malloc failure between overwriting and restoring, the restore is not done. This commit changes that behaviour to restore before returning.
Introduce get_option_shadow. This routine takes an option, the tree under which that option resides, and another tree. It returns a corresponding option with parallel ancestry in the second tree, creating that option and ancestry if it does not already exist.
Add enum copy_option_flags.
Add the CO_NO_LISTBOX_ITEM flag, which is used to suppress listbox creation for shadowed trees.
Add the CO_SHALLOW flag, which is used to suppress the duplication of unwanted children for shadowed trees.
Add a flags parameter to copy_option and tree_dup (and out of necessity, struct option_type_info and str_dup).
Rename struct key to struct named_key, use more const, change the num
member from int to term_event_key_T, and put a KBD_UNDEF at the end of
the array (even though it won't be read).
straconcat reads the args with va_arg(ap, const unsigned char *),
and the NULL macro may have the wrong type (e.g. int).
Many places pass string literals of type char * to straconcat. This
is in principle also a violation, but I'm ignoring it for now because
if it becomes a problem with some C implementation, then so will the
use of unsigned char * with printf "%s", which is so widespread in
ELinks that I'm not going to try fixing it now.
Don't cast function pointers; calling functions via pointers of
incorrect types is not guaranteed to work. Instead, define the
functions with the desired types, and make them cast the incoming
parameters. Or define wrapper functions if the return types don't
match.
really_exit_prog wasn't being used outside src/dialogs/menu.c,
and I had to change its parameter type, so it's now static.
Normally, the success msgbox is shown only if the ui.success_msgbox
option is set as 1, and clicking "Do not show anymore" would then
toggle the option to 0, and no more such msgboxes would appear.
However, if there already are two success msgboxes being displayed
(most likely in different terminals), then clicking "Do not show
anymore" in the first of them would reset the option to 0, but doing
the same in the second of them would toggle the option back to 1.
Rename toggle_success_msgbox to disable_success_msgbox, and make it
always reset the option to 0, regardless of the previous value.
In the elinks.conf.5 manual page, the text below the list of modes was
getting included in the last list item. This appears to be a design
error in AsciiDoc. Work around it by moving the text above the list.
The numbering of document.dump.color_mode and terminal._template_.colors
is now the same regardless of compile-time options, unlike in previous
versions. Therefore this version of ELinks may interpret a configuration
file differently from previous versions even if compiled with the same
options. This is unfortunate but the alternatives (keeping the numbering
dependent on configuration options; defining separate options that use
the new numbering; starting the numbers from 10 or so and recognizing the
previous ones only for compatibility) seem even worse.
doc/remote.txt says there must be a nonempty sequence of ASCII
alphabetic characters before the opening parenthesis. Check that
they really are ASCII characters and that the sequence is nonempty.
Thus, elinks -remote '(foo)' now treats the string as an address,
rather than as a command.
Be more strict about the format accepted by the ELinks specific extension
to the -remote URL syntax. That is, commands must begin with a nonempty
sequence of ASCII alphabetic characters followed by optional whitespace and
an opening parenthesis. Also, document the syntax.
Fixes bug 830.
The problem is that if you run elinks in xterm with the default white
background, it will be totally unreadable if transparency is turned
on. We should default to usability in all common environments, eyecandy
lovers can do the extra setup for their specific one.
It also makes the description note that elinks still assumes the
background is black.
Revert commit c9ce4260e5,
which made "elinks -remote http://elinks.cz/" fail with an error
"ELinks: Cannot parse option -remote: Remote method not supported"
even though doc/remote.txt says it should open the URL in a new tab.
Added document.cache.interval option. When time elapsed since previous access
to the document is less than interval then the document is taken from
the cache. Otherwise the request with filled "If-Modified-Since" and/or
"If-None-Match" header field is sent. By default interval is set to 10 minutes.
This requires the correct time to be set on your machine.
The configure script no longer recognizes "CONFIG_UTF_8=yes" lines
in custom features.conf files. They will have to be changed to
"CONFIG_UTF8=yes". This incompatibility was deemed acceptable
because no released version of ELinks supports CONFIG_UTF_8.
The --enable-utf-8 option was not renamed.
The message appears when the user has selected e.g. "Main mapping"
rather than an action inside it. Because the main mapping is a keymap,
ELinks must not tell the user to select a keymap.
According to Jonas Fonseca, if init_string(&canonical) fails, then it
anyway sets canonical.source = NULL and makes done_string(&canonical)
safe, even if canonical was previously uninitialized.
Actions can now be bound to e.g. Ctrl-Alt-A. The keybinding code also
supports other combinations of modifiers, like Shift-Ctrl-Up, but the
escape sequence decoder doesn't yet.
Don't let Ctrl-Alt-letter combinations open menus.
This fixes a bug: in the previous version, l_bind_key() modified the
buffer whose address lua_tostring() returned, even though that is not
allowed according to Lua documentation <http://www.lua.org/pil/24.2.2.html>.
The change affects the user interface: previously, if the user typed
"ctrl+cokebottle" in the "Add keybinding" dialog box, ELinks would
change the text in the widget to "Ctrl-cokebottle" before complaining
that the keystroke is invalid. Now, it leaves the widget unchanged.
This commit does not yet add const to parameters of parse_keystroke()
and related functions.
Before really_add_keybinding() is called, check_keystroke() calls
parse_keystroke(), which converts the modifier prefix to canonical
form: for example, "alt+f9" becomes "Alt-f9". This commit makes
really_add_keybinding() normally ignore that string and generate a
brand new one, e.g. "Alt-F9" (note the upper-case F), for its
"Keystroke already used" warning. Likewise, " " turns to "Space".
After this commit, it should be possible to change parse_keystroke()
to never write back into its input string.
If really_add_keybinding() cannot generate the string for some reason
(out of memory?), then it will use whatever parse_keystroke() has left
in the buffer. The alternatives would be to omit the keystroke name
from the warning or to reject the keybinding entirely; it isn't clear
what the best solution is here, but the one I implemented is closest
to the previous behaviour.