This check used to be in src/elinks.h. Move it to configure.in so
that (1) the result can be logged and (2) ELinks won't even link with
TRE if wchar_t prevents its use.
Also, rename HAVE_TRE_REGEX_H to CONFIG_TRE, to reflect that it is not
always defined if the header exists.
When setting up default values for terminal options, use named
constants like TERM_VT100 or COLOR_MODE_16, rather than plain integers
like 1. This is just to make the source code easier to read and
perhaps more resistant to future bugs. The binary should not change.
Documentation strings of most options used to contain a "\n" at the
end of each source line. When the option manager displayed these
strings, it treated each "\n" as a hard newline. On 80x24 terminals
however, the option description window has only 60 columes available
for the text (with the default setup.h), and the hard newlines were
further apart, so the option manager wrapped the text a second time,
resulting in rather ugly output where long lones are interleaved with
short ones. This could also cause the text to take up too much
vertical space and not fit in the window.
Replace most of those hard newlines with spaces so that the option
manager (or perhaps BFU) will take care of the wrapping. At the same
time, rewrap the strings in source code so that the source lines are
at most 79 columns wide.
In some options though, there is a list of possible values and their
meanings. In those lists, if the description of one value does not
fit in one line, then continuation lines should be indented. The
option manager and BFU are not currently able to do that. So, keep
the hard newlines in those lists, but rewrap them to 60 columns so
that they are less likely to require further wrapping at runtime.
When the user tells ELinks to search for a regexp, ELinks 0.11.0
passes the regexp to regcomp() and the formatted document to
regexec(), both in the terminal charset. This works OK for unibyte
ASCII-compatible charsets because the regexp metacharacters are all in
the ASCII range. And ELinks 0.11.0 doesn't support multibyte or
ASCII-incompatible (e.g. EBCDIC) charsets in terminals, so it is no
big deal if regexp searches fail in such locales.
ELinks 0.12pre1 attempts to support UTF-8 as the terminal charset if
CONFIG_UTF8 is defined. Then, struct search contains unicode_val_T c
rather than unsigned char c, and get_srch() and add_srch_chr()
together save UTF-32 values there if the terminal charset is UTF-8.
In plain-text searches, is_in_range_plain() compares those values
directly if the search is case sensitive, or folds them to lower case
if the search is case insensitive: with towlower() if the terminal
charset is UTF-8, or with tolower() otherwise. In regexp searches
however, get_search_region_from_search_nodes() still truncates all
values to 8 bits in order to generate the string that
search_for_pattern() then passes to regexec(). In UTF-8 locales,
regexec() expects this string to be in UTF-8 and can't make sense of
the truncated characters. There is also a possible conflict in
regcomp() if the locale is UTF-8 but the terminal charset is not, or
vice versa.
Rejected ways of fixing the charset mismatches:
* When the terminal charset is UTF-8, recode the formatted document
from UTF-32 to UTF-8 for regexp searching. This would work if the
terminal and the locale both use UTF-8, or if both use unibyte
ASCII-compatible charsets, but not if only one of them uses UTF-8.
* Convert both the regexp and the formatted document to the charset of
the locale, as that is what regcomp() and regexec() expect. ELinks
would have to somehow keep track of which bytes in the converted
string correspond to which characters in the document; not entirely
trivial because convert_string() can replace a single unconvertible
character with a string of ASCII characters. If ELinks were
eventually changed to use iconv() for unrecognized charsets, such
tracking would become even harder.
* Temporarily switch to a locale that uses the charset of the
terminal. Unfortunately, it seems there is no portable way to
construct a name for such a locale. It is also possible that no
suitable locale is available; especially on Windows, whose C library
defines MB_LEN_MAX as 2 and thus cannot support UTF-8 locales.
Instead, this commit makes ELinks do the regexp matching with regwcomp
and regwexec from the TRE library. This way, ELinks can losslessly
recode both the pattern and the document to Unicode and rely on the
regexp code in TRE decoding them properly, regardless of locale.
There are some possible problems though:
1. ELinks stores strings as UTF-32 in arrays of unicode_val_T, but TRE
uses wchar_t instead. If wchar_t is UTF-16, as it is on Microsoft
Windows, then TRE will misdecode the strings. It wouldn't be too
hard to make ELinks convert to UTF-16 in this case, but (a) TRE
doesn't currently support UTF-16 either, and it seems possible that
wchar_t-independent UTF-32 interfaces will be added to TRE; and (b)
there seems to be little interest on using ELinks on Windows anyway.
2. The Citrus Project apparently wanted BSD to use a locale-dependent
wchar_t: e.g. UTF-32 in some locales and an ISO 2022 derivative in
others. Regexp searches in ELinks now do not support the latter.
[ Adapted to elinks-0.12 from bug 1060 attachment 506.
Commit message by me. --KON ]
When ELinks runs in an X11 terminal emulator (e.g. xterm), or in GNU
Screen, it tries to update the title of the window to match the title
of the current document. To do this, ELinks sends an "OSC 1 ; Pt BEL"
sequence to the terminal. Unfortunately, xterm expects the Pt string
to be in the ISO-8859-1 charset, making it impossible to display e.g.
Cyrillic characters. In xterm patch #210 (2006-03-12) however, there
is a menu item and a resource that can make xterm take the Pt string
in UTF-8 instead, allowing characters from all around the world.
The downside is that ELinks apparently cannot ask xterm whether the
setting is on or off; so add a terminal._template_.latin1_title option
to ELinks and let the user edit that instead.
Complete list of changes:
- Add the terminal._template_.latin1_title option. But do not add
that to the terminal options window because it's already rather
crowded there.
- In set_window_title(), take a new codepage argument. Use it to
decode the title into Unicode characters, and remove only actual
control characters. For example, CP437 has graphical characters in
the 0x80...0x9F range, so don't remove those, even though ISO-8859-1
has control characters in the same range. Likewise, don't
misinterpret single bytes of UTF-8 characters as control characters.
- In set_window_title(), do not truncate the title to the width of the
window. The font is likely to be different and proportional anyway.
But do truncate before 1024 bytes, an xterm limit.
- In struct itrm, add a title_codepage member to remember which
charset the master said it was going to use in the terminal window
title. Initialize title_codepage in handle_trm(), update it in
dispatch_special() if the master sends the new request
TERM_FN_TITLE_CODEPAGE, and use it in most set_window_title() calls;
but not in the one that sets $TERM as the title, because that string
was not received from the master and should consist of ASCII
characters only.
- In set_terminal_title(), convert the caller-provided title to
ISO-8859-1 or UTF-8 if appropriate, and report the codepage to the
slave with the new TERM_FN_TITLE_CODEPAGE request. The conversion
can run out of memory, so return a success/error flag, rather than
void. In display_window_title(), check this result and don't update
caches on error.
- Add a NEWS entry for all of this.
src/config/kbdbind.c (parse_keystroke): If the user types "Ctrl-i",
it should mean "Ctrl-I" rather than "Ctrl-İ", because the Ctrl-
combinations are only well known for ASCII characters. This does not
matter in practice though, because src/terminal/kbd.c converts 0x09
to (KBD_MOD_NONE, KBD_TAB) and not to (KBD_MOD_CTRL, 'I').
src/osdep/beos/beos.c (get_system_env): Changing the locale does not
affect the TERM environment variable, I think, so it should not affect
the interpretation either.
elinks --config-help used to sort options like this:
document.history
document.history.global
document.history.global.enable
document.history.global.max_items
document.history.global.display_type
document.history.keep_unhistory
Now it'll instead be:
document.history
document.history.keep_unhistory
document.history.global
document.history.global.enable
document.history.global.max_items
document.history.global.display_type
i.e. all the options listed under a subheading are children of the
tree named by it. This makes elinks.conf(5) look saner.
If a newline has a backslash in front of it, then str_rd replaces it
with a space. However, the newline was in the original config file,
so the line number must still be incremented.
Previously, it only pretended to rewrite the configuration file, so it
set or cleared OPT_MUST_SAVE but never changed or output any options.
Now, it actually sets the options when ELinks is loading the
configuration file. Also, when ELinks is rewriting the configuration
file, it now compares the values in the included file to the current
values of the options, and sets or clears OPT_MUST_SAVE accordingly.
So, if elinks.conf contains a "set" command for an alias and ELinks
updates that, it now knows it doesn't have to append another "set"
command for the underlying option.
So if ELinks is rewriting a configuration file that contains a "set"
command for a negated alias, then it properly writes the value of the
alias, rather than the value of the underlying option.
That is, let the setter function of the underlying option store the
negated value. Previously, redir_set used to tweak the value of the
option after it has already called the underlying setter.
Also, replace OPT_WATERMARK with OPT_MUST_SAVE, which has the opposite
meaning.
Watermarking of aliases does not yet work correctly in this version.
Neither does the "include" command.
Previously, they were reset by smart_config_string(), which was not
called if the value of the option was saved by rewriting an existing
command in elinks.conf. Also, it is better to reset the flags only
after the file operations have actually succeeded.
Previously, ELinks set the OPT_WATERMARK flag in all deleted options
when config.saving_style was 2, thus mostly preventing them from being
saved. This had the unfortunate consequence that if you started with
no elinks.conf, set config.saving_style = 2, deleted some built-in
option (e.g. a URL rewriting rule), saved the settings, and restarted
ELinks, then the built-in option would reappear.
get_keymap_id returns -1 when it can't find the keymap. Because the return
type of get_keymap_id is enum keymap_id and enum keymap_id did not have any
explicit values defined, it could be unsigned, which meant that when
get_keymap_id returned -1, it was really returning a huge positive number.
This meant that when callers checker whether the return value was negative,
they were essentially performing no check at all, so they might give
get_keymap_id an invalid keymap name, get back an invalid keymap_id, and
use that invalid keymap_id.
This commit adds KEYMAP_INVALID = -1 to enum keymap_id and makes all
functions that deal with the enumeration use that symbol.
move-link-down-line moves the cursor down to the line with a link.
move-link-up-line moves the cursor up to the line with a link.
move-link-prev-line moves to the previous link horizontally.
move-link-next-line moves to the next link horizontally.
(cherry picked from commit 8259a56e99)
(cherry picked from commit 2eb3532416)
Rename struct key to struct named_key, use more const, change the num
member from int to term_event_key_T, and put a KBD_UNDEF at the end of
the array (even though it won't be read).