This check used to be in src/elinks.h. Move it to configure.in so
that (1) the result can be logged and (2) ELinks won't even link with
TRE if wchar_t prevents its use.
Also, rename HAVE_TRE_REGEX_H to CONFIG_TRE, to reflect that it is not
always defined if the header exists.
Documentation strings of most options used to contain a "\n" at the
end of each source line. When the option manager displayed these
strings, it treated each "\n" as a hard newline. On 80x24 terminals
however, the option description window has only 60 columes available
for the text (with the default setup.h), and the hard newlines were
further apart, so the option manager wrapped the text a second time,
resulting in rather ugly output where long lones are interleaved with
short ones. This could also cause the text to take up too much
vertical space and not fit in the window.
Replace most of those hard newlines with spaces so that the option
manager (or perhaps BFU) will take care of the wrapping. At the same
time, rewrap the strings in source code so that the source lines are
at most 79 columns wide.
In some options though, there is a list of possible values and their
meanings. In those lists, if the description of one value does not
fit in one line, then continuation lines should be indented. The
option manager and BFU are not currently able to do that. So, keep
the hard newlines in those lists, but rewrap them to 60 columns so
that they are less likely to require further wrapping at runtime.
When the user tells ELinks to search for a regexp, ELinks 0.11.0
passes the regexp to regcomp() and the formatted document to
regexec(), both in the terminal charset. This works OK for unibyte
ASCII-compatible charsets because the regexp metacharacters are all in
the ASCII range. And ELinks 0.11.0 doesn't support multibyte or
ASCII-incompatible (e.g. EBCDIC) charsets in terminals, so it is no
big deal if regexp searches fail in such locales.
ELinks 0.12pre1 attempts to support UTF-8 as the terminal charset if
CONFIG_UTF8 is defined. Then, struct search contains unicode_val_T c
rather than unsigned char c, and get_srch() and add_srch_chr()
together save UTF-32 values there if the terminal charset is UTF-8.
In plain-text searches, is_in_range_plain() compares those values
directly if the search is case sensitive, or folds them to lower case
if the search is case insensitive: with towlower() if the terminal
charset is UTF-8, or with tolower() otherwise. In regexp searches
however, get_search_region_from_search_nodes() still truncates all
values to 8 bits in order to generate the string that
search_for_pattern() then passes to regexec(). In UTF-8 locales,
regexec() expects this string to be in UTF-8 and can't make sense of
the truncated characters. There is also a possible conflict in
regcomp() if the locale is UTF-8 but the terminal charset is not, or
vice versa.
Rejected ways of fixing the charset mismatches:
* When the terminal charset is UTF-8, recode the formatted document
from UTF-32 to UTF-8 for regexp searching. This would work if the
terminal and the locale both use UTF-8, or if both use unibyte
ASCII-compatible charsets, but not if only one of them uses UTF-8.
* Convert both the regexp and the formatted document to the charset of
the locale, as that is what regcomp() and regexec() expect. ELinks
would have to somehow keep track of which bytes in the converted
string correspond to which characters in the document; not entirely
trivial because convert_string() can replace a single unconvertible
character with a string of ASCII characters. If ELinks were
eventually changed to use iconv() for unrecognized charsets, such
tracking would become even harder.
* Temporarily switch to a locale that uses the charset of the
terminal. Unfortunately, it seems there is no portable way to
construct a name for such a locale. It is also possible that no
suitable locale is available; especially on Windows, whose C library
defines MB_LEN_MAX as 2 and thus cannot support UTF-8 locales.
Instead, this commit makes ELinks do the regexp matching with regwcomp
and regwexec from the TRE library. This way, ELinks can losslessly
recode both the pattern and the document to Unicode and rely on the
regexp code in TRE decoding them properly, regardless of locale.
There are some possible problems though:
1. ELinks stores strings as UTF-32 in arrays of unicode_val_T, but TRE
uses wchar_t instead. If wchar_t is UTF-16, as it is on Microsoft
Windows, then TRE will misdecode the strings. It wouldn't be too
hard to make ELinks convert to UTF-16 in this case, but (a) TRE
doesn't currently support UTF-16 either, and it seems possible that
wchar_t-independent UTF-32 interfaces will be added to TRE; and (b)
there seems to be little interest on using ELinks on Windows anyway.
2. The Citrus Project apparently wanted BSD to use a locale-dependent
wchar_t: e.g. UTF-32 in some locales and an ISO 2022 derivative in
others. Regexp searches in ELinks now do not support the latter.
[ Adapted to elinks-0.12 from bug 1060 attachment 506.
Commit message by me. --KON ]
When ELinks runs in an X11 terminal emulator (e.g. xterm), or in GNU
Screen, it tries to update the title of the window to match the title
of the current document. To do this, ELinks sends an "OSC 1 ; Pt BEL"
sequence to the terminal. Unfortunately, xterm expects the Pt string
to be in the ISO-8859-1 charset, making it impossible to display e.g.
Cyrillic characters. In xterm patch #210 (2006-03-12) however, there
is a menu item and a resource that can make xterm take the Pt string
in UTF-8 instead, allowing characters from all around the world.
The downside is that ELinks apparently cannot ask xterm whether the
setting is on or off; so add a terminal._template_.latin1_title option
to ELinks and let the user edit that instead.
Complete list of changes:
- Add the terminal._template_.latin1_title option. But do not add
that to the terminal options window because it's already rather
crowded there.
- In set_window_title(), take a new codepage argument. Use it to
decode the title into Unicode characters, and remove only actual
control characters. For example, CP437 has graphical characters in
the 0x80...0x9F range, so don't remove those, even though ISO-8859-1
has control characters in the same range. Likewise, don't
misinterpret single bytes of UTF-8 characters as control characters.
- In set_window_title(), do not truncate the title to the width of the
window. The font is likely to be different and proportional anyway.
But do truncate before 1024 bytes, an xterm limit.
- In struct itrm, add a title_codepage member to remember which
charset the master said it was going to use in the terminal window
title. Initialize title_codepage in handle_trm(), update it in
dispatch_special() if the master sends the new request
TERM_FN_TITLE_CODEPAGE, and use it in most set_window_title() calls;
but not in the one that sets $TERM as the title, because that string
was not received from the master and should consist of ASCII
characters only.
- In set_terminal_title(), convert the caller-provided title to
ISO-8859-1 or UTF-8 if appropriate, and report the codepage to the
slave with the new TERM_FN_TITLE_CODEPAGE request. The conversion
can run out of memory, so return a success/error flag, rather than
void. In display_window_title(), check this result and don't update
caches on error.
- Add a NEWS entry for all of this.
In the elinks.conf.5 manual page, the text below the list of modes was
getting included in the last list item. This appears to be a design
error in AsciiDoc. Work around it by moving the text above the list.
The numbering of document.dump.color_mode and terminal._template_.colors
is now the same regardless of compile-time options, unlike in previous
versions. Therefore this version of ELinks may interpret a configuration
file differently from previous versions even if compiled with the same
options. This is unfortunate but the alternatives (keeping the numbering
dependent on configuration options; defining separate options that use
the new numbering; starting the numbers from 10 or so and recognizing the
previous ones only for compatibility) seem even worse.
The problem is that if you run elinks in xterm with the default white
background, it will be totally unreadable if transparency is turned
on. We should default to usability in all common environments, eyecandy
lovers can do the extra setup for their specific one.
It also makes the description note that elinks still assumes the
background is black.
Added document.cache.interval option. When time elapsed since previous access
to the document is less than interval then the document is taken from
the cache. Otherwise the request with filled "If-Modified-Since" and/or
"If-None-Match" header field is sent. By default interval is set to 10 minutes.
This requires the correct time to be set on your machine.
then dump_to_file_256 is defined in dump.c but not used.
If configure --enable-debug was used, then gcc warns about
the unused function, and the warning stops the build.
2. The description of document.dump.color_mode ends with a
newline, provoking a runtime warning from check_description
in src/config/options.c.
3. options.inc has preprocessor directives inside macro arguments.
That is not portable C. xgettext (GNU gettext-tools) 0.14.3 is
not smart enough to figure out the possible combinations, and
copies an incorrect string to elinks.pot.
the same accelerator key to multiple buttons in a dialog box or
to multiple items in a menu. ELinks already has some support for
this but it requires the translator to run ELinks and manually
scan through all menus and dialogs. The attached changes make it
possible to quickly detect and list any conflicts, including ones
that can only occur on operating systems or configurations that
the translator is not currently using.
The changes have no immediate effect on the elinks executable or
the MO files. PO files become larger, however.
The scheme works like this:
- Like before, accelerator keys in translatable strings are
tagged with the tilde (~) character.
- Whenever a C source file defines an accelerator key, it must
assign one or more named "contexts" to it. The translations in
the PO files inherit these contexts. If multiple strings use
the same accelerator (case insensitive) in the same context,
that's a conflict and can be detected automatically.
- The contexts are defined with "gettext_accelerator_context"
comments in source files. These comments delimit regions where
all translatable strings containing tildes are given the same
contexts. There must be one special comment at the top of the
region; it lists the contexts assigned to that region. The
region automatically ends at the end of the function (found
with regexp /^\}/), but it can also be closed explicitly with
another special comment. The comments are formatted like this:
/* [gettext_accelerator_context(foo, bar, baz)]
begins a region that uses the contexts "foo", "bar", and "baz".
The comma is the delimiter; whitespace is optional.
[gettext_accelerator_context()]
ends the region. */
The scripts don't currently check whether this syntax occurs
inside or outside comments.
- The names of contexts consist of C identifiers delimited with
periods. I typically used the name of a function that sets
up a dialog, or the name of an array where the items of a
menu are listed. There is a special feature for static
functions: if the name begins with a period, then the period
will be replaced with the name of the source file and a colon.
- If a menu is programmatically generated from multiple parts,
of which some are never used together, so that it is safe to
use the same accelerators in them, then it is necessary to
define multiple contexts for the same menu. link_menu() in
src/viewer/text/link.c is the most complex example of this.
- During make update-po:
- A Perl script (po/gather-accelerator-contexts.pl) reads
po/elinks.pot, scans the source files listed in it for
"gettext_accelerator_context" comments, and rewrites
po/elinks.pot with "accelerator_context" comments that
indicate the contexts of each msgid: the union of all
contexts of all of its uses in the source files. It also
removes any "gettext_accelerator_context" comments that
xgettext --add-comments has copied to elinks.pot.
- If po/gather-accelerator-contexts.pl does not find any
contexts for some use of an msgid that seems to contain an
accelerator (because it contains a tilde), it warns. If the
tilde refers to e.g. "~/.elinks" and does not actually mark
an accelerator, the warning can be silenced by specifying the
special context "IGNORE", which the script otherwise ignores.
- msgmerge copies the "accelerator_context" comments from
po/elinks.pot to po/*.po. Translators do not edit those
comments.
- During make check-po:
- Another Perl script (po/check-accelerator-contexts.pl) reads
po/*.po and keeps track of which accelerators have been bound
in each context. It warns about any conflicts it finds.
This script does not access the C source files; thus it does
not matter if the line numbers in "#:" lines are out of date.
This implementation is not perfect and I am not proposing to
add it to the main source tree at this time. Specifically:
- It introduces compile-time dependencies on Perl and Locale::PO.
There should be a configure-time or compile-time check so that
the new features are skipped if the prerequisites are missing.
- When the scripts include msgstr strings in warnings, they
should transcode them from the charset of the PO file to the
one specified by the user's locale.
- It is not adequately documented (well, except perhaps here).
- po/check-accelerator-contexts.pl reports the same conflict
multiple times if it occurs in multiple contexts.
- The warning messages should include line numbers, so that users
of Emacs could conveniently edit the conflicting part of the PO
file. This is not feasible with the current version of
Locale::PO.
- Locale::PO does not understand #~ lines and spews warnings
about them. There is an ugly hack to hide these warnings.
- Jonas Fonseca suggested the script could propose accelerators
that are still available. This has not been implemented.
There are three files attached:
- po/gather-accelerator-contexts.pl: Augments elinks.pot with
context information.
- po/check-accelerator-contexts.pl: Checks conflicts.
- accelerator-contexts.diff: Makes po/Makefile run the scripts,
and adds special comments to source files.