Check the return value of get_opt_rec on "document.browse.search.regex"
before dereferencing it. The option is not there if regular expression
support is disabled at build time.
This commit fixes a bug introduced in commit
b2d51c75ff0d6c52a4f6a2761801beb641cba3a2.
Check the return value of get_opt_rec on "document.browse.search.regex"
before dereferencing it. The option is not there if regular expression
support is disabled at build time.
This commit fixes a bug introduced in commit
b2d51c75ff0d6c52a4f6a2761801beb641cba3a2.
In src/session/task.c, if ses_goto() was going to ask the user to
confirm, it did:
task->session_task.target.frame = null_or_stracpy(target_frame);
It added the struct task to a memory_list, so the structure was freed
when the message box was closed. The target frame string was however
never freed. To fix this leak, add the target frame string to the
memory_list too.
Alternatively, this could have been fixed by making post_yes() and
post_no() free the string. It is however a bit better to use the
memory_list because msg_box() frees that even if it is unable to
display the message box.
hooks.py file in the ELinks distribution will fail with an ImportError
exception for any user who hasn't also installed contrib/python/lp3.py.
For the benefit of users who may not otherwise need that file, I'd
suggest delaying the import of lp3 until it's actually used so that
the rest of hooks.py will still work without it.
In bug 1067, dom_rss_pop_document() freed a node with done_dom_node()
even though call_dom_node_callbacks() was still using that node. This
made call_dom_node_callbacks() read a function pointer from beyond the
end of an array and call that. Add assertions to detect out-of-range
node types, and comments to warn about the bug.
Debian libmozjs-dev 1.9.0.4-2 has JS_ReportAllocationOverflow but
js-1.7.0 reportedly hasn't. Check at configure time whether that
function is available. If not, use JS_ReportOutOfMemory instead.
Reported by Witold Filipczyk.
I didn't read the code of the tre library, but I suppose that when sizes of
wchar_t and unicode_val_T are equal it will work fine.
[ From bug 1060 attachment 508. --KON ]
When the user tells ELinks to search for a regexp, ELinks 0.11.0
passes the regexp to regcomp() and the formatted document to
regexec(), both in the terminal charset. This works OK for unibyte
ASCII-compatible charsets because the regexp metacharacters are all in
the ASCII range. And ELinks 0.11.0 doesn't support multibyte or
ASCII-incompatible (e.g. EBCDIC) charsets in terminals, so it is no
big deal if regexp searches fail in such locales.
ELinks 0.12pre1 attempts to support UTF-8 as the terminal charset if
CONFIG_UTF8 is defined. Then, struct search contains unicode_val_T c
rather than unsigned char c, and get_srch() and add_srch_chr()
together save UTF-32 values there if the terminal charset is UTF-8.
In plain-text searches, is_in_range_plain() compares those values
directly if the search is case sensitive, or folds them to lower case
if the search is case insensitive: with towlower() if the terminal
charset is UTF-8, or with tolower() otherwise. In regexp searches
however, get_search_region_from_search_nodes() still truncates all
values to 8 bits in order to generate the string that
search_for_pattern() then passes to regexec(). In UTF-8 locales,
regexec() expects this string to be in UTF-8 and can't make sense of
the truncated characters. There is also a possible conflict in
regcomp() if the locale is UTF-8 but the terminal charset is not, or
vice versa.
Rejected ways of fixing the charset mismatches:
* When the terminal charset is UTF-8, recode the formatted document
from UTF-32 to UTF-8 for regexp searching. This would work if the
terminal and the locale both use UTF-8, or if both use unibyte
ASCII-compatible charsets, but not if only one of them uses UTF-8.
* Convert both the regexp and the formatted document to the charset of
the locale, as that is what regcomp() and regexec() expect. ELinks
would have to somehow keep track of which bytes in the converted
string correspond to which characters in the document; not entirely
trivial because convert_string() can replace a single unconvertible
character with a string of ASCII characters. If ELinks were
eventually changed to use iconv() for unrecognized charsets, such
tracking would become even harder.
* Temporarily switch to a locale that uses the charset of the
terminal. Unfortunately, it seems there is no portable way to
construct a name for such a locale. It is also possible that no
suitable locale is available; especially on Windows, whose C library
defines MB_LEN_MAX as 2 and thus cannot support UTF-8 locales.
Instead, this commit makes ELinks do the regexp matching with regwcomp
and regwexec from the TRE library. This way, ELinks can losslessly
recode both the pattern and the document to Unicode and rely on the
regexp code in TRE decoding them properly, regardless of locale.
There are some possible problems though:
1. ELinks stores strings as UTF-32 in arrays of unicode_val_T, but TRE
uses wchar_t instead. If wchar_t is UTF-16, as it is on Microsoft
Windows, then TRE will misdecode the strings. It wouldn't be too
hard to make ELinks convert to UTF-16 in this case, but (a) TRE
doesn't currently support UTF-16 either, and it seems possible that
wchar_t-independent UTF-32 interfaces will be added to TRE; and (b)
there seems to be little interest on using ELinks on Windows anyway.
2. The Citrus Project apparently wanted BSD to use a locale-dependent
wchar_t: e.g. UTF-32 in some locales and an ISO 2022 derivative in
others. Regexp searches in ELinks now do not support the latter.
[ Adapted to elinks-0.12 from bug 1060 attachment 506.
Commit message by me. --KON ]
In src/bookmarks/dialogs.c, do_add_bookmark() gets the title and URL
in the terminal charset and needs to know which one that is. When a
bookmark is being added, save the struct terminal * to dialog.udata2
and read the charset from there. When a bookmark is being edited,
dialog.udata2 is needed for the struct bookmark *, but there we always
have the parent struct dialog_data * in dialog.udata and can get the
terminal from that.
These functions now expect or return strings in UTF-8:
delete_folder_by_name (sneak in a const, too), bookmark_terminal_tabs,
open_bookmark_folder, and get_auto_save_bookmark_foldername_utf8 (new
function).
When setting the title or URL of a bookmark from SMJS user scripting,
use update_bookmark() instead of writing directly to struct bookmark.
It triggers the bookmark-update event and sets the bookmarks_dirty
flag.
SpiderMonkey uses UTF-16 and the strings in struct bookmark are in
UTF-8. Previously, the conversions behaved as if the strings had been
in ISO-8859-1.
SpiderMonkey also supports JS_SetCStringsAreUTF8(), which would make
the existing functions convert between UTF-16 and UTF-8, but that
effect is global so I dare not enable it yet. Besides, I don't know
if that function works in all the SpiderMonkey versions that ELinks
claims to work with.
This also makes the bookmark-update event carry strings in UTF-8.
The only current consumer of that event is bookmark_change_hook(),
which ignores the strings, so no changes are needed there.
When the file is being read, Expat provides the strings to ELinks in
UTF-8, so ELinks can put them in struct bookmark without conversions.
Make sure gettext returns any placeholder strings in UTF-8, too.
Replace '\r' with ' ' in bookmark titles and URLs.
When the file is being written, put encoding="UTF-8" in the XML
declaration, and then write out the strings from struct bookmark
without character set conversions. Do replace some characters
with entity references though, by calling add_html_to_string().
When ELinks is parsing an XML element in from an XBEL bookmark file,
it collects the attributes of the element to the current_node->attrs
list. Previously, struct attributes had room for one string only:
the last element of current_node->attrs was the name of the first
attribute, and it was preceded by the value of the first attribute,
the name of the second attribute, the value of the second attribute,
and so on. However, when get_attribute_value() was looking for a
given name, it compared the values as well. So, if you had for
example <bookmark id="href" href="http://elinks.cz/">, then
get_attribute_value("href") would incorrectly return "href".
To fix this confusion, store values in the new member
attributes.value, rather than in attributes.name.