mirror of https://github.com/rkd77/elinks.git synced 2024-12-04 14:46:47 -05:00
3537 Commits

Author SHA1 Message Date
M. Levinson
b41e738905 Since commit 2c14587b88, the sample
hooks.py file in the ELinks distribution will fail with an ImportError
exception for any user who hasn't also installed contrib/python/lp3.py.

For the benefit of users who may not otherwise need that file, I'd
suggest delaying the import of lp3 until it's actually used so that
the rest of hooks.py will still work without it.
2009-02-15 19:58:42 +01:00
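The deferred import suggested in b41e738905 can be sketched roughly as follows. This is an illustration, not the actual hooks.py: the helper load_optional() and the function play_station_sketch() are hypothetical names, and only the module name lp3 comes from the commit message.

```python
import importlib


def load_optional(module_name):
    """Import a module on first use; return None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def play_station_sketch(station):
    """Hypothetical feature that is the only user of the optional module."""
    lp3 = load_optional("lp3")  # imported here, not at the top of the file
    if lp3 is None:
        return "lp3 support is not installed"
    return "lp3 is available for " + station
```

Because the import happens inside the function, loading the rest of the file no longer depends on lp3.py being present.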
Kalle Olavi Niemitalo
ece4bfcc28 Merge branch 'elinks-0.12' into elinks-0.13
Conflicts:
	src/document/dom/renderer.c (split into rss.c, source.c)
2009-02-15 05:08:06 +02:00
Kalle Olavi Niemitalo
c8cee1df61 bug 1067: More verbose NEWS. 2009-02-15 05:02:43 +02:00
Kalle Olavi Niemitalo
d14f65a331 bug 1067: Comments about freeing the DOM document node. 2009-02-15 04:27:39 +02:00
Kalle Olavi Niemitalo
eb820a57a6 bug 1067: Assertions and comments about done_dom_node().
In bug 1067, dom_rss_pop_document() freed a node with done_dom_node()
even though call_dom_node_callbacks() was still using that node.  This
made call_dom_node_callbacks() read a function pointer from beyond the
end of an array and call that.  Add assertions to detect out-of-range
node types, and comments to warn about the bug.
2009-02-15 03:39:00 +02:00
Witold Filipczyk
f77748299b bug 1067: Fix for elinks-0.13. 2009-02-12 10:05:35 +01:00
Witold Filipczyk
9054e57c55 Merge branch 'elinks-0.12'
Conflicts:
	src/document/dom/renderer.c
2009-02-12 10:01:57 +01:00
Witold Filipczyk
a7c2f14e6d bug 1067: the node was freed, but still used. 2009-02-12 09:48:04 +01:00
Kalle Olavi Niemitalo
c53e6335a1 Mention bug 761 in NEWS. 2009-02-09 00:24:13 +02:00
Kalle Olavi Niemitalo
7067fc7af9 Check for JS_ReportAllocationOverflow before using it.
Debian libmozjs-dev 1.9.0.4-2 has JS_ReportAllocationOverflow but
js-1.7.0 reportedly hasn't.  Check at configure time whether that
function is available.  If not, use JS_ReportOutOfMemory instead.

Reported by Witold Filipczyk.
2009-02-08 23:07:22 +02:00
Kalle Olavi Niemitalo
d2854dca8d Merge branch 'elinks-0.12' into elinks-0.13
Conflicts:
	src/bookmarks/backend/default.c
	src/bookmarks/bookmarks.c
	src/session/session.c
	src/terminal/event.c
	src/viewer/text/search.c
2009-02-08 22:02:57 +02:00
Witold Filipczyk
b536802b9c Small improvement in lp3.py. 2009-02-08 18:10:17 +01:00
Kalle Olavi Niemitalo
7941c7097a Bug 1060: Document the need for TRE. 2009-02-08 18:55:15 +02:00
Witold Filipczyk
2c14587b88 Added the lp3.py to let elinks users listen to the lp3. 2009-02-08 17:49:33 +01:00
Kalle Olavi Niemitalo
63a362ee53 Bug 1060: Try TRE_LIBS=-ltre if pkg-config tre fails.
This works around Debian bug 513055 in libtre-dev.
2009-02-08 18:26:22 +02:00
Witold Filipczyk
664048098a Bug 1060: #undef HAVE_TRE_REGEX_H only in elinks.h
I have not read the code of the TRE library, but I suppose it will work
fine when wchar_t and unicode_val_T are the same size.

[ From bug 1060 attachment 508.  --KON ]
2009-02-08 18:26:22 +02:00
Witold Filipczyk
c5a7f87c43 Bug 1060: Use libtre for regexp searches.
When the user tells ELinks to search for a regexp, ELinks 0.11.0
passes the regexp to regcomp() and the formatted document to
regexec(), both in the terminal charset.  This works OK for unibyte
ASCII-compatible charsets because the regexp metacharacters are all in
the ASCII range.  And ELinks 0.11.0 doesn't support multibyte or
ASCII-incompatible (e.g. EBCDIC) charsets in terminals, so it is no
big deal if regexp searches fail in such locales.

ELinks 0.12pre1 attempts to support UTF-8 as the terminal charset if
CONFIG_UTF8 is defined.  Then, struct search contains unicode_val_T c
rather than unsigned char c, and get_srch() and add_srch_chr()
together save UTF-32 values there if the terminal charset is UTF-8.
In plain-text searches, is_in_range_plain() compares those values
directly if the search is case sensitive, or folds them to lower case
if the search is case insensitive: with towlower() if the terminal
charset is UTF-8, or with tolower() otherwise.  In regexp searches
however, get_search_region_from_search_nodes() still truncates all
values to 8 bits in order to generate the string that
search_for_pattern() then passes to regexec().  In UTF-8 locales,
regexec() expects this string to be in UTF-8 and can't make sense of
the truncated characters.  There is also a possible conflict in
regcomp() if the locale is UTF-8 but the terminal charset is not, or
vice versa.

Rejected ways of fixing the charset mismatches:

* When the terminal charset is UTF-8, recode the formatted document
  from UTF-32 to UTF-8 for regexp searching.  This would work if the
  terminal and the locale both use UTF-8, or if both use unibyte
  ASCII-compatible charsets, but not if only one of them uses UTF-8.

* Convert both the regexp and the formatted document to the charset of
  the locale, as that is what regcomp() and regexec() expect.  ELinks
  would have to somehow keep track of which bytes in the converted
  string correspond to which characters in the document; not entirely
  trivial because convert_string() can replace a single unconvertible
  character with a string of ASCII characters.  If ELinks were
  eventually changed to use iconv() for unrecognized charsets, such
  tracking would become even harder.

* Temporarily switch to a locale that uses the charset of the
  terminal.  Unfortunately, it seems there is no portable way to
  construct a name for such a locale.  It is also possible that no
  suitable locale is available; especially on Windows, whose C library
  defines MB_LEN_MAX as 2 and thus cannot support UTF-8 locales.

Instead, this commit makes ELinks do the regexp matching with regwcomp
and regwexec from the TRE library.  This way, ELinks can losslessly
recode both the pattern and the document to Unicode and rely on the
regexp code in TRE decoding them properly, regardless of locale.

There are some possible problems though:

1. ELinks stores strings as UTF-32 in arrays of unicode_val_T, but TRE
   uses wchar_t instead.  If wchar_t is UTF-16, as it is on Microsoft
   Windows, then TRE will misdecode the strings.  It wouldn't be too
   hard to make ELinks convert to UTF-16 in this case, but (a) TRE
   doesn't currently support UTF-16 either, and it seems possible that
   wchar_t-independent UTF-32 interfaces will be added to TRE; and (b)
   there seems to be little interest in using ELinks on Windows anyway.

2. The Citrus Project apparently wanted BSD to use a locale-dependent
   wchar_t: e.g. UTF-32 in some locales and an ISO 2022 derivative in
   others.  Regexp searches in ELinks now do not support the latter.

[ Adapted to elinks-0.12 from bug 1060 attachment 506.
  Commit message by me.  --KON ]
2009-02-08 18:26:22 +02:00
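The effect described in c5a7f87c43 can be illustrated with a small sketch. It assumes Python's re module as a stand-in for TRE's regwcomp and regwexec, and the helper names truncate_to_8_bits() and search_wide() are hypothetical; the point is only that a pattern cannot match once every character of the document has been cut down to its low 8 bits.

```python
import re


def truncate_to_8_bits(text):
    """Mimic the old behaviour: keep only the low 8 bits of each code point."""
    return "".join(chr(ord(ch) & 0xFF) for ch in text)


def search_wide(pattern, document):
    """Match with pattern and document both kept as full code points."""
    return re.search(pattern, document) is not None


document = "Привет, мир"  # Cyrillic text, all letters above U+00FF
pattern = "мир"

# Old path: the document is truncated before matching, so the search fails.
found_truncated = search_wide(pattern, truncate_to_8_bits(document))

# New path: both sides stay in Unicode, so the search succeeds.
found_wide = search_wide(pattern, document)
```

Matching on whole code points also avoids any dependence on the locale charset, which is the property the commit relies on.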
Kalle Olavi Niemitalo
264a66fe4d bug 153: UTF-8 bookmark.title has been fully implemented.
Mention it in NEWS too.
2009-02-08 18:26:21 +02:00
Kalle Olavi Niemitalo
311d95358d bug 153, 1066: Convert bookmarks to/from UTF-8 when searching. 2009-02-08 18:26:21 +02:00
Kalle Olavi Niemitalo
8c0fa7f09c bug 153, 1066: Convert strings to edit-bookmark dialog from UTF-8. 2009-02-08 18:26:21 +02:00
Kalle Olavi Niemitalo
5a29dbc4a1 bug 153, 1066: Convert strings to bookmark info dialog from UTF-8. 2009-02-08 18:26:20 +02:00
Kalle Olavi Niemitalo
b3acd2a5bc bug 153: Convert titles in bookmark manager from UTF-8. 2009-02-08 18:26:20 +02:00
Kalle Olavi Niemitalo
b3f9d48bba bug 153, 1066: Convert strings from add-bookmark dialogs to UTF-8.
In src/bookmarks/dialogs.c, do_add_bookmark() gets the title and URL
in the terminal charset and needs to know which one that is.  When a
bookmark is being added, save the struct terminal * to dialog.udata2
and read the charset from there.  When a bookmark is being edited,
dialog.udata2 is needed for the struct bookmark *, but there we always
have the parent struct dialog_data * in dialog.udata and can get the
terminal from that.
2009-02-08 18:26:19 +02:00
Kalle Olavi Niemitalo
b432b735e4 bug 1066: Attempt to convert -remote addBookmark(URL) to UTF-8.
Currently, it is not clear which codepage is used in struri().
Assume it is the system codepage.
2009-02-08 18:26:19 +02:00
Kalle Olavi Niemitalo
99d1269bc5 bug 153, 1066: Convert session-snapshot bookmarks to/from UTF-8.
These functions now expect or return strings in UTF-8:
delete_folder_by_name (sneak in a const, too), bookmark_terminal_tabs,
open_bookmark_folder, and get_auto_save_bookmark_foldername_utf8 (new
function).
2009-02-08 18:26:19 +02:00
Kalle Olavi Niemitalo
11acd03eb2 Use update_bookmark() in SMJS bookmark object.
When setting the title or URL of a bookmark from SMJS user scripting,
use update_bookmark() instead of writing directly to struct bookmark.
It triggers the bookmark-update event and sets the bookmarks_dirty
flag.
2009-02-08 18:26:18 +02:00
Kalle Olavi Niemitalo
97d72d15a0 bug 153, 1066: Convert properties of SMJS bookmark to/from UTF-8.
SpiderMonkey uses UTF-16 and the strings in struct bookmark are in
UTF-8.  Previously, the conversions behaved as if the strings had been
in ISO-8859-1.

SpiderMonkey also supports JS_SetCStringsAreUTF8(), which would make
the existing functions convert between UTF-16 and UTF-8, but that
effect is global so I dare not enable it yet.  Besides, I don't know
if that function works in all the SpiderMonkey versions that ELinks
claims to work with.
2009-02-08 18:26:18 +02:00
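The misconversion fixed in 97d72d15a0 can be shown with a short sketch. The function names are hypothetical and Python's codecs stand in for the ELinks and SpiderMonkey conversion routines; the bytes are UTF-8, as in struct bookmark.

```python
def misread_as_latin1(utf8_bytes):
    """Old behaviour: treat each byte of the UTF-8 string as one character."""
    return utf8_bytes.decode("iso-8859-1")


def read_as_utf8(utf8_bytes):
    """New behaviour: decode the bytes as UTF-8 before handing them on."""
    return utf8_bytes.decode("utf-8")


def to_utf16_units(text):
    """Encode text as UTF-16 (little endian), the form a JS string uses."""
    return text.encode("utf-16-le")


stored = "Žluťoučký".encode("utf-8")  # bytes as kept in the bookmark
garbled = misread_as_latin1(stored)   # one character per byte
correct = read_as_utf8(stored)        # the original title
```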
Kalle Olavi Niemitalo
03b112796d bug 153, 1066: Add codepage parameter to update_bookmark().
This also makes the bookmark-update event carry strings in UTF-8.
The only current consumer of that event is bookmark_change_hook(),
which ignores the strings, so no changes are needed there.
2009-02-08 18:26:18 +02:00
Kalle Olavi Niemitalo
73f925ce21 bug 153, 1066: Convert XBEL bookmarks to/from UTF-8.
When the file is being read, Expat provides the strings to ELinks in
UTF-8, so ELinks can put them in struct bookmark without conversions.
Make sure gettext returns any placeholder strings in UTF-8, too.
Replace '\r' with ' ' in bookmark titles and URLs.

When the file is being written, put encoding="UTF-8" in the XML
declaration, and then write out the strings from struct bookmark
without character set conversions.  Do replace some characters
with entity references though, by calling add_html_to_string().
2009-02-08 18:26:04 +02:00
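The write path described in 73f925ce21 might look roughly like this. It is a simplified sketch: the element layout is reduced to the bare minimum, the function names are hypothetical, and xml.sax.saxutils is used in place of add_html_to_string().

```python
from xml.sax.saxutils import escape, quoteattr


def clean_field(text):
    """Replace carriage returns with spaces, as the commit does on reading."""
    return text.replace("\r", " ")


def bookmark_element(title, url):
    """Build one simplified <bookmark> element with escaped content."""
    return "<bookmark href=%s><title>%s</title></bookmark>" % (
        quoteattr(url), escape(title))


def xbel_file(bookmarks):
    """Return the file as UTF-8 bytes; bookmarks holds (title, url) pairs."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<xbel>"]
    lines.extend(bookmark_element(t, u) for t, u in bookmarks)
    lines.append("</xbel>")
    # Strings are already Unicode, so no other charset conversion is needed.
    return "\n".join(lines).encode("utf-8")
```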
Witold Filipczyk
d91668b0c5 Pass the codepage (cp) instead of options to the scan_http_equiv. 2009-01-27 09:23:56 +01:00
Witold Filipczyk
39c6589edb Added the get_cp_highhalf function, which will be used by xhtml. 2009-01-26 21:11:14 +01:00
Kalle Olavi Niemitalo
8c0ae2a215 bug 153, 1066: Convert ~/.elinks/bookmarks to/from UTF-8.
The ~/.elinks/bookmarks file is in the system charset,
for compatibility with earlier ELinks releases,
but internally the strings are in UTF-8.
2009-01-24 14:38:59 +02:00
Kalle Olavi Niemitalo
1cb81679f4 bug 153, 1066: Add add_bookmark_cp(). 2009-01-24 12:18:28 +02:00
Kalle Olavi Niemitalo
d1f2f8df80 bug 153, 1066: init_bookmark() and add_bookmark() expect UTF-8.
Comment changes only.
2009-01-24 12:17:48 +02:00
Kalle Olavi Niemitalo
37de386051 bug 153, 1066: Document that bookmarks should be UTF-8.
Comment changes only.
2009-01-24 12:12:45 +02:00
Kalle Olavi Niemitalo
9088f11c64 Make encode_utf8() extern even without CONFIG_UTF8.
It will soon be needed for conversions from UTF-16 to UTF-8.
2009-01-04 16:55:24 +02:00
Kalle Olavi Niemitalo
a82a5cc6d5 XBEL bug 761: Distinguish between names and values of attributes.
When ELinks is parsing an XML element from an XBEL bookmark file,
it collects the attributes of the element to the current_node->attrs
list.  Previously, struct attributes had room for one string only:
the last element of current_node->attrs was the name of the first
attribute, and it was preceded by the value of the first attribute,
the name of the second attribute, the value of the second attribute,
and so on.  However, when get_attribute_value() was looking for a
given name, it compared the values as well.  So, if you had for
example <bookmark id="href" href="http://elinks.cz/">, then
get_attribute_value("href") would incorrectly return "href".

To fix this confusion, store values in the new member
attributes.value, rather than in attributes.name.
2009-01-04 15:15:21 +02:00
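The confusion fixed in a82a5cc6d5 can be reproduced with a simplified model. The sketch assumes a flat list in document order (the real list was stored in reverse), and both function names are hypothetical.

```python
def get_attribute_value_flat(flat, name):
    """Old scheme: names and values share one list and all are compared."""
    for i in range(len(flat) - 1):
        if flat[i] == name:
            return flat[i + 1]
    return None


def get_attribute_value_pairs(attrs, name):
    """New scheme: each entry has a separate name and value."""
    for attr_name, attr_value in attrs:
        if attr_name == name:
            return attr_value
    return None


# <bookmark id="href" href="http://elinks.cz/">
flat = ["id", "href", "href", "http://elinks.cz/"]
pairs = [("id", "href"), ("href", "http://elinks.cz/")]

wrong = get_attribute_value_flat(flat, "href")    # matches the value of id
right = get_attribute_value_pairs(pairs, "href")  # matches only names
```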
Witold Filipczyk
fb05ed1fa2 youtube2.js: Show link to the video instead of the <meta refresh../>. 2009-01-03 17:39:02 +01:00
Kalle Olavi Niemitalo
30dbe6a2f8 Use get_terminal_codepage in handle_interlink_event.
This should have been in an earlier commit but I somehow missed it.

Related to bug 1064 but does not change visible behaviour yet.
2009-01-01 22:59:11 +00:00
Witold Filipczyk
ba70d61051 762: Instead of storing a bare pointer in task.target.frame,
always store a dynamically allocated copy, using the null_or_stracpy and
mem_free_set macros. Slower, but safer.
2009-01-01 22:06:59 +01:00
Kalle Olavi Niemitalo
e5722ad0d9 Bug 1061: Correctly truncate UTF-8 titles in the tab bar. 2009-01-01 20:01:50 +00:00
Kalle Olavi Niemitalo
8d19b87cb1 Bug 885: Truncate title at 600 bytes, not 1024.
Although xterm allows 1024 bytes, GNU Screen apparently has a lower
limit.
2009-01-01 19:54:35 +00:00
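A byte limit such as the 600 bytes chosen in 8d19b87cb1 has to be applied without cutting a UTF-8 character in half. One way to do that, shown here as a sketch with a hypothetical function name and assuming valid UTF-8 input, is to back up over continuation bytes:

```python
def truncate_utf8(data, max_bytes=600):
    """Cut UTF-8 bytes to at most max_bytes, on a character boundary."""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # data[end] is the first byte that would be dropped.  If it is a
    # continuation byte (10xxxxxx), the cut falls inside a character,
    # so move the cut back to the start of that character.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]
```

The result may be slightly shorter than the limit, but it always decodes cleanly.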
Kalle Olavi Niemitalo
687f19dbde Merge branch 'elinks-0.12' into elinks-0.13
Conflicts:
	src/bfu/dialog.c
	src/bfu/hotkey.c
	src/bfu/inpfield.c
	src/dialogs/options.c
	src/document/renderer.c
	src/intl/gettext/libintl.h
	src/protocol/http/codes.c
	src/session/task.c
	src/terminal/event.c
	src/terminal/terminal.h
	src/viewer/text/form.c
	src/viewer/text/link.c

And a semantic conflict in src/terminal/terminal.c.
2009-01-01 19:14:01 +00:00
Kalle Olavi Niemitalo
29c34df62e Fix assertion failure if IMG/@usemap refers to a different file.
Change test/imgmap2.html so it can be used for testing this too.

Debian Iceweasel 3.0.4 does not appear to support such external
client-side image maps.  Well, that's one place where ELinks is
superior, I guess.  There might be a security problem though if ELinks
were to let scripts of the referring page examine the links in the
image map.
2009-01-01 19:12:41 +00:00
Kalle Olavi Niemitalo
dc41f0bd4c test: Don't refer to deleted files from imgmap.html.
align.html and poocs.net.html have been deleted.
Point the links to href_tests.html and nbsp.html instead.
2009-01-01 18:36:34 +00:00
Kalle Olavi Niemitalo
5be3f71ddd Add test/image.png and use it in test/imgmap.html.
This makes the image-map test work sensibly in graphical browsers too.
2009-01-01 18:35:11 +00:00
Kalle Olavi Niemitalo
b6dfdf86a6 Bug 885: Proper charset support in xterm window title
When ELinks runs in an X11 terminal emulator (e.g. xterm), or in GNU
Screen, it tries to update the title of the window to match the title
of the current document.  To do this, ELinks sends an "OSC 1 ; Pt BEL"
sequence to the terminal.  Unfortunately, xterm expects the Pt string
to be in the ISO-8859-1 charset, making it impossible to display e.g.
Cyrillic characters.  In xterm patch #210 (2006-03-12) however, there
is a menu item and a resource that can make xterm take the Pt string
in UTF-8 instead, allowing characters from all around the world.
The downside is that ELinks apparently cannot ask xterm whether the
setting is on or off; so add a terminal._template_.latin1_title option
to ELinks and let the user edit that instead.

Complete list of changes:

- Add the terminal._template_.latin1_title option.  But do not add
  that to the terminal options window because it's already rather
  crowded there.

- In set_window_title(), take a new codepage argument.  Use it to
  decode the title into Unicode characters, and remove only actual
  control characters.  For example, CP437 has graphical characters in
  the 0x80...0x9F range, so don't remove those, even though ISO-8859-1
  has control characters in the same range.  Likewise, don't
  misinterpret single bytes of UTF-8 characters as control characters.

- In set_window_title(), do not truncate the title to the width of the
  window.  The font is likely to be different and proportional anyway.
  But do truncate before 1024 bytes, an xterm limit.

- In struct itrm, add a title_codepage member to remember which
  charset the master said it was going to use in the terminal window
  title.  Initialize title_codepage in handle_trm(), update it in
  dispatch_special() if the master sends the new request
  TERM_FN_TITLE_CODEPAGE, and use it in most set_window_title() calls;
  but not in the one that sets $TERM as the title, because that string
  was not received from the master and should consist of ASCII
  characters only.

- In set_terminal_title(), convert the caller-provided title to
  ISO-8859-1 or UTF-8 if appropriate, and report the codepage to the
  slave with the new TERM_FN_TITLE_CODEPAGE request.  The conversion
  can run out of memory, so return a success/error flag, rather than
  void.  In display_window_title(), check this result and don't update
  caches on error.

- Add a NEWS entry for all of this.
2009-01-01 16:17:03 +00:00
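The title handling listed in b6dfdf86a6 can be sketched as two steps: decode the title with its codepage and drop only real control characters, then encode it for the terminal. The function names are hypothetical, Python codec names stand in for ELinks codepages, and the escape sequence itself is left out.

```python
import unicodedata


def clean_title(raw, codepage):
    """Decode raw bytes and remove characters of Unicode category Cc."""
    text = raw.decode(codepage, errors="replace")
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


def encode_title(text, latin1_title):
    """Encode for the terminal according to a latin1_title style switch."""
    charset = "iso-8859-1" if latin1_title else "utf-8"
    return text.encode(charset, errors="replace")
```

Byte 0x80 is a graphical character in CP437 and survives, while the same byte in ISO-8859-1 decodes to a C1 control character and is removed.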
Kalle Olavi Niemitalo
8f4d7f9903 Define cp_to_unicode() even without CONFIG_UTF8.
And make its last parameter point to const.  add_cp_html_to_string()
no longer needs to pretend UTF-8 is ISO-8859-1.
2009-01-01 16:17:03 +00:00
Kalle Olavi Niemitalo
ad45176dde Add get_terminal_codepage().
This simplifies the callers a little and may help implement
simultaneous support for different charsets on different terminals
of the same type (bug 1064).
2009-01-01 16:16:17 +00:00
Kalle Olavi Niemitalo
25da8085b3 Fix double-free crash if EOF immediately follows </MAP>.
look_for_link() used to return 0 both when it found the closing </MAP>
tag, and when it hit the end of the file.  In the first case, it also
added *menu to the memory_list; in the second case, it did not.  The
caller get_image_map() supposedly distinguished between these cases by
checking whether pos >= eof, and freed *menu separately if so.

However, if the </MAP> was at the very end of the HTML file, so that
not even a newline followed it, then look_for_link() left pos == eof
even though it had found the </MAP> and added *menu to the memory_list.
This made get_image_map() misinterpret the result and mem_free(*menu)
even though *menu had already been freed as part of the memory_list;
thus the crash.

To fix this, make look_for_link() return -1 instead of 0 if it hits
EOF without finding the </MAP>.  Then make get_image_map() check the
return value instead of comparing pos to eof.  And add a test case,
although not an automated one.

Alternatively, look_for_link() could have been changed to decrement
pos between finding the </MAP> and returning 0.  Then, the pos >= eof
comparison in get_image_map() would have been false.  That scheme
would however have been a bit more difficult to understand and
maintain, I think.

Reported by Paul B. Mahol.
(cherry picked from commit a2404407ce)
2008-12-31 20:15:44 +00:00
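The ambiguity described in 25da8085b3 comes from inferring the outcome from the final position. A simplified model, with hypothetical names and a plain regular expression instead of the real HTML scanner, shows why a distinct return value is more robust:

```python
import re

CLOSE_MAP = re.compile(r"</map>", re.IGNORECASE)


def look_for_close_tag(html, pos):
    """Return (status, new_pos): status 0 if </MAP> was found, -1 at EOF."""
    match = CLOSE_MAP.search(html, pos)
    if match is None:
        return -1, len(html)
    return 0, match.end()


def old_check_says_found(html):
    """Old test: assume the tag was found only if pos stopped before EOF."""
    _, pos = look_for_close_tag(html, 0)
    return pos < len(html)


def new_check_says_found(html):
    """New test: trust the status instead of comparing pos to EOF."""
    status, _ = look_for_close_tag(html, 0)
    return status == 0
```

For a file that ends exactly at the closing tag, the old test reports a miss even though the tag was found, which is the case that led to the double free.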