As commit 7db8abf6e7 does for Lua
and the document info box, change the Python scripting backend's
current_document and current_header APIs to use document->cached
instead of find_in_cached so the currently displayed document
will be used rather than the latest version of the document.
The numbering of document.dump.color_mode and terminal._template_.colors
is now the same regardless of compile-time options, unlike in previous
versions. Therefore this version of ELinks may interpret a configuration
file differently from previous versions even if compiled with the same
options. This is unfortunate but the alternatives (keeping the numbering
dependent on configuration options; defining separate options that use
the new numbering; starting the numbers from 10 or so and recognizing the
previous ones only for compatibility) seem even worse.
This does not conflict with querying the palette from xterm (bug 890)
because although those palettes would have to be modifiable, they
would be terminal-specific rather than global.
Convert each byte of them to UCS_REPLACEMENT_CHARACTER. This may not
be the optimal solution but at least it ought to be safe. Also raise
an internal error if the value read from utf8char_len_tab[] is out of
range.
Note that ELinks is still using the RFC 2279 definition of UTF-8 and
thus allows characters up to 0x7FFFFFFF, even though RFC 3629 has
changed the maximum to 0x10FFFF.
This fixes parse_ftp_number to use off_t instead of long to store its
(intermediate) result and return type. It also introduces an OFFT_MAX type
"limit" that is used for validating the size of the parsed number.
A test-case for was added in 37c9bf3f75 to
test-ftp-parser and the patch has been confirmed to fix the test-case by
adamg and me. This closes bug 899, which is a duplicate of debian bug
403139.
I do not fully understand this code, but I am sure skipping characters
like this is a bug, and correcting it seems to fix bug 826 (too small
table for double-cell characters). I don't see any similar bugs in
other parts of set_hline.
The patch is from bug 826, comment 4, attachment 308. The warning
there about unicode_to_cell(UCS_NO_CHAR) still applies but this patch
does not make the situation worse. I have logged a separate bug 901
about those calls.
cleanup_python and python_done_keybinding_interface called by it
now reset the PyObject *python_hooks, *keybindings variables back
to NULL when they release the references. Without this change,
dangling pointers left in those variables could cause problems
if the Python scripting module were deinitialized and reinitialized.
It looks like such reinitialization is not currently possible though,
because enhancement request 73 (plugins support) has not yet been
implemented.
Before it was only to get the password when the user name was also
requested. This fixes FSP access to password protected directories.
The problem was discovered by Witold and mentioned in the post to
elinks-dev with the message-id: <20061209204151.GA32758@pldmachine> on
2006-12-09.
Now the currently displayed version of the current document,
rather than the latest version of the current document, will be used
for the document info box and the current_document() Lua function.
This allows code to use document->cached instead of
find_in_cache(document->uri), thereby increasing the likelihood
of getting the correct cache entry.
This should fix Bug 756 - "assertion (cached)->object.refcount >= 0 failed"
after HTTP proxy was changed.
Patches for this were written by me and then later by Jonas.
This commit combines our independent implementations.
The configure script checks whether it is possible to compile a use of
POPpx without an n_a variable; if not, the source code then defines
those variables. This is slower than including Perl's patchlevel.h
and comparing the version numbers to 5.8.8 but I expect this to be
more reliable as well.
Before this patch, init_python would crash trying to set up elinks.home
at the Python side. Now it uses None as the value in that case.
Also, init_python no longer adds "(null)" to $PYTHONPATH.
This change does not fix any bug, but the SMJS builtin classes use
negative tinyids already, so I presume this is the preferred practice.
At least it means the tinyids won't have to be renumbered later if
some of these objects are changed to behave as arrays.
This reverts baf7b0e91d:
Fix segfaults caused by ruby scripting (gentoo bug #121247).
which reverted 5145ae266a:
Change the Python, Ruby, and SEE hooks for pre-format-html to work
properly now that they are given a non-NUL-terminated string.
and also makes the Ruby hooks interface generally use rb_str_new(str, len)
in favor of rb_str_new2(str) to avoid relying on NUL-terminated being
handled correctly by Ruby. Also, it was wrong for the preformat hook which
is not always handed a NUL-terminated string. Finally, the gentoo bug
(http://bugs.gentoo.org/show_bug.cgi?id=121247) is currently reopened which
suggests that the previous fix was not correct.
If CONFIG_UTF8 is not defined, then text_end is not used, and GCC
could warn about that. Because configure can add -Werror to CFLAGS,
the warning could then cause the whole build to fail.
The error was:
sgml-parser.c: In function 'print_indent':
sgml-parser.c:99: warning: field precision should have type 'int', but argument 2 has type 'long unsigned int'
If ECMAScript code does obj[42], then the getProperty or setProperty
function of the JSClass of obj gets 42 as the property ID and must not
treat that as an internal error.
The getProperty and setProperty functions of a JSClass must not assume
that the obj parameter points to an instance of that class. It might
instead point to another object that merely has an instance of the
class in its prototype chain. Thus, do not assert that JS_InstanceOf
returns true there. Instead, run the check even with CONFIG_FASTMEM,
and just return JS_FALSE if it fails.
As draw_textarea_utf8 loops over each character of the textarea content, it
checks whether the character is on the screen; draws it if so; increments
the screen co-ordinate; and updates the position in the textarea text.
The last step was being skipped when the character was not on the line,
so a line would be drawn from the beginning, even if the left edge of the
textarea is off the screen.
Closes: Bug 835 - Text in textarea is unaffected by horizontal scrolling of
document in UTF-8 mode
If utf8_char2cells isn't told where the string that contains
the given UTF-8 character ends, it computes that itself. Two users
of utf8_char2cells, format_textutf8 and split_line, were calling
utf8_char2cells in a loop without providing the end of the string,
resulting in numerous calls by utf8_char2cells to strlen.
With this patch, format_textutf8 and split_line each find the end
of the string once and provide it to utf8_char2cells.
This particularly improves performance with textareas, since
format_textutf8 is called multiple times each time the user interacts
with the textarea and when it must be redrawn.
Closes: Bug 823 - Big textarea is too slow with CONFIG_UTF8
The quote_level was decremented unconditionally and could become negative
resulting in a negative index after applying "modulus 2". Reproducable
with an HTML file contianing "</q>".
Reported by paakku.
src/protocol/smb/smb.c: Added #error directives so that this
vulnerable code cannot be accidentally compiled in.
features.conf: Disable CONFIG_SMB by default and explain why.
configure.in: If CONFIG_SMB is enabled, disable it and warn the user.
This is for people who have customized features.conf.
doc/remote.txt says there must be a nonempty sequence of ASCII
alphabetic characters before the opening parenthesis. Check that
they really are ASCII characters and that the sequence is nonempty.
Thus, elinks -remote '(foo)' now treats the string as an address,
rather than as a command.
UCS_ORPHAN_CELL is currently defined as U+0020 SPACE, which was
already used before this macro, so the behaviour does not change,
but the code seems clearer now.
I searched for ' ' and 32 and 0x20 and \x20, and replaced with
UCS_ORPHAN_CELL wherever UCS_NO_CHAR was involved. However,
some BFU widgets first draw spaces and then overwrite with text;
those will require a more complex fix if UCS_ORPHAN_CELL is ever
changed to some other character.
Be more strict about the format accepted by the ELinks specific extension
to the -remote URL syntax. That is, commands must begin with a nonempty
sequence of ASCII alphabetic characters followed by optional whitespace and
an opening parenthesis. Also, document the syntax.
Fixes bug 830.
When I first read the warning about "NO NATIONAL CHARS SUPPORT!"
I was amazed: XML is based on Unicode, so why would the authors of the
XBEL specification have botched support for those characters? The bug
is actually in the ELinks implementation of XBEL, and it has already
been entered in the ELinks bugzilla. Make the documentation string
refer to that.
Recognize all of 
 
 with any number of leading
zeroes. (Previously only and 
 were supported.) All of
these are case insensitive.
Treat each CR+LF combination ( ) as a single newline.
Fix bug 834 (various gzip-encoded documents were being truncated),
which I introduced with commit e441361f2c.
Thanks to Witek for reporting the bug, Kalle for determining the
problematic commit, and Jonas for letting me know about the bug report(!).
... mainly bittorrent:// and bittorrent://x
The BitTorrent URL is supposed to contain an embedded URL pointing to a
metainfo file. If this is not the case a "custom" error message will be
shown. Also fixes calling of free_list() on an uninitialized list.
Closes bug 729.
A simple "update" of Hugo Haas' patch posted for bug 107. This of course
also affects the (undocumented?) feature of file:// refering to the local
directory in that directories named "localhost" can no longer be displayed
using file://localhost. Nobody should do that anyway.
It will grab at the first fragment of the cache entry and try to detect the
content-type by looking for valid HTML. It is very stupid for now, simply
searching for "<html>", which may be bogus in certain circumstances. And I
am not sure if this is better left out and up to the scripting backends,
e.g. SMJS can now modify the cache entry.
A feable fix for bug 396.
The problem is that if you run elinks in xterm with the default white
background, it will be totally unreadable if transparency is turned
on. We should default to usability in all common environments, eyecandy
lovers can do the extra setup for their specific one.
It also makes the description note that elinks still assumes the
background is black.
This is necessary when using a POSIX-compliant stdio implementation, which
will set the FILE error flag when input from the pipe is exhausted, causing
all subsequent reads to fail.
Adjust the size of to_read for the initial read instead of setting the init
flag and using that later to check whether to read a smaller amount than
the value in to_read. This also affects the realloc call on the initial
read, which was allocating more memory than necessary (altho this
discrepency would be corrected with the realloc for the next read).
Otherwise if the page installs multiple timers the old one would live
on unreferenced and possibly (likely) trigger after the document's death
and everything would go to hell.
/usr/bin/ld: warning multiple definitions of symbol _locale_charset
lib.o definition of _locale_charset in section (__TEXT,__text)
/usr/lib/libiconv.dylib(localcharset.o) definition of _locale_charset
Revert commit c9ce4260e5,
which made "elinks -remote http://elinks.cz/" fail with an error
"ELinks: Cannot parse option -remote: Remote method not supported"
even though doc/remote.txt says it should open the URL in a new tab.
The previous version assumed the first non-digit after the CSI was the
Final Byte, for example the first semicolon in the "\E[?1;2c" report.
It then treated all subsequent bytes as typed characters.
According to Standard ECMA-48 (Fifth Edition - June 1991), there may
be any number of Parameter Bytes in the range 0x30 to 0x3F, and then
any number of Intermediate Bytes in the range 0x20 to 0x2F, between
the CSI and the Final Byte.
This version still does not support control sequences longer than
ITRM_IN_QUEUE_SIZE bytes.
Added document.cache.interval option. When time elapsed since previous access
to the document is less than interval then the document is taken from
the cache. Otherwise the request with filled "If-Modified-Since" and/or
"If-None-Match" header field is sent. By default interval is set to 10 minutes.
This requires the correct time to be set on your machine.
The current rules are:
term.utf8
CONFIG_UTF8 UTF-8 I/O widget_data.cdata
----------- --------- ------------------
undefined disabled charset of the terminal
undefined enabled charset of the terminal
defined disabled charset of the terminal (*)
defined enabled always UTF-8
(*) kbd_field was incorrectly assuming UTF-8 in this case.
This reverts the following commits:
- 86ed79deaf
Use wcwidth if available and applicable.
- 304f5fa1ea
comment fix (__STDC_ISO_10646__, not __STDC_ISO_10646)
- part of 71eebf1cc7
Compensate for glibc not defining wcwidth() when _XOPEN_SOURCE is not set
And adds a lengthy comment about LC_CTYPE problems.
Explicitly compare the value that is returned by the widget handler
against EVENT_NOT_PROCESSED rather than relying on the fact that
EVENT_NOT_PROCESSED is equal to 1.
ffeedbdc5045a6a5db2bc75ecaab56bfe46c80ea
The secure file saving code plays some shenanigans with the umask.
Previously, the code could fail to restore the old umask when certain libc
calls failed: malloc, mkstemp, fdopen, and fopen. This resulted in
unrelated code creating files with the wrong umode. Specifically, the
download code's automatic directory creation was creating directories
without the execute permission bit.
Thanks to Quiznos for reporting and helping to track the problem down.
To reproduce:
- Start ELinks.
- Enable the ui.tabs.wraparound option.
- Press t to open a second tab.
- Go to http://elinks.cz/ in the second tab.
- Press 3< to step three tabs to the left.
In the statement "tab = tabs + tab % tabs;", tab == -2 and tabs == 2.
So tab % tabs == 0 and tab becomes 2, which is out of range.
The new version calls get_opt_bool even if the tab parameter is already in
range, but the cost should be negligible compared to the redraw_terminal()
call that follows.
Previously, each mapping between a codepage byte and a Unicode
character was stored as a struct table_entry, which listed both the
byte and the character. This representation may be optimal for sparse
mappings, but codepages map almost every possible byte to a character,
so it is more efficient to just have an array that lists the Unicode
character corresponding to each byte from 0x80 to 0xFF. The bytes are
not stored but rather implied by the array index. The tcvn5712 and
viscii codepages have a total of four mappings that do not fit in the
arrays, so we still use struct table_entry for those.
This change also makes cp2u() operate in O(1) time and may speed up
other functions as well.
The "sed | while read" concoction in Unicode/gen-cp looks rather
unhealthy. It would probably be faster and more readable if rewritten
in Perl, but IMO that goes for the previous version as well, so I
suppose whoever wrote it had a reason not to use Perl here.
Before:
text data bss dec hex filename
38948 28528 3311 70787 11483 src/intl/charsets.o
500096 85568 82112 667776 a3080 src/elinks
After:
text data bss dec hex filename
31558 28528 3311 63397 f7a5 src/intl/charsets.o
492878 85568 82112 660558 a144e src/elinks
So the text section shrank by 7390 bytes.
Measured on i686-pc-linux-gnu with: --disable-xbel --disable-nls
--disable-cookies --disable-formhist --disable-globhist
--disable-mailcap --disable-mimetypes --disable-smb --disable-mouse
--disable-sysmouse --disable-leds --disable-marks --disable-css
--enable-small --enable-utf-8 --without-gpm --without-bzlib
--without-idn --without-spidermonkey --without-lua --without-gnutls
--without-openssl CFLAGS="-Os -ggdb -Wall"
Before:
text data bss dec hex filename
25726 62992 3343 92061 1679d src/intl/charsets.o
653856 120020 82144 856020 d0fd4 src/elinks
After:
text data bss dec hex filename
60190 28528 3311 92029 1677d src/intl/charsets.o
688320 85556 82112 855988 d0fb4 src/elinks
So 34464 bytes were moved from the data section to the text section
and should be more likely to get shared between ELinks processes.
Measured on i686-pc-linux-gnu with: --disable-xbel --disable-nls
--disable-cookies --disable-formhist --disable-globhist
--disable-mailcap --disable-mimetypes --disable-smb --disable-mouse
--disable-sysmouse --disable-leds --disable-marks --disable-css
--enable-small --enable-utf-8 --without-gpm --without-bzlib
--without-idn --without-spidermonkey --without-lua --without-gnutls
--without-openssl CFLAGS="-O2 -ggdb -Wall"
UCS_NO_CHAR here means the input was too short. Because the strings
generally come from the source code or from PO files, they should not
end in the middle of a character. However, the whole character may be
missing if the string is empty. So select_button_by_key() now checks
for that case separately.
UCS_NO_CHAR must not be passed to unicode_fold_label_case() because
the result is undefined.
The configure script no longer recognizes "CONFIG_UTF_8=yes" lines
in custom features.conf files. They will have to be changed to
"CONFIG_UTF8=yes". This incompatibility was deemed acceptable
because no released version of ELinks supports CONFIG_UTF_8.
The --enable-utf-8 option was not renamed.
Suggested by Miciah on #elinks.
What was renamed:
add_utf_8 => add_utf8
cp2utf_8 => cp2utf8
encode_utf_8 => encode_utf8
get_translation_table_to_utf_8 => get_translation_table_to_utf8
goto invalid_utf_8_start_byte => goto invalid_utf8_start_byte
goto utf_8 => goto utf8
goto utf_8_select => goto utf8_select
terminal_interlink.utf_8 => terminal_interlink.utf8
utf_8_to_unicode => utf8_to_unicode
What was not renamed:
terminal._template_.utf_8_io option, TERM_OPT_UTF_8_IO
Compatibility with existing elinks.conf files would require an alias.
--enable-utf-8
Because the name of the charset is UTF-8, --enable-utf-8 looks better
than --enable-utf8.
CONFIG_UTF_8
Will be renamed in a later commit.
Unicode/utf_8.cp, table_utf_8, aliases_utf_8
Will be renamed in a later commit.
Previously, the window sizes computed for xterm were a few pixels off,
and this could result in too few character cells being displayed.
This new version tries to read the window size increment from the
WM_NORMAL_HINTS property set by xterm, and base the computations on
that.
Previously, utf8_step_forward() and utf8_step_backward() left *count
unchanged if some parameter was invalid. Now, they properly store 0.
This flaw had no effect in practice, because all current callers pass
count=NULL, and invalid parameters shouldn't be used anyway.
form_state.state_cell is no longer used for FC_TEXT, FC_PASSWORD, nor FC_FILE.
Instead, get_link_cursor_offset() computes the cell with utf8_ptr2chars
(a new function) or utf8_ptr2cells. This shouldn't slow down ELinks too
much, as it's done only for the selected link and only once per redraw.
The left side of a scrolled input field is always aligned at a
character boundary. The right side might not be.
The new comments describe how the members were apparently intended to
be used. However, the implementation does not actually work when
CONFIG_UTF_8 is defined, and the current semantics do not even allow
an efficient implementation of long (mostly scrolled out) strings.
Surrogates are now treated the same way as out-of-range characters
like U+110000; if a link has such an access key, then the ECMAScript
accessKey property cannot be read. It seems currently impossible to
set such an access key though, because accesskey_string_to_unicode()
doesn't support multibyte characters yet.
Reported by Jonas Fonseca.
Also add an empty line above the label in init_tab; but there are
still several labels elsewhere that don't have empty lines above them.
The previous scheme incorrectly accepted 0xC1 0x80 as U+0040.
That could have been fixed by tweaking the loop, but the constant
array is surely easier to verify.
- Include arpa/inet.h to get hton* ntoh* functions.
- Use socklen_t instead of int.
- Try to define PF_INET to AF_INET if it doesn't exist.
Reported-by: Andy Tanenbaum <ast@cs.vu.nl>
In the previous version, invalid UTF-8 from a terminal caused
UCS_NO_CHAR (0xFFFFFFFD) to be stored in a term_event_key_T, resulting
in -3 which was then incidentally treated as an unassigned special key.
Now, invalid UTF-8 is instead mapped to UCS_REPLACEMENT_CHARACTER
and treated as a character. The fact that handle_interlink_event
calls term_send_ucs when it receives invalid UTF-8 makes it pretty
clear that this is how it was intended.
src/viewer/text/link.c (not changed in this commit) already referred
to UCS_REPLACEMENT_CHARACTER in a comment even though it was not
previously defined.
The message appears when the user has selected e.g. "Main mapping"
rather than an action inside it. Because the main mapping is a keymap,
ELinks must not tell the user to select a keymap.
whether we ought to add the conn->progress->start to
the conn->est_length. Currently displaying resuming works correctly with
ftp.task.gda.pl and ftp.pld-linux.org.
Restring the spaghetti in dump_to_file to fix a bug that was introduced
in commit 2a6125e3d0 whereby when
document.dump.codepage != "utf-8", the document itself was not output,
only the references list.
Decrement term->current_tab before calling delete_window() instead of after
deleting all backgrounded tabs, so get_tab_by_number() will see a
consistent value.
It cannot be restricted just to characters that have passed
check_kbd_label_key(), because hotkeys in strings received from
gettext must also be processed with it, and there we don't have
a struct term_event for check_kbd_label_key().
This causes the documented-slow cp2u() to be called in a loop, which
fortunately doesn't have very many iterations. If this is too slow,
then cp2u() can be rewritten, or the hotkeys can be cached in struct
widget or struct widget_data.
Note that check_kbd_label_key() does not yet allow non-ASCII
characters when CONFIG_UTF_8 is defined. Before they are allowed,
menu.c should also be updated.