Before, *_html_parser_state() operated with struct html_element *. Now, it is
transparent for the renderer (just void *), so that DOM won't have to provide
this struct but will be able to use something internal.
...as struct text_style. This way it might be possible later to
add other default formatting attributes by CSS and it allows
quite a code simplification in the DOM renderer.
Previously, the DOM, HTML and plain renderers each had their own routine
for converting a text style to a screen attribute. This moves text_style
and text_style_format from html/parser.h to renderer.h and introduces a new
generic routine, get_screen_chracter_template(), that is used by all the
specific rendering engines.
Now, CSS is initialized separately for each of the renderers, so that
RSS no longer just picks the styles of arbitrary DOM node types.
init_template_by_style() is introduced as the common backend for
loading CSS properties.
This fixes ELinks crashing on this with terminal width e.g. 103:
<p align="justify">
xxxx xxxx xxxx xxxxx xxxxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
xxxxx xxxx xxxxxxx xxx xxxxxxx xxx xxxx xxxx xxxx xx xxxx x xxx xxxx
xxxx xx xxxx xxxx xxxx xxx—xxx xxxx xx—xxx xxxx<em> </em>x
xxxx </p>
This test was removed for an unknown reason in commit
b1cc717789.
Discovered together with Miciah.
All the needed memory has been allocated before the loop so we can use
copy_screen_chars() directly. This avoids the assertion failure in
copy_chars() for width==0 and should be a bit faster too. According
to ISO/IEC 9899:1999 7.21.1p2, memcpy() doesn't copy anything if n==0
(but the pointers must be valid).
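A minimal sketch of the zero-width case, assuming a simplified cell type:
struct cell and copy_cells() are hypothetical stand-ins, not the real
ELinks screen character type or copy_screen_chars(). It only illustrates
that a direct memcpy() needs no special case for width==0 as long as both
pointers are valid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a screen cell; not the real ELinks struct. */
struct cell {
	unsigned char data;
	unsigned char attr;
};

/* Copy `width` cells from src to dst. Both pointers must be valid even
 * when width is 0, because memcpy() requires valid pointers regardless
 * of the byte count. A zero width copies nothing. */
static void
copy_cells(struct cell *dst, const struct cell *src, size_t width)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, width * sizeof(*dst));
}
```

Calling copy_cells(dst, src, 0) leaves dst untouched, so a caller that
has already allocated its buffers does not need to test the width first.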
(original 'git cherry-pick' arguments: cherry-pick bug968-att394)
There were conflicts in src/document/css/ because 0.12.GIT switched
to LIST_OF(struct css_selector) and 0.13.GIT switched to struct
css_selector_set. Resolved by using LIST_OF(struct css_selector)
inside struct css_selector_set.
Fix warnings:
dom/stack.h:70: Warning: explicit link request to 'pop_dom_node' could not be resolved
dom/stack.h:71: Warning: explicit link request to 'pop_dom_nodes' could not be resolved
dom/stack.h:71: Warning: explicit link request to 'pop_dom_state' could not be resolved
dom/stack.h:115: Warning: explicit link request to 'done_dom_node' could not be resolved
Use @returns instead of \return in src/document/css/parser.c,
and other such things.
When skipping "@media print { #foo {bar: baz} pre {white-space: normal} }",
the previous code would look for the first "{" and then the first "}", and
fail to skip the "pre" rule. Seen at support.microsoft.com.
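The idea of skipping a nested block can be sketched by counting brace
depth. skip_css_block() below is a hypothetical helper written for this
illustration, not the function in src/document/css/parser.c, and it
ignores strings and comments.

```c
#include <assert.h>
#include <stddef.h>

/* Given a pointer to the '{' that opens a block, return a pointer just
 * past the matching '}'. If the block is never closed, return a pointer
 * to the terminating NUL. Hypothetical helper; it ignores strings and
 * comments, which a real CSS scanner would also have to handle. */
static const char *
skip_css_block(const char *s)
{
	int depth = 0;

	assert(s != NULL && *s == '{');
	for (; *s; s++) {
		if (*s == '{') {
			depth++;
		} else if (*s == '}') {
			depth--;
			if (depth == 0)
				return s + 1;
		}
	}
	return s;
}
```

Given "{ #foo {bar: baz} pre {white-space: normal} } p {}", the function
returns a pointer to " p {}". Stopping at the first "}" would instead
leave the "pre" rule in the input.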
I originally posted this change as part of attachment 383 to bug 722.
Use it for the actual I/O only. Previously, defining CONFIG_UTF8 and
enabling UTF-8 used to force many strings to the UTF-8 charset
regardless of the terminal charset option. Now, those strings always
follow the terminal charset. This fixes bug 914 which was caused
because _() returned strings in the terminal charset and functions
then assumed they were in UTF-8. This reduction in the effects of
UTF-8 I/O may also simplify future testing.
Because the renderer no longer does that.
The comment "We don't cope well with entities here" may now be
obsolete but I'm not sure about that so I'm leaving it in.
options->cp is still used for this in seven places where html_context
is not easily available. Those should eventually be corrected too,
but I'm checking this change in already because it's better than what
we had before.
Previously, html_special_form_control converted
form_control.default_value to the terminal charset, and init_form_state
then copied the value to form_state.value. However, when CONFIG_UTF8
is defined and UTF-8 I/O is enabled, form_state.value is supposed to
be in UTF-8, rather than in the terminal charset.
This mismatch could not be conveniently fixed in
html_special_form_control because that does not know which terminal is
being used and whether UTF-8 I/O is enabled there. Also, constructing
a conversion table from the document charset to form_state.value could
have ruined renderer_context.convert_table, because src/intl/charsets.c
does not support multiple concurrent conversion tables.
So instead, we now keep form_control.default_value in the document
charset, and convert it in the viewer each time it is needed. Because
the result of the conversion is kept in form_state.value between
incremental renderings, this shouldn't even slow things down too much.
I am not implementing the proper charset conversions for the DOM
defaultValue property yet, because the current code doesn't have
them for other string properties either, and bug 805 is already open
for that.
This does not yet fix bug 947 for the case where the document is UTF-8
and the terminal is ISO-8859-1. That will require changing charsets.c
too, it seems.
trim_chars was called only in debug mode, so the results of get_attr_val
for value=" something " in debug mode differed from those in normal and
fastmem mode.
[ From commit c4500039b2 on the witekfl
branch. --KON ]
straconcat reads the args with va_arg(ap, const unsigned char *),
and the NULL macro may have the wrong type (e.g. int).
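A minimal sketch of the sentinel issue, assuming a simplified variadic
function: concat_strings() is hypothetical and uses plain char rather
than the unsigned char of straconcat. The point is only that the
terminating argument is cast to the pointer type that va_arg() reads.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Concatenate `first` and the following strings, stopping at a null
 * pointer. Hypothetical simplification of a straconcat-like function.
 * Returns a malloc()ed string, or NULL if allocation fails. */
static char *
concat_strings(const char *first, ...)
{
	va_list ap;
	const char *s;
	size_t len = 0;
	char *result;

	assert(first != NULL);

	/* First pass: measure the total length. */
	va_start(ap, first);
	for (s = first; s; s = va_arg(ap, char *))
		len += strlen(s);
	va_end(ap);

	result = malloc(len + 1);
	if (!result) return NULL;

	/* Second pass: copy the pieces. */
	result[0] = '\0';
	va_start(ap, first);
	for (s = first; s; s = va_arg(ap, char *))
		strcat(result, s);
	va_end(ap);

	return result;
}
```

A call would look like concat_strings("foo", "bar", (char *) NULL). A
bare NULL may expand to an int constant, which need not have the size or
representation that va_arg(ap, char *) expects.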
Many places pass string literals of type char * to straconcat. This
is in principle also a violation, but I'm ignoring it for now because
if it becomes a problem with some C implementation, then so will the
use of unsigned char * with printf "%s", which is so widespread in
ELinks that I'm not going to try fixing it now.
When a table was rendered for the first time, html_format_part was called
with document==NULL. Because <meta http-equiv=Refresh.../> was inside a
table, document was NULL on that pass. The second time, the table knew its
dimensions and document was not NULL.
I do not fully understand this code, but I am sure skipping characters
like this is a bug, and correcting it seems to fix bug 826 (too small
table for double-cell characters). I don't see any similar bugs in
other parts of set_hline.
The patch is from bug 826, comment 4, attachment 308. The warning
there about unicode_to_cell(UCS_NO_CHAR) still applies but this patch
does not make the situation worse. I have logged a separate bug 901
about those calls.
This allows code to use document->cached instead of
find_in_cache(document->uri), thereby increasing the likelihood
of getting the correct cache entry.
This should fix Bug 756 - "assertion (cached)->object.refcount >= 0 failed"
after HTTP proxy was changed.
Patches for this were written by me and then later by Jonas.
This commit combines our independent implementations.
The quote_level was decremented unconditionally and could become negative,
resulting in a negative index after applying "modulus 2". Reproducible
with an HTML file containing "</q>".
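A sketch of the guarded decrement, with hypothetical names: quote_marks,
open_quote() and close_quote() are invented for this illustration and
are not the ELinks element handlers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical quote marks that alternate with nesting depth. */
static const char quote_marks[2] = { '"', '\'' };

/* Called for <q>: return the mark for the current depth, then nest. */
static char
open_quote(int *quote_level)
{
	assert(quote_level != NULL && *quote_level >= 0);
	return quote_marks[(*quote_level)++ % 2];
}

/* Called for </q>: leave one level only if a quote is open, so the
 * level never goes negative. In C99, -1 % 2 is -1, which would index
 * before the start of the array. */
static char
close_quote(int *quote_level)
{
	assert(quote_level != NULL && *quote_level >= 0);
	if (*quote_level > 0)
		(*quote_level)--;
	return quote_marks[*quote_level % 2];
}
```

With this guard, a stray "</q>" at level 0 leaves the level at 0 and
still yields a valid index.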
Reported by paakku.
Recognize the decimal and hexadecimal numeric character references for
CR and LF, with any number of leading zeroes. (Previously only two of
these forms were supported.) All of these are case insensitive.
Treat each CR+LF combination as a single newline.
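A rough sketch of this kind of matching, assuming the references meant
here are numeric character references for CR and LF such as "&#13;",
"&#xD;", "&#10;" and "&#xA;". Both functions are hypothetical, and the
sketch requires the trailing semicolon, which the real code may not.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* If `s` starts with a numeric character reference for CR (13) or
 * LF (10), store the code in *code and return the reference's length
 * in bytes; otherwise return 0. Accepts decimal and hexadecimal forms,
 * in either case, with any number of leading zeroes. */
static size_t
parse_newline_ref(const char *s, int *code)
{
	const char *p = s;
	unsigned long value = 0;
	int base = 10, digits = 0;

	assert(s != NULL && code != NULL);
	if (p[0] != '&' || p[1] != '#') return 0;
	p += 2;
	if (*p == 'x' || *p == 'X') {
		base = 16;
		p++;
	}
	for (;; p++, digits++) {
		int c = (unsigned char) *p;
		int d;

		if (isdigit(c)) d = c - '0';
		else if (base == 16 && isxdigit(c)) d = tolower(c) - 'a' + 10;
		else break;
		/* Stop accumulating once the value is clearly too large,
		 * so that long digit runs cannot overflow. */
		if (value <= 0xFFFF) value = value * base + d;
	}
	if (!digits || *p != ';') return 0;
	if (value != 10 && value != 13) return 0;
	*code = (int) value;
	return (size_t) (p - s) + 1;
}

/* Count newlines written as references in `s`, treating a CR reference
 * immediately followed by an LF reference as a single newline. */
static int
count_newlines(const char *s)
{
	int count = 0, prev = 0;

	while (*s) {
		int code;
		size_t len = parse_newline_ref(s, &code);

		if (!len) {
			prev = 0;
			s++;
			continue;
		}
		if (!(prev == 13 && code == 10)) count++;
		prev = code;
		s += len;
	}
	return count;
}
```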
The configure script no longer recognizes "CONFIG_UTF_8=yes" lines
in custom features.conf files. They will have to be changed to
"CONFIG_UTF8=yes". This incompatibility was deemed acceptable
because no released version of ELinks supports CONFIG_UTF_8.
The --enable-utf-8 option was not renamed.
Suggested by Miciah on #elinks.
What was renamed:
add_utf_8 => add_utf8
cp2utf_8 => cp2utf8
encode_utf_8 => encode_utf8
get_translation_table_to_utf_8 => get_translation_table_to_utf8
goto invalid_utf_8_start_byte => goto invalid_utf8_start_byte
goto utf_8 => goto utf8
goto utf_8_select => goto utf8_select
terminal_interlink.utf_8 => terminal_interlink.utf8
utf_8_to_unicode => utf8_to_unicode
What was not renamed:
terminal._template_.utf_8_io option, TERM_OPT_UTF_8_IO
Compatibility with existing elinks.conf files would require an alias.
--enable-utf-8
Because the name of the charset is UTF-8, --enable-utf-8 looks better
than --enable-utf8.
CONFIG_UTF_8
Will be renamed in a later commit.
Unicode/utf_8.cp, table_utf_8, aliases_utf_8
Will be renamed in a later commit.
Surrogates are now treated the same way as out-of-range characters
like U+110000; if a link has such an access key, then the ECMAScript
accessKey property cannot be read. It seems currently impossible to
set such an access key though, because accesskey_string_to_unicode()
doesn't support multibyte characters yet.
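The range check described here can be sketched as follows. Both function
names are hypothetical, and the UTF-8 length helper is included only to
show a caller that relies on the check.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if `c` is usable as an access key: inside the Unicode range
 * and not a UTF-16 surrogate (U+D800..U+DFFF). Hypothetical helper. */
static int
is_valid_access_key(uint32_t c)
{
	if (c > 0x10FFFF) return 0;             /* e.g. U+110000 */
	if (c >= 0xD800 && c <= 0xDFFF) return 0; /* surrogates */
	return 1;
}

/* Number of UTF-8 bytes needed to encode a valid access key. */
static int
utf8_length(uint32_t c)
{
	assert(is_valid_access_key(c));
	if (c < 0x80) return 1;
	if (c < 0x800) return 2;
	if (c < 0x10000) return 3;
	return 4;
}
```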
Drop some code for superscript and subscript handling that was deleted
in commit 65016cdca4, then added back
with the UTF-8 merge in commit 2a6125e3d0,
and then disabled in commit 1b653b9765.
Commit 3ce3f01f30 introduced a bug whereby
if a tab moved the current position in the line to or past the number
of bytes remaining in the source, the line was split after the tab.
Its return value is now an enum that lets callers know whether an
error occurred. However, this commit changes the callers only
minimally, so they do not yet check the return value.
In html_subscript, html_subscript_close, html_superscript, html_quote, and
html_quote_close, use put_chrs instead of html_context->put_chars_f.
Element handlers should use put_chrs so that it can correctly handle
whitespace and stuff.
Instead of saving the old link colours when selecting a link and using that
to restore them when unselecting it, just copy the data from the document.
- Eliminate struct link_bg and the .link_bg and .link_bg_n members
of struct document_view.
- Eliminate the free_link routine and don't call it from draw_doc,
clear_link, or detach_formatted.
- Add a .old_current_link member to struct view_state and initialise it in
init_vs.
- Don't save link_bg in draw_current_link.
- Rewrite clear_link to use the document data instead of link_bg.
- Modify init_link_drawing not to allocate link_bg and to return a pointer
to a static variable for the template character.
Introduce html_subscript_close callback. Draw opening and closing brackets
and carets for subscript and superscript text directly in the element
handlers rather than performing weirdness in the renderer. This both
improves readability and fixes bug 284, misplaced brackets with subscripts.
Add close callbacks html_html_close, html_style_close, and
html_xmp_close. end_element now calls the element close callback instead
of performing special handling for certain tags.
Note: there is an ugly bug (feature?): when there isn't enough room for a
whole double-width char, two double-width chars are displayed. It can be
seen on a table with double-width chars reduced as much as possible.
Don't replace UTF-8 bytes with '*'. There is probably a need for a better
check of what will be displayed.
Also get_current_link_title is no longer pretty and trivial. (o:
This patch modifies ELinks wrapping behavior slightly.
* The wrap command now toggles line wrapping in HTML mode, as well as
text mode. Note that when the HTML view of a page is wrapped, its
source view is unwrapped, and vice versa.
* Tabs in text-mode lines are now handled correctly.
* Wrapping a line that reaches exactly to the edge of the screen will
no longer produce a blank line in text mode.
* Text within extra-wide table cells is now wrapped to less than the
screen width, to eliminate sideways scrolling.
The last point is only enabled by setting TABLE_LINE_PADDING to a
non-negative number, in the src/setup.h header file, because it is a
significant change of behavior from previous versions.
Revert commit 2f0490cb04
('Eval embedded scripts at once') and follow-up commit
997f61bb32 ('Use document_view instead of
view_state. It is safer probably') because the change causes crashes on
numerous pages and just looks wrong.
Fixes problems with host or protocol parts not being lowercased, which
triggered an assertion failure when trying to download such links.
Reported by lindi-.
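A minimal sketch of this kind of normalization, assuming a URI of the
form scheme://host[:port][/path] with no user info.
lowercase_scheme_and_host() is a hypothetical helper, not the ELinks URI
code.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Lowercase the scheme and host of a "scheme://host/path" URI in place.
 * The path, query and fragment are left alone because they may be case
 * sensitive. Strings without "://" are not modified. Hypothetical
 * helper; it assumes there is no user info before the host. */
static void
lowercase_scheme_and_host(char *uri)
{
	char *p = uri;
	char *sep;

	assert(uri != NULL);
	sep = strstr(uri, "://");
	if (!sep) return;

	for (; *p; p++) {
		/* The host ends at the first '/', '?' or '#' after "://". */
		if (p > sep + 2 && (*p == '/' || *p == '?' || *p == '#'))
			break;
		*p = (char) tolower((unsigned char) *p);
	}
}
```

For example, "HTTP://Example.COM:8080/Path/File.TXT" would become
"http://example.com:8080/Path/File.TXT".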
Add support for normalizing a DOM document in various ways, such as
removing comments, converting CDATA section nodes to text nodes, cleaning
up whitespace, etc.
Use it in the RSS renderer to sanitize the text to be rendered.
This changes the init target to be idempotent: most importantly it will now
never overwrite a Makefile if it exists. Additionally 'make init' will
generate the .vimrc files. Yay, no more stupid 'added fairies' commits! ;)
The problem is to get access to the context when it is not the first one
and it has to happen outside of the context callbacks. This changes the
memory management so that the context adder returns the context. To further
improve the use of contexts add a context destructor which makes it
possible to unregister (temporary) contexts.