Weak points:
- alignof
- js problems
Todo:
- make js work with C++ and mozjs-17
- then mozjs-24
- then mozjs-52
- then mozjs-60
- decrease number of warnings
Make u2cp_() map code points U+0080 to U+009F via strange_chars[] even
if the target codepage is UTF-8. This helps with buggy web pages that
use ’ when they mean ’. This change does not affect how
ELinks decodes raw bytes 0x80 to 0x9F in HTML.
u2cp_() is used only via the u2cp and u2cp_no_nbsp macros.
Possible side effects of this change at each use of these macros:
* get_translation_table(): Not affected because it does not call u2cp
if the target codepage is UTF-8.
* get_entity_string(): Numeric character references are affected, as intended.
Character entity references are not affected because entities[]
does not define any entities in the U+0080...U+009F range.
* kbd_field(), term_send_ucs(), field_op(): Affected. It is no longer
possible to enter code points U+0080...U+009F from the terminal.
This should not be a problem in practice because those would be
control characters anyway and should therefore be filtered by the
slave process (which doesn't yet recognize them; bug 777).
After viewing a page with Big5 charset and next a page with UTF-8
charset iconv was used, so slows down a bit.
Now iconv is used only for those charsets, which has the iconv
bitfield set.
I am not hooking these to "make test", for two reasons:
1. utf8_step_forward is inside #ifdef CONFIG_UTF8 and I don't see
how to make tests conditional on such options.
2. test/libtest.sh was copied from Git, which is under GPLv2-only.
Adding more dependencies on it could make ELinks more difficult
to relicense under GPLv2-or-later.
This check used to be in src/elinks.h. Move it to configure.in so
that (1) the result can be logged and (2) ELinks won't even link with
TRE if wchar_t prevents its use.
Also, rename HAVE_TRE_REGEX_H to CONFIG_TRE, to reflect that it is not
always defined if the header exists.
C99 6.7.4p3 and 6.7.4p6 set some constraints on what can be done in
inline functions and how they can be declared. In particular, any
function declared inline must also be defined in the same translation
unit. To comply with that, remove inline specifiers from function
declarations in header files when the functions are not also defined
in those header files.
Sun Studio 11 on Solaris 9 is stricter than C99 and does not allow
references to static identifiers in extern inline functions. Make the
configure script detect this and define NONSTATIC_INLINE accordingly
in config.h. Then use that in the definitions of all non-static
inline functions.
Document the restrictions and this scheme in doc/hacking.txt.
When the file is being read, Expat provides the strings to ELinks in
UTF-8, so ELinks can put them in struct bookmark without conversions.
Make sure gettext returns any placeholder strings in UTF-8, too.
Replace '\r' with ' ' in bookmark titles and URLs.
When the file is being written, put encoding="UTF-8" in the XML
declaration, and then write out the strings from struct bookmark
without character set conversions. Do replace some characters
with entity references though, by calling add_html_to_string().
This simplifies the callers a little and may help implement
simultaneous support for different charsets on different terminals
of the same type (bug 1064).
In get_entity_string and point_intersect, do not initialise arrays with
static storage duration to zero; the C standard states that such objects
are automatically initialised to zero.
Add #include directives to fix these errors:
[CC] src/intl/gettext/l10nflist.o
cc1: warnings being treated as errors
.../src/intl/gettext/l10nflist.c: In function ‘_nl_normalize_codeset’:
.../src/intl/gettext/l10nflist.c:352: error: implicit declaration of function ‘c_tolower’
[CC] src/dom/css/scanner.o
cc1: warnings being treated as errors
In file included from .../src/dom/scanner.h:4,
from .../src/dom/css/scanner.h:4,
from .../src/dom/css/scanner.c:12:
.../src/dom/string.h: In function ‘dom_string_casecmp’:
.../src/dom/string.h:34: error: implicit declaration of function ‘c_strncasecmp’
Bug 932 is about ELinks letting control characters 0x80...0x9F through
to the terminal. It did not occur with ISO 8859-1, 8859-2, 8859-15,
or 8859-16, because the ELinks mappings for those charsets did not
include those bytes. However, the www.unicode.org versions imported
in the previous commit do include the problematic bytes.
To avoid a possible regression before the ELinks 0.12.0 release,
comment those control-character mappings out again. This workaround
should be reverted after bug 932 has been fixed properly.
Add copyright and licence notices, and a NEWS entry.
The data in the new versions is not entirely the same as what ELinks
used to have:
- Unicode/8859_1.cp: Adds control characters.
- Unicode/8859_2.cp: Adds control characters.
- Unicode/8859_4.cp: Adds some control characters that ELinks assumed
there already.
- Unicode/8859_7.cp: Adds three characters.
- Unicode/8859_15.cp: Adds control characters.
- Unicode/8859_16.cp: Adds control characters and swaps 0xA5 with 0xAB.
- Unicode/koi8_r.cp: Changes 0x95 and adds some control characters
that ELinks assumed there already.
- Unicode/macroman.cp: Changes 0xC6 and removes some control characters
that ELinks assumes there anyway.
Instead, convert the element pointers inside the comparison functions.
The last argument of qsort() is supposed to be of type
int (*)(const void *, const void *). Previously, comp_links() was
defined to take struct link * instead of const void *, and the type
mismatch was silenced by casting the function pointer to void *.
This was in principle not portable because:
(1) The different pointer types may have different representations.
In a word-oriented machine, the const void * might include a byte
selector while the struct link * might not.
(2) Casting a function pointer to a data pointer can lose bits in some
memory models. Apparently this does not occur in POSIX-conforming
systems though, as dlsym() would fail if it did.
This commit also fixes hits_cmp() and compare_dir_entries(), which
had similar problems. However, I'm leaving alias_compare() in
src/intl/gettext/localealias.c unchanged for now, so as not to diverge
from the GNU version.
I also checked the bsearch() calls but they were all okay, apart from
one that used the alias_compare() mentioned above.
Don't look for gettext message catalogs in ../po/ unless ELinks is being
run as src/elinks, ./src/elinks, or .../src/elinks.
Discovered by Arnaud Giersch, this alternate fix (than what is in debian
package 0.11.1-1.4) closes debian bug #417789 and redhat bug #235411.
Also reported in: CVE-2007-2027.
Restricting it to only work with --enable-debug was also considered,
however, it is an important feature for translaters so this less
paranoid fix was chosen.
This ensures that no other string can have the same address. It
probably never was a problem though, because the strings to which it
can be compared either are allocated from the heap or are in
strings[][] which already has unshared storage.
Convert each byte of them to UCS_REPLACEMENT_CHARACTER. This may not
be the optimal solution but at least it ought to be safe. Also raise
an internal error if the value read from utf8char_len_tab[] is out of
range.
Note that ELinks is still using the RFC 2279 definition of UTF-8 and
thus allows characters up to 0x7FFFFFFF, even though RFC 3629 has
changed the maximum to 0x10FFFF.
UCS_ORPHAN_CELL is currently defined as U+0020 SPACE, which was
already used before this macro, so the behaviour does not change,
but the code seems clearer now.
I searched for ' ' and 32 and 0x20 and \x20, and replaced with
UCS_ORPHAN_CELL wherever UCS_NO_CHAR was involved. However,
some BFU widgets first draw spaces and then overwrite with text;
those will require a more complex fix if UCS_ORPHAN_CELL is ever
changed to some other character.
/usr/bin/ld: warning multiple definitions of symbol _locale_charset
lib.o definition of _locale_charset in section (__TEXT,__text)
/usr/lib/libiconv.dylib(localcharset.o) definition of _locale_charset
This reverts the following commits:
- 86ed79deaf
Use wcwidth if available and applicable.
- 304f5fa1ea
comment fix (__STDC_ISO_10646__, not __STDC_ISO_10646)
- part of 71eebf1cc7
Compensate for glibc not defining wcwidth() when _XOPEN_SOURCE is not set
And adds a lengthy comment about LC_CTYPE problems.