mirror of https://github.com/rkd77/elinks.git (synced 2024-12-04 14:46:47 -05:00)
I18N bug 1112: Use strange_chars[] for UTF-8 output too
Make u2cp_() map code points U+0080 to U+009F via strange_chars[] even if the target codepage is UTF-8. This helps with buggy web pages that use &#146; when they mean &#8217;. This change does not affect how ELinks decodes raw bytes 0x80 to 0x9F in HTML.

u2cp_() is used only via the u2cp and u2cp_no_nbsp macros. Possible side effects of this change at each use of these macros:

* get_translation_table(): Not affected because it does not call u2cp if the target codepage is UTF-8.
* get_entity_string(): Numeric character references are affected, as intended. Character entity references are not affected because entities[] does not define any entities in the U+0080...U+009F range.
* kbd_field(), term_send_ucs(), field_op(): Affected. It is no longer possible to enter code points U+0080...U+009F from the terminal. This should not be a problem in practice because those would be control characters anyway and should therefore be filtered by the slave process (which doesn't yet recognize them; bug 777).
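The remapping described above can be sketched as a 32-entry lookup. This is only an illustration: demo_strange_chars[] and demo_remap_c1() are hypothetical names, and the table is filled in from the Windows-1252 layout, which is an assumption; the contents of ELinks's real strange_chars[] may differ.

```c
#include <stdint.h>

/* Hypothetical stand-in for strange_chars[]: one entry per code point
 * U+0080..U+009F, taken from the Windows-1252 layout. A zero entry
 * means "no graphical replacement". ELinks's real table may differ. */
static const uint32_t demo_strange_chars[32] = {
	0x20ac, 0x0000, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021, /* 80..87 */
	0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x0000, 0x017d, 0x0000, /* 88..8F */
	0x0000, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014, /* 90..97 */
	0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x0000, 0x017e, 0x0178, /* 98..9F */
};

/* Return the replacement for a code point in U+0080..U+009F, 0 when the
 * table has none, or u itself when u lies outside that range. */
static uint32_t
demo_remap_c1(uint32_t u)
{
	if (u >= 0x80 && u < 0xa0)
		return demo_strange_chars[u - 0x80];
	return u;
}
```

With this table, demo_remap_c1(0x92) yields U+2019, the quotation mark that a page writing &#146; most likely intended, while demo_remap_c1(0x81) yields 0, which corresponds to the case where u2cp_() returns NULL.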
parent 17712f9cf3
commit 450f227ea1
NEWS (5 changes)
@@ -27,6 +27,11 @@ Bugs that should be removed from NEWS before the 0.12.0 release:
   ``elinks.action''.
 * critical bug 1083: Avoid an infinite loop when trying to decompress
   malformed data.  Caused by the bug 1068 fix in ELinks 0.12pre3.
+* bug 1112: Map most numeric character references &#128; ... &#159;
+  to graphical characters also when the output charset is UTF-8.
+  ELinks 0.12pre1 was the first release that supported UTF-8 as the
+  terminal charset, and ELinks 0.12pre5 was the first release that
+  supported UTF-8 as the dump charset.

 ELinks 0.12pre5:
 ----------------
@@ -189,6 +189,11 @@ u2cp_(unicode_val_T u, int to, enum nbsp_mode nbsp_mode)

 	if (u < 128) return strings[u];

+	if (u < 0xa0) {
+		u = strange_chars[u - 0x80];
+		if (!u) return NULL;
+	}
+
 	to &= ~SYSTEM_CHARSET_FLAG;

 	if (is_cp_ptr_utf8(&codepages[to]))
@@ -202,13 +207,6 @@ u2cp_(unicode_val_T u, int to, enum nbsp_mode nbsp_mode)
 	}
 	if (u == UCS_SOFT_HYPHEN) return "";

-	if (u < 0xa0) {
-		unicode_val_T strange = strange_chars[u - 0x80];
-
-		if (!strange) return NULL;
-		return u2cp_(strange, to, nbsp_mode);
-	}
-
 	if (u < 0xFFFF)
 		for (j = 0; j < 0x80; j++)
 			if (codepages[to].highhalf[j] == u)
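The effect of moving the U+0080...U+009F check ahead of the UTF-8 branch can be sketched as follows. All names here (demo_strange(), demo_encode_utf8(), demo_u2utf8()) are hypothetical, the replacement table is cut down to three assumed entries, and the sketch assumes that the UTF-8 branch simply encodes the code point it is given, which the hunks above do not show.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-entry subset of a replacement table;
 * 0 means "no graphical replacement". */
static uint32_t
demo_strange(uint32_t u)
{
	switch (u) {
	case 0x80: return 0x20ac;
	case 0x92: return 0x2019;
	case 0x9f: return 0x0178;
	default:   return 0;
	}
}

/* Encode a code point below 0x10000 as UTF-8 (surrogates not checked).
 * Returns the number of bytes written to out. */
static size_t
demo_encode_utf8(uint32_t u, unsigned char out[3])
{
	if (u < 0x80) {
		out[0] = (unsigned char) u;
		return 1;
	}
	if (u < 0x800) {
		out[0] = (unsigned char) (0xc0 | (u >> 6));
		out[1] = (unsigned char) (0x80 | (u & 0x3f));
		return 2;
	}
	out[0] = (unsigned char) (0xe0 | (u >> 12));
	out[1] = (unsigned char) (0x80 | ((u >> 6) & 0x3f));
	out[2] = (unsigned char) (0x80 | (u & 0x3f));
	return 3;
}

/* remap_first == 0 models the old order (encode first, never remap);
 * remap_first != 0 models the new order (remap, then encode).
 * Returns the byte count, or 0 when the code point is dropped. */
static size_t
demo_u2utf8(uint32_t u, int remap_first, unsigned char out[3])
{
	if (remap_first && u >= 0x80 && u < 0xa0) {
		u = demo_strange(u);
		if (!u) return 0;
	}
	return demo_encode_utf8(u, out);
}
```

Under these assumptions, U+0092 comes out as the two bytes C2 92 (a C1 control character) in the old order and as E2 80 99 (U+2019) in the new one, and a code point with no replacement, such as U+0081, is dropped in the new order.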