mirror of https://github.com/rkd77/elinks.git synced 2024-12-04 14:46:47 -05:00

I18N bug 1112: Use strange_chars[] for UTF-8 output too

Make u2cp_() map code points U+0080 to U+009F via strange_chars[] even
if the target codepage is UTF-8.  This helps with buggy web pages that
use &#x92; when they mean &#x2019;.  This change does not affect how
ELinks decodes raw bytes 0x80 to 0x9F in HTML.
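
The ordering described above (remap the C1 range first, then encode for the target codepage) can be sketched in isolation. This is only an illustration, not the ELinks code: c1_to_graphic[], encode_utf8() and sketch_u2utf8() are hypothetical names, and the table assumes the Windows-1252 assignments, which may differ from the actual contents of strange_chars[].

```c
#include <stddef.h>

/* Hypothetical stand-in for strange_chars[]: maps U+0080..U+009F to the
 * Windows-1252 characters that page authors usually mean.
 * 0 means "no graphical replacement". */
static const unsigned int c1_to_graphic[32] = {
	0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
	0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
	0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
	0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178,
};

/* Encode one code point as NUL-terminated UTF-8.
 * buf must hold at least 5 bytes.  Returns the encoded length. */
static size_t encode_utf8(unsigned int u, char *buf)
{
	size_t n = 0;

	if (u < 0x80) {
		buf[n++] = (char) u;
	} else if (u < 0x800) {
		buf[n++] = (char) (0xC0 | (u >> 6));
		buf[n++] = (char) (0x80 | (u & 0x3F));
	} else if (u < 0x10000) {
		buf[n++] = (char) (0xE0 | (u >> 12));
		buf[n++] = (char) (0x80 | ((u >> 6) & 0x3F));
		buf[n++] = (char) (0x80 | (u & 0x3F));
	} else {
		buf[n++] = (char) (0xF0 | (u >> 18));
		buf[n++] = (char) (0x80 | ((u >> 12) & 0x3F));
		buf[n++] = (char) (0x80 | ((u >> 6) & 0x3F));
		buf[n++] = (char) (0x80 | (u & 0x3F));
	}
	buf[n] = '\0';
	return n;
}

/* Remap U+0080..U+009F before choosing the output encoding, so that
 * UTF-8 output gets the same treatment as 8-bit codepages.
 * Returns 0 and an empty string if the code point has no mapping. */
static size_t sketch_u2utf8(unsigned int u, char *buf)
{
	if (u >= 0x80 && u < 0xA0) {
		u = c1_to_graphic[u - 0x80];
		if (!u) {
			buf[0] = '\0';
			return 0;
		}
	}
	return encode_utf8(u, buf);
}
```

Under these assumptions, sketch_u2utf8(0x92, buf) yields the three bytes E2 80 99 (U+2019) instead of the two-byte encoding of the control character U+0092, and sketch_u2utf8(0x81, buf) yields nothing.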

u2cp_() is used only via the u2cp and u2cp_no_nbsp macros.
Possible side effects of this change at each use of these macros:

* get_translation_table(): Not affected because it does not call u2cp
  if the target codepage is UTF-8.
* get_entity_string(): Numeric character references are affected, as intended.
  Character entity references are not affected because entities[]
  does not define any entities in the U+0080...U+009F range.
* kbd_field(), term_send_ucs(), field_op(): Affected.  It is no longer
  possible to enter code points U+0080...U+009F from the terminal.
  This should not be a problem in practice because those would be
  control characters anyway and should therefore be filtered by the
  slave process (which doesn't yet recognize them; bug 777).
Kalle Olavi Niemitalo 2011-04-17 18:09:29 +03:00 committed by Kalle Olavi Niemitalo
parent 17712f9cf3
commit 450f227ea1
2 changed files with 10 additions and 7 deletions

NEWS

@@ -27,6 +27,11 @@ Bugs that should be removed from NEWS before the 0.12.0 release:
   ``elinks.action''.
 * critical bug 1083: Avoid an infinite loop when trying to decompress
   malformed data. Caused by the bug 1068 fix in ELinks 0.12pre3.
+* bug 1112: Map most numeric character references &#x80; ... &#x9F;
+  to graphical characters also when the output charset is UTF-8.
+  ELinks 0.12pre1 was the first release that supported UTF-8 as the
+  terminal charset, and ELinks 0.12pre5 was the first release that
+  supported UTF-8 as the dump charset.
 ELinks 0.12pre5:
 ----------------


@@ -189,6 +189,11 @@ u2cp_(unicode_val_T u, int to, enum nbsp_mode nbsp_mode)
 	if (u < 128) return strings[u];
+	if (u < 0xa0) {
+		u = strange_chars[u - 0x80];
+		if (!u) return NULL;
+	}
 	to &= ~SYSTEM_CHARSET_FLAG;
 	if (is_cp_ptr_utf8(&codepages[to]))
@@ -202,13 +207,6 @@ u2cp_(unicode_val_T u, int to, enum nbsp_mode nbsp_mode)
 	}
 	if (u == UCS_SOFT_HYPHEN) return "";
-	if (u < 0xa0) {
-		unicode_val_T strange = strange_chars[u - 0x80];
-		if (!strange) return NULL;
-		return u2cp_(strange, to, nbsp_mode);
-	}
 	if (u < 0xFFFF)
 		for (j = 0; j < 0x80; j++)
 			if (codepages[to].highhalf[j] == u)