mirror of https://github.com/rkd77/elinks.git synced 2024-12-04 14:46:47 -05:00
115 Commits

Author SHA1 Message Date
Kalle Olavi Niemitalo
73f925ce21 bug 153, 1066: Convert XBEL bookmarks to/from UTF-8.
When the file is being read, Expat provides the strings to ELinks in
UTF-8, so ELinks can put them in struct bookmark without conversions.
Make sure gettext returns any placeholder strings in UTF-8, too.
Replace '\r' with ' ' in bookmark titles and URLs.

When the file is being written, put encoding="UTF-8" in the XML
declaration, and then write out the strings from struct bookmark
without character set conversions.  Do replace some characters
with entity references though, by calling add_html_to_string().
2009-02-08 18:26:04 +02:00
Kalle Olavi Niemitalo
9088f11c64 Make encode_utf8() extern even without CONFIG_UTF8.
It will soon be needed for conversions from UTF-16 to UTF-8.
2009-01-04 16:55:24 +02:00
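As a rough illustration of the UTF-16 to UTF-8 path this commit prepares for, here is a minimal sketch. The names sketch_encode_utf8() and sketch_join_surrogates() are hypothetical and do not reflect the signature of the real encode_utf8(); the sketch also assumes the RFC 3629 range (up to 0x10FFFF).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT the ELinks encode_utf8(): writes the UTF-8
 * form of code point cp into out and NUL-terminates it.  Returns the
 * number of bytes written before the NUL, or 0 if cp is a surrogate or
 * lies above 0x10FFFF. */
static size_t
sketch_encode_utf8(unsigned long cp, unsigned char out[5])
{
	size_t len;

	assert(out != NULL);
	if (cp < 0x80) {
		out[0] = (unsigned char) cp;
		len = 1;
	} else if (cp < 0x800) {
		out[0] = (unsigned char) (0xC0 | (cp >> 6));
		out[1] = (unsigned char) (0x80 | (cp & 0x3F));
		len = 2;
	} else if (cp < 0x10000) {
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0; /* a lone surrogate is not a character */
		out[0] = (unsigned char) (0xE0 | (cp >> 12));
		out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char) (0x80 | (cp & 0x3F));
		len = 3;
	} else if (cp <= 0x10FFFF) {
		out[0] = (unsigned char) (0xF0 | (cp >> 18));
		out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char) (0x80 | (cp & 0x3F));
		len = 4;
	} else {
		return 0;
	}
	out[len] = '\0';
	return len;
}

/* Hypothetical helper: combine a UTF-16 surrogate pair into one code
 * point, which can then be passed to sketch_encode_utf8(). */
static unsigned long
sketch_join_surrogates(unsigned int hi, unsigned int lo)
{
	assert(hi >= 0xD800 && hi <= 0xDBFF);
	assert(lo >= 0xDC00 && lo <= 0xDFFF);
	return 0x10000UL
	       + (((unsigned long) (hi - 0xD800) << 10) | (lo - 0xDC00));
}
```

A caller holding UTF-16 input would join each surrogate pair first and encode the resulting code point; code units outside the surrogate range can be encoded directly.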
Kalle Olavi Niemitalo
8f4d7f9903 Define cp_to_unicode() even without CONFIG_UTF8.
And make its last parameter point to const.  add_cp_html_to_string()
no longer needs to pretend UTF-8 is ISO-8859-1.
2009-01-01 16:17:03 +00:00
Kalle Olavi Niemitalo
ad45176dde Add get_terminal_codepage().
This simplifies the callers a little and may help implement
simultaneous support for different charsets on different terminals
of the same type (bug 1064).
2009-01-01 16:16:17 +00:00
Kalle Olavi Niemitalo
aaf6be8a36 Bug 1004: Fix implicit declarations of c_* functions
Add #include directives to fix these errors:

      [CC]   src/intl/gettext/l10nflist.o
cc1: warnings being treated as errors
.../src/intl/gettext/l10nflist.c: In function ‘_nl_normalize_codeset’:
.../src/intl/gettext/l10nflist.c:352: error: implicit declaration of function ‘c_tolower’

      [CC]   src/dom/css/scanner.o
cc1: warnings being treated as errors
In file included from .../src/dom/scanner.h:4,
                 from .../src/dom/css/scanner.h:4,
                 from .../src/dom/css/scanner.c:12:
.../src/dom/string.h: In function ‘dom_string_casecmp’:
.../src/dom/string.h:34: error: implicit declaration of function ‘c_strncasecmp’
2008-11-01 22:27:08 +02:00
M. Vefa Bicakci
96b3093519 Patch 2: Modifications to the remaining parts of ELinks
[Forward ported to 0.12 from bug 1004 attachment 499.  --KON]
2008-11-01 22:20:25 +02:00
Kalle Olavi Niemitalo
12d66ff043 Bug 932: Redisable 0x80...0x9F mappings in some charsets.
Bug 932 is about ELinks letting control characters 0x80...0x9F through
to the terminal.  It did not occur with ISO 8859-1, 8859-2, 8859-15,
or 8859-16, because the ELinks mappings for those charsets did not
include those bytes.  However, the www.unicode.org versions imported
in the previous commit do include the problematic bytes.

To avoid a possible regression before the ELinks 0.12.0 release,
comment those control-character mappings out again.  This workaround
should be reverted after bug 932 has been fixed properly.
2008-10-11 15:35:34 +03:00
Kalle Olavi Niemitalo
c9ca6fd448 Refresh charsets from www.unicode.org.
Add copyright and licence notices, and a NEWS entry.

The data in the new versions is not entirely the same as what ELinks
used to have:

- Unicode/8859_1.cp: Adds control characters.
- Unicode/8859_2.cp: Adds control characters.
- Unicode/8859_4.cp: Adds some control characters that ELinks assumed
  there already.
- Unicode/8859_7.cp: Adds three characters.
- Unicode/8859_15.cp: Adds control characters.
- Unicode/8859_16.cp: Adds control characters and swaps 0xA5 with 0xAB.
- Unicode/koi8_r.cp: Changes 0x95 and adds some control characters
  that ELinks assumed there already.
- Unicode/macroman.cp: Changes 0xC6 and removes some control characters
  that ELinks assumes there anyway.
2008-10-11 15:35:09 +03:00
Kalle Olavi Niemitalo
11e9a816f5 const in name_to_language 2008-02-03 14:42:07 +02:00
Kalle Olavi Niemitalo
cb90ed94f0 const in get_cp_index 2008-02-03 14:41:35 +02:00
Miciah Dashiel Butler Masters
3a0286e447 Strings corrections from Malcolm Parsons
Fix the spelling and grammar in various comments, variable names, comment
descriptions, and documentation.
2008-01-27 04:19:23 +00:00
Kalle Olavi Niemitalo
e2cc0bd434 Don't cast qsort comparison function pointers.
Instead, convert the element pointers inside the comparison functions.

The last argument of qsort() is supposed to be of type
int (*)(const void *, const void *).  Previously, comp_links() was
defined to take struct link * instead of const void *, and the type
mismatch was silenced by casting the function pointer to void *.
This was in principle not portable because:

(1) The different pointer types may have different representations.
    In a word-oriented machine, the const void * might include a byte
    selector while the struct link * might not.

(2) Casting a function pointer to a data pointer can lose bits in some
    memory models.  Apparently this does not occur in POSIX-conforming
    systems though, as dlsym() would fail if it did.

This commit also fixes hits_cmp() and compare_dir_entries(), which
had similar problems.  However, I'm leaving alias_compare() in
src/intl/gettext/localealias.c unchanged for now, so as not to diverge
from the GNU version.

I also checked the bsearch() calls but they were all okay, apart from
one that used the alias_compare() mentioned above.
2007-10-06 23:05:05 +03:00
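The pattern described in this commit can be sketched as follows. struct sketch_link and its fields are stand-ins invented for the illustration, not the real struct link, and the sort order is arbitrary; the point is only that the comparison function has exactly the type qsort() expects and converts the element pointers inside its body.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct link; the fields are assumptions. */
struct sketch_link {
	int x, y;
};

/* The parameters are const void *, so the function pointer can be
 * passed to qsort() without any cast.  The conversion to the element
 * type happens here, where it is always well defined. */
static int
sketch_comp_links(const void *v1, const void *v2)
{
	const struct sketch_link *l1 = v1;
	const struct sketch_link *l2 = v2;

	if (l1->y != l2->y)
		return (l1->y > l2->y) - (l1->y < l2->y);
	return (l1->x > l2->x) - (l1->x < l2->x);
}

/* Sort by row (y) first, then by column (x). */
static void
sketch_sort_links(struct sketch_link *links, size_t n)
{
	assert(links != NULL || n == 0);
	if (n > 0)
		qsort(links, n, sizeof(*links), sketch_comp_links);
}
```

Writing the comparison as (a > b) - (a < b) rather than a - b avoids signed overflow for values far apart.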
Kalle Olavi Niemitalo
69e9a586ba Bug 960: Redefine LOADMSGCAT_USE_MMAP instead of HAVE_MMAP.
And add a prominent notice as stipulated in GNU GPL version 2 section 2a.

[ From commit ba54124f16 in ELinks
  0.11.3.GIT.  --KON ]
2007-07-02 23:48:03 +03:00
Jonas Fonseca
110c564af3 Check if the program path contains "src/" before using ../po files
Don't look for gettext message catalogs in ../po/ unless ELinks is being
run as src/elinks, ./src/elinks, or .../src/elinks.

The problem was discovered by Arnaud Giersch.  This fix, an alternative
to the one in Debian package 0.11.1-1.4, closes Debian bug #417789 and
Red Hat bug #235411.  Also reported as CVE-2007-2027.

Restricting it to only work with --enable-debug was also considered;
however, it is an important feature for translators, so this less
paranoid fix was chosen.
2007-05-03 08:55:23 +02:00
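A check of this kind might look roughly like the sketch below. The function name sketch_run_from_src() is hypothetical, and the sketch tests for a trailing "src/elinks" path component, matching the three invocation forms listed in the commit message; the actual test in ELinks may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the program path is "src/elinks"
 * or ends in "/src/elinks", and 0 otherwise.  Only in the first case
 * would the caller go on to look for catalogs in ../po/. */
static int
sketch_run_from_src(const char *path)
{
	static const char tail[] = "src/elinks";
	const size_t tlen = sizeof(tail) - 1;
	size_t plen;

	assert(path != NULL);
	plen = strlen(path);
	if (plen < tlen)
		return 0;
	if (strcmp(path + plen - tlen, tail) != 0)
		return 0;
	/* Reject e.g. "notsrc/elinks": "src" must be a whole component. */
	return plen == tlen || path[plen - tlen - 1] == '/';
}
```

An installed binary started as elinks or /usr/bin/elinks fails the check, so a ../po directory planted relative to the current working directory is never consulted.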
Kalle Olavi Niemitalo
341d54151f Debian bug 380347: Prevent a buffer overflow in entity_cache. 2007-05-01 11:23:25 +03:00
Kalle Olavi Niemitalo
da3c8c5ce2 Bugs 879, 928, 947: Specially map U+00A0 and U+00AD in translation tables. 2007-04-26 21:39:46 +03:00
Kalle Olavi Niemitalo
70dc594d93 Bug 879: New constant UCS_SOFT_HYPHEN; use where applicable. 2007-04-22 22:38:40 +03:00
Kalle Olavi Niemitalo
9d6c0b13e8 Bug 879, u2cp_: Use UCS_NO_BREAK_SPACE instead of 0xa0. 2007-04-22 22:37:12 +03:00
Kalle Olavi Niemitalo
8c66e34323 intl: Fork get_cp_config_name off get_cp_mime_name.
This may help with bug 914 but I'm not testing that yet.
2007-03-20 20:41:05 +02:00
Kalle Olavi Niemitalo
2ac31b6144 utf8_to_unicode: Let the end parameter point to const. 2007-03-18 20:13:15 +02:00
Laurent MONIN
278dec0664 Fix gcc warning: value computed is not used. Patch by Alexey Tourbin. 2007-03-05 21:10:02 +01:00
Kalle Olavi Niemitalo
a6886634bc Make unicode_7b[] static const.
The .data section of src/intl/charsets.o is only 40 bytes now.
Inspired by bug 381.
2007-02-03 23:25:16 +02:00
Kalle Olavi Niemitalo
974a5cdffd Make entities[] static const.
Inspired by bug 381.
2007-02-03 19:51:45 +02:00
Kalle Olavi Niemitalo
ebf549ba77 Fix document.html.wrap_nbsp in UTF-8 terminals.
!CONFIG_UTF8, ISO-8859-1 doc, ASCII terminal, UTF-8 or unibyte I/O:
    ok,   ok,   ok, A0 ok
!CONFIG_UTF8, ISO-8859-1 doc, ISO-8859-1 terminal, UTF-8 or unibyte I/O:
    ok,   ok,   ok, A0 ok
!CONFIG_UTF8, UTF-8 doc, ASCII terminal, UTF-8 or unibyte I/O:
    ok,   ok,   ok, C2 A0 fail (drawn as "\001").
!CONFIG_UTF8, UTF-8 doc, ISO-8859-1 terminal, UTF-8 or unibyte I/O:
    ok,   ok,   ok, C2 A0 fail (not wrapped).
CONFIG_UTF8, ISO-8859-1 doc, ASCII terminal, UTF-8 or unibyte I/O:
    ok,   ok,   ok, A0 ok
CONFIG_UTF8, ISO-8859-1 doc, ISO-8859-1 terminal, UTF-8 or unibyte I/O:
    ok,   ok,   ok, A0 ok
CONFIG_UTF8, ISO-8859-1 doc, UTF-8 terminal, UTF-8 I/O:
  all fail (not wrapped); after patch all ok.
CONFIG_UTF8, UTF-8 doc, ASCII terminal, UTF-8 or unibyte I/O:
    ok,   ok,   ok, C2 A0 fail (drawn as "\001").
CONFIG_UTF8, UTF-8 doc, ISO-8859-1 terminal, UTF-8 or unibyte I/O:
    ok,   ok,   ok, C2 A0 fail (not wrapped)
CONFIG_UTF8, UTF-8 doc, UTF-8 terminal, UTF-8 I/O:
  all fail (not wrapped); after patch all ok.
2007-01-30 10:21:12 +02:00
Kalle Olavi Niemitalo
ae5fe80100 Document that NBSP_CHAR is not used in UTF-8 strings. 2007-01-29 20:57:37 +02:00
Kalle Olavi Niemitalo
f4709c3794 Bug 882: Replace C1 controls with spaces in UTF-8 to the terminal. 2007-01-27 11:12:22 +02:00
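The replacement named in this commit can be illustrated with a small sketch. The function names are hypothetical and the code operates on an array of already decoded code points, which is an assumption about where such a filter would sit; C1 controls are the range U+0080 through U+009F.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map a C1 control character to U+0020 SPACE and
 * leave every other code point unchanged. */
static unsigned long
sketch_filter_c1(unsigned long cp)
{
	return (cp >= 0x80 && cp <= 0x9F) ? 0x20UL : cp;
}

/* Apply the filter in place to n code points bound for the terminal. */
static void
sketch_filter_c1_run(unsigned long *cps, size_t n)
{
	size_t i;

	assert(cps != NULL || n == 0);
	for (i = 0; i < n; i++)
		cps[i] = sketch_filter_c1(cps[i]);
}
```

U+00A0 NO-BREAK SPACE lies just above the filtered range and passes through untouched.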
Kalle Olavi Niemitalo
65645624b4 cp1250, cp1257: Don't map undefined bytes to U+0000. 2007-01-27 09:58:18 +02:00
Kalle Olavi Niemitalo
a577455b24 Revise comments in struct codepage_desc and struct conv_table. 2007-01-03 07:32:00 +02:00
Kalle Olavi Niemitalo
455ea77ead Make strings[] and no_str[] const. 2007-01-02 21:40:14 +02:00
Kalle Olavi Niemitalo
1668d78998 Make cp2utf8 return a pointer to const. 2007-01-02 21:39:34 +02:00
Kalle Olavi Niemitalo
62d321fb31 Make add_utf8 accept a pointer to const. 2007-01-02 21:36:03 +02:00
Kalle Olavi Niemitalo
ef96caad01 Make u2cp and u2cp_no_nbsp return a pointer to const. 2007-01-02 20:08:59 +02:00
Kalle Olavi Niemitalo
712803bbeb Make entity_cache.result point to const. 2007-01-02 20:08:25 +02:00
Kalle Olavi Niemitalo
d314348e92 Make get_entity_string return a pointer to const. 2007-01-02 08:29:08 +02:00
Kalle Olavi Niemitalo
83f753f750 conv_table.u.str points to const. 2007-01-02 01:31:22 +02:00
Kalle Olavi Niemitalo
2434c180f2 Make no_str in charsets.c an array rather than a pointer variable.
This ensures that no other string can have the same address.  It
probably never was a problem though, because the strings to which it
can be compared either are allocated from the heap or are in
strings[][] which already has unshared storage.
2007-01-02 01:07:57 +02:00
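The reasoning behind this change can be shown with a small sketch. When a sentinel is declared as a pointer to a string literal, the C standard leaves it unspecified whether identical literals share storage, so another literal with the same contents could compare equal by address. An array has storage of its own. The names and the sentinel contents below are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* An array, not a pointer to a literal: no other object can have this
 * address.  The contents "*" are arbitrary for this sketch. */
static const char sketch_no_str[] = "*";

/* The sentinel is recognised by address, never by contents. */
static int
sketch_is_no_str(const char *s)
{
	return s == sketch_no_str;
}

/* Hypothetical lookup: return the table entry, or the sentinel when
 * the index is out of range or the entry is missing. */
static const char *
sketch_translate(const char *const table[], size_t n, size_t i)
{
	assert(table != NULL);
	return (i < n && table[i] != NULL) ? table[i] : sketch_no_str;
}
```

A string that merely has the same contents as the sentinel, for example a heap copy of "*", is not mistaken for it because the comparison never looks at the bytes.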
Kalle Olavi Niemitalo
161b46a479 Make table[] in charsets.c static.
There is no extern declaration for it anywhere.
2007-01-02 00:58:38 +02:00
Kalle Olavi Niemitalo
9d14ea4e5a Document some variables in charsets.c. 2007-01-02 00:54:14 +02:00
Kalle Olavi Niemitalo
10f1bd0efc Document struct conv_table. 2007-01-01 21:11:46 +02:00
Kalle Olavi Niemitalo
e45f5a8915 utf8char_len_tab[] is const.
This change moves 256 bytes of data into a read-only section, perhaps
reducing memory consumption when multiple ELinks processes run in parallel.
2007-01-01 17:18:05 +02:00
Kalle Olavi Niemitalo
cde14dcd18 utf8_to_unicode: Reject characters in the surrogate range.
This isn't CESU-8.
2006-12-23 01:48:07 +02:00
Kalle Olavi Niemitalo
114ce8c833 utf8_to_unicode: Reject invalid sequences, such as overlong.
Convert each byte of them to UCS_REPLACEMENT_CHARACTER.  This may not
be the optimal solution but at least it ought to be safe.  Also raise
an internal error if the value read from utf8char_len_tab[] is out of
range.

Note that ELinks is still using the RFC 2279 definition of UTF-8 and
thus allows characters up to 0x7FFFFFFF, even though RFC 3629 has
changed the maximum to 0x10FFFF.
2006-12-20 22:08:34 +02:00
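The validation described in this commit, together with the surrogate check from the commit listed above it, can be sketched as follows. This is not the ELinks utf8_to_unicode(): the name, the signature, and the SKETCH_REPLACEMENT macro are hypothetical, and unlike ELinks at the time the sketch accepts only the RFC 3629 range (sequences of up to four bytes, values up to 0x10FFFF). On any invalid sequence it consumes one byte and returns U+FFFD, so that each bad byte turns into one replacement character.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_REPLACEMENT 0xFFFDUL /* U+FFFD REPLACEMENT CHARACTER */

/* Decode one character starting at *pos (which must be before end) and
 * advance *pos past the bytes that were consumed. */
static unsigned long
sketch_utf8_to_unicode(const unsigned char **pos, const unsigned char *end)
{
	/* Smallest value that legitimately needs a sequence of len bytes. */
	static const unsigned long min_value[5] = { 0, 0, 0x80, 0x800, 0x10000 };
	const unsigned char *p = *pos;
	unsigned long cp;
	size_t len, i;

	assert(p < end);
	if (p[0] < 0x80) {
		*pos = p + 1;
		return p[0];
	} else if ((p[0] & 0xE0) == 0xC0) {
		len = 2; cp = p[0] & 0x1F;
	} else if ((p[0] & 0xF0) == 0xE0) {
		len = 3; cp = p[0] & 0x0F;
	} else if ((p[0] & 0xF8) == 0xF0) {
		len = 4; cp = p[0] & 0x07;
	} else {
		goto invalid; /* stray continuation byte or bad lead byte */
	}

	if ((size_t) (end - p) < len)
		goto invalid; /* truncated sequence */
	for (i = 1; i < len; i++) {
		if ((p[i] & 0xC0) != 0x80)
			goto invalid;
		cp = (cp << 6) | (p[i] & 0x3F);
	}
	if (cp < min_value[len])
		goto invalid; /* overlong encoding */
	if (cp >= 0xD800 && cp <= 0xDFFF)
		goto invalid; /* surrogate: this is UTF-8, not CESU-8 */
	if (cp > 0x10FFFF)
		goto invalid; /* beyond the RFC 3629 maximum */
	*pos = p + len;
	return cp;

invalid:
	*pos = p + 1;
	return SKETCH_REPLACEMENT;
}
```

For the overlong sequence C0 AF, the first call rejects C0 and the second call rejects the stray continuation byte AF, yielding two replacement characters instead of a smuggled '/'.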
Kalle Olavi Niemitalo
8b8cd57941 Use new macro UCS_ORPHAN_CELL for broken double-cell characters.
UCS_ORPHAN_CELL is currently defined as U+0020 SPACE, which was
already used before this macro, so the behaviour does not change,
but the code seems clearer now.

I searched for ' ' and 32 and 0x20 and \x20, and replaced with
UCS_ORPHAN_CELL wherever UCS_NO_CHAR was involved.  However,
some BFU widgets first draw spaces and then overwrite with text;
those will require a more complex fix if UCS_ORPHAN_CELL is ever
changed to some other character.
2006-11-13 00:49:59 +02:00
Kalle Olavi Niemitalo
7809efa1b5 Names of enum constants should be in upper case.
Requested by Miciah.
2006-11-12 14:51:18 +02:00
Kalle Olavi Niemitalo
40b6edc69d u2cp_: Make the no_nbsp_hack parameter an enum.
This is from attachment 279 of bug 811.  The change does not yet
affect any visible behaviour.
2006-11-12 14:29:09 +02:00
Jonas Fonseca
180c8befac Fix linker warning on Mac OS X by prefixing locale_charset with "elinks_"
/usr/bin/ld: warning multiple definitions of symbol _locale_charset
lib.o definition of _locale_charset in section (__TEXT,__text)
/usr/lib/libiconv.dylib(localcharset.o) definition of _locale_charset
2006-11-04 08:46:45 +01:00
Kalle Olavi Niemitalo
d050cb67aa Revert the use of wcwidth() and describe why.
This reverts the following commits:
- 86ed79deaf
  Use wcwidth if available and applicable.
- 304f5fa1ea
  comment fix (__STDC_ISO_10646__, not __STDC_ISO_10646)
- part of 71eebf1cc7
  Compensate for glibc not defining wcwidth() when _XOPEN_SOURCE is not set
And adds a lengthy comment about LC_CTYPE problems.
2006-10-22 00:05:37 +03:00
Petr Baudis
71eebf1cc7 Compensate for glibc not defining wcwidth() when _XOPEN_SOURCE is not set 2006-10-12 23:43:49 +02:00
Laurent MONIN
09991b59f1 Partial Afrikaans translation was added. Thanks to Friedel Wolff. 2006-10-11 14:39:04 +02:00
Laurent MONIN
e86e1d0fa3 Trim some trailing whitespace. 2006-09-29 00:07:54 +02:00