24 Commits

Author SHA1 Message Date
Kalle Olavi Niemitalo 62818a39f9 Unicode/gen-case: Upgrade ISC licence to July 2007 version
I had already done this to my other scripts on 2008-09-28 (commit
c67885d880) but missed Unicode/gen-case.
Update it, and list it in COPYING.

(Although Unicode/gen-case is part of the source tree, this version of
ELinks does not use that file for anything.)
(cherry picked from elinks-0.12 commit c7602eb744)
2012-11-03 23:01:28 +02:00
Kalle Olavi Niemitalo 12d66ff043 Bug 932: Redisable 0x80...0x9F mappings in some charsets.
Bug 932 is about ELinks letting control characters 0x80...0x9F through
to the terminal.  It did not occur with ISO 8859-1, 8859-2, 8859-15,
or 8859-16, because the ELinks mappings for those charsets did not
include those bytes.  However, the www.unicode.org versions imported
in the previous commit do include the problematic bytes.

To avoid a possible regression before the ELinks 0.12.0 release,
comment those control-character mappings out again.  This workaround
should be reverted after bug 932 has been fixed properly.
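
The effect of this workaround can be pictured with a small C sketch. It
assumes a hypothetical 128-entry table covering bytes 0x80...0xFF and a
hypothetical UCS_NO_CHAR marker for unmapped bytes; neither name comes
from the ELinks sources. The commit itself comments the mappings out in
the charset data, so this only illustrates the resulting state.

```c
#include <stdint.h>

/* Hypothetical marker for "this byte has no mapping". */
#define UCS_NO_CHAR 0xFFFFFFFFu

/* Nonzero if ucs is a C1 control character (U+0080...U+009F). */
static int is_c1_control(uint32_t ucs)
{
	return ucs >= 0x80 && ucs <= 0x9F;
}

/* Drop every C1 control mapping from a table that covers bytes
 * 0x80...0xFF (index 0 is byte 0x80).  Returns how many entries
 * were dropped. */
static int drop_c1_mappings(uint32_t highhalf[128])
{
	int dropped = 0;

	for (int i = 0; i < 128; i++) {
		if (is_c1_control(highhalf[i])) {
			highhalf[i] = UCS_NO_CHAR;
			dropped++;
		}
	}
	return dropped;
}
```

With an ISO 8859-1 style table, where byte N maps to U+00NN, this drops
the 32 entries for 0x80...0x9F and leaves 0xA0...0xFF untouched.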
2008-10-11 15:35:34 +03:00
Kalle Olavi Niemitalo c9ca6fd448 Refresh charsets from www.unicode.org.
Add copyright and licence notices, and a NEWS entry.

The data in the new versions is not entirely the same as what ELinks
used to have:

- Unicode/8859_1.cp: Adds control characters.
- Unicode/8859_2.cp: Adds control characters.
- Unicode/8859_4.cp: Adds some control characters that ELinks assumed
  there already.
- Unicode/8859_7.cp: Adds three characters.
- Unicode/8859_15.cp: Adds control characters.
- Unicode/8859_16.cp: Adds control characters and swaps 0xA5 with 0xAB.
- Unicode/koi8_r.cp: Changes 0x95 and adds some control characters
  that ELinks assumed there already.
- Unicode/macroman.cp: Changes 0xC6 and removes some control characters
  that ELinks assumes there anyway.
2008-10-11 15:35:09 +03:00
Kalle Olavi Niemitalo e2d7ce588f Relicense my Perl scripts to ISC license
The primary motivation for this change is that the disclaimer now
refers to the author whereas it used to refer to the copyright holder.

The ISC license is the preferred license for new code in OpenBSD.
http://www.openbsd.org/policy.html
http://www.openbsd.org/cgi-bin/cvsweb/src/share/misc/license.template?rev=1.2

I am also removing the reference to "the same terms as Perl itself"
because those terms are not being distributed with ELinks.  Anyway,
Perl 5 is dual licensed under the Artistic License and the GNU General
Public License (version 1 or later), and the ISC license seems GPL
compatible to me.
2008-03-23 13:28:06 +02:00
Kalle Olavi Niemitalo a6886634bc Make unicode_7b[] static const.
The .data section of src/intl/charsets.o is only 40 bytes now.
Inspired by bug 381.
2007-02-03 23:25:16 +02:00
Kalle Olavi Niemitalo 974a5cdffd Make entities[] static const.
Inspired by bug 381.
2007-02-03 19:51:45 +02:00
Kalle Olavi Niemitalo 65645624b4 cp1250, cp1257: Don't map undefined bytes to U+0000. 2007-01-27 09:58:18 +02:00
Kalle Olavi Niemitalo 4a5af7fd26 Bug 381: Store codepage-to-Unicode mappings as dense arrays.
Previously, each mapping between a codepage byte and a Unicode
character was stored as a struct table_entry, which listed both the
byte and the character.  This representation may be optimal for sparse
mappings, but codepages map almost every possible byte to a character,
so it is more efficient to just have an array that lists the Unicode
character corresponding to each byte from 0x80 to 0xFF.  The bytes are
not stored but rather implied by the array index.  The tcvn5712 and
viscii codepages have a total of four mappings that do not fit in the
arrays, so we still use struct table_entry for those.

This change also makes cp2u() operate in O(1) time and may speed up
other functions as well.
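
The two representations can be compared in a minimal C sketch. All
names here (sparse_entry, sparse_lookup, dense_lookup, UCS_NO_CHAR) are
hypothetical and are not the actual ELinks definitions; the sketch also
assumes that bytes below 0x80 map to themselves.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for "this byte has no mapping". */
#define UCS_NO_CHAR 0xFFFFFFFFu

/* Sparse form: every entry stores the byte and its character. */
struct sparse_entry {
	unsigned char byte;
	uint32_t ucs;
};

/* Sparse lookup: linear search over n entries, O(n). */
static uint32_t sparse_lookup(const struct sparse_entry *table, size_t n,
			      unsigned char byte)
{
	for (size_t i = 0; i < n; i++) {
		if (table[i].byte == byte)
			return table[i].ucs;
	}
	return UCS_NO_CHAR;
}

/* Dense lookup: the byte is implied by the array index, O(1).
 * highhalf[0] corresponds to byte 0x80, highhalf[127] to 0xFF. */
static uint32_t dense_lookup(const uint32_t highhalf[128],
			     unsigned char byte)
{
	if (byte < 0x80)
		return byte;	/* assumption: ASCII maps to itself */
	return highhalf[byte - 0x80];
}
```

The dense table costs one character per slot even for unmapped bytes,
which is why it pays off only when almost every byte has a mapping.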

The "sed | while read" concoction in Unicode/gen-cp looks rather
unhealthy.  It would probably be faster and more readable if rewritten
in Perl, but IMO that goes for the previous version as well, so I
suppose whoever wrote it had a reason not to use Perl here.

Before:

   text	   data	    bss	    dec	    hex	filename
  38948	  28528	   3311	  70787	  11483	src/intl/charsets.o
 500096	  85568	  82112	 667776	  a3080	src/elinks

After:

   text	   data	    bss	    dec	    hex	filename
  31558	  28528	   3311	  63397	   f7a5	src/intl/charsets.o
 492878	  85568	  82112	 660558	  a144e	src/elinks

So the text section shrank by 7390 bytes.

Measured on i686-pc-linux-gnu with: --disable-xbel --disable-nls
--disable-cookies --disable-formhist --disable-globhist
--disable-mailcap --disable-mimetypes --disable-smb --disable-mouse
--disable-sysmouse --disable-leds --disable-marks --disable-css
--enable-small --enable-utf-8 --without-gpm --without-bzlib
--without-idn --without-spidermonkey --without-lua --without-gnutls
--without-openssl CFLAGS="-Os -ggdb -Wall"
2006-09-24 16:55:29 +03:00
Kalle Olavi Niemitalo 62d6db44aa Bug 381: Make codepage data const.
Before:

   text	   data	    bss	    dec	    hex	filename
  25726	  62992	   3343	  92061	  1679d	src/intl/charsets.o
 653856	 120020	  82144	 856020	  d0fd4	src/elinks

After:

   text	   data	    bss	    dec	    hex	filename
  60190	  28528	   3311	  92029	  1677d	src/intl/charsets.o
 688320	  85556	  82112	 855988	  d0fb4	src/elinks

So 34464 bytes were moved from the data section to the text section
and should be more likely to get shared between ELinks processes.
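
The figures agree on both sides: 62992 - 28528 = 34464 bytes left the
data column and 60190 - 25726 = 34464 bytes arrived in the text column.
A minimal C sketch of the kind of change involved follows. The table
names and values are hypothetical; the point is only that a const
object with static storage is typically placed in a read-only section,
which size(1) counts under "text" in its default output.

```c
#include <stdint.h>

/* Hypothetical tables; names and values are illustrative only. */

/* Not const: initialized writable data normally lands in .data. */
static uint32_t table_writable[4] = { 0x00A0, 0x00A1, 0x00A2, 0x00A3 };

/* const: normally lands in a read-only section such as .rodata,
 * whose pages can stay shared between processes. */
static const uint32_t table_const[4] = { 0x00A0, 0x00A1, 0x00A2, 0x00A3 };

/* Bounds-checked accessors; out-of-range indexes return 0. */
static uint32_t lookup_writable(unsigned int i)
{
	return i < 4 ? table_writable[i] : 0;
}

static uint32_t lookup_const(unsigned int i)
{
	return i < 4 ? table_const[i] : 0;
}
```

Compiling such a file and running size on the object file before and
after adding const should show the same kind of shift between columns,
though the exact section placement depends on the toolchain.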

Measured on i686-pc-linux-gnu with: --disable-xbel --disable-nls
--disable-cookies --disable-formhist --disable-globhist
--disable-mailcap --disable-mimetypes --disable-smb --disable-mouse
--disable-sysmouse --disable-leds --disable-marks --disable-css
--enable-small --enable-utf-8 --without-gpm --without-bzlib
--without-idn --without-spidermonkey --without-lua --without-gnutls
--without-openssl CFLAGS="-O2 -ggdb -Wall"
2006-09-24 11:59:23 +03:00
Kalle Olavi Niemitalo 9c94a896b7 Internally rename the utf_8 codepage to utf8. 2006-09-17 16:23:17 +03:00
Kalle Olavi Niemitalo f7fd49cf28 UTF-8: New function unicode_fold_label_case and a related script. 2006-08-06 20:02:42 +00:00
Jonas Fonseca d2e346436a Hmm, seems b.delta decided not to become 0x03B4 like it should 2006-01-10 15:39:11 +01:00
Jonas Fonseca aa75cade23 Reinsert part of comment for nVDash; fixes 8e0eda5e4d 2006-01-03 23:38:37 +01:00
Jonas Fonseca b5065e7a17 Add header about where to get the SGML entity database from unicode.org
... and summon up the local changes made.
2006-01-03 17:20:50 +01:00
Jonas Fonseca 8c684e8c73 Skip entities with unknown unicode (0x????) 2006-01-03 17:12:58 +01:00
Jonas Fonseca 395a64f569 Merge in the public entity set names from the unicode.org database
This also fixes b.delta to have the correct value 0x03B4. The main differences
from ELinks' entity database are:

 - entities not in the unicode database from 1997:
   Scomma, Tcomma, euro, scomma, tcomma
 - obsolete entities kept for compatibility:
   emdash, endash, hibar
2006-01-03 17:10:19 +01:00
Jonas Fonseca 8e0eda5e4d Merge in the 0x???? chars and fix some incomplete descriptions 2006-01-03 16:48:11 +01:00
Jonas Fonseca 3e6c08ce12 Move the SGML entity database back to the format used by unicode.org 2006-01-03 16:43:31 +01:00
Jonas Fonseca af089507dc Remove unneeded Unicode/.gitignore 2005-12-25 02:41:03 +01:00
Laurent MONIN df065ead80 Remove now useless $Id: lines. 2005-10-21 09:14:07 +02:00
Petr Baudis 06ea255a22 Convert part of the build to the new build system
The root makefile is converted as well as some leaf Makefiles. This
also brings in the required infrastructure and adjusts configure.in
appropriately.

I converted only makefiles containing no configurable stuff, since
that'll require more consideration yet.
2005-09-15 21:03:56 +02:00
Jonas Fonseca 7462f22635 Remove now obsolete .cvsignore files. 2005-09-15 18:33:20 +02:00
Jonas Fonseca e54f78bf3f Oops, missed the generated stuff in Unicode/. 2005-09-15 18:29:59 +02:00
Petr Baudis 0f6d4310ad Initial commit of the HEAD branch of the ELinks CVS repository, as of
Thu Sep 15 15:57:07 CEST 2005. The previous history can be added to this
by grafting.
2005-09-15 15:58:31 +02:00