elinks

mirror of https://github.com/rkd77/elinks.git synced 2025-02-02 15:09:23 -05:00

Author	SHA1	Message	Date
Kalle Olavi Niemitalo	65645624b4	cp1250, cp1257: Don't map undefined bytes to U+0000.	2007-01-27 09:58:18 +02:00
Kalle Olavi Niemitalo	4a5af7fd26	Bug 381: Store codepage-to-Unicode mappings as dense arrays. Previously, each mapping between a codepage byte and a Unicode character was stored as a struct table_entry, which listed both the byte and the character. This representation may be optimal for sparse mappings, but codepages map almost every possible byte to a character, so it is more efficient to just have an array that lists the Unicode character corresponding to each byte from 0x80 to 0xFF. The bytes are not stored but rather implied by the array index. The tcvn5712 and viscii codepages have a total of four mappings that do not fit in the arrays, so we still use struct table_entry for those. This change also makes cp2u() operate in O(1) time and may speed up other functions as well. The "sed | while read" concoction in Unicode/gen-cp looks rather unhealthy. It would probably be faster and more readable if rewritten in Perl, but IMO that goes for the previous version as well, so I suppose whoever wrote it had a reason not to use Perl here. Before: text data bss dec hex filename 38948 28528 3311 70787 11483 src/intl/charsets.o 500096 85568 82112 667776 a3080 src/elinks After: text data bss dec hex filename 31558 28528 3311 63397 f7a5 src/intl/charsets.o 492878 85568 82112 660558 a144e src/elinks So the text section shrank by 7390 bytes. Measured on i686-pc-linux-gnu with: --disable-xbel --disable-nls --disable-cookies --disable-formhist --disable-globhist --disable-mailcap --disable-mimetypes --disable-smb --disable-mouse --disable-sysmouse --disable-leds --disable-marks --disable-css --enable-small --enable-utf-8 --without-gpm --without-bzlib --without-idn --without-spidermonkey --without-lua --without-gnutls --without-openssl CFLAGS="-Os -ggdb -Wall"	2006-09-24 16:55:29 +03:00
Kalle Olavi Niemitalo	62d6db44aa	Bug 381: Make codepage data const. Before: text data bss dec hex filename 25726 62992 3343 92061 1679d src/intl/charsets.o 653856 120020 82144 856020 d0fd4 src/elinks After: text data bss dec hex filename 60190 28528 3311 92029 1677d src/intl/charsets.o 688320 85556 82112 855988 d0fb4 src/elinks So 34464 bytes were moved from the data section to the text section and should be more likely to get shared between ELinks processes. Measured on i686-pc-linux-gnu with: --disable-xbel --disable-nls --disable-cookies --disable-formhist --disable-globhist --disable-mailcap --disable-mimetypes --disable-smb --disable-mouse --disable-sysmouse --disable-leds --disable-marks --disable-css --enable-small --enable-utf-8 --without-gpm --without-bzlib --without-idn --without-spidermonkey --without-lua --without-gnutls --without-openssl CFLAGS="-O2 -ggdb -Wall"	2006-09-24 11:59:23 +03:00
Kalle Olavi Niemitalo	9c94a896b7	Internally rename the utf_8 codepage to utf8.	2006-09-17 16:23:17 +03:00
Kalle Olavi Niemitalo	f7fd49cf28	UTF-8: New function unicode_fold_label_case and a related script.	2006-08-06 20:02:42 +00:00
Jonas Fonseca	d2e346436a	Hmm, seem b.delta decided not to become 0x03B4 like it should	2006-01-10 15:39:11 +01:00
Jonas Fonseca	aa75cade23	Reinsert part of comment for nVDash; fixes 8e0eda5e4d4	2006-01-03 23:38:37 +01:00
Jonas Fonseca	b5065e7a17	Add header about where to get the SGML entity database from unicode.org ... and summon up the local changes made.	2006-01-03 17:20:50 +01:00
Jonas Fonseca	8c684e8c73	Skip entities with unknown unicode (0x????)	2006-01-03 17:12:58 +01:00
Jonas Fonseca	395a64f569	Merge in the public entity set names from the unicode.org database This also fixes b.delta to have the correct value 0x03B4. The main difference to ELinks' entity database is: - entities not in the unicode database from 1997: Scomma, Tcomma, euro, scomma, tcomma - obsolete entities kept for compatibility: emdash, endash, hibar	2006-01-03 17:10:19 +01:00
Jonas Fonseca	8e0eda5e4d	Merge in the 0x???? chars and fix some incomplete descriptions	2006-01-03 16:48:11 +01:00
Jonas Fonseca	3e6c08ce12	Move the SGML entity database back to the format used by unicode.org	2006-01-03 16:43:31 +01:00
Jonas Fonseca	af089507dc	Remove unneeded Unicode/.gitignore	2005-12-25 02:41:03 +01:00
Laurent MONIN	df065ead80	Remove now useless $Id: lines.	2005-10-21 09:14:07 +02:00
Petr Baudis	06ea255a22	Convert part of the build to the new build system The root makefile is converted as well as some leaf Makefiles. This also brings in the required infrastructure and adjusts configure.in appropriately. I converted only makefiles containing no configurable stuff, since that'll require more consideration yet.	2005-09-15 21:03:56 +02:00
Jonas Fonseca	7462f22635	Remove now obsolete .cvsignore files.	2005-09-15 18:33:20 +02:00
Jonas Fonseca	e54f78bf3f	Oops, missed the generated stuff in Unicode/.	2005-09-15 18:29:59 +02:00
Petr Baudis	0f6d4310ad	Initial commit of the HEAD branch of the ELinks CVS repository, as of Thu Sep 15 15:57:07 CEST 2005. The previous history can be added to this by grafting.	2005-09-15 15:58:31 +02:00

18 Commits
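
The dense-array representation described in commit 4a5af7fd26 (with the const data from 62d6db44aa and the undefined-byte handling from 65645624b4) can be illustrated with a minimal sketch. This is not the ELinks source: the names `demo_highhalf`, `demo_cp2u`, and `DEMO_NO_CHAR` are hypothetical, the table holds only two sample entries modeled on cp1250, and treating a zero table entry as "undefined" is an assumption made for this example.

```c
#include <stdint.h>

/* Hypothetical sentinel for "this byte has no Unicode mapping". */
#define DEMO_NO_CHAR 0xFFFFFFFFu

/* Dense table covering bytes 0x80..0xFF. The byte itself is not stored;
 * it is implied by the array index (index = byte - 0x80).
 * Declared const so the data can live in a read-only section.
 * Entries left unset are zero and are treated as undefined below. */
static const uint16_t demo_highhalf[128] = {
    [0x80 - 0x80] = 0x20AC, /* byte 0x80 -> U+20AC EURO SIGN */
    [0xA9 - 0x80] = 0x00A9, /* byte 0xA9 -> U+00A9 COPYRIGHT SIGN */
};

/* Constant-time lookup: one bounds check and one array read. */
static uint32_t demo_cp2u(const uint16_t *highhalf, unsigned char byte)
{
    if (byte < 0x80)
        return byte; /* the ASCII range maps to itself */

    uint16_t u = highhalf[byte - 0x80];

    /* Report undefined bytes with the sentinel, not as U+0000. */
    return u != 0 ? (uint32_t)u : DEMO_NO_CHAR;
}
```

With this table, `demo_cp2u(demo_highhalf, 0x80)` yields 0x20AC, while an unmapped byte such as 0x81 yields `DEMO_NO_CHAR`. A sparse list of (byte, character) pairs would instead need a search per lookup, which is the cost the commit message says the dense array avoids.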