mirror of https://github.com/rkd77/elinks.git synced 2024-12-04 14:46:47 -05:00

68 Commits

Author SHA1 Message Date
Witold Filipczyk
c5a7f87c43 Bug 1060: Use libtre for regexp searches.
When the user tells ELinks to search for a regexp, ELinks 0.11.0
passes the regexp to regcomp() and the formatted document to
regexec(), both in the terminal charset.  This works OK for unibyte
ASCII-compatible charsets because the regexp metacharacters are all in
the ASCII range.  And ELinks 0.11.0 doesn't support multibyte or
ASCII-incompatible (e.g. EBCDIC) charsets in terminals, so it is no
big deal if regexp searches fail in such locales.

ELinks 0.12pre1 attempts to support UTF-8 as the terminal charset if
CONFIG_UTF8 is defined.  Then, struct search contains unicode_val_T c
rather than unsigned char c, and get_srch() and add_srch_chr()
together save UTF-32 values there if the terminal charset is UTF-8.
In plain-text searches, is_in_range_plain() compares those values
directly if the search is case sensitive, or folds them to lower case
if the search is case insensitive: with towlower() if the terminal
charset is UTF-8, or with tolower() otherwise.  In regexp searches
however, get_search_region_from_search_nodes() still truncates all
values to 8 bits in order to generate the string that
search_for_pattern() then passes to regexec().  In UTF-8 locales,
regexec() expects this string to be in UTF-8 and can't make sense of
the truncated characters.  There is also a possible conflict in
regcomp() if the locale is UTF-8 but the terminal charset is not, or
vice versa.
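The paragraph above contrasts two code paths: plain-text search compares full code points, while the old regexp path squeezes each one into a byte. The C sketch below mimics both under stated assumptions. `unicode_val_T` is taken to be a 32-bit unsigned type, and `chars_match()` and `truncate_to_bytes()` are hypothetical simplifications of what `is_in_range_plain()` and `get_search_region_from_search_nodes()` are described as doing, not ELinks code. Truncating U+0141 (Ł) to 8 bits yields 0x41, so `regexec()` would see a plain `A`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <wctype.h>

/* Assumption: a 32-bit code point type like ELinks' unicode_val_T. */
typedef uint32_t unicode_val_T;

/* Hypothetical plain-text comparison: full code points, optional folding. */
static int chars_match(unicode_val_T a, unicode_val_T b,
                       int case_sensitive, int utf8_term)
{
    if (case_sensitive)
        return a == b;
    if (utf8_term)  /* UTF-8 terminal: values are UTF-32, fold with towlower() */
        return towlower((wint_t) a) == towlower((wint_t) b);
    /* Unibyte terminal charset: values fit in one byte, fold with tolower() */
    return tolower((unsigned char) a) == tolower((unsigned char) b);
}

/* Hypothetical old regexp path: every value is truncated to 8 bits. */
static void truncate_to_bytes(const unicode_val_T *src, size_t n,
                              unsigned char *dst)
{
    assert(dst != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char) src[i];  /* bits above 0xFF are lost */
}
```

For example, truncating `{ 0x0141, 0x0105, 'x' }` produces the bytes 0x41 0x05 0x78: one letter turns into an unrelated ASCII letter and another into a control character.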

Rejected ways of fixing the charset mismatches:

* When the terminal charset is UTF-8, recode the formatted document
  from UTF-32 to UTF-8 for regexp searching.  This would work if the
  terminal and the locale both use UTF-8, or if both use unibyte
  ASCII-compatible charsets, but not if only one of them uses UTF-8.

* Convert both the regexp and the formatted document to the charset of
  the locale, as that is what regcomp() and regexec() expect.  ELinks
  would have to somehow keep track of which bytes in the converted
  string correspond to which characters in the document; not entirely
  trivial because convert_string() can replace a single unconvertible
  character with a string of ASCII characters.  If ELinks were
  eventually changed to use iconv() for unrecognized charsets, such
  tracking would become even harder.

* Temporarily switch to a locale that uses the charset of the
  terminal.  Unfortunately, it seems there is no portable way to
  construct a name for such a locale.  It is also possible that no
  suitable locale is available; especially on Windows, whose C library
  defines MB_LEN_MAX as 2 and thus cannot support UTF-8 locales.
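To show why byte-to-character tracking in the second rejected approach is "not entirely trivial", here is a hedged C sketch of a converter that keeps an offset map. It assumes, for simplicity, that the locale charset is plain ASCII; `ascii_replacement()` and `convert_with_map()` are hypothetical stand-ins and do not reflect the real signature or tables of `convert_string()`. Because one source character can expand to several output bytes, every match offset reported by `regexec()` would have to be translated back through `map[]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t unicode_val_T;  /* assumption: 32-bit code points */

/* Hypothetical replacement table for characters missing from ASCII. */
static const char *ascii_replacement(unicode_val_T c)
{
    switch (c) {
    case 0x2026: return "...";  /* HORIZONTAL ELLIPSIS */
    case 0x00C6: return "AE";   /* LATIN CAPITAL LETTER AE */
    default:     return "?";
    }
}

/* Convert src[0..n) to ASCII in dst and record in map[j] the index of the
 * source character that produced dst[j].  Returns the number of bytes
 * written, or (size_t) -1 if dst_size is too small.  dst is not
 * NUL-terminated, and a NUL code point produces no output in this sketch. */
static size_t convert_with_map(const unicode_val_T *src, size_t n,
                               char *dst, size_t *map, size_t dst_size)
{
    size_t out = 0;

    assert(dst != NULL && map != NULL);
    for (size_t i = 0; i < n; i++) {
        char one[2] = { 0, 0 };
        const char *rep;

        if (src[i] < 0x80) {
            one[0] = (char) src[i];
            rep = one;
        } else {
            rep = ascii_replacement(src[i]);
        }
        for (const char *p = rep; *p; p++) {
            if (out >= dst_size)
                return (size_t) -1;
            dst[out] = *p;
            map[out] = i;  /* each output byte points back at its source */
            out++;
        }
    }
    return out;
}
```

With the input `{ 'a', 0x2026, 'b' }` the output is `a...b` and the map is `{0, 1, 1, 1, 2}`, so a match starting at byte 4 corresponds to document character 2, not 4.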

Instead, this commit makes ELinks do the regexp matching with regwcomp
and regwexec from the TRE library.  This way, ELinks can losslessly
recode both the pattern and the document to Unicode and rely on the
regexp code in TRE decoding them properly, regardless of locale.
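A minimal sketch of the recoding step that feeds a wide-character regexp API, assuming `unicode_val_T` is a 32-bit type and that `wchar_t` holds UTF-32 whenever it can represent U+10FFFF (the assumption that the problems listed below call into question). `utf32_to_wcs()` is a hypothetical helper; the actual `regwcomp()`/`regwexec()` calls are left out because they require libtre.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

typedef uint32_t unicode_val_T;  /* assumption: 32-bit code points */

/* Copy a UTF-32 buffer into a newly allocated, NUL-terminated wchar_t
 * string.  Returns NULL on allocation failure, on size overflow, or when
 * wchar_t is too narrow to hold every code point losslessly. */
static wchar_t *utf32_to_wcs(const unicode_val_T *src, size_t n)
{
    wchar_t *dst;

    assert(src != NULL || n == 0);
    if (WCHAR_MAX < 0x10FFFF)
        return NULL;  /* e.g. 16-bit wchar_t: refuse rather than truncate */
    if (n > SIZE_MAX / sizeof(*dst) - 1)
        return NULL;

    dst = malloc((n + 1) * sizeof(*dst));
    if (!dst)
        return NULL;
    for (size_t i = 0; i < n; i++)
        dst[i] = (wchar_t) src[i];  /* assumes wchar_t values are UTF-32 */
    dst[n] = L'\0';
    return dst;
}
```

The caller owns the returned buffer and must `free()` it. Refusing on a narrow `wchar_t` makes the Windows limitation explicit instead of silently corrupting characters above U+FFFF.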

There are some possible problems though:

1. ELinks stores strings as UTF-32 in arrays of unicode_val_T, but TRE
   uses wchar_t instead.  If wchar_t is UTF-16, as it is on Microsoft
   Windows, then TRE will misdecode the strings.  It wouldn't be too
   hard to make ELinks convert to UTF-16 in this case, but (a) TRE
   doesn't currently support UTF-16 either, and it seems possible that
   wchar_t-independent UTF-32 interfaces will be added to TRE; and (b)
   there seems to be little interest in using ELinks on Windows anyway.

2. The Citrus Project apparently wanted BSD to use a locale-dependent
   wchar_t: e.g. UTF-32 in some locales and an ISO 2022 derivative in
   others.  Regexp searches in ELinks now do not support the latter.
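Problem 1 can be made concrete. In UTF-16, a code point above U+FFFF becomes a surrogate pair of two 16-bit units; the hypothetical helper below encodes one code point that way (U+1F600 becomes 0xD83D 0xDE00). A regexp engine that treats each `wchar_t` as one whole character would see two characters there, so, for example, `.` could match half of the pair. This is a sketch of the conversion the commit calls "not too hard", not code from ELinks or TRE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t unicode_val_T;  /* assumption: 32-bit code points */

/* Encode one code point as UTF-16.  Writes 1 or 2 code units to out and
 * returns how many were written, or 0 for values that are not Unicode
 * scalar values (surrogates, or anything above U+10FFFF). */
static size_t utf32_to_utf16(unicode_val_T c, uint16_t out[2])
{
    assert(out != NULL);
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0;
    if (c < 0x10000) {
        out[0] = (uint16_t) c;  /* BMP: a single code unit */
        return 1;
    }
    if (c > 0x10FFFF)
        return 0;
    c -= 0x10000;                                  /* 20 bits remain */
    out[0] = (uint16_t) (0xD800 | (c >> 10));      /* high surrogate */
    out[1] = (uint16_t) (0xDC00 | (c & 0x3FF));    /* low surrogate */
    return 2;
}
```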

[ Adapted to elinks-0.12 from bug 1060 attachment 506.
  Commit message by me.  --KON ]
2009-02-08 18:26:22 +02:00
Jonas Fonseca
705acfa05a Use git tools instead of cogito for getting the build ID
The build ID now includes the last tagged version, the commit
generation since that tagged version, the leading characters of the
commit ID, and a flag for a dirty working tree.
(cherry picked from commit c2a0d3b969)
2008-03-01 13:55:16 +02:00
Laurent MONIN
b389d1b20e Revert "Use the new OBJS-unless$(CONFIG_FOO) instead of $(call not,...)"
This reverts commit e07354f5d5.
2007-09-14 17:43:36 +02:00
Jonas Fonseca
e07354f5d5 Use the new OBJS-unless$(CONFIG_FOO) instead of $(call not,...) 2007-09-14 16:48:26 +02:00
Jonas Fonseca
1079c95b9d Integrate Doxygen better in the build system
This change:

 - Adds a check for the doxygen program to configure.
 - Moves the Doxyfile from src/Doxyfile to doc/Doxyfile.in.
 - Generates a doc/Doxyfile from doc/Doxyfile.in inserting
   an absolute path to the source directory, so that it
   also works when builddir != srcdir.
 - Adds a `make api` rule for running doxygen; it depends on
   an api/doxygen file, which is never created, so that the
   rule always runs.
2007-08-08 14:23:21 +02:00
Jonas Fonseca
8bfab2242e Fix 'make test' dependency when building test utility programs
The problem was caused by undefined symbols:

	src/util/conv.c:308: undefined reference to `is_cp_utf8'
	src/util/conv.c:320: undefined reference to `cp2u'
2007-05-26 13:46:12 +02:00
Witold Filipczyk
8688e623d4 Used the builtin macro RM in place of defined UNINSTALL. 2007-02-24 11:09:55 +02:00
Witold Filipczyk
cf86e2e72f Added SEE_CFLAGS to the Makefile.config. Use SEE_CFLAGS only when necessary. 2007-02-18 17:09:32 +02:00
Kalle Olavi Niemitalo
388b1b0efd Define datarootdir in Makefile.config.in, for better Autoconf compatibility.
With Autoconf 2.60a, the default values of datadir, infodir, and
mandir refer to ${datarootdir}.  If Makefile.config.in does not define
datarootdir, Autoconf detects this and expands ${datarootdir} when it
substitutes expressions like @datadir@, but it also outputs the
following warning:

config.status: creating Makefile.config
config.status: WARNING:  /home/Kalle/src/elinks/Makefile.config.in seems to ignore the --datarootdir setting

According to a comment in config.status, "This hack should be removed
a few years after 2.60."  So it seems best to prepare for that now by
defining datarootdir = @datarootdir@ in Makefile.config.in.  Earlier
versions of Autoconf may leave that line unexpanded; but because the
makefiles do not directly refer to ${datarootdir}, there's no harm.
2006-09-17 17:55:53 +03:00
Kalle Olavi Niemitalo
92cb452a9e Rename CONFIG_UTF_8 to CONFIG_UTF8.
The configure script no longer recognizes "CONFIG_UTF_8=yes" lines
in custom features.conf files.  They will have to be changed to
"CONFIG_UTF8=yes".  This incompatibility was deemed acceptable
because no released version of ELinks supports CONFIG_UTF_8.

The --enable-utf-8 option was not renamed.
2006-09-17 16:12:47 +03:00
Kalle Olavi Niemitalo
57a9871ea1 Prepend $(top_builddir) to @INSTALL@ if it is relative.
Reported to elinks-users on 2006-08-23.
2006-09-10 08:57:55 +03:00
Witold Filipczyk
fcc00bcfd9 Added uninstall target to the Makefile. 2006-09-03 09:27:21 +02:00
Witold Filipczyk
4f78b0dda1 True color mode. See new konsole.
TODO: dump
2006-08-19 23:39:40 +02:00
Kalle Olavi Niemitalo
40e257bedd build: Don't use $(AM_CFLAGS) anymore. Use $(CPPFLAGS) instead.
$(AM_CFLAGS) is one of the variables set by Automake, which ELinks no
longer uses.  $(CPPFLAGS) should be used whenever the C preprocessor
is run, according to the GNU Coding Standards.  (My build environment
does have an important -I option there.)
2006-08-05 12:36:20 +02:00
Witold Filipczyk
2a6125e3d0 Merge with utf8. src/document/plain/renderer.c replaced by utf8 version 2006-07-21 13:12:06 +02:00
Laurent MONIN
4b04a25b32 Support for negotiate-auth, using GSSAPI. It makes it possible to
authenticate users via Kerberos. Patch by Karel Zak.
2006-06-14 14:41:59 +02:00
Witold Filipczyk
7d1a966239 LZMA encoding support using the LZMA SDK. The original lzma executable decompresses faster than this code; I have no idea why. 2006-03-24 12:30:54 +01:00
Pavol Babincak
f9d67aeb73 Added configure option --enable-utf-8
For enabling better UTF-8 support by Witek and Scrool.
2006-02-18 20:28:00 +01:00
4aaafc4716 Introduced garbage collector. Disabled by default 2006-01-30 22:09:13 +01:00
Jonas Fonseca
20baf3b207 Newer versions of AsciiDoc (7.0.5 at least) need to be passed --unsafe
Detect this in configure and set ASCIIDOC_FLAGS accordingly.
2006-01-27 02:32:06 +01:00
Jonas Fonseca
16ff8a444f Move setting of TEST_LIB to Makefile.lib 2006-01-19 02:15:56 +01:00
Jonas Fonseca
206037eaa4 Handle the logic for util/{md5,sha1} in the Makefile 2006-01-19 02:08:07 +01:00
Jonas Fonseca
359d835050 Handle the logic for util/scanner in the Makefile; less CONFIG_* variables 2006-01-19 01:24:42 +01:00
bb9b4437fa - FSP protocol 2006-01-16 11:40:13 +01:00
Jonas Fonseca
ec9383b575 More build speed ups by using native make for more stuff
Nearly halves traversal of an empty tree.
2006-01-14 16:01:37 +01:00
Jonas Fonseca
9bd280b4f7 Use $(CURDIR) defined by make instead of using pwd 2006-01-14 14:50:42 +01:00
Jonas Fonseca
97fe1db841 Minor Makefile fixes 2006-01-14 10:59:58 +01:00
Laurent MONIN
bdc59d5ac4 Store lib.o name in a variable named LIB_O_NAME. 2006-01-12 19:06:50 +01:00
Laurent MONIN
f84692b89a Keep CONFIG_* list sorted. 2006-01-12 18:43:05 +01:00
Laurent MONIN
fd39e595a3 CONFIG_SEE -> CONFIG_ECMASCRIPT_SEE. 2006-01-12 17:33:33 +01:00
Laurent MONIN
018c4268b1 Add missing CONFIG_NLS. 2006-01-11 20:47:58 +01:00
Laurent MONIN
ef685396f3 Update to recent CONFIG_* macros. 2006-01-11 20:26:24 +01:00
Laurent MONIN
3f9bb0d7f9 CONFIG_BEOS -> CONFIG_OS_BEOS 2006-01-11 20:12:59 +01:00
Laurent MONIN
202965d338 CONFIG_WIN32 -> CONFIG_OS_WIN32 2006-01-11 20:10:27 +01:00
Laurent MONIN
86f5f2cf48 CONFIG_UNIX -> CONFIG_OS_UNIX 2006-01-11 20:10:27 +01:00
Laurent MONIN
9eafa94fd9 CONFIG_RISCOS -> CONFIG_OS_RISCOS 2006-01-11 20:10:26 +01:00
Laurent MONIN
b6ccfc0e07 CONFIG_OS2 -> CONFIG_OS_OS2 2006-01-11 20:10:26 +01:00
Laurent MONIN
a9b8abb70c 2006-01-11 14:14:11 +01:00
Laurent MONIN
52537b6733 2006-01-11 14:10:58 +01:00
Laurent MONIN
76751d1935 2006-01-11 14:10:51 +01:00
Laurent MONIN
5805586f0f 2006-01-11 14:10:41 +01:00
Laurent MONIN
4b2b5798ab 2006-01-11 14:07:17 +01:00
Laurent MONIN
f7a2dfc12a CONFIG_LUA -> CONFIG_SCRIPTING_LUA 2006-01-11 14:06:13 +01:00
Laurent MONIN
138ceef262 Add missing CONFIG_ECMASCRIPT_SMJS; it fixes building without SEE and
with SpiderMonkey ECMAScript.
2006-01-11 13:44:33 +01:00
witekfl
d8592e4f99 Alternative experimental ECMAScript engine. 2006-01-10 19:17:29 +01:00
Jonas Fonseca
23f0085842 Move src/dom/test/libtest to test/libtest.sh and put the path to it in TEST_LIB 2006-01-03 00:34:10 +01:00
Jonas Fonseca
748bab64a7 Make the printed install paths simpler for man5 files when srcdir == builddir 2005-12-30 00:49:01 +01:00
Jonas Fonseca
61185ff34e Make ECMAScript browser scripting configurable
Either set CONFIG_SM_SCRIPTING in features.conf or pass to ./configure the
option --disable-sm-scripting. Now scripting is also enabled when needed
and not only if some other scripting backend is enabled.

Remove some remnants of SEE scripting backend.
2005-12-25 02:23:54 +01:00
Russ Rowan
3c9f192267 Colorize Pasky's build system a bit. 2005-12-15 02:44:15 -05:00
Jonas Fonseca
68d692724c Add rules to check all .c files with sparse
... and things ain't looking too good. Lots of warnings.
2005-11-24 13:24:19 +01:00