Jonas Fonseca
95c1de2315
Fix handling of incomplete processing instructions
...
When doing incremental rendering we now require the whole thing to be there
and that there is room for two tokens in the scanner token table. This is
necessary because we have to generate both a processing target token and a
processing data token to make life simpler for the parser.
Remove processing instruction data case label from the main parser loop. It
is safer this way since it already assumes that the processing target token
has been stored.
2006-01-28 03:35:36 +01:00
Jonas Fonseca
823c594524
Use ssize_t instead of size_t for length since it must carry a signed value
2006-01-28 03:24:16 +01:00
Jonas Fonseca
00c4e0bfa2
Do not attempt to read *string when string == scanner->end
...
There might be other places that need to be reviewed for this.
2006-01-28 03:23:06 +01:00
Jonas Fonseca
d92a074e40
Fix parsing of '<a< b>' where the scanner didn't rewind to the proper place
...
Add test for this tag soup combo.
2006-01-28 03:21:27 +01:00
Laurent MONIN
5114c9d110
Trim trailing whitespaces.
2006-01-28 01:59:22 +01:00
Jonas Fonseca
e5e06764c4
Improve checks for incompleteness when parsing attributes
...
Check whether there are '=' and value tokens before handling them. If there
is any doubt the whole attribute structure is 'pushed back' into the
stream. That way incremental parsing will not add the value as a new
attribute because the name token was handled in the previous parsing run.
2006-01-28 01:40:56 +01:00
Jonas Fonseca
4ab1dde874
Preserve the scanner state when it is not the 'default' state
...
This is necessary to make it possible to resume parsing of element
attributes. Allows the incomplete string in the parsing state struct to
be unset.
2006-01-28 01:24:30 +01:00
Jonas Fonseca
c6e83d1d9c
Assert parsing depth >= parser stack depth
...
Like the comment says, popping parsing nodes during incremental parsing
might trigger this.
2006-01-28 01:12:03 +01:00
Jonas Fonseca
9e7b0d4fa3
Remove assertion logic from parse_sgml_attributes()
...
They are getting out of hand and making it hard to use the function in
'unusual' situations (like when resuming parsing inside elements).
2006-01-28 01:09:05 +01:00
Jonas Fonseca
1e104afbba
Improve error checking when adding nodes
...
Fail with SGML_PARSER_CODE_MEM_ALLOC.
2006-01-28 01:05:42 +01:00
Jonas Fonseca
74728cab05
Also set the node subtype for <?xml-stylesheet?>
2006-01-28 01:00:28 +01:00
Jonas Fonseca
bccf5512d6
Force an incomplete token for quoted attribute values when there's no end
2006-01-28 00:56:48 +01:00
Jonas Fonseca
a2376609e3
Expand the testing of incremental parsing
...
There are still some bugs to resolve.
2006-01-28 00:50:06 +01:00
Jonas Fonseca
0f8aa77ebb
Add test for incremental SGML parsing
...
It is a loop that parses the same small document with various read sizes.
The sgml-parser program is updated to accept a --stdin option that takes the
read size as a required parameter.
2006-01-27 07:49:15 +01:00
Jonas Fonseca
b25cd27232
Add support for incremental parsing
...
That is, add the last parts that save and resume previous incomplete
parsing states. Now the parsing stack push handler checks if the parent has
a resume flag set. When set, the incomplete fragment to resume is restored
and the new source fragment appended and parsing is continued.
2006-01-27 07:47:17 +01:00
Jonas Fonseca
9d91994f3c
Postpone updating the scanner->state until incompleteness has been checked
...
That way the scanner state is meaningful when resuming during incremental
parsing.
2006-01-27 07:41:42 +01:00
Jonas Fonseca
afb45aace5
Add support for scanning comment endings such as '--!>' correctly
2006-01-25 18:18:01 +01:00
Jonas Fonseca
2eba71d95b
Add support for testing normalization using the DOM configuration module
2006-01-20 02:08:46 +01:00
Jonas Fonseca
cc61578fcb
Fix node pushing in walk_dom_nodes()
2006-01-20 02:07:24 +01:00
Jonas Fonseca
22e647813e
Fix DOM_CONFIG_NORMALIZE_WHITESPACE comment
2006-01-20 02:06:41 +01:00
Jonas Fonseca
7fe214fbb2
Fix text node appending; fix DOM configuration parser
2006-01-19 04:54:30 +01:00
Jonas Fonseca
fe43bf8a4f
Fix leaks in the DOM stack tracer
2006-01-19 04:51:33 +01:00
Jonas Fonseca
126ae8c764
#include dom/node.h instead of dom/stack.h
2006-01-19 04:50:10 +01:00
Jonas Fonseca
34b12d21bd
Upgrade to use dom_stack_codes in the callbacks
2006-01-17 16:58:19 +01:00
Jonas Fonseca
2cd151c5c5
Add parse_dom_config() which converts a textual config list to flags
2006-01-17 16:55:10 +01:00
Jonas Fonseca
1d52d67e50
Add get_dom_node_child() which searches for a node with given type
2006-01-16 07:11:02 +01:00
Jonas Fonseca
6c85c0f009
Add DOM configuration inspired module
...
It adds support for normalizing a DOM document in various ways, such as
removing comments, converting CDATA section nodes to text nodes, cleaning up
whitespace, etc.
Use it in the RSS renderer to sanitize the text to be rendered.
2006-01-16 05:12:34 +01:00
Jonas Fonseca
768f97c38e
Add get_dom_node_prev() which gets the previous sibling of a DOM node
2006-01-16 05:10:22 +01:00
Jonas Fonseca
4e6b05394d
Add DOM_STACK_CODE_FREE_NODE so callbacks can remove nodes when popping
2006-01-16 05:09:45 +01:00
Jonas Fonseca
eecc22751d
Use dom_stack_code enum for dom_stack_callback_T
2006-01-16 00:55:58 +01:00
Jonas Fonseca
4a2cde1c00
Introduce dom_stack_code enum and use it for push_dom_node()
2006-01-16 00:40:51 +01:00
Jonas Fonseca
2748d043f9
Autogenerate .vimrc files and put the master in config/vimrc
...
This changes the init target to be idempotent: most importantly it will now
never overwrite a Makefile if it exists. Additionally 'make init' will
generate the .vimrc files. Yay, no more stupid 'added fairies' commits! ;)
2006-01-15 18:38:58 +01:00
Jonas Fonseca
082031c10c
Fix SGML parser test program
2006-01-14 12:44:06 +01:00
Jonas Fonseca
c8aa6c2360
Move struct sgml_parsing_state near the parsing state managing
2006-01-14 12:11:35 +01:00
Jonas Fonseca
e70b779366
Add code member to struct sgml_parser and simplify parsing state setup
...
parse_sgml() now just pushes a text node on the parsing state and the push
handler will now call parse_sgml_plain() and save the return code in
parser->code so parse_sgml() can return it. Much simpler.
2006-01-14 12:09:17 +01:00
Jonas Fonseca
0950996dd8
Change parse_sgml() to take buf+bufsize instead of DOM string
2006-01-14 11:32:11 +01:00
Jonas Fonseca
aecfb28711
Cleanup SGML info backends #includes and description
2006-01-14 08:07:00 +01:00
Laurent MONIN
5685221512
Trim trailing whitespaces.
2006-01-13 00:11:39 +01:00
Laurent MONIN
bdc59d5ac4
Store lib.o name in a variable named LIB_O_NAME.
2006-01-12 19:06:50 +01:00
Jonas Fonseca
2d80258f72
Mark doc'd headers with: /* API Doc :: <api-name> */
2006-01-11 11:03:59 +01:00
Jonas Fonseca
dd2516f597
Oops, someone added stuff he wuz not s'posed to
2006-01-09 14:11:29 +01:00
Jonas Fonseca
620730e642
Document the DOM scanner
2006-01-09 14:01:48 +01:00
Jonas Fonseca
3b166b0633
Document the DOM stack
2006-01-09 12:44:57 +01:00
Jonas Fonseca
938c8a80b4
Support more implicit markup of source files
2006-01-09 11:01:36 +01:00
Jonas Fonseca
db11b6452f
Fix a typo and a ref:[]
2006-01-09 01:20:03 +01:00
Jonas Fonseca
5b818b20ba
Use the new asciidoc code markup to document the DOM sgml parser
2006-01-08 23:36:07 +01:00
Jonas Fonseca
2f9c406ef1
Introduce add_to_dom_string() and turn init_dom_string() into its user
2006-01-08 03:40:54 +01:00
Jonas Fonseca
acb1f7e74d
Refactor computation of scanner error string length to get_sgml_error_end()
2006-01-07 23:51:19 +01:00
Jonas Fonseca
534a16fff1
Improve error detection
2006-01-07 23:40:21 +01:00
Jonas Fonseca
3835bf8449
A handful of fixes related to error detection
...
- Fix assertion failure by breaking the switch if an error token is next
when previous was a processing instruction.
- Fix <!notation parsing by skipping ident chars instead of spaces.
- Improve checking of processing instruction 'target'-end and what error
string is generated.
- For now put all of the processing instruction data in the error token.
- Remove a DBG()-print.
2006-01-07 05:18:43 +01:00
Jonas Fonseca
97f403a9d9
Add a test file for checking detection of errors by the SGML parser
2006-01-07 05:15:16 +01:00
Jonas Fonseca
03ee543e21
Make sgml-parser request error detection when passed --error option
2006-01-07 04:27:08 +01:00
Jonas Fonseca
c993a0012e
Add basic support for detecting errors while scanning
...
It mostly uses the checking for incompleteness already in place. Tested
lightly, so it will definitely need some more work.
2006-01-07 04:26:08 +01:00
Jonas Fonseca
5defc48eb3
Add basic support for requesting error detection; SGML scanner part missing
2006-01-07 04:21:39 +01:00
Jonas Fonseca
a1e5122183
Drop unneeded URL argument and simplify test helpers
2006-01-07 02:14:45 +01:00
Jonas Fonseca
f1c3c90a4f
Move line counting tests to own file; simplifies a few things
2006-01-07 02:02:21 +01:00
Jonas Fonseca
dee8ac5b45
Move test for incompleteness to own file
2006-01-07 01:48:51 +01:00
Jonas Fonseca
7ff2cb2607
Improve a comment a bit
2006-01-07 01:41:07 +01:00
Jonas Fonseca
7c65c06b41
Move up enum sgml_parser_code declaration
2006-01-07 01:29:44 +01:00
Jonas Fonseca
c9c41e38a2
test_expect_incomplete(): Put sgml-parser output to /dev/null
2006-01-07 01:27:48 +01:00
Jonas Fonseca
f8d44ffe32
scan_sgml_tokens(): Drop local variable and use scanner->current
...
... so lower level scanners can change the next token to use.
2006-01-07 01:25:42 +01:00
Jonas Fonseca
bca330fcbd
Simplify incomplete test helper and fix quoting problem
2006-01-07 01:22:14 +01:00
Jonas Fonseca
215d7ec158
Append memdebug to test dependencies in Makefile.lib
2006-01-06 22:11:45 +01:00
Jonas Fonseca
5f5c78a87f
Realign the test docs with reality
2006-01-06 18:32:22 +01:00
Jonas Fonseca
ab8a4b2847
Add more tests based on test/comments.html
2006-01-05 15:36:18 +01:00
Laurent MONIN
31c30864e0
Trim trailing whitespaces.
2006-01-04 18:08:48 +01:00
Jonas Fonseca
0bfb1d7742
Free nodes created on the SGML parsing stack
2006-01-04 00:29:10 +01:00
Jonas Fonseca
66cf866ab6
Cleanup the DOM stack flags; s/KEEP_NODES/FREE_NODES/
2006-01-03 20:35:32 +01:00
Jonas Fonseca
7a5f699a88
Drop unneeded -b arg to cmp, which isn't available in FreeBSD's version
2006-01-03 20:00:06 +01:00
Jonas Fonseca
146ca09c43
Improve support for running 'make test' when srcdir != builddir
...
Additionally, also make TESTDEPS conditionally contain memdebug object
binary only if CONFIG_DEBUG is yes.
2006-01-03 19:04:17 +01:00
Jonas Fonseca
50183bf5d8
Add support for recursively running all tests
2006-01-03 02:07:51 +01:00
Jonas Fonseca
ba5bdfec00
Move the 'make test' handling to Makefile.lib
...
The test rule is defined when TEST_PROGS is defined. Users should also set
TESTDEPS to get the correct object files linked in.
2006-01-03 00:45:22 +01:00
Jonas Fonseca
23f0085842
Move src/dom/test/libtest test/libtest.sh, put path to it in TEST_LIB
2006-01-03 00:34:10 +01:00
Jonas Fonseca
f88cbe6761
Add check of incomplete text
2006-01-02 22:35:03 +01:00
Jonas Fonseca
42156f4477
Change one test description to start with 'Parse ...'
2006-01-02 22:31:28 +01:00
Jonas Fonseca
f75ccffbc7
Fix SGML parsing and scanning so that all tests succeed
...
This includes checking the return token of get_next_dom_scanner_token() and
fixing the calculated size of recovered processing instruction data tokens.
2006-01-02 21:04:51 +01:00
Jonas Fonseca
0160c0a464
Make it possible to test how incomplete input is parsed
...
Also fix the expected output of proc. instruction test.
2006-01-02 21:02:41 +01:00
Jonas Fonseca
e78d43f1ac
Add mode where the SGML scanner checks for completeness
2006-01-02 17:46:09 +01:00
Jonas Fonseca
af72dd8435
Make parse_sgml() return the sgml_parser_code enum
...
It is mostly just ignored for now. The SGML parser test tool will, however,
return the parser code.
2006-01-02 17:40:42 +01:00
Jonas Fonseca
29279e71b7
Add SGML_TOKEN_INCOMPLETE and handle it in the parser
2006-01-02 17:20:39 +01:00
Jonas Fonseca
2d813f2cbf
Introduce enum sgml_parser_code and make the parsers return something
2006-01-02 17:14:51 +01:00
Jonas Fonseca
fcf7677584
Skip spaces immediately when recognising '<?ident'
2006-01-02 16:58:48 +01:00
Jonas Fonseca
8c9324cc37
Add test for SGML such as, e.g. '<parent<child/></parent>'
2006-01-02 16:26:01 +01:00
Jonas Fonseca
0071ea696c
Fix logic in update_number_of_lines() and tell parse_sgml() src is complete
2006-01-02 14:59:54 +01:00
Jonas Fonseca
58c31f44a0
Clarify the code a bit
2006-01-02 03:06:47 +01:00
Jonas Fonseca
dc10be626e
The attribute parsing of proc. instruction nodes has the complete source
2006-01-02 02:44:01 +01:00
Jonas Fonseca
f608e2a0ae
Add the concept of completeness to strings being parsed and scanned
...
... not used yet.
2006-01-02 02:08:20 +01:00
Jonas Fonseca
6e9a18b444
Fix a few bugs in line counting for plain text
2006-01-02 01:49:12 +01:00
Jonas Fonseca
7717862401
Make it possible to pass --print-lines to test line counting
2006-01-02 01:48:08 +01:00
Jonas Fonseca
247debe34f
Add get_sgml_parser_line_number(), and fix a copy/paste error
2006-01-02 01:47:02 +01:00
Jonas Fonseca
275ba0b789
Use common print_indent() to simplify printf()-strings
2006-01-02 00:32:22 +01:00
Jonas Fonseca
b83bbf9c4a
Add sgml_parser_flag which can be used to specify SGML_PARSER_COUNT_LINES
2006-01-02 00:29:37 +01:00
Jonas Fonseca
1801a21b50
init_sgml_parser(): Rename flags to stack_flags
2006-01-02 00:29:36 +01:00
Laurent MONIN
54997c506f
Drop trailing whitespaces.
2006-01-02 00:15:20 +01:00
Jonas Fonseca
43b34dcb2f
Add DocBook element and attribute definitions and drop a bogus file
2006-01-01 23:59:57 +01:00
Jonas Fonseca
021af4e87c
Although ELinks doesn't need another SGML doctype, here is DocBook
...
It was created a long time ago so (I think) it deserves to survive. It
maps .sgml files to application/docbook+xml and uses the highlighter.
2006-01-01 23:22:10 +01:00
Jonas Fonseca
6b62e0cb77
Declare struct sgml_parser_state above struct sgml_parser
...
... and describe the info member.
2005-12-31 20:02:39 +01:00
Jonas Fonseca
f0148c2ecf
Keep struct sgml_parsing_state private to the parser
2005-12-31 19:59:11 +01:00
Jonas Fonseca
4a766f350b
Just for fun also parse <?xml-stylesheet attributes
2005-12-31 03:13:39 +01:00
Jonas Fonseca
a578ed4667
Make the SGML scanner (optionally) keep track of line numbers
...
A new line is either \n or \f. The main logic for counting lines is in
skip_sgml{,_chars,_space}. For the general case where line numbers are not
wanted the code tries to avoid the extra checks for newlines.
This will be useful for reporting errors when loading the XBEL file.
2005-12-31 02:46:56 +01:00
Jonas Fonseca
b23beed031
Rename skip_comment() and skip_cdata_section() to conform to skip_sgml_*()
2005-12-31 02:00:09 +01:00
Jonas Fonseca
0891cda51e
Introduce skip_sgml_space() that wraps scan_sgml(..., SGML_SCAN_WHITESPACE)
2005-12-31 01:57:54 +01:00
Jonas Fonseca
9264221635
Make init_dom_scanner() take the state arg and drop a macro
2005-12-31 01:55:38 +01:00
Jonas Fonseca
7489c134f7
Make non-terminated comments and cdata sections have 'the rest' as content
2005-12-31 01:47:57 +01:00
Jonas Fonseca
8f7f6abc16
Use skip_sgml_chars() in skip_comment() and skip_cdata_section()
2005-12-31 01:40:52 +01:00
Jonas Fonseca
4e10bcf772
Drop useless code for proc. instruction scanning
2005-12-31 01:18:49 +01:00
Jonas Fonseca
e8ff8bd5f0
Fix another off-by-one error similar to the SGML comment parsing
2005-12-31 01:14:52 +01:00
Jonas Fonseca
ab7ba39d42
Introduce skip_sgml_chars() to avoid usage of memchr()
2005-12-31 00:06:12 +01:00
Jonas Fonseca
14a3f9c0fd
Disable dom-select building since it requires defining of DOM_STACK_TRACE
2005-12-31 00:05:49 +01:00
Jonas Fonseca
9a0bf83756
Add basic stuff for XBEL parsing/highlighting using the DOM engine
2005-12-30 22:19:32 +01:00
Jonas Fonseca
aa07b3edf4
Fix old (non) problem with using VERSION identifier by #undef'ing it first
2005-12-30 22:13:13 +01:00
Jonas Fonseca
65a114f4bc
Sort the RSS elements, they are supposed to be binarily searchable
2005-12-30 21:46:44 +01:00
Jonas Fonseca
ad052c3985
Hey, hey Cripple Creek Fai^H^Herry
2005-12-30 21:19:46 +01:00
Jonas Fonseca
41f1f5f9d3
Add a simple program for testing the DOM select code
...
It accepts --uri, --src and --selector args.
2005-12-30 03:33:48 +01:00
Jonas Fonseca
4f09ac99f7
Make it possible to identify the output of DOM stack tracers
2005-12-30 03:29:17 +01:00
Jonas Fonseca
0ddb5f2d18
Use the DOM stack tracer for getting a dump of active nodes
2005-12-30 03:02:59 +01:00
Jonas Fonseca
bd1beb1fab
Use the stack when creating the select node hierarchy
2005-12-30 02:59:34 +01:00
Jonas Fonseca
4868c23a06
Cleanup the DOM test Makefile so it's more generic and more silent
2005-12-30 02:19:25 +01:00
Jonas Fonseca
76a524ddf6
More <?xml and comment tests, fix an off-by-one error in comment skipping
2005-12-29 22:26:39 +01:00
Jonas Fonseca
bd877570d2
Test some more obscure proc. instructions and fix some assertion failures
2005-12-29 21:52:27 +01:00
Jonas Fonseca
57168e1fbc
Handle <element path=/to/%61-&\one";files/> as a self-closing tag
...
Before the '/' before '>' would be interpreted as part of the attribute
value. Hope this is sensible slurping of the markup soup.
2005-12-29 20:38:43 +01:00
Jonas Fonseca
958a4a1b51
Add tests for more things like space handling and obscure formatting
2005-12-29 19:13:48 +01:00
Jonas Fonseca
beb8337fc5
Add rule to make test run from src/dom dir
2005-12-29 18:33:59 +01:00
Jonas Fonseca
1a177491a0
Fix SGML parsing of processing instructions (<?xml ...?>)
...
It involves adding a new scanner state which is used only to generate a new
processing instruction (PI) data token. This removes some scanner specific
code from the parser and makes handling of PIs more generic. The data of
XML PIs are still parsed as attributes and added to the PI node.
The 6th test now succeeds. Hurrah!
2005-12-29 18:31:49 +01:00
Jonas Fonseca
c24c67ce59
Make it possible to initialise a scanner in a specific state
2005-12-29 18:20:03 +01:00
Jonas Fonseca
889a0f16f8
Fix the expected output of processing instruction parsing
...
Spaces after the target should be skipped.
2005-12-29 18:00:26 +01:00
Jonas Fonseca
ba5dbd3a18
Add test_output_equals helper and add a few more tests
...
The last one fails for now. Incorrect parsing of processing instructions.
2005-12-29 06:54:41 +01:00
Jonas Fonseca
23f21f1924
Fine tune how some of the nodes are printed, fix string compressing
2005-12-29 06:50:51 +01:00
Jonas Fonseca
602d2d8a66
Add README for the test infrastructure mostly pasted from git/t/README
2005-12-29 05:12:36 +01:00
Jonas Fonseca
d394cb0bc1
Grab Git's shell script-based test infrastructure and add "Hello world" test
2005-12-29 04:44:03 +01:00
Jonas Fonseca
f42b39ee3c
Fix indentation so that things are printed at the start of the line
2005-12-29 04:39:20 +01:00
Jonas Fonseca
8dcbaa76f3
sgml-parser: Make it possible to specify the URL and the source to parse
2005-12-29 04:29:13 +01:00
Jonas Fonseca
c475f1fc0c
Drop linking with util/string.o, since memdebug no longer requires it
2005-12-28 23:07:06 +01:00
Jonas Fonseca
4feba6d515
Use stdio when printing enhanced values instead of allocating first
2005-12-28 23:02:45 +01:00
Jonas Fonseca
4bbc25c532
Remove dependency on util/string.h from dom/string.h
2005-12-28 21:20:55 +01:00
Jonas Fonseca
9bd346c295
dom_scanner_token_contains(): Use strcasecmp() instead of strlcasecmp()
2005-12-28 21:18:08 +01:00
Jonas Fonseca
fb6ca9a390
Use dom_string for storing the name member of dom_scanner_string_mapping
2005-12-28 21:10:05 +01:00
Jonas Fonseca
f1015f8a6a
Make files include dom/string.h instead of util/string.h directly
2005-12-28 20:45:55 +01:00
Jonas Fonseca
e34d0d3de4
Initialize the string->length in init_dom_string()
2005-12-28 19:49:22 +01:00
Jonas Fonseca
1b71368459
Add proof-of-concept stand-alone test binary which just prints Hello World
...
May it multiply in great numbers and help to stabilize the DOM
implementation.
2005-12-28 17:10:01 +01:00
Jonas Fonseca
1bd0c8758e
Make the DOM node creators take dom_string structs
2005-12-28 16:47:28 +01:00
Jonas Fonseca
ec7b293e4e
Some minor cleanup of token string access
2005-12-28 16:34:42 +01:00
Jonas Fonseca
6e163b186c
Make the dom_scanner_token store its string in a dom_string struct
2005-12-28 16:23:36 +01:00
Jonas Fonseca
97c702c674
Make init_dom_select() take dom_string struct
2005-12-28 15:57:37 +01:00
Jonas Fonseca
2e4e404145
Make init_dom_scanner() take the source string as a dom_string struct
2005-12-28 15:55:21 +01:00
Jonas Fonseca
62d981c551
Store struct dom_scan_table_info data in a dom_string
2005-12-28 15:51:31 +01:00
Jonas Fonseca
73785bee02
Remove some unneeded #includes
2005-12-28 15:36:58 +01:00
Jonas Fonseca
dbf0948062
Do not decode entity references and fix the tree tracer for document nodes
...
The idea is to make the DOM thing not depend on too many external things.
2005-12-28 15:27:05 +01:00
Jonas Fonseca
d1e275be52
Make parse_sgml() take buffer as dom_string struct
2005-12-28 15:21:45 +01:00
Jonas Fonseca
11e168aba4
Make init_sgml_parser() take URI as dom_string struct
2005-12-28 15:19:10 +01:00