Jonas Fonseca
acb1f7e74d
Refactor computation of scanner error string length to get_sgml_error_end()
2006-01-07 23:51:19 +01:00
Jonas Fonseca
534a16fff1
Improve error detection
2006-01-07 23:40:21 +01:00
Jonas Fonseca
3835bf8449
A handful of fixes related to error detection
- Fix assertion failure by breaking the switch if an error token is next
when previous was a processing instruction.
- Fix <!notation parsing by skipping ident chars instead of spaces.
- Improve checking of where the processing instruction target ends and of
which error string is generated.
- For now, put all of the processing instruction data in the error token.
- Remove a DBG()-print.
2006-01-07 05:18:43 +01:00
Jonas Fonseca
c993a0012e
Add basic support for detecting errors while scanning
It mostly reuses the incompleteness checking already in place. It has only
been tested lightly, so it will definitely need some more work.
2006-01-07 04:26:08 +01:00
Jonas Fonseca
5defc48eb3
Add basic support for requesting error detection; SGML scanner part missing
2006-01-07 04:21:39 +01:00
Jonas Fonseca
7ff2cb2607
Improve a comment a bit
2006-01-07 01:41:07 +01:00
Jonas Fonseca
7c65c06b41
Move up enum sgml_parser_code declaration
2006-01-07 01:29:44 +01:00
Jonas Fonseca
f8d44ffe32
scan_sgml_tokens(): Drop local variable and use scanner->current
... so lower level scanners can change the next token to use.
2006-01-07 01:25:42 +01:00
Laurent MONIN
31c30864e0
Trim trailing whitespace.
2006-01-04 18:08:48 +01:00
Jonas Fonseca
0bfb1d7742
Free nodes created on the SGML parsing stack
2006-01-04 00:29:10 +01:00
Jonas Fonseca
66cf866ab6
Cleanup the DOM stack flags; s/KEEP_NODES/FREE_NODES/
2006-01-03 20:35:32 +01:00
Jonas Fonseca
f75ccffbc7
Fix SGML parsing and scanning so that all tests succeed
This includes checking the return token of get_next_dom_scanner_token() and
fixing the calculated size of recovered processing instruction data tokens.
2006-01-02 21:04:51 +01:00
Jonas Fonseca
e78d43f1ac
Add a mode where the SGML scanner checks for completeness
2006-01-02 17:46:09 +01:00
Jonas Fonseca
af72dd8435
Make parse_sgml() return the sgml_parser_code enum
It is mostly just ignored for now. The SGML parser test tool will, however,
return the parser code.
2006-01-02 17:40:42 +01:00
Jonas Fonseca
29279e71b7
Add SGML_TOKEN_INCOMPLETE and handle it in the parser
2006-01-02 17:20:39 +01:00
Jonas Fonseca
2d813f2cbf
Introduce enum sgml_parser_code and make the parsers return something
2006-01-02 17:14:51 +01:00
Jonas Fonseca
fcf7677584
Skip spaces immediately when recognising '<?ident'
2006-01-02 16:58:48 +01:00
Jonas Fonseca
58c31f44a0
Clarify the code a bit
2006-01-02 03:06:47 +01:00
Jonas Fonseca
dc10be626e
The attribute parsing of proc. instruction nodes has the complete source
2006-01-02 02:44:01 +01:00
Jonas Fonseca
f608e2a0ae
Add the concept of completeness to strings being parsed and scanned
... not used yet.
2006-01-02 02:08:20 +01:00
Jonas Fonseca
6e9a18b444
Fix a few line-counting bugs in plain text
2006-01-02 01:49:12 +01:00
Jonas Fonseca
247debe34f
Add get_sgml_parser_line_number(), and fix a copy/paste error
2006-01-02 01:47:02 +01:00
Jonas Fonseca
b83bbf9c4a
Add sgml_parser_flag which can be used to specify SGML_PARSER_COUNT_LINES
2006-01-02 00:29:37 +01:00
Jonas Fonseca
1801a21b50
init_sgml_parser(): Rename flags to stack_flags
2006-01-02 00:29:36 +01:00
Jonas Fonseca
43b34dcb2f
Add DocBook element and attribute definitions and drop a bogus file
2006-01-01 23:59:57 +01:00
Jonas Fonseca
021af4e87c
Although I'm aware ELinks doesn't need another SGML doctype, here is DocBook
It was created a long time ago, so (I think) it deserves to survive. It
maps .sgml files to application/docbook+xml and uses the highlighter.
2006-01-01 23:22:10 +01:00
Jonas Fonseca
6b62e0cb77
Declare struct sgml_parser_state above struct sgml_parser
... and describe the info member.
2005-12-31 20:02:39 +01:00
Jonas Fonseca
f0148c2ecf
Keep struct sgml_parsing_state private to the parser
2005-12-31 19:59:11 +01:00
Jonas Fonseca
4a766f350b
Just for fun also parse <?xml-stylesheet attributes
2005-12-31 03:13:39 +01:00
Jonas Fonseca
a578ed4667
Make the SGML scanner (optionally) keep track of line numbers
A newline is either \n or \f. The main logic for counting lines is in
skip_sgml{,_chars,_space}. For the general case, where line numbers are not
wanted, the code tries to avoid the extra checks for newlines.
This will be useful for reporting errors when loading the XBEL file.
2005-12-31 02:46:56 +01:00
Jonas Fonseca
b23beed031
Rename skip_comment() and skip_cdata_section() to conform to skip_sgml_*()
2005-12-31 02:00:09 +01:00
Jonas Fonseca
0891cda51e
Introduce skip_sgml_space() that wraps scan_sgml(..., SGML_SCAN_WHITESPACE)
2005-12-31 01:57:54 +01:00
Jonas Fonseca
9264221635
Make init_dom_scanner() take the state arg and drop a macro
2005-12-31 01:55:38 +01:00
Jonas Fonseca
7489c134f7
Make non-terminated comments and cdata sections have 'the rest' as content
2005-12-31 01:47:57 +01:00
Jonas Fonseca
8f7f6abc16
Use skip_sgml_chars() in skip_comment() and skip_cdata_section()
2005-12-31 01:40:52 +01:00
Jonas Fonseca
4e10bcf772
Drop useless code for proc. instruction scanning
2005-12-31 01:18:49 +01:00
Jonas Fonseca
e8ff8bd5f0
Fix another off-by-one error similar to the SGML comment parsing
2005-12-31 01:14:52 +01:00
Jonas Fonseca
ab7ba39d42
Introduce skip_sgml_chars() to avoid usage of memchr()
2005-12-31 00:06:12 +01:00
Jonas Fonseca
9a0bf83756
Add basic stuff for XBEL parsing/highlighting using the DOM engine
2005-12-30 22:19:32 +01:00
Jonas Fonseca
aa07b3edf4
Fix old (non) problem with using VERSION identifier by #undef'ing it first
2005-12-30 22:13:13 +01:00
Jonas Fonseca
65a114f4bc
Sort the RSS elements; they are supposed to be binary-searchable
2005-12-30 21:46:44 +01:00
Jonas Fonseca
ad052c3985
Hey, hey Cripple Creek Fai^H^Herry
2005-12-30 21:19:46 +01:00
Jonas Fonseca
76a524ddf6
More <?xml and comment tests, fix an off-by-one error in comment skipping
2005-12-29 22:26:39 +01:00
Jonas Fonseca
bd877570d2
Test some more obscure proc. instructions and fix some assertion failures
2005-12-29 21:52:27 +01:00
Jonas Fonseca
57168e1fbc
Handle <element path=/to/%61-&\one";files/> as a self-closing tag
Previously, the '/' before '>' would be interpreted as part of the attribute
value. Hopefully this is a sensible way of slurping the markup soup.
2005-12-29 20:38:43 +01:00
Jonas Fonseca
1a177491a0
Fix SGML parsing of processing instructions (<?xml ...?>)
It involves adding a new scanner state which is used only to generate a new
processing instruction (PI) data token. This removes some scanner specific
code from the parser and makes handling of PIs more generic. The data of
XML PIs are still parsed as attributes and added to the PI node.
The 6th test now succeeds. Hurrah!
2005-12-29 18:31:49 +01:00
Jonas Fonseca
fb6ca9a390
Use dom_string for storing the name member of dom_scanner_string_mapping
2005-12-28 21:10:05 +01:00
Jonas Fonseca
f1015f8a6a
Make files include dom/string.h instead of util/string.h directly
2005-12-28 20:45:55 +01:00
Jonas Fonseca
1bd0c8758e
Make the DOM node creators take dom_string structs
2005-12-28 16:47:28 +01:00
Jonas Fonseca
6e163b186c
Make the dom_scanner_token store its string in a dom_string struct
2005-12-28 16:23:36 +01:00
Jonas Fonseca
2e4e404145
Make init_dom_scanner() take the source string as a dom_string struct
2005-12-28 15:55:21 +01:00
Jonas Fonseca
73785bee02
Remove some unneeded #includes
2005-12-28 15:36:58 +01:00
Jonas Fonseca
d1e275be52
Make parse_sgml() take buffer as dom_string struct
2005-12-28 15:21:45 +01:00
Jonas Fonseca
11e168aba4
Make init_sgml_parser() take URI as dom_string struct
2005-12-28 15:19:10 +01:00
Jonas Fonseca
71533eef9a
Elute all DOM-related code and put it in src/dom
2005-12-28 14:05:14 +01:00