Take a quick stroll through the uncharted corners of the DOM node data
structures:
- Remove unused struct dom_node_id_item.
- Make the document node reference a future struct dom_document.
- Describe ideas for node data, e.g. the entity reference node should use
it for storing the unicode_val_T.
It uses mangleme by Michal Zalewski <lcamtuf@coredump.cx> to generate HTML
which is then fed into the sgml-parser program. By default 100 random HTML
documents are tested, but the test script takes the number of documents to
test as an argument. Useful for torture testing the SGML parser.
This was caused by the recent change to allocate strings during incremental
parsing, where the node string was set after insertion. A test for this is
in the works.
Fixes: b6b6d3c67e
This changes init_dom_node_() to take an 'allocated' argument saying
whether to allocate or not. If the value is -1, node->allocated will be set
to the value of node->parent->allocated. This way the value is inherited
like we do it in the menu code. It should be a sane default since we
eventually want not to rely on the 'underlying' source of the document and
there will be fewer variables to pass around.
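
The inheritance rule described above can be sketched roughly as follows.
This is only an illustration: struct sketch_node and sketch_init_node() are
hypothetical stand-ins, not the real struct dom_node or init_dom_node_(),
and falling back to 0 for a node without a parent is an assumption of the
sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node; the real DOM node has many more fields. */
struct sketch_node {
	struct sketch_node *parent;
	unsigned int allocated:1;	/* Does the node own its string? */
};

/* allocated:  1 = the node owns its string,
 *             0 = it does not,
 *            -1 = inherit the setting from the parent node. */
static void
sketch_init_node(struct sketch_node *node, struct sketch_node *parent,
		 int allocated)
{
	assert(node != NULL);

	node->parent = parent;

	if (allocated == -1) {
		/* Assumption of this sketch: a node without a parent has
		 * nothing to inherit, so it falls back to 0. */
		node->allocated = parent ? parent->allocated : 0;
	} else {
		node->allocated = !!allocated;
	}
}
```

With this shape, callers only state the flag once at the top of the tree
and pass -1 everywhere below it.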
When doing incremental rendering we now require the whole processing
instruction to be there
and that there is room for two tokens in the scanner token table. This is
necessary because we have to generate both a processing target token and a
processing data token to make life simpler for the parser.
Remove processing instruction data case label from the main parser loop. It
is safer this way since it already assumes that the processing target token
has been stored.
Check whether there are '=' and value tokens before handling them. If there
is any doubt the whole attribute structure is 'pushed back' into the
stream. That way incremental parsing will not add the value as a new
attribute because the name token was handled in the previous parsing run.
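
A minimal sketch of the push-back idea, under simplified assumptions: the
token types, struct sketch_scanner and sketch_parse_attribute() below are
hypothetical and only model the control flow, not the actual scanner or
parser code.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_token_type { TOK_NAME, TOK_EQUALS, TOK_VALUE, TOK_OTHER };

struct sketch_scanner {
	const enum sketch_token_type *tokens;
	size_t count;	/* Tokens available in this parsing run */
	size_t pos;	/* Next token to hand out */
	int complete;	/* Non-zero when no more source will arrive */
};

enum sketch_result { ATTR_ADDED, ATTR_PUSHED_BACK, ATTR_NONE };

static enum sketch_result
sketch_parse_attribute(struct sketch_scanner *s)
{
	size_t start;

	assert(s != NULL);
	start = s->pos;

	if (s->pos >= s->count || s->tokens[s->pos] != TOK_NAME)
		return ATTR_NONE;
	s->pos++;	/* Consume the name. */

	if (s->pos >= s->count) {
		/* Ran out of tokens right after the name. */
		if (s->complete) return ATTR_ADDED;
		s->pos = start;	/* Push the name back. */
		return ATTR_PUSHED_BACK;
	}

	if (s->tokens[s->pos] != TOK_EQUALS)
		return ATTR_ADDED;	/* Attribute without a value. */
	s->pos++;	/* Consume the '='. */

	if (s->pos < s->count && s->tokens[s->pos] == TOK_VALUE) {
		s->pos++;	/* Consume the value. */
		return ATTR_ADDED;
	}

	if (s->pos >= s->count && !s->complete) {
		/* The value may arrive in the next run, so push back the
		 * whole attribute, name included. */
		s->pos = start;
		return ATTR_PUSHED_BACK;
	}

	return ATTR_ADDED;	/* Malformed; keep the valueless attribute. */
}
```

Rewinding to the name token is what keeps the next run from seeing a lone
value token and adding it as a new attribute.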
It is a loop that parses the same small document with various read sizes.
The sgml-parser program is updated to take a --stdin option with the read
size as a required parameter.
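
As an illustration of what feeding a document with a given read size
amounts to, here is a small sketch. The sketch_feed_fn callback is a
hypothetical stand-in for one incremental parser run; the real test goes
through the sgml-parser program instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback standing in for one incremental parser run. */
typedef void (*sketch_feed_fn)(const char *chunk, size_t len, int last,
			       void *data);

/* Hand the document to feed() in pieces of at most read_size bytes and
 * return the number of runs it took. */
static size_t
sketch_feed_in_chunks(const char *doc, size_t doclen, size_t read_size,
		      sketch_feed_fn feed, void *data)
{
	size_t offset = 0, runs = 0;

	assert(read_size > 0);

	while (offset < doclen) {
		size_t len = doclen - offset;

		if (len > read_size) len = read_size;
		feed(doc + offset, len, offset + len == doclen, data);
		offset += len;
		runs++;
	}

	return runs;
}

/* Example callback: only adds up the bytes it was handed. */
static void
sketch_count_bytes(const char *chunk, size_t len, int last, void *data)
{
	(void) chunk;
	(void) last;
	*(size_t *) data += len;
}
```

Looping read_size from 1 up to the document length exercises every
possible split point of a small document.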
That is, add the last parts that save and resume previous incomplete
parsing states. Now the parsing stack push handler checks if the parent has
a resume flag set. When set, the incomplete fragment to resume is restored,
the new source fragment is appended, and parsing continues.
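
The save and resume steps could look roughly like the sketch below. The
names (struct sketch_state, sketch_save(), sketch_resume()) are
hypothetical, and the sketch only models the buffer handling, not the
parsing stack or its push handler.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sketch_state {
	char *pending;		/* Saved incomplete fragment, or NULL */
	size_t pending_len;
	unsigned int resume:1;	/* Set when there is a fragment to resume */
};

/* Save the unparsed tail of the current source and flag it for resuming.
 * Returns 0 on success and -1 if allocation fails. */
static int
sketch_save(struct sketch_state *st, const char *tail, size_t len)
{
	char *copy = malloc(len + 1);

	if (!copy) return -1;
	memcpy(copy, tail, len);
	copy[len] = '\0';

	free(st->pending);
	st->pending = copy;
	st->pending_len = len;
	st->resume = 1;
	return 0;
}

/* Restore any saved fragment, append the new source and clear the flag.
 * Returns a malloc()ed buffer that the caller must free, or NULL. */
static char *
sketch_resume(struct sketch_state *st, const char *src, size_t len,
	      size_t *out_len)
{
	size_t saved;
	char *buf;

	assert(st != NULL && out_len != NULL);

	saved = st->resume ? st->pending_len : 0;
	buf = malloc(saved + len + 1);
	if (!buf) return NULL;

	if (saved) memcpy(buf, st->pending, saved);
	memcpy(buf + saved, src, len);
	buf[saved + len] = '\0';

	free(st->pending);
	st->pending = NULL;
	st->pending_len = 0;
	st->resume = 0;

	*out_len = saved + len;
	return buf;
}
```

Parsing then restarts on the combined buffer, so a construct split across
two reads is seen as a whole.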