mirror of https://github.com/rkd77/elinks.git synced 2024-06-15 23:35:34 +00:00
elinks/src/dom/sgml
docbook Autogenerate .vimrc files and put the master in config/vimrc 2006-01-15 18:38:58 +01:00
html Cleanup SGML info backends #includes and description 2006-01-14 08:07:00 +01:00
rss Cleanup SGML info backends #includes and description 2006-01-14 08:07:00 +01:00
xbel Cleanup SGML info backends #includes and description 2006-01-14 08:07:00 +01:00
dump.c DOM: Replace various DOM status/error/exception codes with dom_code enum 2006-01-31 22:01:35 +01:00
dump.h DOM: Add simple stack context based utility for dumping DOM trees to SGML 2006-01-30 06:07:16 +01:00
Makefile DOM: Add simple stack context based utility for dumping DOM trees to SGML 2006-01-30 06:07:16 +01:00
parser.c Trim trailing whitespaces. 2006-05-31 19:34:49 +02:00
parser.h DOM: Rename src/dom/dom.h src/dom/code.h 2006-01-31 23:30:55 +01:00
README Elute all DOM-related code and put it in src/dom 2005-12-28 14:05:14 +01:00
scanner.c Trim trailing whitespaces. 2006-05-31 19:34:49 +02:00
scanner.h Add basic support for requesting error detection; SGML scanner part missing 2006-01-07 04:21:39 +01:00
sgml.c Although aware ELinks doesn't need another sgml/doctype here is DocBook 2006-01-01 23:22:10 +01:00
sgml.h DOM: Add STATIC_DOM_STRING macro and make INIT_DOM_STRING cleaner 2006-01-28 22:55:15 +01:00

			SGML DOM tree loader

TODO items:

 - Check if the (HTML) DOM tree has a <base href="<base-uri>"> element that
   should be honoured.

 - Handle optional end tags.

 - The parser and scanner need to know about the various data concepts of SGML
   like CDATA. It could be the start of supporting DOCTYPE definitions, a
   generic way to create SGML parsers. One obvious place where CDATA would be
   useful is <script>#text</script> skipping, which currently generates
   elements for [ '<' <ident> ] sequences.
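
A minimal sketch of the CDATA item above, assuming the scanner works on a
[pos, end) character buffer. The function skip_cdata_content() is hypothetical
and not part of the ELinks scanner. It only shows the idea: once inside an
element like <script>, nothing is treated as markup until the matching
"</script" sequence, so '<' followed by an identifier no longer creates
elements.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the "</name" sequence that ends a
 * CDATA element such as <script>, or `end` if no end tag is found. Everything
 * before the returned pointer can be emitted as one text token. */
const char *
skip_cdata_content(const char *pos, const char *end, const char *name)
{
	size_t namelen = strlen(name);

	assert(pos && end && pos <= end);

	for (; pos < end; pos++) {
		const char *p = pos;
		size_t i;

		/* An end tag needs "</", the name and one delimiter. */
		if ((size_t) (end - p) < namelen + 3)
			return end;

		if (p[0] != '<' || p[1] != '/')
			continue;

		/* Compare the tag name case-insensitively. */
		p += 2;
		for (i = 0; i < namelen; i++) {
			if (tolower((unsigned char) p[i])
			    != tolower((unsigned char) name[i]))
				break;
		}
		if (i < namelen)
			continue;

		/* Reject longer names such as </scripts>. */
		if (p[namelen] == '>' || isspace((unsigned char) p[namelen]))
			return pos;
	}

	return end;
}
```

Called with the text following a <script> start tag and the name "script", it
returns the position of "</script>", so a sequence like '<p>' inside a string
literal stays part of the text node.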

[Excerpts from a mail from Apr 18 15:11 2004 to Witold Filipczyk]
-------------------------------------------------------------------------------
> AFAIK when <p> is not closed current code doesn't handle such situation. I'm
> thinking about function "close_tag" which automagically "closes" tags.

The problem with closing tags is to figure out if the end tag is optional. This
information is already available in the sgml_node_info structure via the
SGML_ELEMENT_END_OPTIONAL flag and the sgml_node_info is then part of the
sgml_parser_state structure that is available in the dom_navigator_state's data
member.

When initializing the dom navigator it gets passed an object size which it
uses for allocating this kind of private data.

If you look at add_sgml_element() you will see that it does:

        struct dom_navigator_state *state;
        struct sgml_parser_state *pstate;

        state = get_dom_navigator_top(navigator);
        assert(node == state->node && state->data);

        pstate = state->data;
        pstate->info = get_sgml_node_info(parser->info->elements, node);
        node->data.element.type = pstate->info->type;

Meaning it sets up the sgml_parser_state.

Only problem is that I haven't had time to write patches so that the parser
actually uses the state info. It is available as:

	struct sgml_parser_state *pstate = get_dom_navigator_top(navigator)->data;

and then when another element should be generated we just have to check if the
end tag of the top element is optional, meaning

	if (pstate->info->flags & SGML_ELEMENT_END_OPTIONAL)

in which case we need to call pop_dom_node(navigator).

It sounds easy, but I don't know if I have forgotten something. At least that
is a start and we could maybe do more clever things. But my goal is to make the
parser handle fairly clean tag soup well. Later we can maybe put in some hooks
to improve really bad tag soup.
-------------------------------------------------------------------------------
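
The rule described in the mail can be sketched as follows. This is only an
illustration under stated assumptions: struct node_info, struct element_stack
and push_element() are simplified stand-ins invented here, not the real
sgml_node_info, dom_navigator or parser code. Only the flag name
SGML_ELEMENT_END_OPTIONAL is taken from the text above, and its value here is
arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real flag; the value is chosen for this sketch only. */
enum { SGML_ELEMENT_END_OPTIONAL = 1 };

/* Simplified stand-in for sgml_node_info. */
struct node_info {
	const char *name;
	unsigned int flags;
};

#define MAX_DEPTH 64

/* Simplified stand-in for the dom navigator's state stack. */
struct element_stack {
	const struct node_info *items[MAX_DEPTH];
	size_t depth;
};

/* Push a new element. If the element on top of the stack has an optional end
 * tag it is closed implicitly first. Returns the number of implicitly closed
 * elements (0 or 1), or -1 if the stack is full. */
int
push_element(struct element_stack *stack, const struct node_info *info)
{
	int closed = 0;

	assert(stack && info);

	if (stack->depth > 0
	    && (stack->items[stack->depth - 1]->flags
		& SGML_ELEMENT_END_OPTIONAL)) {
		/* Stands in for pop_dom_node(navigator). */
		stack->depth--;
		closed = 1;
	}

	if (stack->depth >= MAX_DEPTH)
		return -1;

	stack->items[stack->depth++] = info;
	return closed;
}
```

With this rule <p>a<p>b produces two sibling <p> elements. Taken literally it
also closes a <p> before an inline child such as <b>, so a real implementation
would additionally need to know which elements may contain which.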