2006-07-12 21:46:16 -04:00
|
|
|
This is the "Mikulas' HTML Processor". The parser and renderer we actually use.
|
|
|
|
It's scary, yes.
|
|
|
|
|
|
|
|
|
|
|
|
Parser - renderer interaction
|
|
|
|
=============================
|
|
|
|
|
|
|
|
(See also doc/hacking.txt. Incoherent (and possibly misleaded on some points;
|
|
|
|
don't trust it to the letter) rambling follows.)
|
|
|
|
|
|
|
|
The parser is written modularily so it is separated from the renderer and can
|
|
|
|
be actually used with a different renderer as well (they do that in Links2,
|
|
|
|
using graphics renderer with the current parser). The entry point from the rest
|
|
|
|
of ELinks is inside the renderer, which sets up couple of callbacks (like "put
|
|
|
|
this text on screen" or "special element indicator", where special element
|
|
|
|
might be a <hr>, a link or a table) and calls the parser.
|
|
|
|
|
2006-07-13 15:12:24 -04:00
|
|
|
Usually, the renderer just lets the parser chew through the document on its own and
|
|
|
|
only processes the callbacks, sometimes it kicks in though - at that point it
|
2006-07-12 21:46:16 -04:00
|
|
|
does a "management override", skips a chunk of the source and resumes the parser
|
2006-07-13 15:12:24 -04:00
|
|
|
after it's over. Most commonly, it does this with tables - when you hit a table,
|
2006-07-12 21:46:16 -04:00
|
|
|
html_table() does basically nothing and the renderer at that point skips to
|
|
|
|
</table>. What happens with the table you ask? The renderer calls itself
|
|
|
|
recursively on just the table; that means a separate parser instance is run for
|
|
|
|
the table, and for each distinct cell a new renderer instance is called. The
|
|
|
|
table is (parsed and) rendered two times - first to just find out the wanted
|
|
|
|
cell sizes, then it optimizes that and figures out the layout and does a second
|
2006-07-13 15:12:24 -04:00
|
|
|
rendering pass where it uses the calculated cell sizes to actually write the cells
|
|
|
|
to the canvas.
|
2006-07-12 21:46:16 -04:00
|
|
|
|
|
|
|
|
|
|
|
Box model plans
|
|
|
|
===============
|
|
|
|
|
2006-07-13 15:12:24 -04:00
|
|
|
The design described above - calling the renderer recursively on the table and each cell -
|
|
|
|
is a cheap substitute for the box model. Except for those cases where the
|
|
|
|
renderer basically just writes on the canvas sequentially as the text comes in, moving
|
2006-07-12 21:46:16 -04:00
|
|
|
the pen only rightwards and down; there are only some parameters like
|
|
|
|
indentation and border sizes which affect the rightwards and down motion.
|
|
|
|
|
|
|
|
It needs to be changed so that the renderer maintains a *box model* - at each
|
|
|
|
moment the text being written out is inside a stack of boxes. This is what
|
|
|
|
the table renderer achieves by just making each box a separate renderer
|
|
|
|
instance - you can't get away without boxes when rendering tables. But boxes
|
|
|
|
are essential for all block elements at the moment you go CSS, and we went CSS
|
|
|
|
and would like to support floating elements.
|
|
|
|
|
2006-07-13 15:12:24 -04:00
|
|
|
Even without support of the float property, boxes will have an immediate advantage since you will be
|
2006-07-12 21:46:16 -04:00
|
|
|
able to e.g. set their background - now e.g. slashdot looks really ugly since
|
2006-07-13 15:12:24 -04:00
|
|
|
it has background set on some block elements but since we have no box model we set the
|
|
|
|
background only on the rendered text itself, not the rectangular canvas around
|
2006-07-12 21:46:16 -04:00
|
|
|
it.
|
|
|
|
|
2006-07-13 15:12:24 -04:00
|
|
|
Boxes can probably be implemented by just transmitting information "here a new
|
2006-07-12 21:46:16 -04:00
|
|
|
box (block element) begins" and "here a box ends" from parser to the renderer,
|
2006-07-13 15:12:24 -04:00
|
|
|
and maintaining a box stack with some geometry information in the renderer (and
|
|
|
|
now, you've just got DOM for block elements if you turn this from a temporary
|
2006-07-12 21:46:16 -04:00
|
|
|
stack to a persistent tree; but we might not want to do that, at least not
|
|
|
|
unconditionally).
|
|
|
|
|
|
|
|
|
|
|
|
Rendering boxes
|
|
|
|
~~~~~~~~~~~~~~~
|
|
|
|
|
|
|
|
When you are actually rendering, you must not apply the attributes on the whole
|
|
|
|
box after rendering; inline elements might have custom attributes and we won't
|
|
|
|
have boxes for those. Also, you don't know the box width until you've already
|
|
|
|
rendered it. Several possibilities occur to me:
|
|
|
|
|
|
|
|
(i) Dry-run each box to find out the dimensions, prefill the rectangle with
|
|
|
|
attributes and then hard-render. Awfully quadratic, would kill performance.
|
|
|
|
|
|
|
|
(ii) Post-fill: for each line and each box in stack, remember "content width".
|
|
|
|
After the box is over you look at its each line and post-fill it from the
|
|
|
|
content width to the box width (at both sides).
|
|
|
|
|
|
|
|
(iii) Maximalistic: assume maximal spread and reduce afterwards. Basically fill
|
|
|
|
the whole line (from starting X coordinate) with the box attributes and when
|
|
|
|
the box is done rendering, fill the rest of all its lines (from ending X
|
|
|
|
coordinate) with the parent's box attributes.
|
|
|
|
|
|
|
|
I like (iii) most.
|
|
|
|
|
|
|
|
|
|
|
|
Boxes and floats
|
|
|
|
~~~~~~~~~~~~~~~~
|
|
|
|
|
|
|
|
In order to be able to support floats, you want to be able to revisit previous
|
|
|
|
boxes and resize them based on the new box. To do that, you need to remember
|
|
|
|
parser contexts for all the boxes in the stack (you should have only few of
|
|
|
|
those). If you hit a floating box, you:
|
|
|
|
|
|
|
|
* dry-render it to find out the dimensions and then do the dimensions
|
|
|
|
negotiation like you'd do with cell tables; or maybe a different one,
|
|
|
|
I didn't read the specs on it; don't put anything on canvas
|
|
|
|
|
|
|
|
* pop the floating box
|
|
|
|
|
|
|
|
* duplicate the parent box, so you have a "sibling box" that contains
|
|
|
|
all the content hit so far; limit its parser context to reach
|
|
|
|
only to the start of your floating box
|
|
|
|
|
|
|
|
Note that you don't want to merge multiple float boxes together.
|
|
|
|
So, an implementation might have something like this instead of the
|
|
|
|
"duplication":
|
|
|
|
|
|
|
|
struct box {
|
|
|
|
struct box floaters[];
|
|
|
|
}
|
|
|
|
|
|
|
|
where floaters are children of this box; normally you have one
|
|
|
|
which is mostly read-only and contains everything rendered so far,
|
|
|
|
and one which points to the next box in your stack. When you process
|
|
|
|
a floating box, it gets a separate entry in the floaters[] array
|
|
|
|
and won't get merged to floaters[0].
|
|
|
|
|
|
|
|
* wipe the sibling box (floaters[*]) from canvas
|
|
|
|
|
|
|
|
* rerender the sibling box with calculated geometry constraints
|
|
|
|
|
|
|
|
* rerender the floating box dtto
|
|
|
|
|
|
|
|
(This might be modified based on how floats are actually supposed to behave;
|
|
|
|
I'm not sure of that. ;-) ) This algorithm is careful not to keep a list of
|
|
|
|
all sibling boxes in memory since with a simple long document a mere sequence
|
|
|
|
of paragraphs would cost us huge amount of memory. There's probably no way
|
|
|
|
around keeping a list of sibling floating boxes in memory, though.
|