mirror of
https://github.com/rkd77/elinks.git
synced 2024-11-04 08:17:17 -05:00
Add short intro and talk about box model
This commit is contained in:
parent
1ae18267f8
commit
314dbc2dab
127
src/document/html/README
Normal file
127
src/document/html/README
Normal file
@ -0,0 +1,127 @@
|
||||
This is the "Mikulas' HTML Processor". The parser and renderer we actually use.
|
||||
It's scary, yes.
|
||||
|
||||
|
||||
Parser - renderer interaction
|
||||
=============================
|
||||
|
||||
(See also doc/hacking.txt. Incoherent (and possibly misleaded on some points;
|
||||
don't trust it to the letter) rambling follows.)
|
||||
|
||||
The parser is written modularily so it is separated from the renderer and can
|
||||
be actually used with a different renderer as well (they do that in Links2,
|
||||
using graphics renderer with the current parser). The entry point from the rest
|
||||
of ELinks is inside the renderer, which sets up couple of callbacks (like "put
|
||||
this text on screen" or "special element indicator", where special element
|
||||
might be a <hr>, a link or a table) and calls the parser.
|
||||
|
||||
Usually, renderer just lets the parser chew through the document on its own and
|
||||
just processes the callbacks, sometimes it kicks in though - at that point it
|
||||
does a "management override", skips a chunk of the source and resumes the parser
|
||||
after it's over. Most commonly it does this with tables - when you hit a table,
|
||||
html_table() does basically nothing and the renderer at that point skips to
|
||||
</table>. What happens with the table you ask? The renderer calls itself
|
||||
recursively on just the table; that means a separate parser instance is run for
|
||||
the table, and for each distinct cell a new renderer instance is called. The
|
||||
table is (parsed and) rendered two times - first to just find out the wanted
|
||||
cell sizes, then it optimizes that and figures out the layout and does a second
|
||||
rendering pass with figured out cell sizes enforced and cells actually written
|
||||
out to the canvas.
|
||||
|
||||
|
||||
Box model plans
|
||||
===============
|
||||
|
||||
What you've seen above - renderer called recursively on table and each cell -
|
||||
is a cheap substitute for box model. Basically, now except of those cases the
|
||||
renderer just writes on the canvas sequentially as the text comes in, moving
|
||||
the pen only rightwards and down; there are only some parameters like
|
||||
indentation and border sizes which affect the rightwards and down motion.
|
||||
|
||||
It needs to be changed so that the renderer maintains a *box model* - at each
|
||||
moment the text being written out is inside a stack of boxes. This is what
|
||||
the table renderer achieves by just making each box a separate renderer
|
||||
instance - you can't get away without boxes when rendering tables. But boxes
|
||||
are essential for all block elements at the moment you go CSS, and we went CSS
|
||||
and would like to support floating elements.
|
||||
|
||||
Even without float, boxes will have an immediate advantage since you will be
|
||||
able to e.g. set their background - now e.g. slashdot looks really ugly since
|
||||
it has bg set on some block elements but since we have no box model we set the
|
||||
background only on the rendered text itself, not the rectangural canvas around
|
||||
it.
|
||||
|
||||
Boxes can be probably implemented by just transmitting information "here a new
|
||||
box (block element) begins" and "here a box ends" from parser to the renderer,
|
||||
and maintaining box stack with some geometry information in the renderer (and
|
||||
wow, you've just got DOM for block elements if you turn this from a temporary
|
||||
stack to a persistent tree; but we might not want to do that, at least not
|
||||
unconditionally).
|
||||
|
||||
|
||||
Rendering boxes
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
When you are actually rendering, you must not apply the attributes on the whole
|
||||
box after rendering; inline elements might have custom attributes and we won't
|
||||
have boxes for those. Also, you don't know the box width until you've already
|
||||
rendered it. Several possibilities occur to me:
|
||||
|
||||
(i) Dry-run each box to find out the dimensions, prefill the rectangle with
|
||||
attributes and then hard-render. Awfully quadratic, would kill performance.
|
||||
|
||||
(ii) Post-fill: for each line and each box in stack, remember "content width".
|
||||
After the box is over you look at its each line and post-fill it from the
|
||||
content width to the box width (at both sides).
|
||||
|
||||
(iii) Maximalistic: assume maximal spread and reduce afterwards. Basically fill
|
||||
the whole line (from starting X coordinate) with the box attributes and when
|
||||
the box is done rendering, fill the rest of all its lines (from ending X
|
||||
coordinate) with the parent's box attributes.
|
||||
|
||||
I like (iii) most.
|
||||
|
||||
|
||||
Boxes and floats
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
In order to be able to support floats, you want to be able to revisit previous
|
||||
boxes and resize them based on the new box. To do that, you need to remember
|
||||
parser contexts for all the boxes in the stack (you should have only few of
|
||||
those). If you hit a floating box, you:
|
||||
|
||||
* dry-render it to find out the dimensions and then do the dimensions
|
||||
negotiation like you'd do with cell tables; or maybe a different one,
|
||||
I didn't read the specs on it; don't put anything on canvas
|
||||
|
||||
* pop the floating box
|
||||
|
||||
* duplicate the parent box, so you have a "sibling box" that contains
|
||||
all the content hit so far; limit its parser context to reach
|
||||
only to the start of your floating box
|
||||
|
||||
Note that you don't want to merge multiple float boxes together.
|
||||
So, an implementation might have something like this instead of the
|
||||
"duplication":
|
||||
|
||||
struct box {
|
||||
struct box floaters[];
|
||||
}
|
||||
|
||||
where floaters are children of this box; normally you have one
|
||||
which is mostly read-only and contains everything rendered so far,
|
||||
and one which points to the next box in your stack. When you process
|
||||
a floating box, it gets a separate entry in the floaters[] array
|
||||
and won't get merged to floaters[0].
|
||||
|
||||
* wipe the sibling box (floaters[*]) from canvas
|
||||
|
||||
* rerender the sibling box with calculated geometry constraints
|
||||
|
||||
* rerender the floating box dtto
|
||||
|
||||
(This might be modified based on how floats are actually supposed to behave;
|
||||
I'm not sure of that. ;-) ) This algorithm is careful not to keep a list of
|
||||
all sibling boxes in memory since with a simple long document a mere sequence
|
||||
of paragraphs would cost us huge amount of memory. There's probably no way
|
||||
around keeping a list of sibling floating boxes in memory, though.
|
Loading…
Reference in New Issue
Block a user