Add short intro and talk about box model

2025-06-30 22:19:29 -04:00 · 2006-07-13 03:46:16 +02:00 · 2006-07-13 03:46:16 +02:00 · 314dbc2dab
commit 314dbc2dab
parent 1ae18267f8
1 changed files with 127 additions and 0 deletions
--- a/src/document/html/README
+++ b/src/document/html/README
@ -0,0 +1,127 @@
+This is the "Mikulas' HTML Processor". The parser and renderer we actually use.
+It's scary, yes.
+
+
+Parser - renderer interaction
+=============================
+
+(See also doc/hacking.txt. Incoherent (and possibly misleaded on some points;
+don't trust it to the letter) rambling follows.)
+
+The parser is written modularily so it is separated from the renderer and can
+be actually used with a different renderer as well (they do that in Links2,
+using graphics renderer with the current parser). The entry point from the rest
+of ELinks is inside the renderer, which sets up couple of callbacks (like "put
+this text on screen" or "special element indicator", where special element
+might be a <hr>, a link or a table) and calls the parser.
+
+Usually, renderer just lets the parser chew through the document on its own and
+just processes the callbacks, sometimes it kicks in though - at that point it
+does a "management override", skips a chunk of the source and resumes the parser
+after it's over. Most commonly it does this with tables - when you hit a table,
+html_table() does basically nothing and the renderer at that point skips to
+</table>. What happens with the table you ask? The renderer calls itself
+recursively on just the table; that means a separate parser instance is run for
+the table, and for each distinct cell a new renderer instance is called. The
+table is (parsed and) rendered two times - first to just find out the wanted
+cell sizes, then it optimizes that and figures out the layout and does a second
+rendering pass with figured out cell sizes enforced and cells actually written
+out to the canvas.
+
+
+Box model plans
+===============
+
+What you've seen above - renderer called recursively on table and each cell -
+is a cheap substitute for box model. Basically, now except of those cases the
+renderer just writes on the canvas sequentially as the text comes in, moving
+the pen only rightwards and down; there are only some parameters like
+indentation and border sizes which affect the rightwards and down motion.
+
+It needs to be changed so that the renderer maintains a *box model* - at each
+moment the text being written out is inside a stack of boxes. This is what
+the table renderer achieves by just making each box a separate renderer
+instance - you can't get away without boxes when rendering tables. But boxes
+are essential for all block elements at the moment you go CSS, and we went CSS
+and would like to support floating elements.
+
+Even without float, boxes will have an immediate advantage since you will be
+able to e.g. set their background - now e.g. slashdot looks really ugly since
+it has bg set on some block elements but since we have no box model we set the
+background only on the rendered text itself, not the rectangural canvas around
+it.
+
+Boxes can be probably implemented by just transmitting information "here a new
+box (block element) begins" and "here a box ends" from parser to the renderer,
+and maintaining box stack with some geometry information in the renderer (and
+wow, you've just got DOM for block elements if you turn this from a temporary
+stack to a persistent tree; but we might not want to do that, at least not
+unconditionally).
+
+
+Rendering boxes
+~~~~~~~~~~~~~~~
+
+When you are actually rendering, you must not apply the attributes on the whole
+box after rendering; inline elements might have custom attributes and we won't
+have boxes for those. Also, you don't know the box width until you've already
+rendered it. Several possibilities occur to me:
+
+(i) Dry-run each box to find out the dimensions, prefill the rectangle with
+attributes and then hard-render. Awfully quadratic, would kill performance.
+
+(ii) Post-fill: for each line and each box in stack, remember "content width".
+After the box is over you look at its each line and post-fill it from the
+content width to the box width (at both sides).
+
+(iii) Maximalistic: assume maximal spread and reduce afterwards. Basically fill
+the whole line (from starting X coordinate) with the box attributes and when
+the box is done rendering, fill the rest of all its lines (from ending X
+coordinate) with the parent's box attributes.
+
+I like (iii) most.
+
+
+Boxes and floats
+~~~~~~~~~~~~~~~~
+
+In order to be able to support floats, you want to be able to revisit previous
+boxes and resize them based on the new box. To do that, you need to remember
+parser contexts for all the boxes in the stack (you should have only few of
+those). If you hit a floating box, you:
+
+	* dry-render it to find out the dimensions and then do the dimensions
+	  negotiation like you'd do with cell tables; or maybe a different one,
+	  I didn't read the specs on it; don't put anything on canvas
+
+	* pop the floating box
+
+	* duplicate the parent box, so you have a "sibling box" that contains
+	  all the content hit so far; limit its parser context to reach
+	  only to the start of your floating box
+
+	  Note that you don't want to merge multiple float boxes together.
+	  So, an implementation might have something like this instead of the
+	  "duplication":
+
+	  struct box {
+		struct box floaters[];
+	  }
+
+	  where floaters are children of this box; normally you have one
+	  which is mostly read-only and contains everything rendered so far,
+	  and one which points to the next box in your stack. When you process
+	  a floating box, it gets a separate entry in the floaters[] array
+	  and won't get merged to floaters[0].
+
+	* wipe the sibling box (floaters[*]) from canvas
+
+	* rerender the sibling box with calculated geometry constraints
+
+	* rerender the floating box dtto
+
+(This might be modified based on how floats are actually supposed to behave;
+I'm not sure of that. ;-) ) This algorithm is careful not to keep a list of
+all sibling boxes in memory since with a simple long document a mere sequence
+of paragraphs would cost us huge amount of memory. There's probably no way
+around keeping a list of sibling floating boxes in memory, though.