forth/f-strings.txt

Forth Strings

In order to flexibly generate HTML, Forth requires the ability to
construct strings of arbitrary length in memory by concatenating and
nesting multiple string segments.

There are several string-handling Forth modules available, but none
are straightforward, so I'm considering a custom module.

For string concatenation, my first idea was to allocate space for
the combined string on each concatenation, but I'm afraid that
generating a page of HTML in memory would require allocating
several times the final page size, as each string segment is combined
and recombined several times into larger and larger sections of the
document.
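
A minimal sketch of this first idea, assuming a standard Forth with
the memory-allocation word set (ALLOCATE). The names str-concat,
bytes-allocated and concat-demo are made up for illustration and do
not come from any existing module. The counter is only there to show
how the allocations add up.

```forth
\ Sketch only: str-concat, bytes-allocated and concat-demo are
\ hypothetical names.
variable bytes-allocated  0 bytes-allocated !  \ running total, for illustration

\ Concatenate two strings into a freshly allocated buffer.
: str-concat ( c-addr1 u1 c-addr2 u2 -- c-addr3 u3 )
  2>r                       \ set second string aside      R: a2 u2
  dup r@ +                  \ a1 u1 u3
  dup bytes-allocated +!    \ count the bytes requested
  allocate throw            \ a1 u1 a3
  >r                        \ a1 u1                        R: a2 u2 a3
  tuck r@ swap move         \ u1     first string copied to a3
  r> 2r> >r                 \ u1 a3 a2                     R: u2
  2 pick 2 pick +           \ u1 a3 a2 a3+u1
  r@ move                   \ u1 a3  second string copied after it
  swap r> + ;               \ a3 u3

\ Build <p><b>hi</b></p> one concatenation at a time.
\ The intermediate buffers are never freed in this sketch.
: concat-demo ( -- c-addr u )
  s" <b>" s" hi" str-concat   s" </b>" str-concat
  s" <p>" 2swap str-concat    s" </p>" str-concat ;

concat-demo type cr         \ prints <p><b>hi</b></p>
bytes-allocated @ .         \ prints 42
```

Here the 16-byte result costs 5 + 9 + 12 + 16 = 42 bytes of
allocations across two levels of nesting. Freeing each intermediate
buffer after use would lower the peak, but every byte would still be
copied once per nesting level.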

An alternative idea is to allocate two buffers, each of the estimated
maximum page size. All concatenations are then expressed as appending
and/or prepending strings to the current page image. An appended
string can simply be copied to the end of the page buffer. To
prepend a string, the copy buffer is initialized with the string,
the page buffer contents are appended to it, then the resulting
combined string is copied back to the page buffer. The end of the
page image within the buffer would have to be tracked. This would
limit memory usage to twice the estimated maximum page size, but
would require a check for buffer overflow on exceptionally large
pages.
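
The two-buffer scheme might look roughly like this. It is a sketch
only: the 4096-byte maximum is an assumed placeholder, and all the
word names (page-buf, copy-buf, page-append, page-prepend and so on)
are hypothetical.

```forth
\ Sketch only; every name and the buffer size are assumptions.
4096 constant max-page              \ assumed maximum page size in bytes
create page-buf max-page allot      \ current page image
create copy-buf max-page allot      \ scratch buffer used when prepending
variable page-len  0 page-len !     \ end of page image within page-buf

: page$ ( -- c-addr u )  page-buf page-len @ ;
: page-clear ( -- )  0 page-len ! ;

: ?room ( u -- )   \ abort if u more bytes would overflow the buffer
  page-len @ + max-page > abort" page buffer overflow" ;

: page-append ( c-addr u -- )
  dup ?room
  tuck page-buf page-len @ + swap move   \ copy to end of page image
  page-len +! ;

: page-prepend ( c-addr u -- )
  dup ?room
  tuck copy-buf swap move              \ u   string to start of copy buffer
  page$ 2 pick copy-buf + swap move    \ u   page image copied after it
  page-len +!                          \ new combined length
  copy-buf page-buf page-len @ move ;  \ copy the result back

page-clear
s" hello" page-append
s" <b>"   page-prepend
s" </b>"  page-append
page$ type cr                          \ prints <b>hello</b>
```

Since MOVE is specified to handle overlapping regions, the prepend
could probably also shift the page image up in place, which would
make copy-buf unnecessary. The sketch keeps both buffers to match the
description above.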

Current average size of *.html, *.txt, *.org files in cavenet
green dataset is approx. 2500 bytes. Average word count per file is
24000.

Another alternative: use an array of string addresses and one of
string lengths. Concatenate strings by appending or inserting
compiled string addresses and lengths in their respective arrays.
This would avoid duplicating strings and the memory for them, but
would impose a maximum on the number of string segments that could
comprise a web page.
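
A rough sketch of the segment-table idea. The limit of 1024 segments
is an assumed placeholder and all word names are hypothetical. The
strings are compiled inside a colon definition so that their
addresses stay valid after they are stored.

```forth
\ Sketch only; every name and the table size are assumptions.
1024 constant max-segs                  \ assumed maximum number of segments
create seg-addr max-segs cells allot    \ start address of each segment
create seg-len  max-segs cells allot    \ length of each segment
variable #segs  0 #segs !               \ number of slots in use

: ?seg-room ( -- )
  #segs @ max-segs < 0= abort" too many segments" ;

: seg! ( c-addr u i -- )   \ store a segment in slot i
  cells tuck seg-len + !  seg-addr + ! ;

: seg-append ( c-addr u -- )
  ?seg-room  #segs @ seg!  1 #segs +! ;

: seg-insert ( c-addr u i -- )   \ insert before slot i, shifting later slots up
  ?seg-room  >r
  seg-addr r@ cells + dup cell+  #segs @ r@ - cells move
  seg-len  r@ cells + dup cell+  #segs @ r@ - cells move
  r> seg!  1 #segs +! ;

: seg-type ( -- )   \ print all segments in order
  #segs @ 0 ?do
    seg-addr i cells + @  seg-len i cells + @  type
  loop ;

: seg-demo ( -- )
  s" hello" seg-append
  s" <b>" 0 seg-insert      \ prepend by inserting at slot 0
  s" </b>"  seg-append
  seg-type cr ;

seg-demo                    \ prints <b>hello</b>
```

No string bytes are copied here, only one address cell and one
length cell per segment. The price is that inserting near the front
shifts every later table entry.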

As an upper estimate, if each word in a page required its own start
and end tag, that would be three segments per word (start tag, word,
end tag), or approximately 72000 string segments for an average file
of 24000 words.

Of course, I must ask whether the complexity of building strings in
memory before printing is justified, versus just printing strings in
sequence as they occur in processing.
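
For comparison, a sketch of the print-as-you-go approach, with
hypothetical word names. Nesting falls out of the call order: an
outer word prints its start tag, calls the inner word, then prints
its end tag, so nothing ever needs to be prepended.

```forth
\ Sketch only; the word names are hypothetical.
: open-tag  ( c-addr u -- )  ." <"  type ." >" ;
: close-tag ( c-addr u -- )  ." </" type ." >" ;

: bold ( c-addr u -- )   \ print text wrapped in <b>...</b>
  s" b" open-tag  type  s" b" close-tag ;

: para ( c-addr u -- )   \ print a paragraph whose text is bold
  s" p" open-tag  bold  s" p" close-tag ;

: print-demo ( -- )  s" hello" para cr ;

print-demo               \ prints <p><b>hello</b></p>
```

This needs no buffers at all, but it only works when the processing
order matches the output order.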