QA findings from a multi-round review of the FreeDOS submission prep work:
- TUI rendering refactor: src/tui.c emitted ANSI escape sequences via
printf, which displays as raw text on bare FreeDOS (no ANSI.SYS).
Add four HAL ops (tui_enter, tui_leave, render_run, set_cursor_shape)
and route per-cell rendering through them. POSIX backend keeps the
ANSI path; DOS backend drives BIOS INT 10h via the existing
bios_set_cursor / bios_write_char helpers. The TUI's logical cursor
goes through the saved orig_locate to avoid recursing through the
swapped-in gw_hal->locate.
- DOS extended-key mapping: dos_getch returns 0x100 | scancode for
arrows / F-keys; tui_read_key wasn't translating those to its TK_*
constants, so the editor never saw arrow keys or F1-F10 on DOS.
Add a __MSDOS__-conditional translation table in tui_read_key.
- Version banner: GW_VERSION was still 0.16.0 even though the v0.17.0
release prep was already in CHANGES.TXT. Bump.
- Compiler PulseAudio link: gwbasic-compile -c hardcoded
'-lgwrt -lm -lpthread' on the gcc command line. When libgwrt was
built against libpulse-simple (the default on any host with the
PulseAudio dev headers installed), the compile workflow failed with
'undefined reference to pa_simple_drain'. CMake now passes
GWRT_HAS_PULSEAUDIO to gwbasic-compile when libpulse is present, and
the compiler appends -lpulse-simple to the link line.
- FRE("") garbage collection: the interpreter skipped strpool_gc with a
comment 'unsafe during expression eval', but that's exactly what real
GW-BASIC's FRE("") does (and the AOT compiler path already did). Add
the GC call; strpool_pin/unpin is the existing escape hatch if a
caller has live pool pointers on the C stack. Fixes the string_gc
compat test.
- Test harness normalization: run_tests.sh stripped trailing whitespace
on the actual output but not the expected file, causing spurious
mismatches against golden files captured from real GWBASIC.EXE.
Normalize both sides identically. Fixes the peek_gfx mismatch.
- Print_using: snprintf into mantissa[32] with %.*f and an unbounded
dec triggered a -Wformat-truncation warning. Clamp dec to 20 (IEEE
double has at most ~17 significant decimal digits).
- Doc/version consistency: 16-bit binary size reported as 127KB in one
place and 128KB in three; standardize on 128KB. HAL backend count
said '1 file' but is now 2. CI test count said 'all 66 test
programs' but is 72. Add a v0.17.0 row to the development.md table.
Update getting-started.md DOS section to match the BIOS-rendering
reality and add a manual TUI verification checklist.
- dos_init now writes back BIOS-reported cols/rows to dos_hal struct
fields (forward-declared so dos_init can reference it).
After these changes: 72/72 interpreter tests pass, compat 68/68
matched, no warnings on the Linux build.
Architecture
Pipeline
Source text → Tokenizer (CRUNCH) → Token stream
↓
Expression evaluator (FRMEVL)
↓
Statement dispatcher (NEWSTT)
↓
TUI screen buffer (interactive)
↓
HAL (platform I/O)
The interpreter mirrors the original GW-BASIC's internal pipeline, which -- like
most Microsoft interpreters of the era -- is a tight loop around three core
routines. CRUNCH tokenizes source lines into a compact byte stream. NEWSTT
dispatches each statement. FRMEVL evaluates expressions. All platform I/O
goes through a HAL vtable (hal_ops_t), so the core interpreter has no idea
whether it's talking to an ANSI terminal or a teletype from 1975.
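A minimal sketch of the vtable idea follows. Only hal_ops_t, gw_hal, and the locate op are named in this project; the putch op, the two backends, and core_print are hypothetical names chosen for illustration, and the real struct carries more operations.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down HAL vtable. */
typedef struct hal_ops {
    void (*putch)(int ch);
    void (*locate)(int row, int col);
} hal_ops_t;

/* Illustrative backend 1: writes to stdout, positions with ANSI CUP. */
static void stdout_putch(int ch) { putchar(ch); }
static void stdout_locate(int row, int col) { printf("\x1b[%d;%dH", row, col); }

/* Illustrative backend 2: captures output in a buffer (handy for tests). */
static char capture_buf[64];
static size_t capture_len;

static void capture_putch(int ch) {
    if (capture_len + 1 < sizeof capture_buf) {
        capture_buf[capture_len++] = (char)ch;
        capture_buf[capture_len] = '\0';
    }
}
static void capture_locate(int row, int col) { (void)row; (void)col; }

static const hal_ops_t hal_stdout  = { stdout_putch,  stdout_locate  };
static const hal_ops_t hal_capture = { capture_putch, capture_locate };

/* The core only ever sees the active vtable pointer. */
static const hal_ops_t *gw_hal = &hal_stdout;

/* Core-side output routine: no knowledge of which backend is active. */
static void core_print(const char *s) {
    while (*s) gw_hal->putch(*s++);
}
```

Pointing gw_hal at hal_capture and calling core_print("OK") leaves "OK" in capture_buf without touching stdout, which is the property the core relies on.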
In interactive mode, the TUI layer swaps in its own HAL function pointers and
redirects all output through a dynamically allocated screen buffer. Rendering
itself is delegated back to the HAL via four ops (tui_enter, tui_leave,
render_run, set_cursor_shape): the POSIX backend emits ANSI escape
sequences, the DOS backend drives BIOS INT 10h directly, so the full-screen
editor works on bare FreeDOS without ANSI.SYS. In piped mode the TUI stays
out of the way and the HAL writes straight to stdout.
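The pointer swap can be sketched as below. This is an assumption-laden illustration, not the code in tui.c: the putch op, the fixed 25×80 char array, tui_shutdown, and tui_flush_cursor are hypothetical, while gw_hal, locate, tui_init, and the saved orig_locate are names the project uses. The point it shows is why the TUI keeps the original pointers: calling gw_hal->locate from inside the TUI would land back in the TUI's own replacement.

```c
typedef struct hal_ops {
    void (*putch)(int ch);               /* hypothetical op name */
    void (*locate)(int row, int col);
} hal_ops_t;

/* Stand-in backend that just records what it was asked to do. */
static int hw_row, hw_col, hw_writes;
static void backend_putch(int ch) { (void)ch; hw_writes++; }
static void backend_locate(int row, int col) { hw_row = row; hw_col = col; }

static hal_ops_t hal = { backend_putch, backend_locate };
static hal_ops_t *gw_hal = &hal;

enum { ROWS = 25, COLS = 80 };
static char screen[ROWS][COLS];
static int cur_row, cur_col;

/* Saved backend pointers, used for real hardware access and for restore. */
static void (*orig_putch)(int);
static void (*orig_locate)(int, int);

static void tui_putch(int ch) {
    screen[cur_row][cur_col] = (char)ch;      /* write to the buffer only */
    if (++cur_col == COLS) {
        cur_col = 0;
        if (cur_row < ROWS - 1) cur_row++;
    }
}

static void tui_locate(int row, int col) {
    if (row >= 0 && row < ROWS && col >= 0 && col < COLS) {
        cur_row = row;
        cur_col = col;
    }
}

/* Move the physical cursor via the saved pointer, not gw_hal->locate. */
static void tui_flush_cursor(void) { orig_locate(cur_row, cur_col); }

static void tui_init(void) {
    orig_putch = gw_hal->putch;
    orig_locate = gw_hal->locate;
    gw_hal->putch = tui_putch;
    gw_hal->locate = tui_locate;
}

static void tui_shutdown(void) {
    gw_hal->putch = orig_putch;
    gw_hal->locate = orig_locate;
}
```

After tui_init(), existing callers of gw_hal->putch fill the buffer without any backend writes; tui_shutdown() restores the piped-mode behaviour.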
Module Map
| Module | Source | Original Assembly |
|---|---|---|
| Tokenizer (CRUNCH/LIST) | tokenizer.c | GWMAIN.ASM |
| Expression evaluator | eval.c | GWEVAL.ASM |
| Execution loop + control flow | interp.c | BINTRP.ASM |
| TUI screen editor | tui.c | -- |
| Graphics engine | graphics.c | -- |
| Token/keyword tables | tokens.c, tokens.h | IBMRES.ASM |
| Error handling | error.c | GWDATA.ASM |
| Integer arithmetic | math_int.c | MATH1.ASM |
| Float ops + MBF conversion | math_float.c | MATH2.ASM |
| Transcendentals | math_transcend.c | MATH1.ASM |
| String functions | strings.c | BISTRS.ASM |
| PRINT / LPRINT | print.c | BINTRP.ASM |
| PRINT USING | print_using.c | BIPRTU.ASM |
| Variables + arrays | vars.c, arrays.c | GWMAIN.ASM |
| File I/O + random access | fileio.c | BIPTRG.ASM |
| Program I/O (SAVE/LOAD) | program_io.c | BIMISC.ASM |
| INPUT/LINE INPUT | input.c | BINTRP.ASM |
| Sound engine | sound.c | -- |
| Virtual memory (PEEK/POKE) | virmem.c | -- |
| Hardware I/O ports | portio.c | -- |
| String space pool + GC | strpool.c | GWEVAL.ASM (GETSPA/GARBAG) |
| AOT compiler analysis | analysis.c | -- |
| AOT compiler codegen | codegen.c | -- |
| Compiled program runtime | gwrt.c | -- |
| Platform abstraction | hal_posix.c, hal_dos.c | OEM*.ASM |
Source Layout
src/ -- core interpreter + compiler (27 files)
include/ -- headers (18 files)
platform/ -- HAL backends (2 files)
gwbasickernel/ -- Jupyter notebook kernel (Python, 6 files)
tests/ -- 72 automated test programs, 4 classic interactive programs, compat harness
docs/ -- Sphinx documentation
TUI Architecture
The TUI (tui.c) implements the classic GW-BASIC full-screen editor:
- Screen buffer -- tui_cell_t *screen is dynamically allocated at rows × cols (default 25×80, or full terminal size with --full). Each cell stores a character and color attribute, accessed via TUI_CELL(r, c).
- HAL interception -- tui_init() swaps HAL function pointers so all existing PRINT/LIST/error output automatically goes through the screen buffer. No changes needed to print.c, error.c, or most of interp.c.
- Line editor -- tui_read_line() implements the defining GW-BASIC UX: free cursor movement with arrow keys, and pressing Enter on any screen line re-enters that line's content as BASIC input.
- Function keys -- F1-F10 with default GW-BASIC bindings, configurable via the KEY n, "string" statement. KEY ON shows the bar on the bottom row.
- Break handling -- SIGINT sets a flag checked each statement in the run loop.
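The screen buffer and the "Enter re-submits the screen line" behaviour can be sketched as follows. tui_cell_t, screen, and TUI_CELL come from the description above; the field names, tui_alloc, tui_row_text, the 0x07 default attribute, and the trailing-blank trimming are assumptions made for the sake of a small runnable example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed cell layout: one character plus one colour attribute byte. */
typedef struct {
    unsigned char ch;
    unsigned char attr;
} tui_cell_t;

static tui_cell_t *screen;
static int tui_rows, tui_cols;

/* Row-major indexing into the flat allocation. */
#define TUI_CELL(r, c) (screen[(size_t)(r) * (size_t)tui_cols + (size_t)(c)])

/* Allocate (or reallocate) a rows x cols buffer filled with blanks.
 * Returns 0 on success, -1 if the allocation fails. */
static int tui_alloc(int rows, int cols) {
    tui_cell_t *buf = malloc((size_t)rows * (size_t)cols * sizeof *buf);
    if (buf == NULL) return -1;
    free(screen);
    screen = buf;
    tui_rows = rows;
    tui_cols = cols;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            TUI_CELL(r, c).ch = ' ';
            TUI_CELL(r, c).attr = 0x07;   /* light grey on black */
        }
    }
    return 0;
}

/* Copy the text of one screen row into out, dropping trailing blanks.
 * This is the string a line editor would hand to the tokenizer on Enter.
 * Returns the number of characters written, excluding the terminator. */
static size_t tui_row_text(int row, char *out, size_t out_size) {
    int end = tui_cols;
    while (end > 0 && TUI_CELL(row, end - 1).ch == ' ') end--;

    size_t n = 0;
    for (int c = 0; c < end && n + 1 < out_size; c++) {
        out[n++] = (char)TUI_CELL(row, c).ch;
    }
    if (out_size > 0) out[n] = '\0';
    return n;
}
```

Because the row is read back from the buffer rather than from a keystroke log, moving the cursor onto any previously printed line and pressing Enter yields that line's text, whatever produced it.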
Ahead-of-Time Compiler
gwbasic-compile translates tokenized .bas programs to C source linked
against libgwrt.a:
.bas → gw_crunch() → analysis pass → C codegen → gcc → native binary
The analysis pass (analysis.c) collects variables, GOTO/GOSUB targets,
DATA literals, and DEFINT/DEFSNG/DEFDBL/DEFSTR type defaults. When --warn
is active, it also tracks variable assignment vs. use context to detect
uninitialized variables, GOTO to nonexistent lines, and unreachable code.
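As an illustration of one --warn check, detecting a GOTO to a nonexistent line reduces to comparing the collected jump targets against the collected line numbers. The function below is a hypothetical sketch with an invented name and signature; how analysis.c actually stores its tables is not described here.

```c
#include <stddef.h>

/* Report every jump target that does not match a program line number.
 * Missing targets are written to missing[] (up to missing_cap entries);
 * the return value is the total number of missing targets found. */
static size_t find_missing_targets(const int *lines, size_t n_lines,
                                   const int *targets, size_t n_targets,
                                   int *missing, size_t missing_cap) {
    size_t count = 0;
    for (size_t t = 0; t < n_targets; t++) {
        int found = 0;
        for (size_t l = 0; l < n_lines; l++) {
            if (lines[l] == targets[t]) {
                found = 1;
                break;
            }
        }
        if (!found) {
            if (count < missing_cap) missing[count] = targets[t];
            count++;
        }
    }
    return count;
}
```

A linear scan is enough for a sketch; a real pass over large programs could binary-search the line table, since BASIC line numbers are stored in ascending order.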
The codegen (codegen.c) walks the token stream and emits C source:
variables become static int16_t var_A_int, control flow uses goto L_100
labels, and expressions are compiled inline using the same precedence-climbing
structure as eval.c. Optimizations include constant folding, dead code
elimination, FOR step=1 elision, and fast-path expression emission.
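To make the mapping concrete, here is what the emitted C for a three-line program might look like. The var_A_int and L_nnn naming follows the description above; the wrapping function program_run and the exact statement shapes are assumptions, not actual codegen output.

```c
#include <stdint.h>

/* BASIC source being modelled:
 *   10 A% = 0
 *   20 A% = A% + 1
 *   30 IF A% < 5 THEN GOTO 20
 */

/* Each BASIC variable becomes a file-scope static. */
static int16_t var_A_int;

/* Hypothetical entry point; returns A% so the result can be inspected. */
static int16_t program_run(void) {
    var_A_int = 0;                             /* line 10 */
L_20:
    var_A_int = (int16_t)(var_A_int + 1);      /* line 20 */
    if (var_A_int < 5) goto L_20;              /* line 30 */
    return var_A_int;
}
```

Only lines that are jump targets need a label, which is why the analysis pass collects GOTO/GOSUB targets before any code is emitted.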
The --safe mode modifies generated C to use overflow-checked integer
arithmetic (gw_int_add/sub/mul from math_int.c) instead of bare C
operators, and enhanced-diagnostic array/GOSUB runtime functions that report
variable names, subscript values, and BASIC line numbers on error.
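The idea behind overflow-checked arithmetic is to compute in a wider type and verify the result fits in 16 bits. The sketch below uses hypothetical names (sketch_int_add and friends) because the real signatures of gw_int_add/sub/mul are not shown in this document; the returned code 6 is GW-BASIC's "Overflow" error number.

```c
#include <stdint.h>

enum { SKETCH_OK = 0, SKETCH_ERR_OVERFLOW = 6 };  /* 6 = "Overflow" */

/* Narrow a 32-bit intermediate to int16_t, reporting overflow. */
static int sketch_narrow(int32_t wide, int16_t *out) {
    if (wide < INT16_MIN || wide > INT16_MAX) return SKETCH_ERR_OVERFLOW;
    *out = (int16_t)wide;
    return SKETCH_OK;
}

/* The 32-bit intermediate cannot itself overflow for 16-bit operands. */
static int sketch_int_add(int16_t a, int16_t b, int16_t *out) {
    return sketch_narrow((int32_t)a + (int32_t)b, out);
}

static int sketch_int_sub(int16_t a, int16_t b, int16_t *out) {
    return sketch_narrow((int32_t)a - (int32_t)b, out);
}

static int sketch_int_mul(int16_t a, int16_t b, int16_t *out) {
    return sketch_narrow((int32_t)a * (int32_t)b, out);
}
```

With bare C operators, 32767 + 1 on an int16_t silently wraps after conversion; the checked form lets the generated program raise a BASIC-level error instead.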
The runtime library (gwrt.c / libgwrt.a) provides initialization,
DATA/READ support, GOSUB return-label stack, and wrappers around the existing
interpreter modules. It includes all interpreter code except the execution
loop, so compiled programs share the same string pool, GC, file I/O, and
graphics implementation.
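One way a GOSUB return-label stack can work in portable C is to push a small integer id at each call site and dispatch on it at RETURN. The sketch below is an assumption about the general technique, with hypothetical names and a hypothetical depth limit; it is not the gwrt.c implementation.

```c
enum { GOSUB_MAX = 32 };   /* hypothetical depth limit */

static int gosub_stack[GOSUB_MAX];
static int gosub_sp;

/* Push a return-site id; returns -1 when the stack is full. */
static int gosub_push(int id) {
    if (gosub_sp >= GOSUB_MAX) return -1;
    gosub_stack[gosub_sp++] = id;
    return 0;
}

/* Pop a return-site id; returns -1 for RETURN without GOSUB. */
static int gosub_pop(void) {
    return gosub_sp > 0 ? gosub_stack[--gosub_sp] : -1;
}

/* BASIC source being modelled:
 *   10 GOSUB 100
 *   20 GOSUB 100
 *   30 END
 *   100 N% = N% + 1
 *   110 RETURN
 */
static int run_program(void) {
    int n = 0;

    if (gosub_push(1) != 0) return -1;   /* line 10 */
    goto L_100;
R_1:
    if (gosub_push(2) != 0) return -1;   /* line 20 */
    goto L_100;
R_2:
    return n;                            /* line 30 */

L_100:
    n++;                                 /* line 100 */
    switch (gosub_pop()) {               /* line 110 */
        case 1: goto R_1;
        case 2: goto R_2;
        default: return -1;              /* RETURN without GOSUB */
    }
}
```

A switch over ids keeps the generated code within standard C; GCC's computed goto would avoid the dispatch but ties the output to one compiler family.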
Design Decisions
Relation to Original Assembly
Microsoft released the original GW-BASIC source
in 2020 -- 43,771 lines of 8088 assembly spread across 43 .ASM files, complete
with Greg Whitten's comments and Neil Konzen's transcendental math routines
(which are, frankly, impressive for 16-bit fixed-point). This reimplementation
uses that assembly as a reference, not as input to a transliterator -- the
algorithms are reimplemented in idiomatic C with modern data structures.
Key Differences from the Original
- IEEE 754 floating point -- MBF (Microsoft Binary Format) conversion is used at the binary save/load boundary and for file I/O (CVI/CVS/CVD, MKI$/MKS$/MKD$), matching the original's on-disk format
- Dynamic memory allocation -- malloc/free instead of a 64KB segment layout
- String space pool -- 32KB contiguous pool with compacting GC at statement boundaries, matching the original's GETSPA/GARBAG approach
- setjmp/longjmp -- for error recovery, matching the original's stack reset behavior
- ANSI terminal -- on POSIX, the TUI uses ANSI escape sequences and the alternate screen buffer instead of direct CGA memory access (the DOS backend drives BIOS INT 10h)
- Dynamic screen buffer -- allocated at runtime based on terminal size, rather than hardcoded to 25×80
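For the MBF boundary mentioned in the first bullet, the single-precision layout is four little-endian bytes: a 23-bit fraction in bytes 0-2, the sign in bit 7 of byte 2, and an exponent in byte 3 biased so that 0x81 means 2^0 (an exponent byte of 0 means the value is zero). The decoder below is a minimal sketch under that layout; the function name is hypothetical and this is not the math_float.c code.

```c
#include <stdint.h>

/* Decode a 4-byte MBF single into a double.
 * value = (-1)^sign * (1 + fraction / 2^23) * 2^(exponent - 129) */
static double mbf_single_to_double(const uint8_t b[4]) {
    if (b[3] == 0) return 0.0;                 /* exponent 0 encodes zero */

    uint32_t frac = ((uint32_t)(b[2] & 0x7F) << 16)
                  | ((uint32_t)b[1] << 8)
                  | (uint32_t)b[0];

    /* Restore the implied leading 1 bit. */
    double value = 1.0 + (double)frac / 8388608.0;   /* 2^23 */

    /* Scale by powers of two; each step is exact in binary floating point. */
    int e = (int)b[3] - 129;
    while (e > 0) { value *= 2.0; e--; }
    while (e < 0) { value *= 0.5; e++; }

    return (b[2] & 0x80) ? -value : value;
}
```

For example, the bytes 00 00 20 84 decode as 1.25 × 2^3 = 10.0. MBF has no infinities or NaNs, so every nonzero bit pattern maps to a finite double.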