gw-basic-2026/docs/architecture.md
Eremey Valetov 70ffd39562 v0.17.0: BIOS-routed TUI on DOS, version banner, compiler PulseAudio link
QA findings from a multi-round review of the FreeDOS submission prep work:

- TUI rendering refactor: src/tui.c emitted ANSI escape sequences via
  printf, which displays as raw text on bare FreeDOS (no ANSI.SYS).
  Add four HAL ops (tui_enter, tui_leave, render_run, set_cursor_shape)
  and route per-cell rendering through them.  POSIX backend keeps the
  ANSI path; DOS backend drives BIOS INT 10h via the existing
  bios_set_cursor / bios_write_char helpers.  The TUI's logical cursor
  goes through the saved orig_locate to avoid recursing through the
  swapped-in gw_hal->locate.

- DOS extended-key mapping: dos_getch returns 0x100 | scancode for
  arrows / F-keys; tui_read_key wasn't translating those to its TK_*
  constants, so the editor never saw arrow keys or F1-F10 on DOS.
  Add a __MSDOS__-conditional translation table in tui_read_key.

- Version banner: GW_VERSION was still 0.16.0 even though the v0.17.0
  release prep was already in CHANGES.TXT.  Bump.

- Compiler PulseAudio link: gwbasic-compile -c hardcoded
  '-lgwrt -lm -lpthread' on the gcc command line.  When libgwrt was
  built against libpulse-simple (the default on any host with the
  PulseAudio dev headers installed), the compile workflow failed with
  'undefined reference to pa_simple_drain'.  CMake now passes
  GWRT_HAS_PULSEAUDIO to gwbasic-compile when libpulse is present, and
  the compiler appends -lpulse-simple to the link line.

- FRE("") garbage collection: the interpreter skipped strpool_gc with a
  comment 'unsafe during expression eval', but that's exactly what real
  GW-BASIC's FRE("") does (and the AOT compiler path already did).  Add
  the GC call; strpool_pin/unpin is the existing escape hatch if a
  caller has live pool pointers on the C stack.  Fixes the string_gc
  compat test.

- Test harness normalization: run_tests.sh stripped trailing whitespace
  on the actual output but not the expected file, causing spurious
  mismatches against golden files captured from real GWBASIC.EXE.
  Normalize both sides identically.  Fixes the peek_gfx mismatch.

- Print_using: snprintf into mantissa[32] with %.*f and an unbounded
  dec triggered a -Wformat-truncation warning.  Clamp dec to 20 (IEEE
  double has at most ~17 significant decimal digits).

- Doc/version consistency: 16-bit binary size reported as 127KB in one
  place and 128KB in three; standardize on 128KB.  HAL backend count
  said '1 file' but is now 2.  CI test count said 'all 66 test
  programs' but is 72.  Add a v0.17.0 row to the development.md table.
  Update getting-started.md DOS section to match the BIOS-rendering
  reality and add a manual TUI verification checklist.

- dos_init now writes back BIOS-reported cols/rows to dos_hal struct
  fields (forward-declared so dos_init can reference it).

After these changes: 72/72 interpreter tests pass, compat 68/68
matched, no warnings on the Linux build.
2026-05-03 12:25:41 -04:00


Architecture

Pipeline

Source text → Tokenizer (CRUNCH) → Token stream
                                      ↓
                              Expression evaluator (FRMEVL)
                                      ↓
                              Statement dispatcher (NEWSTT)
                                      ↓
                              TUI screen buffer (interactive)
                                      ↓
                              HAL (platform I/O)

The interpreter mirrors the original GW-BASIC's internal pipeline, which -- like most Microsoft interpreters of the era -- is a tight loop around three core routines. CRUNCH tokenizes source lines into a compact byte stream. NEWSTT dispatches each statement. FRMEVL evaluates expressions. All platform I/O goes through a HAL vtable (hal_ops_t), so the core interpreter has no idea whether it's talking to an ANSI terminal or a teletype from 1975.
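As a minimal sketch of the vtable idea, assuming a trimmed-down `hal_ops_t` with only two ops: the `putch` op, the capture backend, and `core_print` are hypothetical names for illustration, while `gw_hal` and `locate` follow the names used in the project's commit notes. The real struct has more members and its signatures may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down vtable; the real hal_ops_t has more ops. */
typedef struct hal_ops {
    void (*putch)(int ch);
    void (*locate)(int row, int col);
} hal_ops_t;

/* A capture backend that records output in memory instead of a terminal. */
static char capture_buf[64];
static size_t capture_len;
static int capture_row, capture_col;

static void capture_putch(int ch) {
    if (capture_len + 1 < sizeof capture_buf) {
        capture_buf[capture_len++] = (char)ch;
        capture_buf[capture_len] = '\0';
    }
}

static void capture_locate(int row, int col) {
    capture_row = row;
    capture_col = col;
}

static hal_ops_t capture_hal = { capture_putch, capture_locate };

/* The core holds a single pointer to whichever backend is active. */
static hal_ops_t *gw_hal = &capture_hal;

/* Core-side helper: it only sees the vtable, never the backend. */
static void core_print(const char *s) {
    assert(gw_hal != NULL && gw_hal->putch != NULL);
    while (*s)
        gw_hal->putch((unsigned char)*s++);
}
```

Pointing `gw_hal` at a different struct is all it takes to retarget `core_print`; nothing in the core needs to be recompiled against a specific platform.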

In interactive mode, the TUI layer swaps in its own HAL function pointers and redirects all output through a dynamically allocated screen buffer. Rendering itself is delegated back to the HAL via four ops (tui_enter, tui_leave, render_run, set_cursor_shape): the POSIX backend emits ANSI escape sequences, the DOS backend drives BIOS INT 10h directly, so the full-screen editor works on bare FreeDOS without ANSI.SYS. In piped mode the TUI stays out of the way and the HAL writes straight to stdout.
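One plausible way to drive a `render_run`-style op is to split each screen row into runs of cells that share a color attribute, so the backend sets the attribute once per run. The sketch below assumes that design; the `cell_t` layout, the callback signature, and `flush_row` are all hypothetical and are not taken from tui.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cell layout: one character plus one color attribute. */
typedef struct { unsigned char ch; unsigned char attr; } cell_t;

/* Hypothetical signature for a render_run-style backend op. */
typedef void (*render_run_fn)(int row, int col, const cell_t *cells,
                              int len, unsigned char attr);

/* Stand-in backend that only counts calls; a real one would emit ANSI
 * sequences or call BIOS INT 10h. */
static int run_count, last_len;
static void counting_render_run(int row, int col, const cell_t *cells,
                                int len, unsigned char attr) {
    (void)row; (void)col; (void)cells; (void)attr;
    run_count++;
    last_len = len;
}

/* Split one row into maximal runs of equal attribute and hand each run
 * to the backend. Returns the number of runs emitted. */
static int flush_row(const cell_t *line, int cols, int row,
                     render_run_fn render_run) {
    int runs = 0;
    int start = 0;
    assert(line != NULL && render_run != NULL);
    while (start < cols) {
        int end = start + 1;
        while (end < cols && line[end].attr == line[start].attr)
            end++;
        render_run(row, start, &line[start], end - start, line[start].attr);
        runs++;
        start = end;
    }
    return runs;
}
```

A row with two cells of attribute 0x07 followed by three cells of attribute 0x70 produces two backend calls rather than five.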

Module Map

| Module | Source | Original Assembly |
| --- | --- | --- |
| Tokenizer (CRUNCH/LIST) | tokenizer.c | GWMAIN.ASM |
| Expression evaluator | eval.c | GWEVAL.ASM |
| Execution loop + control flow | interp.c | BINTRP.ASM |
| TUI screen editor | tui.c | -- |
| Graphics engine | graphics.c | -- |
| Token/keyword tables | tokens.c, tokens.h | IBMRES.ASM |
| Error handling | error.c | GWDATA.ASM |
| Integer arithmetic | math_int.c | MATH1.ASM |
| Float ops + MBF conversion | math_float.c | MATH2.ASM |
| Transcendentals | math_transcend.c | MATH1.ASM |
| String functions | strings.c | BISTRS.ASM |
| PRINT / LPRINT | print.c | BINTRP.ASM |
| PRINT USING | print_using.c | BIPRTU.ASM |
| Variables + arrays | vars.c, arrays.c | GWMAIN.ASM |
| File I/O + random access | fileio.c | BIPTRG.ASM |
| Program I/O (SAVE/LOAD) | program_io.c | BIMISC.ASM |
| INPUT/LINE INPUT | input.c | BINTRP.ASM |
| Sound engine | sound.c | -- |
| Virtual memory (PEEK/POKE) | virmem.c | -- |
| Hardware I/O ports | portio.c | -- |
| String space pool + GC | strpool.c | GWEVAL.ASM (GETSPA/GARBAG) |
| AOT compiler analysis | analysis.c | -- |
| AOT compiler codegen | codegen.c | -- |
| Compiled program runtime | gwrt.c | -- |
| Platform abstraction | hal_posix.c, hal_dos.c | OEM*.ASM |

Source Layout

src/         -- core interpreter + compiler (27 files)
include/     -- headers (18 files)
platform/    -- HAL backends (2 files)
gwbasickernel/ -- Jupyter notebook kernel (Python, 6 files)
tests/       -- 72 automated test programs, 4 classic interactive programs, compat harness
docs/        -- Sphinx documentation

TUI Architecture

The TUI (tui.c) implements the classic GW-BASIC full-screen editor:

  • Screen buffer -- tui_cell_t *screen is dynamically allocated at rows × cols (default 25×80, or full terminal size with --full). Each cell stores a character and color attribute, accessed via TUI_CELL(r, c).
  • HAL interception -- tui_init() swaps HAL function pointers so all existing PRINT/LIST/error output automatically goes through the screen buffer. No changes needed to print.c, error.c, or most of interp.c.
  • Line editor -- tui_read_line() implements the defining GW-BASIC UX: free cursor movement with arrow keys, and pressing Enter on any screen line re-enters that line's content as BASIC input.
  • Function keys -- F1-F10 with default GW-BASIC bindings, configurable via the KEY n, "string" statement. KEY ON shows the bar on the bottom row.
  • Break handling -- SIGINT sets a flag that the run loop checks before each statement.
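The screen buffer and HAL interception bullets above can be sketched roughly as follows. `tui_cell_t`, `screen`, and `TUI_CELL` are the names used in this section; the cell fields, the one-op `hal_ops_t` stand-in, `orig_putch`, and the `_sketch` functions are assumptions made for illustration, and scrolling is left out.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { unsigned char ch; unsigned char attr; } tui_cell_t;

static tui_cell_t *screen;
static int rows, cols, cur_row, cur_col;
#define TUI_CELL(r, c) (screen[(r) * cols + (c)])

/* Hypothetical one-op HAL, enough to show the pointer swap. */
typedef struct { void (*putch)(int ch); } hal_ops_t;

static int direct_writes;                 /* counts writes that bypass the TUI */
static void direct_putch(int ch) { (void)ch; direct_writes++; }
static hal_ops_t hal = { direct_putch };
static hal_ops_t *gw_hal = &hal;

static void (*orig_putch)(int ch);        /* saved so it can be restored */

/* Replacement op: writes into the screen buffer instead of the device. */
static void tui_putch(int ch) {
    if (ch == '\n') {
        cur_col = 0;
        if (cur_row < rows - 1) cur_row++;
        return;
    }
    TUI_CELL(cur_row, cur_col).ch = (unsigned char)ch;
    TUI_CELL(cur_row, cur_col).attr = 0x07;   /* light grey on black */
    if (++cur_col == cols) {
        cur_col = 0;
        if (cur_row < rows - 1) cur_row++;
    }
}

static int tui_init_sketch(int r, int c) {
    assert(r > 0 && c > 0);
    screen = calloc((size_t)r * (size_t)c, sizeof *screen);
    if (screen == NULL) return -1;
    rows = r; cols = c; cur_row = cur_col = 0;
    orig_putch = gw_hal->putch;           /* remember the real backend op */
    gw_hal->putch = tui_putch;            /* all output now lands in the buffer */
    return 0;
}

static void tui_shutdown_sketch(void) {
    gw_hal->putch = orig_putch;
    free(screen);
    screen = NULL;
}
```

Because callers only ever go through `gw_hal->putch`, code such as print.c does not need to know whether the swap has happened.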

Ahead-of-Time Compiler

gwbasic-compile tokenizes .bas programs and translates them to C source, which gcc then compiles and links against libgwrt.a:

.bas → gw_crunch() → analysis pass → C codegen → gcc → native binary

The analysis pass (analysis.c) collects variables, GOTO/GOSUB targets, DATA literals, and DEFINT/DEFSNG/DEFDBL/DEFSTR type defaults. When --warn is active, it also tracks variable assignment vs. use context to detect uninitialized variables, GOTO to nonexistent lines, and unreachable code.
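The "GOTO to nonexistent lines" check reduces to comparing the set of jump targets against the set of defined line numbers. A minimal sketch, assuming both sets have already been collected into plain arrays; the function names and the array-based representation are hypothetical, not the data structures analysis.c uses.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if target is one of the program's defined line numbers. */
static int line_exists(const int *lines, size_t n_lines, int target) {
    for (size_t i = 0; i < n_lines; i++)
        if (lines[i] == target)
            return 1;
    return 0;
}

/* Copies every GOTO/GOSUB target with no matching line into missing[]
 * (capacity must be at least n_targets) and returns how many were found. */
static size_t find_missing_targets(const int *lines, size_t n_lines,
                                   const int *targets, size_t n_targets,
                                   int *missing) {
    size_t n_missing = 0;
    assert(lines != NULL && targets != NULL && missing != NULL);
    for (size_t i = 0; i < n_targets; i++)
        if (!line_exists(lines, n_lines, targets[i]))
            missing[n_missing++] = targets[i];
    return n_missing;
}
```

For a program with lines 10, 20, 30 and jumps to 20 and 40, the function reports the single missing target 40.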

The codegen (codegen.c) walks the token stream and emits C source: variables become static int16_t var_A_int, control flow uses goto L_100 labels, and expressions are compiled inline using the same precedence-climbing structure as eval.c. Optimizations include constant folding, dead code elimination, FOR step=1 elision, and fast-path expression emission.
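To make the shape of the output concrete, here is a hand-written approximation of what the emitted C might look like for a five-line program. Only the `static int16_t var_A_int` naming and the `goto L_100` label style come from the description above; the wrapper function `run_program`, the casts, and the use of `printf` in place of a runtime print call are assumptions.

```c
#include <assert.h>   /* only needed for the check described in the note below */
#include <stdint.h>
#include <stdio.h>

/* BASIC source being modeled:
 *   10 I% = 1
 *   20 S% = S% + I%
 *   30 I% = I% + 1
 *   40 IF I% <= 5 THEN GOTO 20
 *   50 PRINT S%
 */
static int16_t var_I_int;
static int16_t var_S_int;

static int run_program(void) {
    var_I_int = 1;                                   /* line 10 */
L_20:
    var_S_int = (int16_t)(var_S_int + var_I_int);    /* line 20 */
    var_I_int = (int16_t)(var_I_int + 1);            /* line 30 */
    if (var_I_int <= 5) goto L_20;                   /* line 40 */
    printf(" %d \n", var_S_int);                     /* line 50; stand-in for a runtime PRINT */
    return var_S_int;
}
```

Called once from a fresh process, `run_program()` sums 1 through 5 and returns 15. Only line 20 gets a label here because it is the only jump target.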

The --safe mode modifies generated C to use overflow-checked integer arithmetic (gw_int_add/sub/mul from math_int.c) instead of bare C operators, and enhanced-diagnostic array/GOSUB runtime functions that report variable names, subscript values, and BASIC line numbers on error.
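The overflow check itself can be sketched by widening to 32 bits and testing the result against the 16-bit range. The functions below are illustrative stand-ins, not the real `gw_int_add`/`gw_int_mul`, whose signatures and error path are not shown in this document; the `sketch_overflow_flag` variable is likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical error indicator; a real runtime would raise a BASIC
 * "Overflow" error (error 6) instead of setting a flag. */
static int sketch_overflow_flag;

static int16_t narrow_checked(int32_t wide) {
    if (wide > INT16_MAX || wide < INT16_MIN) {
        sketch_overflow_flag = 1;
        return 0;
    }
    return (int16_t)wide;
}

/* The sum of two int16_t values always fits in int32_t, so the wide
 * addition itself cannot overflow. */
static int16_t sketch_int_add(int16_t a, int16_t b) {
    return narrow_checked((int32_t)a + (int32_t)b);
}

/* The product of two int16_t values has magnitude at most 2^30, which
 * also fits in int32_t. */
static int16_t sketch_int_mul(int16_t a, int16_t b) {
    return narrow_checked((int32_t)a * (int32_t)b);
}
```

With bare C operators, `32767 + 1` stored back into an `int16_t` would silently wrap on typical targets; the checked version reports the condition instead.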

The runtime library (gwrt.c / libgwrt.a) provides initialization, DATA/READ support, GOSUB return-label stack, and wrappers around the existing interpreter modules. It includes all interpreter code except the execution loop, so compiled programs share the same string pool, GC, file I/O, and graphics implementation.
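Standard C has no way to store a label address, so one common way to implement a GOSUB return-label stack is to push a small integer per call site and dispatch on it at RETURN. The sketch below assumes that approach; the stack depth, the `R_n` label naming, and the function names are hypothetical and may not match gwrt.c.

```c
#include <assert.h>

#define GOSUB_MAX 64   /* hypothetical depth limit */

static int gosub_stack[GOSUB_MAX];
static int gosub_sp;

static void gosub_push(int return_id) {
    assert(gosub_sp < GOSUB_MAX);   /* a real runtime would raise a BASIC error */
    gosub_stack[gosub_sp++] = return_id;
}

static int gosub_pop(void) {
    assert(gosub_sp > 0);           /* corresponds to "RETURN without GOSUB" */
    return gosub_stack[--gosub_sp];
}

/* BASIC source being modeled:
 *   10 GOSUB 100
 *   20 GOSUB 100
 *   30 END
 *   100 T% = T% + 5
 *   110 RETURN
 */
static int run_program(void) {
    int total = 0;
    gosub_push(1); goto L_100;      /* line 10 */
R_1:
    gosub_push(2); goto L_100;      /* line 20 */
R_2:
    return total;                   /* line 30 */
L_100:
    total += 5;                     /* line 100 */
    switch (gosub_pop()) {          /* line 110: jump back to the caller */
    case 1: goto R_1;
    case 2: goto R_2;
    default: return -1;
    }
}
```

Each GOSUB site gets its own id and return label, so one subroutine body can serve any number of callers.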

Design Decisions

Relation to Original Assembly

Microsoft released the original GW-BASIC source in 2020 -- 43,771 lines of 8088 assembly spread across 43 .ASM files, complete with Greg Whitten's comments and Neil Konzen's transcendental math routines (which are, frankly, impressive for hand-written floating point on a 16-bit integer CPU). This reimplementation uses that assembly as a reference, not as input to a transliterator -- the algorithms are reimplemented in idiomatic C with modern data structures.

Key Differences from the Original

  • IEEE 754 floating point -- MBF (Microsoft Binary Format) conversion is used at the binary save/load boundary and for file I/O (CVI/CVS/CVD, MKI$/MKS$/MKD$), matching the original's on-disk format
  • Dynamic memory allocation -- malloc/free instead of a 64KB segment layout
  • String space pool -- 32KB contiguous pool with compacting GC at statement boundaries and on FRE(""), matching the original's GETSPA/GARBAG approach
  • setjmp/longjmp -- for error recovery, matching the original's stack reset behavior
  • Terminal rendering -- the TUI renders through the HAL instead of writing to CGA memory directly: ANSI escape sequences and the alternate screen buffer on POSIX, BIOS INT 10h on DOS
  • Dynamic screen buffer -- allocated at runtime based on terminal size, rather than hardcoded to 25×80
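The MBF boundary conversion from the first bullet can be illustrated with the single-precision case, the operation behind CVS. An MBF single is four little-endian bytes: three mantissa bytes, with the sign in the top bit of the third, followed by an exponent byte biased by 128 over a mantissa in the range [0.5, 1). The function name below is hypothetical and this is a sketch of the format, not the code in math_float.c.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 4-byte MBF single (as stored on disk) to a native float.
 * b[0..2] = mantissa, low byte first; bit 7 of b[2] = sign;
 * b[3]    = exponent, biased by 128; 0 means the value is zero. */
static float mbf_single_to_ieee(const uint8_t b[4]) {
    if (b[3] == 0)
        return 0.0f;

    /* Restore the implicit leading 1 bit that the sign bit overwrote. */
    uint32_t mant = 0x800000u
                  | ((uint32_t)(b[2] & 0x7Fu) << 16)
                  | ((uint32_t)b[1] << 8)
                  | (uint32_t)b[0];

    double v = (double)mant / 16777216.0;   /* mant / 2^24, so 0.5 <= v < 1 */
    int e = (int)b[3] - 128;

    /* Scale by 2^e using exact power-of-two multiplications. */
    while (e > 0) { v *= 2.0; e--; }
    while (e < 0) { v *= 0.5; e++; }

    return (float)((b[2] & 0x80u) ? -v : v);
}
```

For example, the bytes `00 00 00 81` decode to 1.0 and `00 00 A0 84` decode to -10.0. The arithmetic form is used here instead of bit-shuffling because the smallest MBF exponents fall below the IEEE single-precision normal range.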