* tests: automated headless FreeDOS QEMU smoke
Fully-automated counterpart to the manual run_freedos_qemu.sh: overlays the
FreeDOS image (no mutation), stages the interpreter and a SYSTEM-terminated
smoke on C:, injects the run plus poweroff into the image's startup batch,
boots headless, and diffs OUT.TXT against the golden file. Local-dev only
(needs qemu, a FreeDOS qcow2, mtools, nbd, and passwordless sudo); CI keeps
using the DOSBox-X path. Exercises the binary on a real FreeDOS install
rather than DOSBox-X emulation.
* Release 0.18.0
Cross-language linking (link BASIC into C/Fortran, call C from BASIC via
'$EXTERN), the v0.18 codegen/perf batch (paren string-comparison fix,
--no-gc-check/--fast-math, larger 32-bit caps, process-local DATE$/TIME$),
and the automated FreeDOS QEMU smoke. Bumps GW_VERSION and updates the
banners, CHANGES.TXT, and the development history table.
---------
Co-authored-by: Eremey Valetov <evvaletov@users.noreply.github.com>
Add a '$EXTERN NAME(ARGTYPES) AS RET pragma so compiled BASIC can call C
functions directly, the natural follow-up to Level 1 (--emit-obj /
--main-name). The pragma is an apostrophe comment, so the interpreter
ignores it while the compiler registers it.
Map INTEGER/SINGLE/DOUBLE/STRING to int16_t/float/double/const char* at the
boundary: a string argument crosses as a temporary C copy that is freed
after the call, and a string return is copied into the pool. The call name
is matched case-insensitively but emitted as the C symbol with the case
written in the pragma. Names are recognized before parse_var() truncates
identifiers to two significant characters, so multi-character C function
names work.
A string return that aliases a char* argument is copied before the argument
temporaries are freed, which avoids a use-after-free. Over-supplied
arguments are consumed without desyncing the token stream, and an arity
mismatch produces a warning.
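The copy-before-free ordering described above can be sketched in C. This is
only an illustration, not the generated code: call_string_fn and skip_spaces
are hypothetical names, and plain malloc stands in for the string pool.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical foreign function whose return value aliases its argument. */
static const char *skip_spaces(const char *s)
{
    while (*s == ' ')
        s++;
    return s;
}

/* Call fn on a temporary NUL-terminated copy of a counted BASIC string.
 * The result is duplicated BEFORE the temporary is freed, so a return
 * value that points into the argument never dangles.
 * Returns a malloc'd string the caller must free, or NULL on failure. */
static char *call_string_fn(const char *(*fn)(const char *),
                            const char *data, size_t len)
{
    char *tmp = malloc(len + 1);
    char *copy = NULL;
    const char *ret;

    if (tmp == NULL)
        return NULL;
    memcpy(tmp, data, len);
    tmp[len] = '\0';

    ret = fn(tmp);
    if (ret != NULL) {
        size_t n = strlen(ret) + 1;
        copy = malloc(n);
        if (copy != NULL)
            memcpy(copy, ret, n);
    }
    free(tmp);              /* safe: nothing references tmp any more */
    return copy;
}
```

Freeing tmp before the copy would leave ret pointing into released memory,
which is the case the commit guards against.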
Docs: getting-started.md "Foreign Functions from BASIC". Test:
tests/run_ffi_test.sh, wired into CI. 63/63 compiler, 72/72 interpreter,
68/68 compat still pass.
Also refile the roadmap "Next Up" backlog as git-bug issues and prune
docs/roadmap.md to point at git-bug as the source of truth for planned work.
Co-authored-by: Eremey Valetov <evvaletov@users.noreply.github.com>
Level 1 of the cross-language linking roadmap entry: produce an
object file with a renamed entry point so a BASIC program can be
linked into a larger C or Fortran build.
- src/compiler_main.c: --emit-obj runs gcc -c (compile-only,
produces prog.o) and skips the runtime link. --main-name NAME
(or --main-name=NAME) is plumbed through codegen_opts_t.
- src/codegen.c: emit `int <name>(int argc, char **argv)` instead
of always emitting `main`. Default unchanged when --main-name
isn't specified.
- include/codegen.h: add main_name to codegen_opts_t.
- docs/getting-started.md: new "Cross-Language Linking" section
with C and Fortran (iso_c_binding) driver examples.
- docs/roadmap.md: three levels of cross-language linking, with
Level 1 marked done, Level 2 (BASIC-side EXTERN declarations)
as the next concrete step, Level 3 (BASIC SUBs as C functions)
deferred. Also added: FORTRAN-style WRITE / C-style PRINTF
formatted I/O extensions, and a NumPy / DataFrame / Matplotlib-
style standard library section as a separate sub-project track.
Verified end-to-end: a BASIC program compiled with --emit-obj
--main-name=run_basic_greet links cleanly with both a C driver
(gcc) and a Fortran driver (gfortran with iso_c_binding), and
prints the BASIC output before returning to the host. All
72 interpreter / 68 compat / 63 compiler tests still pass.
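A rough sketch of the entry-point naming described for src/codegen.c. The
helper name emit_entry_signature is hypothetical, and the real codegen
writes to its output stream rather than a buffer.

```c
#include <stdio.h>

/* Write the C entry-point signature for a compiled program into buf.
 * main_name is the value given to --main-name, or NULL when the flag is
 * absent, in which case the default "main" is kept.
 * Returns the snprintf result (length of the full signature). */
static int emit_entry_signature(char *buf, size_t size, const char *main_name)
{
    const char *name = (main_name != NULL && main_name[0] != '\0')
                       ? main_name : "main";

    return snprintf(buf, size, "int %s(int argc, char **argv)", name);
}
```

With "run_basic_greet" as the name, a C driver would declare
`int run_basic_greet(int argc, char **argv);` and call it like any other
function.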
Four roadmap items:
- codegen: fix parenthesized string comparison. emit_atom didn't
consume the body of a string-literal token (`"`), so for
PRINT (A$+B$ < "ZZZ") it emitted a 0 placeholder, advanced one byte,
and left "ZZZ" to be reparsed as a variable + extra trailing tokens
-- the binary then failed to link with `var_ZZ_sng` undeclared.
emit_atom now skips to the closing quote. Separately, the
left_type tracking in emit_num_prec dropped VT_STR after a string +
string concat (becoming VT_SNG), so the string-comparison codepath was
skipped when the relational operator arrived. Preserve VT_STR
through TOK_PLUS when both operands are strings. Verified: paren
string-cmp now compiles and produces the same -1 / 0 result as the
interpreter.
- compiler: --no-gc-check and --fast-math optimization flags.
--no-gc-check skips the per-line gwrt_check_line() (no string-pool
GC, no Ctrl+Break trap). --fast-math drops the divide-by-zero
guard on `/`; the divisor still goes through (double) so 10/0
produces inf rather than SIGFPE. Both threaded through
codegen_opts_t and exposed in --help. --inline-arrays from the
roadmap deferred -- larger refactor.
- interp: raise static caps on 32-bit / Linux builds. vars 256
-> 1024, arrays 64 -> 256, MAX_FOR_DEPTH 16 -> 64, MAX_GOSUB_DEPTH
24 -> 128, MAX_WHILE_DEPTH 16 -> 64. Codegen FOR_STACK_MAX 16
-> 64. Analysis-pass caps: MAX_LINES 4096 -> 8192, MAX_VARS 256
-> 1024, MAX_GOTOS 256 -> 1024, MAX_DATA 1024 -> 4096,
MAX_GOSUB_RET 256 -> 1024. 16-bit DOS keeps the original modest
caps via #ifdef _M_I86 -- the MEDIUM model has a single 64KB
DGROUP for all static data and the bumped sizes broke runtime
startup under DOSBox-X. 16-bit binary grew from 128KB to 132KB
because of the time_offset_secs field plus DATE$/TIME$ shift code,
well within the FreeDOS budget.
- interp + codegen: DATE$ / TIME$ assignment via process-local
clock offset. Previously the assignment was accepted and ignored.
Now it sets gw.time_offset_secs (long), and DATE$ / TIME$ / TIMER readers
apply it to time(NULL) before formatting. The OS clock is
unaffected (would need root). Compiled-binary readers also
reference gw.time_offset_secs since libgwrt shares the gw
struct. Verified: PRINT DATE$; DATE$="12-31-1999"; PRINT DATE$
shows the expected before/after in both interpreter and AOT
paths.
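The "skip to the closing quote" part of the string-comparison fix can be
sketched as follows. This is an assumption-laden illustration: it walks
plain characters, whereas emit_atom works on the token stream, and
skip_string_literal is a hypothetical name.

```c
/* Given p pointing at the opening quote of a string literal, return a
 * pointer just past the closing quote. An unterminated literal ends at
 * the end of the line (the NUL terminator here). If p does not point at
 * a quote, it is returned unchanged. */
static const char *skip_string_literal(const char *p)
{
    if (*p != '"')
        return p;
    p++;                                /* step over the opening quote */
    while (*p != '\0' && *p != '"')
        p++;                            /* consume the literal's body */
    if (*p == '"')
        p++;                            /* step over the closing quote */
    return p;
}
```

Advancing only one byte past the opening quote, as the old code did, leaves
the body to be reparsed as identifiers.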
After these changes: 72/72 interpreter tests, 68/68 compat, 63/63
compiler tests, DOS smoke under DOSBox-X all pass. Build clean on
both Linux (cmake) and 16-bit DOS (build_dos.sh 16).
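The process-local clock offset behind DATE$ / TIME$ assignment can be
sketched like this. The names are hypothetical apart from the offset field
it mimics; the sketch takes "now" as a parameter and uses gmtime so it is
deterministic, and it assumes time_t counts seconds.

```c
#include <stdio.h>
#include <time.h>

/* Process-local clock offset in seconds; stands in for
 * gw.time_offset_secs. */
static long time_offset_secs = 0;

/* Record the offset that makes the adjusted clock read `target` when the
 * real clock reads `now`. The OS clock is never touched. */
static void set_clock(time_t now, time_t target)
{
    time_offset_secs = (long)difftime(target, now);
}

/* Format the adjusted date as MM-DD-YYYY (the DATE$ layout) into buf.
 * buf is left empty if the conversion fails. */
static void format_date(time_t now, char *buf, size_t size)
{
    time_t shifted = now + (time_t)time_offset_secs;
    struct tm *tm = gmtime(&shifted);   /* UTC keeps the sketch simple */

    if (tm == NULL || strftime(buf, size, "%m-%d-%Y", tm) == 0)
        buf[0] = '\0';
}
```

Every reader applies the same offset, so DATE$, TIME$ and TIMER stay
consistent with each other after an assignment.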
- .github/workflows/ci.yml: the dos-cross-compile job failed on the
first push because build_dos.sh sources $HOME/openwatcom-v2/setvars.sh
if WATCOM is unset, but that file isn't part of the OpenWatcom V2
snapshot — I'd been creating it locally by hand. Add a "Configure
OpenWatcom env" step that generates setvars.sh after extraction (so
build_dos.sh works) AND exports WATCOM/PATH/INCLUDE via $GITHUB_ENV
(so subsequent steps work even if setvars.sh sourcing changes).
Also stash both DOS binaries before the next-mode clean wipes them,
so the artifact upload actually has both .exe files.
- src/codegen.c: switch the four remaining emit_str_atom callers
(CVI/CVS/CVD function args + string-comparison left/right) to
emit_str_expr. Now `CVS(A$+B$)` and `A$+B$ < C$` accept
concatenation in their string operands; previously the atom-level
caller stopped at the first identifier and the trailing `+` confused
downstream parsing. Verified: CVS(MKS$(3.14)+MKS$(0)) round-trips
to 3.14 in both interpreter and compiled binary. All 72 interpreter
+ 63 compiler tests still pass.
- docs/getting-started.md: document that gwbasic-compile auto-numbers
unnumbered direct-mode lines (last_num + 10) so scratchpad-style
programs compile without manual renumbering.
- tests/run_freedos_qemu.sh: helper for going through the manual TUI
checklist on bare FreeDOS. Modern qemu-kvm doesn't expose -fda on
the default machine type and the fat:rw: protocol is gone, so a fully
automated FreeDOS smoke isn't tractable from userspace; this script
builds a FAT data image (mtools), attaches it as -hdb to the FreeDOS
qcow2, and points the user at the manual sequence in the script
header. The DOSBox-X harness (run_dos_smoke.sh) remains the
automated DOS smoke.
Three fixes that lift seven test programs from skipped to passing,
bringing the AOT compiler harness from 56/56 to 63/63.
- Unnumbered programs (compiler_main.c): the loader skipped
any line that didn't start with a digit, so direct-mode .bas files
like hello.bas, math_ops.bas, string_ops.bas (no line numbers)
failed with "No program lines found". load_file now auto-assigns
line numbers (last_num + 10) to unnumbered lines, with overflow
protection at line 65520.
- String concatenation in PRINT (codegen.c): emit_str_atom had a
broken concat loop that emitted "; _cat = gw_str_concat(&...
_cat.sval ...)" — _cat was never declared, so any program with a
string-literal concat in PRINT (like PRINT "ABC" + "DEF") failed
to link. Concat is properly handled by emit_str_expr's outer
loop; remove the dead/broken code in the atom. Fixes
string_ops.bas.
- Transcendental result type (codegen.c): peek_expr_type returned
VT_DBL for ATN/LOG/EXP/VAL, so PRINT formatted them at full double
precision (e.g. 3.141592653589793) while real GW-BASIC and
the interpreter format the single-precision result as 3.141593.
Real GW-BASIC's transcendentals are single-precision; only CDBL
forces double. Demote ATN/LOG/EXP/VAL to VT_SNG; CDBL stays
VT_DBL. Fixes math_ops.bas.
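The auto-numbering rule for unnumbered lines can be sketched as below.
assign_line_number is a hypothetical helper, and returning -1 at the limit
is an assumption; the commit only says there is overflow protection at
line 65520.

```c
#include <ctype.h>
#include <stdlib.h>

#define AUTO_STEP  10L
#define AUTO_LIMIT 65520L   /* overflow-protection threshold */

/* Return the line number for `text` and update *last_num.
 * Numbered lines keep their own number; unnumbered lines get
 * last_num + 10. Returns -1 (leaving *last_num alone) when
 * auto-numbering would pass the limit. */
static long assign_line_number(const char *text, long *last_num)
{
    long num;

    while (*text == ' ' || *text == '\t')
        text++;
    if (isdigit((unsigned char)*text)) {
        num = strtol(text, NULL, 10);
    } else {
        num = *last_num + AUTO_STEP;
        if (num > AUTO_LIMIT)
            return -1;
    }
    *last_num = num;
    return num;
}
```

Mixed files work too: an explicitly numbered line resets the base that the
following unnumbered lines count from.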
Also: tests/run_compiler_tests.sh now runs the compiled binary from
the project root rather than the tempdir where it was built, so
test programs that reference tests/programs/ via relative paths
(chain_test, common_test, run_file, misc_stmts) resolve their
targets. Earlier I'd misdiagnosed those failures as ON ERROR
divergence — they were just CWD-dependent path lookups.
Doc/test counts: 56 → 63 in README, docs/index.md, docs/development.md,
docs/roadmap.md. Roadmap updated to note the compiler now accepts
unnumbered programs.
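The visible effect of the transcendental result-type fix can be reproduced
with plain printf formatting. This is only an illustration of the two
precisions; the helper names are hypothetical and the runtime's real PRINT
formatter is not shown in the commit.

```c
#include <stdio.h>

/* Format a value as a single-precision result: rounded to float and
 * shown with 7 significant digits. */
static void format_single(double v, char *buf, size_t size)
{
    snprintf(buf, size, "%.7g", (double)(float)v);
}

/* Format a value as a double-precision result with 16 significant
 * digits. */
static void format_double(double v, char *buf, size_t size)
{
    snprintf(buf, size, "%.16g", v);
}
```

For pi, the first helper yields 3.141593 and the second
3.141592653589793, matching the two outputs quoted in the commit.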
- build_dos.sh: Linux-friendly cross-compile to DOS via OpenWatcom V2.
OpenWatcom's wmake on Linux can't apply the .c.obj implicit rule for
subdirectory paths, and Makefile.dos / Makefile.dos16 rely on DOS-
only commands like 'del'. Script invokes wcc / wcc386 directly,
tracks 16-bit vs 32-bit mode via a stamp file (auto-cleans on
switch), generates a wlink directive file (the brace-delimited file
list wouldn't survive shell quoting), and supports clean. The DOS
Makefiles still work on Windows / DOS hosts.
- tests/run_compiler_tests.sh: AOT compiler harness. For each .bas
in tests/programs/, compiles via gwbasic-compile -c, runs the
resulting executable, normalizes output and diffs against the
golden file from tests/expected/. Skip list covers chain/common
multi-file flows, hardware/timing-dependent programs, unnumbered
direct-mode programs (compiler requires line numbers), and
misc_stmts/run_file (interpreter-vs-compiler ON ERROR divergence).
Result: 56/56 pass.
- tests/run_dos_smoke.sh + dos_smoke.bas + expected: runs gwbasic16.exe
under DOSBox-X (flatpak) with a program that exercises arithmetic,
strings, control flow, GOSUB, FOR/NEXT, DATA/READ, DEF FN, OPEN/
PRINT#/CLOSE, and diffs against the interpreter's golden output.
Uses $HOME for the staging dir (DOSBox-X flatpak doesn't see /tmp).
- pkg/GWBASIC.LSM + pkg/build_pkg.sh: FreeDOS submission package.
Produces dist/gwbasic-<VERSION>.zip with the standard FreeDOS
layout (APPINFO/GWBASIC.LSM, BIN/GWBASIC.EXE, DOC/GWBASIC/{README,
CHANGES,LICENSE} with CRLF, SOURCE/GWBASIC/<full source>). Source
tree is filtered through git ls-files to exclude build artifacts.
- docs/Makefile: standard Sphinx Makefile so 'cd docs && make html'
works as documented in README.md.
- .github/workflows/ci.yml: split into two jobs. build-and-test now
also runs the compiler harness. New dos-cross-compile job caches
~/openwatcom-v2, downloads the OpenWatcom V2 snapshot if not
cached, builds both 16-bit and 32-bit DOS binaries, asserts size
bounds, and uploads them as artifacts.
- .gitignore: ignore .dos_build_mode (script's stamp), .link_dir/
(transient wlink directive dir), dist/ (package output).
QA findings from a multi-round review of the FreeDOS submission prep work:
- TUI rendering refactor: src/tui.c emitted ANSI escape sequences via
printf, which displays as raw text on bare FreeDOS (no ANSI.SYS).
Add four HAL ops (tui_enter, tui_leave, render_run, set_cursor_shape)
and route per-cell rendering through them. POSIX backend keeps the
ANSI path; DOS backend drives BIOS INT 10h via the existing
bios_set_cursor / bios_write_char helpers. The TUI's logical cursor
goes through the saved orig_locate to avoid recursing through the
swapped-in gw_hal->locate.
- DOS extended-key mapping: dos_getch returns 0x100 | scancode for
arrows / F-keys; tui_read_key wasn't translating those to its TK_*
constants, so the editor never saw arrow keys or F1-F10 on DOS.
Add a __MSDOS__-conditional translation table in tui_read_key.
- Version banner: GW_VERSION was still 0.16.0 even though the v0.17.0
release prep was already in CHANGES.TXT. Bump.
- Compiler PulseAudio link: gwbasic-compile -c hardcoded
'-lgwrt -lm -lpthread' on the gcc command line. When libgwrt was
built against libpulse-simple (the default on any host with the
PulseAudio dev headers installed), the compile workflow failed with
'undefined reference to pa_simple_drain'. CMake now passes
GWRT_HAS_PULSEAUDIO to gwbasic-compile when libpulse is present, and
the compiler appends -lpulse-simple to the link line.
- FRE("") garbage collection: the interpreter skipped strpool_gc with a
comment 'unsafe during expression eval', but that's exactly what real
GW-BASIC's FRE("") does (and the AOT compiler path already did). Add
the GC call; strpool_pin/unpin is the existing escape hatch if a
caller has live pool pointers on the C stack. Fixes the string_gc
compat test.
- Test harness normalization: run_tests.sh stripped trailing whitespace
on the actual output but not the expected file, causing spurious
mismatches against golden files captured from real GWBASIC.EXE.
Normalize both sides identically. Fixes the peek_gfx mismatch.
- Print_using: snprintf into mantissa[32] with %.*f and an unbounded
dec triggered a -Wformat-truncation warning. Clamp dec to 20 (IEEE
double has at most ~17 significant decimal digits).
- Doc/version consistency: 16-bit binary size reported as 127KB in one
place and 128KB in three; standardize on 128KB. HAL backend count
said '1 file' but is now 2. CI test count said 'all 66 test
programs' but is 72. Add a v0.17.0 row to the development.md table.
Update getting-started.md DOS section to match the BIOS-rendering
reality and add a manual TUI verification checklist.
- dos_init now writes back BIOS-reported cols/rows to dos_hal struct
fields (dos_hal is forward-declared so dos_init can reference it).
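The DOS extended-key translation can be sketched as a small lookup. The
scancodes are the standard PC ones; the TK_* values and the function name
are hypothetical, since the commit does not show the real constants.

```c
/* Hypothetical TUI key codes; the real TK_* values are not shown. */
enum { TK_NONE = 0, TK_UP = 0x200, TK_DOWN, TK_LEFT, TK_RIGHT, TK_F1 };

#define EXT_FLAG 0x100   /* extended keys arrive as 0x100 | scancode */

/* Translate a dos_getch-style code into a TK_* constant.
 * Ordinary characters pass through unchanged; unknown extended keys
 * map to TK_NONE. F1..F10 map to TK_F1 .. TK_F1 + 9. */
static int translate_dos_key(int code)
{
    int scan;

    if ((code & EXT_FLAG) == 0)
        return code;
    scan = code & 0xFF;
    switch (scan) {
    case 0x48: return TK_UP;
    case 0x50: return TK_DOWN;
    case 0x4B: return TK_LEFT;
    case 0x4D: return TK_RIGHT;
    default:
        if (scan >= 0x3B && scan <= 0x44)      /* F1..F10 */
            return TK_F1 + (scan - 0x3B);
        return TK_NONE;
    }
}
```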
After these changes: 72/72 interpreter tests pass, compat 68/68
matched, no warnings on the Linux build.
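The precision clamp from the snprintf fix above can be sketched as follows.
format_fixed is a hypothetical name; only the 32-byte buffer and the clamp
to 20 come from the commit.

```c
#include <stdio.h>

#define MAX_DEC 20   /* clamp value from the commit */

/* Format v with `dec` decimals into a 32-byte buffer, clamping dec so
 * the requested precision is always bounded. snprintf still truncates
 * safely if the integer part is very long.
 * Returns the snprintf result. */
static int format_fixed(char out[32], double v, int dec)
{
    if (dec < 0)
        dec = 0;
    if (dec > MAX_DEC)
        dec = MAX_DEC;
    return snprintf(out, 32, "%.*f", dec, v);
}
```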
- Add CHANGES.TXT with full version history (DOS-friendly format)
- Add DOS/FreeDOS section to README.md
- Move compiler memory safety and DOS target to Completed in roadmap
- Add hal_dos.c to architecture.md module map
- Add .tab-color to .gitignore
Fix 6 crash paths when TUI screen buffer allocation fails (common on
16-bit DOS due to near-heap exhaustion):
- main.c: REPL and AUTO mode fall back to fgets-based line reader
- tui.c: tui_key_on/off/list return early if tui.screen is NULL
Add DOS build documentation to getting-started.md (16-bit and 32-bit
targets, running on FreeDOS). Fix stale version string (0.14.0 -> 0.16.0).
Remove Unicode em dashes (U+2014) and en dashes (U+2013) from all
Markdown files. Use ASCII -- for parenthetical breaks and - for
hyphenation, matching standard plain-text conventions.
README.md: rewrite with v0.16.0 version, compiler section, Jupyter
kernel section, hardware I/O in statement table, accurate test counts
(72 interpreter + 14 kernel + 69 compiler), build instructions for
all three targets.
docs/roadmap.md: clean up Phase 2 accumulation into a single coherent
compiler description. List all language coverage, operators, functions,
and optimizations. Remove stale intermediate progress markers.
docs/development.md: update test counts (72 programs, 68 golden files),
add kernel and compiler test commands.
Rewrites gfx_draw() as a recursive draw_engine() to support all DRAW
mini-language features:
Bug fixes:
- M command parsing: skip generic arg parser so M100,50 correctly
parses both coordinates instead of consuming x as a generic arg
- S (scale) semantics: distance is now (arg ?: 1) * scale / 4, matching
original GW-BASIC where S4 means 1 pixel per unit, not 4
- A (rotation): implements 90-degree rotation state with direction
vector transform for all 8 direction commands
New features:
- TA n: arbitrary rotation angle (-360 to 360 degrees) via cos/sin
- =variable;: numeric variable substitution in DRAW strings
- X stringvar;: execute substring from string variable (recursive)
- Scale factor applied to relative M coordinates
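The scale and 90-degree rotation rules can be sketched as two helpers. The
names are hypothetical, and treating positive rotation as counter-clockwise
on a y-down screen is an assumption rather than something the commit
states.

```c
/* Pixel distance for a DRAW movement command: a missing (zero) argument
 * counts as 1, and scale 4 means one pixel per unit. */
static int draw_distance(int arg, int scale)
{
    return (arg != 0 ? arg : 1) * scale / 4;
}

/* Rotate a screen-space direction vector by `quarter_turns` steps of
 * 90 degrees (assumed counter-clockwise, with y growing downward). */
static void rotate_dir(int quarter_turns, int *dx, int *dy)
{
    int i, t;

    for (i = 0; i < (quarter_turns & 3); i++) {
        t = *dx;
        *dx = *dy;
        *dy = -t;
    }
}
```

With this transform, R (1,0) becomes U (0,-1) after one step, and four
steps return any direction to where it started.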
Binary tokenized SAVE/LOAD now stores float constants in Microsoft Binary
Format (MBF) on disk, matching original GWBASIC.EXE. A token-walking function
(convert_floats) converts IEEE↔MBF at the save_binary()/load_binary() boundary.
Also fixes a latent bug where load_binary() scanned for 0x00 to find the end
of each token line — this fails when float bytes contain null (e.g. MBF for
100.5 is 00 00 49 87). The loader now uses the next-line pointer to compute
token data length, matching the original's approach.
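A minimal sketch of the single-precision IEEE/MBF conversion, assuming a
32-bit IEEE float on the host. The function names are hypothetical and do
not reproduce convert_floats; denormals and very small MBF values are
simply flushed to zero here.

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE single to 4 MBF bytes (mantissa low, mid, high with
 * sign bit, then exponent). Returns 0 on success, -1 if the value
 * (including inf/NaN) does not fit in MBF. */
static int ieee_to_mbf32(float f, uint8_t out[4])
{
    uint32_t bits, sign, exp, mant;

    memcpy(&bits, &f, sizeof bits);
    sign = bits >> 31;
    exp  = (bits >> 23) & 0xFF;
    mant = bits & 0x7FFFFF;

    if (exp == 0) {                 /* zero or denormal: MBF zero */
        memset(out, 0, 4);
        return 0;
    }
    if (exp >= 254)                 /* exp + 2 would overflow a byte */
        return -1;
    out[0] = (uint8_t)(mant & 0xFF);
    out[1] = (uint8_t)((mant >> 8) & 0xFF);
    out[2] = (uint8_t)((sign << 7) | (mant >> 16));
    out[3] = (uint8_t)(exp + 2);    /* MBF bias is 2 above IEEE's */
    return 0;
}

/* Convert 4 MBF bytes back to an IEEE single. */
static float mbf32_to_ieee(const uint8_t in[4])
{
    uint32_t bits;
    float f;

    if (in[3] <= 2)                 /* zero, or below the normal range */
        return 0.0f;
    bits = ((uint32_t)(in[2] & 0x80) << 24)
         | ((uint32_t)(in[3] - 2) << 23)
         | ((uint32_t)(in[2] & 0x7F) << 16)
         | ((uint32_t)in[1] << 8)
         | (uint32_t)in[0];
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Converting 100.5 yields the bytes 00 00 49 87 quoted above, including the
embedded nulls that broke the old end-of-line scan.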
Expand roadmap with detailed implementation plans for the next three
features: MBF binary file compatibility (token-stream conversion at
load/save boundary), hardware I/O simulator (portio.c with PIT/speaker/
CGA/joystick port emulation), and DRAW command fixes (M parsing bug,
scale semantics, A rotation, TA/variable substitution). Remove hardware
I/O from known limitations (moving to planned). Fix stale test counts
(64 -> 66) and version string (0.11.0 -> 0.13.0) across docs.
Apply coordinate mapping (VIEW/WINDOW) to POINT(x,y) function so it
returns correct pixel values when WINDOW is active. Remove unused
palette[] array from graphics.c (Sixel encoder uses palette_map[]
directly). Expand view_window.bas test to cover WINDOW SCREEN mode,
VIEW+WINDOW combination, and PMAP inverse mapping. Fix CI test count
in docs.
BSAVE/BLOAD: save and load virtual memory blocks with 0xFD-header
binary format, operating on the current DEF SEG segment.
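A sketch of a BSAVE-style header, assuming the commonly documented 7-byte
layout (0xFD, then segment, offset and length as little-endian words).
Whether this implementation writes exactly that layout is an assumption;
the helper names are hypothetical.

```c
#include <stdint.h>

#define BSAVE_MAGIC       0xFD
#define BSAVE_HEADER_SIZE 7

/* Fill a 7-byte header: magic byte, then segment, offset and length as
 * little-endian 16-bit words. */
static void bsave_header(uint8_t h[BSAVE_HEADER_SIZE],
                         uint16_t seg, uint16_t off, uint16_t len)
{
    h[0] = BSAVE_MAGIC;
    h[1] = (uint8_t)(seg & 0xFF);  h[2] = (uint8_t)(seg >> 8);
    h[3] = (uint8_t)(off & 0xFF);  h[4] = (uint8_t)(off >> 8);
    h[5] = (uint8_t)(len & 0xFF);  h[6] = (uint8_t)(len >> 8);
}

/* Parse a header; returns 0 on success, -1 if the magic byte is wrong. */
static int bsave_parse(const uint8_t h[BSAVE_HEADER_SIZE],
                       uint16_t *seg, uint16_t *off, uint16_t *len)
{
    if (h[0] != BSAVE_MAGIC)
        return -1;
    *seg = (uint16_t)(h[1] | (h[2] << 8));
    *off = (uint16_t)(h[3] | (h[4] << 8));
    *len = (uint16_t)(h[5] | (h[6] << 8));
    return 0;
}
```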
TUI color: tui_refresh emits ANSI SGR codes from cell attributes;
COLOR statement sets tui.current_attr when TUI is active.
Extended PEEK/POKE: CGA graphics framebuffer (interlaced layout) via
gfx_cga_peek/poke routed through virmem when gfx_active(); BIOS
keyboard shift flags (offset 0x17 bit 7 = insert mode).
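The interlaced CGA layout mentioned above maps a pixel to a framebuffer
offset roughly as follows. cga_offset is a hypothetical helper, not the
signature of gfx_cga_peek/poke.

```c
/* Byte offset of pixel (x, y) in the CGA graphics framebuffer.
 * Even scan lines start at 0x0000 and odd ones at 0x2000; each scan
 * line occupies 80 bytes. bits_per_pixel is 2 for the 320x200 mode and
 * 1 for the 640x200 mode. */
static unsigned cga_offset(unsigned x, unsigned y, unsigned bits_per_pixel)
{
    unsigned bank = (y & 1u) ? 0x2000u : 0x0000u;

    return bank + (y >> 1) * 80u + (x * bits_per_pixel) / 8u;
}
```

A PEEK at that offset then only needs to return the byte the renderer would
have stored there.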
Add bibliography to language reference. 64 tests, all passing.
Binary SAVE/LOAD: SAVE now writes tokenized binary by default (0xFF header
format), matching original GW-BASIC behavior. SAVE "file",A for ASCII.
LOAD auto-detects binary vs ASCII from the first byte. Command-line file
loading also auto-detects, so binary .BAS files just work.
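The first-byte check behind LOAD's auto-detection can be sketched as below.
The enum and function names are hypothetical; only the 0xFF marker comes
from the commit.

```c
enum bas_format { BAS_ASCII, BAS_TOKENIZED };

/* Classify a program file from its first byte (as returned by fgetc):
 * 0xFF marks a tokenized binary save; anything else, including EOF for
 * an empty file, is treated as ASCII text. */
static enum bas_format detect_bas_format(int first_byte)
{
    return (first_byte == 0xFF) ? BAS_TOKENIZED : BAS_ASCII;
}
```

ASCII program text cannot begin with 0xFF, so the single-byte test is
enough to tell the two formats apart.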
INKEY$ extended keys: arrow keys, Home/End/PgUp/PgDn, Insert/Delete, and
F1-F10 now return the correct CHR$(0) + scan_code two-byte sequences per
the IBM PC convention. Refactored event trap key parsing to use tui_read_key()
instead of duplicating escape sequence parsing.
Golden-file regression tests: generated .expected output files for 55 of 58
test programs (3 timing-dependent tests excluded). The test runner now
reports compat match status alongside pass/fail.
Classic programs: added Hamurabi, Lunar Lander, Gunner, and Diamond from
David Ahl's BASIC Computer Games (1978) in tests/classic/ for manual
compatibility testing.
Docs updated with compiler roadmap item and hardware I/O simulator plan.
Add event-driven programming: ON TIMER(n) GOSUB with TIMER ON/OFF/STOP,
ON KEY(n) GOSUB with KEY(n) ON/OFF/STOP for F1-F10. Fix F-key escape
sequence parser (F9/F10 detection, push back consumed bytes on unmatched
sequences). Add EDIT statement for TUI line editing. Guard key trap
polling so keystrokes aren't consumed when no traps are configured.
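The ON / OFF / STOP states of an event trap can be modelled as a small
state machine. This sketch follows the documented GW-BASIC behavior in
which STOP remembers an event until the trap is turned back ON; the struct
and function names are hypothetical and the real implementation may differ.

```c
enum trap_state { TRAP_OFF, TRAP_ON, TRAP_STOPPED };

struct trap {
    enum trap_state state;
    int pending;            /* an event arrived while stopped */
};

/* Record an event. Returns 1 if the GOSUB handler should run now. */
static int trap_event(struct trap *t)
{
    switch (t->state) {
    case TRAP_ON:
        return 1;
    case TRAP_STOPPED:
        t->pending = 1;     /* remember it for a later ON */
        return 0;
    default:
        return 0;           /* OFF: the event is discarded */
    }
}

/* Change state. Returns 1 if a remembered event should fire now. */
static int trap_set_state(struct trap *t, enum trap_state s)
{
    int fire = (s == TRAP_ON && t->state == TRAP_STOPPED && t->pending);

    if (s != TRAP_STOPPED)
        t->pending = 0;
    t->state = s;
    return fire;
}
```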