4 Commits

Eremey Valetov
89fe0fb0b3 Compiler: $EXTERN pragma for calling C functions from BASIC (Level 2 FFI) (#1)
Add a '$EXTERN NAME(ARGTYPES) AS RET pragma so compiled BASIC can call C
functions directly, the natural follow-up to Level 1 (--emit-obj /
--main-name). The pragma is an apostrophe comment, so the interpreter
ignores it while the compiler registers it.

Map INTEGER/SINGLE/DOUBLE/STRING to int16_t/float/double/const char* at the
boundary: a string argument crosses as a temporary C copy that is freed
after the call, and a string return is copied into the string pool. The call name
is matched case-insensitively but emitted as the C symbol with the case
written in the pragma. Names are recognized before parse_var() truncates
identifiers to two significant characters, so multi-character C function
names work.
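Here is a minimal sketch of the name-matching rule, assuming the registered pragmas sit in a simple table. `extern_decl_t`, `find_extern()` and the `MySqr`/`CHypot` entries are hypothetical and are not the compiler's actual data structures. The sketch shows two things. The comparison is case-insensitive over the whole identifier, and the stored spelling is the one that gets emitted. That is why the lookup must run on the untruncated name.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical record for one '$EXTERN NAME(ARGTYPES) AS RET pragma.
 * c_name keeps the case exactly as written in the pragma. */
typedef struct {
    const char *c_name;   /* emitted C symbol, e.g. "MySqr" */
    const char *argtypes; /* e.g. "DOUBLE" (not parsed in this sketch) */
    const char *ret;      /* e.g. "DOUBLE" */
} extern_decl_t;

/* Case-insensitive equality over the full identifier. */
static int ci_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;  /* equal only if both strings end together */
}

/* Returns the matching declaration, or NULL. Must see the identifier
 * before truncation: cut to two significant characters, "MYSQR" and
 * "MYSUM" would both collapse to "MY". */
static const extern_decl_t *find_extern(const extern_decl_t *tab, size_t n,
                                        const char *ident)
{
    for (size_t i = 0; i < n; i++)
        if (ci_equal(tab[i].c_name, ident))
            return &tab[i];
    return NULL;
}
```

For example, looking up `"MYSQR"` or `"mysqr"` returns the `MySqr` entry, so the emitted call keeps the pragma's spelling. The two-character form `"MY"` matches nothing.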

A string return that aliases a const char* argument is copied before the
argument temporaries are freed, which avoids a use-after-free. Surplus
arguments are consumed without desyncing the token stream, and an arity
mismatch produces a warning.
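The string marshalling and the aliasing fix can be sketched as hand-written glue. This assumes a length-plus-pointer BASIC string (`bstr_t`) and uses plain malloc in place of the runtime's string pool. `bstr_t`, `pool_copy()`, `to_c_temp()`, `call_SkipSp()` and the example callee `SkipSp()` are all hypothetical. `SkipSp()` returns a pointer into its argument, which is exactly the aliasing case: the result has to be copied before the temporary is freed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical BASIC string descriptor: length + pointer, no NUL. */
typedef struct {
    int len;
    const char *data;
} bstr_t;

/* Stand-in for copying into the runtime string pool (plain malloc here;
 * the extra NUL is only a convenience of this sketch). */
static bstr_t pool_copy(const char *s)
{
    size_t n = strlen(s);
    char *p = malloc(n + 1);
    if (!p)
        abort();
    memcpy(p, s, n + 1);
    bstr_t r = { (int)n, p };
    return r;
}

/* BASIC string -> temporary NUL-terminated C copy for the call. */
static char *to_c_temp(bstr_t s)
{
    char *t = malloc((size_t)s.len + 1);
    if (!t)
        abort();
    memcpy(t, s.data, (size_t)s.len);
    t[s.len] = '\0';
    return t;
}

/* Example callee: returns a pointer INTO its argument (aliasing). */
const char *SkipSp(const char *s)
{
    while (*s == ' ')
        s++;
    return s;
}

/* Glue a compiler might emit for a STRING -> STRING extern call. */
static bstr_t call_SkipSp(bstr_t arg)
{
    char *tmp = to_c_temp(arg);     /* 1. argument crosses as a temp copy */
    const char *ret = SkipSp(tmp);  /* 2. ret may point into tmp */
    bstr_t out = pool_copy(ret);    /* 3. copy the result first ... */
    free(tmp);                      /* 4. ... then free the temporary */
    return out;
}
```

If steps 3 and 4 were swapped, the copy would read freed memory, because `ret` points into `tmp`.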

Docs: getting-started.md "Foreign Functions from BASIC". Test:
tests/run_ffi_test.sh, wired into CI. 63/63 compiler, 72/72 interpreter,
68/68 compat still pass.

Also refile the roadmap "Next Up" backlog as git-bug issues and prune
docs/roadmap.md to point at git-bug as the source of truth for planned work.

Co-authored-by: Eremey Valetov <evvaletov@users.noreply.github.com>
2026-06-13 15:06:23 +03:00
Eremey Valetov
f207d74aec codegen fixes, --no-gc-check / --fast-math, raise caps, DATE$/TIME$ shift
Four roadmap items:

- codegen: fix parenthesized string comparison.  emit_atom didn't
  consume the body of a string-literal token (`"`), so for
  PRINT (A$+B$ < "ZZZ") it emitted a 0 placeholder, advanced one byte,
  and left "ZZZ" to be reparsed as a variable + extra trailing tokens
  -- the binary then failed to build, with `var_ZZ_sng` undeclared.
  emit_atom now skips to the closing quote.  Separately, the
  left_type tracking in emit_num_prec dropped VT_STR after a string +
  string concat (it became VT_SNG), so the string-comparison codepath
  was skipped when the relational operator arrived.  Preserve VT_STR
  through TOK_PLUS when both operands are strings.  Verified: paren
  string-cmp now compiles and produces the same -1 / 0 result as the
  interpreter.

- compiler: --no-gc-check and --fast-math optimization flags.
  --no-gc-check skips the per-line gwrt_check_line() (no string-pool
  GC, no Ctrl+Break trap).  --fast-math drops the divide-by-zero
  guard on `/`; the divisor still goes through (double) so 10/0
  produces inf rather than SIGFPE.  Both threaded through
  codegen_opts_t and exposed in --help.  --inline-arrays from the
  roadmap deferred -- larger refactor.

- interp: raise static caps on 32-bit / Linux builds.  vars 256
  -> 1024, arrays 64 -> 256, MAX_FOR_DEPTH 16 -> 64, MAX_GOSUB_DEPTH
  24 -> 128, MAX_WHILE_DEPTH 16 -> 64.  Codegen FOR_STACK_MAX 16
  -> 64.  Analysis-pass caps: MAX_LINES 4096 -> 8192, MAX_VARS 256
  -> 1024, MAX_GOTOS 256 -> 1024, MAX_DATA 1024 -> 4096,
  MAX_GOSUB_RET 256 -> 1024.  16-bit DOS keeps the original modest
  caps via #ifdef _M_I86 -- the MEDIUM model has a single 64KB
  DGROUP for all static data and the bumped sizes broke runtime
  startup under DOSBox-X.  The 16-bit binary grew from 128KB to
  132KB because of the time_offset_secs field and the DATE$/TIME$
  shift code, which is well within the FreeDOS budget.

- interp + codegen: DATE$ / TIME$ assignment via process-local
  clock offset.  Previously assignment was a no-op that accepted and
  ignored the value.  Now it sets gw.time_offset_secs (long), and the
  DATE$ / TIME$ / TIMER readers apply it to time(NULL) before
  formatting.  The OS clock is unaffected (changing it would need
  root).  Compiled-binary readers also
  reference gw.time_offset_secs since libgwrt shares the gw
  struct.  Verified: PRINT DATE$ : DATE$="12-31-1999" : PRINT DATE$
  shows the expected before/after in both interpreter and AOT
  paths.
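The emit_atom string-literal fix from the first item can be sketched as a small helper. `skip_string_literal()` is hypothetical, and treating a missing closing quote as ending at end of line is an assumption of this sketch. The old code advanced only one byte past the opening quote, which left `ZZZ")` to be reparsed. Consuming through the closing quote leaves the stream at `)`.

```c
#include <stddef.h>

/* Given p pointing at the opening '"' of a string literal in a crunched
 * (NUL-terminated) line, return a pointer just past the literal and store
 * the body length in *len. Stops at end of line if the closing quote is
 * missing. */
static const unsigned char *skip_string_literal(const unsigned char *p,
                                                size_t *len)
{
    const unsigned char *start = ++p;  /* skip the opening quote */
    while (*p != '\0' && *p != '"')
        p++;
    if (len)
        *len = (size_t)(p - start);
    if (*p == '"')
        p++;                           /* consume the closing quote */
    return p;
}
```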
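The `/` behavior with and without --fast-math can be sketched as two helpers. `div_guarded()` and `div_fast()` are hypothetical names. Reporting a zero divisor through a return code stands in for whatever error the runtime actually raises.

```c
#include <math.h>

/* Default shape: check the divisor first. Returns -1 instead of dividing
 * when it is zero; the real runtime would raise a BASIC error here. */
static int div_guarded(double a, double b, double *out)
{
    if (b == 0.0)
        return -1;
    *out = a / b;
    return 0;
}

/* --fast-math shape: no guard. Converting the divisor to double makes
 * this a floating-point division, so a zero divisor yields an IEEE 754
 * infinity instead of an integer divide-by-zero (which typically traps
 * with SIGFPE on x86). */
static double div_fast(int a, int b)
{
    return a / (double)b;
}
```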
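Here is a minimal sketch of the process-local clock offset for DATE$, assuming the offset is a plain `long` like gw.time_offset_secs. `time_offset_secs`, `basic_now()`, `set_date()` and `read_date()` are hypothetical, and only the MM-DD-YYYY form is parsed, without validation. Assignment computes how far the requested date is from the real clock. Every reader adds that offset to time(NULL) before formatting, so TIME$ and TIMER would go through `basic_now()` in the same way.

```c
#include <stdio.h>
#include <time.h>

/* Stand-in for gw.time_offset_secs: seconds added to the OS clock. */
static long time_offset_secs = 0;

/* The clock that DATE$, TIME$ and TIMER readers would consult. */
static time_t basic_now(void)
{
    return time(NULL) + (time_t)time_offset_secs;
}

/* DATE$ = "MM-DD-YYYY": keep the current (shifted) time of day, move the
 * date, and store the difference from the real clock. Returns 0 on
 * success, -1 if the string or the date cannot be used. */
static int set_date(const char *s)
{
    int month, day, year;
    if (sscanf(s, "%d-%d-%d", &month, &day, &year) != 3)
        return -1;

    time_t now = basic_now();
    struct tm *lt = localtime(&now);
    if (!lt)
        return -1;
    struct tm tm = *lt;
    tm.tm_mon = month - 1;
    tm.tm_mday = day;
    tm.tm_year = year - 1900;
    tm.tm_isdst = -1;  /* let mktime work out DST for the new date */

    time_t target = mktime(&tm);
    if (target == (time_t)-1)
        return -1;
    time_offset_secs = (long)difftime(target, time(NULL));
    return 0;
}

/* DATE$ reader: format the shifted clock as MM-DD-YYYY. */
static void read_date(char out[11])
{
    time_t t = basic_now();
    struct tm *lt = localtime(&t);
    if (!lt || strftime(out, 11, "%m-%d-%Y", lt) == 0)
        out[0] = '\0';
}
```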

After these changes: 72/72 interpreter tests, 68/68 compat, 63/63
compiler tests, DOS smoke under DOSBox-X all pass.  Build clean on
both Linux (cmake) and 16-bit DOS (build_dos.sh 16).
2026-05-04 18:56:58 -04:00
Eremey Valetov
20ecdae938 Add --warn and --safe memory safety flags to the compiler
Three progressive levels for gwbasic-compile:

--warn: static analysis warnings (uninitialized variables, GOTO to a
nonexistent line, unreachable code). Zero runtime cost.
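Here is one way the --warn check for a GOTO to a nonexistent line could work, sketched over a sorted line table like the one the analysis pass builds. `goto_ref_t`, `warn_bad_gotos()` and the warning text are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* One GOTO/GOSUB collected by the analysis pass. */
typedef struct {
    unsigned src;     /* line containing the jump */
    unsigned target;  /* line it jumps to */
} goto_ref_t;

static int cmp_line(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a, y = *(const unsigned *)b;
    return (x > y) - (x < y);
}

/* Warn for every jump whose target is not in the line table and return
 * the number of warnings. lines[] must be sorted ascending, which holds
 * when it is built from a BASIC program in line order. */
static int warn_bad_gotos(const unsigned *lines, size_t nlines,
                          const goto_ref_t *refs, size_t nrefs)
{
    int warnings = 0;
    for (size_t i = 0; i < nrefs; i++) {
        if (!bsearch(&refs[i].target, lines, nlines, sizeof lines[0],
                     cmp_line)) {
            fprintf(stderr, "warning: line %u: GOTO %u: no such line\n",
                    refs[i].src, refs[i].target);
            warnings++;
        }
    }
    return warnings;
}
```

With lines {10, 20, 30, 100} and jumps 10 -> 100, 20 -> 40 and 30 -> 10, only the jump to line 40 is reported.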

--safe (implies --warn): runtime-checked integer arithmetic via
gw_int_add/sub/mul/neg, matching real GW-BASIC overflow semantics;
enhanced array bounds diagnostics that report variable names and line
numbers; and GOSUB stack overflow diagnostics that report the source line.
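Here is a sketch of 16-bit checked INTEGER arithmetic in the spirit of gw_int_add/sub/mul/neg. The `checked_int_*` names and their return-code interface are hypothetical and are not the runtime's real signatures. The sketch only detects that an exact result falls outside -32768..32767. It leaves the GW-BASIC-specific reaction to the caller.

```c
#include <stdint.h>

/* Store r if it fits in a 16-bit INTEGER and return 0; else return -1. */
static int fits16(int32_t r, int16_t *out)
{
    if (r < INT16_MIN || r > INT16_MAX)
        return -1;
    *out = (int16_t)r;
    return 0;
}

/* Each helper computes the exact result in 32 bits, where 16-bit
 * operands cannot overflow, and then range-checks it. */
static int checked_int_add(int16_t a, int16_t b, int16_t *out)
{
    return fits16((int32_t)a + b, out);
}

static int checked_int_sub(int16_t a, int16_t b, int16_t *out)
{
    return fits16((int32_t)a - b, out);
}

static int checked_int_mul(int16_t a, int16_t b, int16_t *out)
{
    return fits16((int32_t)a * b, out);  /* |a*b| <= 2^30, fits in 32 bits */
}

static int checked_int_neg(int16_t a, int16_t *out)
{
    return fits16(-(int32_t)a, out);     /* -(-32768) does not fit */
}
```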

--safe=sanitize (implies --safe): passes -fsanitize=address,undefined
to gcc for full memory error detection.

Also: fix a pre-existing missing closing paren in the array LET-to-integer
codegen, add strpool_pin/unpin infrastructure, and add "compiler
optimization flags" and "memory safety" sections to the roadmap.

72/72 interpreter tests pass. 64/64 eligible compiler tests pass in
--safe mode.
2026-04-09 13:14:26 -04:00
Eremey Valetov
d3b57d9f3b Implement ahead-of-time compiler (Phase 1): BASIC to C via token stream
New tool gwbasic-compile translates tokenized .bas programs to C source,
which gcc compiles into native executables linked against libgwrt.a (the
interpreter's runtime modules minus the execution loop).

Pipeline: .bas → gw_crunch() → analysis pass (line table, variable census,
GOTO targets, DATA collection) → C codegen → gcc → native executable.

Phase 1 supports: PRINT, LET, IF/THEN/ELSE, GOTO, GOSUB/RETURN, FOR/NEXT,
END/STOP/SYSTEM, REM, DATA/READ/RESTORE, CLS, arithmetic/relational/logical
operators, core math functions (SIN, COS, SQR, ABS, etc.), string functions
(LEFT$, RIGHT$, MID$, CHR$, ASC, VAL, STR$, LEN, etc.), string concatenation.

All control flow is lowered to goto/labels (no C for/while), so a GOTO into a
loop body works. GOSUB uses a return-label stack with switch dispatch.
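The goto/label lowering and the GOSUB return-label stack can be illustrated with hand-written C for a small program. This is not actual gwbasic-compile output. `push_return()`, `GOSUB_MAX`, the error messages and the `var_*_sng` names (modeled on the `var_ZZ_sng` name earlier in this log) are assumptions. The FOR entry test for an empty range is also omitted. Each GOSUB pushes a small return-site id, and RETURN pops it and uses a switch to jump back to the matching label. Only jump targets get labels here.

```c
#include <stdio.h>
#include <stdlib.h>

#define GOSUB_MAX 24  /* hypothetical depth limit */

/* Push a return-site id, reporting the BASIC line on overflow. */
static void push_return(int *stack, int *sp, int id, int line)
{
    if (*sp >= GOSUB_MAX) {
        fprintf(stderr, "GOSUB stack overflow in line %d\n", line);
        exit(1);
    }
    stack[(*sp)++] = id;
}

/* Hand-lowered equivalent of:
 *   10 FOR I = 1 TO 3
 *   20 GOSUB 100
 *   30 NEXT I
 *   40 GOSUB 100
 *   50 END
 *   100 S = S + I
 *   110 RETURN
 */
static float run_program(void)
{
    float var_I_sng = 0, var_S_sng = 0;
    float for_I_limit = 0, for_I_step = 0;
    int ret_stack[GOSUB_MAX];
    int sp = 0;

    /* 10 FOR I = 1 TO 3 */
    var_I_sng = 1; for_I_limit = 3; for_I_step = 1;
L20: /* 20 GOSUB 100; a GOTO 20 from elsewhere would land inside the loop */
    push_return(ret_stack, &sp, 0, 20);
    goto L100;
RET0: /* 30 NEXT I */
    var_I_sng += for_I_step;
    if (for_I_step >= 0 ? var_I_sng <= for_I_limit : var_I_sng >= for_I_limit)
        goto L20;
    /* 40 GOSUB 100 */
    push_return(ret_stack, &sp, 1, 40);
    goto L100;
RET1: /* 50 END */
    return var_S_sng;

L100: /* 100 S = S + I */
    var_S_sng = var_S_sng + var_I_sng;
    /* 110 RETURN: pop the return-site id and dispatch */
    if (sp == 0) {
        fprintf(stderr, "RETURN without GOSUB in line 110\n");
        exit(1);
    }
    switch (ret_stack[--sp]) {
    case 0: goto RET0;
    case 1: goto RET1;
    }
    abort();  /* unknown return id: cannot happen in this program */
}
```

In this lowering, the loop calls line 100 with I = 1, 2 and 3. NEXT then leaves I at 4, and the final GOSUB adds that value as well, so `run_program()` returns 10.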
2026-03-29 06:59:42 -04:00