codegen fixes, --no-gc-check / --fast-math, raise caps, DATE$/TIME$ shift

Four roadmap items:

- codegen: fix parenthesized string comparison.  emit_atom didn't
  consume the body of a string-literal token (`"`), so for
  PRINT (A$+B$ < "ZZZ") it emitted a 0 placeholder, advanced one byte,
  and left "ZZZ" to be reparsed as a variable + extra trailing tokens
  -- the binary then failed to link with `var_ZZ_sng` undeclared.
  emit_atom now skips to the closing quote.  Separately, the
  left_type tracking in emit_num_prec dropped VT_STR after a string +
  string concat (becoming VT_SNG), so the string-comparison codepath
  was skipped when the relational operator arrived.  Preserve VT_STR
  through TOK_PLUS when both operands are strings.  Verified: paren
  string-cmp now compiles and produces the same -1 / 0 result as the
  interpreter.

- compiler: --no-gc-check and --fast-math optimization flags.
  --no-gc-check skips the per-line gwrt_check_line() (no string-pool
  GC, no Ctrl+Break trap).  --fast-math drops the divide-by-zero
  guard on `/`; the divisor still goes through (double) so 10/0
  produces inf rather than SIGFPE.  Both threaded through
  codegen_opts_t and exposed in --help.  --inline-arrays from the
  roadmap deferred -- larger refactor.

- interp: raise static caps on 32-bit / Linux builds.  vars 256
  -> 1024, arrays 64 -> 256, MAX_FOR_DEPTH 16 -> 64, MAX_GOSUB_DEPTH
  24 -> 128, MAX_WHILE_DEPTH 16 -> 64.  Codegen FOR_STACK_MAX 16
  -> 64.  Analysis-pass caps: MAX_LINES 4096 -> 8192, MAX_VARS 256
  -> 1024, MAX_GOTOS 256 -> 1024, MAX_DATA 1024 -> 4096,
  MAX_GOSUB_RET 256 -> 1024.  16-bit DOS keeps the original modest
  caps via #ifdef _M_I86 -- the MEDIUM model has a single 64KB
  DGROUP for all static data and the bumped sizes broke runtime
  startup under DOSBox-X.  16-bit binary grew from 128KB to 132KB
  from the time_offset_secs field plus DATE$/TIME$ shift code, well
  within the FreeDOS budget.

- interp + codegen: DATE$ / TIME$ assignment via process-local
  clock offset.  Was a no-op accept-and-ignore.  Now sets
  gw.time_offset_secs (long), and DATE$ / TIME$ / TIMER readers
  apply it to time(NULL) before formatting.  The OS clock is
  unaffected (would need root).  Compiled-binary readers also
  reference gw.time_offset_secs since libgwrt shares the gw
  struct.  Verified: PRINT DATE$; DATE$="12-31-1999"; PRINT DATE$
  shows the expected before/after in both interpreter and AOT
  paths.

After these changes: 72/72 interpreter tests, 68/68 compat, 63/63
compiler tests, DOS smoke under DOSBox-X all pass.  Build clean on
both Linux (cmake) and 16-bit DOS (build_dos.sh 16).
Eremey Valetov
2026-05-04 18:56:58 -04:00
parent da1e6cebf1
commit f207d74aec
9 changed files with 162 additions and 46 deletions

@@ -113,16 +113,32 @@ line numbers are preserved. Direct-mode scratchpad scripts and classic
```
Usage: gwbasic-compile [options] input.bas
Options:
-o FILE Output C source file (default: stdout)
-c Compile to executable (invoke gcc)
-O LEVEL GCC optimization level (default: 2)
--keep-c Keep generated C file (with -c)
--runtime DIR Path to runtime headers/library
--warn Static analysis warnings
--safe Runtime safety checks (implies --warn)
-o FILE Output C source file (default: stdout)
-c Compile to executable (invoke gcc)
-O LEVEL GCC optimization level (default: 2)
--keep-c Keep generated C file (with -c)
--runtime DIR Path to runtime headers/library
--warn Static analysis warnings
--safe Runtime safety checks (implies --warn)
--safe=sanitize Above + address/UB sanitizers (with -c)
--no-gc-check Skip per-line gwrt_check_line() (no GC, no Break)
--fast-math Skip division-by-zero checks
```
### Performance Flags (`--no-gc-check` / `--fast-math`)
`--no-gc-check` skips the `gwrt_check_line()` call emitted at the start of
every non-REM line. That call drives the string-pool compacting GC and
the Ctrl+Break trap. Removing it gives a small per-line speedup for
programs that neither allocate strings nor need responsive interruption.
String reassignment can still trigger compaction lazily, but the
guaranteed periodic check is gone.
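
The effect on the generated C can be pictured with a small sketch. This is not the compiler's actual emitter: `emit_line_prologue` and its buffer-based signature are hypothetical, and only the `L_<n>:` label and the `gwrt_check_line(<n>)` call are taken from the codegen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the C prologue for one BASIC line into buf
 * and returns buf. The per-line check is emitted only for non-REM lines
 * and only when --no-gc-check is off. */
static const char *emit_line_prologue(char *buf, size_t n, unsigned line_num,
                                      bool is_rem, bool no_gc_check)
{
    assert(buf != NULL && n > 0);
    int used = snprintf(buf, n, "L_%u:\n", line_num);
    if (!is_rem && !no_gc_check && used > 0 && (size_t)used < n)
        snprintf(buf + used, n - (size_t)used,
                 "    gwrt_check_line(%u);\n", line_num);
    return buf;
}
```

With a 64-byte buffer, line 10 yields the label plus the check by default, and only the label when `no_gc_check` is true or the line is a REM, which `strcmp()` against the expected text confirms.
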
`--fast-math` removes the explicit divide-by-zero check around the `/`
operator. The result of `X = 10 / 0` becomes `inf` rather than raising
"Division by zero". Useful for compute-bound code that already validates
inputs.
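
The difference between the two division paths can be sketched in plain C. The names `div_checked`, `div_fast`, `last_error`, and `fast_math_demo` are hypothetical; the real codegen emits the guard inline as a GCC statement expression and calls `gw_error(11)`. The sketch also assumes IEEE 754 doubles, where a nonzero value divided by zero gives infinity.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for the runtime's gw_error(): just records the code. */
static int last_error;

/* Default path: reject a zero divisor with error 11 ("Division by zero"). */
static double div_checked(double left, double right)
{
    if (right == 0.0) {
        last_error = 11;
        return 0.0; /* sketch only: dummy value after reporting the error */
    }
    return left / right;
}

/* --fast-math path: plain double division, no guard. */
static double div_fast(double left, double right)
{
    return left / right;
}

/* Exercises both paths with a zero divisor. */
static void fast_math_demo(void)
{
    volatile double zero = 0.0; /* volatile: keep the division at run time */
    last_error = 0;
    (void)div_checked(10.0, zero);
    assert(last_error == 11);            /* guarded path reports the error */
    assert(isinf(div_fast(10.0, zero))); /* unguarded path yields infinity */
}
```

Both operands are doubles in the sketch, matching the casts the codegen applies, so the division by zero follows floating-point rules and does not hit the integer divide trap.
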
### Memory Safety (`--warn` / `--safe`)
The `--warn` flag enables compile-time static analysis warnings:

@@ -112,8 +112,12 @@ Tested on FreeDOS 1.4 via QEMU.
## Known Limitations
- Maximum 256 variables, 64 arrays, 16 FOR nesting, 24 GOSUB nesting,
16 WHILE nesting
- Static caps -- 32-bit / Linux builds: 1024 variables, 256 arrays,
64 FOR nesting, 128 GOSUB nesting, 64 WHILE nesting. 16-bit real-mode
DOS keeps the original modest caps (256 / 64 / 16 / 24 / 16) because
the MEDIUM model has a single 64KB DGROUP for all static data.
- `CALL`/`CALLS` (machine code execution) raises Illegal function call
- `DATE$`/`TIME$` assignment accepted but does not modify the system clock
- `DATE$`/`TIME$` assignment shifts the program's view of the clock via
a process-local offset; the OS time is unaffected (setting the OS
clock would require root)
- Device stubs (`ERDEV`, `IOCTL`, `COM`, `LCOPY`) return defaults
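
A minimal sketch of the process-local offset idea behind `DATE$` assignment, under stated assumptions: `offset_for_date` and `format_date` are hypothetical names, the current time is passed in explicitly so the result is repeatable, and only four-digit years are accepted. The interpreter itself keeps the offset in `gw.time_offset_secs`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Compute the offset (seconds) that makes `now` appear to fall on the date
 * in `s` ("MM-DD-YYYY", four-digit year only), keeping the time of day.
 * Returns 1 and stores the offset on success, 0 on parse/convert failure. */
static int offset_for_date(const char *s, time_t now, long *offset)
{
    int mon, day, year;
    assert(s != NULL && offset != NULL);
    if (sscanf(s, "%d-%d-%d", &mon, &day, &year) != 3 || year < 1900)
        return 0;
    struct tm tm = *localtime(&now);
    tm.tm_mon = mon - 1;
    tm.tm_mday = day;
    tm.tm_year = year - 1900;
    tm.tm_isdst = -1;            /* let mktime() work out DST for the new date */
    time_t target = mktime(&tm);
    if (target == (time_t)-1)
        return 0;
    *offset = (long)(target - now);
    return 1;
}

/* Format the shifted clock the way DATE$ reads it: MM-DD-YYYY. */
static void format_date(time_t now, long offset, char *buf, size_t n)
{
    time_t shifted = now + offset;
    struct tm *tm = localtime(&shifted);
    snprintf(buf, n, "%02d-%02d-%04d",
             tm->tm_mon + 1, tm->tm_mday, tm->tm_year + 1900);
}
```

Setting `tm_isdst = -1` is a choice made in this sketch; it keeps the wall-clock time of day stable when the target date falls in a different daylight-saving period than today. Readers add the offset to `time(NULL)` before formatting, so the OS clock is never written.
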

@@ -5,11 +5,11 @@
#include <stdbool.h>
#include <stdint.h>
#define MAX_LINES 4096
#define MAX_VARS 256
#define MAX_GOTOS 256
#define MAX_DATA 1024
#define MAX_GOSUB_RET 256
#define MAX_LINES 8192
#define MAX_VARS 1024
#define MAX_GOTOS 1024
#define MAX_DATA 4096
#define MAX_GOSUB_RET 1024
typedef struct {
uint16_t line_num;

@@ -8,6 +8,9 @@
typedef struct {
bool safe_mode; /* --safe: emit runtime safety checks */
bool warn_mode; /* --warn: static analysis warnings */
bool no_gc_check; /* --no-gc-check: skip gwrt_check_line per line
* (no string-pool GC, no Ctrl+Break check) */
bool fast_math; /* --fast-math: skip / by-zero checks */
} codegen_opts_t;
/* Generate C source from the analyzed program */

@@ -24,17 +24,29 @@ typedef struct {
/* Default variable types for A-Z (DEFTBL) */
gw_valtype_t def_type[26];
/* Variable storage */
var_entry_t vars[256];
/* Variable storage. Caps stay modest on 16-bit DOS (MEDIUM model has
* a single 64KB DGROUP for all static data); 32-bit / Linux builds
* raise them substantially. */
#ifdef _M_I86
#define MAX_VAR_TABLE 256
#define MAX_ARRAY_TABLE 64
#define MAX_FOR_DEPTH 16
#define MAX_GOSUB_DEPTH 24
#define MAX_WHILE_DEPTH 16
#else
#define MAX_VAR_TABLE 1024
#define MAX_ARRAY_TABLE 256
#define MAX_FOR_DEPTH 64
#define MAX_GOSUB_DEPTH 128
#define MAX_WHILE_DEPTH 64
#endif
var_entry_t vars[MAX_VAR_TABLE];
int var_count;
array_entry_t arrays[64];
array_entry_t arrays[MAX_ARRAY_TABLE];
int array_count;
int option_base;
/* Control flow stacks */
#define MAX_FOR_DEPTH 16
#define MAX_GOSUB_DEPTH 24
#define MAX_WHILE_DEPTH 16
for_entry_t for_stack[MAX_FOR_DEPTH];
int for_sp;
gosub_entry_t gosub_stack[MAX_GOSUB_DEPTH];
@@ -53,6 +65,11 @@ typedef struct {
uint8_t *cont_text;
program_line_t *cont_line;
/* Process-local clock offset (seconds). DATE$ / TIME$ assignments
* shift the program's view of the clock without touching the OS
* time (which would need root). Defaults to 0. */
long time_offset_secs;
/* DATA pointer */
uint8_t *data_ptr;
program_line_t *data_line_ptr;

@@ -19,13 +19,15 @@
static FILE *out;
static analysis_t *ana;
static bool safe_mode;
static bool no_gc_check;
static bool fast_math;
static uint16_t emit_line; /* current BASIC line number being emitted */
static uint8_t *tp; /* token pointer (mirrors gw.text_ptr) */
static int ret_label_counter;
static int for_label_counter;
/* FOR stack: maps variable to its for_label_counter */
#define FOR_STACK_MAX 16
#define FOR_STACK_MAX 64
static struct { char name[2]; gw_valtype_t type; int label; bool has_step; } for_stack[FOR_STACK_MAX];
static int for_stack_sp;
@@ -560,7 +562,8 @@ static void emit_atom(void)
tp += 2;
skip_spaces();
if (xtok == XSTMT_TIMER) {
EMIT("((float)(time(NULL) %% 86400))"); /* seconds since midnight */
/* Seconds since midnight, offset-aware. */
EMIT("((float)(((time(NULL)+gw.time_offset_secs) %% 86400)))");
return;
}
/* PMAP(coord, func) */
@@ -701,6 +704,21 @@ static void emit_atom(void)
return;
}
/* String literal in numeric context: emit a placeholder zero but
* consume the entire literal body (up to the closing quote) so the
* outer parser doesn't reparse the contents as random tokens. The
* VT_STR-aware caller (emit_num_prec's string-cmp path) re-reads tp
* from left_start and routes the operand through emit_str_expr; the
* placeholder is a fallback for non-cmp contexts where a string in
* a numeric position would already be a type error. */
if (tok == '"') {
EMIT("0 /* str literal in num ctx */");
tp++;
while (*tp && *tp != '"') tp++;
if (*tp == '"') tp++;
return;
}
/* Fallback */
EMIT("0 /* unknown tok 0x%02x */", tok);
tp++;
@@ -789,10 +807,19 @@ static void emit_num_prec(int min_prec)
EMIT(" %s ", rop);
emit_num_prec(prec + 1);
} else if (op == TOK_DIV) {
/* Division with zero-check via GCC statement expression */
EMIT(" / ({double _d=");
emit_num_prec(prec + 1);
EMIT("; if(_d==0.0)gw_error(11); _d;})");
if (fast_math) {
/* Force float division (GW-BASIC / always returns float).
* Without the cast, integer / integer would trap on
* divide-by-zero with SIGFPE on Linux x86. */
EMIT(" / (double)(");
emit_num_prec(prec + 1);
EMIT(")");
} else {
/* Division with zero-check via GCC statement expression */
EMIT(" / ({double _d=");
emit_num_prec(prec + 1);
EMIT("; if(_d==0.0)gw_error(11); _d;})");
}
} else {
EMIT(" %s ", binop_c(op));
emit_num_prec(prec + 1);
@@ -919,9 +946,12 @@ static void emit_num_prec(int min_prec)
} else if (op == TOK_POW) {
fprintf(cm, "pow((double)(%s), (double)(%s))", left, right);
} else if (op == TOK_DIV) {
/* GW-BASIC / always produces float; check for division by zero */
fprintf(cm, "({double _dv=(double)(%s); if(_dv==0.0) gw_error(11); (double)(%s)/_dv;})",
right, left);
if (fast_math)
fprintf(cm, "((double)(%s) / (double)(%s))", left, right);
else
/* GW-BASIC / always produces float; check for division by zero */
fprintf(cm, "({double _dv=(double)(%s); if(_dv==0.0) gw_error(11); (double)(%s)/_dv;})",
right, left);
} else if (safe_mode && left_type == VT_INT && right_type == VT_INT &&
(op == TOK_PLUS || op == TOK_MINUS || op == TOK_MUL)) {
const char *fn = op == TOK_PLUS ? "gw_int_add"
@@ -942,6 +972,8 @@ static void emit_num_prec(int min_prec)
left_type = VT_DBL;
else if (cop || op == TOK_GT || op == TOK_LT || op == TOK_EQ)
left_type = VT_INT; /* comparisons return 0/-1 */
else if (op == TOK_PLUS && left_type == VT_STR && right_type == VT_STR)
left_type = VT_STR; /* string concat stays string */
else if (left_type != VT_INT || right_type != VT_INT)
left_type = (left_type == VT_DBL || right_type == VT_DBL)
? VT_DBL : VT_SNG;
@@ -1075,14 +1107,14 @@ static void emit_str_atom(void)
tp += 2;
skip_spaces();
if (xtok == XSTMT_DATE) {
EMIT("({time_t _t=time(NULL); struct tm *_tm=localtime(&_t);"
EMIT("({time_t _t=time(NULL)+gw.time_offset_secs; struct tm *_tm=localtime(&_t);"
" char _db[16]; snprintf(_db,16,\"%%02d-%%02d-%%04d\","
"_tm->tm_mon+1,_tm->tm_mday,_tm->tm_year+1900);"
" gw_str_from_cstr(_db);})");
return;
}
if (xtok == XSTMT_TIME) {
EMIT("({time_t _t=time(NULL); struct tm *_tm=localtime(&_t);"
EMIT("({time_t _t=time(NULL)+gw.time_offset_secs; struct tm *_tm=localtime(&_t);"
" char _tb[16]; snprintf(_tb,16,\"%%02d:%%02d:%%02d\","
"_tm->tm_hour,_tm->tm_min,_tm->tm_sec);"
" gw_str_from_cstr(_tb);})");
@@ -2709,6 +2741,8 @@ void codegen_emit(FILE *f, analysis_t *a, const codegen_opts_t *opts)
out = f;
ana = a;
safe_mode = opts ? opts->safe_mode : false;
no_gc_check = opts ? opts->no_gc_check : false;
fast_math = opts ? opts->fast_math : false;
ret_label_counter = 0;
for_label_counter = 0;
for_stack_sp = 0;
@@ -2786,9 +2820,11 @@ void codegen_emit(FILE *f, analysis_t *a, const codegen_opts_t *opts)
emit_line = line->num;
EMIT("L_%u:\n", line->num);
/* Skip GC/break check for REM-only lines */
/* Skip GC/break check for REM-only lines, and for the entire
* program under --no-gc-check (string pool will only compact when
* a heap-pressure threshold is hit, and Ctrl+Break is ignored). */
bool is_rem = (line->tokens[0] == TOK_REM || line->tokens[0] == TOK_SQUOTE);
if (!is_rem)
if (!is_rem && !no_gc_check)
EMIT(" gwrt_check_line(%u);\n", line->num);
/* Walk statements on this line */

@@ -145,14 +145,16 @@ static void usage(void)
fprintf(stderr,
"Usage: gwbasic-compile [options] input.bas\n"
"Options:\n"
" -o FILE Output C source file (default: stdout)\n"
" -c Compile to executable (invoke gcc)\n"
" -O LEVEL GCC optimization level (default: 2)\n"
" --keep-c Keep generated C file (with -c)\n"
" --runtime DIR Path to runtime headers/library\n"
" --warn Static analysis warnings\n"
" --safe Runtime safety checks (implies --warn)\n"
" -o FILE Output C source file (default: stdout)\n"
" -c Compile to executable (invoke gcc)\n"
" -O LEVEL GCC optimization level (default: 2)\n"
" --keep-c Keep generated C file (with -c)\n"
" --runtime DIR Path to runtime headers/library\n"
" --warn Static analysis warnings\n"
" --safe Runtime safety checks (implies --warn)\n"
" --safe=sanitize Above + address/UB sanitizers (with -c)\n"
" --no-gc-check Skip per-line gwrt_check_line() (no GC, no Break)\n"
" --fast-math Skip division-by-zero checks\n"
);
}
@@ -167,6 +169,8 @@ int main(int argc, char **argv)
bool warn_mode = false;
bool safe_mode = false;
bool sanitize_mode = false;
bool no_gc_check = false;
bool fast_math = false;
for (int i = 1; i < argc; i++) {
if (strcmp(argv[i], "-o") == 0 && i + 1 < argc)
@@ -185,6 +189,10 @@ int main(int argc, char **argv)
sanitize_mode = safe_mode = warn_mode = true;
else if (strcmp(argv[i], "--safe") == 0)
safe_mode = warn_mode = true;
else if (strcmp(argv[i], "--no-gc-check") == 0)
no_gc_check = true;
else if (strcmp(argv[i], "--fast-math") == 0)
fast_math = true;
else if (strcmp(argv[i], "-h") == 0 || strcmp(argv[i], "--help") == 0) {
usage();
return 0;
@@ -230,7 +238,12 @@ int main(int argc, char **argv)
return 1;
}
codegen_opts_t opts = { .safe_mode = safe_mode, .warn_mode = warn_mode };
codegen_opts_t opts = {
.safe_mode = safe_mode,
.warn_mode = warn_mode,
.no_gc_check = no_gc_check,
.fast_math = fast_math,
};
codegen_emit(f, &analysis, &opts);
if (f != stdout)

@@ -998,7 +998,7 @@ static gw_value_t eval_atom(void)
gw_value_t v;
v.type = VT_STR;
char tbuf[40];
time_t now = time(NULL);
time_t now = time(NULL) + gw.time_offset_secs;
struct tm *tm = localtime(&now);
if (xtok == XSTMT_DATE) {
snprintf(tbuf, sizeof(tbuf), "%02d-%02d-%04d",
@@ -1013,7 +1013,7 @@ static gw_value_t eval_atom(void)
}
if (xtok == XSTMT_TIMER) {
gw_chrget();
time_t now = time(NULL);
time_t now = time(NULL) + gw.time_offset_secs;
struct tm *tm = localtime(&now);
gw_value_t v;
v.type = VT_SNG;

@@ -1385,21 +1385,48 @@ void gw_exec_stmt(void)
return;
}
/* DATE$ = "string" — accept and ignore (don't modify system clock) */
/* DATE$ = "MM-DD-YYYY" — shift the process-local clock so that
* DATE$ / TIME$ / TIMER readers see the new date. Time-of-day
* is preserved. The OS clock is unaffected. */
if (xstmt == XSTMT_DATE) {
gw_chrget();
gw_expect(TOK_EQ);
gw_value_t val = gw_eval_str();
char *s = gw_str_to_cstr(&val.sval);
gw_str_free(&val.sval);
int mon, day, year;
if (sscanf(s, "%d-%d-%d", &mon, &day, &year) == 3) {
time_t now = time(NULL);
struct tm tm = *localtime(&now);
tm.tm_mon = mon - 1;
tm.tm_mday = day;
tm.tm_year = year >= 1900 ? year - 1900 : year + 100;
time_t target = mktime(&tm);
gw.time_offset_secs = (long)(target - now);
}
free(s);
return;
}
/* TIME$ = "string" — accept and ignore (don't modify system clock) */
/* TIME$ = "HH:MM:SS" — shift the process-local clock to the new
* time-of-day; date is preserved. */
if (xstmt == XSTMT_TIME) {
gw_chrget();
gw_expect(TOK_EQ);
gw_value_t val = gw_eval_str();
char *s = gw_str_to_cstr(&val.sval);
gw_str_free(&val.sval);
int hour, min, sec;
if (sscanf(s, "%d:%d:%d", &hour, &min, &sec) == 3) {
time_t now = time(NULL) + gw.time_offset_secs;
struct tm tm = *localtime(&now);
tm.tm_hour = hour;
tm.tm_min = min;
tm.tm_sec = sec;
time_t target = mktime(&tm);
gw.time_offset_secs = (long)(target - time(NULL));
}
free(s);
return;
}