0
0
mirror of https://github.com/vim/vim.git synced 2025-07-26 11:04:33 -04:00

CI: Manage multibyte characters in syntax tests

As reported in #16559, bytes of a multibyte character may
be written as separate U+FFFD characters in a ":terminal"
window on a busy machine.  The testing facilities currently
offer an optional filtering step to be carried out between
reading and comparing the contents of two screendump files
for each such file.  This filtering has been resorted to
(#14767 and #16560) in an attempt to unconditionally replace
known non-Latin-1 characters with an arbitrary substitute
ASCII character and avoid this rendering mishap leading to
syntax tests failures.  However, it has been overlooked at
the time that metadata description (in shorthand) to follow
spurious U+FFFD characters may be *distinct* and make the
remainder of such a line, ASCII characters and whatnot, also
unequal between compared screendump files.

While it is straightforward to adapt current filter files to
ignore the line characters after the leftmost U+FFFD,

> It is challenging and error-prone to keep up to date filter
> files because moving around examples in source files will
> likely make redundant some previously required filter files
> and, at the same time, it may require creating new filter
> files for the same source file; substituting one multibyte
> character for another multibyte character will also demand
> a coordinated change for filter files.

Besides, unconditionally dropping arbitrary parts of a line
is rather too blunt an instrument.  An alternative approach
is to not use the supported filtering for this purpose; let
a syntax test pass or fail initially; then *if* the same
failure is imminent, drop the leftmost U+FFFD and the rest
of the previously seen line (repeating it for all previously
seen unequal lines) before another round of file contents
comparing.  The obvious disadvantage with this filtering,
unconditional and otherwise, is that if there are consistent
failures for _other reasons_ and the unequal parts happen to
be after U+FFFDs, then spurious test passing can happen when
stars align for _a particular test runner_.

Hence syntax test authors should strive to write as little
significant text after multibyte characters as syntactically
permissible, write multibyte characters closer to EOL in
general, and make sure that their checked-in and published
"*.dump" files do not have any U+FFFDs.

It is also practical to refrain from attempting screendump
generation if U+FFFDs can already be discovered, and instead
try re-running from scratch the syntax test in hand, while
accepting other recently generated screendumps without going
through with new rounds of verification.

Reference:
https://github.com/vim/vim/pull/16470#issuecomment-2599848525

closes: #17704

Signed-off-by: Aliaksei Budavei <0x000c70@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
This commit is contained in:
Aliaksei Budavei 2025-07-25 20:08:52 +02:00 committed by Christian Brabandt
parent 43b99c9376
commit 0fde6aebdd
No known key found for this signature in database
GPG Key ID: F3F92DA383FDDE09
4 changed files with 265 additions and 73 deletions

View File

@ -13,7 +13,7 @@ VIMRUNTIME = ../..
# Uncomment this line to use valgrind for memory leaks and extra warnings.
# VALGRIND = valgrind --tool=memcheck --leak-check=yes --num-callers=45 --log-file=valgrind.$*
# Trace ruler liveness on demand.
# Trace liveness on demand.
# VIM_SYNTAX_TEST_LOG = `pwd`/testdir/failed/00-TRACE_LOG
# ENVVARS = LC_ALL=C VIM_SYNTAX_TEST_LOG="$(VIM_SYNTAX_TEST_LOG)"
@ -39,7 +39,7 @@ test:
@# the "vimcmd" file is used by the screendump utils
@echo "../$(VIMPROG)" > testdir/vimcmd
@echo "$(RUN_VIMTEST)" >> testdir/vimcmd
@# Trace ruler liveness on demand.
@# Trace liveness on demand.
@#mkdir -p testdir/failed
@#touch "$(VIM_SYNTAX_TEST_LOG)"
VIMRUNTIME=$(VIMRUNTIME) $(ENVVARS) $(VIMPROG) --clean --not-a-term $(DEBUGLOG) -u testdir/runtest.vim > /dev/null

View File

@ -61,8 +61,6 @@ an "input/setup/java.vim" script file with the following lines:
Both inline setup commands and setup scripts may be used at the same time, the
script file will be sourced before any VIM_TEST_SETUP commands are executed.
Every line of a source file must not be longer than 1425 (19 x 75) characters.
If there is no further setup required, you can now run all tests:
make test
@ -112,6 +110,20 @@ If they look OK, move them to the "dumps" directory:
If you now run the test again, it will succeed.
Limitations for syntax plugin tests
-----------------------------------
Do not compose ASCII lines that do not fit a 19 by 75 window (1425 columns).
Use multibyte characters, if at all, sparingly (see #16559). When possible,
move multibyte characters closer to the end of a line and keep the line short:
no more than a 75-byte total of displayed characters. A poorly rendered line
may otherwise become wrapped when enough of spurious U+FFFD (0xEF 0xBF 0xBD)
characters claim more columns than are available (75) and then invalidate line
correspondence under test. Refrain from mixing non-spurious U+FFFD characters
with other multibyte characters in the same line.
Adjusting a syntax plugin test
------------------------------

View File

@ -106,15 +106,54 @@ func HandleSwapExists()
endif
endfunc
" Trace ruler liveness on demand.
if !empty($VIM_SYNTAX_TEST_LOG) && filewritable($VIM_SYNTAX_TEST_LOG)
def s:TraceRulerLiveness(context: string, times: number, tail: string)
" Trace liveness.
def s:TraceLiveness(context: string, times: number, tail: string)
writefile([printf('%s: %4d: %s', context, times, tail)],
$VIM_SYNTAX_TEST_LOG,
'a')
enddef
" Anticipate rendering idiosyncrasies (see #16559).
def s:CanFindRenderedFFFDChars(
buf: number,
in_name_and_out_name: string,
times: number): bool
if CannotUseRealEstate(in_name_and_out_name)
return false
endif
# Expect a 20-line terminal buffer (see "term_util#RunVimInTerminal()"),
# where the bottom, reserved line is of the default "&cmdheight".
var lines: list<number> = []
for lnum: number in range(1, 19)
if stridx(term_getline(buf, lnum), "\xef\xbf\xbd") >= 0
add(lines, lnum)
endif
endfor
TraceLiveness('F', times, string(lines))
return !empty(lines)
enddef
else
def s:TraceRulerLiveness(_: string, _: number, _: string)
" Do not trace liveness.
def s:TraceLiveness(_: string, _: number, _: string)
enddef
" Anticipate rendering idiosyncrasies (see #16559).
def s:CanFindRenderedFFFDChars(
buf: number,
in_name_and_out_name: string,
_: number): bool
if CannotUseRealEstate(in_name_and_out_name)
return false
endif
# Expect a 20-line terminal buffer (see "term_util#RunVimInTerminal()"),
# where the bottom, reserved line is of the default "&cmdheight".
for lnum: number in range(1, 19)
if stridx(term_getline(buf, lnum), "\xef\xbf\xbd") >= 0
return true
endif
endfor
return false
enddef
endif
@ -142,6 +181,68 @@ def s:CannotDumpShellFirstPage(buf: number, _: list<string>, ruler: list<string>
get(term_getcursor(buf), 0) != 20)
enddef
def s:CannotUseRealEstate(in_name_and_out_name: string): bool
# Expect defaults from "term_util#RunVimInTerminal()".
if winwidth(1) != 75 || winheight(1) != 20
ch_log(printf('Aborting for %s: (75 x 20) != (%d x %d)',
in_name_and_out_name,
winwidth(1),
winheight(1)))
return true
endif
return false
enddef
" Throw an "FFFD" string if U+FFFD characters are found in the terminal buffer
" during a non-last test round; otherwise, generate a screendump and proceed
" with its verification.
def s:VerifyScreenDumpOrThrowFFFD(
buf: number,
which_page: string,
in_name_and_out_name: string,
aborted_count: number,
max_aborted_count: number,
basename: string,
opts: dict<any>,
page_quota: dict<number>,
seen_pages: list<number>,
page_nr: number): number
if !has_key(page_quota, page_nr)
# Constrain management of unseen pages to the product of "wait" times
# "max_aborted_count" (see "opts" below). When _test repetition_ and
# _line rewriting_ FAIL page verification, the page gets to keep its
# unseen mark; when _test repetition_ is FAILING for a later page, all
# earlier unseen pages get another chance at _test repetition_ etc. before
# further progress can be made for the later page.
page_quota[page_nr] = max_aborted_count
endif
const with_fffd: bool = CanFindRenderedFFFDChars(
buf,
in_name_and_out_name,
(max_aborted_count - aborted_count + 1))
if with_fffd && aborted_count > 1
throw 'FFFD'
endif
ch_log(which_page .. ' screendump for ' .. in_name_and_out_name)
# Generate a screendump of every 19 lines of "buf", reusing the bottom line
# (or the bottom six or so lines for "*_01.dump") from the previous dump as
# the top line(s) in the next dump for continuity. Constrain generation of
# unseen pages for the last test round (via "wait").
const status: number = g:VerifyScreenDump(
buf,
basename,
(aborted_count != max_aborted_count)
? extend({wait: max_aborted_count}, opts, 'keep')
: opts)
if !with_fffd || (!status || !page_quota[page_nr])
add(seen_pages, page_nr)
else
TraceLiveness('Q', (max_aborted_count - aborted_count + 1), string(page_quota))
endif
page_quota[page_nr] -= 1
return status
enddef
" Poll for updates of the cursor position in the terminal buffer occupying the
" first window. (ALWAYS call the function or its equivalent before calling
" "VerifyScreenDump()" *and* after calling any number of "term_sendkeys()".)
@ -149,12 +250,7 @@ def s:TermPollRuler(
CannotDumpPage: func, # (TYPE FOR LEGACY CONTEXT CALL SITES.)
buf: number,
in_name_and_out_name: string): list<string>
# Expect defaults from "term_util#RunVimInTerminal()".
if winwidth(1) != 75 || winheight(1) != 20
ch_log(printf('Aborting for %s: (75 x 20) != (%d x %d)',
in_name_and_out_name,
winwidth(1),
winheight(1)))
if CannotUseRealEstate(in_name_and_out_name)
return ['0,0-1', 'All']
endif
# A two-fold role for redrawing:
@ -177,7 +273,7 @@ def s:TermPollRuler(
redraw
endif
endwhile
TraceRulerLiveness('P', (2048 - times), in_name_and_out_name)
TraceLiveness('P', (2048 - times), in_name_and_out_name)
return ruler
enddef
@ -186,12 +282,7 @@ enddef
" the terminal buffer. (Call the function before calling "VerifyScreenDump()"
" for the first time.)
def s:TermWaitAndPollRuler(buf: number, in_name_and_out_name: string): list<string>
# Expect defaults from "term_util#RunVimInTerminal()".
if winwidth(1) != 75 || winheight(1) != 20
ch_log(printf('Aborting for %s: (75 x 20) != (%d x %d)',
in_name_and_out_name,
winwidth(1),
winheight(1)))
if CannotUseRealEstate(in_name_and_out_name)
return ['0,0-1', 'All']
endif
# The contents of "ruler".
@ -210,7 +301,7 @@ def s:TermWaitAndPollRuler(buf: number, in_name_and_out_name: string): list<stri
redraw
endif
endwhile
TraceRulerLiveness('W', (32768 - times), in_name_and_out_name)
TraceLiveness('W', (32768 - times), in_name_and_out_name)
if strpart(ruler, 0, 8) !=# 'is_nonce'
# Retain any of "b:is_(bash|dash|kornshell|posix|sh)" entries and let
# "CannotDumpShellFirstPage()" win the cursor race.
@ -283,6 +374,34 @@ func RunTest()
redraw!
endfunc
def CollectFFFDChars()
const fffd: string = "\xef\xbf\xbd"
const flags: string = 'eW'
const pos: list<number> = getpos('.')
var fffds: list<list<number>> = []
try
cursor(1, 1)
var prev: list<number> = [0, 0]
var next: list<number> = [0, 0]
next = searchpos(fffd, 'c' .. flags)
while next[0] > 0 && prev != next
add(fffds, next)
prev = next
next = searchpos(fffd, flags)
endwhile
finally
setpos('.', pos)
endtry
if !empty(fffds)
# Use "actions/upload-artifact@v4" of ci.yml for delivery.
writefile(
[printf('%s: %s', bufname('%'), string(fffds))],
'failed/10-FFFDS',
'a')
endif
redraw!
enddef
def s:AssertCursorForwardProgress(): bool
const curnum: number = line('.')
if curnum <= cursor
@ -383,12 +502,18 @@ func RunTest()
return AssertCursorForwardProgress()
enddef
END
let MAX_ABORTED_COUNT = 5
let MAX_FAILED_COUNT = 5
let DUMP_OPTS = exists("$VIM_SYNTAX_TEST_WAIT_TIME") &&
let DUMP_OPTS = extend(
\ exists("$VIM_SYNTAX_TEST_WAIT_TIME") &&
\ !empty($VIM_SYNTAX_TEST_WAIT_TIME)
\ ? {'wait': max([1, str2nr($VIM_SYNTAX_TEST_WAIT_TIME)])}
\ : {}
lockvar DUMP_OPTS MAX_FAILED_COUNT XTESTSCRIPT
\ ? {'wait': max([1, str2nr($VIM_SYNTAX_TEST_WAIT_TIME)])}
\ : {},
\ {'FileComparisonPreAction':
\ function('g:ScreenDumpDiscardFFFDChars'),
\ 'NonEqualLineComparisonPostAction':
\ function('g:ScreenDumpLookForFFFDChars')})
lockvar DUMP_OPTS MAX_FAILED_COUNT MAX_ABORTED_COUNT XTESTSCRIPT
let ok_count = 0
let disused_pages = []
let failed_tests = []
@ -456,64 +581,118 @@ func RunTest()
let args = !empty(path)
\ ? prefix .. ' -S ' .. path
\ : prefix
let buf = RunVimInTerminal(args, {})
" edit the file only after catching the SwapExists event
call term_sendkeys(buf, ":edit " .. fname .. "\<CR>")
" set up the testing environment
call term_sendkeys(buf, ":call SetUpVim()\<CR>")
" load filetype specific settings
call term_sendkeys(buf, ":call LoadFiletype('" .. filetype .. "')\<CR>")
let fail = 0
" Make a synchronisation point between buffers by requesting to echo
" a known token in the terminal buffer and asserting its availability
" with "s:TermWaitAndPollRuler()".
if filetype == 'sh'
call term_sendkeys(buf, ":call ShellInfo()\<CR>")
else
call term_sendkeys(buf, ":echo 'is_nonce'\<CR>")
endif
let root_00 = root .. '_00'
let in_name_and_out_name = fname .. ': failed/' .. root_00 .. '.dump'
" Queue up all "term_sendkeys()"es and let them finish before returning
" from "s:TermWaitAndPollRuler()".
let ruler = s:TermWaitAndPollRuler(buf, in_name_and_out_name)
call ch_log('First screendump for ' .. in_name_and_out_name)
" Make a screendump at the start of the file: failed/root_00.dump
let fail = VerifyScreenDump(buf, root_00, DUMP_OPTS)
let nr = 0
" Accommodate the next code block to "buf"'s contingency for self
" wipe-out.
try
let keys_a = ":call ScrollToSecondPage((18 * 75 + 1), 19, 5) | redraw!\<CR>"
let keys_b = ":call ScrollToNextPage((18 * 75 + 1), 19, 5) | redraw!\<CR>"
while s:CannotSeeLastLine(ruler)
call term_sendkeys(buf, keys_a)
let keys_a = keys_b
let nr += 1
let root_next = printf('%s_%02d', root, nr)
let in_name_and_out_name = fname .. ': failed/' .. root_next .. '.dump'
let ruler = s:TermPollRuler(
\ function('s:CannotDumpNextPage', [buf, ruler]),
\ buf,
\ in_name_and_out_name)
call ch_log('Next screendump for ' .. in_name_and_out_name)
" Make a screendump of every 18 lines of the file: failed/root_NN.dump
let fail += VerifyScreenDump(buf, root_next, DUMP_OPTS)
let aborted_count = MAX_ABORTED_COUNT
let collected_count = 0
let seen_pages = []
let page_quota = {}
" See #16559. For each processed page, repeat pre-verification steps
" from scratch (subject to page cacheing) whenever U+FFFD characters
" are found in the terminal buffer with "term_getline()", i.e. treat
" these pages as if they were distinct test files. U+FFFD characters
" found at the last attempt (see "MAX_ABORTED_COUNT") will be ignored
" and "VerifyScreenDump()" will take over with own filtering.
while aborted_count > 0
let buf = RunVimInTerminal(args, {})
try
" edit the file only after catching the SwapExists event
call term_sendkeys(buf, ":edit " .. fname .. "\<CR>")
" set up the testing environment
call term_sendkeys(buf, ":call SetUpVim()\<CR>")
" load filetype specific settings
call term_sendkeys(buf, ":call LoadFiletype('" .. filetype .. "')\<CR>")
" Collect all *non-spurious* U+FFFD characters for scrutiny.
if aborted_count == 1 && collected_count != 1
let collected_count = 1
call term_sendkeys(buf, ":call CollectFFFDChars()\<CR>")
endif
" Make a synchronisation point between a terminal buffer and
" another buffer by requesting to echo a known token in the former
" and asserting its availability with "s:TermWaitAndPollRuler()"
" from the latter.
if filetype == 'sh'
call term_sendkeys(buf, ":call ShellInfo()\<CR>")
else
call term_sendkeys(buf, ":echo 'is_nonce'\<CR>")
endif
let page_nr = 0
let root_00 = root .. '_00'
let in_name_and_out_name = fname .. ': failed/' .. root_00 .. '.dump'
" Queue up all "term_sendkeys()"es and let them finish before
" returning from "s:TermWaitAndPollRuler()".
let ruler = s:TermWaitAndPollRuler(buf, in_name_and_out_name)
if index(seen_pages, page_nr) < 0
let fail += s:VerifyScreenDumpOrThrowFFFD(
\ buf,
\ 'First',
\ in_name_and_out_name,
\ aborted_count,
\ MAX_ABORTED_COUNT,
\ root_00,
\ DUMP_OPTS,
\ page_quota,
\ seen_pages,
\ page_nr)
" Reset "aborted_count" for another page.
let aborted_count = MAX_ABORTED_COUNT
endif
let keys_a = ":call ScrollToSecondPage((18 * 75 + 1), 19, 5) | redraw!\<CR>"
let keys_b = ":call ScrollToNextPage((18 * 75 + 1), 19, 5) | redraw!\<CR>"
while s:CannotSeeLastLine(ruler)
call term_sendkeys(buf, keys_a)
let keys_a = keys_b
let page_nr += 1
let root_next = printf('%s_%02d', root, page_nr)
let in_name_and_out_name = fname .. ': failed/' .. root_next .. '.dump'
let ruler = s:TermPollRuler(
\ function('s:CannotDumpNextPage', [buf, ruler]),
\ buf,
\ in_name_and_out_name)
if index(seen_pages, page_nr) < 0
let fail += s:VerifyScreenDumpOrThrowFFFD(
\ buf,
\ 'Next',
\ in_name_and_out_name,
\ aborted_count,
\ MAX_ABORTED_COUNT,
\ root_next,
\ DUMP_OPTS,
\ page_quota,
\ seen_pages,
\ page_nr)
" Reset "aborted_count" for another page.
let aborted_count = MAX_ABORTED_COUNT
endif
endwhile
call StopVimInTerminal(buf)
break
catch /^FFFD$/
" Clear out.
call StopVimInTerminal(buf)
while winnr('$') > 1
close
endwhile
let aborted_count -= 1
endtry
endwhile
call StopVimInTerminal(buf)
finally
call delete('Xtestscript')
endtry
let nr += 1
let pagename = printf('dumps/%s_%02d.dump', root, nr)
let page_nr += 1
let pagename = printf('dumps/%s_%02d.dump', root, page_nr)
while filereadable(pagename)
call add(disused_pages, pagename)
let nr += 1
let pagename = printf('dumps/%s_%02d.dump', root, nr)
let page_nr += 1
let pagename = printf('dumps/%s_%02d.dump', root, page_nr)
endwhile
" redraw here to avoid the following messages to get mixed up with screen

View File

@ -105,6 +105,7 @@ enddef
" some dictionary with some state entries;
" the file contents of the newly generated screen dump;
" the zero-based number of the line whose copies are not equal.
" (See an example in runtime/syntax/testdir/runtest.vim.)
"
" The file name used is "dumps/{filename}.dump".
"