Adds an mdoc man page covering all modes and the OTS/ingest long options,
verified with groff and NetBSD mandoc. CMake now installs the binary and
the man page; the install rules are skipped when the project is embedded
via add_subdirectory. Also corrects the stale direction-1 comment in the
DOSBox round-trip script: multi-file archives created by v3 have extracted
fine in the original since the custom-Huffman-tree fix.
tests/scripts/dos_smoke.sh runs the DJGPP-built uc2 inside DOSBox-X
via the flatpak and asserts:
- uc2 -h loads under a real DPMI host and prints the banner
- uc2 -l <archive> opens an existing UC2 archive and produces output
Skips cleanly when any of uc2.exe, CWSDPMI.EXE, or DOSBox-X is
missing. CWSDPMI.EXE is the standard DJGPP DPMI host from
csdpmi7b.zip; fetch recipe added to cmake/README-djgpp.md.
Verified locally against build-djgpp/cli/uc2.exe +
tests/archives/basic.uc2.
Closes 20019aa. CI matrix entry (9379647) remains a separate
follow-up.
Same bug class as dae8a50 and 6d8087f: under -DNDEBUG (CMake's default
for Release, which CI uses) the assert macro expands to ((void)0) and
the wrapped expression is not evaluated. Calls inside assert() are
silently dropped.
Found six occurrences in test_ots.c (uc2_ots_varint_decode, parse_file)
where the asserted call writes through output pointers. Under Release
builds the calls vanish, so these tests silently no-op rather than
testing anything. Converted to capture-then-check: store the return
value in a local, then assert on the local.
Audit otherwise clean: production code (lib/, cli/) has only one
assert-on-call, and it wraps a pure arithmetic helper.
Adds tests/scripts/check_assert_side_effects.py as a CI gate to keep
this class of bug out: matches assert(IDENT(...)) where IDENT contains
a side-effect verb (encode/decode/parse/...). Pure queries (_equal,
_match, _verify, _has_, _is_, _id, _root, _attest_name, memcmp, ...)
are not flagged. Wired into build.yml on the Linux runner.
Also gitignore Testing/ (CTest run outputs) and __pycache__/.
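
The gate described above can be approximated with a single regular
expression. This is a minimal sketch, not the contents of
check_assert_side_effects.py: the verb list is limited to the three
verbs named in the message, the "pure query wins" precedence is an
assumption, and the C signatures in SAMPLE are hypothetical.

```python
import re

# Verbs named in the commit message; the real list is longer ("...").
SIDE_EFFECT_VERBS = ("encode", "decode", "parse")

# Pure-query markers named in the commit message.
PURE_MARKERS = ("_equal", "_match", "_verify", "_has_", "_is_",
                "_id", "_root", "_attest_name", "memcmp")

# assert( IDENT( ... : an assert whose first token is a function call.
ASSERT_CALL = re.compile(r"\bassert\s*\(\s*([A-Za-z_]\w*)\s*\(")


def flagged_calls(c_source):
    """Return (line number, identifier) for each suspicious assert."""
    hits = []
    for lineno, line in enumerate(c_source.splitlines(), start=1):
        for match in ASSERT_CALL.finditer(line):
            ident = match.group(1)
            # Assumption: a pure-query marker takes precedence over a verb.
            if any(marker in ident for marker in PURE_MARKERS):
                continue
            if any(verb in ident for verb in SIDE_EFFECT_VERBS):
                hits.append((lineno, ident))
    return hits


# Hypothetical C lines: one assert-on-call, one pure query, and the
# capture-then-check form that the fix converts to.
SAMPLE = """\
assert(uc2_ots_varint_decode(buf, len, &value, &used) == 0);
assert(memcmp(a, b, 32) == 0);
int rc = parse_file(buf, len, &file);
assert(rc == 0);
"""
```

Only the first SAMPLE line is reported; the capture-then-check pair on
the last two lines passes because the call sits outside assert().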
uc2_sha256: pure-C FIPS 180-4 implementation, one-shot and incremental
API, validated against published vectors (empty, abc, 56-byte,
1M 'a', byte-by-byte, every-split-point boundary).
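
The every-split-point check can be illustrated as follows. This sketch
uses Python's hashlib as a stand-in for uc2_sha256 (the C API is not
shown here); the helper names are illustrative only. The expected
digests are the published SHA-256 test vectors.

```python
import hashlib

# Published SHA-256 vectors: empty input, "abc", and the 56-byte message.
VECTORS = {
    b"": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    b"abc": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
    b"abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq":
        "248d6a61d20638b8e5c026930c3e6039a33ce45964ff2167f6ecedd419db06c1",
}
MILLION_A = "cdc76e5c9914fb9281a1c7e284d73e67f1809a48a497200e046d39ccc7112cd0"


def digest_split(data, split):
    """Hash data incrementally as two updates cut at `split`."""
    h = hashlib.sha256()
    h.update(data[:split])
    h.update(data[split:])
    return h.hexdigest()


def check_all_splits(data, expected):
    """True if every possible two-part split yields the expected digest."""
    return all(digest_split(data, i) == expected
               for i in range(len(data) + 1))
```

Splitting at every offset exercises the block-buffering logic on both
sides of the 64-byte block boundary, which a one-shot test cannot reach.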
uc2_ots: parser, serializer, and walker for the standard .ots binary
format. Strict canonical varint with 64-bit overflow check, depth-
bounded recursion, varbytes cap, max-digest cap. Walker supports
the calendar-path subset (APPEND, PREPEND, SHA256); proofs that
include other crypto ops (SHA1, RIPEMD160, KECCAK256) are accepted
as structurally valid but flagged for follow-up via the standard
'ots verify'.
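
A strict canonical varint decoder with a 64-bit overflow check might
look like the sketch below. It assumes the LEB128-style encoding used by
OpenTimestamps (7 payload bits per byte, low group first, high bit as
continuation flag); the function and exception names are illustrative
and are not the uc2_ots C API.

```python
class VarintError(ValueError):
    """Raised for truncated, overlong, or non-canonical varints."""


def varint_encode(value):
    """Encode an unsigned 64-bit integer in its shortest form."""
    if not 0 <= value < (1 << 64):
        raise VarintError("value out of uint64 range")
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # low 7 bits + continuation
        value >>= 7
    out.append(value)
    return bytes(out)


def varint_decode(buf, pos=0):
    """Decode one varint at buf[pos:]; return (value, next position)."""
    value = 0
    shift = 0
    start = pos
    while True:
        if pos >= len(buf):
            raise VarintError("truncated varint")
        byte = buf[pos]
        pos += 1
        chunk = byte & 0x7F
        # Reject any payload bits that would land at or above bit 64.
        if shift >= 64 or (chunk << shift) >> 64:
            raise VarintError("varint overflows 64 bits")
        value |= chunk << shift
        if not byte & 0x80:
            # A zero final group after earlier bytes is a padded encoding.
            if chunk == 0 and pos - start > 1:
                raise VarintError("non-canonical varint")
            return value, pos
        shift += 7
```

Rejecting padded encodings keeps the parse/serialize round trip
byte-exact, which matters when the proof is hashed or compared verbatim.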
UC2-OTS trailer: magic-bracketed sidecar appended after the recorded
archive bytes. Reverse-scan-safe; original UC2 Pro reader ignores
trailing bytes past its recorded length so backward compatibility is
preserved. Layout (all integers little-endian uint32):
  front-magic + version + archive_len + proof_len + proof
  + proof_len + back-magic.
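
The layout and its reverse scan can be sketched as follows. The magic
byte strings and the version number are hypothetical placeholders (the
real values are not given here), and the check that the trailer starts
exactly at offset archive_len is an assumption drawn from "appended
after the recorded archive bytes".

```python
import struct

FRONT_MAGIC = b"<OTS"  # hypothetical placeholder
BACK_MAGIC = b"OTS>"   # hypothetical placeholder
VERSION = 1            # hypothetical placeholder

HEAD_LEN = len(FRONT_MAGIC) + 12  # magic + version + archive_len + proof_len
TAIL_LEN = 4 + len(BACK_MAGIC)    # proof_len + magic


def build_trailer(archive_len, proof):
    """Serialize the trailer; all integers are little-endian uint32."""
    return (FRONT_MAGIC
            + struct.pack("<III", VERSION, archive_len, len(proof))
            + proof
            + struct.pack("<I", len(proof))
            + BACK_MAGIC)


def find_trailer(blob):
    """Scan backward from the end; return (archive_len, proof) or None."""
    if len(blob) < HEAD_LEN + TAIL_LEN or not blob.endswith(BACK_MAGIC):
        return None
    (back_len,) = struct.unpack_from("<I", blob, len(blob) - TAIL_LEN)
    start = len(blob) - TAIL_LEN - back_len - HEAD_LEN
    if start < 0 or blob[start:start + len(FRONT_MAGIC)] != FRONT_MAGIC:
        return None
    version, archive_len, proof_len = struct.unpack_from(
        "<III", blob, start + len(FRONT_MAGIC))
    # Both length copies must agree, and the trailer must begin where
    # the recorded archive ends (assumption, see lead-in).
    if version != VERSION or proof_len != back_len or archive_len != start:
        return None
    proof_start = start + HEAD_LEN
    return archive_len, blob[proof_start:proof_start + proof_len]
```

Repeating proof_len before the back magic is what lets a reader locate
the front of the trailer from the end of the file without parsing the
archive itself.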
CLI: --ots-attach validates that the proof's leaf digest equals
SHA-256(archive[0..archive_len)) before appending and refuses to
overwrite an existing trailer unless -f is given. --ots-extract
writes the proof verbatim, byte-compatible with the standard
'ots verify'. --ots-info parses and prints the leaf, archive-match
status, and attestation list. uc2 -t recomputes the archive
SHA-256 and walks the proof.
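
The two checks described above (leaf digest against the archive, then a
walk over the calendar-path subset) can be sketched like this. Operations
are modeled as (name, argument) pairs rather than the binary .ots tags,
and the status strings are hypothetical; hashlib stands in for
uc2_sha256.

```python
import hashlib


def leaf_matches_archive(blob, archive_len, leaf):
    """True if the proof's leaf equals SHA-256(archive[0..archive_len))."""
    return hashlib.sha256(blob[:archive_len]).digest() == leaf


def walk(leaf, ops):
    """Apply APPEND/PREPEND/SHA256 ops to the leaf.

    Returns ("ok", final message), or ("needs-ots-verify", None) when an
    op outside the supported subset is met (hypothetical status names).
    """
    msg = leaf
    for name, arg in ops:
        if name == "append":
            msg = msg + arg
        elif name == "prepend":
            msg = arg + msg
        elif name == "sha256":
            msg = hashlib.sha256(msg).digest()
        else:
            # e.g. sha1, ripemd160, keccak256: structurally valid, but
            # deferred to the standard 'ots verify'.
            return "needs-ots-verify", None
    return "ok", msg
```

A proof that reaches an unsupported op is reported rather than rejected,
mirroring the "accepted but flagged" behavior of the walker.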
Tests: 17 OTS unit tests (varint round-trip, canonical/overflow
rejection, file-envelope round-trip, walker on append/sha256/
sibling/unsupported-op/truncated/trailing-garbage, attest_name,
trailer round-trip + corruption rejection in 4 scenarios).
Plus an optional ctest target ots_cross_check that round-trips
the .ots through python-opentimestamps when the package is
installed; skipped (return code 77) otherwise.
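
The skip convention can be sketched as below. It assumes the package is
importable as `opentimestamps` and that CTest is configured to treat 77
as a skip (for example through its SKIP_RETURN_CODE test property);
run_or_skip and its callable argument are illustrative names, not part
of the repository.

```python
import importlib.util

SKIP_RETURN_CODE = 77  # value named in the commit message


def run_or_skip(run, module_name="opentimestamps"):
    """Return 77 if the module is absent, else 0/1 from the check."""
    if importlib.util.find_spec(module_name) is None:
        return SKIP_RETURN_CODE
    return 0 if run() else 1
```

A wrapper script would pass this value to sys.exit(), so a missing
optional dependency shows up as "skipped" rather than as a failure.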
Always assign custom master indices (>= FIRSTMASTER=2) to all files,
never SuperMaster (index 0). The original's ExtractFiles() routes
SuperMaster files through a code path that hangs. The original itself
never uses SuperMaster in file COMPRESS records — it always creates
at least one custom master, even for archives without dedup groups.
For ungrouped files, a default custom master is built from the largest
file's first 64KB. All files reference this master, matching the
original's archive structure.
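
The selection rule for ungrouped files can be sketched as follows. The
in-memory dict of file contents and the function names are illustrative;
tie-breaking between equally large files is not specified above, so this
sketch simply takes the first one encountered.

```python
SUPERMASTER = 0          # never assigned to files
FIRSTMASTER = 2          # first custom master index
MASTER_SPAN = 64 * 1024  # first 64KB of the largest file


def default_master(files):
    """Build the default master from the largest file's first 64KB.

    `files` maps a name to its contents as bytes.
    """
    if not files:
        return b""
    largest = max(files, key=lambda name: len(files[name]))
    return files[largest][:MASTER_SPAN]


def assign_default_master(files):
    """Point every ungrouped file at the single default custom master."""
    return {name: FIRSTMASTER for name in files}
```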
The automated DOSBox-X test now validates multi-file round-trip in
both directions: 4 files UC2 v3 -> original, 5 files original -> UC2 v3.
All content verified byte-for-byte.
Single-file UC2 v3 archives are now fully backward compatible with the
original UC2 Pro: listing and extraction are verified in the automated
DOSBox-X test. The SFX extraction timeout is raised to 600s and a
22-file completeness check is added, because incomplete extraction
caused false test results throughout the earlier investigation. Also
adds a Direction 1 (UC2 v3 -> original) test.
Root cause: the original UC2 Pro expects csize=0 in the cdir COMPRESS
record (it ignores the field entirely). UC2 v3 was writing the actual
compressed size, which confused the original's archive reader.
Additional changes:
- Use default Huffman tree for all blocks (ensures tree encoding compat)
- Write method=compression_level in cdir COMPRESS (was hardcoded to 1)
- Add tests/scripts/bitdump.py for bit-level bitstream analysis
Single-file UC2 v3 archives are now fully readable by the original UC2
Pro (listing and extraction verified in DOSBox-X). Multi-file archives
still hang — the cdir bitstream decodes correctly in our Python analyzer
but fails in the original's ASM decompressor kernel. Investigation
continues; the bitdump.py tool enables targeted comparison.
Port the original TreeGen/RepairLengths/CodeGen algorithms faithfully
from TREEGEN.CPP for bitstream compatibility with the 1992 UC2 Pro:
- treegen() now accepts max_code_bits parameter (13 for main trees,
  7 for tree-encoding meta-tree)
- Heap uses >= for child comparison (prefer right child on ties),
  matching original Reheap()
- BuildCodeTree uses extract-one-then-combine pattern
- RepairLengths uses sorted linked lists with cascading space-fill
- Single/zero symbol cases assign length 1 to two symbols
- tree_enc RLE: trigger at run > 6 (not >= 6), max 20 per chunk,
  single RepeatCode per run
- First block uses default tree (tree-changed=0) matching original
  behavior for small blocks
Full backward compatibility with original UC2 Pro archives (Direction 2)
is maintained. Forward compatibility (UC2 v3 -> original, Direction 1)
remains in progress — the original still hangs, likely due to residual
bitstream-level differences in the ASM decompressor kernel.
Automated test that runs the original 1992 UC2 Pro (UC.EXE) in DOSBox-X
headlessly to create archives from the test corpus, then extracts with
UC2 v3 and verifies byte-for-byte file identity.
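
The byte-for-byte verification step can be sketched as a directory
comparison. This assumes a flat corpus directory (no subdirectories) and
an illustrative function name; it is not the repository's test script.

```python
import filecmp
import os


def mismatched_files(expected_dir, actual_dir):
    """Return names that are missing from actual_dir or differ in content."""
    bad = []
    for name in sorted(os.listdir(expected_dir)):
        src = os.path.join(expected_dir, name)
        dst = os.path.join(actual_dir, name)
        # shallow=False forces a content comparison instead of
        # trusting size and mtime.
        if not os.path.isfile(dst) or not filecmp.cmp(src, dst, shallow=False):
            bad.append(name)
    return bad
```

An empty result means every corpus file was extracted and matches
exactly; the test would fail on any non-empty list.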
Key findings during development:
- uc2pro.exe is a UCEXE self-extracting archive, not the tool itself;
  the actual archiver is UC.EXE inside the distribution
- UC.EXE must be run from its own directory (needs DOS.SEA overlay)
- DOSBox-X flatpak requires work dirs under $HOME (filesystem=home)
- The reverse direction (UC2 v3 → original) does not work: the original
  UC2 Pro hangs reading UC2 v3 archives due to compression bitstream
  differences (added as a roadmap item)
Also fixes create_archives.sh to use the same two-session DOSBox pattern
(extract SFX first, then use UC.EXE).
Test corpus (empty, text, binary, compressible, incompressible) with
reference archives created by original UC2 v2.3 in DOSBox. Two CTest
tests: test_identify (magic detection) and test_extract (full
extraction pipeline verified byte-for-byte against corpus).