Always assign custom master indices (>= FIRSTMASTER=2) to all files,
never SuperMaster (index 0). The original's ExtractFiles() routes
SuperMaster files through a code path that hangs. The original itself
never uses SuperMaster in file COMPRESS records — it always creates
at least one custom master, even for archives without dedup groups.
For ungrouped files, a default custom master is built from the largest
file's first 64KB. All files reference this master, matching the
original's archive structure.
The automated DOSBox-X test now validates multi-file round-trip in
both directions: 4 files UC2 v3 -> original, 5 files original -> UC2 v3.
All content verified byte-for-byte.
Single-file UC2 v3 archives are now fully backward compatible with the
original UC2 Pro — listing and extraction verified in automated DOSBox-X
test. SFX extraction timeout increased to 600s with 22-file completeness
check (incomplete extraction caused false test results throughout the
earlier investigation). Direction 1 (UC2 v3 -> original) test added.
Root cause: the original UC2 Pro expects csize=0 in the cdir COMPRESS
record (it ignores the field entirely). UC2 v3 was writing the actual
compressed size, which confused the original's archive reader.
Additional changes:
- Use default Huffman tree for all blocks (ensures tree encoding compat)
- Write method=compression_level in cdir COMPRESS (was hardcoded to 1)
- Add tests/scripts/bitdump.py for bit-level bitstream analysis
Single-file UC2 v3 archives are now fully readable by the original UC2
Pro (listing and extraction verified in DOSBox-X). Multi-file archives
still hang — the cdir bitstream decodes correctly in our Python analyzer
but fails in the original's ASM decompressor kernel. Investigation
continues; the bitdump.py tool enables targeted comparison.
Port the original TreeGen/RepairLengths/CodeGen algorithms faithfully
from TREEGEN.CPP for bitstream compatibility with the 1992 UC2 Pro:
- treegen() now accepts max_code_bits parameter (13 for main trees,
7 for tree-encoding meta-tree)
- Heap uses >= for child comparison (prefer right child on ties),
matching original Reheap()
- BuildCodeTree uses extract-one-then-combine pattern
- RepairLengths uses sorted linked lists with cascading space-fill
- Single/zero symbol cases assign length 1 to two symbols
- tree_enc RLE: trigger at run > 6 (not >= 6), max 20 per chunk,
single RepeatCode per run
- First block uses default tree (tree-changed=0) matching original
behavior for small blocks
Full backward compatibility with original UC2 Pro archives (Direction 2)
is maintained. Forward compatibility (UC2 v3 -> original, Direction 1)
remains in progress — the original still hangs, likely due to residual
bitstream-level differences in the ASM decompressor kernel.
Automated test that runs the original 1992 UC2 Pro (UC.EXE) in DOSBox-X
headlessly to create archives from the test corpus, then extracts with
UC2 v3 and verifies byte-for-byte file identity.
Key findings during development:
- uc2pro.exe is a UCEXE self-extracting archive, not the tool itself;
the actual archiver is UC.EXE inside the distribution
- UC.EXE must be run from its own directory (needs DOS.SEA overlay)
- DOSBox-X flatpak requires work dirs under $HOME (filesystem=home)
- The reverse direction (UC2 v3 → original) does not work: the original
UC2 Pro hangs reading UC2 v3 archives due to compression bitstream
differences (added as a roadmap item)
Also fixes create_archives.sh to use the same two-session DOSBox pattern
(extract SFX first, then use UC.EXE).
Test corpus (empty, text, binary, compressible, incompressible) with
reference archives created by original UC2 v2.3 in DOSBox. Two CTest
tests: test_identify (magic detection) and test_extract (full
extraction pipeline verified byte-for-byte against corpus).