Commit Graph

6 Commits

Author SHA1 Message Date
Eremey Valetov
6e62a7aa28 Fix multi-file backward compatibility with original UC2 Pro
Always assign custom master indices (>= FIRSTMASTER=2) to all files,
never SuperMaster (index 0).  The original's ExtractFiles() routes
SuperMaster files through a code path that hangs.  The original itself
never uses SuperMaster in file COMPRESS records — it always creates
at least one custom master, even for archives without dedup groups.

For ungrouped files, a default custom master is built from the largest
file's first 64KB.  All files reference this master, matching the
original's archive structure.
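
A minimal Python sketch of the default-master selection described above. The function name `build_default_master` and the dict-of-bytes input are hypothetical and chosen for illustration; only FIRSTMASTER=2 and the 64KB prefix come from this commit message, and the real implementation may differ.

```python
FIRSTMASTER = 2                  # lowest custom master index (0 is SuperMaster)
MASTER_PREFIX_BYTES = 64 * 1024  # "first 64KB" of the largest file


def build_default_master(files):
    """Pick a default custom master for ungrouped files.

    `files` maps file name -> file content (bytes); this shape is an
    assumption made for the sketch. Returns (master_bytes, assignments),
    where assignments maps every file name to the same master index.
    """
    if not files:
        raise ValueError("no files to archive")
    # The largest file supplies the master data.
    largest = max(files, key=lambda name: len(files[name]))
    master = files[largest][:MASTER_PREFIX_BYTES]
    # Every file references the custom master, never SuperMaster (index 0).
    assignments = {name: FIRSTMASTER for name in files}
    return master, assignments
```

With two files of 10 bytes and 70000 bytes, the master is the first 65536 bytes of the larger file and both files get index 2.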

The automated DOSBox-X test now validates multi-file round-trip in
both directions: 4 files UC2 v3 -> original, 5 files original -> UC2 v3.
All content verified byte-for-byte.
2026-03-29 15:21:30 -04:00
Eremey Valetov
eddecfcfc2 Add bidirectional cross-tool round-trip test (both directions pass)
Single-file UC2 v3 archives are now fully backward compatible with the
original UC2 Pro — listing and extraction verified in automated DOSBox-X
test.  The SFX extraction timeout is increased to 600s and a 22-file
completeness check is added, because incomplete extraction caused false
test results throughout the earlier investigation.  Direction 1
(UC2 v3 -> original) test added.
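
A rough sketch of a completeness check with a timeout, in Python. The helper name `wait_for_extraction`, the polling approach, and the polling interval are assumptions for illustration; the 600s timeout and the 22-file count are the only values taken from this commit message.

```python
import time
from pathlib import Path


def wait_for_extraction(out_dir, expected_files=22, timeout_s=600.0, poll_s=1.0):
    """Return True once out_dir holds at least expected_files regular files.

    Returns False if the timeout expires first, so the caller can fail
    the test instead of comparing against a partial extraction.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        count = sum(1 for p in Path(out_dir).rglob("*") if p.is_file())
        if count >= expected_files:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Treating a timeout as a hard failure keeps a slow or stalled extraction from being mistaken for a compatibility bug.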
2026-03-29 14:29:33 -04:00
Eremey Valetov
c736b19bae Fix single-file backward compatibility with original UC2 Pro
Root cause: the original UC2 Pro expects csize=0 in the cdir COMPRESS
record (it ignores the field entirely).  UC2 v3 was writing the actual
compressed size, which confused the original's archive reader.

Additional changes:
- Use default Huffman tree for all blocks (ensures tree encoding compat)
- Write method=compression_level in cdir COMPRESS (was hardcoded to 1)
- Add tests/scripts/bitdump.py for bit-level bitstream analysis

Single-file UC2 v3 archives are now fully readable by the original UC2
Pro (listing and extraction verified in DOSBox-X).  The original still
hangs on multi-file archives: the cdir bitstream decodes correctly in our
Python analyzer but fails in the original's ASM decompressor kernel.
Investigation continues; the bitdump.py tool enables targeted comparison.
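
For illustration, a bit-level comparison of two bitstreams could be sketched as below. This is not the contents of tests/scripts/bitdump.py; the function names are hypothetical, and the bit order within each byte is an assumption exposed as a parameter.

```python
def bit_string(data, lsb_first=True):
    """Render bytes as groups of 8 bits, one group per byte.

    lsb_first=True lists each byte's least significant bit first.
    Which order matches the UC2 bitstream is an assumption here.
    """
    groups = []
    for byte in data:
        bits = format(byte, "08b")  # MSB-first text form of the byte
        groups.append(bits[::-1] if lsb_first else bits)
    return " ".join(groups)


def first_difference(a, b, lsb_first=True):
    """Return the index of the first differing bit, or None if identical."""
    bits_a = bit_string(a, lsb_first).replace(" ", "")
    bits_b = bit_string(b, lsb_first).replace(" ", "")
    for i, (x, y) in enumerate(zip(bits_a, bits_b)):
        if x != y:
            return i
    if len(bits_a) != len(bits_b):
        # One stream is a strict prefix of the other.
        return min(len(bits_a), len(bits_b))
    return None
```

Locating the first differing bit narrows the search to a single field or Huffman code when comparing an archive from the original against one from UC2 v3.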
2026-03-29 09:58:36 -04:00
Eremey Valetov
be7085c4d3 Rewrite Huffman tree generation to match original UC2 Pro
Port the original TreeGen/RepairLengths/CodeGen algorithms faithfully
from TREEGEN.CPP for bitstream compatibility with the 1992 UC2 Pro:

- treegen() now accepts max_code_bits parameter (13 for main trees,
  7 for tree-encoding meta-tree)
- Heap uses >= for child comparison (prefer right child on ties),
  matching original Reheap()
- BuildCodeTree uses extract-one-then-combine pattern
- RepairLengths uses sorted linked lists with cascading space-fill
- Single/zero symbol cases assign length 1 to two symbols
- tree_enc RLE: trigger at run > 6 (not >= 6), max 20 per chunk,
  single RepeatCode per run
- First block uses default tree (tree-changed=0) matching original
  behavior for small blocks
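
The heap tie-breaking rule in the list above can be sketched in Python as follows. This is an illustrative min-heap sift-down over (frequency, symbol) pairs, not a port of the original Reheap(); the 0-based array layout and the function name are assumptions.

```python
def sift_down(heap, i):
    """Restore min-heap order below index i.

    heap is a list of (frequency, symbol) tuples. When both children
    have the same frequency, the >= comparison selects the right child,
    which is the tie rule named in the commit message.
    """
    n = len(heap)
    while True:
        left = 2 * i + 1
        right = left + 1
        if left >= n:
            break
        child = left
        # ">=" rather than ">": on equal frequencies, prefer the right child.
        if right < n and heap[left][0] >= heap[right][0]:
            child = right
        if heap[i][0] <= heap[child][0]:
            break
        heap[i], heap[child] = heap[child], heap[i]
        i = child
```

The tie rule matters for compatibility because it changes which symbols are merged first when frequencies are equal, and therefore which code lengths are generated.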

Full backward compatibility with original UC2 Pro archives (Direction 2)
is maintained.  Forward compatibility (UC2 v3 -> original, Direction 1)
remains in progress — the original still hangs, likely due to residual
bitstream-level differences in the ASM decompressor kernel.
2026-03-29 06:25:21 -04:00
Eremey Valetov
ab2d37286c Add cross-tool round-trip test vs original UC2 Pro in DOSBox-X
Automated test that runs the original 1992 UC2 Pro (UC.EXE) in DOSBox-X
headlessly to create archives from the test corpus, then extracts with
UC2 v3 and verifies byte-for-byte file identity.
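
A byte-for-byte identity check of this kind might look like the following Python sketch. The function name and directory layout are hypothetical; the actual test script may handle things such as DOS filename case differently.

```python
import filecmp
from pathlib import Path


def mismatched_files(expected_dir, actual_dir):
    """List corpus files that are missing or differ after extraction.

    Paths are returned relative to expected_dir. An empty list means
    every corpus file was extracted with identical content.
    """
    expected_dir = Path(expected_dir)
    actual_dir = Path(actual_dir)
    bad = []
    for src in sorted(p for p in expected_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(expected_dir)
        dst = actual_dir / rel
        # shallow=False forces a content comparison, not just a stat() check.
        if not dst.is_file() or not filecmp.cmp(src, dst, shallow=False):
            bad.append(rel.as_posix())
    return bad
```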

Key findings during development:
- uc2pro.exe is a UCEXE self-extracting archive, not the tool itself;
  the actual archiver is UC.EXE inside the distribution
- UC.EXE must be run from its own directory (needs DOS.SEA overlay)
- DOSBox-X flatpak requires work dirs under $HOME (filesystem=home)
- The reverse direction (UC2 v3 → original) does not work: the original
  UC2 Pro hangs reading UC2 v3 archives due to compression bitstream
  differences (added as a roadmap item)

Also fixes create_archives.sh to use the same two-session DOSBox pattern
(extract SFX first, then use UC.EXE).
2026-03-28 18:55:03 -04:00
Eremey Valetov
ff06506bbc Add testing infrastructure with reference UC2 archives
Test corpus (empty, text, binary, compressible, incompressible) with
reference archives created by original UC2 v2.3 in DOSBox. Two CTest
tests: test_identify (magic detection) and test_extract (full
extraction pipeline verified byte-for-byte against corpus).
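
As an illustration of magic detection, a check could be sketched in Python as below. The 4-byte signature `UC2` followed by 0x1A is an assumption based on commonly published file-signature lists, not taken from this repository, and `looks_like_uc2` is a hypothetical name rather than the code behind test_identify.

```python
# Assumed signature: ASCII "UC2" followed by 0x1A (not confirmed by this repo).
UC2_MAGIC = b"UC2\x1a"


def looks_like_uc2(data):
    """Return True if the buffer starts with the assumed UC2 signature."""
    return data[: len(UC2_MAGIC)] == UC2_MAGIC
```

A buffer shorter than the signature, such as an empty file from the test corpus, simply fails the comparison instead of raising.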
2026-03-11 08:29:04 -04:00