Several allocation sizes were computed from input-controlled counts or
lengths and could wrap before the malloc/fread, yielding an undersized
buffer that is then indexed past its end (mainly on 32-bit targets such
as DJGPP, where size_t is 32 bits):
- ingest restore_v2 multiplied an untrusted 32-bit chunk count from the
archive header by the entry size; cap the count (also bounds memory).
- ingest write and uc2_dict_serialize had the same multiply/add on
locally-derived sizes; cap them too.
- uc2_blockstore_ingest checked off + clen > len, which can wrap;
rewrite as off > len || clen > len - off.
- the libarchive plugin's extract_write grew its buffer with an
unchecked len addition and power-of-two doubling that could wrap;
guard both.
- uc2_bwt_revert used the caller-supplied primary_index to index its
buffers without a bound, and multiplied len by sizeof(uint32_t)
without an overflow check.
Also: uc2_merkle_build used the realloc result without checking it, so
an OOM left tree->chunks NULL and the next write dereferenced it; keep
the chunks gathered so far instead. ctest passes 22/22 on Release and
ASan builds.
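For reference, the two guard patterns described above can be sketched as
follows. This is a minimal illustration, not the code in the tree: the
helper names and the SKETCH_MAX_ENTRIES cap are hypothetical, and the
real cap value is not stated here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap on entry counts; the real limit is an assumption. */
#define SKETCH_MAX_ENTRIES 1000000u

/* Stores count * elem_size in *out. Returns false when the count is
 * over the cap or the product would not fit in size_t. */
bool checked_alloc_size(uint32_t count, size_t elem_size, size_t *out)
{
    if (count > SKETCH_MAX_ENTRIES)
        return false;
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return false;
    *out = (size_t)count * elem_size;
    return true;
}

/* True when [off, off + clen) lies inside a buffer of len bytes.
 * The subtraction runs only after off <= len is known, so no
 * intermediate value can wrap. */
bool range_in_bounds(size_t off, size_t clen, size_t len)
{
    return !(off > len || clen > len - off);
}
```

With off = 4 and clen = SIZE_MAX, the naive off + clen wraps to 3 and
passes a "> len" test; the rewritten form rejects it.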
The read handler now composes full directory paths from the cdir's
directory ids rather than emitting bare leaf names: build_dir_path
walks the parent chain (root dirid 0, depth-capped against cyclic
cdirs), so multi-file archives with subdirectories list correctly.
Master-block resolution (M4) and tagged long names (M6) already work
through libuc2's extract and tag paths; this adds a libarchive
round-trip test that creates archives at Huffman and rANS levels and
verifies every byte back through libarchive's public API. Also
documents the plugin build recipe (libarchive source tree + static lib).
Verified against libarchive 3.7.7; round-trip clean under valgrind.
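A rough sketch of the parent-chain walk that build_dir_path is described
as doing. The struct layout, the sketch_ names, the depth cap of 64 and
the trailing-slash output are all assumptions made for illustration; the
plugin's real types are not shown in this message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_DEPTH 64 /* hypothetical cap; real value not stated */

struct sketch_dir {
    uint32_t id;      /* directory id; 0 is the root */
    uint32_t parent;  /* id of the parent directory */
    const char *name; /* leaf name of this directory */
};

static const struct sketch_dir *
sketch_find_dir(const struct sketch_dir *dirs, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++)
        if (dirs[i].id == id)
            return &dirs[i];
    return NULL;
}

/* Writes an "a/b/" style prefix for dirid into out (empty for root).
 * Returns 0 on success, -1 on an unknown id, a chain deeper than the
 * cap (which also catches cycles), or an output buffer too small. */
int sketch_build_dir_path(const struct sketch_dir *dirs, size_t n,
                          uint32_t dirid, char *out, size_t cap)
{
    const char *chain[SKETCH_MAX_DEPTH];
    size_t depth = 0, used = 0;

    if (cap == 0)
        return -1;
    /* Walk leaf -> root, remembering each name on the way up. */
    while (dirid != 0) {
        const struct sketch_dir *d = sketch_find_dir(dirs, n, dirid);
        if (d == NULL || depth == SKETCH_MAX_DEPTH)
            return -1;
        chain[depth++] = d->name;
        dirid = d->parent;
    }
    /* Emit root -> leaf, one "name/" component at a time. */
    out[0] = '\0';
    while (depth > 0) {
        const char *name = chain[--depth];
        size_t len = strlen(name);
        if (len + 2 > cap - used) /* name + '/' + NUL */
            return -1;
        memcpy(out + used, name, len);
        used += len;
        out[used++] = '/';
        out[used] = '\0';
    }
    return 0;
}
```

A cyclic cdir (3 -> 4 -> 3) never reaches dirid 0, so the walk stops at
the depth cap and reports an error rather than spinning.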
Extraction of level 6-9 archives crashed (first seen on NetBSD/sdf.org,
reproducible everywhere), and files larger than the 64KB sliding window
were silently corrupted at every level. Four causes:
- cli: master COMPRESS records hardcoded method 1 while master data was
compressed at opt.level, so rANS masters were fed to the Huffman
decoder. Records now carry method 10 at levels 6-9; levels 2-5 keep
method 1 for original UC2 Pro compatibility.
- decompress: decompressor_rans stopped at remaining == 0 without
consuming the end-of-block pair and its 12 extra bits, leaving the
bit cursor desynchronized; the next block-present read landed inside
the EOB extras and parsed a phantom block. The loop now decodes all
nsyms symbols and guards output writes instead.
- decompress: a refill read returning a single byte into an empty
buffer let head overtake tail in bits_feed; the unsigned difference
wrapped and head walked off the 4KB buffer (the actual segfault).
The refill now loops until a full byte pair is available, and a
sticky error flag stops the decoder from treating negative bits_get
returns as data.
- compress/decompress: chunk loads wrote linearly past the circular
window edge, and the rANS decoder flushed output in one linear write
that cannot express ring wrap. Loads are now capped at the edge and
the decoder flushes incrementally in ring order.
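The ring-order write in the last item can be sketched roughly like this.
It is an illustration only: sketch_ring_write and its parameters are
hypothetical names, and the real loader and flush paths are not shown
here. Each copy is capped at the window edge, then continues from
offset 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies n bytes from src into a circular window of ring_size bytes,
 * starting at pos and wrapping at the edge. Returns the new write
 * position. Assumes ring_size > 0 and pos < ring_size. */
size_t sketch_ring_write(uint8_t *ring, size_t ring_size, size_t pos,
                         const uint8_t *src, size_t n)
{
    while (n > 0) {
        size_t to_edge = ring_size - pos;         /* room before wrap */
        size_t chunk = n < to_edge ? n : to_edge; /* cap at the edge */

        memcpy(ring + pos, src, chunk);
        src += chunk;
        n -= chunk;
        pos += chunk;
        if (pos == ring_size)
            pos = 0; /* continue from the start of the window */
    }
    return pos;
}
```

Writing 5 bytes at position 6 of an 8-byte ring fills offsets 6, 7, 0,
1, 2 and leaves the cursor at 3; a single linear memcpy would have run
3 bytes past the end.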
Also: BCJ E8/E9 byte assembly no longer shifts promoted ints into the
sign bit, and the libarchive plugin uses timegm on NetBSD/OpenBSD/
DragonFly so DOS timestamps are not offset by the local timezone.
New cli_bigfile regression test (>128KB round-trip at L5 and L6); it
fails against the previous binary and passes now. Verified: 22/22
ctest including the DOSBox-X round-trip against original uc2pro.exe,
ASan/UBSan clean, and the full matrix on NetBSD 10 (sdf.org).
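On the BCJ byte-assembly fix: a uint8_t operand is promoted to int
before a shift, so shifting a byte >= 0x80 left by 24 moves a 1 into the
sign bit, which is undefined behaviour and is what UBSan reports. A
minimal sketch of the usual remedy, casting to uint32_t first (the
helper names are hypothetical, not the ones in the filter):

```c
#include <stdint.h>

/* Assembles a little-endian 32-bit value. Each byte is widened to
 * uint32_t before shifting so no shift happens on a signed int. */
uint32_t sketch_load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Stores v as four little-endian bytes. */
void sketch_store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}
```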
read_header() slurps the archive on first call (using
__archive_read_ahead + __archive_read_consume), opens libuc2 against
the slurped buffer, walks uc2_read_cdir to cache every entry, and
yields one per call mapped onto archive_entry's pathname / size /
mtime / mode. Tagged entries are resolved via uc2_get_tag. Memory
scales with archive size in v1; a seekable adapter via
__archive_read_seek is left for a future revision.
read_data() runs uc2_extract through a buffering write callback, then
yields the decompressed entry as a single slice (libarchive's pull
API permits this). read_data_skip and cleanup are correct.
Build verified clean against libarchive 3.7.7. An end-to-end runtime
test via bsdtar requires a custom libarchive build that links the
plugin (the read-format API is internal). Integration recipe added
to contrib/libarchive/README.md.
Closes 591db60. M4 (master-block dep tracking regression test) and
M7 (bsdtar round-trip) tracked separately.
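A buffering write callback of the kind read_data() relies on might look
roughly like the sketch below. The struct, the function name, the return
convention and the 4096-byte starting capacity are assumptions; libuc2's
actual callback signature is not shown in this message. Both the length
addition and the capacity doubling are checked before use.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct sketch_buf {
    uint8_t *data; /* heap storage, NULL while empty */
    size_t len;    /* bytes in use */
    size_t cap;    /* bytes allocated */
};

/* Appends n bytes from src. Returns 0 on success, -1 if the new
 * length would overflow size_t or realloc fails. On failure the
 * buffer keeps its previous contents. */
int sketch_buf_append(struct sketch_buf *b, const void *src, size_t n)
{
    if (n > SIZE_MAX - b->len)
        return -1; /* len + n would wrap */
    size_t need = b->len + n;

    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 4096; /* arbitrary start size */
        while (cap < need) {
            if (cap > SIZE_MAX / 2) { /* doubling would wrap */
                cap = need;
                break;
            }
            cap *= 2;
        }
        uint8_t *p = realloc(b->data, cap);
        if (p == NULL)
            return -1; /* b->data is still valid */
        b->data = p;
        b->cap = cap;
    }
    if (n > 0)
        memcpy(b->data + b->len, src, n);
    b->len = need;
    return 0;
}
```

Once extraction finishes, data and len can be handed back to the caller
as the single slice mentioned above and released in cleanup.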
Replaces the skeleton with a real implementation of the bid callback,
self-registration, and graceful-EOF stubs for the rest of the
read-format vtable. Builds against a libarchive source tree
(LIBARCHIVE_SOURCE_DIR option) because the read-format API is
internal -- the public -devel package only ships archive.h and
archive_entry.h, not archive_read_private.h.
Key changes:
- __archive_read_ahead reads the first 4 bytes; magic check returns
bid 64 on 0x55 0x43 0x32 0x1A.
- __archive_read_register_format wired with the correct 12-argument
signature against libarchive 3.7.7.
- archive_platform_config.uc2.h.in stands in for the generated
config.h, satisfying archive_platform.h's include-or-error gate
without running libarchive's own configure.
Resulting libuc2_libarchive.a exports archive_read_support_format_uc2
with three undefined references (__archive_check_magic,
__archive_read_ahead, __archive_read_register_format) that resolve
when linked into a libarchive tree.
read_header / read_data / cleanup are EOF stubs. Wiring to libuc2
is milestone 2+.
Closes b0b06a5; M2-3 tracked at 591db60.
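The magic check behind the bid can be sketched without any libarchive
types. In the plugin the four bytes come from __archive_read_ahead; here
they are passed in directly. The function name and the choice of 0 as
the non-match value are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 64 when the first four bytes are the UC2 magic
 * (0x55 0x43 0x32 0x1A, i.e. "UC2" followed by 0x1A). Returns 0
 * when they differ or fewer than four bytes are available. */
int sketch_uc2_bid(const uint8_t *head, size_t avail)
{
    static const uint8_t magic[4] = { 0x55, 0x43, 0x32, 0x1A };

    if (head == NULL || avail < sizeof magic)
        return 0;
    return memcmp(head, magic, sizeof magic) == 0 ? 64 : 0;
}
```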
contrib/libarchive/ contains a design doc, an annotated skeleton of
archive_read_support_format_uc2(), and a CMake target that gates the
build on -DUC2_BUILD_LIBARCHIVE_PLUGIN=ON plus find_package(LibArchive).
The skeleton has the five required callbacks (bid, read_header,
read_data, read_data_skip, cleanup) with TODO markers at each
implementation point. The bid function has the magic-byte check
ready; the rest will call into libuc2 for parsing and decompression.
libarchive's read-format API is internal; an out-of-tree .so cannot
be loaded into an unmodified libarchive. The integration plan in
contrib/libarchive/README.md is to upstream the file as a PR against
libarchive/libarchive. Full implementation is tracked as
git-bug b0b06a5.