Several allocation sizes were computed from input-controlled counts or lengths and could wrap before the malloc/fread, yielding an undersized buffer that is then indexed past its end (mainly on 32-bit targets such as DJGPP, where size_t is 32 bits):
- ingest restore_v2 multiplied an untrusted 32-bit chunk count from the archive header by the entry size; cap the count (also bounds memory).
- ingest write and uc2_dict_serialize had the same multiply/add on locally-derived sizes; cap them too.
- uc2_blockstore_ingest checked off + clen > len, which can wrap; rewrite as off > len || clen > len - off.
- the libarchive plugin's extract_write grew its buffer with an unchecked len addition and power-of-two doubling that could wrap; guard both.
- uc2_bwt_revert used the caller-supplied primary_index to index its buffers without a bound, and multiplied len by sizeof(uint32_t) without an overflow check.
Also: uc2_merkle_build used the realloc result without checking it, so an OOM left tree->chunks NULL and the next write dereferenced it; keep the chunks gathered so far instead.
22/22 ctest on Release and ASan.
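The wrap-safe forms named above can be illustrated with a minimal sketch. The helper names range_in_bounds and checked_array_size, and the max_count parameter, are hypothetical and do not come from the libuc2 sources; only the comparison off > len || clen > len - off is taken from the text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when [off, off + clen) lies inside a buffer of
 * len bytes. The sum off + clen is never computed, so nothing can wrap. */
bool range_in_bounds(size_t off, size_t clen, size_t len)
{
    if (off > len)
        return false;
    return clen <= len - off;   /* len - off cannot underflow here */
}

/* Hypothetical helper: stores count * elem_size in *out. Returns false when
 * count exceeds the caller's cap or the product would not fit in size_t. */
bool checked_array_size(size_t count, size_t elem_size,
                        size_t max_count, size_t *out)
{
    if (count > max_count)
        return false;
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return false;
    *out = count * elem_size;
    return true;
}
```

Capping the count before multiplying bounds memory use as well as preventing the wrap, which matters most where size_t is 32 bits.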
UC2 read-format plugin for libarchive
This directory contains the design and a skeleton implementation of a
read-only .uc2 format handler for libarchive. The goal is to make
UC2 archives transparently extractable by every libarchive-using tool
(bsdtar, cmake, pkg(8), file-roller, Ark, and others).
Status
- Milestones 1-3 shipped.
  archive_read_support_format_uc2.c implements:
  - bid() -- __archive_read_ahead reads the first 4 bytes, returns 64 on UC2 magic.
  - read_header() -- on first call, slurps the entire archive into memory via __archive_read_ahead + __archive_read_consume, opens a libuc2 handle bound to the slurped buffer, walks uc2_read_cdir to cache every entry (with uc2_get_tag resolution for tagged entries), then yields entries one per call via archive_entry_set_pathname / set_size / set_mtime / set_filetype / set_perm.
  - read_data() -- on first call per entry, runs uc2_extract with a buffering write callback, then yields the whole entry in one slice; subsequent calls return ARCHIVE_EOF.
  - read_data_skip() and cleanup() -- correct.
- Memory model: archive is slurped fully on the first read_header, so memory use scales with archive size. Acceptable for v1; future work can swap in a seekable adapter when the underlying filter supports __archive_read_seek.
- CMakeLists.txt activates with -DUC2_BUILD_LIBARCHIVE_PLUGIN=ON -DLIBARCHIVE_SOURCE_DIR=<libarchive-checkout>. The pin against a source tree (rather than find_package(LibArchive)) is required because the read-format API is internal -- the public -devel package ships only archive.h and archive_entry.h.
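The buffering write callback that read_data() relies on might look roughly like the sketch below. It is not the plugin's actual extract_write: the struct entry_buf type, the entry_buf_write name, and the 4096-byte starting capacity are assumptions made for illustration. It shows one way to guard both the length addition and the power-of-two doubling against wrapping.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-entry accumulation buffer. */
struct entry_buf {
    unsigned char *data;
    size_t len;   /* bytes stored so far */
    size_t cap;   /* bytes allocated */
};

/* Appends len bytes from src. Returns 0 on success, -1 on size overflow or
 * allocation failure; on failure the buffer contents are left unchanged. */
int entry_buf_write(struct entry_buf *b, const void *src, size_t len)
{
    if (len > SIZE_MAX - b->len)          /* b->len + len would wrap */
        return -1;
    size_t need = b->len + len;

    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 4096;
        while (cap < need) {
            if (cap > SIZE_MAX / 2) {     /* doubling would wrap */
                cap = need;
                break;
            }
            cap *= 2;
        }
        unsigned char *p = realloc(b->data, cap);
        if (p == NULL)                    /* keep the old block on OOM */
            return -1;
        b->data = p;
        b->cap = cap;
    }
    if (len != 0)
        memcpy(b->data + b->len, src, len);
    b->len = need;
    return 0;
}
```

Assigning the realloc result to a temporary keeps the previously gathered bytes reachable when allocation fails, so the caller can still free them.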
Integration recipe (manual, until upstream merge)
To actually exercise the plugin from bsdtar, the plugin must be
linked into the libarchive binary itself (the relevant API is internal
and not exported from the system shared library). Two paths:
- Drop-in patch. Copy archive_read_support_format_uc2.c into libarchive/libarchive/, then add one line to libarchive/libarchive/archive_read_support_format_all.c:
  archive_read_support_format_uc2(a);
  plus one entry in libarchive/libarchive/CMakeLists.txt next to the other archive_read_support_format_*.c sources. Rebuild libarchive; then bsdtar -tf archive.uc2 lists entries.
- External link. Build libuc2_libarchive.a from this directory (cmake -DUC2_BUILD_LIBARCHIVE_PLUGIN=ON -DLIBARCHIVE_SOURCE_DIR=...). Build a custom libarchive_static.a that includes the same LIBARCHIVE_SOURCE_DIR. Link both into a small driver program that calls archive_read_support_format_uc2(a).
The upstream PR (milestone 8 in the original issue) replaces both
recipes with a single first-class bsdtar integration.
Why an out-of-tree skeleton?
libarchive's read-format plugin API is internal.
__archive_read_register_format is declared only in libarchive's private
headers and is not part of the public ABI. An out-of-tree .so cannot be
loaded into an unmodified libarchive at runtime.
The supported integration paths are:
- Upstream merge. Submit archive_read_support_format_uc2.c as a PR against libarchive/libarchive. Once merged, distros pick it up and every tool that links libarchive sees .uc2 automatically. This is the long-term goal.
- Patched libarchive build. Distribute a small patch that includes the UC2 plugin against a known libarchive version. Useful for testing before upstream merge and for users who want .uc2 support before the upstream release reaches their distro.
- Static-library wrapper. Build the plugin as part of a custom tool that statically links libarchive + this plugin. Useful for demo binaries; not a substitute for upstream merge because the wrapper still won't be picked up by bsdtar etc.
Architecture
UC2 archives use a fixed front header (29 bytes), a record stream of compressed bodies, and a compressed central directory whose offset is recorded in the front header. The central directory holds OHEAD records for masters, dirs, and files; entry attributes are in OSMETA + DIRMETA / FILEMETA.
The plugin uses libuc2 for parsing and decompression and adapts the
results to libarchive's struct archive_entry model. libuc2 already
exposes a streaming read API (uc2_open, uc2_read_cdir,
uc2_extract) and is GPL-3.0 / LGPL-3.0; the plugin is GPL-3.0-or-later
to match the cli/main.c license boundary. See
docs/license-audit.md for the
provenance table.
Callback responsibilities
- bid: read the first 4 bytes via __archive_read_ahead, check for the UC2 magic (0x1A324355). Return 64 on match, 0 otherwise. libarchive uses the highest bid to pick a format; 64 is the conventional "format-recognised" score.
- read_header: on first call, open the libuc2 handle and read the central directory into memory. On every call, return one entry's metadata via archive_entry_* setters. When entries are exhausted, return ARCHIVE_EOF.
- read_data: stream decompressed bytes for the current entry. libuc2's uc2_extract invokes a write callback per chunk; the plugin needs to convert this push model into libarchive's pull model (the standard way: a small ring buffer, plus a generator loop or coroutine). The simplest first implementation buffers the whole entry, which is correct but increases memory pressure for very large files; refine later.
- read_data_skip: advance to the next entry without producing data. Decompression cannot be safely skipped (master-block dependencies), so the plugin still decompresses, just discards.
- cleanup: close the libuc2 handle, free buffers.
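The magic check at the heart of bid can be sketched without touching libarchive's internal API. The function name sketch_bid_uc2 is hypothetical, and the sketch assumes the 32-bit magic 0x1A324355 is stored little-endian on disk (the bytes 'U', 'C', '2', 0x1A); in the real callback the pointer would come from __archive_read_ahead.

```c
#include <stddef.h>
#include <stdint.h>

#define UC2_MAGIC 0x1A324355u

/* Hypothetical helper: returns 64 when the first four bytes decode
 * (little-endian, an assumption) to the UC2 magic, and 0 otherwise. */
int sketch_bid_uc2(const unsigned char *p, size_t avail)
{
    if (p == NULL || avail < 4)
        return 0;                       /* too short to be a UC2 archive */

    uint32_t v = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16)
               | ((uint32_t)p[3] << 24);

    return v == UC2_MAGIC ? 64 : 0;
}
```

Decoding byte by byte keeps the comparison independent of host endianness and avoids an unaligned 32-bit load.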
libuc2 IO callbacks
libuc2 takes user-supplied callbacks for read/alloc/free/warn. The plugin wires these to libarchive's filter stack:
- read -> __archive_read_seek + __archive_read_ahead
- alloc / free -> malloc / free
- warn -> push to libarchive's warning log via archive_set_error.
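For the slurp-into-memory model, a positional read callback over the in-memory copy could look like the sketch below. The callback shape (context pointer, 64-bit offset, destination, length, returning bytes copied) is purely an assumption for illustration; libuc2's real callback signature is not shown in this document, and struct mem_source and mem_source_read are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical context: the whole archive held in one buffer. */
struct mem_source {
    const unsigned char *base;
    size_t size;
};

/* Hypothetical read callback: copies up to len bytes starting at off into
 * dst and returns the number of bytes copied (0 at or past the end). */
size_t mem_source_read(void *ctx, uint64_t off, void *dst, size_t len)
{
    const struct mem_source *m = ctx;

    if (off >= m->size)                 /* also rejects offsets > SIZE_MAX */
        return 0;

    size_t avail = m->size - (size_t)off;
    size_t n = len < avail ? len : avail;
    if (n != 0)
        memcpy(dst, m->base + (size_t)off, n);
    return n;
}
```

Clamping against size - off rather than testing off + len keeps the bounds check free of wraparound.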
Build
The CMake target only configures when libarchive headers are present.
Install on Fedora/RHEL with dnf install libarchive-devel, on Debian
with apt install libarchive-dev, or build libarchive from source.
cmake -B build -DUC2_BUILD_LIBARCHIVE_PLUGIN=ON -DLIBARCHIVE_SOURCE_DIR=<libarchive-checkout>
cmake --build build --target uc2_libarchive
The built object can be linked into a libarchive-using application or
patched into libarchive's source tree (libarchive/libarchive/).
Roadmap
The current skeleton compiles into a stub library that registers a no-op format. The implementation milestones, in order:
- bid function with magic check (~20 lines)
- read_header for the first entry only (single-file archives)
- read_data for uncompressed-by-master entries
- Master-block decompression and dependency tracking
- Multi-file archives + directory entries
- Tagged entries (long names, extended attributes)
- Round-trip test against bsdtar built from a patched libarchive
- Upstream PR
Each milestone is independently shippable as a working subset.