13 Commits

Author SHA1 Message Date
Eremey Valetov
fc767a1739 cli: report a write error at fclose on extraction
Some checks failed
Build / Linux (push) Has been cancelled
Build / Windows (MSVC) (push) Has been cancelled
Build / macOS (push) Has been cancelled
Build / libarchive plugin (push) Has been cancelled
Build / DOS (DJGPP) (push) Has been cancelled
Docs / build (push) Has been cancelled
Docs / deploy (push) Has been cancelled
uc2_extract's output file was closed without checking fclose, so a
deferred write error (a full disk, for example) could silently
truncate the extracted file. Fail loudly instead, unless extraction
already reported an error.
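A minimal sketch of the check, assuming the caller tracks whether extraction itself succeeded. `finish_output` is a hypothetical helper, not part of the CLI. It only shows why the return value of `fclose` matters: stdio may hold buffered data until close, so a full disk can first be reported there.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: close an extracted file and report whether the
   data actually reached the disk. fclose() flushes stdio's buffer, so a
   deferred write error (e.g. a full disk) may first appear here. The
   handle is always released; the error is reported only if extraction
   itself had not already failed. */
static bool finish_output(FILE *f, bool extract_ok)
{
    int rc = fclose(f);
    if (rc != 0 && extract_ok) {
        perror("fclose");   /* fail loudly: the file may be truncated */
        return false;
    }
    return extract_ok && rc == 0;
}
```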
2026-06-13 10:55:25 -04:00
Eremey Valetov
ad923d7ea0 fix heap overflow parsing a damaged central directory
A crafted archive could crash the reader with an out-of-bounds read in
the directory-skip path (uc2_finish_cdir -> uc2_read_cdir -> uc2_get_tag).

decompress_cdir allocates cdir_buf inside its decode loop but, on its
error paths (decode failure or a checksum mismatch), returned before
setting cdir_range.end -- leaving cdir_buf non-NULL with a stale end. A
later uc2_read_cdir/uc2_finish_cdir then saw cdir_buf != NULL, skipped
re-reading, and walked a range whose end pointed below its start, so
range_len wrapped and range_get handed out wild pointers. Free cdir_buf
on every error path so the invariant "cdir_buf != NULL iff cdir_range is
valid" holds, and make range_len report an empty range (rather than a
huge one) if end ever precedes ptr, as defense in depth for the whole
parser.
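The clamped `range_len` and the refusal it enables can be sketched as follows. The struct mirrors the `struct range` in the diff; `range_take` is a hypothetical stand-in for `range_get`, whose real signature is not shown here.

```c
#include <stddef.h>

typedef unsigned char u8;

/* Half-open byte range [ptr, end), modelled on the parser's struct range. */
struct range {
    u8 *ptr, *end;
};

/* A stale or never-set end (end < ptr) reports an empty range instead
   of a wrapped, enormous unsigned length. */
static unsigned range_len(const struct range *r)
{
    return r->end > r->ptr ? (unsigned)(r->end - r->ptr) : 0;
}

/* Hypothetical range_get-style helper: hand out n bytes from the front
   of the range, or NULL if fewer than n remain. */
static u8 *range_take(struct range *r, unsigned n)
{
    if (range_len(r) < n)
        return NULL;
    u8 *p = r->ptr;
    r->ptr += n;
    return p;
}
```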

Also add a compression-ratio ceiling to the cdir decode: a tiny crafted
stream can expand via long matches, so abort once the output far
outgrows the compressed bytes consumed.
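A sketch of the ceiling as a pure predicate. The 64 KiB floor and 64:1 ratio are the values used by `cdir_write()` in this commit's diff; the function and macro names are hypothetical.

```c
#include <stdbool.h>

/* Allow a fixed floor of output plus a bounded expansion ratio over the
   compressed bytes consumed so far (values taken from cdir_write()). */
#define RATIO_FLOOR 65536ul
#define RATIO_MAX   64ul

static bool output_within_ceiling(unsigned long produced,
                                  unsigned long consumed)
{
    return produced <= RATIO_FLOOR + RATIO_MAX * consumed;
}
```

In `cdir_write()` the cumulative output is updated before the test, so the check covers the chunk about to be written.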

Found with a new libFuzzer harness (tests/fuzz/, not built by default).
Memory-safety is clean over sustained fuzzing after this change; 22/22
ctest on Release and ASan. A residual slow-input timeout via a separate
decode path is tracked for follow-up.
2026-06-13 10:53:49 -04:00
Eremey Valetov
62a90af101 guard allocation sizes against integer overflow
Several allocation sizes were computed from input-controlled counts or
lengths and could wrap before the malloc/fread, yielding an undersized
buffer that is then indexed past its end (mainly on 32-bit targets such
as DJGPP, where size_t is 32 bits):

- ingest restore_v2 multiplied an untrusted 32-bit chunk count from the
  archive header by the entry size; cap the count (also bounds memory).
- ingest write and uc2_dict_serialize had the same multiply/add on
  locally-derived sizes; cap them too.
- uc2_blockstore_ingest checked off + clen > len, which can wrap;
  rewrite as off > len || clen > len - off.
- the libarchive plugin's extract_write grew its buffer with an
  unchecked len addition and power-of-two doubling that could wrap;
  guard both.
- uc2_bwt_revert used the caller-supplied primary_index to index its
  buffers without a bound, and multiplied len by sizeof(uint32_t)
  without an overflow check.
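The two wrap-safe idioms behind these fixes, sketched with hypothetical helper names. The commit's actual caps are not stated, so `cap` is a caller-supplied assumption here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* "Does [off, off + clen) fit in len?" without computing off + clen,
   which can wrap on 32-bit size_t. Once off <= len holds, len - off
   cannot underflow. */
static bool span_fits(size_t off, size_t clen, size_t len)
{
    return !(off > len || clen > len - off);
}

/* count * elem into *out, refusing if the product would wrap or exceed
   an explicit cap (which also bounds memory use). */
static bool checked_mul(size_t count, size_t elem, size_t cap, size_t *out)
{
    if (elem != 0 && count > SIZE_MAX / elem)
        return false;               /* product would wrap */
    if (count * elem > cap)
        return false;               /* over the sanity cap */
    *out = count * elem;
    return true;
}
```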

Also: uc2_merkle_build used the realloc result without checking it, so
an OOM left tree->chunks NULL and the next write dereferenced it; keep
the chunks gathered so far instead. 22/22 ctest on Release and ASan.
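The OOM-safe growth pattern, sketched with a hypothetical `chunk_list` in place of the Merkle tree's real structure. Assigning `realloc`'s result to a temporary means a failure leaves the old array, and everything already gathered, intact.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable list of chunk ids. */
struct chunk_list {
    unsigned *chunks;
    size_t n, cap;
};

/* Append one id. On OOM (or size overflow) return -1 and keep the
   existing array valid; only replace the pointer after success. */
static int chunk_list_push(struct chunk_list *l, unsigned id)
{
    if (l->n == l->cap) {
        size_t ncap = l->cap ? l->cap * 2 : 16;
        if (ncap < l->cap || ncap > SIZE_MAX / sizeof *l->chunks)
            return -1;                      /* size would wrap */
        unsigned *np = realloc(l->chunks, ncap * sizeof *np);
        if (np == NULL)
            return -1;                      /* l->chunks still valid */
        l->chunks = np;
        l->cap = ncap;
    }
    l->chunks[l->n++] = id;
    return 0;
}
```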
2026-06-13 08:43:03 -04:00
Eremey Valetov
43cf875dfe cli: reject path-traversal in archive entry names on extraction
extract_cb appended a decoded entry name to the destination path with
no validation, so a crafted archive whose entry name contained "..",
a path separator, or an absolute form could write files outside the
chosen destination directory (a Zip-Slip). Each UC2 entry name is a
single path component -- the directory tree is rebuilt from dirid
parents -- so reject any name that is empty, ".", "..", or contains
'/' or '\'. The bundled writer only ever stores basenames, so this
affects malformed or hostile archives only; normal extraction
(including names like "..foo" and nested directories) is unchanged.
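The same condition as a standalone predicate, for illustration. `safe_entry_name` is a hypothetical name. The length is passed explicitly because `extract_cb` handles decoded names as a pointer plus a length rather than as NUL-terminated strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Accept only a single, non-special path component: reject empty
   names, ".", "..", and anything containing '/' or '\\'. */
static bool safe_entry_name(const char *name, size_t len)
{
    if (len == 0)
        return false;
    if (len == 1 && name[0] == '.')
        return false;
    if (len == 2 && name[0] == '.' && name[1] == '.')
        return false;
    if (memchr(name, '/', len) || memchr(name, '\\', len))
        return false;
    return true;
}
```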
2026-06-13 08:35:59 -04:00
Eremey Valetov
5e0f3852c6 harden decoder against crafted archives: tree overrun, LZ distance, delta stride
A malformed archive could drive several out-of-bounds accesses in the
decoder, all reachable from untrusted input:

- ht_dec() expanded a Huffman RepeatCode without checking the
  destination against the end of the local stream[] array, so a crafted
  tree wrote past it on the stack. Reject the overrun as UC2_Damaged.

- The LZ match copy in both the rANS and the Huffman paths used a match
  distance straight from the bitstream. A distance larger than the
  bytes written so far (or one wrapped huge by a short bits_get on the
  distance extra-bits) made (u16)(tail - dist) reference window bytes
  that were never written, copying uninitialised memory into the
  output. Track produced history (master fill + output, saturating at
  the 64KB window) and reject dist beyond it.

- struct delta carried val[8], but decompressor() accepts methods up to
  49, giving strides up to 10; strides 9 and 10 indexed past the array
  (and silently mis-decoded). Size val[] to cover the accepted range.
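The history-tracking check on match distances can be sketched like this, assuming a 64 KiB ring window indexed by a 16-bit position, as in the `(u16)(tail - dist)` expression above. The struct and function names are hypothetical, and rejecting a zero distance is an extra assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define WINDOW 65536u   /* 64 KiB ring buffer, 16-bit positions */

/* Hypothetical decoder state. history counts window bytes that hold
   real data (preset fill plus output), saturating at WINDOW. */
struct lz_state {
    uint8_t window[WINDOW];
    uint16_t tail;       /* next write position, wraps mod 64 KiB */
    uint32_t history;
};

static void lz_put(struct lz_state *s, uint8_t b)
{
    s->window[s->tail++] = b;           /* uint16_t tail wraps to 0 */
    if (s->history < WINDOW)
        s->history++;
}

/* Copy len bytes from dist back, refusing any distance that reaches
   bytes never written, so no uninitialised memory reaches the output. */
static bool lz_copy(struct lz_state *s, uint32_t dist, uint32_t len)
{
    if (dist == 0 || dist > s->history)
        return false;                   /* damaged stream */
    for (uint32_t i = 0; i < len; i++)
        lz_put(s, s->window[(uint16_t)(s->tail - dist)]);
    return true;
}
```

Copying byte by byte keeps overlapping matches (distance shorter than length) correct.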

Found by a code-review pass. Valid round-trips are unchanged: 22/22
ctest on Release and ASan, plus ASan round-trips across all levels for
inputs spanning the 64KB window. The assemble_name NULL-deref raised in
the same review is not reachable (dos_name is a fixed 11 bytes, far
under the 300-byte name buffer), so it is left as-is.
2026-06-13 08:33:37 -04:00
Eremey Valetov
13e29ee211 ci: install libfl2 for the DJGPP binutils
The prebuilt DJGPP ar and ld from the andrewwutw release are linked
against the flex runtime (libfl.so.2), which a clean GitHub runner
does not have, so linking libuc2.a failed with a loader error.
Install libfl2 before extracting the toolchain.
2026-06-13 07:56:32 -04:00
Eremey Valetov
247de54352 harden decoding of damaged archives
A truncated or corrupt archive could overrun memory during decode.
decompress_block guarded its match-copy length with an assert that
NDEBUG compiles out, so a short bits_get that underflowed the length
would overrun the 64KB window in release builds. Replace the assert
with a runtime check: an out-of-range length ends the block with
UC2_Damaged before the copy, and the existing checksum and size
validation then reports the archive as damaged. decompress_cdir bounded
the walkable range by the buffer allocation rather than the bytes
actually decompressed, so a damaged directory that happened to match
the 16-bit checksum could be parsed into uninitialised heap; bound the
range to the decompressed length. The CLI also leaked the archive
handle and FILE on the directory-read and integrity-test error paths;
close both.
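The assert-to-runtime-check change, sketched against a flat window for simplicity. The real decoder copies through a ring buffer, and the status codes and helper name here are hypothetical.

```c
#include <string.h>

#define WINDOW_SIZE 65536u
enum { DEC_OK = 0, DEC_DAMAGED = -1 };   /* hypothetical status codes */

/* assert(len <= room) vanishes under NDEBUG, so an underflowed length
   would overrun the window in release builds. Check at run time and
   end the block as damaged instead. */
static int copy_into_window(unsigned char *window, unsigned pos,
                            const unsigned char *src, unsigned len)
{
    if (pos > WINDOW_SIZE || len > WINDOW_SIZE - pos)
        return DEC_DAMAGED;
    memcpy(window + pos, src, len);
    return DEC_OK;
}
```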

A prefix-sweep fuzzer drove these fixes. It still finds a rare,
heap-state-dependent out-of-bounds read in the directory-skip path
that these changes do not fully close; that and a stable fuzz harness
are tracked separately.
2026-06-13 07:53:53 -04:00
Eremey Valetov
09cdc80986 ci: build the DOS (DJGPP) target and consolidate the toolchain file
A new Linux job installs the andrewwutw DJGPP v3.4 cross-toolchain
(gcc 12.2.0, sha256-pinned), cross-compiles uc2.exe with
cmake/djgpp.cmake, and verifies the result is a DJGPP go32 DOS
executable. The DOS build had no CI coverage and could regress
silently.

The repo carried two diverged DJGPP toolchain files. djgpp.cmake
(referenced by the build docs) forces -nostdinc with explicit DJGPP
include paths, so it builds cleanly even on hosts where /usr/include
would otherwise leak past the cross-compiler. djgpp-toolchain.cmake
(previously referenced by the README) relied on the cross-gcc finding
its own headers and broke in that case. Keep djgpp.cmake as the single
toolchain file, point the README and roadmap at it, and drop
djgpp-toolchain.cmake.
2026-06-13 07:53:46 -04:00
Eremey Valetov
c394106c56 ci: build and test the libarchive plugin on Linux
New job fetches libarchive 3.7.7 (sha256-pinned), builds it as a
dependency-free static library, then configures UC2 with the plugin
and runs the libarchive_roundtrip test. Keeps the plugin's
source-tree build path verified on every push without adding a
libarchive dependency to the default matrix.
2026-06-13 02:27:56 -04:00
Eremey Valetov
d26791bfbd libarchive plugin: directory paths, round-trip test (M5-M6)
The read handler now composes full directory paths from the cdir's
directory ids rather than emitting bare leaf names: build_dir_path
walks the parent chain (root dirid 0, depth-capped against cyclic
cdirs), so multi-file archives with subdirectories list correctly.
Master-block resolution (M4) and tagged long names (M6) already work
through libuc2's extract and tag paths; this adds a libarchive
round-trip test that creates archives at Huffman and rANS levels and
verifies every byte back through libarchive's public API. Documents
the plugin build recipe (libarchive source tree + static lib).

Verified against libarchive 3.7.7; round-trip clean under valgrind.
2026-06-13 02:10:56 -04:00
Eremey Valetov
b86309542d cli: fail loudly when archive offsets would exceed 4 GiB
The UC2 container stores 32-bit offsets; ftell results were cast to
unsigned at four sites, so positions past 4 GiB would wrap silently
and corrupt the directory. tell32() now reports the format limit and
exits. Also checks the ftell result reserved for the ingest manifest
instead of seeking to -1 on error. Multi-volume spanning (2b65f0a)
remains the route for larger payloads.
2026-06-12 06:29:12 -04:00
Eremey Valetov
217bf9e53f test_blockstore: portable temp paths and recursive cleanup
Same defect class as test_ingest (ac01b32): hardcoded /tmp and a
shell rm -rf gave the test nothing real to do on the Windows runner.
Temp store now lands in %TEMP% and cleanup uses a portable rmtree
(dirent on POSIX, _findfirst on MSVC) over the store's two-level
layout.
2026-06-12 06:29:12 -04:00
Eremey Valetov
ac01b32273 test_ingest: portable temp paths for Windows CI
The test hardcoded /tmp, which does not exist on the Windows runner.
With NDEBUG compiling the asserts out, the NULL stream from the failed
fopen reached fclose() and tripped the UCRT invalid-parameter fail-fast
(0xc0000409). Temp files now go to %TEMP% on Windows; rm -rf and unlink
are replaced with ISO C remove(); file-handle acquisition failures now
exit loudly instead of relying on assert.
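A sketch of the portable pattern, assuming `%TEMP%` on Windows as the commit describes. The `$TMPDIR` fallback on POSIX and the helper names are assumptions of this sketch, not the test's actual code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Scratch directory: %TEMP% on Windows (there is no /tmp), otherwise
   $TMPDIR if set, else /tmp. */
static const char *scratch_dir(void)
{
#ifdef _WIN32
    const char *d = getenv("TEMP");
    return d ? d : ".";
#else
    const char *d = getenv("TMPDIR");
    return d && *d ? d : "/tmp";
#endif
}

/* Open a scratch file, exiting loudly on failure instead of relying on
   assert(), which NDEBUG compiles out. */
static FILE *open_scratch(char *path, size_t cap, const char *name)
{
    int n = snprintf(path, cap, "%s/%s", scratch_dir(), name);
    if (n < 0 || (size_t)n >= cap) {
        fprintf(stderr, "scratch path too long\n");
        exit(EXIT_FAILURE);
    }
    FILE *f = fopen(path, "w+b");
    if (f == NULL) {
        perror(path);
        exit(EXIT_FAILURE);
    }
    return f;
}
```

Cleanup then uses ISO C `remove(path)`, which works on both platforms, rather than shelling out to `rm -rf`.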
2026-06-11 17:01:29 -04:00
20 changed files with 830 additions and 140 deletions

@@ -34,3 +34,75 @@ jobs:
run: .\build\cli\Release\uc2.exe -h
- name: Test
run: ctest --test-dir build --output-on-failure -C Release
libarchive:
runs-on: ubuntu-latest
name: libarchive plugin
env:
LIBARCHIVE_VERSION: 3.7.7
LIBARCHIVE_SHA256: 4cc540a3e9a1eebdefa1045d2e4184831100667e6d7d5b315bb1cbc951f8ddff
steps:
- uses: actions/checkout@v4
- name: Fetch libarchive source
run: |
curl -fsSLO "https://github.com/libarchive/libarchive/releases/download/v${LIBARCHIVE_VERSION}/libarchive-${LIBARCHIVE_VERSION}.tar.gz"
echo "${LIBARCHIVE_SHA256} libarchive-${LIBARCHIVE_VERSION}.tar.gz" | sha256sum -c -
tar xzf "libarchive-${LIBARCHIVE_VERSION}.tar.gz"
- name: Build libarchive static (dependency-free)
run: |
cmake -S "libarchive-${LIBARCHIVE_VERSION}" -B larch-build \
-DCMAKE_BUILD_TYPE=Release -DCMAKE_POLICY_VERSION_MINIMUM=3.5 \
-DBUILD_SHARED_LIBS=OFF -DENABLE_TEST=OFF -DENABLE_TAR=OFF \
-DENABLE_CPIO=OFF -DENABLE_CAT=OFF -DENABLE_UNZIP=OFF \
-DENABLE_WERROR=OFF -DENABLE_ZLIB=OFF -DENABLE_BZip2=OFF \
-DENABLE_LZMA=OFF -DENABLE_ZSTD=OFF -DENABLE_LZ4=OFF \
-DENABLE_LIBXML2=OFF -DENABLE_EXPAT=OFF -DENABLE_OPENSSL=OFF \
-DENABLE_LIBB2=OFF -DENABLE_ICONV=OFF -DENABLE_ACL=OFF \
-DENABLE_XATTR=OFF -DENABLE_CNG=OFF -DENABLE_MBEDTLS=OFF \
-DENABLE_NETTLE=OFF -DENABLE_PCREPOSIX=OFF -DENABLE_PCRE2POSIX=OFF
cmake --build larch-build --target archive_static -j
- name: Configure UC2 with libarchive plugin
run: |
cmake -B build -DCMAKE_BUILD_TYPE=Release \
-DUC2_BUILD_LIBARCHIVE_PLUGIN=ON \
-DLIBARCHIVE_SOURCE_DIR="$PWD/libarchive-${LIBARCHIVE_VERSION}" \
-DLIBARCHIVE_LIBRARY="$PWD/larch-build/libarchive/libarchive.a"
- name: Build
run: cmake --build build -j
- name: Round-trip test
run: ctest --test-dir build --output-on-failure -R libarchive_roundtrip
djgpp:
runs-on: ubuntu-latest
name: DOS (DJGPP)
env:
DJGPP_URL: https://github.com/andrewwutw/build-djgpp/releases/download/v3.4/djgpp-linux64-gcc1220.tar.bz2
DJGPP_SHA256: 8464f17017d6ab1b2bb2df4ed82357b5bf692e6e2b7fee37e315638f3d505f00
# Keep host include dirs out of the cross-compiler's search path in
# every step (the toolchain file also forces -nostdinc, but a stray
# CPATH on the runner would otherwise leak glibc headers).
CPATH: ''
CPLUS_INCLUDE_PATH: ''
steps:
- uses: actions/checkout@v4
- name: Install DJGPP cross-toolchain
run: |
# The prebuilt DJGPP binutils (ar, ld) are linked against the
# flex runtime; install it so they load on a clean runner.
sudo apt-get update
sudo apt-get install -y libfl2
curl -fsSL -o djgpp.tar.bz2 "$DJGPP_URL"
echo "${DJGPP_SHA256} djgpp.tar.bz2" | sha256sum -c -
sudo tar xjf djgpp.tar.bz2 -C /opt # -> /opt/djgpp
- name: Configure (DJGPP toolchain)
run: |
cmake -B build-dos \
-DCMAKE_TOOLCHAIN_FILE=cmake/djgpp.cmake \
-DDJGPP_ROOT=/opt/djgpp -DCMAKE_BUILD_TYPE=Release
- name: Build
run: cmake --build build-dos -j
- name: Verify DOS executable
run: |
file build-dos/cli/uc2.exe
file build-dos/cli/uc2.exe | grep -q "DJGPP go32 DOS extender" \
|| { echo "uc2.exe is not a DJGPP DOS executable"; exit 1; }

@@ -105,7 +105,7 @@ No mainstream archiver offers post-quantum encryption.
## Phase 6: DOS / FreeDOS / Retro-Computing
-- [x] DJGPP cross-compilation toolchain: `cmake/djgpp-toolchain.cmake`
+- [x] DJGPP cross-compilation toolchain: `cmake/djgpp.cmake`
builds `uc2.exe` against the prebuilt DJGPP gcc 7.2 / 12.2 from
`andrewwutw/build-djgpp`. Output is a 32-bit DPMI DOS executable
(MZ + COFF + go32 stub). See `cmake/README-djgpp.md` for the
@@ -294,3 +294,15 @@ Bobrowski already shipped prototypes; update for UC2 v3.
Found debugging extraction on sdf.org (NetBSD 10) but reproducible
everywhere. New regression test: cli_bigfile. Follow-up filed:
bf73896 (ftell offsets >4GB truncate silently; P2).
- 2026-06-13: DOS build now has CI coverage (DJGPP v3.4 toolchain,
sha-pinned; builds uc2.exe via cmake/djgpp.cmake; git-bug 9379647).
Consolidated the two DJGPP toolchain files onto djgpp.cmake and
removed the redundant djgpp-toolchain.cmake.
- 2026-06-13: Damaged-archive decode hardening (git-bug f049d6d):
decompress_block match-length overflow guard (runtime check
replacing an NDEBUG assert), decompress_cdir end-bounding, and a
CLI handle/FILE leak fix on the cdir-error path. A prefix-sweep
fuzzer drove the fixes; a residual rare cdir-parser OOB it surfaces
is tracked for a systematic hardening + fuzzing pass (git-bug
69e8e52).

@@ -106,6 +106,18 @@ static void uc2_say(FILE *f, const char *fmt, ...)
va_end(ap);
}
/* Archive positions are 32-bit in the UC2 container; fail loudly
rather than wrap when an archive would cross 4 GiB. */
static unsigned tell32(FILE *f)
{
long pos = ftell(f);
if (pos < 0)
err(EXIT_FAILURE, "ftell");
if ((unsigned long)pos > 0xFFFFFFFFul)
errx(EXIT_FAILURE, "archive exceeds the 4 GiB UC2 format limit");
return (unsigned)pos;
}
static int my_read(void *ctx, unsigned pos, void *ptr, unsigned len)
{
if (fseek(ctx, pos, SEEK_SET) < 0)
@@ -471,6 +483,19 @@ static bool extract_cb(struct node *ne, void *ctx, enum cause cause)
switch (cause) {
case VisitFile:
case EnterDir:;
/* Each UC2 entry name is a single path component (the directory
tree is rebuilt from dirid parents). A name that is empty,
".", "..", or contains a path separator is malformed or a
path-traversal attempt -- refuse to extract it rather than
write outside the destination. */
if (l == 0
|| (l == 1 && e->name[0] == '.')
|| (l == 2 && e->name[0] == '.' && e->name[1] == '.')
|| memchr(e->name, '/', l)
|| memchr(e->name, '\\', l))
errx(EXIT_FAILURE, "unsafe archive entry name: %.*s",
(int)l, e->name);
char *p = path->ptr + l;
if (p + 1 >= endof(path->buffer))
errx(EXIT_FAILURE, "Path too long");
@@ -493,7 +518,10 @@ static bool extract_cb(struct node *ne, void *ctx, enum cause cause)
int ret = uc2_extract(path->uc2, &e->xi, e->size, write_file, f);
if (ret < 0)
uc2err(path->uc2, ret, "%s", e->name);
-fclose(f);
+/* Report a write error (e.g. a full disk) surfaced at close
+rather than silently truncating the extracted file. */
+if (fclose(f) != 0 && ret >= 0)
+err(EXIT_FAILURE, "%s", path->buffer);
if (!opt.no_file_meta)
set_attrs(path->buffer, ne);
break;
@@ -1289,7 +1317,7 @@ static int create_archive(int nargs, char **args)
/* Write master blocks (compressed with SuperMaster) */
for (int i = 0; i < nmasters; i++) {
-masters[i].offset = (unsigned)ftell(out);
+masters[i].offset = tell32(out);
struct mem_reader mr = {.data = masters[i].data, .pos = 0, .len = masters[i].size};
unsigned csize = 0;
unsigned short csum = 0;
@@ -1306,7 +1334,7 @@ static int create_archive(int nargs, char **args)
/* Phase 2: Compress each file */
for (int i = 0; i < nfiles; i++) {
-recs[i].offset = (unsigned)ftell(out);
+recs[i].offset = tell32(out);
FILE *inf = fopen(recs[i].path, "rb");
if (!inf)
@@ -1435,7 +1463,7 @@ static int create_archive(int nargs, char **args)
unsigned cdir_size = (unsigned)(p - raw_cdir);
unsigned short cdir_csum = fletcher_csum(raw_cdir, cdir_size);
-unsigned cdir_offset = (unsigned)ftell(out);
+unsigned cdir_offset = tell32(out);
unsigned char crec[10];
memset(crec, 0, 10);
fwrite(crec, 1, 10, out);
@@ -1449,7 +1477,7 @@ static int create_archive(int nargs, char **args)
if (ret < 0)
errx(EXIT_FAILURE, "cdir compression error %d", ret);
-unsigned total = (unsigned)ftell(out);
+unsigned total = tell32(out);
fseek(out, cdir_offset, SEEK_SET);
w32(crec + 0, 0); /* csize=0 matches original UC2 Pro */
@@ -1912,6 +1940,8 @@ usage:
if (ret == UC2_End)
break;
uc2err(uc2, ret, 0);
uc2_close(uc2);
fclose(f);
return EXIT_FAILURE;
}
@@ -1922,6 +1952,9 @@ usage:
ret = uc2_get_tag(uc2, &ne->entry, &tag, &data, &size);
if (ret < 0) {
uc2err(uc2, ret, 0);
free(ne);
uc2_close(uc2);
fclose(f);
return EXIT_FAILURE;
}
}
@@ -1957,8 +1990,11 @@ usage:
uc2_say(stderr, "Testing archive integrity...\n");
visit_selected(&root, pipe_cb, uc2);
if (opt.test) {
-if (verify_trailer_if_present(opt.archive))
+if (verify_trailer_if_present(opt.archive)) {
+uc2_close(uc2);
+fclose(f);
return EXIT_FAILURE;
+}
uc2_say(stderr, "Everything went OK\n");
}
} else if (!opt.list) {
@@ -1982,5 +2018,6 @@ usage:
if (!opt.list && !opt.test && !opt.pipe)
uc2_say(stderr, "Decompression complete\n");
uc2_close(uc2);
fclose(f);
return EXIT_SUCCESS;
}

@@ -31,7 +31,7 @@ the bundled `cwsdpmi.exe` extender (or any DPMI host).
```sh
unset CPATH CPLUS_INCLUDE_PATH
cmake -B build-djgpp \
--DCMAKE_TOOLCHAIN_FILE=cmake/djgpp-toolchain.cmake \
+-DCMAKE_TOOLCHAIN_FILE=cmake/djgpp.cmake \
-DDJGPP_ROOT=/opt/djgpp
cmake --build build-djgpp
```
@@ -42,7 +42,9 @@ plus `cwsdpmi.exe` (shipped with DJGPP at
## Status
-- Compiles clean against DJGPP gcc 7.2.0 and 12.2.0.
+- Compiles clean against the DJGPP gcc 12.2.0 toolchain (the
+`cmake/djgpp.cmake` include paths are pinned to that version; the
+CI job and the andrewwutw v3.4 release both use 12.2.0).
- Library (`libuc2.a`) builds without changes.
- CLI uses the DOS compat layer in `cli/src/compat/compat_dos.c` for
the BSD `err.h` and POSIX `fnmatch` shims.
@@ -69,9 +71,9 @@ not installed.
## Notes
-- The toolchain file forces `CMAKE_TRY_COMPILE_TARGET_TYPE=STATIC_LIBRARY`
-because the compiler check would otherwise try to execute a DOS .exe
-on the host kernel and fail.
+- The toolchain sets `CMAKE_SYSTEM_NAME Generic` and `-nostdinc` with
+explicit DJGPP include paths, so the compiler check links a test
+binary (rather than running one) and host headers never leak in.
- DJGPP's `unistd.h` provides POSIX-shaped APIs; most of the existing
source compiles unchanged. The library has no DOS-specific code
paths.

@@ -1,55 +0,0 @@
# CMake toolchain file for DJGPP cross-compile (DOS / FreeDOS).
#
# Usage:
# cmake -B build-djgpp -DCMAKE_TOOLCHAIN_FILE=cmake/djgpp-toolchain.cmake
# cmake --build build-djgpp
#
# Requires the DJGPP cross-toolchain on PATH or at DJGPP_ROOT. The standard
# layout from andrewwutw/build-djgpp and the djfdyuruiry/djgpp docker image
# is /usr/local/bin/djgpp/. Override with -DDJGPP_ROOT=<path> if installed
# elsewhere.
set(CMAKE_SYSTEM_NAME Generic) # bare DJGPP DOS, no OS abstractions
set(CMAKE_SYSTEM_PROCESSOR i386)
# Project source uses `if(DJGPP)` to gate the DOS compat layer (cli/src/
# compat/compat_dos.c, sys-include/dos shim). Set the variable up front
# so those guards activate.
set(DJGPP TRUE)
# Locate the toolchain prefix.
if(NOT DEFINED DJGPP_ROOT)
if(EXISTS /usr/local/bin/djgpp)
set(DJGPP_ROOT /usr/local/bin/djgpp)
elseif(EXISTS /opt/djgpp)
set(DJGPP_ROOT /opt/djgpp)
endif()
endif()
if(DEFINED DJGPP_ROOT AND EXISTS ${DJGPP_ROOT})
set(_DJGPP_BIN ${DJGPP_ROOT}/bin)
else()
set(_DJGPP_BIN "")
endif()
set(CMAKE_C_COMPILER ${_DJGPP_BIN}/i586-pc-msdosdjgpp-gcc)
set(CMAKE_CXX_COMPILER ${_DJGPP_BIN}/i586-pc-msdosdjgpp-g++)
set(CMAKE_AR ${_DJGPP_BIN}/i586-pc-msdosdjgpp-ar CACHE FILEPATH "")
set(CMAKE_RANLIB ${_DJGPP_BIN}/i586-pc-msdosdjgpp-ranlib CACHE FILEPATH "")
set(CMAKE_STRIP ${_DJGPP_BIN}/i586-pc-msdosdjgpp-strip CACHE FILEPATH "")
if(DEFINED DJGPP_ROOT)
set(CMAKE_FIND_ROOT_PATH ${DJGPP_ROOT}/i586-pc-msdosdjgpp)
endif()
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
# DJGPP can produce static binaries; tests run inside DOSBox-X.
set(CMAKE_EXE_LINKER_FLAGS_INIT "")
# CMake's compiler check tries to build a test binary. DJGPP-produced
# .exe binaries are valid COFF executables that the host kernel will
# refuse to run, so use STATIC_LIBRARY mode.
set(CMAKE_TRY_COMPILE_TARGET_TYPE STATIC_LIBRARY)

@@ -2,12 +2,16 @@
/* libarchive read handler for UC2 v3 archives.
*
-* Status: milestones 1-3.
+* Status: milestones 1-6.
* M1 -- bid() with UC2 magic check.
* M2 -- read_header iterates uc2_read_cdir, maps each cdir entry to
* libarchive's archive_entry shape (name, size, mode, mtime).
* M3 -- read_data uses uc2_extract to decompress an entry, buffers
* the result, then yields it via libarchive's pull-style API.
* M4 -- master blocks resolve inside libuc2 during uc2_extract.
* M5 -- multi-file archives with full directory paths composed from
* the cdir's directory ids (parent-before-child not assumed).
* M6 -- tagged entries (Win95 long names) resolved via uc2_get_tag.
*
* Strategy: on the first read_header call we slurp the entire archive
* into memory through __archive_read_ahead, then drive libuc2 against
@@ -51,6 +55,7 @@ struct uc2_la_state {
/* Cached cdir entries. uc2_read_cdir is single-pass; we capture
* everything on the first read_header call. */
struct uc2_entry *entries;
char **paths; /* composed full path per entry */
int n_entries;
int n_capacity;
int next_entry;
@@ -111,16 +116,21 @@ static int
extract_write(void *ctx, const void *p, unsigned len)
{
struct extract_buf *eb = (struct extract_buf *)ctx;
-if (eb->len + len > eb->cap) {
-size_t ncap = eb->cap ? eb->cap * 2 : 4096;
-while (ncap < eb->len + len) ncap *= 2;
+if (eb->len + (size_t)len < eb->len) { eb->err = 1; return -1; } /* wrap */
+size_t need = eb->len + (size_t)len;
+if (need > eb->cap) {
+size_t ncap = eb->cap ? eb->cap : 4096;
+while (ncap < need) {
+if (ncap > ((size_t)-1) / 2) { ncap = need; break; }
+ncap *= 2;
+}
uint8_t *np = realloc(eb->data, ncap);
if (!np) { eb->err = 1; return -1; }
eb->data = np;
eb->cap = ncap;
}
memcpy(eb->data + eb->len, p, len);
-eb->len += len;
+eb->len = need;
return (int)len;
}
@@ -306,6 +316,79 @@ collect_entries(struct archive_read *a, struct uc2_la_state *st)
return (ARCHIVE_OK);
}
/* Append the full path of directory `id` (with a trailing slash) to
* buf. Returns the new offset, or -1 on overflow. UC2 directory ids
* are archive-global; root is 0. The depth cap breaks cycles in
* damaged directories. */
static int
build_dir_path(struct uc2_la_state *st, unsigned id,
char *buf, size_t cap, int depth)
{
int i;
if (id == 0)
return (0);
if (depth > 64)
return (-1); /* cyclic or pathologically deep: corrupt cdir */
for (i = 0; i < st->n_entries; i++) {
struct uc2_entry *d = &st->entries[i];
if (d->is_dir && d->id == id) {
int off = build_dir_path(st, d->dirid, buf, cap,
depth + 1);
int n;
if (off < 0)
return (-1);
n = snprintf(buf + off, cap - off, "%s/", d->name);
if (n < 0 || (size_t)n >= cap - off)
return (-1);
return (off + n);
}
}
return (0); /* unknown parent: fall back to root */
}
/* Compose a full path for every entry: parent directories joined with
* '/', directories themselves carrying a trailing slash. */
static int
compose_paths(struct archive_read *a, struct uc2_la_state *st)
{
int i;
st->paths = (char **)calloc((size_t)st->n_entries,
sizeof *st->paths);
if (st->paths == NULL && st->n_entries > 0) {
archive_set_error(&a->archive, ENOMEM,
"UC2: out of memory composing paths");
return (ARCHIVE_FATAL);
}
for (i = 0; i < st->n_entries; i++) {
struct uc2_entry *e = &st->entries[i];
char buf[2048];
int off = build_dir_path(st, e->dirid, buf, sizeof buf, 0);
int n;
if (off < 0) {
archive_set_error(&a->archive, EINVAL,
"UC2: directory path too long");
return (ARCHIVE_FATAL);
}
n = snprintf(buf + off, sizeof buf - off, "%s%s",
e->name, e->is_dir ? "/" : "");
if (n < 0 || (size_t)n >= sizeof buf - off) {
archive_set_error(&a->archive, EINVAL,
"UC2: entry path too long");
return (ARCHIVE_FATAL);
}
st->paths[i] = strdup(buf);
if (st->paths[i] == NULL) {
archive_set_error(&a->archive, ENOMEM,
"UC2: out of memory composing paths");
return (ARCHIVE_FATAL);
}
}
return (ARCHIVE_OK);
}
static int
uc2_la_read_header(struct archive_read *a, struct archive_entry *entry)
{
@@ -321,6 +404,9 @@ uc2_la_read_header(struct archive_read *a, struct archive_entry *entry)
r = collect_entries(a, st);
if (r != ARCHIVE_OK) return r;
r = compose_paths(a, st);
if (r != ARCHIVE_OK) return r;
}
if (st->next_entry >= st->n_entries)
@@ -332,7 +418,7 @@ uc2_la_read_header(struct archive_read *a, struct archive_entry *entry)
st->entry_len = 0;
st->entry_yielded = 0;
-archive_entry_set_pathname(entry, e->name);
+archive_entry_set_pathname(entry, st->paths[st->next_entry - 1]);
archive_entry_set_size(entry, (la_int64_t)e->size);
archive_entry_set_mtime(entry, dos_to_unix_time(e->dos_time), 0);
@@ -409,6 +495,12 @@ uc2_la_cleanup(struct archive_read *a)
return (ARCHIVE_OK);
if (st->handle)
uc2_close(st->handle);
if (st->paths) {
int i;
for (i = 0; i < st->n_entries; i++)
free(st->paths[i]);
free(st->paths);
}
free(st->data);
free(st->entries);
free(st->entry_data);

@@ -41,6 +41,43 @@ Cross-compile from a Linux host using the DJGPP toolchain:
This produces a DOS executable suitable for DOSBox or real hardware.
libarchive Read Plugin
----------------------
The optional libarchive read handler (``contrib/libarchive/``) lets any
libarchive consumer — ``bsdtar``, file managers, language bindings —
list and extract ``.uc2`` archives. It uses libarchive's internal
read-format API, so it builds against a libarchive **source tree**
rather than an installed ``-devel`` package.
Unpack a libarchive release and build a static library (a
dependency-free configuration is enough for the plugin and its test):
.. code-block:: sh
curl -LO https://github.com/libarchive/libarchive/releases/download/v3.7.7/libarchive-3.7.7.tar.gz
tar xzf libarchive-3.7.7.tar.gz
cmake -S libarchive-3.7.7 -B larch-build -DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF -DENABLE_TEST=OFF
cmake --build larch-build --target archive_static
Then configure UC2 with the plugin enabled, pointing at the source tree
and the static library:
.. code-block:: sh
cmake -B build -DCMAKE_BUILD_TYPE=Release \
-DUC2_BUILD_LIBARCHIVE_PLUGIN=ON \
-DLIBARCHIVE_SOURCE_DIR=$PWD/libarchive-3.7.7 \
-DLIBARCHIVE_LIBRARY=$PWD/larch-build/libarchive/libarchive.a
cmake --build build
This builds ``libuc2_libarchive.a`` and the ``libarchive_roundtrip``
test, which creates archives at multiple compression levels and reads
them back through libarchive's public API, verifying every byte. The
plugin handles multi-file archives with directory paths, master-block
deduplication, and Win95 long names.
Build Options
-------------
@@ -54,6 +91,9 @@ Build Options
* - ``UC2_BUILD_TESTS``
- ``ON``
- Build test programs
* - ``UC2_BUILD_LIBARCHIVE_PLUGIN``
- ``OFF``
- Build the libarchive read handler (needs ``LIBARCHIVE_SOURCE_DIR``)
* - ``CMAKE_BUILD_TYPE``
- (none)
- ``Release``, ``Debug``, ``RelWithDebInfo``

@@ -178,7 +178,9 @@ struct range {
u8 *ptr, *end;
};
-static unsigned range_len(struct range *r) {return (unsigned)(r->end - r->ptr);}
+/* Defensive: a never-set or stale end (end < ptr) must report an empty
+range so range_get() refuses rather than handing out wild pointers. */
+static unsigned range_len(struct range *r) {return r->end > r->ptr ? (unsigned)(r->end - r->ptr) : 0;}
struct uc2_context {
char *message;
@@ -659,6 +661,30 @@ static int use_master(struct uc2_context *uc2, u8 buffer[65535], u32 id)
static int cdir_damaged(struct uc2_context *uc2);
/* Writer for the central-directory decode that also enforces a
compression-ratio ceiling. A tiny crafted cdir stream can expand via
long matches into tens of megabytes (a decompression bomb), turning a
few-hundred-byte archive into a multi-second decode. Abort once the
output far outgrows the compressed bytes consumed. */
struct cdir_writer {
struct range out;
struct archive_ctx *src; /* reader context, for bytes consumed */
unsigned base; /* src->offset at decode start */
unsigned long produced;
};
static int cdir_write(void *context, const void *ptr, unsigned size)
{
struct cdir_writer *w = context;
w->produced += size;
unsigned consumed = w->src->offset - w->base;
/* Real cdir metadata compresses well under ~20:1; 64:1 with a
64 KiB floor leaves ample headroom while stopping bombs. */
if (w->produced > 65536 + 64ul * consumed)
return UC2_Damaged;
return buf_write(&w->out, ptr, size);
}
static int decompress_cdir(struct uc2_context *uc2, u32 offset, u16 csum)
{
assert(!uc2->cdir_buf);
@@ -686,15 +712,20 @@ static int decompress_cdir(struct uc2_context *uc2, u32 offset, u16 csum)
struct archive_ctx ar = {.offset = offset, .uc2 = uc2};
struct reader rd = {.read = archive_read, .context = &ar};
struct range wrctx = {.ptr = uc2->cdir_buf, .end = uc2->cdir_buf + size};
struct writer wr = {.write = buf_write, .context = &wrctx};
struct cdir_writer wctx = {
.out = {uc2->cdir_buf, uc2->cdir_buf + size},
.src = &ar, .base = offset
};
struct writer wr = {.write = cdir_write, .context = &wctx};
u16 cs;
ret = decompressor(uc2, get16(c.method), &rd, &wr, NoMaster, 100000000, &cs);
if (ret < 0)
return ret;
goto fail;
if (cs != csum)
return cdir_damaged(uc2);
if (cs != csum) {
ret = cdir_damaged(uc2);
goto fail;
}
if ((unsigned)ret <= size)
break;
@@ -704,8 +735,20 @@ static int decompress_cdir(struct uc2_context *uc2, u32 offset, u16 csum)
uc2->cdir_buf = u_free(uc2, uc2->cdir_buf);
}
uc2->cdir_range.end = uc2->cdir_buf + size;
/* Bound the walk to the bytes actually decompressed, not the
allocation. A damaged cdir that passes the 16-bit checksum by
chance would otherwise be parsed into uninitialised heap between
the real end and the buffer end. */
uc2->cdir_range.end = uc2->cdir_buf + (unsigned)ret;
return 0;
/* On error, free cdir_buf and leave it NULL so the invariant
"cdir_buf != NULL iff cdir_range is fully valid" holds; otherwise
a later uc2_read_cdir / uc2_finish_cdir would walk a range whose
end was never set, handing out wild pointers. */
fail:
uc2->cdir_buf = u_free(uc2, uc2->cdir_buf);
return ret;
}
static int start_read(struct uc2_context *uc2);
@@ -947,7 +990,10 @@ static int cdir_damaged(struct uc2_context *uc2)
struct delta {
u8 size;
u8 index;
u8 val[8];
/* size is the delta stride; decompressor() accepts methods up to 49,
giving strides up to 10, so val[] must cover that (was [8], which
both read out of bounds and mis-decoded strides 9-10). */
u8 val[16];
};
static void delta_init(struct delta *db, u8 type)
@@ -1076,6 +1122,10 @@ static int decompressor_rans(struct uc2_context *uc2, unsigned master_id,
if (ret < 0) { u_free(uc2, buf); return ret; }
u16 tail = (u16)ret;
u16 wpos = tail; /* window position of the next unwritten output byte */
/* Bytes written into the 64KB window so far (master fill + output),
saturated at the window size. A match distance must not exceed it,
else (u16)(tail - dist) would reference unwritten window bytes. */
unsigned produced = (unsigned)ret;
struct csum cs;
csum_init(&cs);
unsigned remaining = limit;
@@ -1117,6 +1167,7 @@ static int decompressor_rans(struct uc2_context *uc2, unsigned master_id,
if (remaining) {
buf[tail++] = (u8)sym;
remaining--;
if (produced < 65536) produced++;
if ((u16)(tail - wpos) >= 0x8000) {
ret = rans_flush(wr, &cs, buf, &wpos, tail);
if (ret < 0) { bi.err = ret; break; }
@@ -1142,9 +1193,11 @@ static int decompressor_rans(struct uc2_context *uc2, unsigned master_id,
(ls == 26) ? 667+(bits_get(&bi,11) & 0x7ff) :
2715+(bits_get(&bi,15) & 0x7fff);
if (bi.err) break;
if (dist > produced) { bi.err = UC2_Damaged; break; }
for (unsigned j = 0; j < length && remaining > 0; j++) {
buf[tail] = buf[(u16)(tail - dist)];
tail++; remaining--;
if (produced < 65536) produced++;
if ((u16)(tail - wpos) >= 0x8000) {
ret = rans_flush(wr, &cs, buf, &wpos, tail);
if (ret < 0) { bi.err = ret; break; }
@@ -1177,6 +1230,7 @@ static int decompressor_rans(struct uc2_context *uc2, unsigned master_id,
struct cbuffer {
u16 head, tail;
unsigned limit;
unsigned produced; /* bytes written to the window (master + output), <= 0x10000 */
struct csum csum;
u8 data[0x10000];
};
@@ -1348,6 +1402,8 @@ static int ht_dec(u8 lengths[NumSymbols], struct dcinfo *dc, struct bits *bi, u3
if (c < 0)
return c;
int n = c + MinRepeat - 1;
if (n > (int)(syme - symp))
return UC2_Damaged; /* malformed tree overruns stream[] */
for (; n > 0; n--)
*symp++ = val;
} else {
@@ -1444,6 +1500,7 @@ static int decompressor_ultra(struct uc2_context *uc2, unsigned master, unsigned
goto ret;
ultra->cb.limit = limit;
ultra->cb.head = ultra->cb.tail = ret;
ultra->cb.produced = ret;
csum_init(&ultra->cb.csum);
u8 *dbuf = 0;
@@ -1539,9 +1596,10 @@ static int decompress_block(struct ultra *ultra)
int c = huff(ultra->bd_table, &ultra->bi);
if (c < 0)
return c;
if (!(c & 1<<16))
if (!(c & 1<<16)) {
ultra->cb.data[ultra->cb.tail++] = (u8)c;
else {
if (ultra->cb.produced < 65536) ultra->cb.produced++;
} else {
unsigned dist = c & 0xffff;
c = c >> 20 & 0xf;
if (c)
@@ -1558,10 +1616,24 @@ static int decompress_block(struct ultra *ultra)
c = c >> 20 & 0xf;
if (c)
len += bits_get(&ultra->bi, c);
assert(cbuf_space(&ultra->cb) >= len);
/* On valid data the loop guard below keeps len within the
window (<= 35482 <= cbuf_space at block entry). A
corrupt or truncated stream can underflow len (a short
bits_get returns negative); the original assert caught
that only in debug builds, so NDEBUG would let the copy
overrun cb.data. Bail cleanly instead -- the checksum
path then reports the damage. */
if (len > cbuf_space(&ultra->cb))
return UC2_Damaged;
/* dist must reference already-written history; a too-large
dist (or a negative bits_get above wrapping it huge) would
read unwritten/uninitialised window bytes into the output. */
if (dist == 0 || dist > ultra->cb.produced)
return UC2_Damaged;
do {
ultra->cb.data[ultra->cb.tail] = ultra->cb.data[(u16)(ultra->cb.tail - dist)];
ultra->cb.tail++;
if (ultra->cb.produced < 65536) ultra->cb.produced++;
} while (--len);
}

@@ -64,7 +64,7 @@ int uc2_blockstore_ingest(struct uc2_blockstore *bs,
uint32_t off = tree->chunks[i].offset;
uint32_t clen = tree->chunks[i].length;
if (off + clen > len) continue;
if (off > len || clen > len - off) continue; /* overflow-safe */
if (uc2_blockstore_has(bs, h)) {
bs->saved_bytes += clen;

@@ -48,6 +48,7 @@ int uc2_dict_verify(const struct uc2_dict *dict)
size_t uc2_dict_serialize(const struct uc2_dict *dict, uint8_t **out)
{
if (dict->size > (1u << 30)) { *out = NULL; return 0; } /* sane cap; no wrap */
size_t total = HDR_SIZE + dict->size;
uint8_t *buf = malloc(total);
if (!buf) { *out = NULL; return 0; }

@@ -193,6 +193,16 @@ int uc2_ingest_write(const char *archive_path,
/* Reserve manifest entry table; we'll backfill offsets after
* appending the chunk pool. */
long manifest_off = ftell(f);
if (manifest_off < 0) {
fclose(f);
uc2_merkle_free(&tree);
return -1;
}
if (tree.nchunks < 0 || tree.nchunks > (1 << 24)) {
fclose(f);
uc2_merkle_free(&tree);
return -1;
}
size_t manifest_size = (size_t)tree.nchunks * ENTRY_SIZE_V2;
if (tree.nchunks > 0) {
uint8_t *zero = calloc(manifest_size, 1);
@@ -361,6 +371,11 @@ static int restore_v2(FILE *f, uint32_t nchunks, FILE *out)
if (nchunks == 0)
return 0;
/* nchunks comes from the (untrusted) archive header; cap it so the
manifest size cannot wrap (notably on 32-bit) and to bound memory.
16M chunks exceeds any archive within the 4 GiB container limit. */
if (nchunks > (1u << 24))
return -1;
uint8_t *manifest = malloc((size_t)nchunks * ENTRY_SIZE_V2);
if (!manifest) return -1;
if (fread(manifest, 1, (size_t)nchunks * ENTRY_SIZE_V2, f)

@@ -46,9 +46,13 @@ void uc2_merkle_build(struct uc2_merkle *tree,
if (clen == 0) break;
if (tree->nchunks >= tree->capacity) {
tree->capacity = tree->capacity ? tree->capacity * 2 : 16;
tree->chunks = realloc(tree->chunks,
(size_t)tree->capacity * sizeof *tree->chunks);
int ncap = tree->capacity ? tree->capacity * 2 : 16;
struct uc2_chunk *nc = realloc(tree->chunks,
(size_t)ncap * sizeof *tree->chunks);
if (!nc)
break; /* out of memory: keep chunks gathered so far */
tree->chunks = nc;
tree->capacity = ncap;
}
struct uc2_chunk *c = &tree->chunks[tree->nchunks++];
c->hash = uc2_hash64(data + off, clen);

@@ -98,6 +98,12 @@ int uc2_bwt_revert(const uint8_t *data, size_t len,
{
if (len == 0) { *out = NULL; return 0; }
/* primary_index indexes data[]/T[]; reject an out-of-range value
(it can come from an untrusted stream). Also guard the T[]
allocation multiply against wrap on 32-bit. */
if (primary_index >= len || len > ((size_t)-1) / sizeof(uint32_t))
return -1;
uint8_t *result = malloc(len);
uint32_t *T = malloc(len * sizeof(uint32_t));
if (!result || !T) { free(result); free(T); return -1; }

@@ -155,6 +155,26 @@ if(Python3_Interpreter_FOUND)
)
endif()
# libarchive plugin round-trip. Needs -DUC2_BUILD_LIBARCHIVE_PLUGIN=ON,
# -DLIBARCHIVE_SOURCE_DIR=<source tree>, and -DLIBARCHIVE_LIBRARY=<built
# libarchive.a> (a deps-disabled static build is enough; see docs).
if(TARGET uc2_libarchive AND DEFINED LIBARCHIVE_LIBRARY
AND DEFINED LIBARCHIVE_SOURCE_DIR)
add_executable(test_libarchive_uc2 src/test_libarchive_uc2.c)
target_include_directories(test_libarchive_uc2 PRIVATE
"${LIBARCHIVE_SOURCE_DIR}/libarchive")
target_link_libraries(test_libarchive_uc2 PRIVATE
uc2_libarchive "${LIBARCHIVE_LIBRARY}" uc2)
target_compile_features(test_libarchive_uc2 PRIVATE c_std_99)
add_test(NAME libarchive_roundtrip
COMMAND ${CMAKE_COMMAND}
-DUC2_CLI=$<TARGET_FILE:uc2-cli>
-DLA_TEST=$<TARGET_FILE:test_libarchive_uc2>
-DTEST_DIR=${CMAKE_CURRENT_BINARY_DIR}/libarchive_test
-P ${CMAKE_CURRENT_SOURCE_DIR}/test_cli_libarchive.cmake
)
endif()
# Cross-tool round-trip: UC2 v3 <-> original uc2pro.exe via DOSBox-X
add_test(NAME roundtrip_dosbox
COMMAND bash ${CMAKE_CURRENT_SOURCE_DIR}/scripts/roundtrip_dosbox.sh

tests/fuzz/README.md Normal file
@@ -0,0 +1,42 @@
# Fuzzing the UC2 reader
`fuzz_extract.c` is a libFuzzer harness that drives the full read path
(`uc2_open` -> `uc2_read_cdir` -> `uc2_finish_cdir` -> `uc2_extract`)
over arbitrary bytes with an in-memory reader and a discard writer. It
targets the code that parses **untrusted** `.uc2` archives.
It is intentionally **not** part of the CMake build or CI: libFuzzer
needs a Clang toolchain, and a fuzz run is open-ended rather than
pass/fail. Build and run it by hand.
## Build
Compile the harness together with the library sources and the embedded
super-master, against a configured build tree (for `uc2_version.h` and
`super_data.S`):
```sh
cmake -B build-asan -DCMAKE_BUILD_TYPE=Debug # any tree works; provides the generated files
clang -fsanitize=fuzzer,address -O1 -g \
-Ilib/include -Ilib/src -Ibuild-asan/lib \
tests/fuzz/fuzz_extract.c lib/src/*.c build-asan/lib/super_data.S \
-lm -o fuzz_extract
```
## Run
```sh
mkdir -p corpus && cp tests/archives/*.uc2 corpus/
./fuzz_extract -max_len=65536 -timeout=25 corpus/
```
ASan aborts on any out-of-bounds access it detects, and libFuzzer writes a
`crash-*` (or `timeout-*`) artifact for each finding. Re-run a single
artifact with `./fuzz_extract <artifact>`.
## Status
Memory-safety: clean over sustained runs after the 2026-06-13 cdir
hardening (git-bug 69e8e52). A residual slow-input (decompression-bomb)
timeout is tracked separately; it is a bounded-CPU issue, not a
memory-safety one.
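Both guards from that hardening can be restated as small standalone checks.
The sketch below is illustrative only: `struct byte_range`,
`range_len_clamped` and `ratio_exceeded` are hypothetical names, not part of
the library API. The clamp mirrors the reader's `range_len`, and the
constants (a 64 KiB floor plus 64 output bytes per compressed byte consumed)
are copied from the cdir writer, `cdir_write`.

```c
#include <stddef.h>

/* Hypothetical stand-in for the reader's internal byte range. */
struct byte_range {
    const unsigned char *ptr, *end;
};

/* Clamped length: a range whose end does not lie past ptr (never set,
   or left stale after an error) reports 0 instead of wrapping to a
   huge unsigned value, so callers refuse to read from it. */
unsigned range_len_clamped(const struct byte_range *r)
{
    return r->end > r->ptr ? (unsigned)(r->end - r->ptr) : 0;
}

/* Ratio ceiling: returns nonzero once the decoded output exceeds a
   64 KiB floor plus 64 bytes per compressed byte consumed, i.e. when
   the stream looks like a decompression bomb. */
int ratio_exceeded(unsigned long produced, unsigned long consumed)
{
    return produced > 65536ul + 64ul * consumed;
}
```

With these constants, a 1 KiB compressed directory may expand to at most
65536 + 64 * 1024 = 131072 bytes before the decode is rejected.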

tests/fuzz/fuzz_extract.c Normal file
@@ -0,0 +1,78 @@
/* libFuzzer harness for the UC2 read path.
*
* Feeds the fuzzer-provided bytes as a .uc2 archive through the full
* open -> read_cdir -> finish_cdir -> extract flow with an in-memory
* reader and a discard writer. The decoder must never read or write
* out of bounds on any input.
*
* Build (clang):
* clang -fsanitize=fuzzer,address -O1 -g -Ilib/include -Ilib/src \
* -I<builddir>/lib tests/fuzz/fuzz_extract.c lib/src/*.c \
* <builddir>/lib/super_data.S -lm -o fuzz_extract
* Run: ./fuzz_extract -max_len=65536 corpus/
*/
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <uc2/libuc2.h>
struct mem { const uint8_t *data; unsigned avail; };
static int mem_read(void *ctx, unsigned pos, void *buf, unsigned len)
{
struct mem *m = ctx;
if (pos >= m->avail)
return 0;
unsigned n = m->avail - pos;
if (n > len)
n = len;
memcpy(buf, m->data + pos, n);
return (int)n;
}
static void *mem_alloc(void *ctx, unsigned size) { (void)ctx; return malloc(size); }
static void mem_free(void *ctx, void *ptr) { (void)ctx; free(ptr); }
static int discard(void *ctx, const void *p, unsigned len)
{ (void)ctx; (void)p; (void)len; return 0; }
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
if (size > (1u << 20)) /* bound work; the format is small anyway */
return 0;
struct uc2_io io = { .read = mem_read, .alloc = mem_alloc, .free = mem_free };
struct mem m = { .data = data, .avail = (unsigned)size };
uc2_handle h = uc2_open(&io, &m);
if (!h)
return 0;
struct uc2_entry entries[64];
int n = 0;
for (int guard = 0; guard < 100000; guard++) {
struct uc2_entry e;
int ret = uc2_read_cdir(h, &e);
if (ret == UC2_End || ret < 0)
break;
while (ret == UC2_TaggedEntry) {
char *tag; void *d; unsigned sz;
ret = uc2_get_tag(h, &e, &tag, &d, &sz);
if (ret < 0)
break;
}
if (ret < 0)
break;
if (!e.is_dir && n < (int)(sizeof entries / sizeof *entries))
entries[n++] = e;
}
char label[12];
uc2_finish_cdir(h, label);
for (int i = 0; i < n; i++)
uc2_extract(h, &entries[i].xi, entries[i].size, discard, 0);
uc2_close(h);
return 0;
}

@@ -4,11 +4,16 @@
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <sys/stat.h>
#ifdef _MSC_VER
#include <process.h>
#include <io.h>
#include <direct.h>
#define getpid _getpid
#define rmdir _rmdir
#else
#include <unistd.h>
#include <dirent.h>
#endif
#include <uc2/uc2_blockstore.h>
#include <uc2/uc2_merkle.h>
@@ -18,6 +23,18 @@ static int tests_run = 0, tests_passed = 0;
static char store_path[256];
/* Temp-file base: %TEMP% on Windows, /tmp elsewhere. */
static const char *tmpdir(void)
{
#ifdef _WIN32
const char *t = getenv("TEMP");
if (!t) t = getenv("TMP");
return t ? t : ".";
#else
return "/tmp";
#endif
}
static void fill_random(uint8_t *buf, size_t len, uint32_t seed)
{
for (size_t i = 0; i < len; i++) {
@@ -26,12 +43,46 @@ static void fill_random(uint8_t *buf, size_t len, uint32_t seed)
}
}
/* Recursive rm -rf (simple, for test cleanup) */
static void rmrf(const char *path)
/* Portable recursive removal for the store's two-level layout. */
static void rmtree(const char *path)
{
char cmd[512];
snprintf(cmd, sizeof cmd, "rm -rf '%s'", path);
system(cmd);
#ifdef _MSC_VER
char pattern[512];
struct _finddata_t fd;
snprintf(pattern, sizeof pattern, "%s/*", path);
intptr_t h = _findfirst(pattern, &fd);
if (h != -1) {
do {
if (strcmp(fd.name, ".") == 0 || strcmp(fd.name, "..") == 0)
continue;
char sub[512];
snprintf(sub, sizeof sub, "%s/%s", path, fd.name);
if (fd.attrib & _A_SUBDIR)
rmtree(sub);
else
remove(sub);
} while (_findnext(h, &fd) == 0);
_findclose(h);
}
#else
DIR *d = opendir(path);
if (d) {
struct dirent *e;
while ((e = readdir(d))) {
if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
continue;
char sub[512];
snprintf(sub, sizeof sub, "%s/%s", path, e->d_name);
struct stat st;
if (stat(sub, &st) == 0 && S_ISDIR(st.st_mode))
rmtree(sub);
else
remove(sub);
}
closedir(d);
}
#endif
rmdir(path);
}
static void test_open_close(void)
@@ -180,24 +231,24 @@ static void test_has(void)
int main(void)
{
snprintf(store_path, sizeof store_path, "/tmp/uc2_blockstore_test_%d",
(int)getpid());
snprintf(store_path, sizeof store_path, "%s/uc2_blockstore_test_%d",
tmpdir(), (int)getpid());
printf("Block store tests:\n");
rmrf(store_path); /* clean start */
rmtree(store_path); /* clean start */
TEST(test_open_close);
rmrf(store_path);
rmtree(store_path);
TEST(test_ingest_single);
rmrf(store_path);
rmtree(store_path);
TEST(test_dedup_identical);
rmrf(store_path);
rmtree(store_path);
TEST(test_read_back);
rmrf(store_path);
rmtree(store_path);
TEST(test_cross_archive_dedup);
rmrf(store_path);
rmtree(store_path);
TEST(test_has);
rmrf(store_path);
rmtree(store_path);
printf("%d/%d tests passed\n", tests_passed, tests_run);
return tests_passed == tests_run ? 0 : 1;

@@ -18,11 +18,41 @@ static int tests_run = 0, tests_passed = 0;
static char tmp_archive[256];
static void rmrf(const char *path)
/* Temp-file base: %TEMP% on Windows, /tmp elsewhere. */
static const char *tmpdir(void)
{
char cmd[768];
snprintf(cmd, sizeof cmd, "rm -rf '%s' '%s.blocks'", path, path);
system(cmd);
#ifdef _WIN32
const char *t = getenv("TEMP");
if (!t) t = getenv("TMP");
return t ? t : ".";
#else
return "/tmp";
#endif
}
/* Remove the archive and its derived files. v2 archives never create
* the .blocks sidecar, so plain remove() covers everything. */
static void cleanup(const char *path)
{
char buf[320];
remove(path);
snprintf(buf, sizeof buf, "%s.out", path);
remove(buf);
snprintf(buf, sizeof buf, "%s.blocks", path);
remove(buf);
}
/* fopen that fails the test loudly: assert() is compiled out in
* Release builds, and continuing with a NULL stream trips the MSVC
* CRT invalid-parameter fail-fast instead of a test failure. */
static FILE *xfopen(const char *path, const char *mode)
{
FILE *f = fopen(path, mode);
if (!f) {
fprintf(stderr, "FATAL: cannot open %s (mode %s)\n", path, mode);
exit(1);
}
return f;
}
static void fill_random(uint8_t *buf, size_t len, uint32_t seed)
@@ -37,7 +67,10 @@ static void fill_random(uint8_t *buf, size_t len, uint32_t seed)
static uint8_t *slurp(const char *path, size_t *out_len)
{
FILE *f = fopen(path, "rb");
if (!f) return NULL;
if (!f) {
fprintf(stderr, "FATAL: cannot slurp %s\n", path);
exit(1);
}
fseek(f, 0, SEEK_END);
long n = ftell(f);
fseek(f, 0, SEEK_SET);
@@ -50,7 +83,7 @@ static uint8_t *slurp(const char *path, size_t *out_len)
static void test_roundtrip_small(void)
{
rmrf(tmp_archive);
cleanup(tmp_archive);
const char *msg = "hello world";
struct uc2_ingest_stats st;
int rc = uc2_ingest_write(tmp_archive,
@@ -64,8 +97,7 @@ static void test_roundtrip_small(void)
char restored[320];
snprintf(restored, sizeof restored, "%s.out", tmp_archive);
FILE *out = fopen(restored, "wb");
assert(out);
FILE *out = xfopen(restored, "wb");
rc = uc2_ingest_restore(tmp_archive, out);
fclose(out);
assert(rc == 0);
@@ -75,13 +107,13 @@ static void test_roundtrip_small(void)
assert(got_len == strlen(msg));
assert(memcmp(got, msg, got_len) == 0);
free(got);
unlink(restored);
rmrf(tmp_archive);
remove(restored);
cleanup(tmp_archive);
}
static void test_roundtrip_multichunk(void)
{
rmrf(tmp_archive);
cleanup(tmp_archive);
const size_t N = 200000;
uint8_t *data = malloc(N);
fill_random(data, N, 0x12345678);
@@ -95,8 +127,7 @@ static void test_roundtrip_multichunk(void)
char restored[320];
snprintf(restored, sizeof restored, "%s.out", tmp_archive);
FILE *out = fopen(restored, "wb");
assert(out);
FILE *out = xfopen(restored, "wb");
rc = uc2_ingest_restore(tmp_archive, out);
fclose(out);
assert(rc == 0);
@@ -108,13 +139,13 @@ static void test_roundtrip_multichunk(void)
free(got);
free(data);
unlink(restored);
rmrf(tmp_archive);
remove(restored);
cleanup(tmp_archive);
}
static void test_intra_call_dedup(void)
{
rmrf(tmp_archive);
cleanup(tmp_archive);
/* Concatenate the same random buffer twice -- CDC produces the
* same chunk hashes for both halves, so half the chunks should
* dedup within a single ingest call. */
@@ -137,8 +168,7 @@ static void test_intra_call_dedup(void)
* structurally transparent. */
char restored[320];
snprintf(restored, sizeof restored, "%s.out", tmp_archive);
FILE *out = fopen(restored, "wb");
assert(out);
FILE *out = xfopen(restored, "wb");
rc = uc2_ingest_restore(tmp_archive, out);
fclose(out);
assert(rc == 0);
@@ -150,13 +180,13 @@ static void test_intra_call_dedup(void)
free(got);
free(data);
unlink(restored);
rmrf(tmp_archive);
remove(restored);
cleanup(tmp_archive);
}
static void test_v2_self_contained(void)
{
rmrf(tmp_archive);
cleanup(tmp_archive);
/* A v2 archive must restore correctly even if the legacy sidecar
* blockstore directory is absent. The chunk pool lives inside
* the archive file itself. */
@@ -179,8 +209,7 @@ static void test_v2_self_contained(void)
char restored[320];
snprintf(restored, sizeof restored, "%s.out", tmp_archive);
FILE *out = fopen(restored, "wb");
assert(out);
FILE *out = xfopen(restored, "wb");
rc = uc2_ingest_restore(tmp_archive, out);
fclose(out);
assert(rc == 0);
@@ -192,13 +221,13 @@ static void test_v2_self_contained(void)
free(got);
free(data);
unlink(restored);
rmrf(tmp_archive);
remove(restored);
cleanup(tmp_archive);
}
static void test_empty_stream(void)
{
rmrf(tmp_archive);
cleanup(tmp_archive);
struct uc2_ingest_stats st;
int rc = uc2_ingest_write(tmp_archive, NULL, 0, 0, &st);
assert(rc == 0);
@@ -208,8 +237,7 @@ static void test_empty_stream(void)
char restored[320];
snprintf(restored, sizeof restored, "%s.out", tmp_archive);
FILE *out = fopen(restored, "wb");
assert(out);
FILE *out = xfopen(restored, "wb");
rc = uc2_ingest_restore(tmp_archive, out);
fclose(out);
assert(rc == 0);
@@ -218,35 +246,35 @@ static void test_empty_stream(void)
uint8_t *got = slurp(restored, &got_len);
assert(got_len == 0);
free(got);
unlink(restored);
rmrf(tmp_archive);
remove(restored);
cleanup(tmp_archive);
}
static void test_bad_magic_rejected(void)
{
rmrf(tmp_archive);
FILE *f = fopen(tmp_archive, "wb");
assert(f);
cleanup(tmp_archive);
FILE *f = xfopen(tmp_archive, "wb");
const char garbage[16] = "not-a-uc2-ingest";
fwrite(garbage, 1, sizeof garbage, f);
fclose(f);
FILE *out = fopen("/dev/null", "wb");
#ifdef _MSC_VER
if (!out) out = fopen("NUL", "wb");
#endif
assert(out);
if (!out) {
fprintf(stderr, "FATAL: no null device\n");
exit(1);
}
int rc = uc2_ingest_restore(tmp_archive, out);
fclose(out);
assert(rc != 0);
(void)rc;
rmrf(tmp_archive);
cleanup(tmp_archive);
}
int main(void)
{
snprintf(tmp_archive, sizeof tmp_archive,
"/tmp/uc2_ingest_test_%d.uc2", (int)getpid());
"%s/uc2_ingest_test_%d.uc2", tmpdir(), (int)getpid());
printf("Running uc2_ingest tests...\n");
TEST(test_roundtrip_small);

@@ -0,0 +1,134 @@
/* Round-trip verification of the libarchive UC2 read plugin.
*
* Usage: test_libarchive_uc2 <archive.uc2> <originals-dir>
*
* Opens the archive through libarchive's public API with the UC2
* format registered, walks every entry, extracts the data, and
* compares it byte-for-byte against <originals-dir>/<entry-name>.
* Exit 0 only if every file entry matches.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <archive.h>
#include <archive_entry.h>
extern int archive_read_support_format_uc2(struct archive *);
static unsigned char *slurp(const char *path, size_t *out_len)
{
FILE *f = fopen(path, "rb");
if (!f) {
fprintf(stderr, "FAIL: cannot open original %s\n", path);
exit(1);
}
fseek(f, 0, SEEK_END);
long n = ftell(f);
fseek(f, 0, SEEK_SET);
if (n < 0) {
fprintf(stderr, "FAIL: ftell %s\n", path);
exit(1);
}
unsigned char *buf = malloc(n > 0 ? (size_t)n : 1);
if (!buf) {
fprintf(stderr, "FAIL: malloc\n");
exit(1);
}
*out_len = fread(buf, 1, (size_t)n, f);
fclose(f);
return buf;
}
int main(int argc, char **argv)
{
if (argc != 3) {
fprintf(stderr, "usage: %s <archive.uc2> <originals-dir>\n",
argv[0]);
return 2;
}
struct archive *a = archive_read_new();
if (!a) return 2;
if (archive_read_support_format_uc2(a) != ARCHIVE_OK) {
fprintf(stderr, "FAIL: cannot register UC2 format: %s\n",
archive_error_string(a));
return 1;
}
if (archive_read_open_filename(a, argv[1], 65536) != ARCHIVE_OK) {
fprintf(stderr, "FAIL: open %s: %s\n", argv[1],
archive_error_string(a));
return 1;
}
int nfiles = 0, ndirs = 0, bad = 0;
struct archive_entry *e;
int r;
while ((r = archive_read_next_header(a, &e)) == ARCHIVE_OK) {
const char *name = archive_entry_pathname(e);
if (archive_entry_filetype(e) == AE_IFDIR) {
ndirs++;
continue;
}
la_int64_t want = archive_entry_size(e);
size_t cap = want > 0 ? (size_t)want : 1;
unsigned char *got = malloc(cap);
if (!got) {
fprintf(stderr, "FAIL: malloc\n");
return 1;
}
size_t got_len = 0;
for (;;) {
la_ssize_t n = archive_read_data(a, got + got_len,
cap - got_len);
if (n < 0) {
fprintf(stderr, "FAIL: read_data %s: %s\n",
name, archive_error_string(a));
return 1;
}
if (n == 0)
break;
got_len += (size_t)n;
if (got_len == cap)
break;
}
if ((la_int64_t)got_len != want) {
fprintf(stderr, "BAD: %s: size %zu, header said %lld\n",
name, got_len, (long long)want);
bad++;
free(got);
nfiles++;
continue;
}
char opath[4096];
snprintf(opath, sizeof opath, "%s/%s", argv[2], name);
size_t ref_len;
unsigned char *ref = slurp(opath, &ref_len);
if (ref_len != got_len || memcmp(ref, got, got_len) != 0) {
fprintf(stderr, "BAD: %s: content mismatch "
"(%zu vs %zu bytes)\n", name, got_len, ref_len);
bad++;
}
free(ref);
free(got);
nfiles++;
}
if (r != ARCHIVE_EOF) {
fprintf(stderr, "FAIL: next_header: %s\n",
archive_error_string(a));
return 1;
}
archive_read_free(a);
printf("libarchive round-trip: %d files (%d dirs), %d bad\n",
nfiles, ndirs, bad);
if (nfiles == 0) {
fprintf(stderr, "FAIL: no file entries found\n");
return 1;
}
return bad ? 1 : 0;
}

@@ -0,0 +1,39 @@
# Round-trip test for the libarchive UC2 read plugin: the uc2 CLI
# creates archives (Huffman and rANS), then test_libarchive_uc2 reads
# them back through libarchive's public API and verifies every byte.
file(REMOVE_RECURSE "${TEST_DIR}")
file(MAKE_DIRECTORY "${TEST_DIR}/input/subdir")
file(WRITE "${TEST_DIR}/input/hello.txt" "Hello from libarchive!\n")
string(REPEAT "The quick brown fox jumps over the lazy dog.\n" 200 REPEATED)
file(WRITE "${TEST_DIR}/input/repeated.txt" "${REPEATED}")
string(RANDOM LENGTH 8192 RANDOM_SEED 99 BLOB)
file(WRITE "${TEST_DIR}/input/blob.dat" "${BLOB}")
file(WRITE "${TEST_DIR}/input/subdir/nested_long_file_name.txt"
"nested content with a long name\n")
file(WRITE "${TEST_DIR}/input/empty.dat" "")
foreach(LEVEL 4 6)
set(ARCHIVE "${TEST_DIR}/la${LEVEL}.uc2")
execute_process(
COMMAND "${UC2_CLI}" -q -w -L ${LEVEL} "${ARCHIVE}"
hello.txt repeated.txt blob.dat empty.dat subdir
WORKING_DIRECTORY "${TEST_DIR}/input"
RESULT_VARIABLE RC
)
if(NOT RC EQUAL 0)
message(FATAL_ERROR "uc2 -w -L ${LEVEL} failed: ${RC}")
endif()
execute_process(
COMMAND "${LA_TEST}" "${ARCHIVE}" "${TEST_DIR}/input"
RESULT_VARIABLE RC
OUTPUT_VARIABLE OUT
ERROR_VARIABLE OUT
)
message(STATUS "L${LEVEL}: ${OUT}")
if(NOT RC EQUAL 0)
message(FATAL_ERROR "libarchive round-trip failed at -L ${LEVEL}")
endif()
endforeach()