Commit Graph

10 Commits

Author SHA1 Message Date
Ludovic Courtès
5a7cb59648
deduplication: Detect holes and create sparse files.
This reduces disk usage of sparse files that are substituted such as
Guile object files (ELF files).  As of Guile 3.0.9, .go files are sparse
due to ELF sections being aligned on 64 KiB boundaries.

This reduces disk usage reported by “du -sh” by 9% for the ‘guix’
package, by 23% for ‘guile’, and by 35% for ‘guile-git’.

* guix/store/deduplication.scm (hole-size, find-holes): New procedures.
(tee)[seekable?]: New variable.
[read!]: Add case when SEEKABLE? is true.
* tests/store-deduplication.scm (cartesian-product): New procedure.
("copy-file/deduplicate, sparse files (holes: ~a/~a/~a)"): New test set.

Change-Id: Iad2ab7830dcb1220e2026f4a127a6c718afa8964
2024-05-25 16:44:42 +02:00
Ludovic Courtès
b129026e2e
deduplicate: Use 'sendfile' for small file copies.
* guix/store/deduplication.scm (dump-file/deduplicate): Use 'sendfile'
instead of 'dump-port'.
* tests/store-deduplication.scm ("copy-file/deduplicate, below %deduplication-minimum-size"):
New test.
2022-12-10 10:56:48 +01:00
Ludovic Courtès
472a0e82a5
daemon: Do not deduplicate files smaller than 8 KiB.
Files smaller than 8 KiB typically represent ~70% of the entries in
/gnu/store/.links but only contribute to ~4% of the space savings
afforded by deduplication.

Not considering these files for deduplication speeds up file insertion
in the store and, more importantly, leaves 'removeUnusedLinks' with
fewer entries to traverse, thereby speeding it up proportionally.

Partly fixes <https://issues.guix.gnu.org/24937>.

* config-daemon.ac: Remove symlink hard link check and CAN_LINK_SYMLINK
definition.
* guix/store/deduplication.scm (%deduplication-minimum-size): New
variable.
(deduplicate)[loop]: Do not recurse when FILE's size is below
%DEDUPLICATION-MINIMUM-SIZE.
(dump-port): New procedure.
(dump-file/deduplicate)[hash]: Turn into...
[dump-and-compute-hash]: ... this thunk.
Call 'deduplicate' only when SIZE is greater than
%DEDUPLICATION-MINIMUM-SIZE; otherwise call 'dump-port'.
* nix/libstore/gc.cc (LocalStore::removeUnusedLinks): Drop files where
st.st_size < deduplicationMinSize.
* nix/libstore/local-store.hh (deduplicationMinSize): New declaration.
* nix/libstore/optimise-store.cc (deduplicationMinSize): New variable.
(LocalStore::optimisePath_): Return when PATH is a symlink or smaller
than 'deduplicationMinSize'.
* tests/derivations.scm ("identical files are deduplicated"): Produce
files bigger than %DEDUPLICATION-MINIMUM-SIZE.
* tests/nar.scm ("restore-file-set with directories (signed, valid)"):
Likewise.
* tests/store-deduplication.scm ("deduplicate, below %deduplication-minimum-size"):
New test.
("deduplicate", "deduplicate, ENOSPC"): Produce files bigger than
%DEDUPLICATION-MINIMUM-SIZE.
* tests/store.scm ("substitute, deduplication"): Likewise.
2021-11-16 14:34:28 +01:00
Ludovic Courtès
7530e491b5
deduplicate: Create the '.links' directory lazily.
This avoids repeated (mkdir-p "/gnu/store/.links") calls when
deduplicating lots of files.

* guix/store/deduplication.scm (deduplicate): Remove initial call to
'mkdir-p'.  Add ENOENT case in 'link' exception handler.  Reindent.
* tests/store-deduplication.scm ("deduplicate, ENOSPC"): Check
for (<= links 4) to account for the initial 'link' call.
2020-12-15 17:32:12 +01:00
Ludovic Courtès
6a060ff27f
store-copy: 'populate-store' can optionally deduplicate files.
Until now deduplication was performed as an additional pass after
copying files, which involve re-traversing all the files that had just
been copied.

* guix/store/deduplication.scm (copy-file/deduplicate): New procedure.
* tests/store-deduplication.scm ("copy-file/deduplicate"): New test.
* guix/build/store-copy.scm (populate-store): Add #:deduplicate?
parameter and honor it.
* tests/gexp.scm ("gexp->derivation, store copy"): Pass #:deduplicate? #f
to 'populate-store'.
* gnu/build/image.scm (initialize-root-partition): Pass #:deduplicate?
to 'populate-store'.  Pass #:deduplicate? #f to 'register-closure'.
* gnu/build/vm.scm (root-partition-initializer): Likewise.
* gnu/build/install.scm (populate-single-profile-directory): Pass
 #:deduplicate? #f to 'populate-store'.
* gnu/build/linux-initrd.scm (build-initrd): Likewise.
* guix/scripts/pack.scm (self-contained-tarball)[import-module?]: New
procedure.
[build]: Pass it as an argument to 'source-module-closure'.
* guix/scripts/pack.scm (squashfs-image)[build]: Wrap in
'with-extensions'.
* gnu/system/linux-initrd.scm (expression->initrd)[import-module?]: New
procedure.
[builder]: Pass it to 'source-module-closure'.
* gnu/system/install.scm (cow-store-service-type)[import-module?]: New
procedure.  Pass it to 'source-module-closure'.
2020-12-15 17:32:10 +01:00
Ludovic Courtès
adb158b739
deduplication: Gracefully handle ENOSPC raised by 'link' calls.
Reported by Andreas Enge <andreas@enge.fr>
in <https://bugs.gnu.org/33676>.

* guix/store/deduplication.scm (replace-with-link): Catch ENOSPC around
'get-temp-link'.  Do nothing when 'get-temp-link' throws ENOSPC.  Move
code to restore PARENT's permissions outside of 'catch'.
* tests/store-deduplication.scm ("deduplicate, ENOSPC"): New test.
2018-12-14 12:07:24 +01:00
Ludovic Courtès
ca71942445
Switch to Guile-Gcrypt.
This removes (guix hash) and (guix pk-crypto), which now live as part of
Guile-Gcrypt (version 0.1.0.)

* guix/gcrypt.scm, guix/hash.scm, guix/pk-crypto.scm,
tests/hash.scm, tests/pk-crypto.scm: Remove.
* configure.ac: Test for Guile-Gcrypt.  Remove LIBGCRYPT and
LIBGCRYPT_LIBDIR assignments.
* m4/guix.m4 (GUIX_ASSERT_LIBGCRYPT_USABLE): Remove.
* README: Add Guile-Gcrypt to the dependencies; move libgcrypt as
"required unless --disable-daemon".
* doc/guix.texi (Requirements): Likewise.
* gnu/packages/bash.scm, guix/derivations.scm, guix/docker.scm,
guix/git.scm, guix/http-client.scm, guix/import/cpan.scm,
guix/import/cran.scm, guix/import/crate.scm, guix/import/elpa.scm,
guix/import/gnu.scm, guix/import/hackage.scm,
guix/import/texlive.scm, guix/import/utils.scm, guix/nar.scm,
guix/pki.scm, guix/scripts/archive.scm,
guix/scripts/authenticate.scm, guix/scripts/download.scm,
guix/scripts/hash.scm, guix/scripts/pack.scm,
guix/scripts/publish.scm, guix/scripts/refresh.scm,
guix/scripts/substitute.scm, guix/store.scm,
guix/store/deduplication.scm, guix/tests.scm, tests/base32.scm,
tests/builders.scm, tests/challenge.scm, tests/cpan.scm,
tests/crate.scm, tests/derivations.scm, tests/gem.scm,
tests/nar.scm, tests/opam.scm, tests/pki.scm,
tests/publish.scm, tests/pypi.scm, tests/store-deduplication.scm,
tests/store.scm, tests/substitute.scm: Adjust imports.
* gnu/system/vm.scm: Likewise.
(guile-sqlite3&co): Rename to...
(gcrypt-sqlite3&co): ... this.  Add GUILE-GCRYPT.
(expression->derivation-in-linux-vm)[config]: Remove.
(iso9660-image)[config]: Remove.
(qemu-image)[config]: Remove.
(system-docker-image)[config]: Remove.
* guix/scripts/pack.scm: Adjust imports.
(guile-sqlite3&co): Rename to...
(gcrypt-sqlite3&co): ... this.  Add GUILE-GCRYPT.
(self-contained-tarball)[build]: Call 'make-config.scm' without
 #:libgcrypt argument.
(squashfs-image)[libgcrypt]: Remove.
[build]: Call 'make-config.scm' without #:libgcrypt.
(docker-image)[config, json]: Remove.
[build]: Add GUILE-GCRYPT to the extensions  Remove (guix config) from
the imported modules.
* guix/self.scm (specification->package): Remove "libgcrypt", add
"guile-gcrypt".
(compiled-guix): Remove #:libgcrypt.
[guile-gcrypt]: New variable.
[dependencies]: Add it.
[*core-modules*]: Remove #:libgcrypt from 'make-config.scm' call.
Add #:extensions.
[*config*]: Remove #:libgcrypt from 'make-config.scm' call.
(%dependency-variables): Remove %libgcrypt.
(make-config.scm): Remove #:libgcrypt.
* build-aux/build-self.scm (guile-gcrypt): New variable.
(make-config.scm): Remove #:libgcrypt.
(build-program)[fake-gcrypt-hash]: New variable.
Add (gcrypt hash) to the imported modules.  Adjust load path
assignments.
* gnu/packages/package-management.scm (guix)[propagated-inputs]: Add
GUILE-GCRYPT.
[arguments]: In 'wrap-program' phase, add GUILE-GCRYPT to the search
path.
2018-09-04 17:25:11 +02:00
Ludovic Courtès
3dbf331942
deduplication: Place link files under /gnu/store/.links.
Previously they'd always be placed next to TO-REPLACE, which would lead
to EPERM in some cases.

* guix/store/deduplication.scm (replace-with-link): Add #:swap-directory
parameter and honor it.  Add call to 'make-file-writable'.  Catch
'system-error' around 'rename-file'.
(deduplicate): Pass #:swap-directory and remove uses of
'false-if-system-error'.
* tests/store-deduplication.scm ("deduplicate"): Add 'chmod' call.
2018-07-03 00:39:11 +02:00
Ludovic Courtès
0d0438ed8c
deduplicate: Fix a couple of thinkos.
* guix/store/deduplication.scm (get-temp-link): Turn 'args' in the 'catch'
handler into a rest argument.
(deduplicate): Use 'lstat' instead of 'file-is-directory?' to properly
handle symlinks.  When iterating over the result of 'scandir', exclude
the ".links" sub-directory.
* tests/store-deduplication.scm ("deduplicate"): Create sub-directories
and call 'deduplicate' directly on STORE.
2018-06-14 11:16:59 +02:00
Caleb Ristvedt
bf5bf5778c
Add (guix store deduplication).
* guix/store/database.scm (register-path): Add #:deduplicate? and call
'deduplicate' when it's true.
(counting-wrapper-port, nar-sha256): Move to...
* guix/store/deduplication.scm: ... here.  New file.
* tests/store-deduplication.scm: New file.
* Makefile.am (STORE_MODULES): Add deduplication.scm.
(SCM_TESTS) [HAVE_GUILE_SQLITE3]: Add store-deduplication.scm.

Co-authored-by: Ludovic Courtès <ludo@gnu.org>
2018-06-01 15:35:54 +02:00