Files smaller than 8 KiB typically represent ~70% of the entries in
/gnu/store/.links but only contribute to ~4% of the space savings
afforded by deduplication.
Not considering these files for deduplication speeds up file insertion
in the store and, more importantly, leaves 'removeUnusedLinks' with
fewer entries to traverse, thereby speeding it up proportionally.
Partly fixes <https://issues.guix.gnu.org/24937>.
* config-daemon.ac: Remove symlink hard link check and CAN_LINK_SYMLINK
definition.
* guix/store/deduplication.scm (%deduplication-minimum-size): New
variable.
(deduplicate)[loop]: Do not recurse when FILE's size is below
%DEDUPLICATION-MINIMUM-SIZE.
(dump-port): New procedure.
(dump-file/deduplicate)[hash]: Turn into...
[dump-and-compute-hash]: ... this thunk.
Call 'deduplicate' only when SIZE is greater than
%DEDUPLICATION-MINIMUM-SIZE; otherwise call 'dump-port'.
* nix/libstore/gc.cc (LocalStore::removeUnusedLinks): Drop files where
st.st_size < deduplicationMinSize.
* nix/libstore/local-store.hh (deduplicationMinSize): New declaration.
* nix/libstore/optimise-store.cc (deduplicationMinSize): New variable.
(LocalStore::optimisePath_): Return when PATH is a symlink or smaller
than 'deduplicationMinSize'.
* tests/derivations.scm ("identical files are deduplicated"): Produce
files bigger than %DEDUPLICATION-MINIMUM-SIZE.
* tests/nar.scm ("restore-file-set with directories (signed, valid)"):
Likewise.
* tests/store-deduplication.scm ("deduplicate, below %deduplication-minimum-size"):
New test.
("deduplicate", "deduplicate, ENOSPC"): Produce files bigger than
%DEDUPLICATION-MINIMUM-SIZE.
* tests/store.scm ("substitute, deduplication"): Likewise.
* guix/store/database.scm (assert-integer): New procedure.
(update-or-insert): Use it to validate NAR-SIZE and TIME.
* tests/store-database.scm ("sqlite-register with incorrect size"): New
test.
This avoids repeated (mkdir-p "/gnu/store/.links") calls when
deduplicating lots of files.
* guix/store/deduplication.scm (deduplicate): Remove initial call to
'mkdir-p'. Add ENOENT case in 'link' exception handler. Reindent.
* tests/store-deduplication.scm ("deduplicate, ENOSPC"): Check
for (<= links 4) to account for the initial 'link' call.
* guix/store/database.scm (timestamp): New procedure.
(sqlite-register): Use it as the default for #:time.
(register-items): Likewise for #:registeration-time.
Fixes <https://bugs.gnu.org/44760>.
Previously, the 'register-path' call would re-traverse ITEM to compute
its nar hash, even though that hash is already known in the initial
store. This patch also avoids repeated opening/closing of the
database.
* guix/store/database.scm (call-with-database): Export.
* guix/scripts/system.scm (copy-item): Add 'db' parameter. Call
'sqlite-register' instead of 'register-path'.
(copy-closure): Remove redundant call to 'references*'. Call
'call-with-database' and pass the database to 'copy-item'.
It is now up to the caller to deduplicate store contents.
* guix/store/database.scm (register-items): Remove #:deduplicate?
parameter and call to 'deduplicate'.
(register-path): Call 'deduplicate' when #:deduplicate? is true.
* gnu/build/image.scm (register-closure): Adjust call accordingly.
* gnu/build/vm.scm (register-closure): Likewise.
* guix/nar.scm (finalize-store-file): Likewise.
* guix/scripts/pack.scm (store-database): Likewise.
Until now deduplication was performed as an additional pass after
copying files, which involve re-traversing all the files that had just
been copied.
* guix/store/deduplication.scm (copy-file/deduplicate): New procedure.
* tests/store-deduplication.scm ("copy-file/deduplicate"): New test.
* guix/build/store-copy.scm (populate-store): Add #:deduplicate?
parameter and honor it.
* tests/gexp.scm ("gexp->derivation, store copy"): Pass #:deduplicate? #f
to 'populate-store'.
* gnu/build/image.scm (initialize-root-partition): Pass #:deduplicate?
to 'populate-store'. Pass #:deduplicate? #f to 'register-closure'.
* gnu/build/vm.scm (root-partition-initializer): Likewise.
* gnu/build/install.scm (populate-single-profile-directory): Pass
#:deduplicate? #f to 'populate-store'.
* gnu/build/linux-initrd.scm (build-initrd): Likewise.
* guix/scripts/pack.scm (self-contained-tarball)[import-module?]: New
procedure.
[build]: Pass it as an argument to 'source-module-closure'.
* guix/scripts/pack.scm (squashfs-image)[build]: Wrap in
'with-extensions'.
* gnu/system/linux-initrd.scm (expression->initrd)[import-module?]: New
procedure.
[builder]: Pass it to 'source-module-closure'.
* gnu/system/install.scm (cow-store-service-type)[import-module?]: New
procedure. Pass it to 'source-module-closure'.
The assumption now is that the caller took care of resetting timestamps
and permissions.
* guix/store/database.scm (register-items): Remove #:reset-timestamps?
parameter and the call to 'reset-timestamps'.
(register-path): Adjust accordingly and add call to 'reset-timestamps'.
* gnu/build/image.scm (register-closure): Remove #:reset-timestamps?
parameter to 'register-items'.
* gnu/build/vm.scm (register-closure): Likewise.
* guix/nar.scm (finalize-store-file): Adjust accordingly.
* guix/scripts/pack.scm (store-database)[build]: Likewise.
This avoids having to traverse and re-read the files that we have just
restored, thereby reducing I/O.
* guix/serialization.scm (dump-file): New procedure.
(restore-file): Add #:dump-file parameter and honor it.
* guix/store/deduplication.scm (tee, dump-file/deduplicate): New
procedures.
* guix/nar.scm (restore-one-item): Pass #:dump-file to 'restore-file'.
(finalize-store-file): Pass #:deduplicate? #f to 'register-items'.
* tests/nar.scm <top level>: Call 'setenv' to set "NIX_STORE".
It was made transactional in a4678c6ba1, with
the reasoning to prevent broken intermediate states from being visible. I
think this means something like an entry being in ValidPaths, but the Refs not
being inserted.
Using a transaction for this makes sense, but I think using one single
transaction for the whole register-items call is unnecessary to avoid broken
states from being visible, and could block other writes to the store database
while register-items is running. Because the deduplication and resetting
timestamps happens within the transaction as well, even though these things
don't involve the database, writes to the database will still be blocked while
this is happening.
To reduce the potential for register-items to block other writers to the
database for extended periods, this commit moves the transaction to just wrap
the call to sqlite-register. This is the one place where writes occur, so that
should prevent the broken intermediate states issue above. The one difference
this will make is some of the registered items will be visible to other
connections while others may be still being added. I think this is OK, as it's
equivalent to just registering different items.
* guix/store/database.scm (register-items): Reduce transaction scope.
Signed-off-by: Ludovic Courtès <ludo@gnu.org>
It's necessary that store items be locked and protected from garbage
collection while they are being registered. This documents that.
* guix/store/database.scm (register-path, register-items): document GC
protection and locking requirements.
Signed-off-by: Ludovic Courtès <ludo@gnu.org>
This causes with-writable-file to take into consideration the actual store
being used, as passed to 'deduplicate', rather than
whatever (%store-directory) may return.
* guix/store/deduplication.scm (replace-with-link): new keyword argument
'store'. Pass to with-writable-file.
(with-writable-file, call-with-writable-file): new store argument.
(deduplicate): pass store to replace-with-link.
Signed-off-by: Ludovic Courtès <ludo@gnu.org>
scandir* uses readdir, which means that the file type property can be 'unknown
if the underlying file-system does not support d_type. Make sure to fallback
to lstat in that case.
Fixes: https://issues.guix.gnu.org/issue/42579.
* guix/store/deduplication.scm (deduplicate): Handle the case where properties
is 'unknown because the underlying file-system does not support d_type.
This fixes <https://bugs.gnu.org/42151>.
* gnu/system/images/hurd.scm (hurd-initialize-root-partition): Use #:wal-mode #f
in call to ...
* gnu/build/image.scm (initialize-root-partition): ... this, add #:wal-mode?
parameter, pass it to ...
(register-closure): ... this, add #:wal-mode? parameter, pass it to ...
* guix/store/database.scm (with-database): ... this, add #:wal-mode?
parameter, pass it to ...
(call-with-database): ... this, add #:wal-mode? parameter; when
set to #f, do not set journal_model=WAL.
Suggested by Caleb Ristvedt <caleb.ristvedt@cune.org>.
* guix/store/deduplication.scm (call-with-writable-file): Call THUNK
directly when FILE is (%store-directory).
Suggested by Caleb Ristvedt <caleb.ristvedt@cune.org>.
* guix/store/deduplication.scm (call-with-writable-file): New procedure.
(with-writable-file): New macro.
(replace-with-link): Use it.
Until now, we'd call (nar-sha256 file) unconditionally. Thus, if FILE
was a directory, we would traverse it for no reason, and then call
'deduplicate' on FILE, which would again traverse it.
This change also removes redundant (mkdir-p store) calls from the loop,
and avoids 'lstat' calls by using 'scandir*'.
* guix/store/deduplication.scm (deduplicate): Add named loop. Move
'mkdir-p' outside the loop. Use 'scandir*' instead of 'scandir'. Do
not call 'nar-sha256' when FILE has type 'directory.
Previously call-with-transaction would both retry when SQLITE_BUSY errors were
thrown and do what its name suggested (start and rollback/commit a
transaction). This changes it to do only what its name implies, which
simplifies its implementation. Retrying is provided by the new
call-with-SQLITE_BUSY-retrying procedure.
* guix/store/database.scm (call-with-transaction): no longer restarts, new
#:restartable? argument controls whether "begin" or "begin immediate" is
used.
(call-with-SQLITE_BUSY-retrying, call-with-retrying-transaction,
call-with-retrying-savepoint): new procedures.
(register-items): use call-with-retrying-transaction to preserve old
behavior.
* .dir-locals.el (call-with-retrying-transaction,
call-with-retrying-savepoint): add indentation information.
update-or-insert can break if an insert occurs between when it decides whether
to update or insert and when it actually performs that operation. Putting the
check and the update/insert operation in the same transaction ensures that the
update/insert will only succeed if no other write has occurred in the middle.
* guix/store/database.scm (call-with-savepoint): new procedure.
(update-or-insert): use call-with-savepoint to ensure the read and the
insert/update occur within the same transaction.
Most of our queries would fail to finalize their statements properly if sqlite
returned an error during their execution. This resolves that, and also makes
them somewhat more concise as a side-effect.
This also makes some small changes to improve certain queries where behavior
was strange or overly verbose.
* guix/store/database.scm (call-with-statement): new procedure.
(with-statement): new macro.
(last-insert-row-id, path-id, update-or-insert, add-references): rewrite to
use with-statement.
(update-or-insert): factor last-insert-row-id out of the end of both
branches.
(add-references): remove pointless last-insert-row-id call.
* .dir-locals.el (with-statement): add indenting information.
guile-sqlite3 provides statement caching, making it unnecessary for sqlite to
keep re-preparing statements that are frequently used. Unfortunately it
doesn't quite emulate the semantics of sqlite_finalize properly, because it
doesn't cause a commit if the statement being finalized is the last "active"
statement (see https://notabug.org/guile-sqlite3/guile-sqlite3/issues/12). We
work around this by wrapping sqlite-finalize with our own version that ensures
sqlite-reset is called, which does The Right Thing™.
* guix/store/database.scm (sqlite-finalize): new procedure that shadows the
sqlite-finalize from (sqlite3).
Fixes <https://bugs.gnu.org/39725>.
* guix/store/deduplication.scm (deduplicate): Use
'bytevector->nix-base32-string' instead of 'bytevector->base16-string'.
* guix/store/database.scm (SQLITE_BUSY, register-output-sql): New variables.
(add-references): Don't try finalizing after each use, only after all the
uses (otherwise a finalized statement would be used if #:cache? was #f).
(call-with-transaction): New procedure.
(register-items): Use call-with-transaction to prevent broken intermediate
states from being visible.
* .dir-locals.el (call-with-transaction): indent it.
Signed-off-by: Ludovic Courtès <ludo@gnu.org>
This should avoid "database is locked" errors when there's a lot of
concurrency, for instance when offloading simultaneously a lot of
builds.
* guix/store/database.scm (call-with-database): Add two 'sqlite-exec'
calls to set 'journal_mode' and 'busy_timeout'.
Reported by Andreas Enge <andreas@enge.fr>
in <https://bugs.gnu.org/33676>.
* guix/store/deduplication.scm (replace-with-link): Catch ENOSPC around
'get-temp-link'. Do nothing when 'get-temp-link' throws ENOSPC. Move
code to restore PARENT's permissions outside of 'catch'.
* tests/store-deduplication.scm ("deduplicate, ENOSPC"): New test.
Fixes <https://bugs.gnu.org/33361>.
* guix/store/deduplication.scm (replace-with-link): Call 'set-file-time'
and 'chmod' after 'rename-file'.
* tests/nar.scm ("restore-file-set with directories (signed, valid)"):
New test.
* guix/store/database.scm (%default-database-file): New variable.
(path-id): Export.
* guix/nar.scm (finalize-store-file): Use 'with-database' instead of
'with-store', and use 'path-id' instead of 'valid-path?'.
Fixes <https://bugs.gnu.org/32600>.
Reported by Leo Famulari.
* guix/store/database.scm (register-items): Check whether TO-REGISTER is
in DB by calling 'path-id', and skip the reset-timestamps,
registration, and deduplication phases when it is.
* guix/store/database.scm (register-items): Add #:log-port. Use
'progress-reporter/bar' to show a progress report.
(register-path): Pass #:log-port to 'register-items'.
Previously, store items registered in the database by this code (for
instance, store items retrieved by 'guix offload' and passed to
'restore-file-set') would have an mtime of 0 instead of 1.
This would cause problems for things like .go files: Guile would
consider them to be older than the corresponding .scm file, and
consequently it would ignore them and possibly use another (incorrect)
.go file.
Reported by Ricardo Wurmus.
* guix/store/database.scm (reset-timestamps): Pass 1, not 0, to
'utime'.
* tests/store-database.scm ("register-path"): Check the mtime of FILE
and REF.
Fixes <https://bugs.gnu.org/32161>.
Reported by Ricardo Wurmus <rekado@elephly.net>.
This mostly reverts 83099892e0.
* guix/store/deduplication.scm (counting-wrapper-port): New procedure.
(nar-sha256): Use it.