This reduces disk usage of sparse files that are substituted such as
Guile object files (ELF files). As of Guile 3.0.9, .go files are sparse
due to ELF sections being aligned on 64 KiB boundaries.
This reduces disk usage reported by “du -sh” by 9% for the ‘guix’
package, by 23% for ‘guile’, and by 35% for ‘guile-git’.
* guix/store/deduplication.scm (hole-size, find-holes): New procedures.
(tee)[seekable?]: New variable.
[read!]: Add case when SEEKABLE? is true.
* tests/store-deduplication.scm (cartesian-product): New procedure.
("copy-file/deduplicate, sparse files (holes: ~a/~a/~a)"): New test set.
Change-Id: Iad2ab7830dcb1220e2026f4a127a6c718afa8964
* guix/store/database.scm (register-valid-path): Replace "sqlite-register"
with "register-valid-path" as argument to `assert-integer'.
Change-Id: Id93687e90d0a806d715006ca0b2498a1d10cfba6
Signed-off-by: Christopher Baines <mail@cbaines.net>
These names should be more descriptive.
* guix/store/database.scm (path-id): Rename to select-valid-path-id.
(sqlite-register): Rename to register-valid-path.
(register-items): Update accordingly.
Change-Id: I6d4a14d4cde9d71ab34d6ffdbfbfde51b2c0e1db
The update-or-insert procedure name was unhelpfully generic, and these changes
should improve the code readability.
* guix/store/database.scm (update-or-insert): Remove procedure and inline
functionality in to sqlite-register.
Change-Id: Ifab0cdb7972d095460cc1f79b8b2f0e9b958059c
Especially since we're asking for these to be cached.
Management of prepared statements isn't trivial, since you don't want to keep
them forever as this can lead to poor query performance, but I don't think
that finalizing them immediately is the right solution.
Change-Id: I61706b4d09d771835bb8f074b8f6a6ee871f5e2d
* guix/store/database.scm (sqlite-step-and-reset): New procedure.
(last-insert-row, path-id, update-or-insert, add-references): Don't finalize
prepared statements.
Change-Id: I2a2c6deb43935d67df9e43000a5105343d72b3e6
This makes the code easier to read, as you don't have to keep jumping between
the two places.
* guix/store/database.scm (path-id-sql, update-sql, insert-sql,
add-reference-sql): Remove variables.
(path-id, update-or-insert, add-references): Include SQL.
Change-Id: I53b4ab973be8d0cd10a0f35ba25972f1c9680353
I think using dynamic-wind to finalize all statements is the wrong
approach. Firstly it would be good to allow reseting statements rather than
finalizing them. Then for the problem of handling errors, the approach I've
settled on in the build coordinator is to close the database connection, since
that'll trigger guile-sqlite3 to finalize all the cached statements.
This reverts commit 5d6e225528.
* .dir-locals.el (scheme-mode): Remove with-statement.
* guix/store/database.scm (call-with-statement): Remove procedure.
(with-statement): Remove syntax rule.
(call-with-transaction, last-insert-row-id, path-id, update-or-insert,
add-references): Don't use with-statement.
Change-Id: I2fd976b3f12ec8105cc56350933a953cf53647e8
While care does need to be taken with making updates or inserts to the
ValidPaths table, I think that trying to ensure this within update-or-insert
is the wrong approach. Instead, when working with the store database, only one
connection should be used to make changes to the database and those changes
should happen in transactions that ideally begin immediately.
This reverts commit 37545de4a3.
* .dir-locals.el (scheme-mode): Remove entries for call-with-savepoint and
call-with-retrying-savepoint.
* guix/store/database.scm (call-with-savepoint, call-with-retrying-savepoint):
Remove procedures.
(update-or-insert): Remove use of call-with-savepoint.
Change-Id: I2f986e8623d8235a90c40d5f219c1292c1ab157b
This was obtained by setting up this environment:
guix shell -D guix --with-input=guile@3.0.9=guile-next \
--with-commit=guile-next=e2ed33ef0445c867fe56c247054aa67e834861f2
-- make -j5
then adding 'unused-module' to (@@ (guix build compiler) %warnings),
building, and checking all the "unused module" warnings and removing
those that were definitely unused.
* guix/store/deduplication.scm (dump-file/deduplicate): Use 'sendfile'
instead of 'dump-port'.
* tests/store-deduplication.scm ("copy-file/deduplicate, below %deduplication-minimum-size"):
New test.
Files smaller than 8 KiB typically represent ~70% of the entries in
/gnu/store/.links but only contribute to ~4% of the space savings
afforded by deduplication.
Not considering these files for deduplication speeds up file insertion
in the store and, more importantly, leaves 'removeUnusedLinks' with
fewer entries to traverse, thereby speeding it up proportionally.
Partly fixes <https://issues.guix.gnu.org/24937>.
* config-daemon.ac: Remove symlink hard link check and CAN_LINK_SYMLINK
definition.
* guix/store/deduplication.scm (%deduplication-minimum-size): New
variable.
(deduplicate)[loop]: Do not recurse when FILE's size is below
%DEDUPLICATION-MINIMUM-SIZE.
(dump-port): New procedure.
(dump-file/deduplicate)[hash]: Turn into...
[dump-and-compute-hash]: ... this thunk.
Call 'deduplicate' only when SIZE is greater than
%DEDUPLICATION-MINIMUM-SIZE; otherwise call 'dump-port'.
* nix/libstore/gc.cc (LocalStore::removeUnusedLinks): Drop files where
st.st_size < deduplicationMinSize.
* nix/libstore/local-store.hh (deduplicationMinSize): New declaration.
* nix/libstore/optimise-store.cc (deduplicationMinSize): New variable.
(LocalStore::optimisePath_): Return when PATH is a symlink or smaller
than 'deduplicationMinSize'.
* tests/derivations.scm ("identical files are deduplicated"): Produce
files bigger than %DEDUPLICATION-MINIMUM-SIZE.
* tests/nar.scm ("restore-file-set with directories (signed, valid)"):
Likewise.
* tests/store-deduplication.scm ("deduplicate, below %deduplication-minimum-size"):
New test.
("deduplicate", "deduplicate, ENOSPC"): Produce files bigger than
%DEDUPLICATION-MINIMUM-SIZE.
* tests/store.scm ("substitute, deduplication"): Likewise.
* guix/store/database.scm (assert-integer): New procedure.
(update-or-insert): Use it to validate NAR-SIZE and TIME.
* tests/store-database.scm ("sqlite-register with incorrect size"): New
test.
This avoids repeated (mkdir-p "/gnu/store/.links") calls when
deduplicating lots of files.
* guix/store/deduplication.scm (deduplicate): Remove initial call to
'mkdir-p'. Add ENOENT case in 'link' exception handler. Reindent.
* tests/store-deduplication.scm ("deduplicate, ENOSPC"): Check
for (<= links 4) to account for the initial 'link' call.
* guix/store/database.scm (timestamp): New procedure.
(sqlite-register): Use it as the default for #:time.
(register-items): Likewise for #:registeration-time.
Fixes <https://bugs.gnu.org/44760>.
Previously, the 'register-path' call would re-traverse ITEM to compute
its nar hash, even though that hash is already known in the initial
store. This patch also avoids repeated opening/closing of the
database.
* guix/store/database.scm (call-with-database): Export.
* guix/scripts/system.scm (copy-item): Add 'db' parameter. Call
'sqlite-register' instead of 'register-path'.
(copy-closure): Remove redundant call to 'references*'. Call
'call-with-database' and pass the database to 'copy-item'.
It is now up to the caller to deduplicate store contents.
* guix/store/database.scm (register-items): Remove #:deduplicate?
parameter and call to 'deduplicate'.
(register-path): Call 'deduplicate' when #:deduplicate? is true.
* gnu/build/image.scm (register-closure): Adjust call accordingly.
* gnu/build/vm.scm (register-closure): Likewise.
* guix/nar.scm (finalize-store-file): Likewise.
* guix/scripts/pack.scm (store-database): Likewise.
Until now deduplication was performed as an additional pass after
copying files, which involve re-traversing all the files that had just
been copied.
* guix/store/deduplication.scm (copy-file/deduplicate): New procedure.
* tests/store-deduplication.scm ("copy-file/deduplicate"): New test.
* guix/build/store-copy.scm (populate-store): Add #:deduplicate?
parameter and honor it.
* tests/gexp.scm ("gexp->derivation, store copy"): Pass #:deduplicate? #f
to 'populate-store'.
* gnu/build/image.scm (initialize-root-partition): Pass #:deduplicate?
to 'populate-store'. Pass #:deduplicate? #f to 'register-closure'.
* gnu/build/vm.scm (root-partition-initializer): Likewise.
* gnu/build/install.scm (populate-single-profile-directory): Pass
#:deduplicate? #f to 'populate-store'.
* gnu/build/linux-initrd.scm (build-initrd): Likewise.
* guix/scripts/pack.scm (self-contained-tarball)[import-module?]: New
procedure.
[build]: Pass it as an argument to 'source-module-closure'.
* guix/scripts/pack.scm (squashfs-image)[build]: Wrap in
'with-extensions'.
* gnu/system/linux-initrd.scm (expression->initrd)[import-module?]: New
procedure.
[builder]: Pass it to 'source-module-closure'.
* gnu/system/install.scm (cow-store-service-type)[import-module?]: New
procedure. Pass it to 'source-module-closure'.
The assumption now is that the caller took care of resetting timestamps
and permissions.
* guix/store/database.scm (register-items): Remove #:reset-timestamps?
parameter and the call to 'reset-timestamps'.
(register-path): Adjust accordingly and add call to 'reset-timestamps'.
* gnu/build/image.scm (register-closure): Remove #:reset-timestamps?
parameter to 'register-items'.
* gnu/build/vm.scm (register-closure): Likewise.
* guix/nar.scm (finalize-store-file): Adjust accordingly.
* guix/scripts/pack.scm (store-database)[build]: Likewise.
This avoids having to traverse and re-read the files that we have just
restored, thereby reducing I/O.
* guix/serialization.scm (dump-file): New procedure.
(restore-file): Add #:dump-file parameter and honor it.
* guix/store/deduplication.scm (tee, dump-file/deduplicate): New
procedures.
* guix/nar.scm (restore-one-item): Pass #:dump-file to 'restore-file'.
(finalize-store-file): Pass #:deduplicate? #f to 'register-items'.
* tests/nar.scm <top level>: Call 'setenv' to set "NIX_STORE".
It was made transactional in a4678c6ba1, with
the reasoning to prevent broken intermediate states from being visible. I
think this means something like an entry being in ValidPaths, but the Refs not
being inserted.
Using a transaction for this makes sense, but I think using one single
transaction for the whole register-items call is unnecessary to avoid broken
states from being visible, and could block other writes to the store database
while register-items is running. Because the deduplication and resetting
timestamps happens within the transaction as well, even though these things
don't involve the database, writes to the database will still be blocked while
this is happening.
To reduce the potential for register-items to block other writers to the
database for extended periods, this commit moves the transaction to just wrap
the call to sqlite-register. This is the one place where writes occur, so that
should prevent the broken intermediate states issue above. The one difference
this will make is some of the registered items will be visible to other
connections while others may be still being added. I think this is OK, as it's
equivalent to just registering different items.
* guix/store/database.scm (register-items): Reduce transaction scope.
Signed-off-by: Ludovic Courtès <ludo@gnu.org>
It's necessary that store items be locked and protected from garbage
collection while they are being registered. This documents that.
* guix/store/database.scm (register-path, register-items): document GC
protection and locking requirements.
Signed-off-by: Ludovic Courtès <ludo@gnu.org>
This causes with-writable-file to take into consideration the actual store
being used, as passed to 'deduplicate', rather than
whatever (%store-directory) may return.
* guix/store/deduplication.scm (replace-with-link): new keyword argument
'store'. Pass to with-writable-file.
(with-writable-file, call-with-writable-file): new store argument.
(deduplicate): pass store to replace-with-link.
Signed-off-by: Ludovic Courtès <ludo@gnu.org>
scandir* uses readdir, which means that the file type property can be 'unknown
if the underlying file-system does not support d_type. Make sure to fallback
to lstat in that case.
Fixes: https://issues.guix.gnu.org/issue/42579.
* guix/store/deduplication.scm (deduplicate): Handle the case where properties
is 'unknown because the underlying file-system does not support d_type.
This fixes <https://bugs.gnu.org/42151>.
* gnu/system/images/hurd.scm (hurd-initialize-root-partition): Use #:wal-mode #f
in call to ...
* gnu/build/image.scm (initialize-root-partition): ... this, add #:wal-mode?
parameter, pass it to ...
(register-closure): ... this, add #:wal-mode? parameter, pass it to ...
* guix/store/database.scm (with-database): ... this, add #:wal-mode?
parameter, pass it to ...
(call-with-database): ... this, add #:wal-mode? parameter; when
set to #f, do not set journal_model=WAL.
Suggested by Caleb Ristvedt <caleb.ristvedt@cune.org>.
* guix/store/deduplication.scm (call-with-writable-file): Call THUNK
directly when FILE is (%store-directory).
Suggested by Caleb Ristvedt <caleb.ristvedt@cune.org>.
* guix/store/deduplication.scm (call-with-writable-file): New procedure.
(with-writable-file): New macro.
(replace-with-link): Use it.
Until now, we'd call (nar-sha256 file) unconditionally. Thus, if FILE
was a directory, we would traverse it for no reason, and then call
'deduplicate' on FILE, which would again traverse it.
This change also removes redundant (mkdir-p store) calls from the loop,
and avoids 'lstat' calls by using 'scandir*'.
* guix/store/deduplication.scm (deduplicate): Add named loop. Move
'mkdir-p' outside the loop. Use 'scandir*' instead of 'scandir'. Do
not call 'nar-sha256' when FILE has type 'directory.
Previously call-with-transaction would both retry when SQLITE_BUSY errors were
thrown and do what its name suggested (start and rollback/commit a
transaction). This changes it to do only what its name implies, which
simplifies its implementation. Retrying is provided by the new
call-with-SQLITE_BUSY-retrying procedure.
* guix/store/database.scm (call-with-transaction): no longer restarts, new
#:restartable? argument controls whether "begin" or "begin immediate" is
used.
(call-with-SQLITE_BUSY-retrying, call-with-retrying-transaction,
call-with-retrying-savepoint): new procedures.
(register-items): use call-with-retrying-transaction to preserve old
behavior.
* .dir-locals.el (call-with-retrying-transaction,
call-with-retrying-savepoint): add indentation information.
update-or-insert can break if an insert occurs between when it decides whether
to update or insert and when it actually performs that operation. Putting the
check and the update/insert operation in the same transaction ensures that the
update/insert will only succeed if no other write has occurred in the middle.
* guix/store/database.scm (call-with-savepoint): new procedure.
(update-or-insert): use call-with-savepoint to ensure the read and the
insert/update occur within the same transaction.
Most of our queries would fail to finalize their statements properly if sqlite
returned an error during their execution. This resolves that, and also makes
them somewhat more concise as a side-effect.
This also makes some small changes to improve certain queries where behavior
was strange or overly verbose.
* guix/store/database.scm (call-with-statement): new procedure.
(with-statement): new macro.
(last-insert-row-id, path-id, update-or-insert, add-references): rewrite to
use with-statement.
(update-or-insert): factor last-insert-row-id out of the end of both
branches.
(add-references): remove pointless last-insert-row-id call.
* .dir-locals.el (with-statement): add indenting information.
guile-sqlite3 provides statement caching, making it unnecessary for sqlite to
keep re-preparing statements that are frequently used. Unfortunately it
doesn't quite emulate the semantics of sqlite_finalize properly, because it
doesn't cause a commit if the statement being finalized is the last "active"
statement (see https://notabug.org/guile-sqlite3/guile-sqlite3/issues/12). We
work around this by wrapping sqlite-finalize with our own version that ensures
sqlite-reset is called, which does The Right Thing™.
* guix/store/database.scm (sqlite-finalize): new procedure that shadows the
sqlite-finalize from (sqlite3).
Fixes <https://bugs.gnu.org/39725>.
* guix/store/deduplication.scm (deduplicate): Use
'bytevector->nix-base32-string' instead of 'bytevector->base16-string'.