When offloading to a single machine, the previous default value would
lead 'guix offload' to wait possibly for several minutes between
subsequent builds until normalized load would finally go below 0.6.
Increasing it mitigates that.
* guix/scripts/offload.scm (<build-machine>)[overload-threshold]: Bump
to 0.8.
* doc/guix.texi (Daemon Offload Setup): Likewise.
Fixes <https://issues.guix.gnu.org/59447>.
Reported by Mathieu Othacehe <othacehe@gnu.org>.
Previously, if a machine had a buggy 'guix repl', 'guix offload' would
crash with a backtrace instead of just ignoring the machine.
* guix/scripts/offload.scm (remote-inferior*): New procedure.
(check-machine-availability)[if-true]: New procedure.
Use 'remote-inferior*' and 'if-true'.
(check-machine-status): Use 'remote-inferior*'.
This halves the number of syscalls made by "guix offload" during startup
and delays loading of Guile-SSH until there are actually machines to
offload to.
* guix/scripts/offload.scm: Remove unused module imports. Autoload many
modules.
(check-ssh-zlib-support): New procedure.
(process-request): Call it when accepting.
(guix-offload): Remove 'zlib-support?' check, now moved to
'check-ssh-zlib-support'.
This significantly reduces the amount of work done by "guix offload"
when there's no machine to offload to.
* guix/scripts/offload.scm (process-request): Add call to
'read-derivation-from-file', moved from...
(guix-offload): ... here.
This simplifies setup of build machines: no need to install Guile in
addition to Guix, no need to set 'GUILE_LOAD_PATH' & co., leading to
fewer failure modes.
* guix/ssh.scm (remote-run): New procedure.
(remote-daemon-channel): Use it instead of 'open-remote-pipe*'.
(store-import-channel)[import]: Remove check for module availability.
Add call to 'primitive-exit'.
Use 'remote-run' instead of 'open-remote-pipe'.
(store-export-channel)[export]: Remove check for module availability.
Add calls to 'primitive-exit'.
Use 'remote-run' instead of 'open-remote-pipe'.
(handle-import/export-channel-error): Remove 'module-error' clause.
(report-module-error): Remove.
* guix/scripts/offload.scm (assert-node-has-guix): Replace call to
'report-module-error' by 'leave'.
* doc/guix.texi (Daemon Offload Setup): Remove mention of Guile.
* guix/scripts/offload.scm (open-ssh-session): Have 'max-silent-time'
default to #f rather than -1, which is not a valid timeout value.
Adjust body accordingly.
Fixes <https://issues.guix.gnu.org/43773>.
The computed normalized load was previously obtained by dividing the load
average as found in /proc/loadavg by the number of parallel builds defined for
a build machine.
This normalized load didn't allow to compare machines with different number of
cores, as the load average reported by /proc/loadavg can be as high as the
number of cores; thus comparing that value to a fixed threshold of 2.0 would
mean machines with multiple cores were more likely to be flagged as overloaded
compared to single core machines.
This can be fixed by normalizing using the available number of cores instead
of the number of parallel jobs.
* guix/scripts/offload.scm (<build-machine>)[overload-threshold]: New field.
(node-load): Modify to return a normalized load value between 0 and 1, taking
into account the number of cores available.
(normalized-load): Remove procedure.
(report-load): New procedure.
(choose-build-machine): Adjust to use the modified 'node-load' and the new
'report-load' and 'build-machine-overload-threshold' procedures.
(check-machine-status): Adjust.
* doc/guix.texi (Daemon Offload Setup): Document the offload scheduler and the
new 'overload-threshold' field.
Fixes <https://bugs.gnu.org/42740>.
* guix/scripts/copy.scm (send-to-remote-host): Keep the result of
'connect-to-remote-daemon' in scope, and explicitly close it after the
call to 'send-files'.
(retrieve-from-remote-host): Explicitly close REMOTE and disconnect
SESSION.
* guix/scripts/offload.scm (transfer-and-offload): Explicitly close
STORE and disconnect SESSION upon completion.
* guix/scripts/offload.scm (<build-machine>)[systems]: New field.
[system]: Accessor changed to %build-machine-system. Default to #f.
* guix/scripts/offload.scm (build-machine-system): Wrap %build-machine-system
with a deprecation warning.
(build-machine-systems): Access the new systems field or fallback to use
build-machine-system, for backward compatibility.
(machine-matches?): Adjust.
* tests/offload.scm: Add tests...
* Makefile.am (SCM_TESTS): ...and register them.
* doc/guix.texi (Daemon Offload Setup): Update doc.
* guix/scripts/offload.scm (host-key->type+key): Remove.
(open-ssh-session): Replace server authentication code with a call to
'authenticate-server*'.
* guix/ssh.scm (host-key->type+key, authenticate-server*): New
procedures.
This is a followup to bc69ea2d60.
* guix/scripts/build.scm (show-build-options-help): Rename
"--no-build-hook" to "--no-offload".
(%standard-build-options): Likewise, and warn when "--no-build-hook" is
passed.
* nix/nix-daemon/guix-daemon.cc (options): Add "--no-offload" and mark
"--no-build-hook" as hidden.
* guix/scripts/offload.scm: Adjust comment.
* doc/guix.texi (Invoking guix-daemon, Common Build Options): Replace
"--no-build-hook" with "--no-offload".
* etc/completion/fish/guix.fish, etc/completion/zsh/_guix: Adjust
accordingly.
This is useful when a single machine appears several time, with
different port numbers.
* guix/scripts/offload.scm (machine-slot-file): Add MACHINE's port to
the file name.
This extra level of locking turned out to be unnecessary.
* guix/scripts/offload.scm (with-machine-lock): Remove.
(machine-lock-file): Remove.
(acquire-build-slot): Remove surrounding 'with-machine-lock'.
This lock was unnecessary and it led to a contention when many 'guix
offload' processes are polling for available machines.
* guix/scripts/offload.scm (machine-choice-lock-file): Remove.
(choose-build-machine): Remove surrounding 'with-file-lock (machine-lock-file)'.
This is a followup to ed7b44370f71126087eb953f36aad8dc4c44109f;
following that commit, 'guix offload test' and 'guix offload status'
would abort with a backtrace instead of clearly diagnosing a missing
'guix' command on the build machine.
* guix/scripts/offload.scm (assert-node-has-guix): Call 'leave' when
NODE is not an inferior. Remove 'catch' blocks for 'node-repl-error'.
(check-machine-availability): Invoke 'assert-node-has-guix' first.
(check-machine-status): Print a warning when 'remote-inferior' returns #f.
Using inferiors and thus 'guix repl' simplifies setup on build
machines (no need to worry about GUILE_LOAD_PATH etc.)
Furthermore, the 'guix repl -t machine' protocol running in a remote
pipe addresses several issues with the current implementation of nodes
and RREPLs in Guile-SSH: fewer round trips, doesn't leave a 'guile
--listen' process behind it, stateless (since a new process is started
each time), more efficient (the SSH channel can be reused), more
reliable (no 'pgrep', 'pkill', and shellology; see
<https://github.com/artyom-poptsov/guile-ssh/issues/11> as an example.)
* guix/ssh.scm (inferior-remote-eval): New procedure.
(send-files): Use it instead of 'make-node' and 'node-eval'.
* guix/scripts/offload.scm (node-guile-version): New procedure.
(node-free-disk-space, transfer-and-offload, node-load)
(choose-build-machine, assert-node-has-guix): Use 'remote-inferior'
instead of 'make-node' and 'inferior-eval' instead of 'node-eval'.
(assert-node-can-import, assert-node-can-export): Likewise, and add
'session' parameter.
(check-machine-availability): Likewise, and add calls to
'close-inferior' and 'disconnect!'.
(check-machine-status): Likewise.
* doc/guix.texi (Daemon Offload Setup): Remove bit related to 'guile' in
$PATH and $GUILE_LOAD_PATH; mention 'guix' alone.
Fixes a regression introduced in
bbe66a530a whereby the actual
load (non-normalized) would be displayed.
* guix/scripts/offload.scm (check-machine-status): Add call to
'normalized-load'.
Previously, if a remote build would fail due to lack of disk space, this
would be considered a permanent failure and thus cached as a build
failure if the local daemon runs with '--cache-failures'.
* guix/scripts/offload.scm (transfer-and-offload): Upon
'nix-protocol-error?' call 'node-free-disk-space' and return 1 instead
of 100 if the result if lower than 10 MiB.
Fixes <https://bugs.gnu.org/33378>.
* guix/scripts/offload.scm (node-free-disk-space): New procedure.
(%minimum-disk-space): New variable.
(choose-build-machine): Call 'node-free-disk-space' and take it into
account in addition to LOAD.
(check-machine-status): Display the free disk space.
* guix/scripts/offload.scm (machine-load): Remove.
(node-load, normalized-load): New procedures.
(choose-build-machine): Call 'open-ssh-session' and 'make-node' from
here; pass the node to 'node-load'.
(check-machine-status): Use 'node-load' instead of 'machine-load'. Call
'disconnect!' on SESSION.
* guix/scripts/offload.scm (call-with-timeout): New procedure.
(with-timeout): New macro.
(process-request): Use it around 'transfer-and-offload' call.
Previously we were looking at the load of the past 5 minutes, which
means that, after a build, we could end up waiting for 5 minutes for
that metric to be low enough.
* guix/scripts/offload.scm (machine-load): Compute RAW based on ONE, not
FIVE.
This fixes a regression in 'retrieve-files*' introduced in
896fec476f, whereby (guix scripts offload)
would not read the initial sexp now sent by the remote host via
'store-export-channel'. This would effectively prevent file retrieval
entirely when offloading.
* guix/ssh.scm (retrieve-files*): New procedure, like former
'retrieve-files' but with an extra #:import parameter.
(retrieve-files): Rewrite in terms of 'retrieve-files*'.
(file-retrieval-port): Make private.
* guix/scripts/offload.scm (transfer-and-offload): Pass #:import to
'retrieve-files*'.
(retrieve-files*): Remove.
* guix/scripts/offload.scm (check-machine-status): New procedure.
(guix-offload): Call it when the argument is "status".
* doc/guix.texi (Daemon Offload Setup): Document it.
* guix/scripts/offload.scm (build-machines): Comment out
'(set! %fresh-auto-compile #t)' since with Guile 2.2.3 it could lead to
an actual rebuild of everything that gets loaded from there on. See
<https://bugs.gnu.org/29226>.
* guix/ui.scm (load*): Likewise.
Previously we would call 'machine-load' once per machine, which was very
costly when there were many machines. Now we arrange to call it only
once on average (when all the machines have the same 'speed' value).
* guix/scripts/offload.scm (random-seed, shuffle): New procedures.
(choose-build-machine)[machines+slots+loads]: Rename to...
[machines+slots]: ... this. Remove load from the tuples therein.
[undecorate]: Adjust accordingly.
[machine-less-loaded-or-faster?]: Remove.
[machine-faster?]: New procedure.
Sort MACHINES+SLOTS according to 'machine-faster?'. Call
'machine-load?' as the last thing.
The '%slots' list could grow indefinitely; in practice though,
guix-daemon is likely to restart 'guix offload' often enough.
* guix/scripts/offload.scm (%slots): Remove.
(choose-build-machine): Don't 'set!' %SLOTS. Return the acquired slot
as a second value.
(process-request): Adjust accordingly. Release the returned slot after
'transfer-and-offload'.
This fixes a memory leak that can be seen by running:
(map (lambda _ (machine-load m)) (iota 1000))
* guix/scripts/offload.scm (machine-load): Add call to 'disconnect!'.
This avoids the open/fstat/close syscalls upon a cache hit that we had
with the previous idiom:
(call-with-input-file file read-derivation)
where caching happened in 'read-derivation' itself.
* guix/derivations.scm (%read-derivation): Rename to...
(read-derivation): ... this.
(read-derivation-from-file): New procedure.
(derivation-prerequisites, substitution-oracle)
(derivation-prerequisites-to-build):
(derivation-path->output-path, derivation-path->output-paths):
(derivation-path->base16-hash, map-derivation): Use
'read-derivation-from-file' instead of (call-with-input-file …
read-derivation).
* guix/grafts.scm (item->deriver): Likewise.
* guix/scripts/build.scm (log-url, options->things-to-build): Likewise.
* guix/scripts/graph.scm (file->derivation): Remove.
(derivation-dependencies, %derivation-node-type): Use
'read-derivation-from-file' instead.
* guix/scripts/offload.scm (guix-offload): Likewise.
* guix/scripts/perform-download.scm (guix-perform-download): Likewise.
* guix/scripts/publish.scm (load-derivation): Remove.
(narinfo-string): Use 'read-derivation-from-file'.
This allows 'guix publish' threads as well as 'guix substitute' and
'guix offload' processes to be properly labeled in 'top', 'pstree', etc.
* guix/workers.scm (worker-thunk): Add #:thread-name parameter and honor it.
(make-pool): Likewise.
* guix/scripts/publish.scm (http-write): Add calls to 'set-thread-name'
in bodies of 'call-with-new-thread'.
(guix-publish): Call 'set-thread-name'. Pass #:thread-name to 'make-pool'.
* guix/scripts/offload.scm (guix-offload): Call 'set-thread-name'.
* guix/scripts/substitute.scm (guix-substitute): Likewise.
* guix/scripts/offload.scm (connect-to-remote-daemon)
(store-import-channel, store-export-channel, send-files)
(retrieve-files): Move to (guix ssh).
(nonce): Add optional 'name' parameter and use it.
(retrieve-files*): New procedure.
(transfer-and-offload): Use it instead of 'retrieve-files', and add
first parameter to 'send-files'.
(assert-node-can-import): Likewise.
(assert-node-can-export): Use 'retrieve-files' instead of
'store-export-channel'.
* guix/ssh.scm: New file.
* configure.ac: Use 'GUIX_CHECK_GUILE_SSH' and define 'HAVE_GUILE_SSH'
Automake conditional.
* Makefile.am (MODULES) [HAVE_GUILE_SSH]: Add guix/ssh.scm.
* guix/scripts/offload.scm (check-machine-availability): Add 'pred'
parameter and honor it.
(guix-offload): for the "test" sub-command, accept an extra 'regexp'
parameter. Pass a second argument to 'check-machine-availability'.