9ffd353171
explain why locks prevent building two pkgpaths.
420 lines
14 KiB
Groff
420 lines
14 KiB
Groff
.\" $OpenBSD: dpb.1,v 1.7 2010/10/27 17:53:24 espie Exp $
|
|
.\"
|
|
.\" Copyright (c) 2010 Marc Espie <espie@openbsd.org>
|
|
.\"
|
|
.\" Permission to use, copy, modify, and distribute this software for any
|
|
.\" purpose with or without fee is hereby granted, provided that the above
|
|
.\" copyright notice and this permission notice appear in all copies.
|
|
.\"
|
|
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
|
|
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
|
|
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
|
|
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
|
|
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
|
|
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
|
|
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
|
|
.\"
|
|
.Dd $Mdocdate: October 27 2010 $
|
|
.Dt DPB 1
|
|
.Os
|
|
.Sh NAME
|
|
.Nm dpb
|
|
.Nd distributed ports builder
|
|
.Sh SYNOPSIS
|
|
.Nm dpb
|
|
.Bk -words
|
|
.Op Fl aceqrsuUx
|
|
.Op Fl A Ar arch
|
|
.Op Fl b Ar logfile
|
|
.Op Fl h Ar hosts
|
|
.Op Fl j Ar n
|
|
.Op Fl L Ar logdir
|
|
.Op Fl m Ar threshold
|
|
.Op Fl P Ar subdirlist
|
|
.Op Fl S Ar sizefile
|
|
.Op Fl t Ar ctimeout
|
|
.Op Fl T Ar dtimeout
|
|
.Op Ar pkgpath ...
|
|
.Ek
|
|
.Sh DESCRIPTION
|
|
.Nm
|
|
is used to build ports on a cluster of machines.
|
|
Its name is an acronym for
|
|
.Sq distributed ports builder .
|
|
.Nm
|
|
walks ports to figure out dependencies, and starts building ports
|
|
as soon as it can.
|
|
It can take
|
|
.Ar pkgpath ...
|
|
to build as parameters.
|
|
Options are as follows:
|
|
.Bl -tag -width pkgpathlonger
|
|
.It Fl A Ar arch
|
|
Build packages for given architecture, selecting relevant hosts from the
|
|
cluster.
|
|
By default, the current host's architecture will be used.
|
|
.It Fl a
|
|
Walk the whole tree and builds all packages (default if no pkgpath is given).
|
|
.It Fl b Ar logfile
|
|
Prime the heuristics module with a previous build log, so that packages that
|
|
take a long time to build will happen earlier.
|
|
.It Fl c
|
|
Clean port working directory and log before each build.
|
|
.It Fl e
|
|
The listing job is extra and won't be given back to the pool when it's
|
|
finished.
|
|
.It Fl h Ar hosts
|
|
File with hosts to use for building.
|
|
One host per line, plus properties, such as:
|
|
.Bd -literal
|
|
espie@aeryn jobs=4 arch=i386
|
|
.Ed
|
|
Properties are as follows:
|
|
.Bl -tag -width memory=150
|
|
.It arch=value
|
|
Architecture of the concerned host.
|
|
(there should be a startup task to check consistency, but
|
|
currently this has to be set manually on heterogeneous networks.)
|
|
.It jobs=n
|
|
Number of jobs to run on that host, defaults to hw.ncpu.
|
|
.It memory=thr
|
|
Builds everything below that wrkdir threshold in /tmp, assuming
|
|
it is a memory filesystem.
|
|
Avoid for now, as mfs has serious race conditions which yield
|
|
random errors under stress conditions such as bulk build.
|
|
.It sf=n
|
|
Speed factor.
|
|
An estimate of that machine's speed with that number of jobs
|
|
compared to other machines in the same network.
|
|
Works better with small values, in the range of 1..50.
|
|
The machine (or machines) with the highest speed factor will
|
|
get access to all jobs, whereas other machines will be clamped
|
|
to stuff which does not take too long.
|
|
Requires previous build information to be effective.
|
|
.It timeout=s
|
|
Defines a specific connection timeout for ssh to that host.
|
|
.El
|
|
The
|
|
.Ar hosts
|
|
file can also define a start-up script, as
|
|
.Bd -literal
|
|
STARTUP=path
|
|
.Ed
|
|
which will be run at start-up on each machine.
|
|
.It Fl j Ar n
|
|
Number of concurrent local jobs to run (defaults to hw.ncpu if no hosts file).
|
|
.It Fl L Ar logdir
|
|
Choose a log directory.
|
|
.Po
|
|
Defaults to
|
|
.Pa ${PORTSDIR}/logs/${ARCH}
|
|
.Pc .
|
|
.It Fl m Ar threshold
|
|
Build ports besides the memory threshold within
|
|
.Pa /tmp .
|
|
Avoid for now, as mfs has serious race conditions which yield
|
|
random errors under stress conditions such as bulk build.
|
|
.It Fl P Ar subdirlist
|
|
Read list of pkgpaths from file
|
|
.It Fl q
|
|
Don't quit while errors/locks are around.
|
|
.It Fl r
|
|
Random build order.
|
|
Disregard any kind of smart heuristics.
|
|
Useful to try to find missing build dependencies.
|
|
.It Fl s
|
|
Compute workdir sizes before cleaning up, and stash them in log file
|
|
.Pa ${LOGDIR}/size.log .
|
|
.It Fl S Ar sizefile
|
|
Read a size log file and use it for choosing to put WRKDIR in memory.
|
|
.It Fl t Ar ctimeout
|
|
Connection timeout for ssh.
|
|
Defaults to 60 seconds.
|
|
.It Fl T Ar dtimeout
|
|
Display timeout (in seconds) while waiting for jobs to finish, so that the
|
|
display is updated even if jobs didn't finish.
|
|
Defaults to 10 seconds.
|
|
.It Fl u
|
|
Update existing packages during dependency solving.
|
|
Can be used to run a bulk-build on a machine with installed packages,
|
|
but might break a bit, since some packages only build on a clean machine
|
|
right now.
|
|
.It Fl U
|
|
Insist on updating existing packages during dependency solving,
|
|
even if the new package apparently didn't change.
|
|
.It Fl x
|
|
No tty report, only report really important things, like hosts going down
|
|
and coming back up, build errors, or builds not progressing.
|
|
.El
|
|
.Pp
|
|
.Nm
|
|
figures out in which order to build things on the fly, and constantly
|
|
displays information relative to what's currently building.
|
|
There's a list currently running, one line per task, with the task name,
|
|
local pid, the build host name, and advancement based on the log file size.
|
|
This is followed by a three-line display:
|
|
.Bl -tag -width BB=
|
|
.It I=
|
|
number of built packages that can be installed.
|
|
.It B=
|
|
number of built packages, not yet known to be installable,
|
|
because of possibly run depends that still need to be built.
|
|
.It Q=
|
|
number of packages in the queue, e.g., stuff that can be built now, assuming
|
|
we have a free slot.
|
|
.It T=
|
|
number of packages to build, where dependencies are not yet resolved.
|
|
.It !=
|
|
number of ignored packages.
|
|
.It L=
|
|
list of packages that cannot currently be built because of locks.
|
|
.It E=
|
|
list of packages in error, that cannot currently be built.
|
|
.El
|
|
.Pp
|
|
Note that those numbers refer to pkgpaths known to
|
|
.Nm .
|
|
In general, those numbers will be slightly higher than the actual number
|
|
of packages being built, since several paths may lead to the same package.
|
|
.Pp
|
|
.Nm
|
|
uses some heuristics to try to maximise Q as soon as possible.
|
|
There's also a provision for a feedback-directed build, where timings from
|
|
a previous build can be used to try to build long-running jobs first.
|
|
.Sh LOCKS AND ERRORS
|
|
When building a package,
|
|
.Nm
|
|
produces a lockfile in the lock directory, whose name is deduced from
|
|
the basic pkgpath with slashes replaced by dots, and a possible second lock
|
|
with the fullpkgpath.
|
|
This lockfile is filled with such info as the build start time or the host.
|
|
.Pp
|
|
At the end of a succesful build, these lockfiles are removed.
|
|
The fullpkgpath lock will stay around in case of errors.
|
|
.Pp
|
|
In this case, it contains the status of the last task that was run
|
|
.Po
|
|
raw
|
|
value from
|
|
.Xr wait 2
|
|
.Pc ,
|
|
and the name of the next task in the build pipeline (with todo=<nothing>
|
|
in case of failure during clean-up).
|
|
Normal list of tasks is:
|
|
.Ar depends prepare fetch patch configure build fake package clean .
|
|
.Pp
|
|
At the end of each job,
|
|
.Nm
|
|
rechecks the lock directory for existing lockfiles.
|
|
If some locks have vanished,
|
|
it will put the corresponding paths back in the queue and attempt
|
|
another build.
|
|
.Pp
|
|
This eases manual repairs: if a package does not build, the user can look
|
|
at the log, go to the port directory, fix the problem, and then remove the lock.
|
|
.Nm
|
|
will pick up the ball and keep building without interruption.
|
|
.Pp
|
|
One can also run several
|
|
.Nm
|
|
in parallel.
|
|
This is not optimal, since each
|
|
.Nm
|
|
ignores the others, and only uses the lock info to avoid the other's
|
|
current work, but it can be handy: in an emergency, one can start a second
|
|
.Nm
|
|
to obtain a specific package right now, in parallel with the original
|
|
.Nm .
|
|
.Pp
|
|
Note that
|
|
.Nm
|
|
is very careful not to run two builds from the same pkgpath at the
|
|
same time, even on different machines:
|
|
in some cases, MULTI_PACKAGES and FLAVOR combinations may lead to the
|
|
same package being built simultaneously, and since the package repository
|
|
is shared, this can easily lead to trouble.
|
|
.Pp
|
|
.Sh SHUTTING DOWN GRACEFULLY
|
|
.Nm
|
|
periodically checks for a file named
|
|
.Pa stop
|
|
in its log directory
|
|
If this file exists, then it won't start new jobs, and shutdown when
|
|
the current jobs are finished.
|
|
.Sh FILES
|
|
Apart from producing packages,
|
|
.Nm
|
|
will create a number of log files under
|
|
.Pa ${PORTSDIR}/logs/{$ARCH} :
|
|
.Bl -tag -width engine.log
|
|
.It Pa build.log
|
|
Actual build log.
|
|
Each line summarizes build of a single pkgpath, as:
|
|
.Sq pkgpath host time logsize (detailed timing)[!]
|
|
where time is the actual build time in seconds, host is the machine name
|
|
where this occurred, logsize is the corresponding log file size,
|
|
and a ! is appended in case the build didn't succeed.
|
|
.Pp
|
|
The detailed timing info gives a run-down of the build, with clean, fetch,
|
|
prepare, patch (actually extract+patch), configure, build, fake, package, clean
|
|
detailed timing info.
|
|
Note that the actual build time starts at
|
|
.Sq extract
|
|
and finishes at
|
|
.Sq package .
|
|
.It Pa clean.log
|
|
Paths that do not clean correctly, and required sudo to clean the directory.
|
|
.It Pa size.log
|
|
Size of work directory at the end of each build
|
|
.It Pa engine.log
|
|
Build engine log.
|
|
Each line corresponds to a state change for a pkgpath and starts with the pid
|
|
of
|
|
.Nm ,
|
|
plus a timestamp of the log entry.
|
|
.Bl -tag -width BB:
|
|
.It ^
|
|
pkgpath temporarily put aside, because a job is running in the same directory.
|
|
.It B
|
|
pkgpath built.
|
|
.It I
|
|
pkgpath can be installed.
|
|
.It J
|
|
job to build pkgpath started.
|
|
Also records the host used for the build.
|
|
.It L
|
|
job did not start, existing lock detected.
|
|
.It N
|
|
job did not finish.
|
|
The host may have gone down.
|
|
.It P
|
|
built package is no longer required for anything.
|
|
.It Q
|
|
pkgpath queued as buildable whenever a slot is free.
|
|
.It T
|
|
pkgpath to build.
|
|
.It V
|
|
pkgpath put back in the buildable queue, after job that was running in
|
|
the same directory returned.
|
|
.El
|
|
.It Pa locks/
|
|
Directory where locks are created.
|
|
The slash in a pkgpath is replaced with a dot like so:
|
|
.Pa locks/devel.make
|
|
to flatten the structure.
|
|
.It Pa packages/pkgname.log
|
|
one file or symlink per pkgname.
|
|
.It Pa paths/some/path.log
|
|
one file or symlink per pkgpath.
|
|
.It Pa signature.log
|
|
Discrepancies between hosts that prevent them from starting up.
|
|
.It Pa stats.log
|
|
Simple log of the B=... line summaries.
|
|
Mostly useful for making plots and tweaking performance.
|
|
.It Pa vars.log
|
|
Logs the directories that were walked in the ports tree for dependency
|
|
information.
|
|
.El
|
|
.Sh BUGS AND LIMITATIONS
|
|
.Nm
|
|
performs best with lots of paths to build.
|
|
When just used to build a few ports, there's a high risk of starvation
|
|
as there are bottlenecks in parts of the tree.
|
|
.Pp
|
|
.Nm
|
|
considers all pkgpaths it explores as valid candidates for packages.
|
|
This is not the case for some pkgpath:patch depends.
|
|
It should not try to reach them.
|
|
.Pp
|
|
.Nm
|
|
does not properly distinguish between default flavors and empty flavors.
|
|
This leads to a few errors in some multi-packages that have pseudo-flavors
|
|
that prevent their build.
|
|
.Pp
|
|
.Nm
|
|
Hot fixes to a port that change the pkgname or other properties won't be
|
|
used by
|
|
.Nm
|
|
after removing the lock.
|
|
It should rescan the directory for new properties and will eventually.
|
|
.Pp
|
|
On heterogeneous networks, calibration of build info and choice of speed
|
|
factors is not perfect, and somewhat a dark art.
|
|
Using distinct speed factors on a build log that comes from a single
|
|
machine works fine, but using the build info coming from several machines
|
|
does not work all that well.
|
|
.Pp
|
|
.Nm
|
|
should check
|
|
.Pa /usr/include
|
|
and
|
|
.Pa /usr/X11R6/include
|
|
for consistency, but it doesn't.
|
|
.Pp
|
|
When an host fails consistency check, there is no way to re-add it after
|
|
fixing the problem.
|
|
You have to stop
|
|
.Nm ,
|
|
cleanup and restart.
|
|
.Pp
|
|
There's a bug in mfs that prevents it from proper use in bulk builds.
|
|
.Pp
|
|
The default limits in
|
|
.Pa login.conf
|
|
are too small for bulk builds on any kind of parallel machines.
|
|
Bump number of processes.
|
|
.Pp
|
|
Even though
|
|
.Nm
|
|
tries really hard to check heterogeneous networks for sanity (checking
|
|
shared libraries and .la files), it is still dependent on the user to
|
|
make sure all the hosts build ports the same way.
|
|
.Pp
|
|
Make sure your NFS setup is consistent (the ports dir itself should be
|
|
exported, including distfiles and packages repository, but the WRKOBJDIR
|
|
should not be in most cases). Pay particular attention to discrepancies
|
|
in
|
|
.Pa /etc/mk.conf .
|
|
.Pp
|
|
Also,
|
|
.Nm
|
|
connects to external hosts through
|
|
.Xr ssh 1 ,
|
|
relying on
|
|
.Xr ssh_config 5
|
|
for any special cases.
|
|
.Sh AUTHOR
|
|
Marc Espie
|
|
.Sh HISTORY
|
|
The original
|
|
.Nm dpb
|
|
command was written by Nikolay Sturm.
|
|
This version is a complete rewrite from scratch using all the stuff
|
|
we learnt over the years to make it better.
|
|
.Pp
|
|
There are still a number of changes to make, and some possible avenues
|
|
to explore.
|
|
.Pp
|
|
Better build feedback for next builds would be nice: we need a way to
|
|
calibrate build logs that contain info for several machines (so that we
|
|
can gauge whether a machine is fast or slow).
|
|
It might make sense to have some kind of machine affinity for big packages
|
|
in a cluster, so that we avoid reinstalling big things on each machine if
|
|
we can get away with installing stuff on a single machine.
|
|
We should probably keep the pkgnames around with the pkgpath in the build-log,
|
|
so that we give more credibility to build times that correspond to the
|
|
exact same pkgnames.
|
|
.Pp
|
|
We should integrate mirroring functionalities.
|
|
This mostly involves having
|
|
.Sq special
|
|
jobs with no cpu requirements that can run locally,
|
|
and to have a step prior to
|
|
.Sq tobuild ,
|
|
where fetch would occur.
|
|
The same logic that was used for pkgpaths should be used to handle distfiles,
|
|
and we should probably add some kind of lock based on the ftp site being
|
|
used to grab distfiles.
|
|
(This is low priority, as most build machines currently being used already
|
|
have the distfiles).
|