556 Commits

Author SHA1 Message Date
Kris Kennaway
88e9a32308 * Python daemon run as root that proxies privileged build commands for
  the ports-* users.  Currently it is not possible to delegate
  management of ZFS filesystems to non-root users, so root privilege
  is required to manipulate them.  We validate the command passed on
  a local domain socket and re-execute the build script with the requested
  parameters.
2008-07-26 15:24:13 +00:00
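A minimal Python sketch of the "validate, then re-execute" step, assuming requests arrive as one whitespace-separated line per connection. The subcommand and arch allowlists and the build script path are hypothetical placeholders, and the socket handling itself is omitted.

```python
import re
import shlex

# Hypothetical allowlists; the daemon's real accepted commands are not given here.
ALLOWED_SUBCOMMANDS = {"create", "clone", "updatesnap", "destroy"}
ALLOWED_ARCHES = {"amd64", "i386", "sparc64"}
BUILD_SCRIPT = "/var/portbuild/scripts/build"  # hypothetical path

# A "safe token" starts with an alphanumeric (so no leading "-" option
# injection) and contains no "/" (so it cannot name another path).
SAFE_TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def validate_request(line: str) -> list[str]:
    """Turn one request read from the socket into an argv, or raise ValueError."""
    words = shlex.split(line)
    if len(words) < 3:
        raise ValueError("expected: <subcommand> <arch> <branch> [args...]")
    subcmd, arch, branch, *rest = words
    if subcmd not in ALLOWED_SUBCOMMANDS:
        raise ValueError(f"subcommand not permitted: {subcmd}")
    if arch not in ALLOWED_ARCHES:
        raise ValueError(f"unknown arch: {arch}")
    for arg in [branch, *rest]:
        if not SAFE_TOKEN.fullmatch(arg):
            raise ValueError(f"argument not permitted: {arg!r}")
    # The argv is meant for subprocess without a shell, so no extra quoting.
    return [BUILD_SCRIPT, subcmd, arch, branch, *rest]
```

The daemon would then pass the returned list to `subprocess.run()` as root; rejecting anything outside a narrow token grammar keeps the privileged side from ever interpreting caller-supplied shell syntax.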
Kris Kennaway
d1aea0930d Script run from cron to regularly update the master ZFS copies of the
ports and source trees.  Since we have more than one consumer of these
trees that runs frequently but does not insist on up-to-the-second
trees, it makes sense to "pre-update" them regularly and then re-use
them in all of the consumers, instead of potentially doing several
updates simultaneously or on demand.  Consumers can clone the ZFS
snapshot into their local filesystem, which takes a couple of seconds
instead of minutes or tens of minutes for a CVS update.

We update to a date stamp instead of "." because this avoids
ambiguity about commits that happen while the tree update is in
progress (unfortunately, it is slower).
2008-07-26 15:16:16 +00:00
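A small sketch of building the date-pinned CVS invocation, assuming the script takes a single "now" timestamp and that CVS's date parser accepts the `YYYY-MM-DD HH:MM:SS UTC` form used here.

```python
from datetime import datetime, timezone


def cvs_update_argv(now: datetime) -> list[str]:
    """Argv for a CVS update pinned to one date stamp instead of the head (".")."""
    # Fixing the stamp once means every file reflects the same instant,
    # even if commits land while the update is still walking the tree.
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    # -q: quieter output; -d: create new directories; -P: prune empty ones;
    # -D: check out files as they were at `stamp`.
    return ["cvs", "-q", "update", "-d", "-P", "-D", stamp]
```

It would be run with the tree as the working directory, e.g. `subprocess.run(cvs_update_argv(datetime.now(timezone.utc)), cwd=tree, check=True)`, before taking the ZFS snapshot.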
Kris Kennaway
9ed197c29c Script run from cron on the package clients to report metrics to ganglia.
Currently we collect:

* The current and maximum number of vnodes in use

* The number of packages built over the past hour
2008-07-26 15:09:00 +00:00
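A sketch of the two metrics, assuming package completions are recorded as epoch timestamps and that each value is published with ganglia's `gmetric` command. On FreeBSD the vnode figures can be read with `sysctl -n vfs.numvnodes kern.maxvnodes`; the metric names below are hypothetical.

```python
def packages_built_last_hour(completion_times: list[float], now: float) -> int:
    """Count completion timestamps (epoch seconds) within the last hour."""
    return sum(1 for t in completion_times if now - 3600 <= t <= now)


def gmetric_argv(name: str, value: int, units: str) -> list[str]:
    """Argv for ganglia's gmetric to publish one integer metric."""
    return ["gmetric", f"--name={name}", f"--value={value}",
            "--type=uint32", f"--units={units}"]
```

Running from cron, each invocation would publish e.g. `gmetric_argv("vnodes_used", n, "vnodes")` and `gmetric_argv("packages_hour", count, "packages")`.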
Kris Kennaway
4663e0b500 Simple script to expire ZFS snapshots older than a certain age 2008-07-26 15:06:41 +00:00
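The selection logic of such a script fits in a few lines. This sketch assumes the snapshot list has already been read (for example from `zfs list -H -t snapshot -o name,creation`) and the creation column converted to epoch seconds.

```python
def expired_snapshots(snapshots, now, max_age):
    """Names of snapshots created more than max_age seconds before now.

    snapshots: iterable of (name, creation_epoch) pairs.
    """
    return [name for name, created in snapshots if now - created > max_age]
```

Each returned name would then be passed to `zfs destroy`.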
Kris Kennaway
17885ef52d Python script for backing up ZFS filesystems on pointyhat. For each
listed filesystem we take a new snapshot each time it is run and if
the last full backup was not too long ago, do a compressed incremental
backup from the previous backup.
2008-07-26 15:05:58 +00:00
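A sketch of the full-versus-incremental decision, assuming the script tracks the previous backup's snapshot name and the time of the last full backup. The stream would be piped through a compressor (which one is an assumption, e.g. bzip2) into the backup file.

```python
def backup_argv(snapshot, previous, last_full, now, max_full_age):
    """Choose a full or incremental `zfs send` for the new snapshot.

    previous: the snapshot the last backup was taken from, or None.
    last_full: epoch seconds of the last full backup.
    """
    if previous is None or now - last_full > max_full_age:
        return ["zfs", "send", snapshot]  # full stream
    # -i: send only the changes between `previous` and `snapshot`.
    return ["zfs", "send", "-i", previous, snapshot]
```

Forcing a periodic full backup bounds how many incremental streams must be replayed to restore.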
Kris Kennaway
00cada47c5 * Add comment that this is unused 2008-07-26 15:01:50 +00:00
Kris Kennaway
b472fe55ac * Add comments 2008-07-26 15:01:30 +00:00
Kris Kennaway
e683ebb83c * Cleanup
* Catch up to build ID directory changes

* Support a meta-hostname of 'all' for setting up all clients at once.
  This is better than the old way of running one copy of the script
  for each client by hand, since it is easier and involves less
  duplicated work.

* We copy in the per-build ports, src, and bindist .tbz files and .md5
  checksums, as well as refreshing the build scripts and
  bindist-$(hostname).tar customization tarball.

* The -force switch forces copying of files and re-extraction of the
  tarballs on the client.  This is necessary in order to propagate
  local changes to the tarballs after the initial client setup
  (e.g. if you need to change a file in the ports tree, it must be
  recompressed, redistributed, and re-extracted on the client).

* The -queue switch will poll the client's job queue after completion
  of the setup.  This is racy and should only be used when the machine
  is not currently accepting jobs.

* For cleaning up a build, the 'build cleanup' command should now be
  used instead.  It calls back into this command but also allows full
  cleanup of build-local files on the client.

TODO: "all" setups are hard on the server since they may spawn dozens
of rsyncs at once.  A better solution would be to have a worker pool
of setup tasks to limit the maximum load.
2008-07-26 15:00:37 +00:00
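The worker-pool idea from the TODO can be sketched with a bounded thread pool. `setup_one` stands for a hypothetical per-client setup function (the one that would run rsync), and `max_parallel=8` is an arbitrary illustrative limit.

```python
from concurrent.futures import ThreadPoolExecutor


def setup_all(clients, setup_one, max_parallel=8):
    """Run setup_one(client) for every client, with at most max_parallel
    setups (and hence rsyncs) in flight at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() preserves input order, so results line up with clients.
        return dict(zip(clients, pool.map(setup_one, clients)))
```

An "all" setup then costs the server at most `max_parallel` concurrent transfers instead of one per client.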
Kris Kennaway
89c8fd897f * Cleanup
* Catch up to build ID directory changes
* Make it easier to kill a build by not running dopackages in the background
  where it is detached from shell job control.  Now, sending a termination
  signal to this process (e.g. ^C) will also kill off the dopackages script
  and in turn the processes created by it.  Some background processes
  spawned by dopackages, pdispatch, etc., may still remain and need to be
  killed by hand.
2008-07-26 14:52:05 +00:00
Kris Kennaway
4a7f6d83cb * Cleanup
* Catch up to build ID directory changes

* Improve usage()

* Fix a variety of small bugs

* Remove support for -ftp builds: we have not supported direct
  uploading for many years due to the desire to manually inspect build
  output for quality.

* All data associated with a build is now localized in its own directory
  named according to a build ID:
  /var/portbuild/${arch}/${branch}/builds/${buildid}, where ${buildid}
  is the creation time.  These are actually ZFS filesystems.

* Tasks such as cloning a new build, updating a ZFS snapshot, and
  cleaning up a build are moved into the separate "build" script, which
  can also be used independently.

* Creating a new build is done by ZFS cloning and takes a couple of
  seconds since it is copy-on-write (i.e. no data needs to be copied).

* Ports and source trees are also cloned from pre-updated ZFS images
  (updated regularly from the "updatesnap" cron job).  In most cases
  we do not care if we are building a ports tree that is an hour or so
  old since it will become outdated almost immediately anyway, so no
  matter what we do there will be times when a port has been fixed by
  the time the build error is generated by a client.

* In case an up-to-the-second tree is desired, the -portscvs and
  -srccvs switches update the existing ports tree via CVS.

* -noports and -nosrc can be used to prevent any automatic changes to
  the ports and src trees respectively.  This is useful for dealing
  with local modifications (e.g. for -exp builds), since the default
  when creating a new build is to replace the previous trees with
  fresh, pristine trees.  If you forget to use these switches, any
  local changes that are not also present in other trees will be lost.

* By default we keep two builds for each arch/branch pair.  These
  build IDs also may be referred to via "latest" and "previous"
  symlinks.  When creating a new build, the old "previous" build is
  destroyed by default, unless it was originally created using the
  -keep switch.  This prevents the build from being destroyed
  automatically.

* By default when a build finishes all of the clients are completely
  cleaned up (i.e. all build data such as ports trees, tarballs,
  client chroots, etc are deleted).  This is needed to save space on
  the clients.  If you expect to *immediately* perform further builds
  after this one completes, the -nocleanup switch prevents this step.
  Otherwise they will just be set up again if further builds are
  scheduled.

* Try to parallelize build pre-processing as much as possible by
  running jobs in the background wherever we can.  In several places
  we operate on the same parts of the filesystem from multiple jobs,
  so we can make good use of caching to improve performance.
* Clients no longer need to be set up explicitly at the start of the
  build; they will be set up on demand when the first job is
  dispatched to them.  This allows fast clients or those that already
  have been set up to begin building ports as soon as possible, while
  slow clients are set up in the background.  It also improves
  robustness of client recovery, e.g. if the client was offline at the
  time of build startup but later brought back online.

* Optimize copying back in the previous set of restricted packages by
  hardlinking instead of copying.
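The "keep two builds" rotation above can be sketched as a pure function. It assumes build IDs are fixed-width creation timestamps (so string order is age order); the actual ID format and the symlink updates are not shown in the log.

```python
def builds_to_destroy(existing, new_id, kept):
    """Build IDs to destroy when new_id becomes "latest".

    existing: build IDs present before new_id is created.
    kept: IDs that were created with -keep and must never be destroyed.
    """
    older = sorted(b for b in existing if b != new_id)
    # The newest of the old builds becomes "previous"; everything older
    # goes, except builds protected by -keep.
    return [b for b in older[:-1] if b not in kept]
```

After destroying these, "latest" would point at `new_id` and "previous" at the newest of the older builds.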

TODO: The record of failed ports is still arch/branch-global.  This is
the only thing preventing us from running concurrent builds of the
same arch/branch (e.g. while one is stuck building openoffice, the
next build could start to keep the cluster busy).  The difficulty is
that a build from a later ports tree may record that a port built
successfully, and then a phase 2 build from an earlier ports tree may
record that it is broken.  The solution is probably to migrate this
to a real database instead of a flat file, and query it for the set of
broken ports as of a certain ports tree date.
2008-07-26 14:49:26 +00:00
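One way the database idea in the TODO could look, sketched with SQLite. The schema (port, ports tree date, success flag) is a hypothetical design: because each result is keyed by the tree it came from, a late-arriving phase 2 failure from an older tree does not override a success from a newer one.

```python
import sqlite3


def open_results_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS results "
               "(port TEXT, tree_date TEXT, ok INTEGER)")
    return db


def record(db, port, tree_date, ok):
    db.execute("INSERT INTO results VALUES (?, ?, ?)", (port, tree_date, int(ok)))


def broken_as_of(db, tree_date):
    """Ports whose newest result from a tree no later than tree_date is a failure.

    tree_date uses ISO 8601 strings, which compare correctly as text.
    """
    rows = db.execute(
        """SELECT DISTINCT r.port FROM results AS r
           WHERE r.ok = 0
             AND r.tree_date = (SELECT MAX(tree_date) FROM results
                                WHERE port = r.port AND tree_date <= ?)
           ORDER BY r.port""",
        (tree_date,))
    return [port for (port,) in rows]
```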
Kris Kennaway
efe865a26c * Catch up to build ID directory changes
* Clients no longer mount ports/src trees via NFS (even the FreeBSD.org
  local clients).  This was putting too much load on the server and
  slowing down builds.

* Instead, ports and src .tbz files are pushed to the clients and
  unpacked.  MD5 checksums are used to verify correctness.

* -force forces re-extraction of the tarballs even if they exist and
  appear to be checked out.

* Also unpack the compressed bindist

TODO: When we are not using md or ZFS builds it would be even faster
to keep an unpacked copy of the bindist on the scratch filesystem and
hardlink the files into the target directory.
2008-07-26 14:19:31 +00:00
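A sketch of the checksum verification step on the client, assuming each tarball ships with a companion `.md5` file in FreeBSD `md5(1)` output style (`MD5 (file) = <hex>`) or as a bare digest.

```python
import hashlib
import re


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in chunks so large tarballs are not loaded whole."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_md5(path: str, md5_path: str) -> bool:
    """Compare a file against the digest stored in its .md5 companion file."""
    with open(md5_path) as f:
        # Find the first 32-hex-digit token, whichever output style was used.
        match = re.search(r"\b[0-9a-f]{32}\b", f.read().lower())
    if match is None:
        raise ValueError(f"no MD5 digest found in {md5_path}")
    return file_md5(path) == match.group(0)
```

A mismatch would trigger a fresh copy and re-extraction, which is also what -force requests unconditionally.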
Kris Kennaway
b9dde2b9f8 * Catch up to build ID directory changes
* Optimize by copying old packages using cpio -dumpl (i.e. create hardlinks
  instead of copying the files).
2008-07-26 14:14:35 +00:00
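For readers unfamiliar with `cpio -dumpl`, this Python sketch mirrors what it does in pass-through mode: create directories (-d), replace existing files unconditionally (-u) and link instead of copying (-l). It is an illustration, not the script's code; -m is moot here, since a hardlink shares the inode and therefore the mtime.

```python
import os


def link_tree(src: str, dst: str) -> None:
    """Mirror src into dst using hardlinks instead of copying file data.

    Both paths must be on the same filesystem for os.link() to work.
    """
    for root, _dirs, files in os.walk(src):
        target = os.path.join(dst, os.path.relpath(root, src))
        os.makedirs(target, exist_ok=True)
        for name in files:
            dest = os.path.join(target, name)
            if os.path.lexists(dest):
                os.unlink(dest)
            os.link(os.path.join(root, name), dest)
```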
Kris Kennaway
07e904cab8 * Catch up to build ID directory changes 2008-07-26 14:13:35 +00:00
Kris Kennaway
1d5ba88d7a * Cleanup
* Catch up to build ID directory changes
* Remove need for /etc/arch file
2008-07-26 14:12:53 +00:00
Kris Kennaway
9f29c725dd * Cleanup
* Catch up to build ID directory changes
* Improved support for ZFS
* Desupport X11BASE
2008-07-26 14:12:28 +00:00
Kris Kennaway
f8a634d336 * Cleanup
* Catch up to build ID directory changes
* Improved support for ZFS builds
* Improved robustness
* Report status verbosely to the caller; whether we succeeded in claiming
  a chroot, whether the caller needs to first set up the client, or
  whether a setup is in progress.
* If we discover that the client has not been set up either because it
  freshly booted and newfs'ed its filesystem, or because a particular
  build has not yet been encountered, atomically claim a cookie and
  report this to the caller to act on.
2008-07-26 14:11:26 +00:00
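One common way to "atomically claim a cookie" is an exclusive create, sketched here in Python. The original claim-chroot is a shell script and may use a different primitive (e.g. mkdir); the cookie path is hypothetical.

```python
import os


def try_claim(cookie_path: str) -> bool:
    """Atomically create cookie_path.  Returns True if this caller created it
    (and so is responsible for the client setup), False if it already existed."""
    try:
        # O_CREAT|O_EXCL fails if the file exists; on a local filesystem the
        # check and the create happen as one step, so two racing callers
        # cannot both win.
        fd = os.open(cookie_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()}\n")  # record the owner for debugging
    return True
```

The winner reports "needs setup" to its caller; every loser reports "setup in progress".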
Kris Kennaway
316ad2a0a7 * Cleanup
* Catch up to build ID directory changes
2008-07-26 14:07:49 +00:00
Kris Kennaway
1dc6876bab * Cleanup
* Catch up to build ID directory changes
* Add helper functions for resolving a build ID symlink and
  validating an arch/branch combination (centralized here instead of
  being duplicated across many scripts)
2008-07-26 14:06:30 +00:00
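A sketch of the two helpers, assuming the `${pbd}/${arch}/${branch}/builds/${buildid}` layout described in the build ID commit, with "latest" and "previous" as relative symlinks. The arch and branch tables are hypothetical placeholders.

```python
import os

# Hypothetical tables; the real lists are not shown in this log.
VALID_ARCHES = {"amd64", "i386", "sparc64"}
VALID_BRANCHES = {"6", "7", "7-exp", "8", "8-exp"}


def validate(arch: str, branch: str) -> None:
    if arch not in VALID_ARCHES or branch not in VALID_BRANCHES:
        raise ValueError(f"invalid arch/branch combination: {arch}/{branch}")


def resolve_buildid(pbd: str, arch: str, branch: str, buildid: str) -> str:
    """Map "latest"/"previous" (symlinks) or a literal build ID to a real ID."""
    validate(arch, branch)
    path = os.path.join(pbd, arch, branch, "builds", buildid)
    if os.path.islink(path):
        return os.path.basename(os.readlink(path).rstrip("/"))
    if os.path.isdir(path):
        return buildid
    raise FileNotFoundError(path)
```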
Kris Kennaway
5acb87ae92 * Desupport alpha and ia64
* Catch up to build ID directory changes
2008-07-26 14:05:01 +00:00
Kris Kennaway
46356ad8f8 * Add comments 2008-07-26 14:04:23 +00:00
Kris Kennaway
0b457b9cf0 * Implement basename and dirname using shell builtins 2008-07-26 14:02:55 +00:00
Kris Kennaway
f204e78013 * Cleanup
* Catch up to build ID directory changes
* Record package build completion for reporting to ganglia
2008-07-26 14:02:38 +00:00
Kris Kennaway
90e209c3d9 * Cleanup
* Catch up to build ID directory changes
* Add support for ssh_cmd and scp_cmd to allow using HPN-SSH with the
  none cipher where possible (for performance)
* Lazy client setup; claim-chroot will report if the client needs to be
  set up with this buildid, and we initiate the setup and poll until
  it is complete.  This allows fast clients to begin building before
  slow ones have finished setting up.

TODO: a better solution would be to avoid trying to dispatch jobs onto
clients that are in the process of setting up, since they often have low
loads and are picked preferentially by the job scheduler.
2008-07-26 14:01:07 +00:00
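The improvement suggested in the TODO amounts to filtering clients that are mid-setup before choosing the least-loaded one. This sketch assumes a hypothetical per-machine record of (load, running jobs, job slots).

```python
def pick_machine(machines, setting_up):
    """Least-loaded machine with a free job slot, skipping clients whose
    setup is still in progress.

    machines: {name: (load, running_jobs, max_jobs)}
    setting_up: set of names currently being set up.
    """
    candidates = [
        (load, name)
        for name, (load, running, max_jobs) in machines.items()
        if name not in setting_up and running < max_jobs
    ]
    return min(candidates)[1] if candidates else None
```

A client that is busy extracting tarballs shows a low load, so without the `setting_up` filter it would keep winning the comparison.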
Kris Kennaway
a52cf32275 * Cleanup
* Remove vestiges of archaic support for building bindists from FTP
  snapshots; we haven't used this for years and building a world is no
  longer a challenge
* Revert half-baked bindist generation number and make it per-buildid
  instead.  Compress and md5 it for distribution to the clients.

TODO: Merge with makeworld?
2008-07-26 13:54:59 +00:00
Kris Kennaway
46a114508f * Cleanup
* Catch up to build ID directory changes
* Optimize by using ECHO_MSG=true instead of /usr/bin/true
* Try harder to avoid pollution from local host
2008-07-26 13:52:32 +00:00
Kris Kennaway
18cafe9ff8 * Cleanup
* Catch up to build ID directory layout
2008-07-26 13:51:30 +00:00
Kris Kennaway
1ba1b7f79e * Cleanup
* Catch up to build ID directory changes
* Export the INDEX_PRISTINE and INDEX_QUIET variables (old bug)
* Desupport X11BASE
2008-07-26 13:50:15 +00:00
Kris Kennaway
4bcc698d1c * Cleanup
* Catch up to build ID directory changes
* Desupport 5.x
2008-07-26 13:47:45 +00:00
Kris Kennaway
e9fe4c9896 * Cleanup
* Catch up to build ID directory changes
* Optimize by using __MAKE_SHELL=/rescue/sh
2008-07-26 13:47:03 +00:00
Kris Kennaway
23fa193076 Rewrite in python and combine the functions of the former
checkmachines script.  Polls build machines for their status either
once-off or regularly as a daemon.  Optionally it will update the
queue entries but this remains subject to race conditions.

TODO: Integrate with queue manager and forward machine status changes
to it
2008-07-26 13:45:19 +00:00
Kris Kennaway
335c9a9ec3 More verbose status reporting using key=value format. We now also
report error status, architecture and OS version, and available build
environments, as well as load and number of running jobs.
2008-07-26 13:42:14 +00:00
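A sketch of how a poller might parse the key=value report, assuming one pair per line. The key names in the usage below are hypothetical; the commit lists only the kinds of information reported.

```python
def parse_status(text: str) -> dict[str, str]:
    """Parse one machine's status report, assumed to be one key=value per line."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            status[key.strip()] = value.strip()
    return status
```

For example, `parse_status("arch=amd64\nload=1.50\njobs=3\n")` yields `{"arch": "amd64", "load": "1.50", "jobs": "3"}`; lines without "=" are ignored.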
Mark Linimon
29b9f9c62d Reflect latest changes from production:
- no more 5-exp
- add 8, 8-exp
- fix two error-name hrefs
2008-07-02 08:44:20 +00:00
Kris Kennaway
a530ab018b This conversion script is no longer useful 2008-06-25 22:27:17 +00:00
Kris Kennaway
cabd6f0d4a Modernize this script a bit.
* Remove 5.x support
* Leave the archaic ftp snapshot support for now; it is not hurting
  anything, but it will not work
* Be more careful when removing files (use absolute paths)
* Switch to bindist/tmp for the tmp dir
* Fix the recording of the bindist.tar generation number
* Get rid of redundant or useless processing of the world image
2008-06-11 13:30:35 +00:00
Kris Kennaway
8a8d78247c * Distfile collection is now the default; replace -distfiles with -nodistfiles
* Record the CVS update stamp in some extra places and make sure to remove it
  if the build is started with -noportscvs (since this probably means the
  ports tree was updated by hand at some random time)
2008-06-11 13:28:30 +00:00
Kris Kennaway
fce1fcb22a Add some test -d's to avoid cd'ing into directories that do not exist 2008-06-11 13:25:49 +00:00
Kris Kennaway
fdbc5869f0 Major optimizations. Instead of copying the distfiles around, mv
them in batches according to their target directory.
2008-06-11 13:25:13 +00:00
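The batching idea can be sketched as grouping files by destination and emitting one `mv` per directory. This is an illustration in Python; very large batches would also need splitting to stay under the kernel's argument-length limit, which is not shown.

```python
from collections import defaultdict


def mv_batches(moves):
    """Group (source_file, target_dir) pairs into one `mv` argv per directory.

    One `mv a b c dir` replaces a copy per file; within one filesystem mv
    is just a rename, so no file data is copied at all.
    """
    batches = defaultdict(list)
    for src, target_dir in moves:
        batches[target_dir].append(src)
    return [["mv", *srcs, target] for target, srcs in sorted(batches.items())]
```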
Kris Kennaway
efaa197bfb Revive this script and make it useful. Transfer the distfiles using rsync
and make sure they have been post-processed first.
2008-06-11 13:24:17 +00:00
Kris Kennaway
271351e954 * Catch up to X11R6 removal
* Keep RESTRICTED distfiles in a separate DISTDIR so we can easily
  avoid accidentally publishing them to the FTP site (idea from des@)
2008-06-11 13:22:58 +00:00
Kris Kennaway
5953f77e24 Rewrite this to make it more efficient (fewer external command
invocations).  It also fixes some edge cases that were not handled in
the previous version.

TODO: Correctly report IPv6 sockets (already in use by the sparc64 build)
2008-06-02 19:46:03 +00:00
Mark Linimon
11ea3eeed1 Remove the force file, if it was used.
Forgotten by:	linimon
2008-05-29 14:14:54 +00:00
Mark Linimon
ed5147b294 Add a force function here, just like processlogs. 2008-05-29 02:34:41 +00:00
Mark Linimon
4cc67bd216 Add a note that processonelog and processlogs2 are finicky about the
header format of the log files.
2008-05-29 01:46:08 +00:00
Mark Linimon
f5c9292932 Fix these after the 1.31 update to buildscript.
Forgotten by:	pav
2008-05-29 01:45:39 +00:00
Kris Kennaway
9100b4ee0d Rewrite this in python instead of shell. Because we can read the
INDEX once and process internally instead of invoking many external
utilities, runtime is improved from ~20 minutes to <10 seconds.
2008-05-25 18:07:49 +00:00
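The speedup comes from reading INDEX once into memory. A minimal parser might look like this, assuming the usual 13-field pipe-separated ports INDEX layout (field positions are listed in the docstring and should be checked against the real file).

```python
def read_index(lines):
    """Parse ports INDEX lines into {pkgname: {...}} in a single pass.

    Assumed layout: 0 pkgname, 1 origin, 2 prefix, 3 comment, 4 descr,
    5 maintainer, 6 categories, 7 build deps, 8 run deps, 9 www,
    then extract/patch/fetch deps.
    """
    index = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("|")
        index[fields[0]] = {
            "origin": fields[1],
            "categories": fields[6].split(),
            "build_deps": fields[7].split(),
            "run_deps": fields[8].split(),
        }
    return index
```

Every later lookup is then a dictionary access rather than another grep or awk over the file.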
Kris Kennaway
778518d745 New build scheduler written in python to replace the make+sh job
ordering, which had become too limited.

We now build packages ordered by those that are part of the longest
dependency chains first.  This has the effect of building the deepest
parts of the tree first and levelling out the tree height, hopefully
avoiding the situation we currently face, where bottlenecks appear
late in the build and the cluster becomes mostly idle while waiting
for a few long dependency chains to finish building before it can
become fully loaded again.

The algorithm is that we sort the list of remaining packages according
to height (longest dependency chain), then add leaf packages from each
in order until we have filled a queue of length between 100 and 200,
to amortise the cost of this queue rebalancing while not losing the
height averaging property.  Jobs are dispatched from this queue into
worker threads as machine slots become available.

Unlike the make-based solution, which required a fixed -j concurrency
value and could not respond to addition/removal of build resources, we
can now dynamically add new machines to the queue as they become
available.

The other advantage of using python is that we have more
customisability and visibility into the build status, e.g. we
periodically report the number of remaining packages, as well as the
list of deepest packages that we are working on.

TODO:

* Implement mtime checking for parent package staleness, so that
  parents are rebuilt if the dependencies are touched more recently.
  Currently packages will not be rebuilt if they exist, whether or not
  they are "stale" with respect to their dependencies.

* Offload the machine selection into an external queue manager.
  Currently the queue manager used here doesn't interoperate with the
  old one (getmachine/releasemachine) because it's not possible to use
  the lockf()-based mutual exclusion within a multithreaded client.
  Doing that will also allow for a more flexible job placement
  algorithm as well as finer queue customization.
2008-05-10 13:22:51 +00:00
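The height ordering and queue filling described above can be sketched as follows. This is one reading of "add leaf packages from each in order" (for each package, by decreasing height, queue the packages below it whose dependencies are all built). It assumes an acyclic dependency map in which every dependency is also a key, and it leaves out threading and dispatch; `target=100` reflects the lower bound of the queue length from the message.

```python
def heights(deps):
    """Length of the longest dependency chain rooted at each package (leaf = 1)."""
    memo = {}

    def h(pkg):
        if pkg not in memo:
            memo[pkg] = 1 + max((h(d) for d in deps[pkg]), default=0)
        return memo[pkg]

    return {pkg: h(pkg) for pkg in deps}


def fill_queue(deps, built, target=100):
    """Pick buildable packages, starting under the tallest remaining chains."""
    remaining = {p: [d for d in ds if d not in built]
                 for p, ds in deps.items() if p not in built}
    height = heights(remaining)

    def leaves(pkg):
        # Remaining packages below pkg whose own dependencies are all built.
        stack, seen, found = [pkg], set(), []
        while stack:
            p = stack.pop()
            if p in seen:
                continue
            seen.add(p)
            if remaining[p]:
                stack.extend(remaining[p])
            else:
                found.append(p)
        return found

    queue, queued = [], set()
    for pkg in sorted(remaining, key=lambda p: (-height[p], p)):
        for leaf in leaves(pkg):
            if leaf not in queued:
                queued.add(leaf)
                queue.append(leaf)
        if len(queue) >= target:
            break
    return queue
```

Recomputing heights only when the queue runs low is what amortises the cost of the rebalancing; the mtime-based staleness check from the TODO is not included.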
Pav Lucistnik
989ac675fc Remove XFree86-4 from quickports 2008-04-11 11:34:30 +00:00
Pav Lucistnik
b88fea571b Parallelize to 4 concurrent jobs 2008-04-11 11:33:38 +00:00
Pav Lucistnik
0c02d135ff Include per-machine configuration and respect use_zfs flag 2008-04-11 11:32:29 +00:00
Pav Lucistnik
2545b7c79e Sync with pointyhat (reorg) 2008-04-11 11:31:33 +00:00