[12] WHEN ARE THE SYSTEM MAINTENANCE WINDOWS? WHY THE LOW UPTIME?

Typically the SDF Public Access UNIX System is available to its
members and, in some cases, the general public 24 hours a day,
7 days a week, 365 days a year, 10 years a decade, 25 years a
quarter century .. and so on.

That being said, there are unforeseen issues that can cause the
system to become unavailable:

1. Hard Disk Crash - We have several spare drives, some of
them already plugged in and ready to be used. In the
best-case scenario, no maintenance window is required.

2. Fire - In the case of fire all SDF machines must be shut
down unless the fire is an isolated occurrence.

3. Natural Disaster - In the Spring (Apr-May) we do get
affected by lightning strikes in our area due to heavy
thunderstorms. In the best case, the UPS systems filter
the spikes and dips, which allows SDF to run uninterrupted.

4. Software Bug - These do crop up from time to time and are
usually related to system updates. On SDF we typically
let the public access machines lag behind NetBSD
development in order to test new releases in our lab before
subjecting the userbase to 'new bugs'.

5. Routine and Scheduled Maintenance - Please read below.

6. Hardware Component Failure - We have many spare machines,
some completely cabled up and ready to go at the flick of
a remote command. If an SDF client host becomes completely
unrecoverable, a spare can be put into operation within
minutes. Keep in mind that while all of your personal files
are hosted on the file server, the /tmp directory is exclusive
to each SDF client host.

ROUTINE AND SCHEDULED MAINTENANCE

There is a weekly maintenance window on Sunday mornings from
02:00 AM to 03:00 AM. This window is not always used, and when it
is, it is used only briefly. Five minutes prior to a shutdown or
runlevel transition, all logged-in members will be notified on their
terminals. If you see this message alerting you to system maintenance,
you should save all open files and prepare to log out.
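
If you want a script or long-running job to know whether it is inside
the weekly window, a check along these lines may help. This is only a
sketch: the function and constant names are made up for illustration,
and it assumes the timestamp is already in the server's local time,
since this FAQ does not name a time zone.

```python
from datetime import datetime, time

# Window taken from the text above: Sundays, 02:00 to 03:00.
# Assumption: timestamps are in the server's local time.
WINDOW_WEEKDAY = 6          # datetime.weekday(): Monday=0 ... Sunday=6
WINDOW_START = time(2, 0)
WINDOW_END = time(3, 0)


def in_maintenance_window(moment):
    """Return True if `moment` falls inside the weekly window."""
    return (moment.weekday() == WINDOW_WEEKDAY
            and WINDOW_START <= moment.time() < WINDOW_END)


# 2024-01-07 was a Sunday.
print(in_maintenance_window(datetime(2024, 1, 7, 2, 30)))
print(in_maintenance_window(datetime(2024, 1, 7, 3, 0)))
```

The end of the window is treated as exclusive, so 03:00 sharp counts
as outside it. Remember that the window is often not used at all, so
being inside it does not mean a reboot is coming.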

Scheduled maintenance is always announced several days in advance on
the bboard in the <ANNOUNCE> board. If that maintenance window
requires extended time (basically anything over 5 to 10 minutes) the
/etc/motd file (displayed at login) will note the details of the event.

Scheduled maintenance is really only used when hardware upgrades have
to take place. In most cases, software updates can occur while the
systems are up and available.

WHY THE LOW UPTIME?

Uptime is relative. What we're after is 'high availability'. This
means that our goal is to have the servers answering at least 99.9%
of the time. In its 20+ years of service, SDF has been able to meet
this goal. The most uptime you'll see on any given server will be
about 3 to 4 weeks. After 3 weeks, performing maintenance becomes
necessary. This helps with clearing buffers, caches and other
inconsistencies that can build up as the systems run from a cold or
warm boot. Rather than waiting for the system to fail due to a kernel
panic or a hang, a warm boot, which takes roughly 5 minutes or less,
is performed during the weekly maintenance window. Keep in mind, this
doesn't occur weekly but usually after 3 to 4 weeks of continuous
uptime.
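
To put the 99.9% figure next to the reboot schedule, here is a rough
back-of-the-envelope calculation. It is only a sketch under stated
assumptions: one warm boot of 5 minutes every 3 weeks and no other
downtime. The function names are made up for illustration.

```python
# Rough availability arithmetic for the figures quoted above.
# Assumptions (not measured SDF data): one 5-minute warm boot
# every 3 weeks, and no other downtime in that period.

MINUTES_PER_WEEK = 7 * 24 * 60   # 10080


def availability(downtime_minutes, period_weeks):
    """Fraction of the period during which the system is up."""
    total = period_weeks * MINUTES_PER_WEEK
    return (total - downtime_minutes) / total


def downtime_budget(target, period_weeks):
    """Minutes of downtime the target allows within the period."""
    return (1.0 - target) * period_weeks * MINUTES_PER_WEEK


reboot_only = availability(downtime_minutes=5, period_weeks=3)
budget = downtime_budget(target=0.999, period_weeks=3)

print(f"one 5-minute reboot per 3 weeks: {reboot_only:.4%} available")
print(f"99.9% over 3 weeks allows about {budget:.1f} minutes down")
```

Under these assumptions the reboot alone costs well under a tenth of
the downtime that a 99.9% goal allows over the same 3 weeks (about 30
minutes), which is why short uptimes and high availability are not in
conflict.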

Why is this necessary? (aka "My box runs for years under my desk").
We too have very low usage non-public NetBSD systems that run for years
without requiring a reboot. However, SDF is extremely high volume with
sophisticated NFS, NIS and VNODE caching. While these do not cause
problems with light loads, with 40,000 active users they become an
issue. Again, our goal is high availability, which doesn't necessarily
have to translate into long uptimes.