I just released a hugely improved version of nagiosr. For example, it picks up eight alerts from yesterday that the old version missed! (I didn't write the original alert matching code--but I've almost entirely rewritten it now.)
ftp://noc.hep.wisc.edu/pub/src/nagiosr
The old Agenda System, based on CERN's "CDS", had a serious security hole, so Will has implemented CERN's new version (called "Indico".) The old DIS05 and PHENO06 content has been migrated to the new system, but the old accounts don't work (just re-register if need be.)
I'm rather certain we set a new record for egress traffic here at the University of Wisconsin. For a few minutes we peaked at 4.2 Gbps, and we sustained 3.3 Gbps for a little over 30 minutes. The end-to-end (Fermi to UW HEP) application (PhEDEx) data rate was around 2.7 Gbps (330 MBps.)
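As a quick sanity check on the figures above, here is a small sketch of the bits-to-bytes conversion. It assumes decimal units (1 Gbps = 1000 Mbps) and 8 bits per byte, and the helper name is made up for illustration. Under those assumptions 2.7 Gbps works out to roughly 337 MB/s, in the same neighborhood as the 330 MBps quoted.

```python
def gbps_to_mbytes_per_sec(gbps):
    """Convert gigabits per second to megabytes per second.

    Assumes decimal prefixes (1 Gb = 1000 Mb) and 8 bits per byte.
    """
    return gbps * 1000 / 8


# Rates taken from the post; the labels are only for display.
rates = [("peak", 4.2), ("sustained", 3.3), ("end-to-end", 2.7)]

for label, gbps in rates:
    print(f"{label}: {gbps} Gbps is about {gbps_to_mbytes_per_sec(gbps):.1f} MB/s")
```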
At approximately 0205 today, one of the UW-HEP name servers (at 128.104.28.118) died. This caused minor delays in resolving domain names until around 0710--when DNS service was restored. DHCP service is still down, but should be restored shortly.
Some recent discussion on the Nagios mailing list reminded me that I've never announced my super nifty full-screen terminal interface for Nagios...
http://noc.hep.wisc.edu/cnagios.html
Cnagios and nagiosr make a darn nice replacement for the Nagios web GUI--in fact I almost never use the web GUI anymore!
When real garlic gets old, it gets moldy and stinky. Fortunately our garlic (an AFS server for OSG file space) didn't get stinky, but it did get too slow for the job. So we've upgraded from a lowly 2 GHz Pentium4 system to a spiffy dual 3.0 GHz Xeon system with 4 GB of memory.
Here are some pretty graphs I made recently. I have no idea what they really represent, but I'm sure the underlying data is completely bogus...
http://noc.hep.wisc.edu/nrg/tier2/ProdAgent-events.cgi
We recently inherited 27 dual 3.0 GHz Xeon compute servers (which were already housed in our machine room.) In order to increase our storage space for our CMS Tier-2 storage facility, we're going to install a bunch of disks in them.
Yesterday I asked for quotes on 55 of the 750 GB Seagate 7200.10 disks, but then one vendor said they've seen an 80% failure rate with that drive. So now we're looking at buying 55 of the 500 GB Seagate drives--the NL35.2 model--which has a better MTBF. Oh, well, only another 13.5 TB instead of 20.
If you know Nagios, then you probably know that some of its monitoring scripts (aka "plugins") suck. And that's why I wrote nifty scripts to monitor sendmail mail queue size over SNMP today...
ftp://noc.hep.wisc.edu/pub/src/nagios/
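For anyone who hasn't written a Nagios plugin, here is a rough sketch of the threshold logic such a check boils down to. This is not the script from the FTP directory above. It assumes the queue size has already been fetched somehow (over SNMP, say), and the function name, the "MAILQ" label, and the warn/crit defaults are all hypothetical. The only part taken from Nagios itself is the plugin convention: exit code 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN, with optional performance data after a pipe in the status line.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_queue(size, warn=100, crit=500):
    """Map a mail queue size to an (exit_code, status_line) pair.

    size is the number of queued messages, or None if it could not be read.
    warn and crit are hypothetical thresholds picked for illustration.
    """
    if size is None or size < 0:
        return UNKNOWN, "MAILQ UNKNOWN - could not read queue size"

    if size >= crit:
        code = CRITICAL
    elif size >= warn:
        code = WARNING
    else:
        code = OK

    # Everything after the pipe is performance data: value;warn;crit
    line = (f"MAILQ {LABELS[code]} - {size} messages queued"
            f" | queue={size};{warn};{crit}")
    return code, line
```

A real plugin would print the status line and then call sys.exit() with the code, since Nagios reads the state from the exit status.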
Adaptec bought a company called DPT a number of years ago. I
have a long and good history of using the RAID controllers
from DPT. A few months ago, I was disappointed to find out
that Adaptec had stopped making DPT-based RAIDs. I emailed one
of the DPT guys at Adaptec and he sent me a very friendly reply
with the skinny. Adaptec no longer makes DPT-based RAID controllers,
but they have folded a lot of the DPT technology into their latest
("aacraid") controllers and software. So we're now standardizing
on the aacraid controllers--which I'm comfortable with because
not-so-little vendors like Dell and Sun integrate aacraid controllers
into their products. And thus I just finished writing a Nagios plugin
to monitor our new aacraid (2130/2230S) controllers...
ftp://noc.hep.wisc.edu/pub/src/nagios/