At approximately 0205 today, one of the UW-HEP name servers (at 128.104.28.118) died. This caused minor delays in resolving domain names until around 0710--when DNS service was restored. DHCP service is still down, but should be restored shortly.
There will be a couple of brief network interruptions at UW-HEP tomorrow, Thursday November 17th during the 4am through 5am CST time-frame. The interruptions are expected to last around 30 seconds, but may last five minutes or more. (We will be deploying a new core backbone switch in preparation for migration to the UW-HEP 21st Century Network.)
The UW-HEP mail server (hep.wisc.edu aka mail.hep.wisc.edu) has crashed a number of times lately. It's a two-processor system and all indications are that the second CPU is flaky. I removed it earlier this morning and the system hasn't crashed since.
All UW-HEP network and computing services will be down on Tuesday, September 13th from 6:00am through 7:00am CDT (because Chamberlin Hall will not have power.)
As I understand it, the new emergency generator installed during the renovation of our building had problems with its fuel line--like the engineer who installed it expected the fuel oil to flow uphill!
There will be one--or perhaps a few--brief UW-HEP network outages during the 9am through 10am time-frame (CDT) tomorrow, Thursday August 18th. The outage(s) is/are expected to last around 30 seconds, but may last five minutes or more.
During that time, the campus electrical folks will be testing emergency power and lighting.
All UW-HEP network and computing services will be down tomorrow (Wednesday August 17th) from 6:00am through 7:00am CDT because Chamberlin Hall will not have power.
The Campus Electrical Folks say...
We will be requiring a building outage to allow for testing of one of the two 5kV feeder cables supplying the building. This cable, we feel, has caused the two recent unscheduled outages. The outage purpose is to disconnect this cable from the building switches to allow testing to be performed on it (the cable cannot be tested while connected to the switches).
The outage should last approx 1 hour or less. We are scheduling it from 6:00 am to 7:00 am, this Wednesday, August 17, 2005.
This outage will also test the new settings which are being installed into the 480 volt building circuit breakers feeding panel H6DP on the 6th floor. This should resolve the problem of tripping out after a utility/building outage is restored.
We will follow up with the cable testing as mentioned above, however, this will not be affecting the building.
It appears that at around 11:35pm yesterday, Thursday Jul 28th, UW-HEP lost power for about 140 minutes. The machine room A/C is still without power.
A number of servers did not cleanly reboot after power was restored and that problem was magnified by the fact that our KVM (keyboard/video/mouse multiplexing) system has failed.
Most AFS file service was restored at around 1am. AFS file service for /afs/hep/atlas, /afs/hep/grid3 and /afs/hep/osg should be restored by around 9am.
Mail service (hep.wisc.edu) was restored at around 7:45am today.
Some service may be briefly off-line today while we fix our KVM system.
Tomorrow morning, Thursday June 30th, the UW-HEP email service (hep.wisc.edu) will be unavailable starting at 2:00AM CDT. Service should be restored by 6:00AM. During that time, the server hardware and software will be upgraded.
The upgrade will include the installation of Sophos PureMessage--an email filtering system which automatically detects and quarantines viruses and spam messages. Another message with specific details about the UW-HEP PureMessage system will follow.
From around 10:30am until 11:10am today, Saturday June 18th, the UW-HEP email service was severely degraded because the server was flooded with email msgs from "CMSDOC Server". Service is still slightly degraded because the system is processing a backlog of msgs. I expect service to return to normal in about 30 minutes (around 12:15pm).
UW-HEP DHCP service was down from around 2200 yesterday until 0800 today. The OS decided to suspend/sleep! After some research/googling I've come to think that Sun keyboard drivers can mysteriously, incorrectly and very infrequently think they have received a keyboard "powerdown/suspend/powerup" key code! I've disabled the "buttons_n_dials-set" init script, which should cause Solaris to disregard the suspend keycode.
DHCP Service Outage