By way of "start-up money" for some new Plasma Physics professors, we've added 24 more 2.4 GHz dual/dual Opteron compute nodes (g10n01-24).
The UW-HEP worker nodes came back on-line at 1600 CST yesterday, Thursday February 16th.
The U of Wisconsin electricians believe a motor which drives one of two fans in the heat exchanger is causing fuses to blow, which in turn causes the A/C to fail. They took that motor (and its fan) off-line yesterday and are testing it.
For now, the other fan is operating and adequately exchanging heat. (Our current temperature is 12 deg F and the NWS forecast low for tonight is 10 degrees below zero!)
We do not anticipate further outages--the repaired motor can be brought back on-line without shutting down worker nodes.
The air conditioning in the first UW-HEP machine room is once again not functioning properly. Accordingly, the worker nodes in that room have been shut down.
I'll send another message when they are back in service. My apologies for any inconvenience this causes.
The UW-HEP worker nodes came on-line at around 9:30am today. The UW electrical shop replaced a blown fuse yesterday, so the first A/C unit is now operating. But the root cause of the failing fuses has not been fixed or even found. Hopefully the UW electrical shop will resolve the problem tomorrow. However, additional worker node outages are possible.
The air conditioning in the first UW-HEP machine room is once again not functioning properly. Accordingly, the worker nodes in that room have been shut down.
The ETR is unknown. I'll send another message when they are back in service. Our apologies for any inconvenience this causes.
Most of the UW-HEP worker nodes are back on-line. We'll get the rest on-line ASAP.
The primary A/C unit had blown a fuse. It was replaced, and then the same fuse blew again. Apparently the folks who initially installed the unit did not tighten some terminals for the temperature probes that monitor the heat exchanger on the roof, and the contacts have corroded to the point where they caused a short. The screws for the terminals have been tightened--in fact, I was told they turned almost two whole turns!
The air conditioning in the first UW-HEP machine room is not functioning properly: the temperature has risen to around 91 degrees.
Accordingly, I just shut down all worker nodes in that room. This includes the g3nXX, g4nXX, g5nXX, g6nXX, g7nXX and g8nXX systems.
I will post another blog entry when they are back in service.
The UW-HEP Condor pool is now effectively defunct: all our worker nodes have been migrated to the GLOW Condor pool. So in total, we added another 248 CPUs to GLOW recently. Along with this change, we upgraded our Condor software to version 6.1.14.
UW-HEP just added 43 dual/dual (four CPU) 1.8 GHz Opteron systems to GLOW. Their host names are of the form "g9nXX.hep.wisc.edu". About 10 minutes after their inception, they were all 100% CPU busy--pounced on by Condor jobs from Comp Sci. That brings the total (right now) to exactly 1100 CPUs! Oh, and I almost completely forgot: they each have two 500 GB disk drives, so we also added 42 TB of storage space for dCache!
Something like 104 UW-HEP compute nodes were renumbered (i.e., had their IP addresses changed) from 128.104.28.0/23 to 144.92.180.0/22 today. This was done in anticipation of dividing the two HEP IP subnets into two VLANs, and eventually into two entirely separate local area networks.
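For anyone curious how the two subnets compare, here is a minimal sketch using Python's standard ipaddress module. The helper names (usable_hosts, in_new_subnet) and the sample addresses are made up for illustration; they are not part of our actual renumbering tooling.

```python
import ipaddress

# The two subnets named in the entry above.
old_net = ipaddress.ip_network("128.104.28.0/23")
new_net = ipaddress.ip_network("144.92.180.0/22")


def usable_hosts(net):
    """Addresses left after reserving the network and broadcast addresses."""
    return net.num_addresses - 2


def in_new_subnet(addr):
    """True if addr (a dotted-quad string) falls inside the new /22."""
    return ipaddress.ip_address(addr) in new_net


if __name__ == "__main__":
    for label, net in (("old", old_net), ("new", new_net)):
        print(label, net, "range:", net.network_address, "-",
              net.broadcast_address, "usable hosts:", usable_hosts(net))
    # Hypothetical address, only to show the membership check.
    print(in_new_subnet("144.92.181.5"))
```

A /22 holds twice as many addresses as a /23, so the new subnet leaves plenty of room beyond the roughly 104 renumbered nodes.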