24 More Compute Nodes

Posted: 1:48pm Thursday May 25 2006

Categories: Compute Nodes, GLOW

By way of "start-up money" for some new Plasma Physics professors, we've added 24 more 2.4 GHz dual/dual Opteron compute nodes (g10n01-24).


Worker Nodes Back Online

Posted: 3:28pm Thursday February 16 2006

Categories: CMS, Compute Nodes, GLOW, Machine Rooms

The UW-HEP worker nodes came back on-line at 1600 CST yesterday, Thursday February 16th.

The U of Wisconsin electricians believe a motor which drives one of two fans in the heat exchanger is causing fuses to blow, which in turn causes the A/C to fail. They took that motor (and its fan) off-line yesterday and are testing it.

For now, the other fan is operating and adequately exchanging heat. (Our current temperature is 12 deg F and the NWS forecast low for tonight is 10 degrees below zero!)

We do not anticipate further outages--the repaired motor can be brought back on-line without shutting down worker nodes.


A/C Still Not Functioning

Posted: 10:05pm Wednesday February 15 2006

Categories: CMS, Compute Nodes, GLOW, Machine Rooms

The air conditioning in the first UW-HEP machine room is once again not functioning properly. Accordingly, the worker nodes in that room have been shut down.

I'll send another message when they are back in service. My apologies for any inconvenience this causes.


Worker Nodes Back On-Line Again

Posted: 3:02pm Sunday February 05 2006

Categories: CMS, Compute Nodes, GLOW, Machine Rooms

The UW-HEP worker nodes came on-line at around 9:30am today. The UW electrical shop replaced a blown fuse yesterday, so the first A/C unit is now operating. But the root cause of the failing fuses has not been fixed or even found. Hopefully the UW electrical shop will resolve the problem tomorrow. However, additional worker node outages are possible.


Another A/C Malfunction--Worker Nodes Offline Again

Posted: 1:58pm Saturday February 04 2006

Categories: CMS, Compute Nodes, GLOW, Machine Rooms

The air conditioning in the first UW-HEP machine room is once again not functioning properly. Accordingly, the worker nodes in that room have been shut down.

The ETR is unknown. I'll send another message when they are back in service. Our apologies for any inconvenience this causes.


Worker Nodes Back Online

Posted: 10:07am Thursday February 02 2006

Categories: CMS, Compute Nodes, GLOW, Machine Rooms

Most of the UW-HEP worker nodes are back on-line. We'll get the rest on-line ASAP.

The primary A/C unit had blown a fuse. It was replaced, and then the same fuse blew again. Apparently the folks who initially installed the unit did not tighten some terminals for the temperature probes that monitor the heat exchanger on the roof, and the contacts have corroded to the point where they caused a short. The screws for the terminals have been tightened--in fact, I was told they turned almost two whole turns!


A/C Malfunction--Worker Node Outage

Posted: 2:47am Thursday February 02 2006

Categories: CMS, Compute Nodes, GLOW, Machine Rooms

The air conditioning in the first UW-HEP machine room is not functioning properly: the temperature has risen to around 91 degrees.

Accordingly, I just shut down all worker nodes in that room. This includes the g3nXX, g4nXX, g5nXX, g6nXX, g7nXX and g8nXX systems.

I will post another blog entry when they are back in service.


Yum Yum Yum

Posted: 1:08pm Wednesday February 01 2006

Categories: Compute Nodes, Desktops, Operating Systems, Servers, Software

The advent of Scientific Linux x86_64 worker nodes (with Opteron CPUs) here at UW-HEP created the need for a fair number of i386 shared library RPMs. That was the straw that broke the camel's back with respect to package management, so I finally set up a number of yum repositories to automate the installation of RPMs. It was a dirty, complex job, but, like a lot of systems administration work, it'll pay off big time in the long run.
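
For anyone who hasn't met yum before, a repository definition looks roughly like the sketch below. This is only an illustration, not the actual UW-HEP configuration: the file name, repository IDs, host name (yum.example.edu), directory layout and package patterns are all made up. The second stanza shows one way an x86_64 host might pull a limited set of i386 shared libraries from a separate repository.

```ini
# /etc/yum.repos.d/local-sl.repo  (hypothetical file name and contents)

# Main repository; yum substitutes $releasever and $basearch itself
[local-sl-base]
name=Scientific Linux $releasever - $basearch (hypothetical local mirror)
baseurl=http://yum.example.edu/sl/$releasever/$basearch/
enabled=1
gpgcheck=1

# Extra repository of i386 packages for x86_64 hosts;
# includepkgs limits it to the listed (assumed) library packages
[local-sl-i386-compat]
name=Scientific Linux $releasever - i386 libraries (hypothetical)
baseurl=http://yum.example.edu/sl/$releasever/i386/
enabled=1
gpgcheck=1
includepkgs=glibc* libstdc++* compat-*
```

With something like this in place, a missing library can be pulled in with "yum install" and a package name, rather than hunting down RPMs by hand.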


172 More CPUs

Posted: 4:49pm Thursday January 26 2006

Categories: CMS, Compute Nodes, GLOW, Storage

UW-HEP just added 43 dual/dual (four CPU) 1.8 GHz Opteron systems to GLOW. They have host names of the form "g9nXX.hep.wisc.edu". About 10 minutes after their inception, they were all 100% CPU busy--pounced on by Condor jobs from Comp Sci. That brings the total (right now) to exactly 1100 CPUs! Oh, and I almost forgot: they each have two 500 GB disk drives, so we also added 42 TB of storage space for dCache!


Ninth Generation Compute Nodes Arrived

Posted: 1:06pm Wednesday January 11 2006

Categories: CMS, Compute Nodes, Storage

We received 46 "dual/dual" Opteron 1U pizzaboxes today. Each one has four 1.8 GHz Opteron CPUs, 4 GB of memory and an extra pair of 500 GB disks that'll be used for dCache. So all told, that's 184 CPUs and about 45 TB of storage that'll almost fit into one rack.


