
A/C "Repaired"?

Posted: 3:24pm Tuesday February 21 2006

Category: Machine Rooms

I'm kinda tired of blogging about this damn A/C, so this'll be short. The thermostats for the fans have been turned down, so they both operate when the glycol is 55 deg F or lower. The second fan was rotating in the wrong direction, so its rotation was reversed. And the temperature in the room has dropped five degrees to 72. We are going to move a couple of racks of computers out of that room to decrease the heat load. More later, I'm sure.


A/C Saga Continues

Posted: 2:53pm Monday February 20 2006

Category: Machine Rooms

I wasn't around on Friday afternoon. But the electrical shop was and they said [they] "found wire shorted out to one of those fans... re-spliced the wire in the fan... so we should be good to go now... I'm glad we found a definite problem.."

Well, we seem to have lost some cooling capacity--we've gone from holding approx. 71 deg F to around 75 deg F--so I checked the heat exchanger on the roof. Sure enough, the second fan isn't blowing.

So I called the electrician and talked to him just now. He claims that the control electronics and set points are such that the second fan should/may not always run. I explained to him that it seems illogical that a system would be designed to not exchange as much heat as it can when the thermostat is calling for cooling. We agreed that the unit should be wired so both fans come on. He'll be here tomorrow.


Worker Nodes Back Online

Posted: 3:28pm Thursday February 16 2006

Categories: CMS, Compute Nodes, GLOW, Machine Rooms

The UW-HEP worker nodes came back on-line at 1600 CST yesterday, Thursday February 16th.

The U of Wisconsin electricians believe a motor which drives one of two fans in the heat exchanger is causing fuses to blow, which in turn causes the A/C to fail. They took that motor (and its fan) off-line yesterday and are testing it.

For now, the other fan is operating and adequately exchanging heat. (Our current temperature is 12 deg F and the NWS forecast low for tonight is 10 degrees below zero!)

We do not anticipate further outages--the repaired motor can be brought back on-line without shutting down worker nodes.


A/C Still Not Functioning

Posted: 10:05pm Wednesday February 15 2006

Categories: CMS, Compute Nodes, GLOW, Machine Rooms

The air conditioning in the first UW-HEP machine room is once again not functioning properly. Accordingly, the worker nodes in that room have been shut down.

I'll send another message when they are back in service. My apologies for any inconvenience this causes.


Worker Nodes Back On-Line Again

Posted: 3:02pm Sunday February 05 2006

Categories: CMS, Compute Nodes, GLOW, Machine Rooms

The UW-HEP worker nodes came on-line at around 9:30am today. The UW electrical shop replaced a blown fuse yesterday, so the first A/C unit is now operating. But the root cause of the failing fuses has not been fixed or even found. Hopefully the UW electrical shop will resolve the problem tomorrow. However, additional worker node outages are possible.


Another A/C Malfunction--Worker Nodes Offline Again

Posted: 1:58pm Saturday February 04 2006

Categories: CMS, Compute Nodes, GLOW, Machine Rooms

The air conditioning in the first UW-HEP machine room is once again not functioning properly. Accordingly, the worker nodes in that room have been shut down.

The ETR is unknown. I'll send another message when they are back in service. Our apologies for any inconvenience this causes.


Worker Nodes Back Online

Posted: 10:07am Thursday February 02 2006

Categories: CMS, Compute Nodes, GLOW, Machine Rooms

Most of the UW-HEP worker nodes are back on-line. We'll get the rest on-line ASAP.

The primary A/C unit had blown a fuse. It was replaced, and then the same fuse blew again. Apparently the folks who initially installed the unit did not tighten some terminals for the temperature probes that monitor the heat exchanger on the roof, and the contacts had corroded to the point where they caused a short. The screws for the terminals have been tightened--in fact, I was told they turned almost two whole turns!


A/C Malfunction--Worker Node Outage

Posted: 2:47am Thursday February 02 2006

Categories: CMS, Compute Nodes, GLOW, Machine Rooms

The air conditioning in the first UW-HEP machine room is not functioning properly: the temperature has risen to around 91 degrees.

Accordingly, I just shut down all worker nodes in that room. This includes the g3nXX, g4nXX, g5nXX, g6nXX, g7nXX and g8nXX systems.

I will post another blog entry when they are back in service.


Need... More... Bandwidth...

Posted: 11:10am Tuesday January 17 2006

Categories: CMS, Machine Rooms, Networks

After bringing up our new "GridNet" network, we found that the trunk interconnecting our two machine rooms was completely saturated--running at 985-995 Mbps. So today we replaced it with a 4 Gbps "etherchannel" trunk. As luck would have it, our CMS and OSG computing network demands dropped below 1 Gbps at about the same time we brought up the link, so it took a while before we were convinced that we had a full 4 Gbps of bandwidth.
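For the curious, here is a rough sketch of the arithmetic behind "completely saturated". It assumes the old trunk was a single 1 Gbps link (1000 Mbps nominal); the function and constant names are made up for illustration and are not from any monitoring tool we run.

```python
# Assumption: the old trunk was one gigabit link; the new one is 4 Gbps.
OLD_TRUNK_MBPS = 1000.0
NEW_TRUNK_MBPS = 4000.0


def utilization(observed_mbps, capacity_mbps):
    """Return the fraction of link capacity in use (0.0 to 1.0)."""
    if capacity_mbps <= 0:
        raise ValueError("capacity must be positive")
    return observed_mbps / capacity_mbps


def proves_upgrade(observed_mbps, old_capacity_mbps=OLD_TRUNK_MBPS):
    """True only if traffic exceeds what the old trunk could have carried."""
    return observed_mbps > old_capacity_mbps


if __name__ == "__main__":
    # The two ends of the range we saw on the old trunk.
    for observed in (985.0, 995.0):
        old = utilization(observed, OLD_TRUNK_MBPS)
        new = utilization(observed, NEW_TRUNK_MBPS)
        print(f"{observed:.0f} Mbps: {old:.1%} of old trunk, {new:.1%} of new trunk")
```

The second helper is the reason we had to wait: as long as demand stays under 1 Gbps, the traffic graphs look the same whether the trunk is 1 Gbps or 4 Gbps.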


The Grid Network Is Born

Posted: 5:32pm Thursday January 12 2006

Categories: CMS, Machine Rooms, Networks

Today was incredibly hectic! Not only did we bring up the 10 Gbps uplink to the Internet, but we also started using our new machine room ("mr2"), moved the CMS Tier2 Server Rack from the old machine room ("mr1") to mr2, reprogrammed three stacks of Cisco 3750 switches, brought up a gigabit interconnect from mr2 to mr1 for the GLOW racks, renumbered the CMS Tier2 Server Rack, and lastly, physically split our network in two. So we now have the "HEP Grid Network" with about 20 servers, 12 storage servers and 186 CPUs in 94 compute nodes, and the "HEP Staff Network" with, well, all the other servers and connections to offices.


