Archive for machine room

HP c-Class c7000 Chassis & Onboard Administrator Notes

The Onboard Administrators (we got a pair for redundancy) each ship with a unique password. When you connect them, it appears the active OA resets the standby password to match the active. This was a bit confusing, as OA #2 came up active, and the passwords were not as expected; SSL certificates are created and reloaded in terms of “Active” & “Standby”, so I initially loaded new certs onto the wrong OAs.

ssh Implementation Flawed

The OAs support ssh access and ssh keys, but apparently only for the single Administrator account. This is documented incorrectly: the docs say the last word on the key line is the username the key is for, but in fact every key is linked to Administrator. HP Support doesn’t know much about it. It’s bad when security features don’t work as documented. In this case, it would be easy to follow the instructions and upload a key for an unprivileged Operator or User account, unintentionally granting full Administrator access. We had exactly this for a while, until I figured out what was really going on.
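To make the mismatch concrete, here is a minimal Python sketch that lists the trailing label on each public key line. It assumes the keys are in the usual OpenSSH one-line format (type, base64 blob, label); the function name `key_labels` and the sample keys are made up for illustration and are not part of any HP tool. The point is only that the label is informational: going by the behaviour described above, every uploaded key should be treated as an Administrator key.

```python
def key_labels(public_keys_text):
    """Return (key_type, label) for each OpenSSH-style public key line.

    Assumption: one key per line, "type base64 label". The label is None
    when a line has no third field.
    """
    results = []
    for line in public_keys_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        label = fields[-1] if len(fields) >= 3 else None
        results.append((fields[0], label))
    return results


# Hypothetical sample data; the base64 blobs are placeholders, not real keys.
sample = """\
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQ== Administrator
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQ== operator1
"""

for key_type, label in key_labels(sample):
    # Whatever the label says, assume the key grants Administrator access.
    print(f"{key_type} label={label} effective_account=Administrator")
```

Running something like this over the key file before uploading it is a cheap way to spot keys that were meant for an unprivileged account.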

The web interface doesn’t allow copy & paste of keys; they must be downloaded by the OA from a web server. Afterwards, though, the public keys (which had to be accessible through a web server, remember) are not visible to other authorized users of the OAs; only Administrator can see or modify keys. Feh.

Additionally, the web interface shows line breaks as ‘^’, so the keys look corrupt. Despite this they work, and display correctly in the command-line interface.

OA doesn’t automatically configure its accounts onto blade iLO. Instead, it creates an account for OA itself on each blade’s iLO. This is a bit odd, as it means authorized users cannot connect directly to iLO; instead they must connect through an OA, and have the OA log in, before using iLO. We will presumably use the Compaq iLO configuration language to deploy our accounts to iLO, but this shouldn’t be necessary.

Good News

On the bright side, the chassis is easier to mount than our (smaller) IBM BladeCenter chassis; it’s also better labeled. The Onboard Administrator interface is better laid out, although it doesn’t work in Safari (seems fine in Firefox/Mac). The command line is a bit less bizarre than IBM’s.

HP makes it easy to dump the configuration to a text file, tweak it, and load it into another chassis, although we haven’t tested yet; they call this “Configuration Scripts”.

Wiring Art

The Pretties

Inspired by “When data center cabling becomes art” from Andrew T Laurence & Chuck Goolsbee’s pics of Digital Forest, I took some photos of Rockefeller’s new data center. We’ve been planning out various scenarios for 5 years at this point, but we finally moved most of our systems in this month. Note that the network guys (mostly Eric) took care to run cables for ports on the left half of each device in from the left, and cables for ports on the right half in from the right. This makes more work for them in preparation, since one cannot simply plug a cable into a free port, but makes things look prettier, and also reduces cable snarling.

3 KVMs & baby + LCD

More Connectivity, Please

Since we first started discussing data center plans, I’ve been saying we need more connectivity. The new DC has 48 patches per 42U rack, and some of the new racks are indeed running out of ports before they run out of vertical space. In our racks 2U is used for patch panels and 2 patch ports go to the APC managed power strips, so we have 40U and 46 patch ports for servers. Our Linux servers have Ethernet, serial console, & KVM; Suns have Ethernet & console; Windows have Ethernet & KVM. In the worst case, 40 1U Linux servers need 120 connections, but we only have 46 available. If the rack is full of 2U Suns & Windows servers, we’re okay with 6 ‘extra’, available for dual-connected servers or whatever. As we get more dense, we begin to run out of ports.

Cat6 flowing down
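The arithmetic above is easy to redo for other server mixes. This is a rough Python sketch using only the figures from this post (48 patches, 2 for power strips, 42U with 2U of patch panels, and 3 or 2 connections per server); the function name `port_budget` is just an illustration.

```python
PATCHES_PER_RACK = 48    # patch ports per rack
POWER_STRIP_PORTS = 2    # ports used by the APC managed power strips
RACK_UNITS = 42          # total rack height in U
PATCH_PANEL_UNITS = 2    # rack units taken by patch panels

# Connections per server type: Ethernet + serial console + KVM, etc.
CONNECTIONS = {"linux": 3, "sun": 2, "windows": 2}


def port_budget(server_height_u, connections_per_server):
    """Return (servers, ports_needed, ports_left) for a rack of one server type."""
    usable_u = RACK_UNITS - PATCH_PANEL_UNITS
    usable_ports = PATCHES_PER_RACK - POWER_STRIP_PORTS
    servers = usable_u // server_height_u
    needed = servers * connections_per_server
    return servers, needed, usable_ports - needed


print(port_budget(1, CONNECTIONS["linux"]))  # (40, 120, -74)
print(port_budget(2, CONNECTIONS["sun"]))    # (20, 40, 6)
```

A negative last value is the number of ports the rack is short; the 1U Linux case is 74 ports over budget, while the 2U case leaves the 6 spares mentioned above.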

Blades

Blades are no better: their chassis tend to blow out the power budget because they’re even more dense than 1Us (although they do get more servers per rack), and with all the redundancy they still require a lot of cabling. For a reasonable IBM BladeCenter, we need 4 x 2 for GE switches (FC cables don’t go in these patch panels). Then 2 x 2 (Ethernet & KVM) for management modules per chassis = 12 ports for 7U. For our new HP c7000 chassis with basic networking, we have 16 GE ports, 2 GE console ports, 2 OA Ethernet ports, and 2 OA serial ports (again, ignoring the fiber-optic GE ports): 22 ports in 10U. I’m sure somewhere HP has a demo chassis filled with fully-connected GE switch modules: (9 x 8 + 4 = 74 patches) & (4 x 8 = 32 fiber-optic ports) = 106 cables total (not counting power connections, 6 in our case). In 10U, 1/4 of a rack: insane!

c7000: 30 ports
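For comparison with the 46 patch ports available per rack, here is a small Python sketch that totals the per-chassis counts given above and scales them to a rack. It assumes the same 40 usable U as our server racks and ignores the power budget entirely, so treat the result as an upper bound on cabling, not a deployment plan. The helper names are made up for the example.

```python
def chassis_ports(groups):
    """Sum port counts given as (how_many, ports_each) pairs."""
    return sum(count * ports_each for count, ports_each in groups)


# IBM BladeCenter: 4 GE switches x 2 ports, 2 management modules x 2 ports.
ibm_ports = chassis_ports([(4, 2), (2, 2)])
# HP c7000: 16 GE, 2 GE console, 2 OA Ethernet, 2 OA serial.
hp_ports = chassis_ports([(16, 1), (2, 1), (2, 1), (2, 1)])


def rack_fill(ports_per_chassis, chassis_u, usable_u=40):
    """Return (chassis_count, total_ports) for a rack filled with one chassis type."""
    count = usable_u // chassis_u
    return count, count * ports_per_chassis


print(ibm_ports, rack_fill(ibm_ports, 7))   # 12 (5, 60)
print(hp_ports, rack_fill(hp_ports, 10))    # 22 (4, 88)
```

Either way a rack of blade chassis would want more than the 46 patch ports on hand, assuming power allowed filling the rack in the first place.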

Update 2008/2/5: Eric pointed out I was wrong about the ports: the Cisco switches have 8 uplink ports, 4 of which are either fiber-optic or copper (you can see they’re 17-20 in the photo); the other 4 copper ports seem intended for cross-linking to the other switch. So the max copper patch count remains, but the fiber connections would be instead of, rather than in addition to, copper ones, and we may fully connect our 2 switches with only 8 GE uplinks rather than 16 going out of the chassis.

As a system admin, excitement is generally bad: HVAC Oops!

machine room pictures

So today they cut the wires to our main machine room’s A/C. This occurred as part of the general campus work, which is why we were expecting to be out of our old machine room by now. Alas, the new machine room is not quite ready yet, so our primary systems were in a very warm room. It was a bit uncomfortable working there, although not too bad.

So around 3:30, my bosses (2) came over to ask me what could be shut down; in a perfect world, this would be just stringing a bunch of hostnames together, between “dsh -w” and “shutdown -h now” (for Linux) and “shutdown -y -g0 -i5” (for Solaris), from my desk. Instead I tromped over and started reading labels on servers (many of which were out of date — now updated!), and deciding what we could do without, calling users to ask them which machines could be turned off for a while. We had my boss, boss^2, and boss^3, as well as a bunch of the Plant Ops guys and their boss.
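For what the “perfect world” version might look like, here is a hedged Python sketch that only builds the command line and prints it; nothing is executed. The two shutdown commands are the ones quoted above. That `dsh -w` accepts a comma-separated host list is an assumption (dsh variants differ), and the hostnames and the function name `dsh_shutdown` are hypothetical.

```python
import shlex

# Shutdown commands as quoted in the post.
SHUTDOWN_COMMANDS = {
    "linux": "shutdown -h now",
    "solaris": "shutdown -y -g0 -i5",
}


def dsh_shutdown(hosts, os_name):
    """Build (but do not run) a dsh command to shut down the given hosts.

    Assumption: this dsh takes a comma-separated host list after -w.
    """
    if not hosts:
        raise ValueError("no hosts given")
    return ["dsh", "-w", ",".join(hosts)] + shlex.split(SHUTDOWN_COMMANDS[os_name])


# Hypothetical hostnames, for illustration only.
cmd = dsh_shutdown(["node01", "node02"], "linux")
print(shlex.join(cmd))  # dsh -w node01,node02 shutdown -h now
```

Printing first and running by hand keeps a human in the loop, which seems wise for anything that powers off a list of servers.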

After I’d shut down a dozen or so, they told us the A/C might be back within 15 minutes (hooray!). The first repair didn’t hold (fuse immediately blew), but within 25 minutes we had (partial but insufficient) A/C, and I turned most things back on.

For a while we opened the door to the FDR drive, which cooled the room a bit. I got a few pictures of the drive and of blinkenlights.

Rockefeller Updates

I stopped posting about the Super-Tent, because not much has changed since we moved in. I did get a bigger desk when Mark left Rockefeller, which matters to me but not much to anyone else. I have continued to take pictures of Rockefeller as the various construction projects proceed, though.

RU Pictures, May 9th 2007

I took a bunch of pictures at RU today, including some of our DR site being expanded to become our primary machine room. Lots of AC & UPSes going in. I even got my father and Stu (Data Center Manager — he gets an office outside the Super-Tent!) in a couple.

Dad & Stu

Props to the Network Guys

We have a bunch of 48-port terminal servers (they’re Linux/ssh based, and quite good). Unfortunately, one of ours has a bad Ethernet port (intermittent connections — no good for lights out management!)

Today (Friday), I spent from 4:15 to 5:30 labelling 48 Cat5 cables, replacing the old terminal server (a tight fit!), reconnecting the cables, and testing. It increased my respect for our Network group, as they do this type of thing all the time (although usually with fewer ports), and scheduling network downtime is much tougher than scheduling console downtime. Lots more people notice. Fortunately, the terminal servers are for our group, and used almost entirely by 4 particular people, so notification and scheduling was easy.

Still, it wasn’t fun. At the end I had a label maker with dead batteries, a whole bunch of garbage from the labels, and grimy fingers, but we regained remote access for the weekend, which was my goal.

Next time I’ll ask a hardware guy to do the cable swapping!

Going to the beach

So Friday I spent several hours in a machine room, dealing with a mulish array. Having failed to anticipate how much time this would take, I also failed to bring a jacket. We spent most of the time waiting for various people and things.

Back in the day, I used to read Wired to see what new words and phrases they’d come up with to describe geek life.

My own contribution, inspired by the hours spent in our machine room, waiting for a callback/reboot/answer/whatever (both yesterday and on other days):

“Going to the beach”: Moving from the “cold aisle” (in front of the equipment, where the keyboards and displays are, and where the air conditioning system dumps chilled air) to the “hot aisle” (where the hot air is exhausted from the backs of the servers, before being sucked back into the A/C system for another cycle).

Example: “I’m going to the beach for a few minutes, while we wait for a callback from someone with a clue why the tool failed.”
