Extra Pepperoni

Wednesday, July 1 2009

Sun Grid Engine (SGE): Installation Saga

Sun is really ticking me off this week (and last). I am trying to find the least-time installation procedure for Sun Grid Engine (SGE), to test on an Amazon EC2 AMI (Amazon Machine Image). OpenSolaris + SGE AMIs are publicly available, but no Linux + SGE yet.

Just finding the files is amazingly complicated. http://gridengine.sunsource.net/ appears to be the old SGE site -- it doesn't offer 6.2u3 releases -- but it lacks a pointer to the new site. I thought 6.2u3 wasn't really available yet, until I remembered seeing a totally different download site, and found it again.

The new SGE site seems to be http://wikis.sun.com/display/GridEngine/Home, which doesn't link back to sunsource either. For extra confusion, http://wikis.sun.com/ hosts 6 different SGE wikis (4 English, 2 Japanese).

I found 4 ways to get SGE electronically (there are also CD media, but for clusters who cares?):

  • CVS source. The CVS tree includes instructions for building, but I found several inaccuracies and problems. I didn't get it built, so I don't know how serious the problems are. Compiling from source isn't really appropriate for AMIs anyway, so I stopped working through the issues.
  • Sun's Download Center offers only the latest release, as zipped tarballs and zipped RPMs.
    • This page should offer links to older releases.
    • It's ridiculous to zip RPMs -- they're compressed already -- and it seems downright stupid to zip .tar.gz tarballs, as they're explicitly compressed! The listings below show zip saving about half a percent.
    • The 64-bit tarball isn't self-contained -- it also apparently requires the 32-bit tarball, but I had to Google to find this out. Again, I stopped pursuing this option before I got it to work.
  • http://gridengine.sunsource.net/downloads/latest.html (note 'latest') redirects to a download page for 6.2u2_1 -- not the latest.

pepper@teriyaki:~/Downloads$ unzip -Z sge62u3_linux24-i586_rpm.zip 
Archive:  sge62u3_linux24-i586_rpm.zip   28572971 bytes   2 files
-rw-r--r--  2.3 unx 24722508 tx defN 18-Jun-09 05:34 sge6_2u3/sun-sge-bin-linux24-i586-6.2-3.i386.rpm
-rw-r--r--  2.3 unx  4023935 tx defN 18-Jun-09 05:34 sge6_2u3/sun-sge-common-6.2-3.noarch.rpm
2 files, 28746443 bytes uncompressed, 28572553 bytes compressed:  0.6%
pepper@teriyaki:~/Downloads$ unzip -Z sge62u3_linux24-i586_targz.zip 
Archive:  sge62u3_linux24-i586_targz.zip   28699502 bytes   2 files
-rw-r--r--  2.3 unx 24825624 bx defN 18-Jun-09 05:34 sge6_2u3/sge-6_2u3-bin-linux24-i586.tar.gz
-rw-r--r--  2.3 unx  3981994 bx defN 18-Jun-09 05:34 sge6_2u3/sge-6_2u3-common.tar.gz
2 files, 28807618 bytes uncompressed, 28699112 bytes compressed:  0.4%

RPMs should install into the right places and be ready to go with chkconfig, but instead Sun decided to unpack them into /gridengine/sge, which doesn't even follow Sun's /opt convention. Worse, they do not install init scripts, or even provide init scripts suitable for symlinking. Instead the unpacked installer must be run to customize the init script templates. What were they (not?) thinking?!? The inst_sge installer doesn't actually copy any files -- you have to manually copy them to the right place, making the RPM even less useful (the workaround is probably to make /gridengine/sge a symlink to the desired location, assuming rpm will install under a symlink).
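A minimal sketch of that symlink workaround, assuming a POSIX shell. It is untested against Sun's RPMs: /opt/sge is only an illustrative target, link_install_root is a made-up helper name, and whether rpm will unpack cleanly through the symlink is exactly the open question above.

```shell
#!/bin/sh
# Sketch: pre-create the real install directory and point the RPM's
# hard-coded path at it. Both paths are parameters; /opt/sge is only an
# illustrative choice, not something Sun's packages know about.
link_install_root() {
    target=$1   # where the files should really live, e.g. /opt/sge
    link=$2     # path the RPM unpacks into, e.g. /gridengine/sge
    mkdir -p "$target" "$(dirname "$link")" || return 1
    # Refuse to clobber an existing install or symlink.
    if [ -e "$link" ] || [ -h "$link" ]; then
        echo "refusing to replace existing $link" >&2
        return 1
    fi
    ln -s "$target" "$link"
}

# Example (as root), before installing the RPMs:
#   link_install_root /opt/sge /gridengine/sge
#   rpm -ivh sun-sge-common-6.2-3.noarch.rpm sun-sge-bin-linux24-i586-6.2-3.i386.rpm
```

The function only prepares the directory layout; the rpm step is left as a comment because it needs root and the downloaded packages.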

At this point, you might say "Wow, documentation is needed to explain this hideously complicated situation!" And you'd be right, but apparently Sun hasn't figured that out. When I went looking for an explanation of this convoluted state of affairs, the best I could find was http://wiki.gridengine.info/wiki/index.php/Main_Page#Is_Grid_Engine_commercial_or_open_source_software.3F, which hints that the split may be symptomatic of a deliberate commercial vs. open source split. If so, Sun's mishandling it amazingly -- these pages do not identify themselves as referring to the open source or commercial flavor, or even acknowledge the existence of an alternate product.

To make sorting this out just that little bit tougher, the binaries (both tarball & RPMs) completely lack documentation -- not even a URL for Sun's online docs. Adding insult to injury, the install docs explain how to unpack a tarball, but don't even acknowledge the existence of the RPMs. Apparently Sun decided that the RPM must be equivalent to the tarball -- it provides the files so you can run Sun's installer -- instead of being a proper RPM, which should fully install the software. This would be obnoxious and shortsighted even if I hadn't already noticed that Fedora has an SGE RPM, and Scalable Systems produced an SGE RPM in 2002 -- including full integration with either plain RHEL or Rocks. Apparently Sun doesn't want something that works -- instead they prefer to force people to use their lame installer, which took over 1,500 lines for a basic install!

Wednesday, June 24 2009

DRAC Notes

We have several systems with DRAC (Dell Remote Administration Card) v5. It's inferior to HP's for many reasons, among them the simple fact that dell.com hits aren't available in Google and www.dell.com is horrible for finding useful information. www.hp.com, in contrast, is navigable and their forum answers are well indexed by Google (useful for general Linux info, not just HP-specific stuff).

For flavor, check out my recent DRAC rant, and my older DRAC rant.

More bad things about DRAC 5

  1. Finding things on www.dell.com is amazingly difficult (HP does better on all these sub-issues).
    1. Older versions show up alongside (above) newer versions.
    2. Documentation references are stale (referenced documents don't exist).
    3. Refining searches, and searching for only relevant hits don't work (both HP and IBM can show downloads relevant to a certain Linux distro/version on a particular hardware platform).
    4. Dell's site makes it very difficult to find technical information. For example, compare Dell's R900 specs page to Sun's X4540 specs page.
  2. Google doesn't return results from www.dell.com. This makes finding authoritative info on Dell products more difficult.
  3. Compatibility
    1. Remote KVM doesn't work on Mac (serves ActiveX; uses wrong keymapping -- workarounds include VNC & Parallels).
    2. Remote media doesn't work on Mac (HP's does).
    3. Remote KVM & media don't work on Linux with 64-bit Firefox. They haven't updated these plugins in years, and probably never will -- newer servers use DRAC 6 instead.
  4. Dell sells a lot of cluster systems, but doesn't support cluster topologies (Dell Gold Server Support assumes a Windows administrative workstation on the cluster segment, and cannot cope with X11 tunneling to access DRAC on nodes).
  5. Dell only supports the operating system purchased from Dell. The word 'CentOS' is verboten, even though Dell's diagnostic tools are CentOS-based! I don't know how many people pay for dozens or hundreds of Red Hat licenses for cluster nodes, but I'm certain it's less than the number of people using free/free distros like CentOS, Rocks, Scientific Linux, or even Fedora. This is also a problem for dual-boot laptops...
  6. Dell's installer (dup) doesn't work. It generally tells me it's already running, even after rebooting. Booting from (virtual) CD for diagnostics wastes my time.
  7. Some of Dell's firmware updaters still use Windows boot floppies. I recently decided that I didn't really need to update that firmware, rather than dig up floppies and find a Windows workstation to make a boot floppy.
  8. Dell doesn't identify parts. Sun is very good about putting meaningful 7-digit numbers on everything. This is very useful. In contrast, I had to grovel through Wikipedia to find out that a Dell card I inherited was in fact a SAS card.
  9. Dell doesn't indicate MAC addresses anywhere. Sun puts them on the packing slip. We had to rack and boot up an R900 in a temporary location to get its MAC addresses, which are required before we can put it on the network after we re-box and move it.
  10. DRAC's serial console only works on COM2/ttyS1 and at 57600bps.

DRAC Trivia & Tips

  1. DRAC cards have what looks like a PCI edge connector, but it's not used in our 1950s or R900 -- instead they connect purely through a couple ribbon cables.
  2. Not all DRAC 5 cards are the same. Specifically, the R900 card has a larger ribbon connector, which goes with a longer cable. In contrast, the 1950 cable is not long enough to reach the R900 connector. I of course brought the wrong card to our machine room, but in that case the longer cable fit the 1950. When I got back, I discovered the other card wouldn't reach in the R900, so had to go back to swap cards. I never did find any identification of which DRAC was for the 1950 vs. the R900.
  3. Supposedly, Alt-F Alt-E should reset the system BIOS, although it didn't work for me. DRAC has a 'reset to defaults' option, but the main (Phoenix) BIOS does not.

Wednesday, May 27 2009

BBEdit Multihost Multifile Comparison

Update 2009/07/02: Although I wrote this script specifically for comparing files, I have found that it also saves a lot of typing for editing individual remote files, cutting 14 characters of syntactic sugar down to 5: bbs<Tab> and a Space.


I often need to reconcile (partially or completely) one or more files on two or more machines. I often use BBEdit's excellent "Find Differences" to do this, but loading the files into BBEdit can be awkward. I still hope someday Bare Bones will add support for remote files to their bbdiff command, but I'm not holding my breath. Today I whipped up a shell script to make it more convenient:

#!/bin/sh
# bsftp: Edit multiple files from multiple hosts.
# Usage: bsftp "host1 host2 host3" "/etc/passwd /etc/shadow /etc/group"

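# $1 and $2 are deliberately unquoted so each list splits on whitespace.
for file in $2
 do
  for host in $1
   do
    # Absolute paths yield sftp://root@host//path, i.e. relative to /.
    bbedit "sftp://root@$host/$file"
   done
 done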

Tuesday, May 26 2009

anaconda vs. Virtual Disk

Update: chip suggests dmraid -r -D in rescue mode. I haven't tried it -- wiping both disks with dd didn't help, but booting one disk, installing, and then adding the second disk gave me a usable install.


My latest Linux problem is odd, but at least somewhat interesting. I'm installing CentOS 5.3 onto an old PowerEdge 1950 with a couple 750gb SATA disks. My kickstart configuration failed, complaining that the system couldn't repartition sda. I tried hda, with no more luck. Confusingly, I do see hda when booting, but it looks like that's the CD-ROM drive, and anaconda skips it for partitioning.

anaconda: error -- cannot find physical disk

When I ran anaconda (the Red Hat / CentOS installer) manually, it showed me only a single device: /dev/mapper/ddf1_Virtual Disk 0. This is clearly some sort of logical device -- LVM, software RAID, or something like that. I booted into the Dell SAS BIOS and confirmed it wasn't presenting logical devices -- it's not a PERC controller, and doesn't appear to support hardware RAID at all.

anaconda: cannot manage LVM volume

I booted into linux rescue mode, and the system found and mounted /dev/mapper/ddf1_Virtual Disk 0p5 and various associated partitions, but I was able to destroy and recreate the partition tables with fdisk.

linux rescue mode: logical and physical disks

I filed a bug against CentOS (since I haven't tried to reproduce this in RHEL, although I'm pretty sure it's their issue).

Unfortunately, when I booted back into anaconda, the virtual disk remained the only available device. Someone on #centos suggested that this was an LVM volume, and the system was finding LVM superblocks scattered across the disk (so not dependent on the partition table). Next attempt is ongoing right now (writing 12,002,501,984,256 zeroes takes a while):

  • time dd if=/dev/zero of=/dev/sda bs=1G
  • time dd if=/dev/zero of=/dev/sdb bs=1G
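
That figure works out if the zeroes are counted in bits rather than bytes. A quick check, assuming each disk reports 750,156,374,016 bytes (a typical capacity for a 750 GB drive; the exact size is an assumption, not something stated above):

```shell
#!/bin/sh
# Sketch: count the zero bits written when wiping both disks.
# bytes_per_disk is an assumed capacity, not a value taken from the post.
bytes_per_disk=750156374016
disks=2
zero_bits=$((bytes_per_disk * disks * 8))
echo "$zero_bits"
```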

Tuesday, May 19 2009

OMFG! Dell requires procmail!

I've been fighting with Dell DRAC for months at this point. Some of their updaters insist that a copy of the updater is already running (even across reboots), and can only be installed by booting from a diagnostic CD. I've seen this with multiple updaters using DUP (Dell Update Packages).

The DRAC vKVM control doesn't work in 64-bit Firefox, and DRAC serves the ActiveX control to Firefox/Mac. When I point out this is a stupid bug, and the non-ActiveX plugin which works on Firefox/Linux should actually work on Macs, I'm told Macs are unsupported and this won't be fixed.

But today takes the cake. I wanted to install the new DRAC firmware on a PowerEdge 1950, so I downloaded Dell's DRAC5 Update Package for Red Hat Linux, v1.45, A00. I run RAC_FRMW_LX_R209365.BIN (well, of course that's v1.45 -- isn't it obvious??), but it requires an X11 DISPLAY. Why? It's not a graphical application. Dunno, just do it.

Okay, I log out, log back in with X11 forwarding, and run it. Dozens of prelink errors (about the same files, repeating) scroll off the screen, and then the xterm closes before I see what it did. Run it again, and it's failing because Dell expects a lockfile binary in my PATH. I don't have one. What does Google think? Workaround: yum install procmail, install, then rpm -e procmail to clean up. What possible excuse could Dell have for requiring (a piece of) procmail on my system to upgrade DRAC?

Then it failed again, but at least that time it spat out a useful error message:

Error while loading shared libraries: libstdc++.so.5:
cannot open shared object file: No such file or directory.
You must install the Linux compatibility libraries.
To install the compatibility libraries, use the following command:
"rpm -ih compat-libstdc++-33-3.2.3-47.3.i386.rpm"
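
The three stumbling blocks above (no DISPLAY, no lockfile, no libstdc++.so.5) could be checked before launching the updater. A rough sketch, assuming a POSIX shell; check_prereqs is a made-up helper, and the library path in the example is a guess at where a 32-bit compat library would land, not something Dell documents:

```shell
#!/bin/sh
# Sketch: report what the DRAC updater appeared to need before running it.
# Prints one line per missing prerequisite; prints nothing if all are present.
check_prereqs() {
    helper=$1    # command the updater shells out to (lockfile, per the post)
    library=$2   # shared library file it loads (path is an assumption)
    [ -n "${DISPLAY:-}" ] || echo "missing: X11 DISPLAY (log in with ssh -X)"
    command -v "$helper" >/dev/null 2>&1 || echo "missing: $helper command"
    [ -e "$library" ] || echo "missing: $library"
}

# Example:
#   check_prereqs lockfile /usr/lib/libstdc++.so.5
```

Empty output means all three checks passed; it does not prove the updater will succeed.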

Fortunately I was saved from banging my head against a wall any more -- I found Dell's "Hard-Drive" updater. That too is a Windows executable, but unzip f_drac5v145_A00.exe strips out the useful bit: firmimg.d5. The upload failed from Safari on my Mac, but I got it to work in Firefox/Linux (next time I'll try Firefox/Mac, which should work).

Parting Peeves

Why does Update sit under the "Remote Access" subhead, rather than the "System" heading? It's firmware for the whole DRAC system, not just the Remote Access part. I shouldn't have to hunt for the upgrade page. Adding insult to injury, the (remote) Console & Media tabs are under System, not Remote Access. Does anybody think that makes sense?

DRAC 5 Update page

In a final bit of brilliance, as the updater reached 100%, I got a warning that my session was about to be closed. Next time, guys, make sure you don't break your own firmware updates by timing the user out in the middle, okay?

Thursday, May 14 2009

CrashPlan: Deep Magic

Update 2009/05/15: The second part is not fully automatic. Sam did point his CrashPlan.app at my archives, but I didn't have to do anything on the client side. Still freaky!


Today I swapped backup drives with Sam. I have a Time Capsule (which hasn't failed much recently, to my surprise) for 'local' backups, and CrashPlan is for offsite backup. Sam discovered that CrashPlan is able to back up to a host behind a 'real' firewall. I'm not surprised that CrashPlan can use NAT-PMP to accept inbound backups, but I am surprised it apparently works through non-personal firewalls without manual configuration.

We believe Code42 must be running a heavy-duty proxying service, where each CrashPlan client connects to their servers, and their servers accept all the peer-to-peer backup traffic and dispatch it back down that 'inbound' connection. This is part of what makes IM work, but the traffic volume per user for CrashPlan is much higher than for something like AIM or Back to My Mac (screen sharing or individual file transfers). I hope this doesn't mean the software breaks if their server farm shuts down or is retired, and that they don't decide the intermediary service is too expensive (neither Sam nor I is paying for the service -- just for the CrashPlan+ licenses). If they really are retransmitting all p2p backups, code42 is passing double the non-local traffic of all their p2p users put together. They'd have to accept each byte from the client, then send it to the server. Backups across the local network don't need this. Perhaps CrashPlan is smart enough to only do this proxying trick for backup 'servers' behind uncooperative firewalls...

But tonight we got an even bigger surprise. When Sam plugged my 1tb drive into his Mac, both my Linux server and MBP immediately started backing up to the drive through Sam's computer. Note that I had not configured my systems to use his Mac. Apparently when he plugged the drive in, his Mac automatically registered the CrashPlan-created volume 'serial number' with the CrashPlan servers, and they automatically connected my clients to that drive (or perhaps it's all done via client serial numbers rather than volume numbers -- I'm just speculating here). Suddenly I had a new friend, 'Sanford' show up in both my CrashPlan clients. Freaky-deaky!

Tuesday, May 12 2009

Bad Solaris 10 documentation: bootadm recovery

Sun's How to Manually Update the Boot Archive on a RAID-1 (Mirror) Volume procedure says to find the root slices from the console messages -- which note md devices discovered during boot -- and embed these into /etc/vfstab to fix the boot environment and temporarily disable mirroring. Unfortunately, this guidance is incomplete and incorrect.

The first problem is that Sun's documentation instructs you to boot from 'the primary submirror.' But of course it might be corrupt (something scrambled the boot archive, after all). This week, one of our submirrors for / and both submirrors for /var showed errors under fsck -n. c4t0d0s0 (the default boot device) had problems which prevented the system making it all the way to normal multiuser state. c4t0d0s3 had moderate corruption, while c4t4d0s0 had minor corruption. Fixing /var/ was tricky, because Sun does not have any documentation which I could find on how to recover good data from one submirror onto one with bad data. They assume the only failure mode is a dead disk, and disk replacement is simple. The undocumented trick is that 'Submirror 0' is authoritative for resync operations.

root@jean:/# metastat d3|head -9
d3: Mirror
    Submirror 0: d23
      State: Okay         
    Submirror 1: d13
      State: Okay         
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 8385930 blocks (4.0 GB)
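
One way to overwrite a damaged submirror with the good copy is to detach it and reattach it, since attaching a submirror starts a full resync from the submirror that stayed in the mirror. A dry-run sketch that only prints the commands; treating d13 as the bad submirror is a hypothetical choice for illustration, resync_plan is a made-up helper, and this is not a Sun-documented recovery procedure:

```shell
#!/bin/sh
# Dry-run sketch: print the SVM commands to rebuild one submirror from the
# other. Nothing is executed; review the output before running it as root.
resync_plan() {
    mirror=$1   # e.g. d3
    bad=$2      # submirror whose contents should be discarded (hypothetical)
    echo "metadetach $mirror $bad"    # drop the damaged copy from the mirror
    echo "metattach $mirror $bad"     # reattach; SVM resyncs it from the survivor
    echo "metastat $mirror"           # check resync progress
}

resync_plan d3 d13
```

If the submirror is flagged as needing maintenance, metadetach may also need -f.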

Problem #2: fixing / is harder, because the procedure for booting from a different disk is totally obscure and non-standard. Basically, you must edit /boot/solaris/bootenv.rc, which overrides /boot/grub/menu.lst. I don't know why Sun apparently created a brand-new findroot command for grub, but doesn't actually run Solaris from the disk it specifies. At a guess, it stems from Sun's dissatisfaction with the way Linux & GRUB deal with the horrible multi-stage boot procedure required on x86 PCs. bootadm(1M) says it updates the 'boot archive', but not what a boot archive actually is, and that it also updates the GRUB configuration, but our menu.lst hasn't actually been updated since I installed the system.

root@jean:/# grep bootpath /boot/solaris/bootenv.rc 
setprop bootpath /pci@1,0/pci1022,7458@4/pci11ab,11ab@1/disk@0,0:a
root@jean:/# grep -v \# /boot/grub/menu.lst 
default 0
timeout 10
splashimage /boot/grub/splash.xpm.gz
title Solaris 10 10/08 s10x_u6wos_07b X86
findroot (rootfs0,0,a)
kernel /platform/i86pc/multiboot
module /platform/i86pc/boot_archive
title Solaris failsafe
findroot (rootfs0,0,a)
kernel /boot/multiboot kernel/unix -s
module /boot/x86.miniroot-safe

If you find yourself in single-user mode with a root device like /pci@1,0/pci1022,7458@4/pci11ab,11ab@1/disk@0,0:a, rather than something more normal like /dev/md/dsk/d0 or /dev/dsk/c4t0d0s0, it probably means Solaris is running from a device which it cannot correlate back to a valid boot device, although you can do this manually by examining the slice symlinks in /dev/dsk/:

root@jean:/# ls -l /dev/dsk/c0t4d0s0 
lrwxrwxrwx   1 root     root          62 Dec 30 14:13 /dev/dsk/c0t4d0s0 -> ../../devices/pci@0,0/pci1022,7458@1/pci11ab,11ab@1/disk@4,0:a
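
That manual lookup can be scripted. A sketch, assuming only POSIX sh and ls (it parses ls -l rather than relying on a readlink command, which may not be installed); find_slice is a made-up helper name:

```shell
#!/bin/sh
# Sketch: list the slice symlinks whose target ends with a given physical
# device path, so a path like .../disk@4,0:a can be mapped back to cXtYdZsN.
find_slice() {
    dir=$1    # directory of slice symlinks (/dev/dsk on Solaris)
    phys=$2   # physical path, e.g. /pci@0,0/pci1022,7458@1/pci11ab,11ab@1/disk@4,0:a
    for link in "$dir"/*; do
        [ -h "$link" ] || continue          # only consider symlinks
        case "$(ls -l "$link")" in
            *"-> "*"$phys") echo "$link" ;; # link target ends with the path
        esac
    done
}

# Example:
#   find_slice /dev/dsk '/pci@0,0/pci1022,7458@1/pci11ab,11ab@1/disk@4,0:a'
```

Matching on the end of the link target sidesteps the ../../devices prefix, which appears in the symlink but not in the boot-time path.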

The final major problem is that disk device paths are not stable on the X4500. Sun's instructions are to find the disk path to the root submirror from console messages (in my case, they referred to /dev/dsk/c3t0d0s0 & /dev/dsk/c3t0d0s0) and use one of these in /etc/vfstab rather than the metadevice, but when I actually booted into Solaris, those slices didn't exist, because the bootable disks were at c4 rather than c3. I had to boot back into GRUB's Failsafe mode and correct the device for / in vfstab. Sun's documentation is fine for machines with consistent disk paths, but wrong for the X4500 (and presumably the X4540 as well).

Additionally, I worried that bootadm might read c4 from the vfstab file and write to c3 (a ZFS pool disk it must not modify!), or something similarly screwy, but this turned out to be a non-issue once I'd sorted out the rest.

Tip: bootadm apparently normally caches its changes and writes them to disk when rebooting; to force bootadm to write changes immediately, add the undocumented -f flag, e.g.: bootadm update-archive -fR /a

bootadm(1M), of course, doesn't provide any useful detail on what it does.

Solaris 10: SUNWlwact errors

Today I noticed a cascade of tictimed errors on the console of a Solaris 10/x86 server:

root@jean:/export/home/pepper# grep tictimed /var/adm/messages | grep May\ 12 | head
May 12 11:24:02 jean tictimed[1111]: [ID 921880 user.error] [tictimed]: XML file corruption detected!
May 12 11:29:10 jean tictimed[1111]: [ID 921880 user.error] [tictimed]: XML file corruption detected!
May 12 11:29:11 jean tictimed[1111]: [ID 423602 user.error] [tictimed]: stopping on SIGTERM or SIGPWR.
May 12 11:33:01 jean tictimed[1143]: [ID 921880 user.error] [tictimed]: XML file corruption detected!
May 12 11:36:38 jean tictimed[1143]: [ID 423602 user.error] [tictimed]: stopping on SIGTERM or SIGPWR.
May 12 11:40:23 jean tictimed[1143]: [ID 921880 user.error] [tictimed]: XML file corruption detected!
May 12 11:47:34 jean tictimed[1143]: [ID 921880 user.error] [tictimed]: XML file corruption detected!
May 12 11:53:44 jean tictimed[1143]: [ID 921880 user.error] [tictimed]: XML file corruption detected!
May 12 12:00:55 jean tictimed[1143]: [ID 921880 user.error] [tictimed]: XML file corruption detected!
May 12 12:07:05 jean tictimed[1143]: [ID 921880 user.error] [tictimed]: XML file corruption detected!

I eventually discovered the errors were caused by v3.2 of the Sun Services Tools Bundle, specifically Light Weight Availability Collection Tool v3.0, and that removing SUNWlwact stopped them. I upgraded to STB v5.0, but the cascading XML errors returned, so Sun hasn't fixed the issue.
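
To confirm whether a change like that actually quiets things down, the errors can be tallied per day. A small sketch; tictimed_per_day is an illustrative helper name, and it assumes the standard syslog layout shown above:

```shell
#!/bin/sh
# Sketch: count tictimed "XML file corruption" errors per day in a syslog file.
tictimed_per_day() {
    # $1 = log file, e.g. /var/adm/messages
    # Fields 1 and 2 of each syslog line are the month and day.
    awk '/tictimed/ && /XML file corruption/ { count[$1 " " $2]++ }
         END { for (day in count) print day, count[day] }' "$1"
}

# Example:
#   tictimed_per_day /var/adm/messages
```

The days come out in whatever order awk iterates the array, so pipe through sort if order matters.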

Tuesday, May 5 2009

Automatic Network Optimization with MarcoPolo

The Problem: Ethernet >>> AirPort

For whatever reason, my MacBook Pro doesn't get very good network performance over 802.11n AirPort. Since I routinely copy everything from half-hour MPEG videos (hundreds of megabytes) through full DVDs (several gigabytes) between it and my Linux server, I much prefer gigabit Ethernet.

It turns out Apple's AFP & SMB clients are smart enough to seamlessly migrate a network connection from one transport to another -- if you disconnect from Ethernet but AirPort remains up, Mac OS X will reconnect to the file server via AirPort. Contrariwise, if Ethernet is connected when AirPort goes offline, the connection will switch back. Note that I haven't tested with Ethernet & AirPort in different subnets.

This means once I have reconnected the Ethernet cable, I have 2 ways to switch my SMB connection back: I can unmount all shares from that server and then remount, or I can bring down the AirPort connection and force the OS to migrate my connections over. I prefer not to bring down AirPort, because that breaks any other open AirPort connections, which might not reconnect (iChat reconnects; Safari downloads just fail), but I do this sometimes when a program is using the share and preventing the umount from succeeding.

For over a year, I've been using a shell alias to handle the disconnect/reconnect -- below is the final version. The grep was so I could confirm it worked -- if I see prowlere (Ethernet interface), I know I have a fast connection. If I see prowler instead, I still have a slow AirPort connection -- most likely because an open file prevented the umount.

alias remount='umount /Volumes/inspector; umount /Volumes/dvd; umount /Volumes/1tb; open smb://pepper@inspectore.rep.dom/inspector/; open /Volumes/inspector/home/pepper/tivo/; sleep 10; netstat -a | grep \.micro | grep ESTABLISHED'

The Improvement: Automation

Back in the Jaguar era, I had a script to run Plucker automatically. I had my laptop configured to run it both via cron and whenever I (dis)connected my Ethernet cable, but when the network trigger broke I just refined my crontab to compensate and forgot about it. In Leopard, launchd handles tasks like this, but it doesn't offer network triggers.

I wanted to recreate the automatic trigger on network reconnection, but wasn't sure how to do it. crankd could probably do the trick, but I don't know Python. Fortunately Jeremy Reichman pointed me to MarcoPolo, which fits the bill admirably.

I was initially confused by the fact that MarcoPolo automatically copied the system's Network 'Locations', which I don't want to change. Fortunately MarcoPolo is happy to work with its own 'Contexts', and leave the system Location untouched. I created a couple contexts, 'en0 online' and 'en0 offline', told MarcoPolo to use IP and NetworkLink as "Evidence Sources", and configured it to switch to 'en0 online' when the 'prowlere' IP comes online. On the other hand, "en0 (Ethernet) link inactive" switches to 'en0 offline'. This way if I connect to an outside network and get a different Ethernet IP, it won't try to connect to my home server. MarcoPolo can run shell scripts, so I converted my alias to a script, which runs on switching to 'en0 online':

pepper@prowler:~$ cat bin/remount 
#!/bin/sh
# remount: Reconnect to inspectore, hopefully via Ethernet rather than AirPort

umount /Volumes/inspector
umount /Volumes/dvd
umount /Volumes/1tb
open smb://pepper@inspectore.rep.dom/inspector/
sleep 10
netstat -a | grep micro | grep ESTABLISHED | awk '{print $4, $5}' | growlnotify -w
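
The "prowlere means Ethernet, prowler means AirPort" check could also be done by the script rather than by eye. A sketch, assuming the local address is the fourth column of netstat output (as the awk call above already assumes); link_type is a made-up helper and the sample hostnames in the comments are illustrative:

```shell
#!/bin/sh
# Sketch: classify established SMB connections by local hostname.
# Reads `netstat -a` lines on stdin; $1 is the Ethernet hostname (prowlere).
link_type() {
    awk -v eth="$1" '
        /ESTABLISHED/ && /micro/ {
            # Local address looks like prowlere.rep.dom.49200 on Ethernet,
            # prowler.rep.dom.49201 on AirPort.
            if (index($4, eth ".") == 1) print "ethernet"
            else print "airport"
        }'
}

# Example:
#   netstat -a | link_type prowlere | growlnotify -w
```

Matching on the hostname plus the trailing dot keeps prowler from being mistaken for prowlere.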

Future possibilities include launching iPhoto when I connect a camera memory card but not when I plug in the iPhone (I already had 37signals' workaround, but MarcoPolo would have been a simpler option), or switching iTunes to the upstairs speakers when I connect to the upstairs monitor (iTunes' output device is not currently scriptable, so I filed an RFE). Unfortunately, since these triggers and actions are all orthogonal to each other, I'd end up multiplying contexts to accommodate them, which is suboptimal.

David, thanks for MarcoPolo!

Sunday, April 12 2009

I love this irony: Sun ILOM is based on Linux

Sun and Microsoft are about the only 2 large companies based on the proposition that Linux isn't the best operating system. IBM supports several OSes, but they strongly support Linux for most applications. HP & Dell are happy to sell hardware to run anything a customer wants to pay for, although they are both Windows-biased (and Dell continues to have serious trouble with Linux).

That's why I was so surprised and amused to discover that ILOM, Sun's Integrated Lights-Out Management system which is used to manage Sun's current x86 servers, is running Linux. So Sun is using Linux to make Solaris systems more reliable. I found a reference to Linux underpinning ILOM a few weeks ago, and still chuckle every time I think of it. I had a better reference, but cannot find it today.

http://docs.sun.com/source/820-0048-17/sp.html

That said, this was probably the right choice. Nobody's going to build a tiny system management system around Solaris, and rebuilding one and coping with the inevitable bugs in such a constrained and important system would have been a huge waste.

Friday, March 20 2009

Cisco Port Security

I just spent a while learning (the hard way) about Cisco Port Security, so here's what I got out of it.

Cisco Port Security, when enabled, keeps a whitelist of allowed MAC addresses for each port; the list may hold 1 or more entries, and might be statically specified by an administrator or automatically 'learned' by the switch.

Intuitively, you might expect something called "port security" to prevent unapproved hosts from sniffing traffic, as this is the most serious risk. The reality, however, is that the switch has no way of knowing who's listening to a port, only of knowing who's transmitting individual packets, because each packet is 'signed' with the transmitter's MAC address. With PS enabled, the switch silently drops packets from unapproved MACs, preventing them from stealing bandwidth or actively attacking the network. These are useful, but data and credential sniffing are more serious risks, and are not addressed.

Complications

  1. Front-line support may have limited knowledge of security protocols and mechanisms, and is unlikely to have access to directly check the list of banned MACs/ports. Cisco provides a mechanism for alerting, but that does not mean notifications have been configured to reach all the concerned parties.
  2. Sniffing to determine the situation will show the host generating outbound traffic, but will not reflect that these packets are not being forwarded to any other hosts.
  3. Depending on situation, even after PS activates, there may be residual traffic for the blocked device. This can look like responses to current traffic, and mask the fact that the muzzled node is in fact mute.
  4. Using DHCP seems to avoid some of this. I found that using DHCP got an address and enabled full communications. I don't know why.

As Murphy would have it, I had just opened up the server chassis and enabled ipf before PS tripped, so I spent a lot of time looking for a (missing) misconfiguration.

Tuesday, March 17 2009

More Sun Grief

I'm reinstalling Solaris 10 on a Thumper which came (last year) with a 2-year-old version of Solaris 10. Wanting to protect myself, I removed HDD0 (primary boot), and replaced it with HDD47 (part of the ZFS pool, which I will have to recreate because ZFS does not allow removing or changing RAID levels in a pool).

Unfortunately, when I boot the system this way, it goes through the BIOS screens and just sits there -- apparently it cannot get past the lack of a boot block on HDD0. So much for redundancy or failover! Perhaps it would have worked if I'd left bay 0 empty, but that's an unlikely scenario.

I mounted the S10U6x86 ISO from a Windows VM under Parallels (sadly, Sun's Java Remote Console app only supports virtual media on Windows -- not Mac OS X, although the console control works fine on the Mac), and ran through the installer -- which takes about an hour, because it's so amazingly slow. I booted the machine 2 blocks away, walked back to my desk, and was in time to see it probing Ethernet devices...

Anyway, I accepted the default fdisk partitioning (everything in partition 1), and set up a Solaris VToC within partition 1 (as the Solaris installer defaults, and as our other Thumper is configured), but the installer failed with a confusing error:

ERROR: The '/' slice extends beyond HBA cylinder 1023
WARNING: Change the system's BIOS default boot device for hands-off rebooting.

Note that I copied the Solaris VToC from our other Thumper which is running this way right now. The fdisk configuration is simply the default. Thinking I'd done something wrong, I rebooted (there's really no alternative at that point) and reran the installer -- another hour gone. At the end of the whole process, I got the error again. Fortunately, I found a blog post that mentions that error. Apparently the Solaris installer cannot actually create the fdisk partition which it suggests!! I put the disks back as they were, ran the installer a third time, and Solaris is installing right now -- so it can install into that fdisk layout, but it cannot partition the disk that way. FAIL.


See also: Sun, Solaris

Thursday, March 12 2009

iTunes 8.1 Is Smarter about Video Compatibility

Amy and I watch video on our Apple TV, and I watch on my iPhone. I rip to filenames ending in .AppleTV.mp4 (for us to watch together) and .iPhone.mp4.

Under iPhone: Videos: Sync movies, iTunes 8.1 now greys out incompatible videos. This is a significant improvement, and makes selecting videos for the iPhone easier & quicker.

Thanks, Apple.

Thursday, March 5 2009

Reading: An Embarrassment of Options

As a rule, I read one book at a time. Or perhaps I should say this has been my rule, as it appears to have broken down completely.

For the past several years, I've carried a paperback book which fits in a pocket, and a PDA (Treo with Plucker, now iPhone) with news. I rotate between catching up on Twitter/NetNewsWire/Instapaper Pro, watching movies and TV shows, and reading, along with the occasional game. My tendency has been to avoid the next book for a while, as I catch up with other things, then as I get near the end, I focus more on finishing the book.

Thanks to Goodreads, I realized that I am now reading 7 books.

  1. I'm reading the Mabinogion from Project Gutenberg on the iPhone.
  2. I just started Weber's By Schism Rent Asunder in the new Kindle for iPhone app.
  3. I'm halfway through reading Moore's Watchmen for about the 4th time. I read this at home, at night -- my compilation is big and I don't want to trash it.
  4. I'm reading DiMassa's The Complete Hothead Paisan at home (too big to carry)
  5. I have been reading Singh's Mac OS X Internals: A Systems Approach at home for over a year (much too big to carry).
  6. I read Gerrold's The Middle of Nowhere at lunch, when I remember to bring it.
  7. Card's Empire is in my jacket, unstarted.

I hope I can continue to keep them all straight!

As someone who doesn't read hardcovers due to cost and size, cannot generally fit trade paperbacks in my pockets, and struggles to find new books to read, Kindle ebooks on the iPhone are a big win. My main reason to carry a paperback is now for when the iPhone battery is dead.


Kindle.app Flaws

  • Paging is awkward. I have to swipe left to advance. This quickly becomes tiresome -- tapping on the top or bottom, or perhaps the right side, should advance. Each book requires hundreds or thousands of page turns, so they should be as quick and efficient as possible.
  • There's no way to get a page number. This makes references & citations problematic.
  • I cannot see a table of contents. This may be Tor's fault, as Kindle.app has a built-in ToC bookmark, but there's nothing on that page -- there's also no cover for either of the books I bought, which is frankly lame. When I tap the page, it shows me the "Location" I'm currently at. If I tap the bookmarks button and then the "Location..." button, I get a field to jump to an arbitrary location, and the app shows the final location number.
  • I cannot see what chapter I'm in. I don't know if the Kindle has provisions for this which Tor hasn't taken advantage of, or if the Kindle simply lacks support for chapters.
  • There's no landscape mode or zoom. By Schism Rent Asunder starts with a map and geography is important to the story. Unfortunately, it's far too small to read. Lame.

Update: May 20, 2009

I occasionally read print volumes, but I mostly read in Kindle.app.

To my delight, Amazon just released v1.1 of Kindle.app, fixing my main complaints:

  1. Tap to page
  2. Zoom
  3. Landscape mode is now available

Next I hope they will provide a progress meter, a la the gauge at the bottom of Stanza windows.

Monday, February 23 2009

LVM Setup (on SATABeast)

Update 2009/03/10: It looks like mke2fs is smart enough to automatically select the 4k blocksize, and largefile4 is not necessary (which is good, as it was interfering with our backups).


We compared performance between 10-disk and 20-disk RAID6 sets on a SATABeast, and discovered the performance difference is not significant, so we chose the most efficient reasonable layout: 2 20-disk RAID6 sets, each containing a single volume the same size. These appear to the Linux host as a couple 16.37tibyte LUNs. We're using device mapper multipathing to provide fault tolerance across both FC paths (in Nexsan's recommended "All Paths All LUNs" mode, each LUN is available via both controllers). This is all handled (except the performance testing) via the SATABeast administration interfaces.

Within Linux, we create 2 LVM logical volumes of just under 8tibyte (the largest ext3 can handle), and a third with the leftover 384gibyte, from each LUN.

The SATABeast lets the host see and use the volumes while it's still generating parity on the underlying RAID arrays ("Online Creation"), but creating file systems is much slower during this process.

A fully configured SATABeast contains 42 1,000,137,687,040-byte ("1 terabyte") drives. They reserve 2 for spares, so we have 40 disks to work with. Nexsan suggests 4 10-disk RAID sets, but RAID 6 allocates 2 disks per RAID set to parity, so with 4 10-disk sets we would 'waste' 10 disks, and our usable space would be 4 volumes, each 8 usable disks = 8,001,101,496,320 bytes / 1024^4 * 4 RAID sets = 29tibyte usable. 29tibyte is a lot, but only 70% of the specified "42tbyte", so we'd really like to be more space efficient -- we have 2 hot spare and 4 p
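The capacity arithmetic above can be checked with a short sketch. The drive size and RAID 6 parity count come from this post; the function and constant names are made up for illustration, and the sketch ignores any controller or formatting overhead.

```python
# Sizes taken from the post; helper names are hypothetical.
DRIVE_BYTES = 1_000_137_687_040   # one "1 terabyte" drive
TIB = 1024 ** 4
RAID6_PARITY_DISKS = 2            # RAID 6 gives up 2 disks per set to parity


def raid6_usable_bytes(disks_per_set: int) -> int:
    """Usable bytes in one RAID 6 set built from identical drives."""
    return (disks_per_set - RAID6_PARITY_DISKS) * DRIVE_BYTES


def layout_usable_tib(sets: int, disks_per_set: int) -> float:
    """Total usable space, in TiB, across several equal RAID 6 sets."""
    return sets * raid6_usable_bytes(disks_per_set) / TIB


if __name__ == "__main__":
    # Compare Nexsan's suggested layout with the one we chose.
    for sets, disks in ((4, 10), (2, 20)):
        usable = layout_usable_tib(sets, disks)
        print(f"{sets} x {disks}-disk RAID6: {usable:.2f} TiB usable")
```

With the same 40 disks, the 2 x 20 layout spends 4 disks on parity instead of 8, which is where the extra usable space comes from.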

I set up /etc/fstab earlier, using ext3 labels.


How to create an ~8tibyte ext3 filesystem on a large multipath raw volume.

  1. pvcreate /dev/mapper/mpath3
  2. vgcreate satabeast1vg /dev/mapper/mpath3
  3. vgdisplay # Note number of usable extents (PE).
  4. lvcreate -n satabeast1lv --extents 2096128 satabeast1vg # Use number from above. # It's at least close to the largest valid ext3 volume size.
  5. mkfs.ext3 -L/satabeast1 -Tlargefile4 -b4096 /dev/satabeast1vg/satabeast1lv # If the SATABeast is still calculating parity, this takes a while. Go get some food...
  6. vi /etc/fstab ## I use something like LABEL=/satabeast1 /satabeast1 ext3 defaults 0 0
  7. mount /dev/satabeast1vg/satabeast1lv /satabeast1
  8. df -h /satabeast1

Here's my transcript of creating an LVM volume group with two not-quite-8tibyte filesystems and one 384gibyte filesystem on a 16.37tibyte LUN.

[root@norimaki device-mapper-multipath-0.4.7]# pvcreate /dev/mapper/mpath3 
  Physical volume "/dev/mapper/mpath3" successfully created
[root@norimaki device-mapper-multipath-0.4.7]# vgcreate noribeast0vg /dev/mapper/mpath3 
  Volume group "noribeast0vg" successfully created
[root@norimaki device-mapper-multipath-0.4.7]# lvcreate -n noribeast0lv --extents 2096128 noribeast0vg
  Logical volume "noribeast0lv" created
  Logical volume "noribeast0lv" successfully removed
[root@norimaki device-mapper-multipath-0.4.7]# lvcreate -n noribeast0a --extents 2096128 noribeast0vg
  Logical volume "noribeast0a" created
[root@norimaki device-mapper-multipath-0.4.7]# lvcreate -n noribeast0b --extents 2096128 noribeast0vg
  Logical volume "noribeast0b" created
[root@norimaki device-mapper-multipath-0.4.7]# vgdisplay
  --- Volume group ---
  VG Name               noribeast0vg
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  5
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               16.37 TB
  PE Size               4.00 MB
  Total PE              4292125
  Alloc PE / Size       4192256 / 15.99 TB
  Free  PE / Size       99869 / 390.11 GB
  VG UUID               QLdntg-9ccY-0HYe-DnWI-Lxxu-Huzj-S3bExH

[root@norimaki device-mapper-multipath-0.4.7]# lvcreate -n noribeast0c --extents 99869 noribeast0vg
  Logical volume "noribeast0c" created
[root@norimaki device-mapper-multipath-0.4.7]# mkfs.ext3 -L/noribeast0a -Tlargefile -b4096 /dev/noribeast0vg/noribeast0a
mke2fs 1.39 (29-May-2006)
Filesystem label=/noribeast0a
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
8384512 inodes, 2146435072 blocks
107321753 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
65504 block groups
32768 blocks per group, 32768 fragments per group
128 inodes per group
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
    102400000, 214990848, 512000000, 550731776, 644972544, 1934917632

Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 25 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
[root@norimaki device-mapper-multipath-0.4.7]# time mkfs.ext3 -L/noribeast0b -Tlargefile -b4096 /dev/noribeast0vg/noribeast0b
mke2fs 1.39 (29-May-2006)
Filesystem label=/noribeast0b
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
8384512 inodes, 2146435072 blocks
107321753 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
65504 block groups
32768 blocks per group, 32768 fragments per group
128 inodes per group
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
    102400000, 214990848, 512000000, 550731776, 644972544, 1934917632

Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 23 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

real    9m52.451s
user    0m0.657s
sys 0m3.660s
[root@norimaki device-mapper-multipath-0.4.7]# time mkfs.ext3 -L/noribeast0c /dev/noribeast0vg/noribeast0c 
mke2fs 1.39 (29-May-2006)
Filesystem label=/noribeast0c
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
51134464 inodes, 102265856 blocks
5113292 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
3121 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968

Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 27 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

real    1m45.683s
user    0m0.172s
sys 0m10.150s
[root@norimaki device-mapper-multipath-0.4.7]# df -h |grep nori
/dev/mapper/noribeast0vg-noribeast0a
                      8.0T  175M  7.6T   1% /noribeast0a
/dev/mapper/noribeast0vg-noribeast0b
                      8.0T  175M  7.6T   1% /noribeast0b
/dev/mapper/noribeast0vg-noribeast0c
                      384G  195M  365G   1% /noribeast0c
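The extent counts in the transcript can be sanity-checked with a little arithmetic. This is only a sketch: the numbers are copied from the vgdisplay and lvcreate output above, the names are hypothetical, and the 8tibyte ceiling is the ext3 limit as described in this post rather than something the code verifies.

```python
# Figures copied from the transcript; names are hypothetical.
PE_BYTES = 4 * 1024 ** 2      # "PE Size 4.00 MB" from vgdisplay
TOTAL_PE = 4_292_125          # "Total PE" from vgdisplay
BIG_LV_EXTENTS = 2_096_128    # extents given to each large logical volume
EXT3_BLOCK_BYTES = 4096       # mkfs.ext3 -b4096
GIB = 1024 ** 3
TIB = 1024 ** 4


def plan_volumes(total_pe: int, big_lv_extents: int) -> tuple[int, int]:
    """Return (how many big LVs fit, extents left over for a small LV)."""
    return divmod(total_pe, big_lv_extents)


big_lvs, leftover_pe = plan_volumes(TOTAL_PE, BIG_LV_EXTENTS)
big_lv_bytes = BIG_LV_EXTENTS * PE_BYTES
big_lv_blocks = big_lv_bytes // EXT3_BLOCK_BYTES
leftover_gib = leftover_pe * PE_BYTES / GIB

if __name__ == "__main__":
    print(f"{big_lvs} large LVs of {big_lv_bytes / TIB:.3f} TiB "
          f"({big_lv_blocks} 4k blocks each)")
    print(f"{leftover_pe} extents left over = {leftover_gib:.2f} GiB")
```

The block count should line up with what mke2fs reports for the two large filesystems, and the leftover matches the "Free PE" line in vgdisplay before the third volume was created.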

SATABeast, ext4, & CentOS

We are setting up a SATABeast with 42 1tb disks. With 2 hot spares, this is "40tb" of raw storage. With a couple 20-disk RAID6 sets, each set provides 18,002,478,366,720 bytes = 16,766gibytes. Problem: RHEL/CentOS 5.1 & 5.2's ext3 filesystem supports slightly less than 8tibytes. RHEL 5.3 includes ext4dev, meaning Red Hat still considers ext4 unstable. CentOS is working on 5.3, but doesn't seem close to release.

The SATABeast is very heavy (151 lbs with dual controllers and 42 drives), so naturally you put it at the bottom of the rack. But there are several problems:

  1. All the health & status indicators are on the bottom of the bezel, so you cannot see them without a mirror on the floor.
  2. The drives aren't hot swap. It's even worse than the X4500, with its dire warnings about leaving the cover off for over 60sec to replace a drive -- to get a drive out of the SATABeast, we have to unmount and shut it down, remove two screws and the front bezel, and then remove the failed drive -- they provide a special tool to lift the drive out!
  3. The SATABeast has a serial console (in our dual-controller unit, it's the lower serial port on Controller 0 -- the upper serial port on Controller 1 is inactive). Since we installed it last week, the serial port console has failed 3 times, and I have had to reboot to get it back. Yes, I upgraded the firmware.
  4. The firmware is not available for download -- you have to ask support to email it.
  5. Nexsan provides a Mac GUI tool for managing SATABeasts, but it only works via autodiscovery -- it cannot see our SATABeast (in another subnet), and there's no way to specify the IP. Lame.

On the other hand, the GUI is generally well designed (although a bit overly complicated -- there are 11 left-hand menu items, and buttons within the main UI tend to jump into the wrong top-level item, which makes operations such as changing LUN mappings harder than they should be). Also, it's pathetic that Nexsan charges $250 for an SSL certificate, and doesn't let you BYOC.

But it looks like we don't really need RHEL 5.3's ext4 support anyway:

NOTE: Although very large fileystems are on ext4's feature list, current e2fsprogs currently still limits the filesystem size to 2^32 blocks (16T for a 4k block filesystem). Filesystems larger than 16T is one of the very next high-priority features to complete for ext4.
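To see why that limit matters here, a rough check (a sketch only; the 2^32-block figure is from the note quoted above, the LUN size is from earlier in this post, and the names are hypothetical):

```python
# Does one of our LUNs fit in a single filesystem under the quoted limit?
BLOCK_BYTES = 4096                  # 4k block filesystem
MAX_BLOCKS = 2 ** 32                # e2fsprogs limit quoted above
LUN_BYTES = 18_002_478_366_720      # one 20-disk RAID6 set, from this post
TIB = 1024 ** 4

max_fs_bytes = MAX_BLOCKS * BLOCK_BYTES
lun_fits = LUN_BYTES <= max_fs_bytes

if __name__ == "__main__":
    print(f"Largest filesystem: {max_fs_bytes / TIB:.2f} TiB")
    print(f"One LUN:            {LUN_BYTES / TIB:.2f} TiB")
    print("LUN fits in one filesystem" if lun_fits else "LUN must still be split")
```

Each LUN is a little over that ceiling, so even with ext4 we would still have to carve the LUN into more than one filesystem.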

Sunday, February 8 2009

NY Comic Con 2009

James and I toured Comic Con today. James enjoyed being there as a civilian, rather than a reporter/interviewer or a booth bimbo, and I kept noticing how much more interesting-looking the crowd was than a Macworld Expo, PC Expo, or Linux Expo crowd.

I bought Ultimate Spider-Man 1 (to read to Julia, although perhaps not yet) and The Graveyard Book, which I had to read after reading Gaiman's award post. I got both from the Comic Book Legal Defense Fund, where Sharon & James both helped out.

Additionally, I got some sketches for James, Sharon, Julia, & myself (Iron Man for me; Iron Man in color for Julia). I got an R/C Supreme Dalek for Amy & myself. Amy was pleasantly delighted (as was Julia).

For a bonus treat, I saw Kirsten Cappy, whom I hadn't seen since Wheaton. From her, I bought the first two Capt'n Eli books, which Julia should be ready for.

I even got a few pictures.

Now I'm off to read Ultimate Spider-Man 1 to see if Julia's ready for it. The work of a Dad is never done. ;)

Thursday, February 5 2009

iPhone Beta

Thanks to Peter N Lewis, I tried out a new iPhone game today. My first iPhone beta! It's fun, although somewhat minimalist. Looking forward to the release. And I found a few bugs, so it was successful testing.

http://www.stairways.com/iphone/aragom

iPhone Apps: 2009/01 Recommendations

I was listing out suggested iPhone/touch apps for a friend (several people have asked for recommendations), and decided to post my current list of suggestions. Note that the Lite apps are also available in Pro versions -- I purchased several after liking the Lite versions. Numbered items are roughly in order of importance. There are several apps I downloaded because they are interesting, even if I haven't really used them yet. All except those in the last section are available in free versions, although I have paid to upgrade 3 of my 6 top items.

See also my 2008/08 list.

Highly Recommended

  1. NetNewsWire
  2. Twitterrific (Lite)
  3. Instapaper (Lite)
  4. Remote
  5. Wikipanion (Lite)
  6. Shazam

Mildly Recommended

  1. 1Password (requires 1Password/Mac)
  2. Facebook
  3. Amazon Mobile
  4. Showtimes
  5. Lightsaber Unleashed
  6. Google Earth
  7. Yelp
  8. Stanza
  9. Darkslide
  10. iHandy Level Free
  11. Rain Stick
  12. Scribble
  13. Shakespeare
  14. White Noise

Games

  • Space DEADBEEF
  • Labyrinth Lite
  • Sol Free
  • Tap Tap Revenge
  • Crossword Light

Interesting

  • AroundMe
  • iWant
  • iTalk Recorder
  • Joost

Worth purchasing.

  • Classics
  • Toy Bot Diaries (1-3)

Monday, February 2 2009

iChat: AIM Multiple Login Conflict & Automated Login/Logout

iChat has a bug whereby it won't let me be logged into the same .Mac account on multiple Macs at one time. There's an option to allow this, but it doesn't work. Apple isn't fixing it, and I'm sick of the un-blockable AOL error chats that come up each time I switch without logging out on the other end first.

Additionally, it's confused at least one person to have me logged into iChat at work, even though I wasn't seeing chats after I left.

Fortunately in Leopard this is easy to fix. I tried in Tiger but never got it to work right -- probably because I don't really grok AppleScript. I have a script that logs me in, and one that logs me out. On my work Mac, I run them to log in shortly before I get to work, and out after I leave. At home, I log in awhile after the work system logged out, and log out before the work system logs in.

pepper@prowler:~$ crontab -l |grep ichat
0   8   *   *   1-5 /usr/bin/osascript ~/bin/ichat-logout.txt
30  18  *   *   1-5 /usr/bin/osascript ~/bin/ichat-login.txt
pepper@prowler:~$ cat bin/ichat-login.txt 
tell application "iChat"
    log in
end tell
pepper@prowler:~$ cat bin/ichat-logout.txt 
tell application "iChat"
    log out
end tell
