We have a couple X4500 "Thumper" servers here. An X4500 is basically a standard dual-Opteron chassis, with 48 SATA drive bays added on. This provides a raw capacity of 48tb with 1tb drives, or over 30tb usable in a realistic RAID environment. I love the informative labels on the outside (I haven't yet popped the cover to see the internals):

  1. If you open the cover, you must close it within 60 seconds to avoid frying components.
  2. If you lift the chassis, you will bend the sheet metal. I am told that even after removing all 48 drives and both power supplies, the empty chassis is still difficult for one person to lift.

Solaris cannot yet boot from RAIDZ, so it's common to install the OS onto a single disk, mirror it with Solaris Volume Manager (the old DiskSuite), and then use the 46 remaining disks for RAIDZ stripes and hot spares (a rough sketch of the SVM mirroring commands follows below). RAIDZ works better with fewer than 10 drives in a stripe, and can be used with single parity (RAIDZ1) or double parity (RAIDZ2).
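
For reference, the usual SVM root-mirroring procedure looks roughly like this. The device names are assumptions -- this sketch takes c4t0d0 as the installed boot disk and c4t4d0 as its mirror, with slice 7 set aside for the state database replicas:

  # copy the boot disk's partition table to the mirror disk
  prtvtoc /dev/rdsk/c4t0d0s2 | fmthard -s - /dev/rdsk/c4t4d0s2
  # state database replicas on a small slice of each disk
  metadb -a -f -c 3 c4t0d0s7 c4t4d0s7
  # submirrors for the root slice (-f because c4t0d0s0 is mounted)
  metainit -f d10 1 1 c4t0d0s0
  metainit d20 1 1 c4t4d0s0
  # one-sided mirror over the running root, then update vfstab/system and reboot
  metainit d0 -m d10
  metaroot d0
  # after the reboot, attach the second side and let it sync
  metattach d0 d20
  # on x86, make the mirror disk bootable too
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4t4d0s0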

A reasonable configuration would be 2 hot spares, 4 9-disk RAIDZ sets, and an 8-disk RAIDZ set -- all RAIDZ1 or all RAIDZ2, because mixing protection levels in a single ZFS pool is not recommended. A higher-performance configuration would use smaller RAIDZ sets, or even ZFS mirroring. 4 9-disk and 1 8-disk RAIDZ1 stripes provide 4 * 8 + 7 = 39 drives of capacity. With RAIDZ2 it's 4 * 7 + 6 = 34 drives usable. Of course, '1tb' drives don't provide a real 1tb usable, because drive manufacturers use base 10 and operating systems use base 2. The formatted capacity of our '1tb' drives is 931gb, so we would max out at 39 * 931gb, or about 35.5tb usable. Impressive, but not quite the 48tb advertised.
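
Sketching that first layout as an actual pool creation: every device name below is a placeholder, since the controller numbering on this box is exactly what's in question. The sketch assumes the boot disks are c4t0d0/c4t4d0 and the data disks sit on controllers c0 through c5; in practice you would also spread each RAIDZ set across controllers rather than filling them in order.

  zpool create tank \
    raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0 c1t0d0 \
    raidz c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c2t0d0 c2t1d0 \
    raidz c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0 c3t0d0 c3t1d0 c3t2d0 \
    raidz c3t3d0 c3t4d0 c3t5d0 c3t6d0 c3t7d0 c4t1d0 c4t2d0 c4t3d0 c4t5d0 \
    raidz c4t6d0 c4t7d0 c5t0d0 c5t1d0 c5t2d0 c5t3d0 c5t4d0 c5t5d0 \
    spare c5t6d0 c5t7d0

Swap raidz for raidz2 to get the double-parity version of the same layout.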

My current problem: Only 2 of the 48 disks are bootable, and their device naming within Solaris is not consistent across documentation or firmware revisions. In the list below, 46 disks report as ATA-HITACHI, but c4t0 & c4t4 report as DEFAULT. I'm pretty sure I should just mirror onto c4t4, but not sure enough to put the server into production without verification. I've found docs across the Internet that refer to c5t0 & c5t4, as well as c6t0 & c6t4, so it's not as simple as it really should be.

"Which disks can I boot from?" is a fundamental question for a system that cannot boot from any attached disk, but when I called 800-USA-4-Sun yesterday, I explained my issue to a very nice gentleman who told me to pick the first two disks from format output. That's apparently wrong for an X4500, and after I explained the issue further he started finding and reading the same docs I had been. A particular favorite is "Upgrading to ILOM 2.0.2.5 Changes Controller IDs" in the Sun Fire X4500 Server Product Notes, which refers to a hd command not present on my fresh Solaris 10 10/08 installation or the pre-installed Solaris 10 11/06 that came on the other Thumper. I also found Important Solaris OS Installation and Bootable Hard Disk Drive Guidelines in the Product Notes, which says to use the first two disks returned by cfgadm -al -- c0t0 & c0t1 on this system. That's not right, although the instructions are for use within the Solaris installer.

Today I called back, and was eventually told that there is only one person in today who would have the answer to my question -- but he's busy on another call!

This makes the "okay-to-remove" LEDs on the drives essential -- with a RAIDZ1 set, if one drive goes bad and I accidentally remove a good drive from the same set, at best the whole pool goes offline until the good drive is reinserted (and no data is lost).
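
When a drive does fail, my plan (a sketch; the pool name, disk, and SATA attachment point below are made up) is to confirm the faulted device in ZFS and then unconfigure its attachment point before touching anything, since unconfiguring is what should light the okay-to-remove LED:

  # confirm which device ZFS actually considers faulted
  zpool status -x tank
  # find the attachment point for that disk...
  cfgadm -al | grep c5t3d0
  # ...and unconfigure it so the okay-to-remove LED lights before the drive is pulled
  cfgadm -c unconfigure sata4/3

Once the replacement is seated, cfgadm -c configure plus zpool replace should bring the set back to full redundancy.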

bash-3.00# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0t0d0