Sun's procedure "How to Manually Update the Boot Archive on a RAID-1 (Mirror) Volume" says to find the root slices from the console messages -- which note the md devices discovered during boot -- and put one of them into /etc/vfstab in place of the metadevice, which fixes the boot environment and temporarily disables mirroring. Unfortunately, this guidance is incomplete and incorrect.

The first problem is that Sun's documentation instructs you to boot from 'the primary submirror.' But of course it might be corrupt (something scrambled the boot archive, after all). This week, one of our submirrors for / and both submirrors for /var showed errors under fsck -n. c4t0d0s0 (the default boot device) had problems which prevented the system from making it all the way to normal multiuser state. c4t0d0s3 had moderate corruption, while c4t4d0s0 had minor corruption. Fixing /var was tricky, because I could not find any Sun documentation on how to recover good data from one submirror onto one with bad data. Sun assumes the only failure mode is a dead disk, for which disk replacement is simple. The undocumented trick is that 'Submirror 0' is authoritative for resync operations.

root@jean:/# metastat d3|head -9
d3: Mirror
    Submirror 0: d23
      State: Okay         
    Submirror 1: d13
      State: Okay         
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 8385930 blocks (4.0 GB)
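Since the submirror order matters, it helps to see at a glance which metadevice is Submirror 0 and what state each one reports. A minimal sketch, assuming a POSIX awk and the metastat layout shown above; submirror_states is a made-up helper name, not a Solaris command:

```shell
# Read `metastat <mirror>` output on stdin and print one line per submirror:
#   <index> <metadevice> <state>
# Assumes the "Submirror N: dXX" / "State: ..." layout shown above.
submirror_states() {
    awk '
        /^[ \t]*Submirror [0-9]+:/ {
            split($2, parts, ":")      # "0:" -> "0"
            idx = parts[1]
            dev = $3
            next
        }
        /^[ \t]*State:/ && dev != "" {
            # Keep multi-word states such as "Needs maintenance" intact.
            state = $2
            for (i = 3; i <= NF; i++) state = state " " $i
            print idx, dev, state
            dev = ""
        }
    '
}
```

It would be used as: metastat d3 | submirror_states. The line beginning with 0 is the submirror that, per the observation above, acts as the resync source.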

Problem #2: fixing / is harder, because the procedure for booting from a different disk is totally obscure and non-standard. Basically, you must edit /boot/solaris/bootenv.rc, which overrides /boot/grub/menu.lst. I don't know why Sun created a brand-new findroot command for GRUB but then doesn't actually run Solaris from the disk it specifies. At a guess, it stems from Sun's dissatisfaction with the way Linux & GRUB deal with the horrible multi-stage boot procedure required on x86 PCs. bootadm(1M) says it updates the 'boot archive', but not what a boot archive actually is. It also says it updates the GRUB configuration, but our menu.lst hasn't been updated since I installed the system.

root@jean:/# grep bootpath /boot/solaris/bootenv.rc 
setprop bootpath /pci@1,0/pci1022,7458@4/pci11ab,11ab@1/disk@0,0:a
root@jean:/# grep -v \# /boot/grub/menu.lst 
default 0
timeout 10
splashimage /boot/grub/splash.xpm.gz
title Solaris 10 10/08 s10x_u6wos_07b X86
findroot (rootfs0,0,a)
kernel /platform/i86pc/multiboot
module /platform/i86pc/boot_archive
title Solaris failsafe
findroot (rootfs0,0,a)
kernel /boot/multiboot kernel/unix -s
module /boot/x86.miniroot-safe
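To see which device bootenv.rc will actually hand to the kernel, regardless of what findroot says, the bootpath property can be pulled out on its own. A small sketch, assuming the "setprop bootpath <path>" form shown above (optionally quoted); bootenv_bootpath is a hypothetical helper name:

```shell
# Print the value of the bootpath property from a bootenv.rc-style file.
# Commented-out lines are ignored because their first field is not "setprop".
# Usage: bootenv_bootpath /boot/solaris/bootenv.rc
bootenv_bootpath() {
    awk '$1 == "setprop" && $2 == "bootpath" { print $3 }' "$1" |
        tr -d "'\""    # drop surrounding quotes, if the value has any
}
```

From failsafe mode, with the root slice mounted on /a, the file to inspect would be /a/boot/solaris/bootenv.rc.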

If you find yourself in single-user mode with a root device like /pci@1,0/pci1022,7458@4/pci11ab,11ab@1/disk@0,0:a, rather than something more normal like /dev/md/dsk/d0 or /dev/dsk/c4t0d0s0, it probably means Solaris is running from a device which it cannot correlate back to a valid boot device. You can do the correlation manually by examining the slice symlinks in /dev/dsk/:

root@jean:/# ls -l /dev/dsk/c0t4d0s0 
lrwxrwxrwx   1 root     root          62 Dec 30 14:13 /dev/dsk/c0t4d0s0 -> ../../devices/pci@0,0/pci1022,7458@1/pci11ab,11ab@1/disk@4,0:a
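Checking the symlinks one at a time gets tedious on a machine with this many disks, so the lookup can be run in reverse: given a physical path, list the /dev/dsk entries that point at it. A rough sketch using only ls and sed; slices_for_path is a made-up name, and it assumes the symlink targets have the ../../devices/... form shown above:

```shell
# Print every entry in a /dev/dsk-style directory whose symlink target,
# with the leading ../../devices stripped, equals the given physical path.
# Usage: slices_for_path <physical-path> [directory, default /dev/dsk]
slices_for_path() {
    want=$1
    dir=${2:-/dev/dsk}
    for link in "$dir"/*; do
        [ -h "$link" ] || continue
        # ls -l prints "... name -> target"; keep only the target.
        target=$(ls -l "$link" | sed 's/.* -> //')
        case $target in
            */devices"$want") echo "$link" ;;
        esac
    done
}
```

For example: slices_for_path /pci@0,0/pci1022,7458@1/pci11ab,11ab@1/disk@4,0:a. The match is exact, so if nothing is printed the path components (the pci@ prefix in particular) have to be compared by eye.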

The final major problem is that disk device paths are not stable on the X4500. Sun's instructions are to find the disk path to the root submirror from console messages (in my case, they referred to /dev/dsk/c3t0d0s0 & /dev/dsk/c3t0d0s0) and use one of these in /etc/vfstab rather than the metadevice, but when I actually booted into Solaris, those slices didn't exist, because the bootable disks were at c4 rather than c3. I had to boot back into GRUB's Failsafe mode and correct the device for / in vfstab. Sun's documentation is fine for machines with consistent disk paths, but wrong for the X4500 (and presumably the X4540 as well).
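A check before rebooting might catch this kind of mismatch. The sketch below reads the device for / out of a vfstab file and tests whether that device node exists under a given root. The helper names are hypothetical, it needs a POSIX awk (for -v), and it rests on an unverified assumption: that the /dev tree of the installed system (mounted on /a in failsafe mode) carries the names the system will use after a normal boot.

```shell
# Print the "device to mount" field of the vfstab entry for a mount point.
# Usage: vfstab_device <vfstab-file> [mount-point, default /]
vfstab_device() {
    awk -v mp="${2:-/}" '$1 !~ /^#/ && $3 == mp { print $1 }' "$1"
}

# Succeed only if the device listed for / exists under the given root.
# Usage: root_device_exists <vfstab-file> [root-prefix, default none]
root_device_exists() {
    dev=$(vfstab_device "$1" /)
    [ -n "$dev" ] && [ -e "${2:-}$dev" ]
}
```

In failsafe mode this would be run as: root_device_exists /a/etc/vfstab /a || echo "root device in vfstab not found". A failure is a prompt to look again at the controller number, not proof that the boot will fail.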

Additionally, I worried that bootadm might read c4 from the vfstab file and write to c3 (a ZFS pool disk it must not modify!), or something similarly screwy, but this turned out to be a non-issue once I'd sorted out the rest.

Tip: bootadm apparently caches its changes by default and only writes them to disk at reboot. To force bootadm to write changes immediately, add the undocumented -f flag, e.g.: bootadm update-archive -fR /a

bootadm(1M), of course, doesn't provide any useful detail on what it does.