Extra Pepperoni


Tuesday, October 5 2010

Well, THAT was unpleasant -- reppep.com postmortem

Last week, reppep.com (Dell Inspiron 530 desktop, a couple years old and running CentOS 5) stopped responding to email requests. It serves a bunch of websites and a few email accounts, but the email service is much more important. The key disks were a pair of 750gb disks mirrored with mdadm; I also have 3 1tb disks for data.

I discovered that logging in locally restored responsiveness, at least for a while. Unfortunately there was nothing I could do to bring it back from work. I was in the middle of a cluster build at work and busy with some projects at home, so I left it for a few days. I noticed some panics on the console and messages about resynching /dev/md3 (swap) and /dev/md6 (/home). Those should clear themselves, but I always wonder: with a discrepancy between 2 mirrored disks, how do you decide which to trust? If one disk completely fails it's clear, but in this case despite a heat warning, smartmontools stubbornly claimed neither disk had serious problems. I kept it staggering along for a few days, until one day after a particularly long bout of unresponsiveness and a complaint from Amy, I gave up on waiting it out or finding a solid indication of what was wrong.

I tried pulling one of the 750gb disks, hoping it would run off the good(?) submirror, but [warning: details get fuzzy at this point] it kept complaining about /dev/md3 sync not completing (with an implication it was just stuck waiting for /dev/md6 to sync, but perhaps the system just wasn't staying up long enough to resync the 634gb of /home), and additionally I got out-of-memory crashes. I had bought the system with 1gb RAM and configured 4gb of swap. After starting the mail system, Apache httpd, openfire Jabber server, CrashPlan backup service, etc., the system exceeded 1gb, and with swap offline it was killing processes and crashing. I bought and installed a couple 1gb DIMMs (it's convenient to have Staples a couple blocks away!). I saw USB / IRQ errors, which suggested irqpoll (which can apparently slow the system down, but was worth trying), so I added it to the kernel arguments, but still no stability. I tried running off the other 750gb submirror instead, but that didn't help.

I bought a new 1tb disk, figuring I'd use it to replace the 750gb disk with the heat warning. But the system kept crashing, the same way. I tried pairing the 1tb with the other 750gb, and got the same crashes. To avoid the crashing sync process, I used the --zero-superblock argument to mdadm (syntax is a bit tricky) to remove the RAID metadata, and changed the partition types from RAID to regular Linux filesystems. Finally I installed CentOS 5.5 afresh on the 1tb disk and disconnected the rest of the disks and all USB except the keyboard and mouse: more panics, including the IRQ errors. At this point, it was apparent that my 2-year-old Dell was curdled.

ns2.reppep.com is a Compaq Evo 510 SFF (EOL 8 years ago). It's perfectly adequate as a BIND slave server, but I've been planning to replace it with a plug computer or netbook for a while, to stop wasting power.

The Evo has a single PATA drive bay, but I have USB cases. Unfortunately, as I began to configure it I noticed it only has 256mb of RAM! That's fine for BIND, but not my email system. I could spend $100 on RAM for this ancient computer, but that seemed silly. Instead I bought an HP Pavilion P6610F, which so far seems fine. It has a quad-core Athlon, which may be irrelevant because its main purpose is to serve web & email up a 1.5mbps uplink (6mbps down), or might be handy for HandBrake or other stuff. It came with 4gb RAM, so the $100 I spent on the Dell was wasted. That's one of the more purely irritating aspects of this whole misadventure.

Installing CentOS on the HP was easy. With RPMforge, installing the mail system was straightforward (much easier than building amavisd-new, clamav, and all their dependencies manually, as I did a couple years ago for the Dell). Unfortunately, users did not see old mail until I realized that I was using the wrong reconstruct syntax for cyrus-imapd (Cyrus can use . or / as a path delimiter, and although chk_cyrus uses . as a delimiter on my system, reconstruct requires /, and doesn't complain when provided the wrong syntax -- I kept running reconstruct and wondering why it didn't recover mail!). Thanks to the helpful info-cyrus@ list members!

openfire was trivial -- I just installed the RPM and copied /opt/openfire to the new disk. Apache was quick & easy too -- putting my configuration back was simple, then I had to install mod_ssl and a few PHP modules for Dotclear. MySQL was easy -- I just put the files back, and didn't have to test my automysqlbackup dumps.

Unfortunately, the HP only has a single 10/100 Ethernet port (and WiFi, but who cares on a Linux server?). The Dell was PCI based, so I ordered a new PCIe GE card for the HP; fortunately GE cards are cheap, so the only aggravation is waiting for it. Ironically/sadly, Staples (who sold me the HP) only has 2 GE cards in the store -- both the PCI GA311 I already have -- meaning they don't have any GE options for the HP they sold me.

My other irritation is that this Dell died -- so badly -- after 2 years. Obviously the Evo is much more robust, as most of my computers have been.

This whole unpleasant experience reminded me (painfully) that grub is very poor at dealing with mirrored boot disks. It tends to try booting the wrong disk, in various iterations. The grub command always assumes there is a single boot disk, and simply doesn't support redundancy well. With real hardware mirroring this would all be out of grub's control or visibility, but that's rare on desktops (most 'RAID' support on desktops is just fakeraid). Fortunately the HP's BIOS lets me choose which SATA disk to boot from, and that becomes /dev/sda, so I was able to get grub working (with a few false starts).

Now mail is back with all mail recovered, all websites are online, I have Jabber back, and things seem copacetic. As soon as I get the PCIe GE card and get rid of the flood abatement hardware, I can restore the high-speed connection to our home LAN and reconnect my data drives...

Wednesday, August 18 2010

SystemImager & SALI

We use SystemImager to maintain (rebuild) our small HPC clusters. Conceptually it's very simple:

  1. Build a node (the 'golden client') just the way you want it.
  2. si_prepareclient: Run rsyncd on the node, accessible to the 'image server'.
  3. si_getimage on the image server copies the entire node into a directory, and analyzes it to produce a script that will recreate the image (with exclusions for files which should differ between nodes).
  4. si_updateclient on a target node fetches the script from the image server; the script configures the target (disk partitioning, etc.) and fetches the image contents, making the target match the golden client.
  5. If the node is dead or brand-new, there's a DHCP/PXE/TFTP process for bootstrapping far enough to run the script and then match the golden client.

Once the SI system is all set up, it's quick & easy to rebuild nodes. Unfortunately there are several complications:

  • The DHCP & TFTP dependencies are somewhat complicated, so bringing up SI without breaking anything is tricky. TFTP & pxelinux are not terribly well documented.
  • The "Latest Stable Release" is SystemImager 4.0.2 from December 2007. One of the key components of SystemImager is a generic kernel & Linux initrd (initial RAMdisk) which include a default set of drivers. But the release is so old that it cannot handle current hardware. There are several newer development versions but they're not fully baked and choosing between them is confusing.
  • SI doesn't yet support grub2 or ext4, which are required for large disks (GPT partition tables).

The workaround I got from the very helpful folks on sisuite-users@ was to use SALI, a modern kernel/initrd pair for SystemImager. Unfortunately SALI's a bit different -- in the process of adding grub2 support, they broke compatibility with the scripts that SI generates. Here's a quick recap of the steps I used (mostly from sisuite-users@) to use SALI:

  • Drop the 2 SALI files into the TFTP directory (normally /var/lib/tftpboot/ or /tftpboot/).
  • Specify the SALI files in /var/lib/tftpboot/pxelinux.cfg/default or equivalent.
  • Add a couple lines to /etc/dhcpd.conf.
  • Set SCRIPTNAME= in pxelinux.cfg/default.
  • In the script created by SI:
    • Change DISK_SIZE entries to "DISK_SIZE=$(get_disksize $DISK0)".
    • Remove -v1 from mkswap arguments.
    • Add -I 128 to mke2fs for the /boot FS.
    • Remove "-o defaults" from mount commands.
    • SystemImager's final line in the script is "shutdown -r now", which fails on SALI. Use reboot until SALI 1.3, which should support shutdown.
  • On our newer cluster, SALI does bizarre things with console redirection. I had to type into the (virtual VGA) console, while output appeared on the serial console. The serial console recognized and echoed my input, but did not execute it.
  • (Not SALI related): Make sure the scripts (normally in /var/lib/systemimager/scripts) are executable -- SI left mine non-executable for some reason.
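
The script edits listed above are mechanical, so they could be applied with a small filter rather than by hand. What follows is only a minimal Python sketch, not something from sisuite-users@ or the SALI documentation: the function name patch_for_sali is hypothetical, and the patterns assume each setting appears on its own line in the form quoted in the list. It leaves the mke2fs -I 128 change alone, because finding the /boot line depends on how the generated script is laid out.

```python
import re

def patch_for_sali(script_text):
    """Apply the SALI compatibility edits to an SI-generated script (sketch)."""
    patched = []
    for line in script_text.splitlines():
        # DISK_SIZE=<anything> becomes DISK_SIZE=$(get_disksize $DISK0)
        if re.match(r"\s*DISK_SIZE=", line):
            indent = line[: len(line) - len(line.lstrip())]
            line = indent + "DISK_SIZE=$(get_disksize $DISK0)"
        # Remove -v1 from mkswap arguments.
        if re.search(r"\bmkswap\b", line):
            line = re.sub(r"\s+-v1\b", "", line)
        # Remove "-o defaults" from mount commands.
        if re.search(r"\bmount\b", line):
            line = re.sub(r"\s+-o\s+defaults\b", "", line)
        # "shutdown -r now" fails on SALI before 1.3; use reboot instead.
        if line.strip() == "shutdown -r now":
            line = line.replace("shutdown -r now", "reboot")
        patched.append(line)
    return "\n".join(patched) + "\n"
```

Reading the script into a string, passing it through patch_for_sali, and diffing the result against the original before installing it would be a sensible precaution, since the real script may not match these assumed patterns.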

Friday, July 30 2010

amavisd-new hates yum -- solution: RPMForge

Today I patched www.reppep.com, and it broke email once again. As on several previous occasions, perl modules were broken, amavisd-new was throwing misleading errors on startup, and I had to reinstall Scalar-List-Utils to get rid of complaints about Compress::Zlib.

This time, however, I decided to upgrade amavisd-new in hopes the new version would be smarter about the (bogus) perl module complaints at startup. I also tried using yum to install some of the perl module dependencies, which entailed reinstalling spamassassin. Alas, amavisd-new-2.6.4 is no smarter, but either amavisd-new or spamassassin introduced a new dependency on Mail::DKIM, which requires the Crypt-OpenSSL-Random perl module. I tried getting them through cpan, but it kept choking -- apparently Crypt-OpenSSL-Random requires the openssl-devel RPM on CentOS, but isn't smart enough to throw a clear error demanding it.

I never did figure out where Mail::DKIM was enabled, or how to disable it, but I seem to have found a much better solution.

amavisd-new is not in the base RHEL (or CentOS) repositories, so the CentOS wiki recommends getting it from RPMForge. This turned out to be pleasantly simple, and should prevent yum from breaking it in the future. Here's hoping, anyway!

Sunday, March 21 2010

Ubuntu: Java in Firefox

Julia's netbook runs Ubuntu Netbook Remix 9.04 (UNR's 9.10 installation process has changed, and doesn't work as well). Her teacher recently recommended http://arcytech.org/java/, which offers a bunch of educational games. Unfortunately, UNR doesn't include Sun Java, and getting it running was non-trivial.

The short version: I needed to enable the multiverse & universe repositories in synaptic, and then "apt-get install sun-java6-plugin". I initially installed just the JRE, but Firefox needs the plugin too. I also put the JVM's path in /etc/jvm, but I'm not sure if that mattered.

Interestingly, I had a similar problem at work last week -- CentOS 5 systems naturally don't come with Sun Java, but installing the JVM was easy. For both CentOS & Ubuntu, most of the documentation on installing Java (including Sun's) stops at getting the JVM installed, and neglects the Firefox plugin. On CentOS, I just dropped a symlink to /usr/java/default/... into the right directory under /usr/lib/mozilla.

Be sure you install for the correct version of firefox (some of our systems have bits of several different versions); if not sure, link the plugin into ~/.mozilla.

Thursday, December 31 2009

Parental Controls: Ubuntu Netbook Remix vs. Mac OS X

Julia had my old PowerBook for a while. She liked that it was (externally) the same as my MacBook Pro, and it was fine for the Flash edutainment sites she uses, such as http://www.starfall.com/, http://www.cyberkids.com/, http://pbskids.org/, & http://www.poissonrouge.com/.

Using Mac OS X's Parental Controls, I limited her to a half-hour per day (although most days she doesn't use the computer), and I set her up with the Simple Finder. It didn't work perfectly -- Parental Controls prevented us parents (administrators) from configuring/fixing certain things, and the timer granularity wasn't really sufficient (she got a 15-minute warning in a half-hour session, which was just distracting, and we couldn't set anything between 30 and 60 minutes), but it worked pretty well. Unfortunately, the PowerBook finally died -- it couldn't retain access to the AirPort network, kept losing the clock (which broke Parental Controls), and eventually stopped booting entirely.

Amy and I agreed that buying a fixed desktop computer didn't make sense, but a MacBook with AppleCare would cost over $1,000. Fortunately, I found a (purple) Eee PC netbook for $229 at Amazon, with 512mb RAM, 4gb flash, and a 9" 1024*600 LCD. It came with Xandros Linux (Windows was a non-starter). I upgraded to 1gb, still well under $300. I wiped Xandros in favor of Ubuntu Netbook Remix 9.04, which is quite nice. It's basically a smaller Ubuntu distribution with a launcher instead of a static background. The keyboard is quite awkward for an adult to use, but it works fine with a spare USB keyboard & mouse. My attempt to upgrade to 9.10 failed -- Ubuntu uses a new image format, and doesn't have Mac instructions; I tried on a Windows VM but it didn't work, and hasn't been a priority. The built-in upgrade option doesn't work because there isn't enough free space on the 4gb flash drive. 9.04 works fine, though, so I'm not fussed about figuring out a workaround for the upgrade -- I'm sure they'll get working Mac instructions eventually.

Julia's UNR desktop

After installing openssh-server and setting up passwords, it's easy to manage from Terminal and X11.app on my Mac.

But the tough question was how to recreate the parental controls. On Linux, it seems fairly straightforward to run a network proxy to filter out 'bad stuff', but as far as I can tell, there is no such thing as a good site blacklist. Since Julia isn't yet 7, I think for now we can just explain that we'll keep an eye on her computer usage (browser history), and keep an eye on her when she's using the computer (it's staying out of her bedroom, for instance).

The other tough part is the time limit. Fortunately, the included Keyboard Preferences has a section called "Typing Break", which I have set to lock the screen after 30 minutes, and unlock 840 minutes later. That should provide a reasonable control, although I have already thought of at least 3 different ways around it. When I kill the program to release the lock via ssh (so she can finish what she's doing), it doesn't come back next time, and I haven't investigated how to restart it yet...

As a backup, I have configured the computer to send me email every 10 minutes when it's on, which should provide a reasonable cross-check:

pepper@julia:~$ crontab -l
# m h  dom mon dow   command
*/10    *   *   *   *   (uptime ; last | head -3) | mail -s "julia netbook is on" pepper

Tuesday, May 26 2009

anaconda vs. Virtual Disk

Update: chip suggests dmraid -r -D in rescue mode. I haven't tried it -- wiping both disks with dd didn't help, but booting one disk, installing, and then adding the second disk gave me a usable install.

My latest Linux problem is odd, but at least somewhat interesting. I'm installing CentOS 5.3 onto an old PowerEdge 1950 with a couple 750gb SATA disks. My kickstart configuration failed, complaining that the system couldn't repartition sda. I tried hda, with no more luck. Confusingly, I do see hda when booting, but it looks like that's the CD-ROM drive, and anaconda skips it for partitioning.

anaconda: error -- cannot find physical disk

When I ran anaconda (the Red Hat / CentOS installer) manually, it showed me only a single device: /dev/mapper/ddf1_Virtual Disk 0. This is clearly some sort of logical device -- LVM, software RAID, or something like that. I booted into the Dell SAS BIOS and confirmed it wasn't presenting logical devices -- it's not a PERC controller, and doesn't appear to support hardware RAID at all.

anaconda: cannot manage LVM volume

I booted into linux rescue mode, and the system found and mounted /dev/mapper/ddf1_Virtual Disk 0p5 and various associated partitions, but I was able to destroy and recreate the partition tables with fdisk.

linux rescue mode: logical and physical disks

I filed a bug against CentOS (since I haven't tried to reproduce this in RHEL, although I'm pretty sure it's their issue).

Unfortunately, when I booted back into anaconda, the virtual disk remained the only available device. Someone on #centos suggested that this was an LVM volume, and the system was finding LVM superblocks scattered across the disk (so not dependent on the partition table). Next attempt is ongoing right now (writing 12,002,501,984,256 zeroes takes a while):

  • time dd if=/dev/zero of=/dev/sda bs=1G
  • time dd if=/dev/zero of=/dev/sdb bs=1G
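
The zero count quoted above appears to be in bits across both disks. Here is a quick Python check, assuming each "750gb" disk holds 750,156,374,016 bytes (a common capacity for drives of that size; the post does not state the exact figure):

```python
# Assumed capacity of one "750gb" disk in bytes (not stated in the post).
DISK_BYTES = 750_156_374_016
DISKS = 2          # sda and sdb

total_bytes = DISK_BYTES * DISKS
total_bits = total_bytes * 8   # every bit written by dd from /dev/zero is a zero

print(f"{total_bytes:,} bytes to write")
print(f"{total_bits:,} zero bits")
```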

Monday, February 23 2009

LVM Setup (on SATABeast)

Update 2010/05/06: Apparently I was wrong. ext3 uses 32-bit block numbers from 0..4,294,967,295. With 4kbyte blocks (maximum on i386 & x86_64 systems) this gives a maximum ext3 filesystem of (2^32-1) * 4096 = 17,592,186,040,320 bytes. Using LVM with 4096kbyte physical extents, this means ext3 filesystems must be under 4,194,304 PEs. So use lvcreate --extents 4194303. 4,194,303 4096kbyte physical extents = 4,294,966,272 4kbyte blocks = 17,179,865,088 kbytes (17,592,181,850,112 bytes) in the resulting filesystem.
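
The arithmetic in this update can be reproduced with a few lines of Python. This is just a sketch of the calculation, using the update's own figure of 2^32-1 blocks; the variable names are mine.

```python
BLOCK_SIZE = 4096            # bytes per ext3 block (4kbyte)
PE_SIZE = 4096 * 1024        # bytes per LVM physical extent (4096kbyte)
MAX_BLOCKS = 2**32 - 1       # block count used in the update above

max_fs_bytes = MAX_BLOCKS * BLOCK_SIZE
blocks_per_pe = PE_SIZE // BLOCK_SIZE

# Largest whole number of extents that still fits under the ext3 limit.
max_whole_pes = max_fs_bytes // PE_SIZE

lv_blocks = max_whole_pes * blocks_per_pe
lv_bytes = lv_blocks * BLOCK_SIZE

print(f"max ext3 size: {max_fs_bytes:,} bytes")
print(f"lvcreate --extents {max_whole_pes}")
print(f"resulting LV:  {lv_blocks:,} blocks = {lv_bytes:,} bytes")
```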

Update 2009/03/10: It looks like mke2fs is smart enough to automatically select the 4k blocksize, and largefile4 is not necessary (which is good, as it was interfering with our backups).

We compared performance between 10-disk and 20-disk RAID6 sets on a SATABeast, and discovered the performance difference is not significant, so we chose the most efficient reasonable layout: 2 20-disk RAID6 sets, each containing a single volume the same size. These appear to the Linux host as a couple 16.37tibyte LUNs. We're using device mapper multipathing to provide fault tolerance across both FC paths (in Nexsan's recommended "All Paths All LUNs" mode, each LUN is available via both controllers). This is all handled (except the performance testing) via the SATABeast administration interfaces.

Within Linux, we create 2 LVM logical volumes of just under 8tibyte (the largest ext3 can handle), and a third with the leftover 384gibyte, from each LUN.

The SATABeast lets the host see and use the volumes while it's still generating parity on the underlying RAID arrays ("Online Creation"), but creating file systems is much slower during this process.

A fully configured SATABeast contains 42 1,000,137,687,040-byte ("1 terabyte") drives. They reserve 2 for spares, so we have 40 disks to work with. Nexsan suggests 4 10-disk RAID sets, but RAID 6 allocates 2 disks per RAID set to parity, so with 4 10-disk sets we would 'waste' 10 disks, and our usable space would be 4 volumes, each 8 usable disks = 8,001,101,496,320 bytes / 1024^4 * 4 RAID sets = 29tibyte usable. 29tibyte is a lot, but only 70% of the specified "42tbyte", so we'd really like to be more space efficient -- we have 2 hot spare and 4 p

I set up /etc/fstab earlier, using ext3 labels.

How to create an ~8tibyte ext3 filesystem on a large multipath raw volume:

  1. pvcreate /dev/mapper/mpath3
  2. vgcreate satabeast1vg /dev/mapper/mpath3
  3. vgdisplay # Note number of usable extents (PE).
  4. lvcreate -n satabeast1lv --extents 2096128 satabeast1vg # Use number from above. # At least close to the largest valid ext3 volume size.
  5. mkfs.ext3 -L/satabeast1 -Tlargefile4 -b4096 /dev/satabeast1vg/satabeast1lv # If the SATABeast is still calculating parity, this takes a while. Go get some food...
  6. vi /etc/fstab ## I use something like LABEL=/satabeast1 /satabeast1 ext3 defaults 0 0
  7. mount /dev/satabeast1vg/satabeast1lv /satabeast1
  8. df -h /satabeast1

Here's my transcript of creating an LVM volume group with two not-quite-8tibyte filesystems and one 384gb filesystem on a 16.37tibyte LUN.

[root@norimaki device-mapper-multipath-0.4.7]# pvcreate /dev/mapper/mpath3 
  Physical volume "/dev/mapper/mpath3" successfully created
[root@norimaki device-mapper-multipath-0.4.7]# vgcreate noribeast0vg /dev/mapper/mpath3 
  Volume group "noribeast0vg" successfully created
[root@norimaki device-mapper-multipath-0.4.7]# lvcreate -n noribeast0lv --extents 2096128 noribeast0vg
  Logical volume "noribeast0lv" created
  Logical volume "noribeast0lv" successfully removed
[root@norimaki device-mapper-multipath-0.4.7]# lvcreate -n noribeast0a --extents 2096128 noribeast0vg
  Logical volume "noribeast0a" created
[root@norimaki device-mapper-multipath-0.4.7]# lvcreate -n noribeast0b --extents 2096128 noribeast0vg
  Logical volume "noribeast0b" created
[root@norimaki device-mapper-multipath-0.4.7]# vgdisplay
  --- Volume group ---
  VG Name               noribeast0vg
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  5
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               16.37 TB
  PE Size               4.00 MB
  Total PE              4292125
  Alloc PE / Size       4192256 / 15.99 TB
  Free  PE / Size       99869 / 390.11 GB
  VG UUID               QLdntg-9ccY-0HYe-DnWI-Lxxu-Huzj-S3bExH

[root@norimaki device-mapper-multipath-0.4.7]# lvcreate -n noribeast0c --extents 99869 noribeast0vg
  Logical volume "noribeast0c" created
[root@norimaki device-mapper-multipath-0.4.7]# mkfs.ext3 -L/noribeast0a -Tlargefile -b4096 /dev/noribeast0vg/noribeast0a
mke2fs 1.39 (29-May-2006)
Filesystem label=/noribeast0a
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
8384512 inodes, 2146435072 blocks
107321753 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
65504 block groups
32768 blocks per group, 32768 fragments per group
128 inodes per group
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
    102400000, 214990848, 512000000, 550731776, 644972544, 1934917632

Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 25 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
[root@norimaki device-mapper-multipath-0.4.7]# time mkfs.ext3 -L/noribeast0b -Tlargefile -b4096 /dev/noribeast0vg/noribeast0b
mke2fs 1.39 (29-May-2006)
Filesystem label=/noribeast0b
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
8384512 inodes, 2146435072 blocks
107321753 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
65504 block groups
32768 blocks per group, 32768 fragments per group
128 inodes per group
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
    102400000, 214990848, 512000000, 550731776, 644972544, 1934917632

Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 23 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

real    9m52.451s
user    0m0.657s
sys 0m3.660s
[root@norimaki device-mapper-multipath-0.4.7]# time mkfs.ext3 -L/noribeast0c /dev/noribeast0vg/noribeast0c 
mke2fs 1.39 (29-May-2006)
Filesystem label=/noribeast0c
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
51134464 inodes, 102265856 blocks
5113292 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
3121 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968

Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 27 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

real    1m45.683s
user    0m0.172s
sys 0m10.150s
[root@norimaki device-mapper-multipath-0.4.7]# df -h |grep nori
                      8.0T  175M  7.6T   1% /noribeast0a
                      8.0T  175M  7.6T   1% /noribeast0b
                      384G  195M  365G   1% /noribeast0c
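
The numbers in the transcript above hang together, which is a useful check when picking --extents values. The Python sketch below takes the figures straight from the vgdisplay and mke2fs output; the helper names are mine and purely illustrative.

```python
PE_MIB = 4              # "PE Size 4.00 MB" from vgdisplay
BLOCK_KIB = 4           # "Block size=4096" from mke2fs

total_pe = 4292125      # "Total PE"
big_lv_pe = 2096128     # extents given to each of noribeast0a and noribeast0b
free_pe = total_pe - 2 * big_lv_pe   # what is left for noribeast0c

def pe_to_blocks(pe):
    """Number of 4kbyte filesystem blocks in the given number of extents."""
    return pe * PE_MIB * 1024 // BLOCK_KIB

def pe_to_gib(pe):
    """Size of the given number of extents in gibytes."""
    return pe * PE_MIB / 1024

print(free_pe, "extents free")
print(pe_to_blocks(big_lv_pe), "blocks in each large filesystem")
print(pe_to_blocks(free_pe), "blocks in the small filesystem")
print(f"{pe_to_gib(free_pe):.2f} gibytes free")
print(f"{pe_to_gib(total_pe) / 1024:.2f} tibytes in the volume group")
```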

SATABeast, ext4, & CentOS

We are setting up a SATABeast with 42 1tb disks. With 2 hot spares, this is "40tb" of raw storage. With a couple 20-disk RAID6 sets, each set holds 18,002,478,366,720 bytes = 16,766gibytes. Problem: RHEL/CentOS 5.1 & 5.2's ext3 filesystem supports slightly less than 8tibytes. RHEL 5.3 includes ext4dev, meaning Red Hat still considers ext4 unstable. CentOS is working on 5.3, but doesn't seem close to release.
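
For reference, the capacity figures can be worked out as follows. This is a Python sketch that assumes the per-drive size of 1,000,137,687,040 bytes quoted in the LVM post above and RAID6's two parity disks per set; the function name is illustrative.

```python
DRIVE_BYTES = 1_000_137_687_040   # per-drive size quoted for the SATABeast
RAID6_PARITY = 2                  # RAID6 gives up two disks per set to parity

def usable_bytes(disks_per_set, sets=1):
    """Usable bytes across `sets` RAID6 sets of `disks_per_set` drives each."""
    return (disks_per_set - RAID6_PARITY) * DRIVE_BYTES * sets

per_set_20 = usable_bytes(20)            # one 20-disk set (one LUN)
layout_2x20 = usable_bytes(20, sets=2)   # the layout we chose
layout_4x10 = usable_bytes(10, sets=4)   # Nexsan's suggested layout

print(f"20-disk set: {per_set_20:,} bytes = {per_set_20 / 2**30:,.0f} gibytes")
print(f"2 x 20 disks: {layout_2x20 / 2**40:.2f} tibytes usable")
print(f"4 x 10 disks: {layout_4x10 / 2**40:.2f} tibytes usable")
```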

The SATABeast is very heavy (151 lbs with dual controllers and 42 drives), so naturally you put it at the bottom of the rack. But there are several problems:

  1. All the health & status indicators are on the bottom of the bezel, so you cannot see them without a mirror on the floor.
  2. The drives aren't hot swap. It's even worse than the X4500, with its dire warnings about leaving the cover off for over 60sec to replace a drive -- to get a drive out of the SATABeast, we have to unmount and shut it down, remove two screws and the front bezel, and then remove the failed drive -- they provide a special tool to lift the drive out!
  3. The SATABeast has a serial console (in our dual-controller unit, the upper controller is #0, and the administrative GUI is available on GE port #0, but not port #1, of each controller, with preassigned static IPs -- GE ports #1 are only for iSCSI, which we don't use; both serial ports are usable). Since we installed it last week, the serial port console has failed 3 times, and I have had to reboot to get it back. Yes, I upgraded the firmware.
  4. The firmware is not available for download -- you have to ask support to email it.
  5. Nexsan provides a Mac GUI tool for managing SATABeasts, but it only works via autodiscovery -- it cannot see our SATABeast (in another subnet), and there's no way to specify the IP. Lame.

On the other hand, the GUI is generally well designed (although a bit overly complicated -- there are 11 left-hand menu items, and buttons within the main UI tend to jump into the wrong top-level item, which makes operations such as changing LUN mappings harder than they should be). Also, it's pathetic that Nexsan charges $250 for an SSL certificate, and doesn't let you BYOC.

But it looks like we don't really need RHEL 5.3's ext4 support anyway:

NOTE: Although very large fileystems are on ext4's feature list, current e2fsprogs currently still limits the filesystem size to 2^32 blocks (16T for a 4k block filesystem). Filesystems larger than 16T is one of the very next high-priority features to complete for ext4.

Saturday, January 31 2009

Linux Firewalls: Novell SuSE FAIL

We need to do some slightly exotic firewalling on a SUSE 10SP2 host, so I tweaked the firewall ruleset I've been using for years and started looking for the best way to apply it on a SuSE 10SP2 system. What I found is not pretty.

I found 4 SuSE scripts to manage the firewall, in addition to the basic iptables commands which are part of the netfilter (iptables) package used across Linux distributions:

  • /sbin/SuSEfirewall2: The script that actually manipulates iptables.
  • /etc/init.d/SuSEfirewall2_init: init script to start the firewall, presumably.
  • /etc/init.d/SuSEfirewall2_setup: init script to configure the firewall, apparently.
  • /etc/sysconfig/SuSEfirewall2: The config file (actually a script).

The init scripts have to run in the right order, and they call /sbin/SuSEfirewall2 to do the actual work. The init scripts offer a bunch of subcommands, but some of the listed subcommands are unimplemented -- presumably SuSE has a spec that says these must be provided, but the programmer didn't believe that means they had to do anything.

nori:~ # service SuSEfirewall2_setup
Usage: /etc/init.d/SuSEfirewall2_setup {start|stop|status|restart|reload|force-reload}
nori:~ # service SuSEfirewall2_setup status
Checking the status of SuSEfirewall2                                 unused

My script is in the simplest iptables format -- a bunch of lines like -A INPUT -j ACCEPT -p tcp --dport 80, with a header and COMMIT footer to make it a valid ruleset -- on Red Hat this goes in /etc/sysconfig/iptables, and the system loads it fine.

But this is not suitable, because SUSE expects a script. So I commented out SuSE's command to load the firewall script, and replaced it with iptables-restore, which is present (but unused) on SuSE because it's a part of netfilter. I have to do some more testing, but it looks like this way SuSE will start netfilter and load my rules, without me having to figure out what they were thinking.

nori:~ # diff -u /etc/init.d/SuSEfirewall2_setup.orig /etc/init.d/SuSEfirewall2_setup
--- /etc/init.d/SuSEfirewall2_setup.orig    2009-01-30 10:17:49.000000000 -0500
+++ /etc/init.d/SuSEfirewall2_setup 2009-01-30 10:19:40.000000000 -0500
@@ -2,6 +2,9 @@
 # Copyright (c) 2000-2002 SuSE GmbH Nuernberg, Germany.
 # Author: Marc Heuse <marc@suse.de>
+# Hacked by Pepper, 2009
 # /etc/init.d/SuSEfirewall2_setup
@@ -39,7 +42,10 @@
    echo -n "Starting Firewall Initialization "
    echo -n '(phase 2 of 2) '
    rm -f "$BOOTLOCKFILE"
-   $SUSEFWALL -q start
+   #$SUSEFWALL -q start
+   iptables-restore < /etc/sysconfig/iptables.conf
    rc_status -v
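
For anyone unfamiliar with the ruleset format that iptables-restore reads, here is a minimal Python sketch that generates one. The function name, the default policies, and the loopback and established-connection rules are illustrative assumptions, not my actual ruleset; only the per-port line follows the form quoted earlier in this post.

```python
def build_ruleset(tcp_ports):
    """Return an iptables-restore ruleset accepting the given TCP ports."""
    lines = [
        "*filter",                    # header: table name
        ":INPUT DROP [0:0]",          # assumed default policies
        ":FORWARD DROP [0:0]",
        ":OUTPUT ACCEPT [0:0]",
        "-A INPUT -i lo -j ACCEPT",   # assumed: allow loopback traffic
        "-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT",
    ]
    for port in tcp_ports:
        lines.append(f"-A INPUT -j ACCEPT -p tcp --dport {int(port)}")
    lines.append("COMMIT")            # footer: makes the ruleset valid
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(build_ruleset([22, 80]), end="")
```

The output could be saved to a file such as /etc/sysconfig/iptables.conf and loaded with iptables-restore, as in the diff above. Test any generated ruleset from the console first, since a DROP policy can lock out remote logins.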

I find the rules Red Hat's lokkit generates inexplicable and painful to parse, but at least simple rules written by hand are valid, and the system happily uses them. Neither lokkit nor yast (SuSE's configuration utility) is flexible enough for our requirements, but that's fine. It's just forcing users to deal with such a needlessly complicated system that I resent.

Sunday, July 13 2008

Today's Linux tip: "yum localinstall"

I needed to install the Citrix ICA client on CentOS 5.2 (RHEL 5.2), but it has very strange dependencies -- it complains about a version of libXaw which is present, demands an older version of libXm, and requires manual installation of openmotif 2.2.

The trick (thanks, FriedChips!) was yum --nogpgcheck localinstall ICAClient-10.6-1.i386.rpm, rather than rpm -Uvh ICAClient-10.6-1.i386.rpm. This way yum chased the dependencies for me, and didn't refuse to install the unsigned Citrix package.

Next I associated launch.jsp with /usr/lib/ICAClient/wfica.sh -- Citrix should have used .ica instead, because .jsp is used for other things. IIRC, EMC NetWorker used .jsp to launch their graphical console.

Unfortunately the ICA client insists on being wider than the physical display, but I can work around that. I wonder if it's because I simultaneously connected to the same XP system via RDP from both Linux and a Mac with different resolutions.

Update: Citrix is fixed on the size of my MBP's 1440*900 main display, which means it doesn't fit properly on the MBP's external 1280*1024 (or landscape 1024*1280) or my Linux box's 1280*1024.

Annoyingly, Citrix assigns the Mac's Command key to Alt on the Windows host. This doesn't work well, because although they avoid most Command key combinations in the ICA Client, Command-Tab switches Mac apps rather than Windows windows. Guys, just use the Option key! It even says alt on it, and nobody needs that key for Mac specific functions! Today's happy discovery: Command-Option-Tab switches Windows apps.

Next I have to figure out how to de-assign Alt-Tab from switching virtual workspaces in KDE. Copy & Paste don't work consistently when connected from KDE either, presumably because some events are being intercepted locally and others are being passed through. I won't need to use KDE as a Citrix terminal for much longer, though.

Crud. After all that, the Citrix ICA client doesn't display most text, making it useless. I can get some things to display by selecting them, but many things (including dialog boxes) are un-selectable. Junk!

Wednesday, July 9 2008

reppep service interruption

Ouch! At 10:31pm last night, I started patching both Linux servers running reppep and associated domains, prompted by Rich's BIND alert. At 12:33am, www.reppep.com finished installing approximately 255 CentOS patches (including BIND), and I rebooted. Everything looked fine, and I went to bed. This morning, I thought it a bit odd that I didn't have any new email, but not that unusual.

Melissa left me a message that mail wasn't flowing, but I couldn't fix it at work. Tonight I discovered that amavisd-new, which handles filtering for reppep email, was unable to start. Strangely, it was complaining about the Compress::Zlib perl module, which was actually installed (version 2.008, via the perl-Compress-Zlib-1.42-1.fc6 RPM). Some more digging indicated Scalar-List-Utils-1.19 needed to be reinstalled, which enabled amavisd-new to start (it checks for Compress::Zlib and refuses to start if it finds something wrong, which was apparently triggered by the Scalar-List-Utils issue).

mailq showed me postfix was now getting errors from amavisd-new about MIME::Parser and File::Temp. CPAN reinstalled MIME::Parser and said File::Temp was already current.

I bounced amavisd-new again, and tried postfix flush. Over the past 15 minutes, postfix has delivered the ~650 outstanding messages, and all seems well.
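A rough sketch of how the backlog could be watched while it drains. It assumes Postfix's mailq ends with a summary line of the form "-- 1234 Kbytes in 650 Requests." and prints "Mail queue is empty" when there is nothing queued; the function name queue_count is made up for this example.

```shell
#!/bin/sh
# Sketch: pull the message count out of Postfix's mailq summary line.
# Assumed formats: "-- 1234 Kbytes in 650 Requests." for a non-empty
# queue, "Mail queue is empty" otherwise.
queue_count() {
    # Reads mailq output on stdin; prints the number of queued messages.
    awk '
        /^-- .* in [0-9]+ Requests?\.$/ { n = $(NF - 1) }
        END { print n + 0 }
    '
}

# Typical use on the mail server (not run here):
#   mailq | queue_count
#   postfix flush
#   mailq | queue_count
```

Running the count before and after postfix flush gives a quick check that the queue is actually shrinking rather than just being retried.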

Separately, Alex noticed our blogs were inaccessible, but bouncing BIND tonight cleared that -- odd, as I checked http://www.bertpepper.com/ and got valid DNS resolution from both nameservers immediately after patching, but obviously something I didn't notice was still scrambled.

Anyway, at 8:45pm, all seems present and correct.

Sorry for the disruption!

Tuesday, February 19 2008

reppep.com Migrated

On Feb 19, 2008, I shut down the old reppep.com server, which ran Mac OS X 10.4 "Tiger" Server, and replaced it with a new (cheaper and faster) PC running Linux. Unfortunately, the password formats are incompatible, so I apologize to all reppep users for the disruption.

Please call me if you have an account on reppep.com and haven't received your password already, or find anything not working right.

I switched from Apple's jabberd to Openfire, which doesn't use the UNIX system accounts, so let me know if you want a chat account (compatible with iChat & GTalk).

[Done] I forgot SquirrelMail address books -- should be able to bring those over too.

  • Firewall problem fixed. SMTP MX issue fixed.
  • Virus filtering problem fixed.
  • Webmail certificate fixed.
  • Quota problem fixed.
  • Virtual domains for email fixed.

As of 5pm, I don't know anything that doesn't work (aside from SquirrelMail address books) [fixed Thursday].

Thanks for your patience!

As of 10:30 on the 20th, things seem to be working. Something's screwy with amavisd-new's quarantine, but mail is going through. I reinstalled Openfire, and chat seems okay under the correct hostname/certificate name now (will try signing it as ca.reppep.com later).

Good timing -- the optical drive on the old server died tonight.

I have distributed all the new temporary passwords, so any users having trouble logging in should let me know.

Markdown.cgi is still broken, but I'm the only person who uses it here, so I'll get to it.

On Thursday the 21st, I found a problem with amavisd-new -- it had quarantined 32,000 messages in a single directory, and was stuck (apparently ext3 doesn't support more than 32,000 files in a directory). I cleared it out and finally managed to disable quarantine, which wasn't as easy as it should have been, and the backlog of messages has been delivered as of 9:15pm.
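A minimal sketch of a check that could have flagged the quarantine directory before it filled up. The 32000 ceiling is simply the figure quoted above, the 90% warning threshold is an arbitrary assumption, and /var/virusmails is a hypothetical path; the function names are invented for this example. It relies on find's -mindepth/-maxdepth options (GNU and BSD find both have them), and filenames containing newlines would be miscounted.

```shell
#!/bin/sh
# Sketch: warn when a directory's entry count nears a limit.
entry_count() {
    # Count entries directly inside directory $1 (not recursive).
    find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

check_dir() {
    dir=$1
    limit=${2:-32000}          # assumed ceiling, taken from the post
    n=$(entry_count "$dir")
    # Warn once the directory is at 90% of the limit (arbitrary choice).
    if [ "$n" -ge $((limit * 9 / 10)) ]; then
        echo "WARNING: $dir holds $n entries (limit $limit)"
        return 1
    fi
    echo "OK: $dir holds $n entries (limit $limit)"
}

# Example (hypothetical path, not run here):
#   check_dir /var/virusmails
```

Run from cron, the non-zero return on a warning makes it easy to hook into whatever alerting is already in place.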

At 11pm, I fixed an issue preventing SMTP AUTH from working properly, which was interfering with sending email to non-reppep addresses.

Thursday, December 20 2007

Installing Linux: NFS vs. HTTP

I'm digging deeper into CentOS (basically a free version of Red Hat Enterprise Linux) v5.1, and for me that entails dozens of runs through the installer, testing out kickstart configuration variations.

This led me to wonder if it is faster to install via NFS or HTTP. I couldn't find a useful answer online, so I ran a couple simple tests. My client is a 2.4GHz Dell PowerEdge 600SC using SATA disks on a Promise TX4. My server is a dual 1.25GHz Power Mac G4, running Mac OS X Server 10.4.11. They're connected via a private network, using a NetGear gigabit Ethernet switch.

For NFS installations, anaconda takes a directory containing a DVD ISO (or set of CD ISOs), and automatically loopback mounts them as part of the installation process. This is very handy with the CD ISOs, as it doesn't require much configuration on the server -- just an NFS export.

In contrast, HTTP installation doesn't work against ISOs -- the web server must serve up the individual files, whether from a loopback mount on the server, or a directory where the files have been extracted. With the 6 CD ISO files, this is quite a nuisance; with the DVD ISO, it's not so bad.

My fairly complete kickstart configuration installs 2,103mb of packages. Installation times were quite similar, but a bit faster for HTTP, at 18:28 for package installation and 24:37 total. Via NFS packages took 19:57; total was 27:38.
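To put numbers on "a bit faster", the timings above can be converted to seconds and compared. This is only a back-of-the-envelope sketch; the helper names to_seconds and pct_faster are made up for the example.

```shell
#!/bin/sh
# Sketch: compare the install times quoted above.
to_seconds() {
    # Convert MM:SS to seconds. One leading zero is stripped from each
    # field so that values like "08" are not parsed as octal.
    min=${1%%:*}
    sec=${1##*:}
    min=${min#0}
    sec=${sec#0}
    echo $(( ${min:-0} * 60 + ${sec:-0} ))
}

pct_faster() {
    # Percent by which time $1 beats time $2, to one decimal place.
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", (b - a) * 100 / b }'
}

echo "packages: HTTP $(pct_faster 18:28 19:57)% faster than NFS"
echo "total:    HTTP $(pct_faster 24:37 27:38)% faster than NFS"
```

That works out to roughly 7% for package installation and 11% overall, from a single run of each method, so it is a hint rather than a benchmark.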

For reference, I used the following partitioning configuration, which factors into total installation time:

part /boot --onpart=sda1 --fstype=ext2
part /     --onpart=sda2 --fstype=ext3
part swap  --onpart=sda3 --size=2048
part /var5  --onpart=sda5 --fstype=ext3
part /home6 --onpart=sda6 --noformat
part /sdb1  --onpart=sdb1 --noformat

The partitions already existed:

[root@pe ~]# df -hl|grep -v tmp
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             7.7G  2.8G  4.5G  39% /
/dev/sda6             664G  198M  629G   1% /home6
/dev/sda5             3.9G   73M  3.7G   2% /var5
/dev/sdb1             688G  198M  653G   1% /sdb1
/dev/sda1             251M  7.4M  231M   4% /boot

I used the following set of packages for testing:


Saturday, December 8 2007

Upgrading from Tiger Server to Linux

For over a year now, I've been following the development of Mac OS X Server 10.5 Leopard and testing betas, and anticipating upgrading reppep.com from Tiger Server on a dual 1.25GHz Power Mac G4 to Leopard Server on a dual 2GHz Power Mac G5. Over the weekend I had a change of plans, though.

Although I support Mac OS X Server at Rockefeller, I don't recommend it for most requirements, as Linux compares favorably for transparency (some of the MOSXS internals are unique and poorly documented), server software compatibility (although Macs are quite good here too), and price/features at the low end. A Core Duo Mac mini has plenty of juice to saturate our 768kbps/3mbps DSL circuit, but adding a couple drives more than doubles its price, and Apple's software RAID is quite broken; Linux software RAID is apparently quite good; I might eventually switch to hardware RAID. An Xserve is a great piece of hardware, but it's a bit exotic and I can get a fast generic PC cheaper; I don't want all the high-end features for a box that sits in our apartment.

Additionally, I've read perhaps 600 pages of docs on Leopard Server, and had another 400-1500 yet to go. This is an investment I was finding hard to justify. The migration process is quite complicated, and Apple doesn't support migrating accounts from a Tiger system to a Leopard system -- I don't want to do an upgrade. I could clone the G4 to the G5 and upgrade it there, but I prefer to handle upgrades as scratch installations with manual migration of applications, so I know exactly what's been done. A lot of this is masked by upgrade procedures.

As part of this, I've decided to invest a bit more time in learning RHEL5 -- we have a couple systems at Rockefeller, but not much in production yet, and now seems like a good time to dig in some more.

Fortunately, all the services I've been using on reppep.com are available on Linux (and FreeBSD), so aside from another incredibly inconvenient password change cycle (for which it is arguably time anyway), the switch should be largely transparent to reppep.com users, although I still have plenty of research to do.

A brief timeline of reppep.com

  1. 1999: I left the National Audubon Society, and bought the Power Mac 7300 with accelerator card I'd been using there. I set it up with LinuxPPC and Apache, and started offering free web hosting to friends & family. LinuxPPC was eventually discontinued.
  2. I upgraded from LinuxPPC to Yellow Dog Linux, which was better than LinuxPPC, but had serious flaws.
  3. 2001: I was working on a couple remote FreeBSD machines (as admin of the Info-Mac server, and a user on the Apache Software Foundation userhost), and decided to learn more; I bought a cheap Celeron PC and installed FreeBSD 4.3 (IIRC); I upgraded through about v5.1 and a Pentium 4 (giving the Celeron box to the Info-Mac Archive, where it became the Info-Mac server for a while). I learned a lot about FreeBSD and UNIX in general, but eventually realized I was investing more time learning FreeBSD than I could justify. The best thing about FreeBSD is not a technical feature, but rather that the user community is so rich with knowledge. Reading the FreeBSD-STABLE list was amazing, as there was so much depth, freely shared with the community. While running on FreeBSD, I added mail services to the web services I had been offering. Note: Disruptions to personal email service are much worse than problems with personal web service.
  4. 2005: It became clear that I needed anti-spam, so I began researching SpamAssassin. While I was figuring out how to build the SMTP sandwich, with a public untrusted Postfix listener on ports 25 & 587, and a filter, and then a listener on a high port like 10025 to accept and deliver mail to actual users, I installed a beta of Mac OS X Server 10.4 "Tiger", which had the whole thing implemented, plus ClamAV as a bonus. I started testing heavily before the release, and switched to MOSXS 10.4 shortly after it was finalized. It's been very good, but as time has passed, I've had more and more problems. In particular, Apple chose to use Cyrus as an IMAP/POP server, and Cyrus is complicated, but Apple ignores the complexity; this can make troubleshooting impossible. The SpamAssassin installation is slightly broken; it's a bit too old to offer the newer SpamAssassin self-upgrade mechanism. Server Admin is great, but has a bunch of bugs around SSL certificates, some of which destroy the certificates. Blojsom was nice, but Apple's installation was very unstable; I eventually moved my blog to WordPress hosted externally.
  5. 2008: I intend to switch to CentOS 5.1, which is basically a (legal) no-charge clone of Red Hat Enterprise Linux 5.1. This should make future upgrades a bit more straightforward, as I won't have to deal with Apple's Open Directory (OpenLDAP); it will also give me a bit more experience with RHEL5, which is a better investment for my time than Leopard Server.

Monday, November 12 2007

Oracle VM: Funniest thing I read all day

Update: It's sillier and sadder than I thought. See below.

Today (2007/11/12), Oracle announced Oracle VM, their free competitor to VMware and (Citrix) Xen. A few months ago, Oracle announced "Unbreakable Linux", which is their re-branding of Red Hat Enterprise Linux. There are already many free Red Hat flavors, including CentOS, but not too many companies have built business models on attempts to take Red Hat support business away from Red Hat.

Oracle has. They made many loud claims of being cheaper and better than RHEL, while claiming this wasn't an attack on Red Hat. Red Hat was pretty quiet about Oracle Linux, but did point out that Oracle's claims to be actively fixing bugs in RHEL (supposedly faster than Red Hat does) without forking RHEL were impossible -- as soon as there's a fix which isn't available from Red Hat, that's a fork.

There's been a lot of ill feeling both ways over this, but of course neither company is willing to publicly and unambiguously badmouth the other.

Today we see another step in Oracle's (Linux) plan: Oracle VM is free, but Oracle offers paid support. The best part is this, though:

What is the difference between Oracle VM and the virtualization that comes bundled with Oracle Enterprise Linux?

As part of the Unbreakable Linux Support program, Oracle supports virtualization that is included with Oracle Enterprise Linux 5. Please note that Oracle products are not supported to run in that environment. Any customer who wants to deploy Oracle products in a virtual environment should use Oracle VM, and subscribe to Oracle VM support. Oracle customers should refer to MetaLink note 466538.1

Translation: We sell RHEL5 (which includes Xen as part of the base price) but we don't like it, because we want you to pay more for Oracle VM instead. We cannot realistically either break or drop support for Xen, even though we'd really like to, but we do get to choose what "platforms" we support Oracle on, so we'll support Xen, and Oracle on Linux, but not Oracle on Xen. Please don't think too hard about that one. It makes our heads hurt!

Update 2007/11/13: I missed the fact that Oracle VM is based on Xen. This means Oracle wants to sell you "Unbreakable Linux", but wants to charge an extra $500 to virtualize its own software on "Oracle's" Linux platform. I thought they were claiming Oracle VM was better than RHEL's VM, but that can't stand even cursory scrutiny, given that they're basically the same code. Additionally, their

• Three times greater efficiency than current x86 based server virtualization products;

has to be in relation to VMware which is not paravirtualized, but there is no way Oracle's brand-new Xen build is significantly faster than Red Hat's Xen kernel, running on Red Hat's Linux distribution.

Given that Oracle now recommends RHEL + Xen (from Oracle) as a platform for running Oracle Database & Applications products, Oracle's lack of support for running on RHEL + Xen (when purchased from Red Hat) looks -- I was going to say even more absurd, but this can't be an oversight, so it's just transparent corporate greed.

Saturday, September 8 2007

Red Hat 401: Deployment & Systems Management

I just finished RH401: "Red Hat Enterprise Deployment, Virtualization, and Systems Management". It's a 4-day course, given Tuesday-Friday of this week. The course is normally Monday-Thursday, with an assessment exam (EX401) on Friday. Had I known this, I probably would have taken the course with the exam -- I'd like to have that certificate. There are 5 tests (including EX401) to earn the exalted title of "RHCA", Red Hat Certified Architect.

The course covered several major areas:

  • Net booting (PXE, DHCP, & TFTP)
  • Kickstart (automated installation of RHEL)
  • Red Hat Network (rhn.redhat.com, a service hosted by Red Hat), Satellite Server (a local version of the service, which includes and installs net boot services), and Proxy server (a customized caching webserver which saves bandwidth and download time -- a subset of the full Satellite)
  • Building RPMs
  • Xen virtualization

Xen is very cool -- it's perhaps halfway between VMware and Solaris zones (containers), so more efficient than VMware but less than zones. Xen offers live migration between servers and supports RHEL 4.5 as a guest OS. With appropriate hardware (preferably recent Intel or AMD CPUs with hypervisor instructions), Xen can also virtualize Windows and earlier versions of RHEL. VMware is much more mature, but very expensive (easily more than the hardware it runs on for standard 2-socket systems), so this was a useful preview, even if we don't expect to use Xen much during the next year -- perhaps for Rockefeller's multi-user webserver, where we would like more isolation between users.

I was really there, however, to find out how to build custom RPMs for Rockefeller, manage them with custom RHN channels, and kickstart from a net boot server to streamline and automate installations.

Unfortunately this turns out to be surprisingly expensive, compared to what we pay to run RHEL. We normally pay $50/host/year for RHEL Academic Server, which is basically the Update & Management entitlements. This enables us to download patches from rhn.redhat.com (Update), and do a little bit more advanced stuff such as group systems in the RHN website (Management).

To use all the custom channels and kickstarting discussed in the class, we need a Red Hat Satellite Server (which costs about as much as all our RHEL Academic seats combined), and a $96 RHN Provisioning add-on Entitlement for each server. Combined, these would quadruple the amount we pay Red Hat annually for our servers, and I'm not at all convinced it would be a worthwhile investment.

We may instead get a Red Hat Proxy Server, which provides custom channels and costs much less than the full Satellite, and build our own kickstart server, forgoing all the Satellite features. This would be a shame, but might turn out to be the best compromise.

Another problem is that the RHN/Satellite back-end is RHEL4AS only -- it doesn't run on RHEL5, and it doesn't coexist well with any other services. This is a larger Red Hat problem, rather than specific to the class, but it meant the class was a mixture of RHEL4 and RHEL5, and made things more complicated.

It's enough to make one seriously consider CentOS, which is a rebranded free version of RHEL. We don't want to do that, though.

Paul, our instructor, was full of excellent tips on better ways to work with RHEL. Unfortunately, I avoid many of these (decidedly useful) techniques, since they only work on Linux (or only RHEL), and I generally stick to things common to Linux, Solaris, and Mac OS X. The neat stuff Red Hat has added recently, which he was excited about, would make my RHEL work more efficient at the expense of having to keep track of the RHEL way and the non-RHEL way. Those commonalities are essential for me.

Still, I learned a lot of useful stuff about RHEL, and now just need a chunk of time to set up a kickstart server and decide how to do DHCP -- our DHCP scopes are managed by the Network Group, and we need a way to set up and manipulate kickstarting without asking them to make multiple DHCP & VLAN changes. I have some ideas for how to automate and customize the kickstart process, which I'd really like to test and implement.

Saturday, August 11 2007

Chris Pepper, RHCE

I passed the RHCE exam, hooray!

This was much easier than becoming Dr. Pepper, and much easier & safer than becoming Sgt. Pepper.

Friday, August 10 2007

Took the RHCE Exam Today

I spent Monday to Friday this week in RH300, the Red Hat Certified Engineer Rapid Track Course; today (Friday) was the exam. I believe I passed -- they should email my final results by Wednesday. In reality, I took the test as much for the RHEL5 update as for the certification.

I was concerned about problem solving with no Internet access, no access to another system (in real life we almost always have another live system to check things out, as opposed to troubleshooting grub on an unbootable system without working man, which was a problem in class), and no ability to discuss with co-workers, but it's not an exam about how well you can find answers on The Google, so this was the only realistic way to do it.

Wednesday, May 16 2007

rpm -e --allmatches

I was deleting some unneeded Samba RPMs today, since they're vulnerable to a security bug, and hit a snag on a 64-bit machine, where rpm was too stupid to handle the presence of both 32-bit and 64-bit RPMs. The error was 'error: "samba-common" specifies multiple packages'. The solution is simple but obscure: Add "--allmatches", as in "rpm -e --allmatches".

As it turns out, I can't really remove samba-common anyway, because it's required for kdebase, which is required for half the RPMs on the system, but now at least I know the trick for next time. RPM really needs to deal with multi-flavored packages better.
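As a sketch of how to spot these ahead of time: the snippet below lists every package name that is installed more than once (typically 32-bit plus 64-bit builds, or several versions of the same package), which are the names a plain "rpm -e NAME" would trip over. The function name duplicate_names is invented for this example; the rpm query format uses the standard NAME and ARCH tags.

```shell
#!/bin/sh
# Sketch: list package names that appear more than once in the RPM
# database, i.e. the ones that need --allmatches to erase by name.
duplicate_names() {
    # stdin: one "NAME ARCH" pair per line.
    awk '{ seen[$1]++ } END { for (n in seen) if (seen[n] > 1) print n }' | sort
}

# On an RPM-based system (not run here):
#   rpm -qa --qf '%{NAME} %{ARCH}\n' | duplicate_names
#   rpm -e --allmatches samba-common
```

Adding --test to the rpm -e line is a cheap way to see the dependency complaints (such as kdebase above) without actually removing anything.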