Extra Pepperoni


Friday, April 27 2012

Isilon Notes, 2012 Edition

General

  • Isilon provides templates for Nagios, which you should use. Unfortunately Nagios cannot distinguish serious problems (failed disk) from trivia (quota violations & bogus warnings).

Hardware

  • Isilon's current units are either 2U (12-bay 200 series) or 4U (36-bay 400 series).
  • The new NL400-108 nodes are similar enough to the older 108NL nodes that they pool together. The 108NLs are dual-socket 16gb nodes based on the 72000x chassis, which is an upgrade from the 36000x chassis. This makes them much faster than the older single-core 36NLs & 72NLs.
  • As of OneFS v6.0(?), Isilon nodes no longer use the VGA keyboard & mouse console. Instead they use the serial port exclusively as console, although the VGA port does display some booting messages. In 2011, a USB connection to a KVM made a node reboot until we disconnected USB.
  • Every node is assigned a device ID when it is joined to the cluster. All alerts are tagged with the device ID of the node reporting the event. Device IDs are never reused, so if a chassis fails and is swapped out, the replacement will get a new device ID, but the old node's hostname. If this happens to you, you may want to use isi config (with advice from Isilon Support) to change the hostname to match the device ID. With a large or dynamic cluster it might just be better to ignore device IDs and let the node names run in a contiguous sequence.

Jobs

  • Isilon's job engine is problematic. Only one job runs at a time, and jobs are not efficiently parallelized.
  • MultiScan combines Collect and AutoBalance jobs.
  • During the Mark phase of Collect (or MultiScan), with snapshots enabled, delete is slow and can cause NFS timeouts.
  • It is fine for non-disruptive jobs to run in the background for long periods, and it is understandable for high-priority jobs to briefly impact the cluster, but there are too many jobs (SmartPools, AutoBalance, Collect, MultiScan) which have a substantial impact on performance for long periods.
  • There are enough long-running jobs that it's easy to get into a cycle where as soon as one finishes another resumes, meaning a job is always running and the cluster never actually catches up. It took months for us to get this all sorted out so the jobs run safely in the background and don't interfere badly.
  • When a drive does not respond quickly, Isilon logs a 'stall' in /var/log/messages. Stalls trigger "group changes", which can trigger jobs. Group changes also prevent running jobs, including MultiScan, AutoBalance, & MediaScan, from completing. The workaround is to tune /etc/mcp/override/sysctl.conf per Isilon Support.
  • The default job priorities were dysfunctional for us. We had to alter priorities for AutoBalance, SnapshotDelete, SmartPools, and QuotaScan, and frequency for at least SmartPools. This improved somewhat in v6.5.
  • To tweak job priority, do not redefine an existing priority. This caused problems as the change cascaded to other jobs. Define a new priority instead.

Batch Jobs

  • /etc/mcp/templates/crontab is a cluster-wide crontab; field #6 is username.
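
For example, a hypothetical entry that runs a status script every 10 minutes as root (the schedule and path are illustrative, based on how we use it, not copied from a cluster):

*/10 * * * * root /ifs/x/common/cluster/status/status.sh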

Support & Diagnostics

  • By default, Isilon's main diagnostic command, isi_gather_info, builds a tarball of configuration and logs and uploads it to EMC. This took over 15 minutes on our clusters. To make this quicker, change "Gather mode" to Incremental under Help:Diagnostics:Settings.
  • Isilon does not actually maintain an HTTP upload server, so uncheck HTTP upload to avoid a wasted timeout.
  • When a node crashes it logs a core in /var/crash, which can fill up. Upload the log with 'isi_gather_info -s "isi_hw_status -i" -f /var/crash' on the affected node before deleting it.

Network & DNS

  • Isilon is "not compatible" with firewalls, so client firewalls must be configured to allow all TCP & UDP ports from Isilon nodes & pools back to NFS clients (and currently SNMP consoles).
  • Specifically, there is a bug where SNMP responses come from the node's primary IP. iptables on our Nagios console dropped responses which came from a different IP than Nagios queried.
  • To use SmartConnect you must delegate the Isilon domain names to the SmartConnect resolver on the cluster. We were unable to use DNS forwarding in BIND with this delegation active.

NFS

  • By default Isilon exports a shared large /ifs filesystem from all nodes. They suggest mounting with /etc/fstab options rw,nfsvers=3,rsize=131072,wsize=524288.
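
A hypothetical /etc/fstab entry using exactly those options (the cluster name and mount point are placeholders):

cluster.example.com:/ifs  /mnt/ifs  nfs  rw,nfsvers=3,rsize=131072,wsize=524288  0 0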

CIFS

  • Migrating an IP to another node disconnects CIFS clients of that IP.
  • CIFS clients should use their own static SmartConnect pools rather than connecting to the dynamic SmartConnect pools used for NFS clients.

Load Balancing

  • Rather than real-time load balancing, Isilon handles load balancing through its built-in DNS server (SmartConnect: Basic or Advanced). Because this happens at connection time, the cluster cannot manage load between clients which are already connected, except via "isi networks --sc-rebalance-all", which shuffles server-side IPs around to even out load. Unfortunately OneFS (as of v6.5) does not track utilization statistics for network connections, so it cannot intelligently determine how much traffic each IP represents. This means only Round Robin and Connection Count are suitable for the "IP failover policy" (rebalancing) -- "Network Throughput" & "CPU Usage" don't work.
  • High availability is handled by reassigning IPs to different nodes in case of failure. For NFS this is seamless, but for CIFS this causes client disconnection. As a result CIFS clients must connect to static pools, and "isi networks --sc-rebalance-all" should never be run on clusters with CIFS clients (there is apparently a corresponding command to rebalance a single pool, suitable for manual use on each dynamic pool).

Quotas

  • Some of the advantage of the single filesystem is lost because it is impossible to move files from under one quota to another. This forces us to copy (rsync) and then delete, as if each quota were its own mount point (see the sketch after this list).
  • For user quota reporting, each user should have an account (perhaps via LDAP or AD) on the cluster.
  • For user quota notifications, each user must have an email mapping (we created aliases to route machine account quota notifications to the right users).
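
A minimal sketch of the copy-then-delete workaround mentioned above (the paths are hypothetical; run it from a node or client that can see both directories):

rsync -a /ifs/projects/old-quota/dataset/ /ifs/projects/new-quota/dataset/ && \
    rm -rf /ifs/projects/old-quota/dataset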

Bugs

  • The user Enable checkbox controls all login access (but preserves UID mappings for quota reports). Unchecking it blocks both ssh and CIFS/SMB access and clears the user's password.
  • You cannot create a user with a home directory that already exists (even with --force). Workaround: move the directory aside before creating the user, or create the user with a bogus home directory (which can only be used once) and use "isi auth local user modify" to fix it after creation.
  • Don't use more than 8 SyncIQ policies (I don't know if this bug has been fixed).
  • Gateways and priorities are not clear, but if there are 2 gateways with the same priority the cluster can get confused and misbehave. The primary gateway should have the lowest priority number (1).
  • We heard one report that advisory quotas on a SyncIQ target cluster caused SyncIQ errors.
  • The Virtual Hot Spare feature appears to reserve twice as many drives as are specified in the UI, and does not work as described.

Support

  • Support is very slow. SLAs apparently only apply to parts delivery -- our 4-hour service does not prevent Isilon from saying they will answer questions in a few days.
  • Support is constantly backlogged. Promised callback times are rarely met, and cases are often not followed up unless we call in to prod Support.
  • My process for opening a case looks like this:
    1. Run uname -a; isi_hw_status -i; isi_gather_info.
    2. Paste the output from the first 2 commands and the gather filename into an email message.
    3. Describe problem and send email to support@.
    4. A while later we get a confirmation email with a case number.
    5. A day or two later I get tired of waiting and phone Isilon support.
    6. I punch in my case number from the acknowledgement.
    7. I get a phone rep and repeat the case number.
    8. The phone rep transfers me to a level 1 support rep, who as a rule cannot answer my question.
    9. The L1 rep tries to reach an L2 rep to address my question. They are often unable to reach anyone(!!!), and promise a callback as soon as they find an L2 rep.
    10. As a rule, I do not receive a callback.
    11. Eventually I give up on waiting and call in again.
    12. I describe my problem a third time.
    13. The L1 tech goes off to find an answer.
    14. I may have to call back in and prod L1 multiple times (there is no way for me to reach L2 directly).
    15. Eventually I get an answer. This process often takes over a week.
  • Support provides misinformation too often. Most often this is simple ignorance or confusion, but it appears to be EMC policy to deny that any problem affects multiple sites.

Commands

For manual pages, use an underscore (e.g., man isi_statistics). The command line is much more complete than the web interface but not completely documented. Isilon uses zsh with customized tab completion. When opening a new case include output from "uname -a" & "isi_hw_status -i", and run isi_gather_info.
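
A small sketch of the case-opening routine (the output file is just a convenience; the paths are illustrative):

uname -a          >  /tmp/case-info.txt   # include in the email
isi_hw_status -i  >> /tmp/case-info.txt   # include in the email
isi_gather_info                           # note the gather filename and include it too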

  • isi_for_array -s: Execute a command on all nodes in order.
  • isi_hw_status -i: Node model & serial number -- include this with every new case.
  • isi status: Node & job status. -n# for particular node, -q to skip job status, -d for SmartPool utilization; we use isi status -qd more often.
  • isi statistics pstat --top & isi statistics protocol --protocol=nfs --nodes=all --top --long --orderby=Ops
  • isi networks
  • isi alerts list -A -w: Review all alerts.
  • isi alerts cancel all: Clear existing alerts, including the throttled critical errors message. Better than the Quiet command, which can suppress future errors as well.
  • isi networks --sc-rebalance-all: Redistribute SmartConnect IPs to rebalance load. Not suitable for clusters with CIFS shares.
  • du -A: Size, excluding protection overhead, from an Isilon node.
  • du --apparent-size: Size, excluding protection overhead, from a Linux client.
  • isi devices: List disks with serial numbers.
  • isi snapshot list --schedule
  • isi snapshot usage | grep -v '0.0'
  • isi quota list --show-with-no-overhead / isi quota list --show-with-overhead / isi quota list --recurse-path=/ifs/nl --directory
  • isi quota modify --directory --path=/ifs/nl --reset-notify-state
  • isi job pause MultiScan / isi job resume MultiScan
  • isi job config --path jobs.types.filescan.enabled=False: Disable MultiScan.
  • isi_change_list (unsupported): List changes between snapshots.
  • sysctl -n hw.physmem: Check RAM.
  • isi devices -a smartfail -d 1:bay6 / isi devices -a stopfail -d 1:bay6 (stopfail is not normally appropriate)
  • isi devices -a add -d 12:10: Use new disk in node 12, bay 10.
  • date; i=0; while [ $i -lt 36 ]; do isi statistics query --nodes=1-4 --stats=node.disk.xfers.rate.$i; i=$[$i+1]; done # Report disk IOPS(?) for all disks in nodes 1-4 -- 85-120 is apparently normal for SATA drives.
  • isi networks modify pool --name *$NETWORK*:*$POOL* --sc-suspend-node *$NODE*: Prevent $POOL from offering $NODE for new connections, without interfering with active connections. --sc-resume-node to undo.
  • isi_lcd_d restart: Reset LEDs.
  • isi smb config global modify --access-based-share-enum=true: Restrict SMB shares to authorized users (global version); isi smb config global list | grep access-based: verify (KB #2837)
  • ifa isi devices | grep -v HEALTHY: Find problem drives (ifa is shorthand for isi_for_array).
  • isi quota create --path=$PATH --directory --snaps=yes --include-overhead --accounting
  • cd /ifs; touch LINTEST; isi get -DD LINTEST | grep LIN; rm LINTEST: Find the current maximum LIN.

Thursday, April 26 2012

Skyrim Tips

  • Use the Wait button to detect nearby enemies -- if you can wait, the area is clear.
  • Do not improve unimportant skills. Enemy toughness is based on your overall level. So, for instance, if you raise your Alchemy from 0 to 100, your overall level might go up and all enemies might as well. In terms of combat, it's good to have the lowest overall level but the strongest combat skills you're actively using, along with whatever auxiliary skills you prefer (such as smithing & enchanting for your gear). On the other hand, loot is also leveled...
  • Many companions (those who start with bows) won't use superior bows in combat, although they will use hand-to-hand weapons & armor you provide. They will also use better arrows; give your companion one of your best arrows -- they never use it up, and you can recover them from dead enemies. Companions often tend to choose the wrong weapon or armor -- you might need to take away one piece to make them reconsider.
  • Weapons matter much more for companions than armor because they generally cannot be killed.
  • Don't give your companion a staff if you have a horse (or dog?). They're sloppy and liable to start a fight by accidentally attacking your pet.
  • Most dungeons loop back to end by the entrance. Find chests (or other containers), periodically dump all the stuff you don't need soon -- I normally do this before going through a portal to another section -- and sweep back through after you have cleared the whole dungeon to get your loot.
  • To level Smithing, create iron daggers. To level Enchanting, enchant them with Banish (this is how I use up all my Petty Soul Gems). Then sell them for all the money you'll ever need.
  • Pick a type of weapon and a type of armor and specialize. I picked Archery and Heavy Armor, although if I had known that Light Armor can (eventually) provide the maximum Armor Rating I might have picked that instead.
  • Find a chest you must steal from that's easy to get to. You can stash stolen goods in it and have your companion steal them to launder the items, removing the Stolen flag.
  • If you get the wrong soul in a gem, drop it on the ground to empty it.
  • The Unofficial Elder Scrolls Wiki seems to be the best reference.
  • On Xbox 360 scrolling gets faster briefly if you use both the left thumbstick and the D-pad to move up or down.
  • Don't bother exploring and clearing dungeons unless you're on a mission -- if you do, you will probably have to run through it again as part of a quest later.

Problems

  • Some quests are broken. I cannot complete the Companions questline because I need to kill someone who I already killed; I cannot start the Bards questline because I already spoke to the head of the College and now he won't say his line.
  • The inventory system is broken. Normally when you remove something the next item down pops up under the cursor, but sometimes the next item above is selected instead. This is carried over from Fallout.
  • The Stolen system is broken. It seems like items of the same type (and graphic) are supposed to stack, with Stolen items (marked "Stolen" in your inventory, but colored red instead in containers) on top. So you should always be able to grab the stolen items and leave a stack of un-hot items behind. But the ordering doesn't work right. It would be much better if Stolen and non-Stolen items didn't stack together.
  • The 'Cleared' flag on a dungeon means it was cleared -- it does not mean it's still clear.

Thursday, August 18 2011

Cluster job distribution & general Isilon status

Users of our Isilon clusters need basic status information, so every 10 minutes our clusters run status.sh per /etc/mcp/templates/crontab. This provides a variety of useful information to users with access to the Isilon shared filesystem, with no need to grant shell access to the cluster nodes or remember the command syntax.

We now need to run some large/slow jobs, so I wanted a list of nodes in least-busy order. Obviously Isilon tracks this so SmartConnect can send connections to the least loaded node when using the "CPU Usage" connection policy, but it's not available to user scripts. The pipeline to provide a list of nodes sorted by lowest utilization to highest is applicable to all clusters, though -- just swap in the appropriate local cluster-wide execution command for isi_for_array.

status.sh

#!/bin/sh
# Record basic cluster health information

PREFIX=/ifs/x/common/cluster/status

isi status                   > $PREFIX/status.log
isi status -q -d             > $PREFIX/pool.log
isi job status -v            > $PREFIX/job.log
isi quota list               > $PREFIX/quota.log
isi quota list|grep -v :|grep -v default- > $PREFIX/quota-short.log
isi snapshot list -l         > $PREFIX/snapshot.log
isi snapshot usage | tail -1 > $PREFIX/snapshot-total.log
isi sync policy report | tail > $PREFIX/synciq.log
isi_for_array -s uptime      > $PREFIX/uptime.log
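# Sort nodes from least to most loaded. This assumes the relevant load figure is
# field 12 of the uptime output after stripping ':' and ','; the field position
# depends on uptime's exact format, so verify it on your cluster.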
isi_for_array uptime | tr -d :, | awk '{print $12, $1}' | sort -n | awk '{print $2}' > $PREFIX/ordered-nodes.txt

Monday, January 31 2011

Isilon Cluster

Our old bulk storage is Apple Xserve RAIDs. They are discontinued and service contracts are expiring, so we have been evaluating small-to-medium storage options for some time. Our more modern stuff is a mix of Solaris 10 (ZFS) on Sun X4500/X4540 chassis (48 * 1tb SATA; discontinued), and Nexsan SATABeasts (42 SATA drives, either 1tb or 2tb) attached to Linux hosts, with ext3 filesystems. We are not buying any more Sun hardware or switching to FreeBSD for ZFS, and ext4 does not yet support filesystems over 16tb. Breaking up a nice large array into a bunch of 16tb filesystems is annoying, but moving (large) directories between filesystems is really irritating.

We eventually decided on a 4-node cluster of Isilon IQ 32000X-SSD nodes. Each ISI36 chassis is a 4U (7" tall) server with 24 3.5" drive bays on the front and 12 on the back. In our 32000X-SSD models, bays #1-4 are filled with SSDs (apparently 100gb each, currently usable only for metadata) and the other 32 bays hold 1tb SATA drives, thus the name. Each of our nodes has 2 GE ports on the motherboard and a dual-port 10GE card.

Isilon's OneFS operating system is based on FreeBSD, with their proprietary filesystem and extra bits added. Their OneFS cluster file system is cache coherent: inter-node lookups are handled over an InfiniBand (DDR?) backend, so any node can serve any request; most RAM on the nodes is used as cache. Rather than traditional RAID 5 or 6, the Isilon cluster stripes data 'vertically' across nodes, so it can continue to operate despite loss of an entire node. This means an Isilon cluster must consist of at least 3 matching nodes, just like a RAID5 must consist of at least 3 disks. Unfortunately, this increases the initial purchase cost considerably, but cost per terabyte decreases as node count grows, and the incremental system administration burden per node is much better than linear.

Routine administration is managed through the web interface, although esoteric options require the command line. Isilon put real work into the Tab completion dictionaries. This is quite helpful when exploring the command line interface, but the (zsh based) completions are not complete -- neither are the --help messages nor the manual pages, unfortunately.

There are many good things about Isilon.

Pros

  • Single filesystem & namespace. This sounds minor but is essential for coping with large data sets. Folders can be arbitrarily large and all capacity is available to all users/shares, subject to quotas.
  • Cost per terabyte decreases with node count, as parity data becomes a smaller proportion of total disk capacity.
  • Aggregate performance increases with node count -- total cache increases, and number of clients per server is reduced.
  • Administration burden is fairly flat with cluster growth.
  • The FlexProtect system (based on classic RAID striping-with-parity and mirroring, but between nodes rather than within nodes/shelves) is flexible and protects against whole-node failure.
  • NFS and CIFS servers are included in the base price.
  • Isilon's web UI is reasonably simple, but exposes significant power.
  • The command line environment is quite capable, and Tab completion improves discoverability.
  • Quotas are well designed, and flexible enough to use without too much handholding for exceptions.
  • Snapshots are straightforward and very useful. They are comparable to ZFS snapshots -- much better than Linux LVM snapshots (ext3 does not support snapshots directly).
  • The nodes include NVRAM and battery backup for safe high-speed writes.
  • Nodes are robust under load. Performance degrades predictably as load climbs, and we don't have to worry about pushing so hard the cluster falls over.
  • Isilon generally handles multiple network segments with aplomb.
  • The storage nodes provide complete services -- they do not require Linux servers to front-end services, or additional high availability support.
  • The disks are hot swap, and an entire chassis can be removed for service without disrupting cluster services.
  • Because the front end is gigabit Ethernet (or 10GE), an Isilon storage cluster can serve an arbitrarily large number of clients without expensive fibre channel HBAs and switches.

And, of course, some things are less good.

Cons

  • Initial/minimum investment is high: 3 matching nodes, 2 InfiniBand switches, and licenses.
  • Several additional licenses are required for full functionality.
  • Isilon is not perfectionistic about the documentation -- in fact, the docs are incomplete.
  • Isilon is not as invested in the supporting command-line environment as I had hoped.
  • The round-robin load balancing works by delegating a subdomain to the Isilon cluster. Organizationally, this might be complicated.
  • CIFS integration requires AD access for accounts. This might also be logistically difficult.
  • Usable capacity is unpredictable and varies based on data composition.
  • There are always two different disk utilization numbers: actual data size, and including protection. This is confusing compared to classic RAID, where users only see unique data size.
  • There is no good way for users to identify which node they're connected to. It is possible but awkward for administrators to determine, and generally not worth going beyond the basic web charts.
  • Support can be frustrating.
    • We often get responses from many people on the same case, and rehashing the background repeatedly wastes time.
    • Some reps are very good, but some are poor, with wrong answers, pointless instructions, and a disappointing lack of knowledge about the technology and products.
    • We are frequently asked for system name & serial number, and asked to upload a status report with isi_gather_info -- even when this is all already on file.
    • Minor events trigger email asking if we need help, even when we're in the middle of scheduled testing.
  • The cluster is built of off-the-shelf parts, and the integration is not always complete. For instance, we are not alerted of problems with an InfiniBand switch, because things like a faulted PSU are not visible to the nodes and not logged.
  • Many commands truncate output to 80 columns -- even when the terminal is wider. To see full output add -w.
  • When the system is fully up, the VGA console does not show a prompt. This makes it harder to determine whether a node has booted successfully.
  • There is only one bit of administrative access control: when users log in, they either have access to the full web interface and command-line tools, or they don't. There is no read-only or 'operator' mode.
  • Running out of space (or even low on space) is apparently dangerous.
  • One suggestion was to reserve one node's worth of disks as free space, so the whole cluster can run with a dead node. In a 4-node configuration, reserving 25% of raw space for robustness (in addition to 25% for parity) would mean 50% utilization at best, which is generally not feasible. In fairness, it is rare for a storage array to even attempt to work around a whole shelf failure, but most (non-Isilon) storage shelves are simple enclosures with fewer and simpler failure modes...
  • SmartConnect is implemented as a DNS server, but it's incomplete -- it only responds to A record requests, which causes errors when programs like host attempt other queries.
  • The front panels are finicky. The controls are counterintuitive, the LED system is prone to bizarre (software) failure modes, and removing the front panel to access the disks raises an obscure but scary alert.

Notes

  • On Isilon nodes, use du -Sl to get size without protection overhead. On Linux clients, use du --apparent-size.
  • Client load balancing is normally managed via DNS round robin, with the round robin addresses automatically redistributed in case of a node failure. This is less granular and balanced than you'd get from a full load balancer, but much simpler.
  • EMC has bought Isilon. I'm not sure what the impact will be, but I am not confident this will be a good thing over the long term.
  • In BIND (named), subdomain delegation is incompatible with forwarding. Workaround: Add forwarders {}; to the zone containing the Isilon NS record.
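
A minimal named.conf sketch of that workaround (zone and file names are placeholders):

zone "example.com" {
    type master;
    file "master/example.com.db";
    forwarders { };    // empty list: don't forward queries for this zone,
                       // so the delegation to the SmartConnect subdomain is followed
};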

Future

  • All that said, we are getting more Isilon storage -- it seems like the best fit for our requirements.

Wednesday, October 20 2010

2 Months of Tech FAIL

Several months ago, my MacBook Pro stopped joining the work WiFi network. This had worked for a while, but then stopped. I don't bring it in much, so it took me a while to realize it was a consistent problem. After a bunch of poking and prodding, we realized it only affected my account. So I used Ethernet at work, rather than recreate my home directory from scratch. MOSX FAIL.

Then it refused to boot entirely. I sent it in to Apple, who informed me I had one of a bad batch of video cards, which they replaced. ATI FAIL.

When I got it back the case was a bit bent and the optical drive didn't work. So I sent it back and they bent the case mostly back into shape and replaced the SuperDrive. I still cannot burn DVD-DL media, but I have decided this is a more general problem (which affects Mac Pros in the office too). AppleCare & SuperDrive FAIL.

Then my 24" LCD started flickering when connected to the MBP. This was annoying enough to interfere with getting work done, but hard to reproduce (schlepping a 15" MBP to the Apple Store is a nuisance, but bringing a 24" LCD monitor and waiting for an intermittent problem to reappear was non-viable). While trying to figure this out, I noticed that it also wasn't auto-detecting connection/disconnection of the LCD monitor, and this was easy to replicate booted from a fresh 10.6.4 install. I brought it into Apple, and it didn't reproduce with their monitor. Frustrating! So I reinstalled and spent a week manually copying over the few bits I really needed from a backup of my old home directory. I thought my problems were due to the replacement video card, but this was apparently double OS FAIL.

A few weeks later, www.reppep.com/mail.reppep.com started hanging. Eventually I realized the 2-year-old Inspiron was dead and bought a new server entirely, which is running now (although it occasionally has periods of unexplained high load). Linux mdadm didn't work at all, although I suspect this was due to underlying hardware problems. I'll need to switch back to a mirrored configuration later... Dell & mdadm FAIL.

Then we had a couple floods. After over a century of working basically well, major sewer FAIL.

The new reppep.com is PCIe based, so it needed a new GE NIC, which keeps inexplicably losing its connection to the network. The new card and GE switch arrived today, but were stolen from the lobby of our building: NIC(?) & lock FAIL.

To make things more 'interesting', our Speakeasy DSL dropped a few times. I called Speakeasy, who told me the circuit was up and fine -- clearly not true -- and that my problems were due to the Linux iptables firewall (laughable, but with no Internet I found myself unable to laugh). At the same time an AirPort Extreme failed and refused to reset. I eventually got it to reset and reconfigured, and then our Time Capsule (which is relatively new, its predecessor having died on schedule from bad capacitors) died. AirPort + Time Capsule + DSL FAIL.

Today at work my ~18-month-old Mac Pro died -- apparently the power supply just stopped supplying power. Hopefully it will be up soon, once I get a replacement. PS FAIL.

Despite all this I should acknowledge that the Compaq Evo 510 SFF I bought several years ago ran fine until I retired it last month, and Amy's 2gb/2GHz MacBook has been fine (modulo some calendar problems, which were pure software). Our iPad and my iPhone 4 have also been fine. And our many hard drives have been okay. So not everything is failing -- it just feels like it.

Here's hoping the story is over, rather than still evolving.

Tuesday, February 23 2010

Saga: Dell Memory Diagnostics

We have a couple Dell R900s: 4 sockets, 24 Xeon cores, & 128gb RAM. One of them started reporting RAM & processor errors in December, so I called Dell. The rep explained it might be spurious, due to a BIOS bug. Not that there was any known issue, but Dell naturally hoped I could fix the problem with a software upgrade, so they wouldn't need to replace any hardware. I upgraded BIOS, and it shut up for a couple months.

Last week the front panel went amber again, and the System Event Log started recording RAM errors in one memory board (the system has 4 boards, each with 8 DIMM slots: a total of 32 4gb DIMMs).

Non-critical    02/17/2010 14:58:11 Mem CRC Err: Memory sensor, transition to non-critical from OK ( Memory Board D ) was asserted
Unknown 02/17/2010 11:47:03 I/O Fatal Err: Unknown sensor, unknown event
OK  02/17/2010 11:47:03 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:03 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:03 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:03 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:03 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:02 System Software event: OEM Diagnostic data event was asserted
Non-Recoverable 02/17/2010 11:47:02 CPU Machine Chk: Processor sensor, transition to non-recoverable was asserted
Non-Recoverable 02/17/2010 11:47:02 CPU Machine Chk: Processor sensor, transition to non-recoverable was asserted
Unknown 02/17/2010 11:47:02 I/O Fatal Err: Unknown sensor, unknown event
OK  02/17/2010 11:47:02 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:02 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:02 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:02 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:02 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:01 System Software event: OEM Diagnostic data event was asserted
Non-Recoverable 02/17/2010 11:47:01 CPU Machine Chk: Processor sensor, transition to non-recoverable was asserted
Non-Recoverable 02/17/2010 11:47:01 CPU Machine Chk: Processor sensor, transition to non-recoverable was asserted
Unknown 02/17/2010 11:47:01 I/O Fatal Err: Unknown sensor, unknown event
OK  02/17/2010 11:47:01 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:01 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:01 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:01 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:01 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:01 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:01 System Software event: OEM Diagnostic data event was asserted
Non-Recoverable 02/17/2010 11:47:00 CPU Machine Chk: Processor sensor, transition to non-recoverable was asserted
Unknown 02/17/2010 11:47:00 I/O Fatal Err: Unknown sensor, unknown event
Unknown 02/17/2010 11:47:00 I/O Fatal Err: Unknown sensor, unknown event
OK  02/17/2010 11:47:00 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:00 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:00 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:00 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:00 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:47:00 System Software event: OEM Diagnostic data event was asserted
Non-Recoverable 02/17/2010 11:47:00 CPU Machine Chk: Processor sensor, transition to non-recoverable was asserted
Unknown 02/17/2010 11:47:00 I/O Fatal Err: Unknown sensor, unknown event
Unknown 02/17/2010 11:46:59 I/O Fatal Err: Unknown sensor, unknown event
OK  02/17/2010 11:46:59 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:46:59 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:46:59 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:46:59 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:46:59 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:46:59 System Software event: OEM Diagnostic data event was asserted
Non-Recoverable 02/17/2010 11:46:59 CPU Machine Chk: Processor sensor, transition to non-recoverable was asserted
Unknown 02/17/2010 11:46:59 I/O Fatal Err: Unknown sensor, unknown event
Unknown 02/17/2010 11:46:59 I/O Fatal Err: Unknown sensor, unknown event
OK  02/17/2010 11:46:58 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:46:58 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:46:58 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:46:58 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:46:58 System Software event: OEM Diagnostic data event was asserted
OK  02/17/2010 11:46:58 System Software event: OEM Diagnostic data event was asserted
Non-Recoverable 02/17/2010 11:46:58 CPU Machine Chk: Processor sensor, transition to non-recoverable was asserted
OK  02/17/2010 11:50:02 CPU3 Status: Processor sensor for CPU3, IERR was deasserted
OK  02/17/2010 11:50:02 CPU2 Status: Processor sensor for CPU2, IERR was deasserted
Critical    02/17/2010 11:49:47 CPU3 Status: Processor sensor for CPU3, IERR was asserted
Critical    02/17/2010 11:49:47 CPU2 Status: Processor sensor for CPU2, IERR was asserted

I called Dell, and was told I'd need to run a "Dell 32 Bit Diags" to isolate the bad component. Unfortunately it's only available as a Windows self-extracting executable, which can generate a floppy .img file or a CD-ROM .iso file; Dell's tool can also copy the diagnostics to a flash drive. I hate that Dell both assumes that everybody runs Windows, and helps ensure that by requiring Windows to manage Dell machines. Fortunately I have an XP VM.

So I swapped the suspect memory board from slot D into slot C and ran the diagnostics. I was told to erase the SEL and run the included mpmemory.exe. It was supposed to take half an hour, but actually took about 2 1/2 hours for each run. Additionally, the diagnostics showed an unclear warning that the (DOS-based) diagnostics are not compatible with console redirection (presumably because these hosts have serial consoles configured). Fortunately we bought a DRAC for this machine, and that seems to work fine.

To boot into the diagnostics, I checked the "Boot Order" section of the R900 BIOS. Surprisingly, although it does show VIRTUAL FLASH, I was unable to find a USB FLASH entry. For some reason Dell configures USB flash as a virtual hard drive, so I had to change the "Hard Disk Boot Order" to prefer flash to the RAID controller -- this got me a DOS-based menu and let me run mpmemory.exe.

Disturbingly, Dell's memory diagnostic triggered the memory error but was not able to detect it. mpmemory returned a clean bill of health, but the SEL recorded errors on memory board C (the suspect card in a different slot, so the motherboard itself is fine).

Non-critical    02/23/2010 20:56:56 Mem CRC Err: Memory sensor, transition to non-critical from OK ( Memory Board C ) was asserted
Non-critical    02/23/2010 15:27:27 Mem CRC Err: Memory sensor, transition to non-critical from OK ( Memory Board C ) was asserted
Non-critical    02/23/2010 15:27:27 Mem CRC Err: Memory sensor, transition to non-critical from OK ( Memory Board C ) was asserted

Unfortunately the diagnostics failed to isolate an individual DIMM, and I don't have the time to keep reconfiguring the RAM (across all 4 memory cards, which apparently need to match each other) to do a binary search, running 150 minutes per test, to isolate the faulty DIMM or slot -- worse, I'd have to visit the server and reconfigure it after each round. Fortunately, Dell acknowledged the absurdity of running 5+ hours of tests (it could easily have taken over 20 hours to find the right DIMM). They sent a new card with 8 DIMMs (2 types, at least one refurbished). I swapped the replacement parts in and reran the test, which failed. Apparently nobody at Dell had ever seen this particular error (generated by a Dell proprietary diagnostic) -- not comforting. I ran it again and got a complete lockup -- this time apparently a common occurrence on Dell multi-processor systems. It turned out I had been given an old version of the diagnostics.

I got the new version, ran it twice, and saw no further errors in the SEL. Hopefully I won't have to think about that R900 for a while, but diagnosing it is so awkward -- it looks like the 256gb max configuration would take 5 hours for each pass!

Friday, February 19 2010

Twitter Post Types

Several people have asked me about types of tweets recently, so here's the rundown:

  1. If the first letter of a tweet is d or D, it's a 'direct message' -- follow with space and name, like "d mscrochety Hey, babe!". This only goes to the recipient, and nobody else can see it. Screwing up DMs, and making unintentionally public comments, has a storied history already. If you try to DM someone who doesn't follow you, you'll get an error.

  2. '@ mentions': Any time a username appears in a (public) tweet after an @, that user sees the message, even if they don't normally follow the poster.

  3. '@ messages' or '@ replies': Anything with first character @: These are public (visible on your page & in searches), but not sent to your followers unless they also follow the recipient specified after the initial @. So if I tweet "@mscrochety Hi there", my followers who don't know @mscrochety won't see it, but people who follow both of us will. If you are addressing a person but still want others to see your tweet, put an extra character such as a space before the @ to convert it to a 'mention' seen by all your followers.

  4. Public tweets: If the first character is anything but @ or d, anybody who follows the poster sees them. They also show up on the poster's feed page (e.g., http://twitter.com/reppep) and in searches.

  5. Hashtags: To make it easier for people to find your posts, you can use tags which start with the # character to identify topics. These are popular for people in a venue to collect their tweets, or for a larger meta-conversation. Hashtags should not contain spaces or punctuation, and are normally lower-case.

When you want to echo what someone else said, in current clients (including the twitter.com web interface), you can use the retweet button. This shares the original tweet with your followers, even if they don't follow the original poster. There's an older convention of starting with 'RT '.

For Twitter applications, including photo and video integration, see twittereye.

Friday, January 22 2010

Pen v keyboard v Newton v Graffiti v Treo v iPhone

A nice comparison of data entry performance. Typing on the iPhone is okay, but I much prefer a normal USB keyboard. I never did much entry using Rosetta or Graffiti. The Treo keyboard was decent, but not great. The BlackBerry disadvantage: noisy during meetings!

That said, my worst data entry method is definitely paper -- serious readability concerns! ;)

http://hardware.slashdot.org/story/10/01/22/0812222/Pen-vs-Keyboard-vs-Touch-vs-Everything-Else

Tuesday, January 19 2010

Flash Memory Performance

Update 2010/01/21: Thanks to @ceolaf for pointing out that Amazon has updated their page. They accepted my correction: "Fast read speeds of up to 15MB/second; write speed lower".

I have been using a Transcend 16gb Class 6 SDHC card in my Canon T1i, but needed another card. I found a good price on the somewhat confusingly labeled Sandisk Ultra 16gb SDHC card. As you can see, it prominently specifies "15MB/s*" on the label, and on Amazon's page. Additionally, it shows a smaller "C4" (Class 4) logo.

SDHC Class ratings are minimum write speeds in mbytes/sec, so the Transcend is guaranteed to write files at 6mbyte/sec or better, which is Canon's recommended minimum for shooting video or rapid stills on the T1i. I was a bit confused to see the C4 logo on the card, but I believed Amazon's "High speed card featuring fast 15MB/sec Read/Write speeds", which would be effectively Class 15. I couldn't find any explanation for that asterisk, but the price was decent, so I ordered a card.

I just got the card and did some simple tests -- copying a 1.1gb file to and from the card through my USB SDHC reader -- and discovered a few interesting things. The SanDisk wrote at 10,600,679 bytes/sec and read at 21,621,758 bytes/sec. The Transcend wrote at 9,399,092 bytes/sec and read at 18,514,338 bytes/sec. This means my SanDisk Ultra card qualifies as Class 10, and my Transcend is almost as fast (but there is no Class 7-9, so Class 6 is correct). The SanDisk is much faster than its marked Class 4, but nowhere near the 15mbyte/sec write speed Amazon promised.
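
For anyone who wants to repeat the test, a rough sketch with dd (the mount point and file name are placeholders; eject and re-insert the card between the write and the read so the read isn't served from cache):

time dd if=testfile of=/Volumes/SDCARD/testfile bs=1m    # write test
time dd if=/Volumes/SDCARD/testfile of=/dev/null bs=1m   # read test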

Tonight I did some more poking, and found http://www.sandisk.com/products/imaging/sandisk-ultra-sdhc?tab=features, which explains that asterisk:

Fast read speeds of up to 15MB/second*; write speed lower

So Amazon's page is wrong, and I sent them a correction. I'm keeping the Ultra card because it's fast enough for the camera and probably whatever else I'll use it for, but I'm disappointed it doesn't deliver on Amazon's specs.

Friday, December 18 2009

HP Converged Infrastructure Roadshow

HP is touring a 2-day show on convergence. I attended the second day, mostly to get an update on ProLiant blades. I was impatient with all the HP boosterism, but perhaps that material was appreciated by the HP employees and vendors present. There were a lot of dense slides on blades & VMware, but unfortunately they aren't yet cleared for publication so we didn't get copies. Hopefully the presentations will be available on the roadshow site next month.

The most interesting parts for me were an update on HP BladeSystem, and a competitive comparison (mostly trash-talking) about HP BladeSystem C-Class vs. Dell, Cisco, & IBM blades -- particularly as VMware hosts. There was really no mention of Itanium, except as a bullet point: HP offers Itanium blades (no mention of Cell).

DIMMs

Intel Nehalem (the Xeon 5500 series) still only supports 2-socket configurations ("2P"), so HP continues to recommend AMD Istanbul (6 cores per socket) for 4P and larger systems. Interestingly, there was only a single mention of boxes larger than 4P. IBM presentations I have attended, by contrast, tended to focus on their Hurricane & X3 chipsets, which are only available in 4P and larger systems. I wonder how much of this is because IBM is proud of Hurricane, and how much is due to HP's focus on blades (which don't make much sense beyond 4P).

Each Nehalem CPU has its own 3-channel memory controller, and each channel supports up to 3 DIMMs. In a 2-way box, this maxes out at 2 (CPUs) * 3 (channels/CPU) * 3 (slots/channel) = 18 DIMM slots. Unfortunately, Nehalem cannot use all DIMM slots at full speed. The highest speed is 1333MHz, which requires a single 1333MHz DIMM per channel (DPC) and a 95W CPU. Utilizing the second DIMM slot reduces memory access speed to 1066MHz (although HP apparently has a trick to retain the 1333MHz speed at 2DPC); a 3rd DIMM reduces speed to 800MHz. DIMM mismatches within or across channels can also reduce speed.

4gb DIMMs are common, but 8gb is uncommon and still often prohibitively expensive. This means normally the fastest possible configuration is 2 95W 5500s with 6 4gb 1333MHz DIMMs: 24gb (or 6gb or 12gb with smaller DIMMs). HP's 2DPC trick offers the same speed up to 48gb with 12 DIMMs. Maximum Nehalem RAM capacity is 18 8gb DIMMs: 144gb at 800MHz.

Blades

HP's bread & butter blade is the BL460c (they claim it's the most popular server in the world, eclipsing the 2U DL380). The BL460c offers 2 (dual or quad core) Xeon 5500s, 12 DIMM slots, 2 hot-swap 2.5" SAS/SATA drives (with embedded RAID controller), and 2 Flex-10 ports.

The BL490c only accepts quad-core Nehalems (no dual-core) and gets 6 more DIMM slots, suiting it better to large VM loads. But it also gives up the BL460c's pair of hot-swap RAID SAS/SATA bays for a couple non-hot-swap non-RAID SATA SSD bays. Presumably the BL490c doesn't have enough cooling for spinning disks, and they expect you to put it on some kind of SAN anyway.

Flex-10 sounds very slick. Physically they're 10GE ports, but when uplinked to an HP Flex-10 Virtual Connect switch module, each Flex-10 connection appears to the host as 4 independent 10GE interfaces. The administrator can carve up the 10gbps of real bandwidth between the virtual interfaces -- something like NIC trunking/bonding/teaming and OS-based virtual interfaces, but implemented below the OS level. This should be extremely useful for clustering and VMware hosts, where the vendor requires (Microsoft) or runs faster with (VMware) more network devices, or as a simple way of implementing QoS. It's now easy to get a 2P 12-core system with up to 192gb of RAM, which can host a lot of VMs. In an iSCSI or NAS environment, the BL490c may not even need mezzanine cards to handle its IO. The HP Virtual Connect Flex-10 Ethernet Module is really a 24-port 10GE switch with 16 internal ports (for blades) and 8 external ports (for uplink & inter-chassis crosslink). This means maximum uplink bandwidth is 80gbps/switch, while total internal bandwidth to blades is 160gbps/switch. If you have several blades which sustain >10gbps bandwidth, they'll need to be scattered across chassis to avoid competing for uplink bandwidth -- or perhaps migrated to standalone DL rack servers instead.

They also talked about the BL2x220c high-density blades: 2 2P motherboards in one half-height blade module. Each independent motherboard has 2 (dual or quad core) 5500s, 6 DIMM slots (48gb max), 2 1GE interfaces, and a single non-hot-swap 2.5" SATA drive. To use all the interfaces you need 4 switch modules -- each blade has 4 GE interfaces total, so 2 use the mezzanine connectors. Since you have to remove the whole unit to service either side (including disks), you need to figure out how to handle failures. They look good for HPC clusters, where the job scheduler can work around missing nodes.

Apparently iSCSI boot and acceleration (CPU offload to NICs) are expected in the G7 Flex-10 NICs, which should be very useful for HPC clusters.

Since this was HP boosterism, there was plenty of poking fun at IBM's less-well-endowed motherboards, with fewer or asymmetrical DIMM slots. And no mention of when 6-core Nehalems will be available in HP blades.

Matrix & Insight Dynamics

HP sells bundles under the BladeSystem Matrix name. This is good, as HP quoting is painfully arcane. Building a specification is a painful process of tracking down many different subcomponents, some of which have unintelligible names, and getting pricing from a rep. I have quoted IBM BladeCenter and HP BladeSystem gear, and IBM was complicated, but I eventually built up a spreadsheet and could calculate complete configurations with part numbers for entire or multiple chassis, fiddling with processor speeds or RAM configuration and getting real pricing for review by a reseller.

With HP, I have to give something much vaguer to a rep, who adds lots of unintelligible line items (without which the things won't work), and sends back pricing. When I complained to an HP rep years ago about how complicated the process was, he explained that HP had once lost a bid (to IBM?) by 25c, and decided to unbundle everything they possibly could so base price would be as low as possible. Once you get the low bid from HP, you get to add all the things (like access to the KVM features of the included ILO hardware), and get a higher real price. But the unbundling means only professionals can quote medium complex HP systems, so hopefully Matrix will help. I don't yet know if purchasing Matrix bundles would require us to purchase management software we don't want and won't use.

They also talked quite a bit about Insight Dynamics, a management system for BladeSystem. ID apparently makes it easy to download a template for a medium-complicated constellation of systems (like a multi-server Exchange installation) and have ID come up with where to deploy the components (to a combination of physical blades and VMs). I believe someone claimed ID can migrate physical machines to VMs (P2V) and vice-versa (V2P). This competes directly against VMware VirtualCenter.

The idea is that HP Blades and specifically ID get you partway to cloud computing. Amazon & Google do basically effortless provisioning, so HP needed to improve the process of setting up new blades & VMs. Insight Dynamics can provision blades & VMs, although I'm not sure how many people trust it to do so yet...

Miscellany

They also talked about FCoE (Fibre Channel over Ethernet, which Cisco promotes). Fibre Channel protocols are intended to run over lossless SANs, while one of the main purposes of TCP is to compensate for the lossy nature of Ethernet. Apparently CEE (Converged Enhanced Ethernet) provides lossless layers which enable FCoE to work over longer ranges and more hops. It sounds like CEE will be available in 2010/2011. In the meantime, iSCSI looks interesting, especially if you can provide QoS controls (Flex-10?) to keep it from swamping everything else.

They also talked about LeftHand Networks, which HP bought last year. The LeftHand Virtual SAN Appliance is a VMware image, which presents any locally accessible storage as iSCSI devices.

Google Chrome UI Priorities

Gruber's note on the Chromium Bug Report: Close Tab Button on the Wrong Side feels misguided to me. I suspect he's missing the point. To Google, Chrome is the important piece. The Mac hardware and Mac OS X running the browser are just support infrastructure.

When you run a browser on a Dell, you don't expect it to have a Dell-compatible UI (I always hated those customized IE flavors with ISP logos & advertising). The fact that it's Dell hardware isn't important at the browser level. Google is trying to get us all to a point where Mac OS X / Windows / Linux are similarly irrelevant -- just APIs they compile to, while everything important happens inside Chrome.

The Chrome OS demo makes this very clear -- they are throwing out as much of the classic OS as they can, so we can live entirely inside the Chrome browser. But that means Google has to replace some of those capabilities, because a browser itself isn't enough to boot a computer (even a netbook). This might also help explain why the OS and browser are both named Chrome ("Chrome OS" isn't much of a name). Now it's confusing, because one is a netbook-optimized operating system while the other is a browser, but it seems clear that Google's objective is that we should decide to run Chrome, and that should be enough. This way Chrome gives us access to Google and the Internet -- who cares about how it works? I turn on my computer, and I'm online. That's a logical vision for Google, casual users, netbook users, people who have grown up using Google, etc.

From that perspective, it's much more important that Chrome's tabs always be consistent with Chrome, whether you happen to be using a Mac Pro or a Dell Mini 10. It doesn't much matter whether Chrome looks the same as other (Mac) apps.

Friday, March 20 2009

Cisco Port Security

I just spent a while learning (the hard way) about Cisco Port Security, so here's what I got out of it.

Cisco Port Security, when enabled, keeps a whitelist of allowed MAC addresses for each port; the list may hold 1 or more entries, and might be statically specified by an administrator or automatically 'learned' by the switch.

Intuitively, you might expect something called "port security" to prevent unapproved hosts from sniffing traffic, as this is the most serious risk. The reality, however, is that the switch has no way of knowing who's listening to a port, only of knowing who's transmitting individual packets, because each packet is 'signed' with the transmitter's MAC address. With PS enabled, the switch silently drops packets from unapproved MACs, preventing them from stealing bandwidth or actively attacking the network. These are useful, but data and credential sniffing are more serious risks, and are not addressed.
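
For reference, here is a minimal IOS sketch of the feature described above (the interface name, address limit, and violation mode are assumptions, not taken from the switch involved):

interface GigabitEthernet0/10
 switchport mode access
 switchport port-security
 switchport port-security maximum 1
 switchport port-security mac-address sticky
 ! 'protect' silently drops frames from unapproved MACs; 'restrict' also generates
 ! alerts, and 'shutdown' err-disables the port
 switchport port-security violation protect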

Complications

  1. Front-line support may have limited knowledge of security protocols and mechanisms, and is unlikely to have access to directly check the list of banned MACs/ports. Cisco provides a mechanism for alerting, but that does not mean notifications have been configured to reach all the concerned parties.
  2. Sniffing to determine the situation will show the host generating outbound traffic, but will not reflect that these packets are not being forwarded to any other hosts.
  3. Depending on the situation, even after PS activates, there may be residual traffic for the blocked device. This can look like responses to current traffic, and mask the fact that the muzzled node is in fact mute.
  4. Using DHCP seems to avoid some of this. I found that using DHCP got an address and enabled full communications. I don't know why.

As Murphy would have it, I had just opened up the server chassis and enabled ipf before PS tripped, so I spent a lot of time looking for a (missing) misconfiguration.

Saturday, December 27 2008

DNS Glue Records

I had an obscure DNS problem, and EasyDNS pointed me in the right direction.

When looking up ns3.reppep.com from some DNS servers (but not others), I was getting 66.92.104.200, which is wrong (that address actually belongs to ns1.reppep.com). I confirmed that the record was correct in the reppep.com zone (hosted by EasyDNS), and forced a serial number update, but some servers kept returning the wrong address.

It turns out that years ago, when VeriSlime managed the DNS registration for reppep.com, they picked up the wrong address (it might have been correct when they picked up the IP, but hasn't been for years). When I transferred the domain registration (which is separate from the DNS service -- basically authoritative nameservers) to Dreamhost, the stale glue data remained. The solution is to have Dreamhost update the glue record for ns3.reppep.com.

You can see the glue with the whois command, e.g., "whois ns3.reppep.com". It should agree with "host ns3.reppep.com", but doesn't yet (still waiting on Dreamhost to get it together).
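
One way to see the discrepancy directly (a sketch; a.gtld-servers.net is one of the public .com registry servers, and the easyDNS server name here is an assumption):

# Glue published by the .com registry appears in the referral's additional section:
dig +norecurse @a.gtld-servers.net ns3.reppep.com A

# Authoritative data from the reppep.com zone itself:
dig @dns1.easydns.com ns3.reppep.com A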

Wednesday, December 3 2008

New 24" Monitor

Inspired by Black Friday, I bought a Samsung 2433BW 24" LCD monitor. It's 1920*1200 -- basically a 1080p panel, but with DVI & VGA instead of HDMI & speakers. The thing is huge -- 75% more pixels than my old 1280*1024 pivoting Samsung (I should have bought 1600*1200 instead of a 912T). This seems to be as high a resolution as I can get, without going up to the much more expensive Dual-Link DVI 30" LCDs.

A few notes:

  • It's warm! I can feel the heat radiating in the corner of the loft. I noticed this with a 1600*1200 at Rockefeller a year ago. It surprises me that LCD panels radiate that much heat.
  • It's huge! It makes the 15" LCD on my MacBook Pro look sad (it's 77% larger).
  • The stand stinks. It was almost impossible to assemble without plastic lubricant, and Samsung phone support acknowledged the problem but had no useful advice. The stand is also low.
  • No dead pixels! I tested with ScreenQuery, a discontinued app for showing test patterns; I tried a few others, but they didn't work as well as the free ScreenQuery.

Tuesday, August 26 2008

Time for More RAM

pepper@prowler:~$ top -l1|head -7
Processes:  105 total, 3 running, 4 stuck, 98 sleeping... 439 threads   20:08:26

Load Avg:  0.68,  1.05,  1.10    CPU usage: 22.86% user, 42.86% sys, 34.29% idle
SharedLibs: num =    4, resident =   41M code, 3032K data, 3172K linkedit.
MemRegions: num = 39625, resident =  824M +   20M private,  207M shared.
PhysMem:  269M wired, 1159M active,  554M inactive, 1990M used,   58M free.
VM: 16G + 374M   5256473(0) pageins, 1406422(0) pageouts

A pair of 2gb DIMMs are en route from NewEgg, for $75.
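
The number that matters above is the cumulative pageout count (1406422), along with only 58M free. For what it's worth, one simple way to watch paging activity over time on Mac OS X is vm_stat with an interval:

# Print Mach VM statistics every 5 seconds; a steadily climbing pageouts column
# means the machine is actively paging out and could use more RAM.
vm_stat 5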

Monday, August 4 2008

Indirection in Configuration Management

"Give me a place to stand and a lever long enough and I will move the world."

I was grumbling under my breath at a configuration management system today, and reminded of this wonderful statement by Archimedes.

Configuration management is the discipline of building systems which manage other systems -- cfengine is a well-known open source example. I needed to reboot a few hosts on a regular schedule -- easily handled in 5 minutes with "vi /etc/crontab" on each, or an ssh loop to append to the crontab on each affected system. I was struck by how many levels of indirection I needed to traverse to get this done with configuration management. This in turn prompted some thought about why jumping through the various hoops was worthwhile.
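
For comparison, the non-CMS ssh loop is only a few lines; the hostnames and schedule here are invented for illustration:

# Append a weekly 2 AM reboot to the system crontab on each host.
for h in web1 web2 db1; do
  ssh root@$h 'echo "0 2 * * 0 root /sbin/shutdown -r now" >> /etc/crontab'
done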

There are many excellent reasons to use configuration management:

  • Time savings -- compared to repeating the same actions by hand, over and over; the savings increase with the number of hosts involved.
  • Consistency -- configuration management ensures that (portions of) systems which should be identical really are.
  • Reproducibility -- because a CMS is naturally tied into version control, it is easy to either examine or recreate the state of affairs at an arbitrary time in the past.
  • Modeling -- a CMS ends up encompassing a representation of all the systems it manages. This efficient representation of those systems is quite useful for examining and comparing them. It's especially useful with a large or dynamic population of administrators, as it provides a single place to learn about the whole constellation of systems, and enforces some consistency among the various ways admins can manage systems.

In the simplest case, to make a machine reboot once, I could pull the plug and put it back (assuming I was near, or could get to, the machine). In a non-CMS scenario, I would do it with ssh and shutdown -r. With configuration management, it was considerably more involved:

  • Launch PuTTY.
  • Log into a system with a checkout of the CMS configuration files.
  • Find the appropriate file (non-trivial if the managed constellation is complicated).
  • Fetch the latest version of the file (with multiple users, it's unlikely my checkout is current).
  • Edit the file corresponding to /etc/crontab or /var/spool/cron/root (I used kate, as I don't enjoy either vi or emacs, and BBEdit wasn't available); kate displayed via an X11 session tunneled through ssh.
  • Create a pair of local machine sets in the file (cfengine calls these 'aliases'), each including half the covered systems (the systems reboot at staggered times, so they're not all down at once).
  • Create the pair of crontab lines, one for each machine set, embedding the two different reboot times and the shutdown -r command (sketched after this list).
  • Check the modified crontab file back into the version control system; enter a message for the change log.
  • In a distributed CMS, staging hosts pick up the changes from version control, either on a schedule or when manually kicked for emergency/rush changes.
  • The affected hosts pick up the change from the CMS, and implement the specified change.
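
The payload at the end of all that is just two staggered crontab lines along these lines; the times and machine-set names are invented for illustration, and in the CMS each line is guarded by its machine set, so any given host only receives one of them:

# rebootA machine set: Sundays at 02:00
0 2 * * 0   root   /sbin/shutdown -r now
# rebootB machine set: Sundays at 03:00
0 3 * * 0   root   /sbin/shutdown -r now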

The reason Archimedes' quote is apropos is that configuration management provides excellent leverage -- I can edit one file in one place, and easily affect several systems (potentially hundreds or thousands). Each hoop I have to jump through provides an additional fulcrum. I can sit at my desk and use PuTTY to log into dozens of systems, across the world -- without even knowing where they are. Each change I make to the version control system is automatically picked up by every host participating in the system, and available to every admin with a checkout. I don't have to log into 8 machines (even non-interactively) to make them reboot -- I can orchestrate it all from my local workstation.

Unfortunately, mistakes are leveraged too; there is often no good way to test changes to production systems during business hours. If the changes are restricted to non-production hours, when the admin might not be around to monitor them (and shouldn't have to -- it's an automated system, after all!), the window could be closed by the time the admin sees whether the change was successful. Missing a change window can easily defer a change 24 hours.

Wednesday, July 2 2008

Halo 2 & 3 Done

Thanks Lyman!

Now I'm letting myself be sucked into trying GTA (IV).

Monday, June 16 2008

Gaming Defensively

I have a compulsive personality, I like finishing things, and I enjoy computer games. I have developed a simple set of rules to avoid blowing all my time gaming.

  1. I test games written by friends. I only have 3 friends who have written games. Howard wrote Spectre and Gridz, but was distracted running an ISP for a while. Andrew wrote Battle-Girl but we're no longer in touch. Peter wrote Greebles, but doesn't currently appear to have any plans for further game development. So I haven't spent much time on friends' games in several years, although beta testing gets pretty involved. I also played several Delta Tao games, and might even try Return to Dark Castle.
  2. I play coin-op games. The sad thing here is that for years as a kid, I wished I could afford to spend a lot of time playing arcade games, but didn't have lots of money to blow on it. Now that blowing a roll of quarters isn't a big deal, I don't have much interest or time, and don't live near any arcades.
  3. I play Marathon; I played through Marathon 1-3, and Halo 1 (really Marathon 4). Last week I bought an Xbox 360 (my first gaming console ever) to play Halo 2 & 3 -- I'm waiting for Lyman's extra 20gb Xbox hard drive and VGA cable so I can get started with Halo 2. Unfortunately I'm not good at FPS games, so it will probably take me a long time to work my way through Halo 2 and then Halo 3. My intention is to sell the Xbox after Halo 3, assuming it has any resale value at that point.
  4. I don't sweat the small stuff. When Luxor came out, I spent a few hours playing the demo, then deleted it. On the Xbox, I'm playing demos and freebies (which tend to come with 1-3 sample levels -- pretty anemic) while I wait for the hard drive so I can play Halo 2. Halo 2 was only released for Xbox, not 360, so it needs to download patches from Xbox Live and store them on a hard drive; the 256mb flash card that came with mine is inadequate.

These rules keep me from sinking my life into video games. That, and a general lack of time, especially as a parent.

Thursday, June 5 2008

Childhood dreams fulfilled

Being the compulsive sort, it bugged me whenever I missed an episode of a TV show I watched (I used to watch a lot of TV; now not much). Similarly, it bothered me that I didn't have complete sets of the comics I read -- they were both hard to find and expensive, especially since I almost never started at the beginning.

Inspired by Ernie Cline, I've recently been watching Airwolf. It hasn't aged well, and was never great storytelling, but it's still enjoyable. And it's nice to see as a coherent whole over weeks, rather than scattered across years with commercial interruptions. I'm in the middle of season 2, and will skip season 4 (I don't think I ever saw it, fortunately); don't know about season 3. Perhaps I'll watch The Fall Guy next!

Nowadays, with the Internet, back issues of comic books are pretty easy to find. I've completed a few series that were missing issues, such as Badger crossovers, Dynamo Joe, and Tailgunner Jo. I'd love to collect various other series, but a full run of X-Men would be prohibitive -- both in terms of money and time to read them all!

I was pleased to discover Marvel made several of their more popular titles available to GIT, who released them on DVD. Unfortunately, the license was terminated in favor of Marvel's online service, but some DVDs are still available. James gave me Ghost Rider for my birthday, and despite some aggravations (they photographed the open comic books, so there's dead space around the corners, and didn't bother to split left & right pages, so it's too awkward to read in single-page portrait mode) which make the comic harder to read than it should be, I'm enjoying the old Ghost Rider issues. It's amazing what a loser Johnny Blaze originally was -- he's an idiot (sloppy writing), a coward, a regretful devil dealer, and not really faster or more skillful than gang members. As time has gone on, and Marvel has super-sized its characters, Ghost Rider and his cycle have gotten faster, stronger, less human, and ironically much more innocent.

Monday, February 18 2008

System Admin Interview Questions

I was quite impressed by Joel's description of the hiring process, and we've been doing a lot of interviewing for System Admins lately. I put together a list of standard questions to ask during interviews, which has been quite helpful in judging a) how much technical knowledge people have, and b) (just as important) how good a match they are for the skills void we were trying to fill at the time. Here they are, for the next person who needs to perform a similar exercise.

  1. How many systems does your team manage (Linux, Solaris, Windows, etc.)?
  2. How large is your team?
  3. Which OS are you most comfortable/familiar with?
  4. Which Linux flavors are you most comfortable/familiar with?
  5. Which Red Hat versions are you familiar with?
  6. Are you familiar with kernel programming or configuration?
  7. Have you done any custom packaging or kickstarting?
  8. Have you used or managed Sun JumpStart?
  9. How much experience do you have with Sendmail?
  10. ... NetWorker? Version? Managing backups, or just configuring clients?
  11. ... LDAP? Brand & version? LDIF or just querying?
  12. ... firewalls (iptables, ipf, etc.)?
  13. ... network administration (Cisco, sniffing, etc.)?
  14. ... Apache httpd?
  15. ... Tomcat & Java?
  16. ... EMC (Clariion, PowerPath)?
  17. ... shell scripting, and with which shells?
  18. ... perl scripting?
  19. ... Veritas VM/FS? Versions?
  20. ... Veritas Cluster, or other HA? Versions?
  21. ... snapshots? In which products?
  22. ... load balancing
  23. ... Oracle (as SA, not DBA)?
  24. ... HPC?
  25. Please briefly explain the difference between RAID 1 and 5. What are layered RAID levels, and when are they appropriate?
  26. What sizable projects have you done recently?
  27. Why are you leaving your current employer / did you leave your last employer?
  28. Please give specific examples of some routine tasks you've performed recently.
  29. Have you done systems specification and design (servers, multi-server configurations)?
  30. Have you worked with customers directly, or primarily with/for other IT personnel?

It didn't make sense to publish a list of questions when I was involved in the interviewing process, but now that I'm leaving Rockefeller and no longer interviewing UNIX Admins for them, I can post my sample questions.
