Sun is really ticking me off this week (and last). I am trying to find the least-time installation procedure for Sun Grid Engine (SGE), to test on an Amazon EC2 AMI (Amazon Machine Image). OpenSolaris + SGE AMIs are publicly available, but there are no Linux + SGE AMIs yet.

Just finding the files is amazingly complicated. http://gridengine.sunsource.net/ appears to be the old SGE site: it doesn't offer the 6.2u3 release, and it lacks a pointer to the new site. I thought 6.2u3 wasn't really available yet, until I remembered seeing a totally different download site and found it again.

The new SGE site seems to be http://wikis.sun.com/display/GridEngine/Home, which doesn't link back to sunsource either. For extra confusion, http://wikis.sun.com/ hosts 6 different SGE wikis (4 English, 2 Japanese).

I found 4 ways to get SGE electronically (there are also CD media, but for clusters, who cares?):

  • CVS source. The CVS tree includes instructions for building, but I found several inaccuracies and problems. I didn't get it built, so I don't know how serious the problems are. Compiling from source isn't really appropriate for AMIs anyway, so I stopped working through the issues.
  • Sun's Download Center offers only the latest release, as zipped tarballs and zipped RPMs.
    • This page should offer links to older releases.
    • It's ridiculous to zip RPMs, which are already compressed, and downright stupid to zip tarballs, which are explicitly compressed (.tar.gz)!
    • The 64-bit tarball isn't self-contained -- it also apparently requires the 32-bit tarball, but I had to Google to find this out. Again, I stopped pursuing this option before I got it to work.
  • http://gridengine.sunsource.net/downloads/latest.html (note 'latest') redirects to a download page for 6.2u2_1, which is not the latest.

Zipping the downloads saves almost nothing (0.6% and 0.4%), as unzip -Z shows:

    pepper@teriyaki:~/Downloads$ unzip -Z sge62u3_linux24-i586_rpm.zip
    Archive:  sge62u3_linux24-i586_rpm.zip   28572971 bytes   2 files
    -rw-r--r--  2.3 unx 24722508 tx defN 18-Jun-09 05:34 sge6_2u3/sun-sge-bin-linux24-i586-6.2-3.i386.rpm
    -rw-r--r--  2.3 unx  4023935 tx defN 18-Jun-09 05:34 sge6_2u3/sun-sge-common-6.2-3.noarch.rpm
    2 files, 28746443 bytes uncompressed, 28572553 bytes compressed:  0.6%
    pepper@teriyaki:~/Downloads$ unzip -Z sge62u3_linux24-i586_targz.zip
    Archive:  sge62u3_linux24-i586_targz.zip   28699502 bytes   2 files
    -rw-r--r--  2.3 unx 24825624 bx defN 18-Jun-09 05:34 sge6_2u3/sge-6_2u3-bin-linux24-i586.tar.gz
    -rw-r--r--  2.3 unx  3981994 bx defN 18-Jun-09 05:34 sge6_2u3/sge-6_2u3-common.tar.gz
    2 files, 28807618 bytes uncompressed, 28699112 bytes compressed:  0.4%
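The same check can be done without unzip, since Python's standard zipfile module exposes each member's original and compressed sizes. This is a minimal sketch: the zip_savings name is hypothetical, and the percentage is computed as 1 - compressed/uncompressed, which I'm assuming is what unzip -Z reports.

```python
import sys
import zipfile


def zip_savings(path):
    """Return (uncompressed_bytes, compressed_bytes, percent_saved) for a zip archive."""
    with zipfile.ZipFile(path) as zf:
        members = zf.infolist()
        raw = sum(m.file_size for m in members)        # original member sizes
        packed = sum(m.compress_size for m in members)  # sizes as stored in the zip
    # Percentage of space saved by zipping; can go slightly negative for
    # incompressible data such as RPMs or .tar.gz files.
    saved = 100.0 * (raw - packed) / raw if raw else 0.0
    return raw, packed, saved


if __name__ == "__main__":
    for path in sys.argv[1:]:
        raw, packed, saved = zip_savings(path)
        print(f"{path}: {raw} bytes uncompressed, {packed} compressed, {saved:.1f}% saved")
```

Run as python3 zip_savings.py sge62u3_linux24-i586_rpm.zip sge62u3_linux24-i586_targz.zip; given the sizes in the listing, it should report roughly 0.6% and 0.4%.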

RPMs should install into the right places and be ready to go with chkconfig, but instead Sun decided to unpack them into /gridengine/sge, which doesn't even follow Sun's own /opt convention. Worse, they don't install init scripts, or even provide init scripts suitable for symlinking. Instead, the unpacked installer must be run to customize the init script templates. What were they (not?) thinking?!? The inst_sge installer doesn't actually copy any files, so you have to copy them to the right place by hand, which makes the RPM even less useful. The workaround is probably to make /gridengine/sge a symlink to the desired location, assuming rpm will install under a symlink.
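A minimal Python sketch of that symlink workaround, meant to be run as root before rpm -ivh. The function name and the /opt/sge destination are hypothetical, and whether rpm will actually install through a symlinked directory is the same untested assumption as in the paragraph above.

```python
import os
from pathlib import Path


def redirect_rpm_prefix(fixed_path, real_dir):
    """Make fixed_path (e.g. /gridengine/sge) a symlink to real_dir.

    Hypothetical helper: creates real_dir and the parent of fixed_path,
    then links fixed_path -> real_dir. Safe to call twice.
    """
    fixed = Path(fixed_path)
    real = Path(real_dir)
    real.mkdir(parents=True, exist_ok=True)
    fixed.parent.mkdir(parents=True, exist_ok=True)
    if fixed.is_symlink():
        if fixed.resolve() == real.resolve():
            return fixed  # link already points where we want it
        raise FileExistsError(f"{fixed} already links to {os.readlink(fixed)}")
    if fixed.exists():
        # A real directory is already there; don't clobber a previous install.
        raise FileExistsError(f"{fixed} exists and is not a symlink")
    fixed.symlink_to(real, target_is_directory=True)
    return fixed
```

For example, call redirect_rpm_prefix("/gridengine/sge", "/opt/sge"), then run rpm -ivh on sun-sge-common-6.2-3.noarch.rpm and sun-sge-bin-linux24-i586-6.2-3.i386.rpm, and check that the files landed under /opt/sge.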

At this point, you might say "Wow, documentation is needed to explain this hideously complicated situation!" And you'd be right, but apparently Sun hasn't figured that out. When I went looking for an explanation of this convoluted state of affairs, the best I could find was http://wiki.gridengine.info/wiki/index.php/Main_Page#Is_Grid_Engine_commercial_or_open_source_software.3F, which hints that the two sites may reflect a deliberate commercial vs. open source split. If so, Sun is handling it amazingly badly: these pages don't identify themselves as covering the open source or the commercial flavor, or even acknowledge that the other product exists.

To make sorting this out just that little bit tougher, the binaries (both tarballs and RPMs) completely lack documentation, not even a URL for Sun's online docs. Adding insult to injury, the install docs explain how to unpack a tarball, but don't even acknowledge the existence of the RPMs. Apparently Sun decided that the RPM should be equivalent to the tarball, just a vehicle for the files you need to run Sun's installer, instead of being a proper RPM that fully installs the software. That would be obnoxious and shortsighted on its own, but Fedora already has an SGE RPM, and Scalable Systems produced an SGE RPM in 2002, including full integration with either plain RHEL or Rocks. Apparently Sun doesn't want something that works; instead they prefer to force people to use their lame installer, which ran to over 1,500 lines for a basic install!
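For the tarball route that the install docs do cover, my understanding is that both tarballs inside Sun's zip (common and bin) must end up unpacked into the same SGE_ROOT before inst_sge can run. Here is a minimal Python sketch that does the unzip and both untars in one pass. It assumes, without having verified it, that the tarballs unpack directly into the current directory rather than into a top-level folder; unpack_sge_zip and the /opt/sge path are hypothetical.

```python
import tarfile
import zipfile
from pathlib import Path


def unpack_sge_zip(zip_path, sge_root):
    """Extract every .tar.gz inside Sun's zip straight into sge_root.

    Returns the sorted list of tarball names that were unpacked;
    non-tarball members of the zip are ignored.
    """
    root = Path(sge_root)
    root.mkdir(parents=True, exist_ok=True)
    unpacked = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in sorted(zf.namelist()):
            if not name.endswith(".tar.gz"):
                continue
            # Stream each tarball out of the zip ("r|gz"), so no temporary
            # copy of the .tar.gz is written to disk.
            with zf.open(name) as raw, tarfile.open(fileobj=raw, mode="r|gz") as tf:
                tf.extractall(root)  # vendor archive, assumed trusted
            unpacked.append(name)
    return unpacked
```

For example, unpack_sge_zip("sge62u3_linux24-i586_targz.zip", "/opt/sge"), then cd /opt/sge and run ./inst_sge -m for the master host.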