Dell Linux Engineers work over 5000 bugs with Red Hat

A post today by Dell’s Linux Engineering team announcing support for RHEL 5.8 on PowerEdge 12G servers made me stop and think.  In the post, they included a link to a list of the fixes and enhancements worked on while preparing RHEL 5.8 for our new servers.  The list was pretty short.  But that list doesn’t tell the whole story.

A quick search in Bugzilla for issues which Dell has been involved in since 1999 yields 5420 bugs, 4959 of which are CLOSED, and only 380 of which are still in NEW or ASSIGNED state, many of which look like they’re pretty close to being closed as well.  This is a testament to the hard work Dell puts into ensuring Linux “Just Works” on our servers, straight out of the box, with few to no extra driver disks or post-install updates needed to make your server fully functional.  You want a working new 12G server?  Simply grab the latest RHEL or SLES DVD image and go.  Want a different flavor of Linux?  Just be sure you’re running a recent upstream kernel – we push updates and fixes there regularly too.

Sure, we could make it harder for you, but why?

Congratulations to the Linux Engineering team for launching 12G PowerEdge with full support baked into Linux!  Keep up the good work!

s3cmd sync enhancements and call for help

Coming soon, Fedora and EPEL users with virtual machines in Amazon (US East for starters) will have super-fast updates.  I’ve been hacking away in Fedora Infrastructure and the Fedora Cloud SIG to place a mirror in Amazon S3.  A little more testing, and I’ll flip the switch in MirrorManager, and all Amazon EC2 US East users will be automatically directed to the S3 mirror first.  Yea!  Once that looks good, if there’s enough demand, we can put mirrors in other regions too.

I hadn’t done a lot of uploading into S3 before.  It seems the common tool people use is s3cmd.  I like to think of ‘s3cmd sync’ as a replacement for rsync.  It’s not – but with a few patches, and your help, I think it can be made more usable.  So far I’ve patched in --exclude-from so that it doesn’t walk the entire local file system only to later prune and exclude files – a speedup of over 20x in the Fedora case.  I added a --delete-after option, because there’s no reason to delete files early in the case of S3 – you’ve got virtually unlimited storage.  And I added a --delay-updates option, to minimize the amount of time the S3 mirror yum repositories are in an inconsistent state (now down to a few seconds, and could be even better).  I’m waiting on upstream to accept/reject/modify my patches, but Fedora Infrastructure is using my enhancements in the meantime.
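Put together, the invocation looks roughly like this (a sketch: the three options are from my pending patches, and the bucket name and paths are placeholders, not the real Fedora configuration):

```shell
# Sketch of the patched 's3cmd sync' invocation.  --exclude-from,
# --delete-after, and --delay-updates are from my pending patches;
# bucket and paths below are placeholders.
s3cmd sync \
    --exclude-from /srv/mirror/excludes.txt \
    --delete-after \
    --delay-updates \
    /srv/pub/fedora/ s3://fedora-mirror-example/
```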

One feature I’d really like to see added is to honor hardlinks.  Fedora extensively uses hardlinks to cut down on the number of files, amount of storage, and time needed to upload content.  Some files in the Fedora tree have 6 hardlinks, and over three quarters of the files have at least one hardlink sibling.  Unfortunately, S3 doesn’t natively understand anything about hardlinks.  Lacking that support, I expect that S3 COPY commands would be the best way to go about duplicating the effect of hardlinks (reduced file upload time), even if we don’t get all the benefits.  However, I don’t have a lot more time available in the next few weeks to create such a patch myself – hence my lazyweb plea for help.  If this sounds like something you’d like to take on, please do!
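For anyone picking this up: finding the hardlink groups locally is the easy half.  Grouping files by inode number tells you which uploads could become S3 COPYs instead.  A self-contained sketch, using a throwaway tree:

```shell
# Find hardlink siblings by grouping files on inode number.  An uploader
# could PUT one member of each group and issue S3 COPYs for the rest.
tmp=$(mktemp -d)
echo "content" > "$tmp/a"
ln "$tmp/a" "$tmp/b"              # hardlink sibling of a
echo "unique" > "$tmp/c"          # no siblings
# Files with more than one link, prefixed by inode so siblings sort together:
find "$tmp" -type f -links +1 -printf '%i %p\n' | sort
rm -rf "$tmp"
```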

IPv6 on Dell Cloud and Rackspace Cloud Servers

IPv6 is coming – albeit slowly.  While the core Internet is IPv6-capable, getting that plumbed all the way through to your system, be it at home, in your company’s data center, or in a cloud offering, is still elusive.  When waiting isn’t an option, tunneling IPv6 over IPv4 has proven viable, at least for light uses.

Since 2006, I’ve been using the tunnel service provided by SixXS to have IPv6 at home.  Now that I’ve been making more use of cloud servers, first with Dell Cloud with VMware vCloud Datacenter Service, and now adding Rackspace Cloud Servers, I’ve wanted IPv6 connectivity to those servers too.  While both clouds have roadmap plans to add native IPv6 connectivity, I’m a little impatient, and can afford to make the conversion once each is ready with native service.  So, I’ve expanded my use of SixXS into each of those clouds as well.

As it happens, both Dell Cloud and Rackspace Cloud Servers are network-located in the Dallas, TX area, where SixXS also has a PoP.  That means in both cases there’s only about a 2ms round trip time between my cloud servers and the PoP, which is an acceptable overhead.  In configuring my cloud servers, I requested a tunnel from SixXS, installed the aiccu program from the Linux distro repositories, and configured the /etc/aiccu.conf file with my credentials and tunnel ID.  Voila – IPv6 connectivity!  A quick update to /etc/sysconfig/ip6tables, and now my services are reachable through both IPv4 and IPv6.  Each tunnel also comes with a whole routed /48 subnet, so as I stand up more cloud servers in each location, I can route this subnet rather than configuring separate tunnels for each server.
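The per-server setup amounts to just a few steps.  Roughly (package and service names are from Fedora; the username, password, and tunnel ID below are placeholders for your own SixXS credentials):

```shell
# Install and configure the SixXS tunnel client (placeholder credentials).
yum install aiccu
cat >> /etc/aiccu.conf <<'EOF'
username  EXAMPLE-SIXXS
password  changeme
tunnel_id TNNNNN
EOF
service aiccu start
ping6 www.sixxs.net        # confirm the tunnel is passing IPv6 traffic
```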

Free IPv6 connectivity for my cloud servers, without waiting for native connectivity.  That’s cool!

Dell 12G PowerEdge – IPMI interrupt and the death of kipmi0

A seemingly minor feature was added to our 12G PowerEdge servers announced this week – IPMI interrupt handling.  This is the culmination of work I started back in 2005 when we discovered that many actions utilizing IPMI, such as polling all the sensors for status during system startup, and performing firmware updates to the IPMI controller itself, took a very very long time.  System startup could be delayed by minutes while OMSA polled the sensors, and firmware updates could take 15 minutes or more.

At the time, hardware rarely had an interrupt line hooked up to the Baseboard Management Controller, which meant we had to rely on polling the IPMI status register for changes.  The polling interval, by default, was the 100Hz kernel timer, meaning we could transfer no more than 100 characters of information per second – reading a single sensor could take several seconds.  To speed up the process, I introduced the “kipmi0” kernel thread, which could poll much more quickly, but which PowerEdge users noted consumed far more CPU cycles than they would have liked.

Over time the Dell engineering team has made several enhancements to the IPMI driver to try to reduce the impact of the kipmi0 polling thread, but it could never be quite eliminated – until now.

With the launch of the 12G PowerEdge servers, we have a hardware interrupt line from the BMC hooked up and plumbed through the device driver.  This eliminates the need for the polling thread completely, and provides the best IPMI command performance while not needlessly consuming CPU cycles polling.

Congratulations to the Dell PowerEdge and Linux Engineering teams for finishing this effort!

New Dell Product Group GPG signing key

Back in 2001, I created the first GPG signing key for Dell, which the Linux Engineering team used to sign various packages and releases over time.  I’ve long since handed day-to-day use of that key over to the Product Group Release Engineering team.  They have issued a new stronger key which they will be using to sign future packages.  I have signed this new key, and it has been signed by the original 2001 key as well, to provide continuity in the web of trust.  The new key is on the usual keyservers, fingerprint:

pub   4096R/34D8786F 2012-03-02
      Key fingerprint = 4255 0ABD 1E80 D7C1 BC0B AD85 1285 4914 34D8 786F
uid   Dell Inc., PGRE 2012 (PG Release Engineering Build Group 2012) <PG_Release_Engineering@Dell.com>
sub   4096R/79DF80D8 2012-03-02
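If you want to fetch and check it yourself, something like the following works (the keyserver choice here is arbitrary; any of the usual servers should carry it):

```shell
# Fetch the new key, then compare the printed fingerprint against the
# one published above before trusting it.
gpg --keyserver pgp.mit.edu --recv-keys 34D8786F
gpg --fingerprint 34D8786F
gpg --check-sigs 34D8786F    # shows the 2001 key's signature, if you have it imported
```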

SELinux on a Rackspace Cloud Server

After a long time hosting my personal web site at WestHost, I finally decided to move it to another cloud provider – a Rackspace Cloud Server.  This move gives me a chance to run Fedora 16, as I do everywhere at home, and it’s more than capable of serving a few light-traffic domains, personal mail and mailing lists, and email for our neighborhood youth basketball league.

One thing that surprised me though was that the default Fedora 16 image provided by Rackspace in their smallest configuration (256MB RAM, 10GB storage) had SELinux disabled, and no selinux-policy package installed.  Being a big fan of Mark Cox’s work reporting on vulnerabilities in RHEL, and Josh Bressers’ work leading the Fedora Security Response Team, it just didn’t feel right running an internet-facing Fedora server without having SELinux enabled.

This was easily enough resolved by installing the selinux-policy-targeted package, editing /etc/grub.conf to remove selinux=0 from the kernel command line, enabling the configuration in /etc/selinux/config, and restarting the server.  After a few minutes of autorelabeling, all is well and good.
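Condensed into commands, the procedure looks roughly like this (run as root; the grub.conf path matches the Rackspace Fedora 16 image and may differ elsewhere):

```shell
# Re-enable SELinux on an image that shipped with it disabled.
yum install selinux-policy-targeted
sed -i 's/ selinux=0//' /etc/grub.conf                        # drop selinux=0 from the kernel line
sed -i 's/^SELINUX=.*/SELINUX=enforcing/' /etc/selinux/config
touch /.autorelabel                                           # relabel the filesystem on next boot
reboot
```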

I’m sure SELinux can get in the way of some application deployments.  It’s easiest for Rackspace to keep it disabled and let experienced folks enable it themselves.  Still, I would have preferred it enabled by default, since there’s always the option to disable it later or run in permissive mode.

Because I run a few mailing lists using mailman, across multiple domains, I of course wanted to run several separate instances of mailman, one for each domain.  Fedora has a SELinux-aware mailman package just a quick yum install away.  The problem is, the SELinux file context rules are written expecting only one instance of mailman per server.  That’s when I remembered a recent blog post by Dutch where he had patched the mailman spec and config files to build separate mailman-${sitename} RPMs, each with their own correct SELinux contexts.  Very cool, and exactly what I needed.  Well, almost – he did his work on EL6 and I’m running Fedora 16, but it’s close enough (see his blog comments for the few changes necessary on F16).  Thanks to Dutch, I’ve got a fully SELinux-secured web and mail server with separate mailman instances for each domain.
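To give a sense of what those per-site packages fix up, the SELinux side looks roughly like this (the "example.org" instance name and paths are hypothetical; the real context rules live in the patched spec files):

```shell
# Hypothetical sketch: give a second mailman instance's directories the
# same SELinux types the stock package uses, so policy applies to it too.
semanage fcontext -a -t mailman_data_t '/var/lib/mailman-example.org(/.*)?'
semanage fcontext -a -t mailman_log_t  '/var/log/mailman-example.org(/.*)?'
restorecon -Rv /var/lib/mailman-example.org /var/log/mailman-example.org
```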

Next time you build a Rackspace Cloud Server running Fedora, take an extra couple minutes and enable SELinux.  The site you save may be your own!