IPv6 on Dell Cloud and Rackspace Cloud Servers

IPv6 is coming – albeit slowly.  While the core Internet is IPv6-capable, getting that plumbed all the way through to your system, be it at home, in your company’s data center, or in a cloud offering, is still elusive.  When waiting isn’t an option, tunneling IPv6 over IPv4 has proven viable, at least for light uses.

Since 2006, I’ve been using the tunnel service provided by SixXS to have IPv6 at home.  Now that I’ve been making more use of cloud servers, first with Dell Cloud with VMware vCloud Datacenter Service, and now adding Rackspace Cloud Servers, I’ve wanted IPv6 connectivity to those servers too.  While both clouds have roadmap plans to add native IPv6 connectivity, I’m a little impatient, and can afford to make the conversion once each is ready with native service.  So, I’ve expanded my use of SixXS into each of those clouds as well.

As it happens, both Dell Cloud and Rackspace Cloud Servers are network-located in the Dallas, TX area, where SixXS also has a PoP.  That means in both cases there’s only about a 2ms round trip time between my cloud servers and the PoP, which is an acceptable overhead.  To configure each cloud server, I requested a tunnel from SixXS, installed the aiccu program from the Linux distro repositories, and configured /etc/aiccu.conf with my credentials and tunnel ID.  Voila: IPv6 connectivity!  A quick update to /etc/sysconfig/ip6tables, and now my services are reachable over both IPv4 and IPv6.  Each tunnel also comes with a whole routed /48 subnet, so as I stand up more cloud servers in each location I can route addresses out of that subnet rather than configuring a separate tunnel for each server.
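
For reference, the whole setup boils down to a few lines.  This is just a sketch: the handle, password, and tunnel ID below are placeholders, and your distro’s aiccu package may ship slightly different defaults.

    # /etc/aiccu.conf (placeholders; substitute your own SixXS handle and tunnel)
    username       ABC1-SIXXS
    password       yourpassword
    tunnel_id      T12345
    ipv6_interface sixxs
    automatic      true
    requiretls     true
    defaultroute   true

    # /etc/sysconfig/ip6tables -- e.g. allow inbound HTTP and SMTP over IPv6
    -A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
    -A INPUT -p tcp -m tcp --dport 25 -j ACCEPT

Then service aiccu start brings the tunnel up, and chkconfig aiccu on keeps it coming up at boot.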

Free IPv6 connectivity for my cloud servers, without waiting for native connectivity.  That’s cool!

Dell 12G PowerEdge – IPMI interrupt and the death of kipmi0

A seemingly minor feature was added to our 12G PowerEdge servers announced this week: IPMI interrupt handling.  This is the culmination of work I started back in 2005, when we discovered that many actions utilizing IPMI, such as polling all the sensors for status during system startup or performing firmware updates to the IPMI controller itself, took a very long time.  System startup could be delayed by minutes while OMSA polled the sensors, and firmware updates could take 15 minutes or more.

At the time, hardware rarely had an interrupt line hooked up to the Baseboard Management Controller, which meant we had to rely on polling the IPMI status register for changes.  The polling interval, by default, was the 100Hz kernel timer, meaning we could transfer no more than 100 characters of information per second; reading a single sensor could take several seconds.  To speed up the process, I introduced the “kipmi0” kernel thread, which could poll much more quickly, but which PowerEdge users noted consumed far more CPU cycles than they would have liked.

Over time the Dell engineering team has made several enhancements to the IPMI driver to try to reduce the impact of the kipmi0 polling thread, but it could never be quite eliminated – until now.

With the launch of the 12G PowerEdge servers, we have a hardware interrupt line from the BMC hooked up and plumbed through the device driver.  This eliminates the need for the polling thread completely, and provides the best IPMI command performance while not needlessly consuming CPU cycles polling.
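
If you’re curious which mode a given system ended up in, a quick check looks roughly like this (the commands are generic, though the interrupt name and the available ipmi_si module parameters vary by platform and kernel):

    # with the interrupt plumbed through, ipmi_si shows up in the interrupt table...
    grep ipmi /proc/interrupts
    # ...and the polling thread is gone
    ps -e | grep kipmi0
    # on older polling-only systems, kipmid_max_busy_us (if your ipmi_si exposes it)
    # was the knob for reining in kipmi0's CPU usage
    cat /sys/module/ipmi_si/parameters/kipmid_max_busy_us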

Congratulations to the Dell PowerEdge and Linux Engineering teams for finishing this effort!

New Dell Product Group GPG signing key

Back in 2001, I created the first GPG signing key for Dell, which the Linux Engineering team used to sign various packages and releases over time.  I’ve long since handed day-to-day use of that key over to the Product Group Release Engineering team.  They have issued a new stronger key which they will be using to sign future packages.  I have signed this new key, and it has been signed by the original 2001 key as well, to provide continuity in the web of trust.  The new key is on the usual keyservers, fingerprint:

pub   4096R/34D8786F 2012-03-02
      Key fingerprint = 4255 0ABD 1E80 D7C1 BC0B AD85 1285 4914 34D8 786F
uid                  Dell Inc., PGRE 2012 (PG Release Engineering Build Group 2012) <PG_Release_Engineering@Dell.com>
sub   4096R/79DF80D8 2012-03-02
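
To fetch and verify it yourself, something like the following works (any of the usual keyservers should carry it; the one below is just an example):

    gpg --keyserver pgp.mit.edu --recv-keys 34D8786F
    gpg --fingerprint 34D8786F
    # compare the printed fingerprint against the one above before trusting the key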

SELinux on a Rackspace Cloud Server

After a long time hosting my personal web site at WestHost, I finally decided to move it to another cloud provider, a Rackspace Cloud Server.  This move gives me a chance to run Fedora 16, as I do everywhere at home, and it is more than capable of serving a few light-traffic domains, personal mail and mailing lists, and email for our neighborhood youth basketball league.

One thing that surprised me, though, was that the default Fedora 16 image provided by Rackspace in their smallest configuration (256MB RAM, 10GB storage) had SELinux disabled, and no selinux-policy package installed.  Being a big fan of Mark Cox’s work reporting on vulnerabilities in RHEL, and Josh Bressers’ work leading the Fedora Security Response Team, it just didn’t feel right running an internet-facing Fedora server without SELinux enabled.

This was easily resolved by installing the selinux-policy-targeted package, editing /etc/grub.conf to remove selinux=0 from the kernel command line, enabling the configuration in /etc/selinux/config, and restarting the server.  After a few minutes of autorelabeling, all is well and good.
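
In shell terms, the procedure looks roughly like this.  It is a sketch of the steps above; the sed expressions assume the stock grub.conf and /etc/selinux/config layouts, so eyeball the files rather than trusting them blindly.

    yum install selinux-policy-targeted
    # drop selinux=0 from the kernel command line
    sed -i 's/ selinux=0//' /etc/grub.conf
    # enforcing (or permissive, if you want to watch for denials first)
    sed -i 's/^SELINUX=.*/SELINUX=enforcing/' /etc/selinux/config
    # force a full relabel on the next boot, then restart
    touch /.autorelabel
    reboot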

I’m sure SELinux can get in the way of some application deployments, so it’s easiest for Rackspace to keep it disabled and let experienced folks like me enable it if we want.  Still, I would have preferred it enabled by default; there’s always the option to disable it later or run in permissive mode.

Because I run a few mailing lists using mailman, across multiple domains, I of course wanted to run several separate instances of mailman, one per domain.  Fedora has an SELinux-aware mailman package just a quick yum install away.  The problem is that the SELinux file context rules are written expecting only one instance of mailman per server.  That’s when I remembered a recent blog post by Dutch, where he had patched the mailman spec and config files to build separate mailman-${sitename} RPMs, each with their own correct SELinux contexts.  Very cool, and exactly what I needed.  Well, almost: he did his work on EL6 and I’m running Fedora 16, but close enough (see his blog comments for the few changes necessary on F16).  Thanks to Dutch, I’ve got a fully SELinux-secured web and mail server, with separate mailman instances for each domain.
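
Dutch’s per-site RPMs handle the labeling for you, but if you want to see what the stock policy expects, or experiment by hand, the contexts are easy to inspect.  The second-instance path below is purely hypothetical, just to illustrate the idea:

    # list the file contexts the targeted policy ships for mailman
    semanage fcontext -l | grep mailman
    # one manual alternative: label a hypothetical second instance tree the same
    # way as the stock one, via a file-context equivalence rule
    semanage fcontext -a -e /var/lib/mailman /var/lib/mailman-example
    restorecon -Rv /var/lib/mailman-example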

Next time you build a Rackspace Cloud Server running Fedora, take an extra couple minutes and enable SELinux.  The site you save may be your own!

FUDCon Blacksburg videos

I shot videos of several of the presentations at the Fedora User and Developer Conference yesterday.  For your viewing pleasure:

  • “State of Fedora” from the Fedora Project Leader, Jared Smith [ogg]
  • Mike McGrath, team lead for OpenShift, demoing OpenShift [ogg]
  • Jon Masters and Chris Tyler, on the ARM architecture in Fedora [ogg]. ARM is a secondary architecture today.  By Fedora 18, with your help, it needs to become a primary architecture.
  • David Nalley, on CloudStack, which is aiming for Fedora 17 inclusion [ogg]
  • Dan Prince and Russell Bryant giving an introduction to OpenStack [ogg]
  • Mo Morsi presenting the Aeolus cloud management project [ogg]

[Update 1/18/2012] I was able to upload all the videos to YouTube; http://www.youtube.com/playlist?list=PL2BAA7FF83E6482C2 is a playlist with all six.

OpenStack Conference Call for Speakers till Sept 6

OpenStack Community Manager Stephen Spector posted the OpenStack Conference Call for Speakers just a bit ago.  I’m pleased to be a part of the Program Committee for this conference, and encourage you to submit your presentation ideas.  There are two basic tracks, Business and Technical, and each session is planned to last only 30 minutes (so be concise!).

I look forward to meeting more members of the OpenStack community in Boston, October 5-7.  I love Boston in the Fall (or really anytime…).

Consistent Network Device Naming updates

Today I released biosdevname v0.3.7, after listening to feedback from all across the web, including NetworkWorld, LWN, and Slashdot.  No, I’m not killing the feature, as some might hope, but some changes are in order.

First, it’s amazing how many people hated the ‘#’ character in device names.  Yes, that was bound to cause some problems, but nothing that couldn’t be fixed given enough time.  But since it’s early in the game, changing that character from ‘#’ to ‘p’ accomplishes the same goal, with less chance of breakage, so that’s done.  pci<slot>p<port>_<vf> it is….
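
Concretely, a name that earlier releases would have suggested as pci2#1_0 (slot 2, port 1, virtual function 0) now comes out as pci2p1_0.  You can ask the helper directly what it would suggest for an existing interface; the interface name and output here are purely illustrative:

    # suggested name for an interface, as of v0.3.7
    biosdevname -i eth4
    pci2p1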

Second, the various virtual machine BIOSes each do something slightly different for the network devices they expose.  VMware exposes the first NIC (traditionally eth0) as being in PCI slot 3.  KVM exposes the first NIC as being in PCI slot 2, but has no information about the second NIC.  Xen doesn’t expose anything at all, so Xen guests kept the ethX naming convention.

To address these discrepancies, and because there is no physical representation of a (virtual) NIC in a virtual machine, biosdevname no longer suggests a new name for NICs if running in a VM guest.  This means all VM guests keep ethX as their naming convention. Thanks to colleague Narendra K for this fix.

Third, for everyone who still thinks renaming devices is a really bad idea, you get an out.  A new kernel command line option, honored by udev, lets you disable biosdevname.  biosdevname=0 will prevent biosdevname from being invoked, effectively disabling this feature, leaving you with the ethX names.
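
For example, on Fedora the option just gets appended to the kernel line in /etc/grub.conf, or typed at the boot prompt for a one-time test.  Everything but the trailing biosdevname=0 below is an illustrative placeholder:

    kernel /vmlinuz-<version> ro root=<your-root> rhgb quiet biosdevname=0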

All this, and the usual assorted bug fixes as biosdevname gets more widespread exposure and testing.

Love it?  Hate it?  Let me know.  You can find me (mdomsch) on IRC on FreeNode in #biosdevname, #udev, or #fedora-devel, as well as the usual mailing lists.

Fedora Test Day today – please join us

Today is the official Fedora Test Day for Consistent Network Device Naming.  Given all the coverage this week on NetworkWorld and Slashdot, I would like to see widespread testing of this feature, to assuage the concerns and misconceptions raised there.  Testing is simple: download and boot the LiveISO, and report success or failure on the wiki page.  You can even try it out on a running Fedora 14 instance if you like.

The Dell engineers who have been working on this for years will be online in #fedora-test-day on FreeNode IRC today if you have any questions.  Please join us.  Thanks for your time and participation.

Consistent Network Device Naming coming to Fedora 15

One of my long-standing pet projects, Consistent Network Device Naming, is finally coming to Fedora (emphasizing two of the Fedora F’s: Features and First), and thereafter to all Linux distributions.  What is this, you ask?

Systems running Linux have long had ethernet network devices named ethX.  Your desktop likely has one ethernet port, named eth0.  This works fine if you have only one network port, but what if, like on Dell PowerEdge servers, you have four ethernet ports?  They are named eth0, eth1, eth2, eth3, corresponding to the labels on the back of the chassis, 1, 2, 3, 4, respectively.  Sometimes.  Aside from the obvious confusion of names starting at 0 versus starting at 1, race conditions can happen such that each port may not get the same name on every boot, and the ports may be named in an arbitrary order.  If you add a network card to a PCI slot, it gets even worse, as the ports on the motherboard and the ports on the add-in card may have their names intermixed.

While several solutions have been proposed over time (detailed at Linux Plumbers Conference last year), none were deemed acceptable, until now.

Enter biosdevname, the tool Dell has developed to bring sanity (and consistency!) to network device names.  Biosdevname is a udev helper, which renames network interfaces based on information presented by system BIOS.

The new naming convention is as follows:

  • em[1-N] for on-board (embedded) NICs (the number matches the chassis label)
  • pci<slot>#<port> for cards in PCI slots, port 1..N
  • NPAR & SR-IOV devices add a suffix of _<vf>, from 0..N depending on the number of Partitions or Virtual Functions exposed on each port.
  • Other Linux conventions, such as .<vlan> and :<alias> suffixes remain unchanged and are still applicable.

This provides a sane mapping of Linux network interface name to externally visible network port (RJ-45 jack).
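
You can ask biosdevname directly what it would call an existing interface.  The output below is illustrative, showing the first on-board port and the first port of a card in slot 2:

    biosdevname -i eth0
    em1
    biosdevname -i eth4
    pci2#1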

Where do we get this information?  The algorithm is fairly simple:

  • If system BIOS exposes the new PCI Firmware Specification 3.1 ACPI _DSM method, we get the interface label and index from ACPI, and use those.
  • Else if system BIOS exposes an index and label in SMBIOS 2.6 types 9 and 41, use the index value.
  • Else if system BIOS exposes index via the HP proprietary SMBIOS extension, use that.
  • Else fall back to using the legacy PCI IRQ Routing Table to figure out which slots devices are in, sort the PCI device list in breadth-first order, and assign index values.
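
If you want to see what your BIOS actually advertises, dmidecode will dump the raw SMBIOS records that biosdevname consumes (output varies widely by vendor and BIOS revision):

    # SMBIOS type 41: Onboard Devices Extended Information (index and bus address)
    dmidecode -t 41
    # SMBIOS type 9: System Slots (slot numbers and bus addresses for add-in cards)
    dmidecode -t 9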

How will this affect you?

If you have scripts that hard-code eth0 or assume that ethX is a particular port, your scripts are already broken (you may just not know it yet).  Begin planning to use the new interface names going forward, adjusting your scripts as necessary.
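
A quick way to find those lurking assumptions before the rename catches you out (adjust the paths to wherever your scripts actually live):

    grep -rn 'eth[0-9]' /etc/sysconfig/network-scripts /etc/rc.d /usr/local/bin 2>/dev/null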

Fedora 15 will be the first distribution to use biosdevname by default.  There will be a Test Day on Thursday, January 27.  I encourage you to download the Live image, boot it on your system, and verify that your network interfaces are now named according to the above convention, and that all works as expected.  You may also take the opportunity to review your custom scripts, looking for hard-coded ethX values, and prepare for the coming name change.

Once we get sufficient exposure and verification using Fedora, I expect to see this change roll into other Linux distributions, and other operating systems, over time.  Consider yourself warned.

Dell introduces RHEL Auto-Entitlement and 5-year subscriptions

Noted on the Dell blog, the auto-entitlement system we rolled out to the US and Europe a few years ago is finally available worldwide.  What is auto-entitlement, you ask?

If you’ve ever purchased a Red Hat Enterprise Linux subscription when purchasing a Dell PowerEdge server, shrink-wrapped alongside the CDs is a “registration card”, with a long string of numbers on it.  Upon unboxing your system, you had to a) not throw away that card; b) not lose that card; c) get that card to some responsible party at your organization; d) ensure that responsible party went to http://redhat.com/activate to activate the subscription, using the number on that card.  See how many steps that took?  Can you guess how many ways something could go wrong in the process?

With auto-entitlement, the system administrator is able to simply log their new system into Red Hat Network the first time they use it (as they would to get updates and to manage their system).  Red Hat Network is then smart enough to recognize that the system was purchased from Dell, knows the subscription type and duration, and Bob’s your Uncle.  No registration card to lose, no extra steps to take.  Oh, and if you manage to blow away the hard disk image and re-install RHEL before connecting to Red Hat Network for the first time – no worries – auto-entitlement will still work.

Oh, and while we’re at it, the new 5-year RHEL subscription matches the available 5-year ProSupport hardware service contract, so there’s never any mess with having out-of-sync support subscriptions.

Just two more ways Dell ensures Linux, in this case Red Hat Enterprise Linux, “Just Works”.