logrotate and bash

It took me a while (longer than I should admit) to figure out how to make daemon processes written in bash work properly with logrotate, so that bash's output gets properly rotated, compressed, closed, and re-opened.

Say, you’re doing this in bash:

#!/bin/bash
logfile=somelog.txt
while :; do
     echo -n "Today's date is" >>  ${logfile}
     echo date >> ${logfile} 
     sleep 60
done

This will run forever, appending a line to the log file every minute.  Easy enough, and if logrotate is asked to rotate somelog.txt, it will do so happily.

But what if bash has started a process that itself takes a long time to complete:

#!/bin/bash
logfile=somelog.txt
find / -type f -exec cat \{\} \; >>  ${logfile}

which, I think we’d agree, will take a long time.  The whole time, it keeps the logfile open for writing.  If logrotate then fires and renames the file, the still-running find keeps writing to the renamed file; the fresh somelog.txt gets nothing, and everything written after the rotation is lost once the rotated file is compressed or removed.  This isn’t really what we want.
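You can watch this happen.  While the find runs, its stdout file descriptor still points at the original file even after a rename-style rotation; something like this shows it (the PID and paths here are illustrative):

$ pgrep -f 'find / -type f'
12345
$ readlink /proc/12345/fd/1
/root/somelog.txt.1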

The solution is to change how logs are written.  Instead of redirecting each command with >> ${logfile}, we’re going to let bash itself do the writing.

#!/bin/bash
logfile=somelog.txt
exec 1>>${logfile} 2>&1
find / -type f -exec cat \{\} \;

Now, the output from the find command is written to its stdout, which is bash’s stdout, which the exec command has pointed at the logfile.  If logrotate fires here, we’ll still lose any data written after the rotation.  To solve this, we need bash to close and re-open its logfile.

Logrotate can send a signal, say SIGHUP, to a process, when it rotates its logfile out from underneath it.  On receipt of that signal, the process should close its logfile and reopen it. Here’s how that looks in bash:

#!/bin/bash
logfile=somelog.txt
pidfile=pidfile.txt

# close and re-open stdout/stderr on the (possibly just-rotated) logfile
function sighup_handler()
{
    exec 1>>${logfile} 2>&1
}
trap sighup_handler HUP
# clean up the pidfile when we exit
trap "rm -f ${pidfile}" QUIT EXIT INT TERM
echo "$$" > ${pidfile}
# fire the sighup handler to redirect stdout/stderr to logfile
sighup_handler
find / -type f -exec cat \{\} \;

and we add this snippet to our logrotate configuration:

somelog.txt {
    daily
    rotate 7
    missingok
    ifempty
    compress
    compresscmd /usr/bin/bzip2
    uncompresscmd /usr/bin/bunzip2
    compressext .bz2
    dateext
    copytruncate
    postrotate
        /bin/kill -HUP `cat pidfile.txt 2>/dev/null` 2>/dev/null || true
    endscript
}
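To test the whole arrangement without waiting a day, you can force a rotation by hand (assuming the snippet lives in /etc/logrotate.d/somelog; adjust the path for your setup):

# dry run first to see what logrotate would do, then force a rotation
logrotate -d /etc/logrotate.d/somelog
logrotate -f /etc/logrotate.d/somelog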

Now, when logrotate fires, it sends a SIGHUP signal to our long-running bash process.  Bash catches the SIGHUP, closes and re-opens its logfile (via the exec command), and continues writing.  There is a brief window between when logrotate fires and when bash re-opens the logfile during which messages may be lost, but it is usually quite small.

There you have it.  Effective log rotation of bash-generated log files.

(Update 7/5: missed the ‘copytruncate’ option in the logrotate config before, added it now.)

Dell Linux Engineers work over 5000 bugs with Red Hat

A post today by Dell’s Linux Engineering team announcing support for RHEL 5.8 on PowerEdge 12G servers made me stop and think.  In the post, they included a link to a list of fixes and enhancements worked while preparing RHEL 5.8 for our new servers.  The list was pretty short.  But it doesn’t tell the whole story.

A quick search in Bugzilla for issues which Dell has been involved in since 1999 yields 5420 bugs, 4959 of which are CLOSED, and only 380 of which are still in NEW or ASSIGNED state, many of which look like they’re pretty close to being closed as well.  This is a testament to the hard work Dell puts into ensuring Linux “Just Works” on our servers, straight out of the box, with few to no extra driver disks or post-install updates needed to make your server fully functional.  You want a working new 12G server?  Simply grab the latest RHEL or SLES DVD image and go.  Want a different flavor of Linux?  Just be sure you’re running a recent upstream kernel – we push updates and fixes there regularly too.

Sure, we could make it harder for you, but why?

Congratulations to the Linux Engineering team for launching 12G PowerEdge with full support baked into Linux!  Keep up the good work!

s3cmd sync enhancements and call for help

Coming soon, Fedora and EPEL users with virtual machines in Amazon (US East for starters) will have super-fast updates.  I’ve been hacking away in Fedora Infrastructure and the Fedora Cloud SIG to place a mirror in Amazon S3.  A little more testing, and I’ll flip the switch in MirrorManager, and all Amazon EC2 US East users will be automatically directed to the S3 mirror first.  Yea!  Once that looks good, if there’s enough demand, we can put mirrors in other regions too.

I hadn’t done a lot of uploading into S3 before.  It seems the common tool people use is s3cmd.  I like to think of ‘s3cmd sync’ as a replacement for rsync.  It’s not – but with a few patches, and your help, I think it can be made more usable.  So far I’ve patched in --exclude-from so that it doesn’t walk the entire local file system only to later prune and exclude files – a speedup of over 20x in the Fedora case.  I added a --delete-after option, because there’s no reason to delete files early in the case of S3 – you’ve got virtually unlimited storage.  And I added a --delay-updates option, to minimize the amount of time the S3 mirror yum repositories are in an inconsistent state (now down to a few seconds, and could be even better).  I’m waiting on upstream to accept/reject/modify my patches, but Fedora Infrastructure is using my enhancements in the meantime.
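Put together, a sync invocation with those patches applied looks roughly like this (the bucket name and paths are made up for illustration):

s3cmd sync \
    --exclude-from excludes.txt \
    --delete-after \
    --delay-updates \
    /srv/pub/fedora/ s3://my-fedora-mirror/pub/fedora/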

One feature I’d really like to see added is to honor hardlinks.  Fedora extensively uses hardlinks to cut down on the number of files, amount of storage, and time needed to upload content.  Some files in the Fedora tree have 6 hardlinks, and over three quarters of the files have at least one hardlink sibling.  Unfortunately, S3 doesn’t natively understand anything about hardlinks.  Lacking that support, I expect that S3 COPY commands would be the best way to go about duplicating the effect of hardlinks (reduced file upload time), even if we don’t get all the benefits.  However, I don’t have a lot more time available in the next few weeks to create such a patch myself – hence my lazyweb plea for help.  If this sounds like something you’d like to take on, please do!
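If you want a feel for how much hardlinking is going on in a tree of your own, GNU find makes it easy to list every file with at least one hardlink sibling, grouped by device and inode (the path is illustrative):

# list regular files with link count > 1; sorting groups siblings
# that share a device:inode pair together
find /srv/pub/fedora -type f -links +1 -printf '%D:%i %p\n' | sort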

IPv6 on Dell Cloud and Rackspace Cloud Servers

IPv6 is coming – albeit slowly.  While the core Internet is IPv6-capable, getting that plumbed all the way through to your system, be it at home, in your company’s data center, or in a cloud offering, is still elusive.  When waiting isn’t an option, tunneling IPv6 over IPv4 has proven viable, at least for light uses.

Since 2006, I’ve been using the tunnel service provided by SixXS to have IPv6 at home.  Now that I’ve been making more use of cloud servers, first with Dell Cloud with VMware vCloud Datacenter Service, and now adding Rackspace Cloud Servers, I’ve wanted IPv6 connectivity to those servers too.  While both clouds have roadmap plans to add native IPv6 connectivity, I’m a little impatient, and I can easily make the conversion once each is ready with native service.  So, I’ve expanded my use of SixXS into each of those clouds as well.

As it happens, both Dell Cloud and Rackspace Cloud Servers are network-located in the Dallas, TX area, where SixXS also has a PoP.  That means in both cases there’s only about a 2ms round-trip time between my cloud servers and the PoP, which is an acceptable overhead.  In configuring my cloud servers, I requested a tunnel from SixXS, installed the aiccu program from the Linux distro repositories, and configured the /etc/aiccu.conf file with my credentials and tunnel ID.  Voila – IPv6 connectivity!  A quick update to /etc/sysconfig/ip6tables, and now my services are reachable over both IPv4 and IPv6.  Each tunnel also comes with a whole routed /48 subnet, so as I stand up more cloud servers in each location, I can route addresses from that subnet rather than configure a separate tunnel for each server.
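For reference, the aiccu side is just a few lines of configuration, something like this (the credentials and tunnel ID here are placeholders):

# /etc/aiccu.conf
username  ABC1-SIXXS
password  not-my-real-password
protocol  tic
server    tic.sixxs.net
tunnel_id T12345
automatic true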

Free IPv6 connectivity for my cloud servers, without waiting for native connectivity.  That’s cool!

Dell 12G PowerEdge – IPMI interrupt and the death of kipmi0

A seemingly minor feature was added to our 12G PowerEdge servers announced this week: IPMI interrupt handling.  This is the culmination of work I started back in 2005, when we discovered that many actions utilizing IPMI, such as polling all the sensors for status during system startup and performing firmware updates to the IPMI controller itself, took a very, very long time.  System startup could be delayed by minutes while OMSA polled the sensors, and firmware updates could take 15 minutes or more.

At the time, hardware rarely had an interrupt line hooked up to the Baseboard Management Controller, which meant we had to rely on polling the IPMI status register for changes.  The polling interval, by default, was the 100Hz kernel timer, meaning we could transfer no more than 100 characters of information per second; reading a single sensor could take several seconds.  To speed up the process, I introduced the “kipmi0” kernel thread, which could poll much more quickly, but which PowerEdge users noted consumed far more CPU cycles than they would have liked.

Over time the Dell engineering team has made several enhancements to the IPMI driver to try to reduce the impact of the kipmi0 polling thread, but it could never be quite eliminated – until now.

With the launch of the 12G PowerEdge servers, we have a hardware interrupt line from the BMC hooked up and plumbed through the device driver.  This eliminates the need for the polling thread completely, and provides the best IPMI command performance while not needlessly consuming CPU cycles polling.
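A quick way to tell which world you’re in, assuming the ipmi_si driver is loaded: look for the poller thread.  On older systems you’ll see it; on 12G, with the interrupt plumbed through, you won’t.

# no output means no polling thread: the interrupt is doing the work
ps -ef | grep '[k]ipmi0'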

Congratulations to the Dell PowerEdge and Linux Engineering teams for finishing this effort!

New Dell Product Group GPG signing key

Back in 2001, I created the first GPG signing key for Dell, which the Linux Engineering team used to sign various packages and releases over time.  I’ve long since handed day-to-day use of that key over to the Product Group Release Engineering team.  They have issued a new, stronger key which they will use to sign future packages.  I have signed this new key, and it has been signed by the original 2001 key as well, to provide continuity in the web of trust.  The new key is on the usual keyservers; its fingerprint:

pub   4096R/34D8786F 2012-03-02
      Key fingerprint = 4255 0ABD 1E80 D7C1 BC0B AD85 1285 4914 34D8 786F
uid                  Dell Inc., PGRE 2012 (PG Release Engineering Build Group 2012) <PG_Release_Engineering@Dell.com>
sub   4096R/79DF80D8 2012-03-02
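To fetch the key and verify the fingerprint yourself (any keyserver should work; pgp.mit.edu is just an example):

gpg --keyserver pgp.mit.edu --recv-keys 0x34D8786F
gpg --fingerprint 0x34D8786F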

SELinux on a Rackspace Cloud Server

After a long time hosting my personal web site at WestHost, I finally decided to move it to another cloud provider: a Rackspace Cloud Server.  This move gives me a chance to run Fedora 16, as I do everywhere at home, and it’s more than capable of serving a few light-traffic domains, personal mail and mailing lists, and email for our neighborhood youth basketball league.

One thing that surprised me, though, was that the default Fedora 16 image provided by Rackspace in their smallest configuration (256MB RAM, 10GB storage) had SELinux disabled, and no selinux-policy package installed.  Being a big fan of Mark Cox’s work reporting on vulnerabilities in RHEL, and Josh Bressers’ work leading the Fedora Security Response Team, it just didn’t feel right running an internet-facing Fedora server without SELinux enabled.

This was easy enough to resolve: install the selinux-policy-targeted package, remove selinux=0 from the kernel command line in /etc/grub.conf, enable SELinux in /etc/selinux/config, and restart the server.  After a few minutes of autorelabeling, all is well and good.
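In command form, it’s roughly the following sketch (the sed one-liners are illustrative; touching /.autorelabel is what triggers the relabel on reboot):

yum install -y selinux-policy-targeted
# drop selinux=0 from the kernel command line
sed -i 's/ selinux=0//' /etc/grub.conf
# boot into enforcing mode by default
sed -i 's/^SELINUX=.*/SELINUX=enforcing/' /etc/selinux/config
touch /.autorelabel
reboot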

I’m sure SELinux can get in the way of some application deployments, and it’s easiest for Rackspace to keep it disabled, letting experienced folks like me enable it if we want.  Still, I would have preferred it enabled by default; there’s always the option to disable it later or to run in permissive mode.

Because I run a few mailing lists using mailman, across multiple domains, I of course wanted to run several separate instances of mailman, one for each domain.  Fedora has an SELinux-aware mailman package just a quick yum install away.  The problem is, the SELinux file context rules are written expecting only one instance of mailman per server.  That’s when I remembered a recent blog post by Dutch, where he had patched the mailman spec and config files to build separate mailman-${sitename} RPMs, each with its own correct SELinux contexts.  Very cool, and exactly what I needed.  Well, almost: he did his work on EL6 and I’m running Fedora 16, but close enough (see his blog comments for the few changes necessary on F16).  Thanks to Dutch, I’ve got a fully SELinux-secured web and mail server with separate mailman instances for each domain.

Next time you build a Rackspace Cloud Server running Fedora, take an extra couple minutes and enable SELinux.  The site you save may be your own!

FUDCon Blacksburg videos

I shot videos of several of the presentations at the Fedora User and Developer Conference yesterday.  For your viewing pleasure:

  • “State of Fedora” from the Fedora Project Leader, Jared Smith [ogg]
  • Mike McGrath, team lead for OpenShift, demoing OpenShift [ogg]
  • Jon Masters and Chris Tyler, on the ARM architecture in Fedora [ogg]. ARM is a secondary architecture today.  By Fedora 18, with your help, it needs to become a primary architecture.
  • David Nalley presented on CloudStack, which is aiming for Fedora 17 inclusion. [ogg]
  • Dan Prince and Russell Bryant giving an introduction to OpenStack [ogg]
  • Mo Morsi presenting the Aeolus cloud management project [ogg]

[Update 1/18/2012] I was able to upload all the videos to YouTube.  http://www.youtube.com/playlist?list=PL2BAA7FF83E6482C2 is a playlist with all 6.

Free Money

This post is aimed at my Dell colleagues in the US.

If you’re like me, you dread the weeks shortly after Back To School.  Sure, the kids are now settled into their daily routines, evening homework, Fall sports, and Scouting, but with the start of the new school year comes the start of Fall Fundraising by each and every organization you’re fortunate enough to be a part of, or even near.  Each organization is worthy in its own right, and as active participants, you bet we’re going to donate.

But did you know you can double your money?  Yep!  For every dollar you donate to a familiar charity, Dell will match that donation dollar-for-dollar up to $10,000/year per employee.  This is an amazing benefit, which I had long put on my Tell Dell survey wishlist, and starting about 7 years ago (maybe more?) it became reality.

Now, there’s one little catch – you can’t simply hand over a check to your familiar charity and let them get it doubled.  You must send your donation through the Dell internal web site (internal home page, You and Dell, Employee Giving), pay via credit card or payroll deduction (or if you’re particularly generous, stock donation), and in a few weeks Dell sends a check for 2x your amount to the charity.  Relatively painless, and a fantastic benefit.  You can give to a bunch of charities, or a few; a little (minimum $25), or a lot (up to $10k matched) any time during the year.  The $10k match resets on January 1.

In addition, Dell wants to encourage employees to volunteer their time, as well as give their money, to charitable causes.  Are you a Scout leader?  A coach?  A board member?  Maybe you help out at the library or at church.  However you volunteer is up to you.  In recognition of your volunteer hours, Dell will give $150 each quarter (yep, that’s $600/year) to charities you designate (they don’t even have to be the same organizations you volunteer for), as long as you log 10 or more hours of volunteer time in the quarter.  So go to the tool (internal home page, You and Dell, Make a Difference), set up your charities, and log your hours.  Then it’s free money for the charities you choose.

So, don’t let that Free Money pass you by.  You know the charities need it, and it’s a simple benefit on top of the activities you’re up to your neck in already.  Take a few minutes to double your contributions, and send that $600 to folks who really need it.

Northwest Austin Youth Basketball registration

Northwest Austin Youth Basketball Association (NWAYBA) registration deadline is only 3 weeks away.  Register your player, 1st grade through high school, and join us on the courts this Fall.  Registration forms must be postmarked by October 16, but I’d appreciate it if you’d mail them sooner.  Somehow I got roped onto the NWAYBA Board as the Registrar.  We’re expecting 400 players again this year, and I’d prefer not to deal with 300 applications in the last week.