MirrorManager 1.4 now in production in Fedora Infrastructure

After nearly 3 years in on-again/off-again development, MirrorManager 1.4 is now live in the Fedora Infrastructure, happily serving mirrorlists to yum, and directing Fedora users to their favorite ISOs – just in time for the Fedora 19 freeze.

Kudos go out to Kevin Fenzi, Seth Vidal, Stephen Smoogen, Toshio Kuratomi, Pierre-Yves Chibon, Patrick Uiterwijk, Adrian Reber, and Johan Cwiklinski for their assistance in making this happen.  Special thanks to Seth for moving the mirrorlist-serving processes to their own servers where they can’t harm other FI applications, and to Smooge, Kevin and Patrick, who gave up a lot of their Father’s Day weekend (both days and nights) to help find and fix latent bugs uncovered in production.

What does this bring the average Fedora user?  Not a lot…  More stability: fewer failures when yum retrieves the mirror lists (there were never many, but the count was nonzero).  And a list of public mirrors in which the versions are sorted in numerical order.

What does this bring to a Fedora mirror administrator?  A few new tricks:

  • Mirror admins have been able to specify their own Autonomous System Number for several years.  Clients on the same AS get directed to that mirror.  MM 1.4 adds the ability for mirror admins to request additional “peer ASNs” – particularly helpful for mirrors located at a peering point (say, Hawaii), where listing lots of netblocks instead is unwieldy.  As this has the potential to be slightly dangerous (no, you can’t request ALL ASNs be sent your way), ask a Fedora sysadmin if you want to use this new feature – we can help you.
  • Multiple mirrors claiming the same netblock, or overlapping netblocks, were returned to clients in random order.  Now they will be returned in ascending netblock size order.  This lets both an organization and its upstream ISP run a mirror: most requests from inside the organization will be sent to its private mirror first, falling back to the ISP’s mirror.  This should save some bandwidth for the organization.
  • If you provide rsync URLs, you’ll see reduced load from the MM crawler as it will now use rsync to retrieve your content listing, rather than a ton of HTTP or FTP requests.
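
To illustrate the netblock ordering described above, here is a minimal sketch. It is not MirrorManager's actual code: the function name, the mirror names, and the netblocks are all made up for the example. It simply finds every mirror whose netblock contains the client and returns the smallest (most specific) netblock first.

```python
import ipaddress


def mirrors_for_client(client_ip, mirror_netblocks):
    """Return names of mirrors whose netblock contains client_ip,
    smallest netblock first (hypothetical helper, not MM code)."""
    ip = ipaddress.ip_address(client_ip)
    matches = []
    for name, cidr in mirror_netblocks:
        net = ipaddress.ip_network(cidr)
        # Only compare like with like (IPv4 vs. IPv6).
        if ip.version == net.version and ip in net:
            matches.append((net.num_addresses, name))
    # Ascending netblock size: the private /16 sorts ahead of the ISP's /8.
    matches.sort()
    return [name for _size, name in matches]


# Made-up example data: an ISP, one of its customers, and an unrelated site.
mirrors = [
    ("isp-mirror", "10.0.0.0/8"),
    ("campus-mirror", "10.20.0.0/16"),
    ("other-mirror", "192.168.0.0/24"),
]

print(mirrors_for_client("10.20.30.40", mirrors))
```

A client inside the campus netblock gets the campus mirror first and the ISP mirror as a fallback; a client elsewhere in the ISP's netblock only sees the ISP mirror.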

What does this bring Fedora Infrastructure (or anyone else running MirrorManager)?

  • Reduced memory usage in the mirrorlist servers.  Given how poorly Python manages memory on x86_64 (e.g. reading in a 12MB pickle file blows memory usage up from 4MB to 120MB), this is critical.  It directly affects the number of simultaneous users that can be served, the response latency, and the CPU overhead too – it’s a win-win-win-win.
  • An improved admin interface – getting rid of hand-coded pages that looked like they could have been served by BBS software on my Commodore 64 – for something modern, more usable, and less error prone.
  • Code specifically intended for use by Debian/Ubuntu and CentOS communities, should they decide to use MM in the future.
  • A new method to upgrade database schemas – saner than SQLObject’s method.  This should make me less scared to make schema changes in the future to support new features.  (yes, we’re still using SQLObject – if it’s not completely broken, don’t fix it…)
  • Map generation moved to a separate subpackage, to avoid the dependency on 165MB of python-basemap and python-basemap-data packages on all servers.

MM 1.4 is a good step forward, and hopefully I’ve laid the groundwork to make it easier to improve in the future.  I’m excited that more of the Fedora Infrastructure team has learned (the hard way) the internals of MM, so I’ll have additional help going forward too.

s3cmd sync enhancements and call for help

Coming soon, Fedora and EPEL users with virtual machines in Amazon (US East for starters) will have super-fast updates.  I’ve been hacking away in Fedora Infrastructure and the Fedora Cloud SIG to place a mirror in Amazon S3.  A little more testing, and I’ll flip the switch in MirrorManager, and all Amazon EC2 US East users will be automatically directed to the S3 mirror first.  Yea!  Once that looks good, if there’s enough demand, we can put mirrors in other regions too.

I hadn’t done a lot of uploading into S3 before.  It seems the common tool people use is s3cmd.  I like to think of ‘s3cmd sync’ as a replacement for rsync.  It’s not – but with a few patches, and your help, I think it can be made more usable.  So far I’ve patched in --exclude-from so that it doesn’t walk the entire local file system only to later prune and exclude files – a speedup of over 20x in the Fedora case.  I added a --delete-after option, because there’s no reason to delete files early in the case of S3 – you’ve got virtually unlimited storage.  And I added a --delay-updates option, to minimize the amount of time the S3 mirror yum repositories are in an inconsistent state (now down to a few seconds, and could be even better).  I’m waiting on upstream to accept/reject/modify my patches, but Fedora Infrastructure is using my enhancements in the meantime.
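
The idea behind pruning during the walk, rather than after it, can be sketched in a few lines of Python. This is only an illustration of the technique, not the s3cmd patch itself: the function names and the use of fnmatch-style patterns are assumptions made for the example.

```python
import fnmatch
import os


def is_excluded(path, patterns):
    """True if the relative path matches any exclude pattern."""
    return any(fnmatch.fnmatch(path, pattern) for pattern in patterns)


def walk_with_excludes(root, patterns):
    """Yield relative file paths under root, skipping excluded
    directories before descending into them (hypothetical helper)."""
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        prefix = "" if rel_dir == "." else rel_dir + "/"
        # Editing dirnames in place tells os.walk not to descend into
        # the removed directories, so excluded subtrees are never read.
        dirnames[:] = [
            d for d in dirnames if not is_excluded(prefix + d + "/", patterns)
        ]
        for name in filenames:
            if not is_excluded(prefix + name, patterns):
                yield prefix + name
```

With a pattern such as `debug/*`, the whole `debug/` subtree is dropped at the top level instead of being listed file by file and filtered afterwards, which is where the savings come from on a large tree.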

One feature I’d really like to see added is to honor hardlinks.  Fedora extensively uses hardlinks to cut down on the number of files, amount of storage, and time needed to upload content.  Some files in the Fedora tree have 6 hardlinks, and over three quarters of the files have at least one hardlink sibling.  Unfortunately, S3 doesn’t natively understand anything about hardlinks.  Lacking that support, I expect that S3 COPY commands would be the best way to go about duplicating the effect of hardlinks (reduced file upload time), even if we don’t get all the benefits.  However, I don’t have a lot more time available in the next few weeks to create such a patch myself – hence my lazyweb plea for help.  If this sounds like something you’d like to take on, please do!
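
For anyone picking this up, a possible starting point is to group local files by inode, upload one file per group, and issue a server-side copy for each remaining sibling. The sketch below only builds that plan; it makes no S3 calls, and the function name and return shape are assumptions for illustration, not part of s3cmd.

```python
import os
import stat
from collections import defaultdict


def plan_uploads(root):
    """Return (uploads, copies) for the tree under root.

    uploads: relative paths to upload once per inode.
    copies:  (source, destination) pairs that could be satisfied with
             an S3 server-side copy instead of a second upload.
    """
    by_inode = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, etc.
            # Files sharing (device, inode) are hardlinks to the same data.
            by_inode[(st.st_dev, st.st_ino)].append(os.path.relpath(path, root))

    uploads, copies = [], []
    for paths in by_inode.values():
        paths.sort()
        first, siblings = paths[0], paths[1:]
        uploads.append(first)
        copies.extend((first, sibling) for sibling in siblings)
    return sorted(uploads), sorted(copies)
```

The storage cost in S3 stays the same, since each copy is a full object, but only one member of each hardlink group has to cross the network.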

MirrorManager automatic local mirror selection

MirrorManager 1.3.2 (plus a hotfix) is now running on all Fedora Infrastructure application servers.  This brings one interesting new feature – automatic mirror detection.  How’s that, you say?

As you know, Internet routing uses BGP (Border Gateway Protocol), and Autonomous System Numbers (ASNs) to exchange IP prefixes (aa.bb.cc.dd/nn) and routing tables.  By grabbing a copy of the global BGP table a few times a day, MM can determine the ASN of an incoming client request.  Hosts in the MM database have grown two new fields: ASN and “ASN Clients?”.  MM then checks whether there is a mirror with the same ASN as the client, and if so offers it up earlier in the list.
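
Mapping a client address to an ASN amounts to a longest-prefix match against the BGP table. The following is a rough sketch of that lookup, not MirrorManager's implementation: the function names are invented, the table is a toy list rather than a real BGP dump, and the prefixes and ASNs come from the ranges reserved for documentation.

```python
import ipaddress


def build_table(entries):
    """Turn (prefix, asn) pairs into (network, asn) pairs."""
    return [(ipaddress.ip_network(prefix), asn) for prefix, asn in entries]


def lookup_asn(table, client_ip):
    """Return the ASN announcing the most specific prefix that
    contains client_ip, or None if no prefix matches."""
    ip = ipaddress.ip_address(client_ip)
    best = None
    for net, asn in table:
        if net.version == ip.version and ip in net:
            # Longest prefix wins, as in normal BGP route selection.
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, asn)
    return best[1] if best else None


# Toy table using documentation-only prefixes and ASNs.
table = build_table([
    ("198.51.100.0/24", 64496),
    ("198.51.100.128/25", 64497),
])

print(lookup_asn(table, "198.51.100.200"))
```

A linear scan is fine for a toy table; with a full BGP table a radix tree or similar structure would be the more realistic choice.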

I’ve pre-populated the MM database, for public servers only, with ASNs, and set “ASN Clients?” = True, meaning each such server will offer to serve all clients on the same ASN.  If you have a private server and wish to do likewise (remember, this doesn’t work for home systems or those behind NATs), you can fill in those fields yourself.  The Fedora wiki page on mirroring gives an example of how to look up your ASN.  I recommend this for all schools, research organizations, companies, and ISPs.

The mirrorlist lookup code now goes in preferential order:

  • same netblock
  • same ASN
  • both on Internet2
  • same country
  • same continent
  • global

For ISPs and schools, this should mean that most of the possible Fedora traffic will stay within your network – no transit costs.  And as netblocks change, MM will keep up with them automatically.
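
The preferential order above can be pictured as assigning each mirror a tier and sorting on it. This is a simplified sketch under stated assumptions, not the real mirrorlist code: the dictionary keys, function names, and sample data are all hypothetical, and it ignores any weighting or shuffling within a tier.

```python
import ipaddress


def tier(client, mirror):
    """Return 0..5 for netblock, ASN, Internet2, country, continent, global."""
    ip = ipaddress.ip_address(client["ip"])
    if any(ip in ipaddress.ip_network(n) for n in mirror.get("netblocks", [])):
        return 0  # same netblock
    if mirror.get("asn_clients") and mirror.get("asn") is not None \
            and mirror["asn"] == client.get("asn"):
        return 1  # same ASN, and the mirror opted in via "ASN Clients?"
    if client.get("internet2") and mirror.get("internet2"):
        return 2  # both on Internet2
    if mirror["country"] == client["country"]:
        return 3
    if mirror["continent"] == client["continent"]:
        return 4
    return 5  # global fallback


def order_mirrors(client, mirrors):
    """Sort mirrors by tier; sorted() is stable, so ties keep input order."""
    return sorted(mirrors, key=lambda m: tier(client, m))


# Hypothetical client and mirrors (documentation prefix and ASN).
client = {"ip": "198.51.100.7", "asn": 64496, "internet2": False,
          "country": "US", "continent": "NA"}
mirrors = [
    {"name": "global", "country": "DE", "continent": "EU"},
    {"name": "continent", "country": "CA", "continent": "NA"},
    {"name": "country", "country": "US", "continent": "NA"},
    {"name": "asn", "country": "US", "continent": "NA",
     "asn": 64496, "asn_clients": True},
    {"name": "netblock", "country": "US", "continent": "NA",
     "netblocks": ["198.51.100.0/24"]},
]

print([m["name"] for m in order_mirrors(client, mirrors)])
```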

To see this in action, try a query such as the following, and look for ‘Using ASN ####’ in the comment line of the result.

$ wget -O - 'http://mirrors.fedoraproject.org/mirrorlist?repo=fedora-11&arch=i386'

# Using preferred netblock Using ASN XXXX country = US country = MX country = CA

I hope you enjoy this new feature.