MirrorManager’s primary aim is to make sure end users get directed to the “best” mirror for them. “Best” is defined in terms of network scopes, based on the concept that a mirror that is network-wise “close” to you is going to provide you a better download experience than a mirror that is “far” from you.
In a pure DNS-based round robin mirror system, you would expect all requests to be sent to a “global” mirror, with no preference for where you are on the network. In a country-based DNS round robin system, perhaps where the user has specified what country they are in, or perhaps it was automatically determined, you’d expect most hits in countries where you know you have mirrors.
MirrorManager’s scopes match clients to mirrors in this order: on the same network block; on the same Autonomous System Number; jointly on Internet2 or one of its related regional high-speed research and education networks in the same country; then, falling back to GeoIP, in the same country; and finally on the same continent. Only in the rarest of cases does the GeoIP lookup fail, leaving us with no idea where you are, and you get sent to some random mirror somewhere.
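To make that fallback order concrete, here is a minimal sketch of a scope cascade. It is not MirrorManager's actual code: the `Mirror` and `Client` classes, the `pick_scope` function, and the scope names are all hypothetical, and a real lookup would take the ASN, Internet2 and GeoIP answers from routing tables and a GeoIP database rather than from fields filled in by hand.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Mirror:
    # Hypothetical record; field names are illustrative only.
    name: str
    netblocks: List[str] = field(default_factory=list)  # CIDR strings
    asns: Set[int] = field(default_factory=set)
    internet2: bool = False
    country: str = ""
    continent: str = ""


@dataclass
class Client:
    # Hypothetical record; in practice these come from lookups on the IP.
    ip: str
    asn: Optional[int] = None
    internet2: bool = False
    country: Optional[str] = None
    continent: Optional[str] = None


def pick_scope(client, mirrors):
    """Return (scope_name, mirrors) for the narrowest scope that matches."""
    addr = ipaddress.ip_address(client.ip)
    checks = [
        ("netblock", lambda m: any(addr in ipaddress.ip_network(n)
                                   for n in m.netblocks)),
        ("asn", lambda m: client.asn is not None and client.asn in m.asns),
        ("internet2", lambda m: client.internet2 and m.internet2
                                and m.country == client.country),
        ("country", lambda m: client.country is not None
                              and m.country == client.country),
        ("continent", lambda m: client.continent is not None
                                and m.continent == client.continent),
    ]
    for scope, matches in checks:
        hits = [m for m in mirrors if matches(m)]
        if hits:
            return scope, hits
    # Nothing matched, not even GeoIP: any mirror will do.
    return "global", list(mirrors)
```

The first scope that yields at least one mirror wins, so a client inside a mirror's own netblock never reaches the GeoIP checks at all.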
But how well does this work in practice? MM 1.4 added logging, so we can create statistics on how often we get a hit for each scope. Raw statistics:
(The per-scope table did not survive here; only its final row label, “Global (any mirror)”, remains.)
In the case of MirrorManager, we take it three steps further than pure DNS round robin or GeoIP lookups. By using Internet2 routing tables, ASN routing tables, and letting mirror admins specify their Peer ASNs and their own netblocks, we are able to, in nearly 22% of all requests, keep the client traffic completely local to the organization or upstream ISP, and when adding in Internet2 lookups, a whopping 30% of client traffic never hits the commodity Internet at all. In 88% of all cases, you’re sent to a mirror within your own country – never having to deal with congested inter-country links.
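The percentages above are cumulative: each one counts every request matched at that scope or at a narrower one. A rough sketch of that arithmetic follows, assuming the logs can be reduced to one scope name per request. The scope names and the `cumulative_share` helper are made up for illustration, and this is not the script that produced the numbers in this post.

```python
from collections import Counter
from itertools import accumulate

# Narrowest scope first; these names are illustrative, not MM's log format.
SCOPE_ORDER = ["netblock", "asn", "internet2", "country", "continent", "global"]


def cumulative_share(scope_hits):
    """Map each scope to the percentage of requests matched at that scope
    or any narrower one. Assumes every entry is one of SCOPE_ORDER."""
    counts = Counter(scope_hits)
    total = sum(counts.values())
    if total == 0:
        return {}
    running = accumulate(counts[s] for s in SCOPE_ORDER)
    return {s: 100.0 * c / total for s, c in zip(SCOPE_ORDER, running)}
```

Read this way, the "asn" entry corresponds to traffic kept local to the organization or upstream ISP, the "internet2" entry to traffic that never touches the commodity Internet, and the "country" entry to traffic that stays in-country.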
After nearly 3 years in on-again/off-again development, MirrorManager 1.4 is now live in the Fedora Infrastructure, happily serving mirrorlists to yum, and directing Fedora users to their favorite ISOs – just in time for the Fedora 19 freeze.
Kudos go out to Kevin Fenzi, Seth Vidal, Stephen Smoogen, Toshio Kuratomi, Pierre-Yves Chibon, Patrick Uiterwijk, Adrian Reber, and Johan Cwiklinski for their assistance in making this happen. Special thanks to Seth for moving the mirrorlist-serving processes to their own servers where they can’t harm other FI applications, and to Smooge, Kevin and Patrick, who gave up a lot of their Father’s Day weekend (both days and nights) to help find and fix latent bugs uncovered in production.
What does this bring the average Fedora user? Not a lot… More stability – fewer failures with yum retrieving the mirror lists, not that there were many, but it was nonzero. A list of public mirrors where the versions are sorted in numerical order.
What does this bring to a Fedora mirror administrator? A few new tricks:
- Mirror admins have been able to specify their own Autonomous System Number for several years. Clients on the same AS get directed to that mirror. MM 1.4 adds the ability for mirror admins to request additional “peer ASNs” – particularly helpful for mirrors located at a peering point (say, Hawaii), where listing lots of netblocks instead is unwieldy. As this has the potential to be slightly dangerous (no, you can’t request ALL ASNs be sent your way), ask a Fedora sysadmin if you want to use this new feature – we can help you.
- Previously, multiple mirrors claiming the same netblock, or overlapping netblocks, were returned to clients in random order. Now they are returned in ascending netblock size order. This lets both an organization with a private mirror and its upstream ISP run mirrors: most requests go to the private mirror first, falling back to the ISP’s mirror. This should save some bandwidth for the organization.
- If you provide rsync URLs, you’ll see reduced load from the MM crawler, as it will now use rsync to retrieve your content listing rather than a ton of HTTP or FTP requests.
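The netblock ordering described above can be sketched with Python's standard `ipaddress` module. This is only an illustration of "ascending netblock size order", not the mirrorlist server's code; the `order_by_netblock` function and the (mirror name, CIDR) input shape are assumptions.

```python
import ipaddress


def order_by_netblock(client_ip, claims):
    """claims is an iterable of (mirror_name, cidr) pairs.

    Return the names of mirrors whose netblock contains the client,
    smallest (most specific) netblock first.
    """
    addr = ipaddress.ip_address(client_ip)
    matching = []
    for name, cidr in claims:
        net = ipaddress.ip_network(cidr)
        if addr in net:
            matching.append((net.num_addresses, name))
    matching.sort()  # fewest addresses first
    return [name for _, name in matching]
```

With an ISP claiming `10.0.0.0/8` and an organization claiming `10.1.2.0/24`, a client at `10.1.2.3` gets the organization's mirror first and the ISP's second, while a client elsewhere in the /8 sees only the ISP's mirror.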
What does this bring Fedora Infrastructure (or anyone else running MirrorManager)?
- Reduced memory usage in the mirrorlist servers. Given how bad python is at memory management on x86_64 (e.g. reading in a 12MB pickle file blows out memory usage from 4MB to 120MB), this is critical. This directly impacts the number of simultaneous users that can be served, the response latency, and the CPU overhead too – it’s a win-win-win-win.
- An improved admin interface, replacing hand-coded pages that looked like they could have been served by BBS software on my Commodore 64 with something modern, more usable, and less error prone.
- Code specifically intended for use by Debian/Ubuntu and CentOS communities, should they decide to use MM in the future.
- A new method to upgrade database schemas – saner than SQLObject’s method. This should make me less scared to make schema changes in the future to support new features. (yes, we’re still using SQLObject – if it’s not completely broken, don’t fix it…)
- Map generation moved to a separate subpackage, to avoid the dependency on 165MB of python-basemap and python-basemap-data packages on all servers.
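If you want to see the pickle memory blow-up mentioned above for yourself, one rough way is to compare the size of the serialized data with what Python allocates while loading it. The sketch below uses the standard `tracemalloc` module on Python 3, which tracks Python-level allocations rather than the process's resident memory, so it will not reproduce the exact figures quoted here. The `load_cost` helper is hypothetical and not part of MM.

```python
import pickle
import tracemalloc


def load_cost(blob):
    """Return (pickle_bytes, bytes_still_allocated, peak_bytes) for
    unpickling blob, as measured by tracemalloc."""
    tracemalloc.start()
    obj = pickle.loads(blob)  # keep a reference so the data stays allocated
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del obj
    return len(blob), current, peak
```

Comparing the first and second numbers gives a feel for how much larger the in-memory objects are than the file they were loaded from.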
MM 1.4 is a good step forward, and hopefully I’ve laid the groundwork to make it easier to improve in the future. I’m excited that more of the Fedora Infrastructure team has learned (the hard way) the internals of MM, so I’ll have additional help going forward too.
I have the pleasure of moderating the Fedora Project Board Town Hall today, 1900 UTC, having served on the board for five years previously. Held on IRC, these Town Halls give project members a chance to ask questions directly of the five Board candidates, so that you can make a more informed decision when casting your vote. I hope you can join us.