s3cmd sync enhancements and call for help

Coming soon, Fedora and EPEL users with virtual machines in Amazon (US East for starters) will have super-fast updates.  I’ve been hacking away in Fedora Infrastructure and the Fedora Cloud SIG to place a mirror in Amazon S3.  A little more testing, and I’ll flip the switch in MirrorManager, and all Amazon EC2 US East users will be automatically directed to the S3 mirror first.  Yea!  Once that looks good, if there’s enough demand, we can put mirrors in other regions too.

I hadn’t done a lot of uploading into S3 before.  It seems the common tool people use is s3cmd.  I like to think of ‘s3cmd sync’ as a replacement for rsync.  It’s not – but with a few patches, and your help, I think it can be made more usable.  So far I’ve patched in –exclude-from so that it doesn’t walk the entire local file system only to later prune and exclude files – a speedup of over 20x in the Fedora case.  I added a –delete-after option, because there’s no reason to delete files early in the case of S3 – you’ve got virtually unlimited storage.  And I added a –delay-updates option, to minimize the amount of time the S3 mirror yum repositories are in an inconsistent state (now down to a few seconds, and could be even better).  I’m waiting on upstream to accept/reject/modify my patches, but Fedora Infrastructure is using my enhancements in the meantime.

One feature I’d really like to see added is to honor hardlinks.  Fedora extensively uses hardlinks to cut down on the number of files, amount of storage, and time needed to upload content.  Some files in the Fedora tree have 6 hardlinks, and over three quarters of the files have at least one hardlink sibling.  Unfortunately, S3 doesn’t natively understand anything about hardlinks.  Lacking that support, I expect that S3 COPY commands would be the best way to go about duplicating the effect of hardlinks (reduced file upload time), even if we don’t get all the benefits.  However, I don’t have a lot more time available in the next few weeks to create such a patch myself – hence my lazyweb plea for help.  If this sounds like something you’d like to take on, please do!