s3cmd 1.5.2 – major update coming to Fedora and EPEL

As the new upstream maintainer of the popular s3cmd program, I have spent several months collecting and making fixes across the codebase. In the last couple of weeks it finally became stable enough to warrant a formal release. Aside from bug fixes, the primary enhancement is support for the AWS Signature v4 method, which is required to create S3 buckets in the eu-central-1 (Frankfurt) region and is a more secure request-signing method usable in all AWS S3 regions.

Since s3cmd v1.5.0 was released, Python 2.7.9 (as shipped in Arch Linux) turned on SSL certificate validation by default. Unfortunately, that validation failed for hosts covered by SSL wildcard certificates (e.g. *.s3.amazonaws.com). Ubuntu 14.04 ships an intermediate flavor of this validation, which also broke s3cmd. A couple of quick fixes later, v1.5.2 is now published.
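For readers curious what Signature v4 changes, here is a minimal sketch of its signing-key derivation step as AWS documents it: the secret key is chained through HMAC-SHA256 with the date, the region, and the service, so every signature is bound to one region. That binding is why a v4 client must know the bucket's region (for example eu-central-1) before it can sign a request. This is an illustration, not s3cmd's implementation; it assumes the caller has already built the canonical request and string-to-sign (omitted here), and the function names are hypothetical.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """HMAC-SHA256 of a UTF-8 message under a raw byte key."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date_stamp: str,
                      region: str, service: str) -> bytes:
    """Derive a SigV4 signing key (hypothetical helper name).

    date_stamp is YYYYMMDD; region is e.g. "eu-central-1"; service is "s3".
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)      # binds the key to one region
    k_service = _hmac_sha256(k_region, service)  # ...and to one service
    return _hmac_sha256(k_service, "aws4_request")


def sigv4_signature(signing_key: bytes, string_to_sign: str) -> str:
    """Hex-encoded signature over an already-built string-to-sign."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

For example, `sigv4_signing_key(secret, "20150115", "eu-central-1", "s3")` yields a different key than the same call with `"us-east-1"`, so a request signed for the wrong region is rejected.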

I’ve updated the packages in Fedora 20, 21, and rawhide, and the EPEL 6 and 7 epel-testing repositories have them as well. If you use s3cmd on RHEL or CentOS, please upgrade to the package in epel-testing and give it karma. Bug reports are welcome at https://github.com/s3tools/s3cmd.

Dasein Cloud at OSCON

While working on Dell’s acquisition of Enstratius, one of the highlights for me was the work George Reese and his team have done on Dasein Cloud, their open source (Apache-licensed) cloud abstraction layer. I’m pleased that Enstratius joined Dell, and that the work on building Dasein, and on making it available for other uses, has only accelerated since.

Please see George’s blog post for his view of Dasein’s progress over just the last few months, and if you’re at OSCON, stop by the Dell booth or the Dasein session and talk to George.

The Open Source Soul of Dell Multi-Cloud Manager

s3cmd sync enhancements and call for help

Soon, Fedora and EPEL users with virtual machines in Amazon EC2 (US East, for starters) will get super-fast updates. I’ve been hacking away in Fedora Infrastructure and the Fedora Cloud SIG to place a mirror in Amazon S3. After a little more testing, I’ll flip the switch in MirrorManager, and all Amazon EC2 US East users will automatically be directed to the S3 mirror first. Yay! Once that looks good, if there’s enough demand, we can put mirrors in other regions too.
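MirrorManager can prefer mirrors based on where a client's request comes from. As a rough, hypothetical illustration of that idea (not MirrorManager's actual code), the sketch below moves an S3 mirror to the front of the list when the client's address falls inside a configured netblock. The netblock is a documentation test range and the mirror URL is made up; a real deployment would use Amazon's published address ranges. It also assumes the S3 mirror should be offered only to in-region clients.

```python
import ipaddress
from typing import List, Sequence

# Hypothetical stand-in for "EC2 US East" address space (a TEST-NET range).
S3_MIRROR_NETBLOCKS = [ipaddress.ip_network("198.51.100.0/24")]
# Hypothetical S3 mirror URL.
S3_MIRROR_URL = "http://s3-mirror.example.com/pub/fedora/linux/"


def order_mirrors(client_ip: str, mirrors: Sequence[str]) -> List[str]:
    """Return the mirror list, with the S3 mirror first for in-netblock clients.

    Clients outside the netblocks never see the S3 mirror (an assumption,
    e.g. to keep out-of-region traffic off it).
    """
    addr = ipaddress.ip_address(client_ip)
    others = [m for m in mirrors if m != S3_MIRROR_URL]
    if any(addr in net for net in S3_MIRROR_NETBLOCKS):
        return [S3_MIRROR_URL] + others
    return others
```

Keeping the netblock list as data rather than code means adding another region later is just a matter of adding its ranges and its mirror.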

I hadn’t done much uploading into S3 before. It seems the common tool people use is s3cmd. I like to think of ‘s3cmd sync’ as a replacement for rsync. It’s not, but with a few patches, and your help, I think it can be made more usable. So far I’ve patched in --exclude-from so that it doesn’t walk the entire local file system only to prune and exclude files later, a speedup of over 20x in the Fedora case. I added a --delete-after option, because there’s no reason to delete files early in the case of S3: you’ve got virtually unlimited storage. And I added a --delay-updates option to minimize the time the S3 mirror’s yum repositories spend in an inconsistent state (now down to a few seconds, and it could be even better). I’m waiting on upstream to accept, reject, or modify my patches, but Fedora Infrastructure is using my enhancements in the meantime.
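These ideas can be sketched in a few lines of Python. `walk_with_excludes` prunes excluded directories during the walk by editing `os.walk`'s `dirnames` list in place, so excluded subtrees are never visited at all. `order_sync_ops` orders the work so new files go up first, changed files (such as repository metadata) are replaced together near the end, and deletions come last. Both function names and the checksum-map inputs are hypothetical; this illustrates the approach under those assumptions and is not the code in the actual patches.

```python
import fnmatch
import os
from typing import Dict, Iterable, List, Tuple


def walk_with_excludes(root: str, patterns: Iterable[str]) -> List[str]:
    """Relative paths of files under root, skipping excluded dirs and files."""
    pats = list(patterns)

    def excluded(rel: str) -> bool:
        return any(fnmatch.fnmatch(rel, p) for p in pats)

    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        rel_dir = "" if rel_dir == "." else rel_dir
        # Assigning to dirnames[:] stops os.walk from descending into
        # excluded directories, instead of listing them and pruning later.
        dirnames[:] = [d for d in dirnames
                       if not excluded(os.path.join(rel_dir, d))]
        for name in filenames:
            rel = os.path.join(rel_dir, name)
            if not excluded(rel):
                found.append(rel)
    return sorted(found)


def order_sync_ops(local: Dict[str, str],
                   remote: Dict[str, str]) -> List[Tuple[str, str]]:
    """Order sync work: uploads of new files, then updates, then deletes.

    `local` and `remote` map relative path -> content checksum.
    """
    new = sorted(p for p in local if p not in remote)
    changed = sorted(p for p in local if p in remote and local[p] != remote[p])
    gone = sorted(p for p in remote if p not in local)
    return ([("upload", p) for p in new]         # invisible until referenced
            + [("update", p) for p in changed]   # replaced together, late
            + [("delete", p) for p in gone])     # old files vanish last
```

The ordering matters for a yum repository: new packages can sit unreferenced until the metadata that points at them is replaced, and old packages stay available until no metadata refers to them.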

One feature I’d really like to see added is honoring hardlinks. Fedora uses hardlinks extensively to cut down on the number of files, the amount of storage, and the time needed to upload content. Some files in the Fedora tree have six hardlinks, and over three quarters of the files have at least one hardlink sibling. Unfortunately, S3 doesn’t natively understand hardlinks. Lacking that support, I expect S3 COPY requests would be the best way to duplicate the effect of hardlinks (reduced upload time), even if we don’t get all the benefits. However, I don’t have much time available in the next few weeks to create such a patch myself, hence my lazyweb plea for help. If this sounds like something you’d like to take on, please do!
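A starting point for anyone taking this on might look like the following sketch: group regular files by `(st_dev, st_ino)`, upload one path per inode, and plan a server-side COPY for each sibling. The function name and the `(action, path, source)` tuple format are hypothetical, and actually issuing the uploads and COPY requests is left out. Each copied S3 object is still stored separately, so this saves upload time rather than storage.

```python
import os
import stat
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def plan_hardlink_copies(root: str) -> List[Tuple[str, str, Optional[str]]]:
    """Plan one upload per inode and a COPY for every other hardlinked path."""
    by_inode: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks and special files
            by_inode[(st.st_dev, st.st_ino)].append(os.path.relpath(full, root))

    plan: List[Tuple[str, str, Optional[str]]] = []
    for paths in by_inode.values():
        paths.sort()
        first = paths[0]
        plan.append(("upload", first, None))
        for sibling in paths[1:]:
            plan.append(("copy", sibling, first))  # S3 COPY from `first`
    # Every upload must finish before a COPY that reads from it.
    plan.sort(key=lambda op: (op[0] != "upload", op[1]))
    return plan
```

With a tree of six-way hardlinks, this turns six uploads of the same bytes into one upload plus five COPY requests that never leave Amazon's network.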