Saving Money with Amazon S3 and BitTorrent

I’m not the biggest fan of Amazon lately, but if you happen to be using S3 for hosting big downloads, or if you want to permanently publish a file using BitTorrent without having to maintain your own seed for the rest of time, S3 has a little-used feature that could save you a lot of trouble, and potentially money.

Seeding Torrents from S3

It turns out that S3 will publish and seed a torrent for any publicly available file stored in S3. This is pretty easy to set up:

  1. Upload a file to S3 and make it public
  2. Visit the file’s URL with ?torrent appended
  3. After a delay, you’ll get a .torrent for that file; save it to your computer
  4. Amazon will seed that torrent for as long as the file remains public

For example, if your uploaded file were available at http://bucketname.s3.amazonaws.com/my.mp3, the URL to get a .torrent for it would be http://bucketname.s3.amazonaws.com/my.mp3?torrent.
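If you have a lot of files to publish, building those URLs by hand gets old quickly. Here is a minimal Python sketch of the same rule; the function name torrent_url is my own invention for illustration, not anything Amazon provides.

```python
from urllib.parse import urlsplit, urlunsplit


def torrent_url(object_url):
    """Return the URL that asks S3 for a .torrent of a public object.

    torrent_url is a hypothetical helper name; all it does is append
    "?torrent" to a plain object URL.
    """
    parts = urlsplit(object_url)
    if parts.query:
        # Refuse URLs that already carry a query string rather than guess
        # how the two should be combined.
        raise ValueError("expected a plain object URL without a query string")
    return urlunsplit(parts._replace(query="torrent"))


print(torrent_url("http://bucketname.s3.amazonaws.com/my.mp3"))
# prints http://bucketname.s3.amazonaws.com/my.mp3?torrent
```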

S3 doesn’t generate the torrent until the first time it’s requested, so that first request may take a while to return the .torrent if the original file is large.
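Because of that delay, a script that grabs the .torrent should probably be patient. The following is a rough sketch using only the Python standard library; the function name fetch_torrent, the five-minute timeout, the retry count, and the pause lengths are all assumptions of mine rather than values Amazon documents.

```python
import time
import urllib.request


def fetch_torrent(url, dest_path, attempts=3, timeout=300,
                  opener=urllib.request.urlopen, sleep=time.sleep):
    """Download a .torrent from url to dest_path; return the byte count.

    The defaults are guesses: a long timeout because S3 builds the torrent
    on first request, plus a few retries in case that still is not enough.
    opener and sleep are parameters only so they can be swapped out in tests.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            with opener(url, timeout=timeout) as response:
                data = response.read()
            break
        except OSError as exc:  # URLError and TimeoutError both derive from OSError
            last_error = exc
            if attempt < attempts - 1:
                sleep(5 * (attempt + 1))  # wait a little longer each time
    else:
        raise RuntimeError(f"no .torrent after {attempts} attempts") from last_error

    with open(dest_path, "wb") as handle:
        handle.write(data)
    return len(data)


# Example call (not executed here, since it needs a real public object):
# fetch_torrent("http://bucketname.s3.amazonaws.com/my.mp3?torrent", "my.mp3.torrent")
```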

In my experience, when demand outstrips supply, Amazon will actually temporarily spin up additional seeds in order to keep download speeds up (each individual Amazon seed seems to max out around 72kbps). In terms of billing, you’re charged (at the normal S3 rates) for all data downloaded via the Amazon seeds, but peer-to-peer transfers and downloads from other seeds would obviously be free for you.

Technical details of working with BitTorrent and the S3 REST API can be found in Amazon’s developer documentation.

Saving Money

There are basically two scenarios (that I can think of) in which seeding from S3 has the potential to save money:

  1. You’re serving popular downloads from S3 and start using BitTorrent (which reduces the amount of data served from S3 for a given number of downloads)
  2. You’re seeding torrents from an EC2 instance (or other hosted server), where bandwidth costs are typically higher than from S3

In the first case, potential savings are going to be largely proportional to how busy the torrent is. If you only ever have one person downloading at a time, costs will be pretty much the same as if people were downloading via HTTP directly.
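To put rough numbers on the first case, here is a back-of-the-envelope sketch. The price per gigabyte, the download count, and the share of bytes served by Amazon's seeds are all made-up figures for illustration; plug in your own bill and your own measurements.

```python
def s3_transfer_cost(downloads, file_gb, price_per_gb, share_from_amazon=1.0):
    """Estimate the S3 transfer bill for a number of downloads of one file.

    share_from_amazon is the fraction of all downloaded bytes that came from
    Amazon's seeds (1.0 means everything, as with plain HTTP downloads).
    """
    if not 0.0 <= share_from_amazon <= 1.0:
        raise ValueError("share_from_amazon must be between 0 and 1")
    return downloads * file_gb * share_from_amazon * price_per_gb


PRICE_PER_GB = 0.10  # hypothetical rate in dollars, not a quoted Amazon price

http_only = s3_transfer_cost(1000, 0.5, PRICE_PER_GB)
busy_torrent = s3_transfer_cost(1000, 0.5, PRICE_PER_GB, share_from_amazon=0.3)
quiet_torrent = s3_transfer_cost(1000, 0.5, PRICE_PER_GB, share_from_amazon=1.0)

print(f"HTTP only:     ${http_only:.2f}")
print(f"Busy torrent:  ${busy_torrent:.2f}")
print(f"Quiet torrent: ${quiet_torrent:.2f}")
```

The quiet torrent comes out the same as plain HTTP because, with only one downloader at a time, there are no peers to share the load and every byte comes from an Amazon seed.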

In the second case, any difference is going to depend on the exact pricing structure you’re dealing with — for example, the first gigabyte downloaded from EC2 in a billing cycle is free, so if your EC2 seeds never serve significantly more than that, seeding from S3 is actually the more expensive option.
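The second case can be sketched the same way. Only the one free gigabyte on EC2 comes from the paragraph above; both per-gigabyte rates below are hypothetical placeholders, and the function names are mine.

```python
def s3_seed_cost(gb_served, s3_price_per_gb):
    """Cost of serving gb_served from Amazon's S3 seeds."""
    return gb_served * s3_price_per_gb


def ec2_seed_cost(gb_served, ec2_price_per_gb, free_gb=1.0):
    """Cost of serving gb_served from an EC2 seed with a free allowance."""
    billable_gb = max(0.0, gb_served - free_gb)
    return billable_gb * ec2_price_per_gb


def break_even_gb(s3_price_per_gb, ec2_price_per_gb, free_gb=1.0):
    """Monthly volume above which S3 becomes the cheaper seed.

    Solves gb * s3_price = (gb - free_gb) * ec2_price for gb. Returns None
    when EC2 is not the pricier option per gigabyte, since S3 never wins then.
    """
    if ec2_price_per_gb <= s3_price_per_gb:
        return None
    return free_gb * ec2_price_per_gb / (ec2_price_per_gb - s3_price_per_gb)


S3_PRICE = 0.10   # hypothetical dollars per GB
EC2_PRICE = 0.17  # hypothetical dollars per GB

for gb in (0.8, 50.0):
    s3 = s3_seed_cost(gb, S3_PRICE)
    ec2 = ec2_seed_cost(gb, EC2_PRICE)
    print(f"{gb:5.1f} GB  S3 ${s3:.2f}  EC2 ${ec2:.2f}")

print(f"Break-even at about {break_even_gb(S3_PRICE, EC2_PRICE):.2f} GB")
```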

In both cases, savings aren’t guaranteed; it’s important to keep an eye on costs and run the numbers. If you aren’t measuring, you’re losing.

To sum up:

Advantages

  1. Setting up torrents for files you already have in S3 is extremely simple
  2. You don’t have to maintain a seed or a tracker yourself
  3. Versus direct downloads from S3, you’re only billed for bytes downloaded from Amazon’s seeds

Limitations

  1. S3 won’t generate torrents for:
    • multiple files at once; multi-file torrents aren’t supported
    • files larger than 5GB
  2. You’re stuck using Amazon’s tracker if you want Amazon’s seeds to work for you. (On the other hand, it’s not that difficult to edit a .torrent to add extra trackers.)
  3. In some lower-usage situations it’s possible that — compared to a seed running on EC2 — S3 bandwidth costs would actually be more expensive.
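
On the second limitation: adding extra trackers to a .torrent can be done in a few lines, since the file is just bencoded data. The sketch below is one possible way to do it in Python, not a definitive tool. It assumes the common announce-list extension (a list of tracker tiers) is what your clients read, and that the original file uses sorted dictionary keys so the info section, and therefore the torrent's identity, is re-encoded byte for byte. The function names and the example tracker URLs are made up.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value at pos; return (value, next_position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                       # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                       # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                       # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            result[key], pos = bdecode(data, pos)
        return result, pos + 1
    colon = data.index(b":", pos)          # byte string: <length>:<bytes>
    start = colon + 1
    length = int(data[pos:colon])
    return data[start:start + length], start + length


def bencode(value):
    """Encode ints, bytes, lists and dicts back into bencoded bytes."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        pairs = sorted(value.items(), key=lambda kv: kv[0])  # keys must be sorted
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def add_trackers(torrent_bytes, extra_urls):
    """Return new .torrent bytes with extra_urls appended as tracker tiers.

    The original tracker stays in "announce" and in the first tier, so the
    Amazon seeds can still be found.
    """
    meta, _ = bdecode(torrent_bytes)
    tiers = meta.get(b"announce-list")
    if tiers is None:
        tiers = [[meta[b"announce"]]] if b"announce" in meta else []
    known = {url for tier in tiers for url in tier}
    for url in extra_urls:
        encoded = url.encode("utf-8")
        if encoded not in known:
            tiers.append([encoded])
            known.add(encoded)
    meta[b"announce-list"] = tiers
    return bencode(meta)


# Tiny hand-made torrent with a made-up tracker, just to show the round trip.
sample = b"d8:announce26:http://tracker.example/ann4:infod6:lengthi5e4:name6:my.mp3ee"
updated = add_trackers(sample, ["http://other.example/announce"])
print(bdecode(updated)[0][b"announce-list"])
```

In practice you would read the bytes from the .torrent that S3 gave you, pass them through add_trackers, and write the result to a new file.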