Archive for the ‘Amazon S3’ Category

S3 in Business: 4 – My Tunesafe

17 August 2006

[Complete series]

Last time I described tunesafe.com, a beautiful, smooth, elegant service with just one drawback: if the company behind it went bankrupt or changed its strategy, your service would end. With backups, this matters: you don’t want your backup data’s survival to depend on someone else’s financial stability.

So let’s take a different approach. Open an Amazon S3 account of your own. Amazon make this easy. They let anyone do it. They don’t charge you for opening the account and, month by month, they charge your credit card with just the cost of what you have used: data storage, and upload and download.

There will still be a program called Tunesafe on your PC, but this time it will be rather cleverer. It will maintain a duplicate copy of your iTunes music library directly on Amazon’s S3 space without needing a tunesafe.com server to help it. It will communicate directly with S3 for all its data storage and retrieval. If our tunesafe.com business disappears tomorrow, your Tunesafe program will carry on working just the same. Your contract is with Amazon: as long as you pay your bills, Amazon S3 will store your data.

There is just one snag left. We have arranged for you to pay for your data storage yourself – that’s no problem – but who pays us for the Tunesafe program that you’re using, and how? If the thing doesn’t get paid for somehow, then it won’t get written and you won’t be able to use it.

Next time, I’ll go into the ways in which the writer of Tunesafe might be able to make money out of it. This will include a suggested modification to the current S3 service that could lead to an explosion of new businesses built on S3.

S3 in Business: 3 – tunesafe.com

16 August 2006

[Complete series]

Last time I introduced Tunesafe, a program that backs up your iTunes music library continuously, invisibly, automatically. That’s how the user sees it, but how do we actually provide the Tunesafe service?

First of all, let’s see how it would work if we did it all by setting up a server on the Internet at tunesafe.com.

Technical

On the Internet there is a site called tunesafe.com. On that site there is a space that is designated as yours and is accessible only to you. That space is intended to hold an exact image of your entire iTunes music library.

On your PC there is a program called Tunesafe. The Tunesafe program lurks quietly on your computer. Whenever there isn’t much going on, it goes into action automatically. You could configure it to work at night, or only in the day (if it’s a home computer and you’re at work), or whenever the computer has been idle for a while.
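
As a rough illustration of the "when may I run?" decision, here is a minimal sketch. The names (`Schedule`, `may_run`) and the rule that either condition is enough are assumptions made for the example, not a description of any real Tunesafe code.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Schedule:
    """Hypothetical user settings: a daily time window and an idle threshold."""
    window_start: time
    window_end: time
    min_idle_seconds: int


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls inside the window, which may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def may_run(now: time, idle_seconds: int, schedule: Schedule) -> bool:
    """Run inside the configured window, or whenever the PC has been idle long enough."""
    if in_window(now, schedule.window_start, schedule.window_end):
        return True
    return idle_seconds >= schedule.min_idle_seconds
```

A night-time schedule would be `Schedule(time(23, 0), time(6, 0), 600)`; a daytime one for a home computer simply uses a window such as 09:00 to 17:00.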

The Tunesafe program makes a picture of your iTunes world: what music tracks there are, what they contain, and what their names are. It asks tunesafe.com for a similar picture, compares the two, and if any changes are needed to bring tunesafe.com up to date (such as uploading new music tracks), it makes them.
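
The comparison of the two "pictures" can be sketched as follows. This is only one way of doing it, under the assumption that each picture is a mapping from a track's file name to a digest of its contents; `build_manifest` and `plan_sync` are names invented for the example.

```python
import hashlib
from pathlib import Path


def build_manifest(library_root: Path) -> dict[str, str]:
    """Map each file's path (relative to the library root) to a SHA-256 digest."""
    manifest = {}
    for path in sorted(library_root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(library_root).as_posix()] = digest
    return manifest


def plan_sync(local: dict[str, str], remote: dict[str, str]) -> dict[str, list[str]]:
    """List the changes that would make the remote picture match the local one."""
    # Upload anything that is new locally or whose contents differ.
    upload = [name for name, digest in local.items() if remote.get(name) != digest]
    # Delete anything the remote side still holds that no longer exists locally.
    delete = [name for name in remote if name not in local]
    return {"upload": sorted(upload), "delete": sorted(delete)}
```

In this simple form a renamed track shows up as one deletion plus one upload; a cleverer version could match digests to spot renames and avoid sending the music again.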

Uploads can take a long time (we’ll cover this in a later posting) so Tunesafe is happy to be interrupted whenever you need the computer for something else: the next time it wakes up, it doesn’t begin again from the beginning but continues from where it left off.
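
One simple way to get this "continue from where it left off" behaviour is to record progress after every file. The sketch below assumes a small checkpoint file on the PC and caller-supplied `upload_one` and `should_stop` functions; all of these names are hypothetical.

```python
import json
from pathlib import Path


def load_done(checkpoint: Path) -> set[str]:
    """Return the names already uploaded, or an empty set on the first run."""
    if checkpoint.exists():
        return set(json.loads(checkpoint.read_text()))
    return set()


def run_session(pending, checkpoint: Path, upload_one, should_stop) -> list[str]:
    """Upload pending items until finished or interrupted; return this session's uploads."""
    done = load_done(checkpoint)
    uploaded_now = []
    for name in pending:
        if name in done:
            continue  # already sent in an earlier session
        if should_stop():
            break  # the user needs the computer; carry on next time
        upload_one(name)
        done.add(name)
        uploaded_now.append(name)
        # Save progress after every file so an interruption costs at most one upload.
        checkpoint.write_text(json.dumps(sorted(done)))
    return uploaded_now
```

This works at the level of whole files. Resuming part-way through a single large file would need more machinery and is not attempted here.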

Once everything has been backed up, Tunesafe continues to wake up at intervals, to see if you’ve made any changes to your iTunes music library: if you have, it makes corresponding changes to your space on tunesafe.com, so that the backup is always up to date.

It looks to you as if everything is being handled by tunesafe.com, but behind the scenes, all the music is stored on Amazon S3, so the tunesafe.com server that we set up won’t need much in the way of data storage.

In the same way, although it looks to you as if your music is being uploaded to tunesafe.com, it doesn’t really pass through the tunesafe.com server at all but directly from your PC to the S3 servers. Thus tunesafe.com doesn’t need to have much data transmission capacity either.

Commercial

The tunesafe.com site has to pay for its own server and it also has to pay Amazon the fees for S3 data transmission and storage.

How does tunesafe.com get the money? S3 charges for both data transmission and data storage. Because of the way iPods are normally used, it may be possible to charge a flat subscription rate or a graduated one according to capacity (so much for a 20GB iPod, so much for a 40GB iPod). After the initial 20GB (or 40GB) of upload, our data transmission costs are likely to be practically zero.

Otherwise we could charge for usage. Since the server knows what has been transmitted for any given user, and what is being stored for that user, it could charge the user’s credit card each month (just as Amazon are charging us) or it could take a deposit, eat away at it, and when it was about to run out it could ask for an extra deposit.
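
The usage-based option is simple arithmetic, sketched below. The rates are placeholders chosen for the example, not Amazon's actual prices, and the function names are invented; amounts are held in whole cents to avoid rounding surprises.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical prices in cents; substitute whatever the provider really charges."""
    storage_cents_per_gb_month: int
    transfer_cents_per_gb: int


def monthly_charge(stored_gb: float, transferred_gb: float, rates: Rates) -> int:
    """Cost in cents of one month's storage plus that month's data transfer."""
    cost = (stored_gb * rates.storage_cents_per_gb_month
            + transferred_gb * rates.transfer_cents_per_gb)
    return round(cost)


def draw_down(deposit_cents: int, charge_cents: int, top_up_threshold: int):
    """Eat away at the deposit; report whether it is time to ask for more."""
    remaining = deposit_cents - charge_cents
    return remaining, remaining < top_up_threshold
```

With placeholder rates of 15 cents per GB-month and 20 cents per GB transferred, a user who uploads 20GB in the first month costs 700 cents that month and 300 cents in each later month in which nothing changes, which is why a flat subscription looks feasible once the initial upload is over.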

Technical risks

With backup, it pays to think in the long term and it pays to be pessimistic.

Anyone who has been in computing for any length of time has come across bitrot – the decay of digital data.

Physical bitrot is when you want to retrieve important data from an 8″ floppy disk five years after you threw away your last 8″-disk PC (for a high-profile public example of this problem, see this article about the BBC Domesday Project). Amazon S3 eliminates physical bitrot because it eliminates our contact with physical media.

Logical bitrot is when the bits and bytes are readable but incomprehensible. NASA nearly lost the archives of their Viking project to this: as this article describes, they had CDROMs full of data from the Mars landing but no-one could remember what the data meant. We take great care that Cardbox should not suffer from this: we have always ensured continuity of data formats and can still retrieve users’ data from 8-bit Cardbox databases of the early 1980s. But we have been hit by logical bitrot ourselves. For almost a year we used a commercial backup program to create compressed backups on CDROMs, and then the manufacturer released a new version for Windows XP. The new version couldn’t read backups created in the old format. We abandoned that program but the year 2003 remains a thin one in our archives.

Protecting against logical bitrot is Tunesafe’s responsibility and we’ll take it seriously when we set up the system. The aim will be to allow retrieval of backup in some form even if Intel-based PCs become obsolete and the Tunesafe program dies with them.

Commercial risks

Bitrot isn’t the only risk. You are storing your valuable iTunes data in storage space that isn’t yours and that you don’t pay for directly.

  • What if the people behind tunesafe.com get bored and decide to do something else?
  • What if they go bust?

The trouble is, the commercial relationship is between tunesafe.com and Amazon: you, the user, are not involved. So if tunesafe.com stop paying Amazon, Amazon will quite reasonably erase tunesafe.com’s data – including your music.

We can try to build up your confidence in the stability and trustworthiness of tunesafe.com; but in the long term most businesses fail. It is 25 years since we launched the first version of Cardbox. None of the giants of the software world that were around then are still alive and independent today.

“You can trust us” is a good message for any business to convey; but “you don’t even need to trust us” would be even better.

Next time, I’ll describe how Tunesafe can be made to work without a central server, so that if tunesafe.com disappeared tomorrow you could continue to make backups and retrieve your data from your backups for as long as you wanted.

S3 in Business: 2 – Tunesafe

15 August 2006

[Complete series]

Last time I promised you an outline of an Internet business that could be built on Amazon S3, so that we could see whether S3 was a solid enough foundation for a real business.

Rather than invent yet another Web 2.0 application, I’m going to concentrate on backup. We’ve had a little practice with this (the Cardbox Server already has automatic Amazon S3 backup built in) but the business I’ll be talking about will be much simpler than Cardbox and it’ll concentrate on nothing but backup.

The business need

I am impossibly virtuous when it comes to backup. Floppy disks (three sizes), tape (several sizes), CD, DVD, memory stick – I have used them all. Friends with data disasters have learnt not to cry on my shoulder, because they hate to see me trying not to look impossibly smug.

But I have my Achilles’ heel. It’s called an iPod.

My iTunes music library contains 23.4 gigabytes of music. That means six DVDs of slowly changing data, so obviously I don’t back it up daily. In fact I back up my iTunes once a month. Well, actually I backed it up once a month once, in February, and I haven’t done a backup since. Here are my arguments:

  • The music exists in duplicate anyway, one copy on my iPod and one on my computer.
  • Most of my music comes from CDs, so I can always import it again if I have to.
  • Now that I’ve imported all my CDs, nothing much changes from one month to the next.
  • It hasn’t gone wrong yet, so it won’t go wrong in the future.

To take the arguments in turn:

  • Have you ever tried to copy music back from your iPod to your PC? Having tried, have you succeeded without making an unholy mess of the track titles? And anyway, are you 100% confident that iTunes, faced with an intact iPod and a wrecked PC (or vice versa) won’t replace the good data with the bad?
  • How many days and nights of soul-destroying labour will it take to re-import everything and correct the idiotic track names that the online databases assign to the music? And when I said “most of my music comes from CDs”, that leaves a certain amount of stuff from other sources, especially the iTunes Music Store. Do I really want to buy every track all over again?
  • With classical music, the track data are a horrible mess. At least one night a week I do a little tidying up: for instance, I try to merge Jean Sibelius, Sibelius, “Sibelius, Jean” and “Sibelius, Jean (1865-1957)” so that they all look like the same person. Over the months, that adds up to a lot of work that can never really be duplicated. I wouldn’t have the time or the energy to sit down and do it all over again.

These are powerful arguments, but they’re still not enough to make me sit down regularly and create another six DVDs to add to all the others I’ve already got. So I continue to keep my fingers crossed…

The business

This is where our new S3-based Internet business comes in. It’s called Tunesafe.

  1. I sign up to Tunesafe. I download and install a small program from Tunesafe.
  2. Every night, invisibly, automatically, the Tunesafe program uploads part of my iTunes music library to the Tunesafe service.
  3. Eventually all my library is there. From now on, the nightly backup only uploads any changes I have made recently, so that my Tunesafe backup continues to be an exact reflection of what I have in iTunes.

All in all, I have absolute data security without ever consciously having to back anything up at all.

The nice thing about constructing a business round backup is that it is a good, stable, long-term application. Web 2.0 businesses come and Web 2.0 businesses go, but backup goes on for ever. And people’s backup requirements can only increase with time.

Next time, I’ll describe how this will work if we adopt the straightforward approach of making the tunesafe.com server the central point of contact and hiding all the S3 functionality behind it.

S3 in Business: 1 – Introduction

14 August 2006

[Complete series]

Amazon’s S3 Simple Storage Service is a metered data storage service that works across the Internet.

Starting an Internet business

S3 means that you don’t have to spend time deciding how many terabytes of space you might need (wasting money if you buy too much and risking your existence if you buy too little). You don’t have to have high-security data centres with uninterruptible power supplies, with staff ready to rush in at the bleep of a pager if anything goes wrong. You don’t have to struggle with the dilemma of backup: back up too little and you risk massive data loss, back up too much and you risk one of your backups going missing, full of confidential information.

With S3, you just sign up and pay by the gigabyte. The more you use, the more you pay; the less you use, the less you pay. Amazon pay for a distributed network of data centres; Amazon pay for the staff; Amazon have software and procedures to ensure that if one data centre is hit by a hurricane then another one will already have copies of all the data.

S3 seems the ideal choice for Internet startups that need data storage, and it’s already catalysing the invention of new Internet computing architectures. Whether companies will continue using it beyond the startup and adolescent phases remains to be seen, but even if S3 finds its place in the market as something you aim to grow out of eventually, it will have done a good job. It is far easier for a potential investor to assess an application with 10,000 users than it is to assess one that exists only on a whiteboard.

Individual computer users

S3 sounds perfect for backup and archiving. For every obsessive maker of daily backups, there are ten “wing and a prayer” people who know that their computer won’t go wrong today because it didn’t go wrong yesterday. And (as I fill yet another cupboard with my daily DVD backups, written and never used) I can see why they think like that.

With S3, a computer user need never make backups again. A simple invisible program (there are several on their way to the market) will work quietly in the background, taking anything that hasn’t been backed up recently and making copies of it on Amazon’s servers.

Too good to be true?

Altogether S3 looks like a Good Thing. It makes data storage really, really dull and boring. Duller than electricity, even: most electricity users have matches and a candle somewhere, just in case, but S3 doesn’t even need candles.

But is Amazon S3 boring enough to use? Can you build a business on it? If you do build a business on it, will anyone fund you or will their analysis throw up risks you never thought of?

These are the questions that this series of blog postings will explore.

These postings are intended to be non-technical and comprehensible to real human beings. If I say anything that’s unclear, please add a comment to the posting and I’ll do my best to correct it.

Next time, to make the whole discussion more concrete, I’ll describe a business that could be built on top of S3.

Cardbox/S3 photographic archive

13 July 2006

The Amazon S3 discussion forum has a thread on using S3 for photographic archiving. Cardbox and S3 make an ideal match for this.

You can have a Cardbox database on your computer that indexes all your photographs (on as many criteria as you like) and contains a low-resolution copy of each photograph (“low-resolution” means whatever you want it to mean – thumbnail, or 320×200 or 640×480 or 800×600) so that you know exactly what you’re talking about. Each record will also have a link to your original high-resolution picture stored on Amazon S3: the link can be of a kind that is inaccessible to anyone but you. Our previous post tells you more about using Cardbox in conjunction with S3.
Basically, to do this you’d need nothing more than Cardbox plus your own Amazon Web Services / S3 account.

A few bonuses:

  • If you want one of your customers to be able to download a photograph, you can give them a link to the high-res version: the link can expire automatically at a set time.
  • If it’s appropriate, you can even give your customers read-only access to one of the databases. This will let them view your low-resolution pictures and identify exactly the ones they want.
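
For the curious, an expiring link is essentially an ordinary web address with an expiry time and a signature attached. The sketch below is modelled on S3's original query-string authentication as I understand it; treat the exact layout of the signed string as an assumption to check against Amazon's documentation (newer tools use a different signing scheme). The bucket, file name and keys are made up, and this is not how Cardbox itself is implemented.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def expiring_link(bucket: str, key: str, access_key_id: str,
                  secret_key: str, expires_at: int) -> str:
    """Build a time-limited download link; expires_at is seconds since 1 January 1970."""
    resource = f"/{bucket}/{quote(key)}"
    # Assumed layout: verb, two empty header fields, expiry time, then the resource.
    string_to_sign = f"GET\n\n\n{expires_at}\n{resource}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return (f"https://s3.amazonaws.com{resource}"
            f"?AWSAccessKeyId={access_key_id}&Expires={expires_at}&Signature={signature}")
```

Because the expiry time is part of what gets signed, a customer cannot extend the link's life by editing the address: changing the time invalidates the signature.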

We’ve created a simple sample database that will show this in action. This page has a link that lets you download the database, and it has full instructions on how to get going with your own photographic archive.

Amazon S3 and Cardbox

11 July 2006

The project to add Amazon S3 facilities to Cardbox has been completed. (If you’re not familiar with Cardbox then you can read about it here and get a free 1-month trial licence here.)

Amazon S3 is storage for the Internet. Secure, reliable and cheap, it can be used to store any amount of data, at any time, and retrieve it from anywhere on the web. There is no sign-up fee and you pay only for the storage you actually use.

In the new build of Cardbox that is released today, we have incorporated the following features to make use of S3:

The Cardbox Server (used in multi-user and networking systems) can be configured to back up your databases automatically, at specified intervals, to your S3 storage space. There is no interruption in service during a backup, because the Cardbox Server can back up a database even if it is in use. You can have a single backup copy of the latest version of each database, or you can have a separate backup made each time that the database has changed.

Cardbox itself has a new command that lets you view your S3 storage space and upload, download or delete files. Apart from managing the Cardbox Server’s backups, this also allows you to make backups of files on your own computer.

Cardbox’s VBScript macro system has been expanded to allow you to manage your S3 storage space programmatically. This offers interesting prospects for the future: for instance, a photographic database that stores and displays each photograph at a reasonable screen resolution while the original full-resolution images are held in an archive on S3.

Amazon S3 and Cardbox

13 June 2006

Amazon S3 is storage for the Internet. Secure, reliable and cheap, it can be used to store any amount of data, at any time, and retrieve it from anywhere on the web. There is no sign-up fee and you pay only for the storage you actually use.

This posting describes how we are enhancing Cardbox to take advantage of the facilities that Amazon S3 offers.

If you don’t already know Cardbox, it’s a fast, straightforward database designed for end users. Anything you can list, Cardbox can store. Its particular strengths are powerful text indexing and the ability to attach photographs and scanned documents to every record. Cardbox is used by everyone from individuals and small businesses to multinationals and United Nations agencies, and there are Cardbox users on every continent including Antarctica. The Cardbox web site tells you more.


Denial-of-service attack: executive summary

12 June 2006

Here is a summary of the attack described in detail in our previous post.


Denial-of-service attack on Amazon S3

11 June 2006

You can see a non-technical summary of this article here.

Amazon S3 is a new service which uses Amazon’s world-wide network of computers to provide fast, secure, and essentially infinite data storage on the Internet, metered and paid for according to usage. It is beautifully implemented and it is the kind of elegant technology that makes you want to need it.

We are working on a new Cardbox feature that will use Amazon S3 and we’ll announce it as soon as it’s ready for people to test. But in the course of development we’ve come across a vulnerability. An attack aimed at this vulnerability makes Amazon S3 (and any data stored on it) completely unusable by the victim. Note: the vulnerability is not inherent to Amazon S3 itself and the attack would work against any similar service.

Interestingly, the attack only works if you are using security software to protect yourself from computer worms and viruses: it is your computer’s own immune response that does the damage.
