Amazon S3 and Cardbox

Amazon S3 is storage for the Internet. Secure, reliable and cheap, it can be used to store any amount of data, at any time, and retrieve it from anywhere on the web. There is no sign-up fee and you pay only for the storage you actually use.

This posting describes how we are enhancing Cardbox to take advantage of the facilities that Amazon S3 offers.

If you don’t already know Cardbox, it’s a fast, straightforward database designed for end users. Anything you can list, Cardbox can store. Its particular strengths are powerful text indexing and the ability to attach photographs and scanned documents to every record. Cardbox is used by everyone from individuals and small businesses to multinationals and United Nations agencies, and there are Cardbox users on every continent including Antarctica. The Cardbox web site tells you more.

When Amazon S3 was released we knew immediately that this was something we wanted to use. The next question was: faced with unlimited, reliable data storage space, how could we extend Cardbox to take advantage of it? We had several ideas, but the one that promised the greatest benefits to the largest number of users was automated backup. (We will describe the other ideas in future posts.)

The trouble with backup has traditionally been that it is something that someone has to do. Except in the most sophisticated organisations, someone ends up physically having to burn a CD-ROM or DVD, or even manipulate magnetic tape. Even when the backup has been done the problems are not over, because the backup has to be stored somewhere safe but accessible, and organised so that you can find the right DVD again when you need it.

Automated backups

With Amazon S3, we have been able to add an automated backup facility to the Cardbox Server. In the Server’s configuration pages you tell it how often you want to back up, and the name of the location in S3 where you want the backups to go. And that is all. Cardbox does the rest.
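As an illustration only, the two settings described above (how often to back up, and where in S3 the backups should go) could be modelled along the following lines. This is a minimal sketch, not Cardbox's actual configuration format: the names `BackupConfig`, `next_backup_time` and `backup_key`, the bucket and prefix values, and the `.bak` naming scheme are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class BackupConfig:
    """Hypothetical settings: how often to back up, and where in S3."""
    interval: timedelta  # time between backups
    bucket: str          # hypothetical bucket name
    prefix: str          # hypothetical key prefix inside the bucket


def next_backup_time(last_backup: datetime, config: BackupConfig) -> datetime:
    """Return the moment at which the next backup becomes due."""
    return last_backup + config.interval


def backup_key(database: str, started: datetime, config: BackupConfig) -> str:
    """Build a timestamped object key so that backups sort chronologically."""
    stamp = started.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{config.prefix}/{database}/{stamp}.bak"


if __name__ == "__main__":
    config = BackupConfig(
        interval=timedelta(hours=24),
        bucket="example-backups",  # assumption for the example
        prefix="cardbox",          # assumption for the example
    )
    last = datetime(2006, 5, 1, 2, 0, tzinfo=timezone.utc)
    print(next_backup_time(last, config))      # 2006-05-02 02:00:00+00:00
    print(backup_key("books", last, config))   # cardbox/books/20060501T020000Z.bak
```

Putting a UTC timestamp in the key means that a plain alphabetical listing of the backup location is also a chronological one, which makes it easy to find the most recent backup.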

The beautiful thing about letting the Cardbox Server do its own backups is that service doesn’t need to be interrupted while the backup is going on. Cardbox users can continue searching and browsing their data, and even making changes, during the backup. Cardbox knows enough about its own behaviour to be able to store the changes that users make while at the same time ensuring that the backup is a consistent picture of the entire database at one particular instant.
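One general way to get a consistent picture of a database while changes continue is to remember the original value of any record that is modified during the backup. The sketch below shows that idea on a toy record store. It is not a description of how the Cardbox Server works internally; `SnapshotStore` and all of its methods are hypothetical names invented for the example.

```python
_MISSING = object()  # marks "this record did not exist when the snapshot began"


class SnapshotStore:
    """Toy record store that can be backed up while writes continue."""

    def __init__(self):
        self._records = {}      # live data: record id -> text
        self._preimages = None  # original values of records changed mid-snapshot

    def begin_snapshot(self):
        self._preimages = {}

    def end_snapshot(self):
        self._preimages = None

    def write(self, record_id, text):
        # The first time a record changes during a snapshot, keep its old value.
        if self._preimages is not None and record_id not in self._preimages:
            self._preimages[record_id] = self._records.get(record_id, _MISSING)
        self._records[record_id] = text

    def live_view(self):
        """What users see: the latest data, including changes made mid-backup."""
        return dict(self._records)

    def snapshot_view(self):
        """What the backup sees: the data exactly as it stood at begin_snapshot()."""
        if self._preimages is None:
            raise RuntimeError("no snapshot in progress")
        view = dict(self._records)
        for record_id, old in self._preimages.items():
            if old is _MISSING:
                view.pop(record_id, None)  # record was added after the snapshot began
            else:
                view[record_id] = old      # undo the mid-snapshot edit
        return view


if __name__ == "__main__":
    store = SnapshotStore()
    store.write(1, "first card")
    store.write(2, "second card")
    store.begin_snapshot()
    store.write(2, "edited during backup")
    store.write(3, "added during backup")
    print(store.snapshot_view())  # {1: 'first card', 2: 'second card'}
    print(store.live_view())
    store.end_snapshot()
```

Users keep working against the live view, while the backup reads only the snapshot view, so the uploaded copy reflects a single instant.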

Retrieving from backup

The part of Cardbox that users actually see and interact with is the Cardbox Client – which we usually just call “Cardbox”, full stop, since as far as most people are concerned, the Cardbox Client is Cardbox.

We have extended the Cardbox Client in two ways. First, we have added commands to Cardbox’s VBScript macro system to allow objects to be listed, downloaded, and deleted. (We have also included uploading, in case sophisticated macro writers need it.) Second, we are implementing a window that shows a simple listing of all backed-up files and lets the user download or delete selected files. (Typically this feature would be under the control of a responsible administrator, since once an object is deleted from S3, it is gone for good.)
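To make the four operations concrete (list, download, delete, and upload), here is a small Python sketch that uses an in-memory stand-in for a storage location. It is neither the Cardbox macro commands nor the Amazon S3 API: `InMemoryBucket` and its method names are hypothetical, and the object keys are invented for the example.

```python
class InMemoryBucket:
    """In-memory stand-in for a storage bucket (not the real S3 API)."""

    def __init__(self):
        self._objects = {}  # object key -> bytes

    def upload(self, key, data):
        self._objects[key] = data

    def list_keys(self, prefix=""):
        """Return all keys starting with prefix, in sorted order."""
        return sorted(k for k in self._objects if k.startswith(prefix))

    def download(self, key):
        return self._objects[key]

    def delete(self, key):
        # Deletion is permanent: there is nothing to restore from afterwards.
        del self._objects[key]


if __name__ == "__main__":
    bucket = InMemoryBucket()
    bucket.upload("cardbox/books/20060501T020000Z.bak", b"backup 1")
    bucket.upload("cardbox/books/20060502T020000Z.bak", b"backup 2")

    keys = bucket.list_keys("cardbox/books/")
    print(keys)                       # both keys, oldest first
    print(bucket.download(keys[-1]))  # b'backup 2' (the most recent backup)

    bucket.delete(keys[0])            # remove the oldest backup
    print(bucket.list_keys("cardbox/books/"))
```

Because a deleted object cannot be brought back, a real management tool would normally ask for confirmation before calling the delete operation.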

Summary

  • Automatic backup at regular intervals.
  • Secure and reliable storage on Amazon S3.
  • Simple viewing and management of all backed-up files.
  • Downloading on demand.

Things to be done before release

The features described here are in our internal development version of Cardbox. They will be incorporated in the production version (and formally announced to Cardbox users) soon. The principal work to be done is to add appropriate pages to the Tools > Options and Server Configuration windows, and to add a new command (Tools > Management > Amazon S3) to allow interactive management of backup files.

The future

The main future enhancement we are considering is incremental backup. If you have a database of 100,000 photographs or documents, the chances are that only a few will be added or deleted in one day, meaning that out of a 10 GB database, only a few megabytes have changed. Sending only the changes to Amazon S3 will save time and reduce data transmission costs. This feature already exists in Cardbox (if you download a database to your PC from a server with Tools > Management > Download, only the changes get downloaded), but we are waiting to implement the equivalent in S3 until (1) Amazon add some necessary features to their service and (2) enough Cardbox users are using the service to make the development worthwhile.
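The idea of sending only the changes can be sketched by comparing a list of content hashes from the previous backup with one for the current state. This is one common way of detecting changes, shown here under stated assumptions. It is not a description of Cardbox's existing download mechanism or of any planned S3 feature, and the function names and file names are hypothetical.

```python
import hashlib


def build_manifest(files):
    """Map each file name to the SHA-256 hex digest of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def plan_incremental(previous, current):
    """Compare two manifests and return (to_upload, to_delete) as sorted lists."""
    # Upload anything that is new or whose contents have changed.
    to_upload = sorted(n for n, digest in current.items() if previous.get(n) != digest)
    # Delete anything that was backed up before but no longer exists.
    to_delete = sorted(n for n in previous if n not in current)
    return to_upload, to_delete


if __name__ == "__main__":
    yesterday = {"a.jpg": b"photo A", "b.jpg": b"photo B", "c.jpg": b"photo C"}
    today = {"a.jpg": b"photo A", "b.jpg": b"photo B (rescanned)", "d.jpg": b"photo D"}

    to_upload, to_delete = plan_incremental(
        build_manifest(yesterday), build_manifest(today)
    )
    print(to_upload)  # ['b.jpg', 'd.jpg']
    print(to_delete)  # ['c.jpg']
```

In this example only two of the files need to be transmitted, and the unchanged `a.jpg` is skipped entirely, which is where the savings in time and data transmission costs would come from.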
