Keeping databases consistent

We've been experimenting with using Amazon Web Services to extract book data from Amazon's database and import it into Cardbox. It all fits into a nice little macro and we'll publish the macro in the Knowledge Base in due course. You will be able to type in an ISBN and the macro will fill in all the book details for you. If you have a barcode scanner then you won't even need to type.
In the course of creating the macro we noticed that there was no way for us to get hold of the synopsis that appears on Amazon's book pages. Further investigation showed that the synopsis was available if we looked at amazon.com but not if we looked at amazon.co.uk. A thread on the relevant Amazon forum says, essentially, "this behavior is by design" (to use a favourite phrase of Microsoft's).

We thought we'd investigate further. Here are the details of a book published in the UK, as shown in amazon.co.uk and amazon.com.

ISBN (both sites): 1852851880
Author (both sites): Halik Kochanski
Title (both sites): Sir Garnet Wolseley: Victorian Hero
Publisher: Hambledon Continuum (amazon.co.uk); Hambledon & London (amazon.com)
Date: 1 July 1999 (amazon.co.uk); April 2001 (amazon.com)
Pages: 351 (amazon.co.uk); 204 (amazon.com)
Subjects (amazon.co.uk): 1833-1913, Biography, Biography & Autobiography, Biography/Autobiography, Biography: historical, General, Generals, Great Britain, Great Britain – History – 19th Century, Historical – General, History, History – General History, Ireland, Military, Military History (General), United Kingdom, Great Britain, Victoria, 1837-1901, Viscount,, Warfare & Defence, Wolseley, Garnet, Wolseley, Garnet Wolseley
Subjects (amazon.com): 1833-1913, Biography, Biography & Autobiography, Biography/Autobiography, General, Generals, Great Britain, Great Britain – History – 19th Century, Historical – General, History, History – General History, Military, Military History (General), Victoria, 1837-1901, Viscount,, Wolseley, Garnet Wolseley,, Biography & Autobiography / Historical, Wolseley, Garnet Wolseley
Description (amazon.com only):
"Before leaving England he placed his finger on a map of Egypt at the point now known to fame as Tel-El-Kebir, and said 'That is where I shall beat Arabi'". No Victorian was a greater hero for a longer period than Sir Garnet Wolseley (1833-1913). The leading British general of the second half of the nineteenth century, he personally took part in or significantly influenced every campaign between the Crimea and the Boer War. To Disraeli he was ‘Our Only General’, while to many soldiers and to the public at large he epitomised the virtues they most admired: exceptional personal bravery and an unshakeable belief in the virtues of the British Empire. The phrase ‘All Sir Garnet’ was a guarantee that everything was under control. Seen from another angle, Wolseley’s career reflects a number of weaknesses. To control a global empire Britain had a powerful navy but only a small army. Its ability to deploy a force of limited size throughout the world, almost always against untrained and underequipped native armies, gave the dangerous and ultimately disastrous illusion that Britain was as formidable by land as it was by sea.

Remember, this is physically the same book. And yet there are significant differences between the two entries. Even the number of pages is different!
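For anyone who wants to run the same kind of check on their own data, here is a minimal sketch in Python of a field-by-field comparison. The values are copied from the comparison above; the dictionary layout and the `diff_records` helper are our own illustration and have nothing to do with Amazon's API or with the Cardbox macro.

```python
# Field values copied from the two Amazon entries compared above.
# The dictionary layout is an assumption made for this illustration.
uk = {
    "isbn": "1852851880",
    "author": "Halik Kochanski",
    "publisher": "Hambledon Continuum",
    "date": "1 July 1999",
    "pages": 351,
}
us = {
    "isbn": "1852851880",
    "author": "Halik Kochanski",
    "publisher": "Hambledon & London",
    "date": "April 2001",
    "pages": 204,
}


def diff_records(a, b):
    """Return {field: (a_value, b_value)} for every field whose values differ."""
    fields = sorted(set(a) | set(b))
    return {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}


# List each field on which the two copies disagree.
for field, (left, right) in diff_records(uk, us).items():
    print(f"{field}: {left!r} != {right!r}")
```

Finding the differences is the cheap part. The comparison cannot tell you which of the two values is right.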

This is a lesson for anyone planning a system round a database. Never imagine that you can keep the same data in two places. The data will diverge with time. Someone will change one copy and not the other; and then someone else will change the other copy too. After that, it is too late. No-one at Amazon is going to go through hundreds of thousands of inconsistent records, comparing them, checking with the publisher which one is correct, deciding which data to keep. It is simply too expensive. The cost of reuniting disparate data exceeds the cost of entering the data in the first place.

Sometimes organisations with several offices optimistically think that they can keep their Cardbox databases synchronized between sites simply by regularly emailing changes to each other. We know from experience that this always fails in the end, when someone forgets to apply an update, or someone edits something he's not meant to edit. Unless one site is the "owner" of the data and no-one else is allowed to make any changes at all, the only safe solution is connecting to a Cardbox Server somewhere on the Internet. The Cardbox protocol is efficient, we tell them, and even a dial-up line will work…
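The failure is easy to reproduce in miniature. The toy Python sketch below models two sites that each hold a copy of one record and exchange changes by hand; it takes a single forgotten update for the copies to disagree. The record, the field names and the `apply_change` helper are all hypothetical, and none of this reflects how Cardbox itself stores or transmits data.

```python
def apply_change(db, change):
    """Apply one emailed change, given as a (field, new_value) pair."""
    field, value = change
    db[field] = value


# Both sites start with identical copies of a (hypothetical) record.
site_a = {"name": "J. Smith", "phone": "0100", "city": "Leeds"}
site_b = dict(site_a)

# Site A edits the phone number and emails the change to site B...
change_1 = ("phone", "0199")
apply_change(site_a, change_1)
# ...but nobody at site B gets round to applying it.

# Later, site B edits the city and emails that change to site A, which applies it.
change_2 = ("city", "York")
apply_change(site_b, change_2)
apply_change(site_a, change_2)

# The two copies now disagree, and neither site has any reason to notice.
print(site_a == site_b)
print(site_a["phone"], site_b["phone"])
```

With a single owner, or a single server that everyone connects to, there is only one copy to change, so this situation cannot arise.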

Now we can show them that this is not just sales talk. No-one can accuse Amazon of being amateurs but, as this example shows, the hazards of split data afflict even the largest organisations and the damage can be effectively irreparable.
