The Cardbox email archive

It’s done: I’ve extracted 46,000 emails from our archives and imported them into a Cardbox database. The whole process took 8 minutes, and every word in every email is indexed for fast retrieval. The resulting Cardbox database is 231MB in size.
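For anyone wondering what "every word is indexed" means in practice, here is a rough Python sketch of the general idea: a map from each word to the emails that contain it. This is not how Cardbox does it internally. The names `build_word_index`, `search` and `sample_emails` are made up for illustration, the two sample messages are invented, and multipart emails are simply skipped.

```python
import re
from collections import defaultdict
from email import message_from_string


def build_word_index(raw_emails):
    """Map each lowercase word to the set of positions of the emails containing it."""
    index = defaultdict(set)
    for position, raw in enumerate(raw_emails):
        message = message_from_string(raw)
        body = message.get_payload()
        if not isinstance(body, str):
            continue  # multipart messages are skipped in this sketch
        text = f"{message.get('Subject', '')} {body}"
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(position)
    return index


def search(index, *words):
    """Return the positions of the emails that contain every given word."""
    sets = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*sets) if sets else set()


# Two invented messages, standing in for a real archive.
sample_emails = [
    "Subject: Invoice query\n\nPlease resend the March invoice.",
    "Subject: Meeting\n\nCan we meet in March?",
]
word_index = build_word_index(sample_emails)
```

With the index built, `search(word_index, "march", "invoice")` only has to intersect two small sets instead of rereading every email, which is why this kind of lookup stays fast as the archive grows.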

I won’t be posting a public spam database yet, because I would first have to go through it to make sure that nothing private has found its way in accidentally. If anyone wants to see one, let me know. I’ve now set up a dummy email address specifically for spam and nothing else, and once the spam starts coming in I’ll look at making a Cardbox database out of it.
