Have you ever visited archive.org or used their Wayback Machine? It’s a catalog of the Internet, and in my opinion one of the most ambitious projects ever undertaken. The sheer volume of data astounds me. They don’t measure in Gigabytes, Terabytes, or even Petabytes. They’re into the Exabytes, and pushing beyond. Cloud computing (and Jeff Bezos) don’t look quite so foolish now.
The site’s mission is to preserve the history of the Net. Granted, some of my earliest ‘net memories aren’t quite the same without the VGA-resolution monitor, Netscape, Windows 95, or modem chirps, but the pages are accurate.
The same issues surrounding public records (see the recent interview with Barbra Symonds) apply to archiving the Net. Putting that much information at anyone’s fingertips can be dangerous, especially without any controls. I’m not a proponent of regulation; more so of education. So here goes:
If you’ve played around with Google Hacking or Search Engine Optimization, you probably know that a page that has been taken down remains in a search engine’s cache (Google’s, for example) indefinitely, more or less. If the page is updated instead, it’s reindexed and the cached copy changes.
That same page remains on the Wayback Machine – not reindexed, just kept indefinitely. No updates, no cache changes, just another revision for another month, week, or day. Eliot Spitzer’s call girls – indexed. Paris Hilton – logged. The Virginia Watchdog’s privacy work – stored.
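To make the difference concrete, here is a minimal sketch of the two behaviors. It is a toy model, not how Google or archive.org actually work: the class names, the example URL, and the placeholder number are all made up for illustration. The only point is that a cache overwrites its copy on every crawl, while an archive adds a new dated revision and keeps the old ones.

```python
from dataclasses import dataclass, field


@dataclass
class SearchCache:
    """Toy search-engine cache: keeps only the latest copy of each URL."""
    pages: dict = field(default_factory=dict)

    def crawl(self, url, content):
        # Reindexing overwrites whatever was cached before.
        self.pages[url] = content


@dataclass
class WebArchive:
    """Toy archive: every crawl is stored as a new dated revision."""
    snapshots: dict = field(default_factory=dict)

    def crawl(self, url, date, content):
        # Earlier revisions are never replaced, only added to.
        self.snapshots.setdefault(url, []).append((date, content))


cache = SearchCache()
archive = WebArchive()
url = "http://example.com/profile"  # hypothetical page

# First crawl captures a page that discloses something sensitive.
cache.crawl(url, "SSN: 000-00-0000")  # placeholder, not a real number
archive.crawl(url, "2008-01", "SSN: 000-00-0000")

# The owner redacts the page and both services crawl it again.
cache.crawl(url, "[redacted]")
archive.crawl(url, "2008-02", "[redacted]")

print(cache.pages[url])        # only the redacted copy is left
print(archive.snapshots[url])  # both revisions, original disclosure included
```

In this sketch, fixing the page cleans up the cache on the next crawl, but the archive still holds the first revision. If the page were simply taken down and never crawled again, the cached copy would linger too, which is the "more or less indefinitely" case above.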
Even if a judge issues a cease-and-desist order in the latest scandal and the site is taken down, the data doesn’t go away. Most judges are not tech savvy enough to understand the ramifications of the web and the proliferation of digital data. The people who wanted the info already have the Virginia congressman’s Social Security info, or the former Florida Governor’s Social Security Number on a house purchase. The judge simply can’t erase every person’s hard drive, and nothing’s preventing any one of those individuals from reposting it.
The privacy implications are obvious; the web’s persistence is unyielding. The laws and regulations that a Certified Information Privacy Professional (CIPP) studies do exist, but legislation lags behind technology, most times significantly so.
Let’s face it. Your personal information won’t change any time soon. Your mother will still have the same maiden name, your date of birth (except for women) will remain constant, and without serious appeals, your Social Security Number isn’t going anywhere. Once it’s out there, it’s out for good. And sites like the Wayback Machine will perpetuate any disclosures. Again, it’s not a good thing or a bad thing, but a lesson worth learning.
Those typos I made seven years ago in a conference submission – even without the Net’s archive they’re still there, with a relatively high page score. I almost wish I’d spell-checked one more time.