The fidelity and permanence of digital data have been on my mind. My drives are filling up again. More alarmingly, some of my older drives are mounting unreliably, or not mounting at all.
I know of data-recovery heroics I can turn to that may rescue the files, although the first step is to make sure copies aren’t already stored safe and sound elsewhere. I could spend a month looking, and I might find them. Still, it is distressing to discover that cherished images taken a dozen years ago are imperiled.
It’s odd, but my old pre-digital scrapbooks don’t seem to face this challenge. Nor do those of my parents, or those of my grandparents, and so on, including their metadata: the penciled notes on the backs of the pictures. True, they are not digitally searchable, but they show up very reliably whenever I turn a scrapbook page.
I was mulling over the problem during a trip to England in early November for the Bentley Year in Infrastructure conference. The buzz there was all about connecting data across software titles and systems for project execution, but I found myself thinking instead about connecting data across years.
Ensuring data integrity and permanence is possible, I guess, with a sufficiently robust system of backups. In the pre-cloud days we could do that with redundant servers, mirrored drives, and the religious observance of routine. I am sure most of us have done that with complete success. Raise your hands.
With cloud data storage services, of course, the problem goes away, according to the service providers. You upload your files, and the whole business of copying, backing up, and ensuring data integrity is taken care of… as long as you pay the monthly bill. I don’t know what happens if you stop paying the bill, though.
That’s a sticking point. I don’t object to paying fairly for services I need, but I wonder why I should need such services at all.
Two events during my recent travels make me want to question the premise that data is fragile and should require significant effort to ensure its integrity. The first was a little sketchbook that one of my wife’s relatives has in Southam, Warwickshire.
My wife and her “cousins” were talking about family history when our hostess produced the sketchbook so we could admire the clever phrases and quotations, the handwriting, and the sketches of ordinary things by a common ancestor and his friends. It was lovely to hold the little green book and flip through its pages, and then my eye fell on one sketch dated 1912.
This little package of data was communicating effortlessly to us across 100 years, with no translation errors, no file format failures, and no loss of fidelity. It wasn’t even brown around the edges; it was perfectly legible. I tried to imagine someone 100 years from now opening my files so easily. I can’t even open some of my own files created a decade or so ago.
The second wake-up call came as I was reading The Selfish Gene, the landmark book that Richard Dawkins, a seminal thinker in evolutionary biology, published 40 years ago. In it, Dawkins argues that the common understanding of “survival of the fittest” in Darwin’s theory of evolution rests on a misunderstanding. It isn’t the fittest creature that ensures the survival of its offspring by passing on successful genetic characteristics; it is the fittest genes themselves that engineer successful creatures, which reproduce successfully and so convey the genetic code across generations through autonomous, high-fidelity data reproduction.
It is not about creature survival; it’s about high-fidelity data replication. The genetic codes of successful creatures have replicated faithfully for hundreds of thousands of years, without engineered intervention, and without monthly bills.
That’s something to aspire to, and perhaps to model our next phase of software development on.