How To Keep Everything Forever
The Six Design Factors of Forever (Digital) Storage
When people say “forever” I’m not sure they actually mean it. But when it comes to digital information, especially things like irreplaceable media files, forever is serious business.
All of us developed a false sense of security when everything went digital. Since ones and zeros (the stuff of digital) can be preserved exactly, we thought we were out of the woods. There is a whole industry around preserving all things analog. There is film preservation, document preservation, artifact preservation. Physical things wear out, at some point. All of them do. But digital? We can keep creating as many copies as we want. What could be easier?
Well, if you have a small-scale digital preservation job to do, you’re probably OK. You’ll make a few copies. Spin the content off to a giant thumb drive. Put a copy in the safe. You’re good to go. But when the volumes become high, say hundreds of terabytes (TBs) or petabytes (PBs), or the preservation becomes really high stakes, that’s when you get into trouble. If you have many petabytes of digital stuff (media, data files, scans, etc.) that you need to keep longer than a few years, you have already experienced the problem.
What technology do I use? How long will it last? Who is behind the technology? What if the company that makes it goes away?
You may have hundreds of thousands or even millions of digital objects. Are they properly catalogued? How would you remember where you put the footage you shot at your college graduation, or your kid’s fifth birthday, or your mother’s funeral?
The things that hold them, storage systems of various sizes, are ever-changing and always wearing out. But “the cloud,” you say? The cloud will save us all! Maybe. Maybe not. Perpetuity can be pricey. Can you commit to monthly pricing for the next fifty years? What about the next fifty after that? Who’s gonna sign that contract? What if Google goes the way of DEC and Compaq and Atari (tech giants of yesteryear)? Who will recover your Dropbox files a century from now?
Digital gave us hope. Technology presented reality. Forever is a long time. So how do you even think about the problem?
For my day job, I’m a scientist who thinks about systems—how things fit together. And thinking about forever as a scientist, focusing on the problem of the forever digital archive, the following topics help to galvanize a design direction.
The Six Design Factors of Forever Storage
- Fundamental Storage Unit Independence: You are going to put your digital things on some digital storage device. This is a fundamental truth. Within any “system” you will have components that will wear out. A basic principle should be that the end point of your storage should be independent, upgradable, and easy to health-check. By the “end point,” I mean the thing that will actually hold the bits and bytes. If you use CD-ROMs for your home videos, or thumb drives, or USB drives, they are all independent. They can go from place to place. They can be copied with precision. They can be stored securely. And once you purchase them, they stop costing you money. For those hundreds of terabytes or petabytes, thumb drives are impractical, but the lesson remains: you need component independence.
- Industry Standard: If you are preserving content forever, you want it in industry standard, non-proprietary formats. It is simply impossible, years from now, to guess at what proprietary systems might survive. We know about file formats: MPEG-2, MP4, and so on. When it comes to authoring systems, it is probably impossible to avoid unique plugins and workflows from software like Avid, Pro Tools, and Adobe. But you certainly want your “mezzanine” files, your highest resolution files, in some kind of industry standard. But what about the physical file containers? Yes, a disk drive is a kind of standard, but what does it take to get the data out of the hard drive? In the same way that your Fundamental Storage Units should be independent, they should also be formatted in a way that no third party is needed to retrieve the data. USB drives, thumb drives, CD-ROM and DVD drives all have standard, non-proprietary interfaces, but they cannot be aggregated, and the storage volumes are small next to what the industry really needs. But try to pull a single disk drive out of a storage array and make sense of it? It may have bits and pieces of something you are looking for, but the odds are that it will be incomprehensible and probably unreadable. If you are storing your data in a way that any given company holds the keys to your access, whether through the complexity of their system or a proprietary way to access the content, you are jeopardizing forever. In other words, both the file formats AND the media itself should be free of proprietary constraints.
- Wear & Tear (Replacing Things That Wear Out): You have to think of your “system” as a set of components. And virtually anything in that system can and will wear out. So you need a system where you can replace any individual component without upsetting the integrity of the entire system. If you have a server that holds a database, can the database be recreated if the server goes down or blows up? Will your Fundamental Storage Units survive if all of the other components fry? Does the system support redundancy (multiple copies) of your most critical items, your digital files? Can you run maintenance on the system, one component at a time, without disrupting your normal operations?
- Expandable Repository and Namespace: How do you store digital items of unknowable volume? If you have independent Fundamental Storage Units, you will need a database that knows where things are. A system’s ability to identify individual objects is a function of its “namespace.” Namespace is quite literally the space used to store pointers to files. The perfect namespace can expand at will, without changing anything else about your workflow. Of course there may be some practical limits on your namespace—billions and billions—but it should be big enough, or expandable enough, such that your notion of “everything” and “forever” will both be covered. It’s probably obvious, but with the notion of independent Fundamental Storage Units, your ideal system will separate the database of what is where (metadata) from the actual storage devices.
- Referential Integrity: Referential integrity is a database term that refers to how data systems link together. When you go on a website, and you click a button, and it says “link not found” or “file not found,” that’s a lack of referential integrity. If you are running a media asset management system, and you want to find a set of files from long ago, the worst message to get is “file not found.” Although referential integrity does refer to how your database is built, it also refers to how a database system can create internal and external redundancy. I know this one is a little complex. But if Tom knows what Mary knows, and Harry knows what Tom knows, then by extension Harry knows what Mary knows. And if you build the data about your data (your metadata) so that everything talks to everything else, you can maintain your referential integrity (and keep your job!).
- Sustainability and Cost: Sustainability includes your carbon footprint as well as your costs, both of which someone will have to account for long after you’re dead. In theory, as storage systems become denser and more efficient, your digital storage costs should drop over time. The worst problem would be to have your “forever content” in a place that is hard to leave, and where the costs are beyond your control. Certain costs in the future will likely rise, for example energy. But if storage density outpaces the rise in energy costs, your total cost and energy consumption should go down. Remember, we’re talking about forever, so long-term trends are important!
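Several of the factors above (storage units you can health-check, a catalog kept separate from the storage itself, multiple copies, and referential integrity) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a real archive system: each Fundamental Storage Unit is stood in for by a plain directory, the catalog is an in-memory dictionary rather than a database, and the names `sha256_of`, `register`, `audit`, and `demo` are hypothetical, invented here for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register(catalog: dict, object_id: str, unit_root: Path, relative_path: str) -> None:
    """Record where one copy of an object lives and what its contents hashed to.

    The catalog (hypothetical structure) maps an object ID to a list of copies,
    so the metadata stays separate from the storage units themselves.
    """
    checksum = sha256_of(unit_root / relative_path)
    catalog.setdefault(object_id, []).append(
        {"unit": str(unit_root), "path": relative_path, "sha256": checksum}
    )


def audit(catalog: dict) -> list:
    """Return (object_id, unit, problem) for every copy that is missing or altered."""
    problems = []
    for object_id, copies in catalog.items():
        for copy in copies:
            location = Path(copy["unit"]) / copy["path"]
            if not location.is_file():
                # The catalog points at something that is no longer there.
                problems.append((object_id, copy["unit"], "missing"))
            elif sha256_of(location) != copy["sha256"]:
                # The file exists but its bits have changed since registration.
                problems.append((object_id, copy["unit"], "checksum mismatch"))
    return problems


def demo() -> list:
    """Store two copies on two stand-in units, damage one copy, then audit."""
    with tempfile.TemporaryDirectory() as workdir:
        catalog = {}
        units = [Path(workdir) / "unit_a", Path(workdir) / "unit_b"]
        for unit in units:
            unit.mkdir()
            (unit / "graduation.mov").write_bytes(b"footage bytes")
            register(catalog, "graduation-footage", unit, "graduation.mov")
        # Simulate silent corruption on the second unit only.
        (units[1] / "graduation.mov").write_bytes(b"bit rot")
        return audit(catalog)
```

Calling `demo()` should report a single problem, a checksum mismatch on the second stand-in unit, while the intact copy on the first unit passes quietly. Because the catalog lives apart from the units, a damaged unit can be replaced and refilled from the surviving copy without touching anything else, which is the point of keeping the components independent.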
Designing for the Long Haul
Yes, the future is always uncertain. But with perpetuity in mind, you can design a system that, in theory, can hold everything–forever.