Getting Started in Digital Preservation: Taking Your First Steps

Sharon McMeekin

One of the most common questions we get asked at the DPC about digital preservation is “how do I get started?” Many people are worried that digital preservation will require large financial commitments and lots of technical knowledge, beyond what they can ever hope to have available.

While digital preservation can be a complicated beast, there is no one right way to do it. In fact, the answers to most questions about digital preservation invariably start with “it depends…” as approaches are always context-dependent. If you are just getting started, or have limited resources, there are simple first steps you can take to begin securing your digital content, and I’ll take you through them (and some useful resources) in this blog post. For more information please visit the Digital Preservation Handbook on our website: https://www.dpconline.org/handbook

Get to Know Your Organization and Your Data

Creating a Digital Asset Register

It is vital to understand the nature and extent of your digital collections, and you can capture information about this in a document called a Digital Asset Register (DAR). A DAR can be incredibly useful in assessing the extent and significance of your digital collections, identifying priorities and risks, and planning digital preservation actions. A comprehensive and detailed audit can be time-consuming, so start with a high-level assessment of the digital content and follow with more detailed mapping later. In the early stages, the advice is to keep the Digital Asset Register simple. Think about answering the following questions:

  • What subjects are covered in the collections?
  • Where do the collections come from and what are their functions?
  • Where is the relevant digital content stored and what kinds of media are used?
  • Why are the collections being retained?
  • Who is responsible for the collections; who are the users; who are the subjects of the data?
  • How is the digital content accessed?
  • How is the digital content likely to change and grow in the near future?

The Data Asset Framework (http://www.dcc.ac.uk/resources/tools/data-asset-framework) provides a more formal framework for carrying out an assessment.
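To make the idea of a simple register a little more concrete, here is a minimal sketch in Python that records one row per collection in a CSV file, with one column per question from the list above. The column names and the example entries are purely hypothetical, chosen for illustration; a spreadsheet would do the same job, and your own register should use whatever headings suit your organization.

```python
import csv
import io

# Hypothetical column names, loosely one per question in the list above.
DAR_FIELDS = [
    "collection",
    "subjects",
    "source_and_function",
    "storage_location_and_media",
    "retention_reason",
    "responsible_owner",
    "users",
    "access_method",
    "expected_growth",
]


def write_register(entries, stream):
    """Write register entries (a list of dicts) to a text stream as CSV."""
    writer = csv.DictWriter(stream, fieldnames=DAR_FIELDS)
    writer.writeheader()
    for entry in entries:
        # Unanswered questions are left blank so the gaps stay visible.
        writer.writerow({field: entry.get(field, "") for field in DAR_FIELDS})


# Invented example entries, for illustration only.
example_entries = [
    {
        "collection": "Board meeting minutes",
        "subjects": "Governance",
        "storage_location_and_media": "Shared network drive",
        "retention_reason": "Corporate record",
        "responsible_owner": "Records manager",
    },
    {
        "collection": "Oral history recordings",
        "subjects": "Local history",
        "storage_location_and_media": "CDs in the store room",
        "access_method": "On request, in the reading room",
    },
]

buffer = io.StringIO()
write_register(example_entries, buffer)
register_csv = buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch self-contained; passing an open file instead would save the register to disk. Leaving blanks for questions you cannot yet answer is deliberate, as those gaps are a useful to-do list for the more detailed mapping later.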

Assess Your Organization’s Readiness

As well as understanding the digital content your organization will be managing, it is also important to understand your organization’s current digital preservation capabilities. This will help you plan for the future, and potentially advocate for more resources. A useful type of tool for this process is the maturity model. There are a number of digital preservation maturity models available, so it is worth thinking about which one will be right for you. Some models to consider:

Securing the Bits

We obviously also need to take some more technical steps to ensure continued access to our digital content. In digital preservation we talk about “Bitstream Preservation”, which aims to preserve the 1s and 0s that make up all digital content. To do this there are some important steps you can take:

  • Copy the digital content from legacy and/or removable media such as CDs or USB drives onto more reliable storage as soon as possible.
  • Create a “Verifiable File Manifest”. At a minimum this should be a list of the files, their storage locations, and a checksum for each file (this is an alphanumeric string of characters that a software tool generates from a file’s contents, more on this in a moment). There are many free “characterization” tools that can generate this information for you, including DROID from The National Archives (UK): https://www.nationalarchives.gov.uk/information-management/manage-information/policy-process/digital-continuity/file-profiling-tool-droid/
  • Create copies of your digital content in case something goes wrong (which it too often does…). A possible solution is:
    • A copy held locally on hard disk to deliver convenience and rapid speed of access.
    • A second copy on magnetic tape (perhaps held at a different geographical location to mitigate the potential for natural disaster). This provides resilience at a low cost.
    • A third copy in the cloud will provide further resilience, again at low cost. Some cloud providers charge low rates for writing data and high rates for reading it. This third copy might be storage of last resort, to be accessed and read back only if the other copies become lost or damaged.
  • Carry out regular integrity checks. This refers to the use of checksums to make sure that no changes have happened to your digital content over time, helping to check for errors and ensure authenticity. By storing a checksum for each file in a document such as a verifiable file manifest, we can generate another checksum from the file at some point in the future and if the two checksums match, we know no changes have occurred. If they do not match, we must act, e.g. by retrieving an intact copy from another storage location. Ideally integrity checks should be carried out at regular intervals (e.g. every six months) and also whenever a file is moved. There are many tools for integrity checking, including Fixity by AVP: https://www.weareavp.com/products/fixity/
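If you are curious about what a manifest and an integrity check look like under the hood, the steps above can be sketched in a few lines of Python. This is only an illustration of the principle, not a replacement for tools like DROID or Fixity. It assumes SHA-256 as the checksum algorithm (my choice for the example; the steps above do not depend on a particular one), and the function names and the sample file are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's location (relative to root) to its checksum."""
    manifest = {}
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(folder, name)
            manifest[os.path.relpath(full_path, root)] = sha256_of(full_path)
    return manifest


def check_integrity(root, manifest):
    """Recompute checksums and report files that are missing or have changed."""
    problems = {}
    for relative_path, recorded_checksum in manifest.items():
        full_path = os.path.join(root, relative_path)
        if not os.path.exists(full_path):
            problems[relative_path] = "missing"
        elif sha256_of(full_path) != recorded_checksum:
            problems[relative_path] = "changed"
    return problems


# Small demonstration using a temporary folder and an invented sample file.
with tempfile.TemporaryDirectory() as root:
    sample = os.path.join(root, "report.txt")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("original content")

    manifest = build_manifest(root)
    before = check_integrity(root, manifest)  # nothing has changed yet

    # Simulate damage or an unintended edit to the file.
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("altered content")
    after = check_integrity(root, manifest)  # the file is now flagged
```

In this sketch the manifest lives only in memory. In practice you would save it (for example as a CSV or text file) and keep a copy somewhere other than alongside the content it describes, so that damage to one does not take the other with it.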

For more help with establishing digital preservation workflows, check out the great new guidance from The National Archives (UK): https://nationalarchives.gov.uk/archives-sector/projects-and-programmes/plugged-in-powered-up/digital-preservation-workflows/

Some More Resources

Before I go, I just wanted to point out some more resources for those starting out in digital preservation. So, here are a few of my favourites:

And some more things from the DPC:

Sharon McMeekin, Head of Workforce Development, Digital Preservation Coalition
@SharonMcMeekin