ePADD: Archiving Email with an Open-Source Tool

Glynn Edwards

About a decade ago, archivists at Stanford University identified email as one of the most important formats to preserve and make accessible to researchers. We surveyed the field and found that all previous projects had focused on preservation rather than discovery or access. Quite fortuitously, in 2011 we met Sudheendra Hangal, then a PhD candidate in Computer Science at Stanford, who had created a program called MUSE as a way of reviewing a personal email archive. It offered capabilities that did not exist elsewhere and allowed us to review images and text for sensitive content. Email archives could then be delivered to researchers through a standalone system in our reading room.

The idea of developing this software further to create a more robust open-source tool captivated us all. Working with Sudheendra, we wrote our first grant with NHPRC. In 2015, at the end of two years, we released a basic prototype of ePADD (Email: Process, Appraise, Discover, Deliver). Our main goals were to be able to search for PII and sensitive material and to allow more flexible search strategies. Building a discovery site was also important, based on the premise that we needed to publish metadata in order for researchers to find content.

After that, we received a grant from IMLS (2015-2018) to continue development and grow our user community. For the online discovery module, we needed to assure donors, as well as our library directors, that only descriptive metadata would be published. This holds true in our current version (7.2), which is available through GitHub.

This year we received an Andrew W. Mellon Foundation grant and partnered again with Harvard Library to continue development. Our main goal is to redesign the attachment review feature, as it is based on Adobe Flash, which will be deprecated in December. Our emerging solution is to create a review panel for all attachments that uses Apache Tika to render plain text for many common text-based file types.

Another goal is to work with our partners to develop functional requirements for incorporating preservation actions into ePADD that will enable exports to preservation repositories. We arrived at this strategy after months of meetings focused on building interoperability between Harvard’s EAS and ePADD. This effort now also includes staff at the University of Manchester, who have been working with ePADD independently for the past year. Working with these two institutions has been wonderfully collaborative and has produced plans for future work on email preservation and on expanding ePADD’s support for additional languages.

It has been interesting working on a development project this year. Instead of face-to-face meetings over several days with our partners and developers, we are relying on more frequent virtual meetings. It has been much easier than anticipated to pivot to an entirely virtual project.

Glynn Edwards, Assistant Director, Department of Special Collections & University Archives, Stanford Libraries, Stanford University