Digital Formats Sustainability: A Work in Progress
Digital content must be formatted in order to be usable. The data, whether text, image, sound or video, must be given a structure and stored in a file. Far more information is now created, and in a greater variety of formats, than ever before, making it increasingly difficult for libraries to identify what is of value and ensure its longevity.
To help its staff plan for the future, the Library of Congress created the Sustainability of Digital Formats Web site. This ever-expanding resource provides internal guidance on strategic planning issues regarding digital formats and assists the Library in managing and preserving some of its most valuable digital materials.
The Formats Web site lists information on about 200 current and emerging file formats and their variants, including detailed documentation that will help the Library manage content created or received in these formats. The site identifies and describes formats that are promising for long-term sustainability and helps develop strategies for sustaining these formats.
The fluidity of digital formats was identified as an early concern of the National Digital Information Infrastructure and Preservation Program. Formats can change rapidly as designers alter features, and individual file formats can be very complex. For example, the widely used Portable Document Format (PDF) generally represents page-oriented documents, but these documents can be laden with images, graphics and multimedia content such as video and audio. The current PDF format specification is more than 1,300 pages long! The complexity and fluidity of format families like PDF make a strong case for the importance of a resource like the Formats Web site to track information about digital formats.
The Formats site evolved out of a concern for the future preservation of the digital files in American Memory, a Library program to provide free and open access through the Internet to digital materials that document the American experience. From the start (1994), American Memory's planners wanted to ensure that their content would remain viable over time, making significant efforts to select digital formats with staying power.
The site describes formats that the Library is likely to accept for materials submitted for copyright deposit, and it omits many formats that are unsuitable for archiving. Proprietary binary formats used by word-processing and desktop-publishing software, for example, are not good choices for long-term management in the Library's permanent collections, and the Web site does not include resources dedicated to these formats.
The Formats Web site also provides guidance to those looking for assistance in finding tools to identify, read and validate specific formats. In coming years, services of this type will be provided by the Global Digital Format Registry, a project of Harvard University with support from OCLC (Online Computer Library Center). The Library of Congress formats team regularly coordinates its work with that of the global registry.
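
To give a sense of what format identification involves at its simplest, the sketch below checks the first few bytes of a file against a handful of well-known file signatures. It is only an illustration and does not represent the tools described on the Formats Web site or the registry's own methods. The names SIGNATURES, identify_format and identify_file are hypothetical, and the signature table is deliberately tiny.

```python
# Minimal sketch of signature-based format identification.
# Assumption: SIGNATURES, identify_format and identify_file are
# illustrative names, not part of any Library of Congress tool.

# Leading byte sequences ("magic numbers") for a few common formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),   # little-endian TIFF
    (b"MM\x00*", "TIFF"),   # big-endian TIFF
]


def identify_format(header: bytes) -> str:
    """Return a format name if the header starts with a known signature."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first 16 bytes of a file and identify its format."""
    with open(path, "rb") as f:
        return identify_format(f.read(16))
```

A signature match only suggests what a file claims to be. Validating that the file actually conforms to its format specification is a separate and much larger task, which is part of why dedicated identification and validation tools are needed.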