Sustainability of Digital Formats
 Planning for Library of Congress Collections


XML (Extensible Markup Language)

Table of Contents
Identification and description
Local use
Sustainability factors
Quality and functionality factors (text)
File type signifiers
Notes
Format specifications
Useful references
Format Description Properties
• ID: fdd000075
• Short name: XML
• Content categories: text, generic
• Format category: file format, bitstream encoding
• Last significant update: 2004-09-16

Identification and description

Full name: Extensible Markup Language (XML)
Description: Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879). XML documents fall into two broad categories: data-centric and document-centric. Data-centric documents are those where XML is used as a data transport. They include sales orders, patient records, directory entries, and metadata records. One significant use of data-centric XML is for manifests of digital content; another is for metadata embedded in digital content files. Document-centric documents are those in which XML is used for its SGML-like capabilities, reflecting the structure of particular classes of documents, such as books with chapters, user manuals, newsfeeds, and articles that incorporate explicit metadata in addition to the text. An XML document's markup structure can be defined in a schema language, and the document can be validated against that definition. The most widely used schema languages are the Document Type Definition (DTD) language and XML Schema. Other schema languages exist, including RDF and RELAX NG.
  Production phase: Can be used as an initial, middle, or final-state format.
Relationship to other formats:
  Defined via: XML_DTD
  Defined via: XML_SCHEMA
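
To make the distinction between data-centric and document-centric XML concrete, the following is a minimal sketch using Python's standard xml.etree.ElementTree module. The element names (order, item, chapter, para, emph) and both sample documents are hypothetical and are not taken from any LC schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-centric record: XML used as a data transport.
ORDER_XML = """<order id="A-100">
  <item sku="123" quantity="2">Widget</item>
  <item sku="456" quantity="1">Gadget</item>
</order>"""

# Hypothetical document-centric fragment: markup reflects document
# structure, and text is interleaved with inline elements.
CHAPTER_XML = """<chapter>
  <title>Getting Started</title>
  <para>Press the <emph>power</emph> button to begin.</para>
</chapter>"""


def total_quantity(order_xml: str) -> int:
    """Sum the quantity attribute of every <item> in an order."""
    root = ET.fromstring(order_xml)
    return sum(int(item.get("quantity", "0")) for item in root.iter("item"))


def paragraph_text(chapter_xml: str) -> list:
    """Return the plain text of each <para>, flattening inline markup."""
    root = ET.fromstring(chapter_xml)
    return ["".join(para.itertext()) for para in root.iter("para")]


if __name__ == "__main__":
    print(total_quantity(ORDER_XML))
    print(paragraph_text(CHAPTER_XML))
```

In the data-centric case the values of interest sit in attributes and leaf elements; in the document-centric case the reading order of the text and its inline markup carries the meaning.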

Local use

LC experience or existing holdings: Used by LC to represent metadata records (including MARC bibliographic and authority records, MODS, METS) for web-compatible interchange, in particular using the Open Archives Initiative Protocol for Metadata Harvesting and SRW (Search/Retrieve Web Service).
LC preference: May be a preferred format for textual content, metadata records, or as a wrapper format for complex digital objects if conformant to an appropriate standard or agreed DTD or schema that can be used for technical validation. LC will express preferences based on specific DTDs or XML Schema specifications. LC will prefer XML that represents the structure of documents rather than their layout.

Sustainability factors

Disclosure: Open standard. Developed by W3C (World Wide Web Consortium). To be useful for interoperability or long-term content preservation, an XML document must be associated with a schema specification for the elements and tags it contains. Such schema specifications (see XML_DTD and XML_SCHEMA) must also be disclosed.
  Documentation: Extensible Markup Language (XML) 1.1, W3C Recommendation, 04 February 2004
Adoption: Very widely adopted as the basis for interchange of documents and data over the Web. Many generic tools exist, including free and open source software. Major software vendors have all incorporated support for XML in some form.
  Licensing and patent claims: None
Transparency: XML is human-readable and designed for straightforward automatic parsing. For the contents to be understood, a well-documented DTD, XML Schema, or other specification is needed. Human-comprehensible element tags are advantageous for transparency.
Self-documentation: XML is widely used as a syntax for metadata, and metadata for all purposes can be embedded in XML documents with appropriate schema specifications.
External dependencies: None
Technical protection considerations: None
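
As an illustration of the transparency point above, the sketch below checks whether a text is well-formed XML and lists its element names, using only Python's standard library. It does not validate against a DTD or XML Schema (xml.etree.ElementTree is a non-validating parser), so understanding what the elements mean still depends on a disclosed schema specification. The function names and sample inputs are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as well-formed XML.

    This is a syntax check only; it says nothing about validity
    against a DTD or schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def element_names(xml_text: str) -> list:
    """Return the distinct element names used in the document, sorted."""
    root = ET.fromstring(xml_text)
    return sorted({element.tag for element in root.iter()})


if __name__ == "__main__":
    sample = "<record><title>XML</title><date>2004</date></record>"
    print(is_well_formed(sample))
    print(element_names(sample))
```

Descriptive tags such as title and date can be read without tooling, which is the advantage the transparency factor describes.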

Quality and functionality factors (text)

Normal rendering: XML can represent all Unicode characters, with UTF-8 as the default character encoding. XML tagging offers the potential to represent explicitly the logical structure of text, such as paragraphs and headings, and character emphasis (bold, italics, etc.). Effective support for normal rendering depends on an appropriate DTD or schema specification.
Integrity of structure: XML is ideal for representing document structure.
Integrity of layout: For textual content, best practice is to have the XML represent the logical document structure and to use stylesheets to render the text in a form appropriate for the end user.
Integrity of rendering of equations, etc.: Requires specialized markup (e.g., MathML) and a corresponding rendering engine. Scholars in many scientific disciplines are not satisfied with the performance of such rendering engines.
Beyond normal rendering: Depends on the particular DTD or schema specification.
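
The character-encoding point under normal rendering can be illustrated with a short round trip. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the element name title and the helper round_trip are hypothetical.

```python
import xml.etree.ElementTree as ET


def round_trip(text: str, encoding: str) -> str:
    """Serialize text inside a <title> element, then parse it back."""
    root = ET.Element("title")
    root.text = text
    # tostring() returns bytes in the requested encoding.
    data = ET.tostring(root, encoding=encoding)
    return ET.fromstring(data).text


if __name__ == "__main__":
    sample = "日本語 café"
    for name in ("utf-8", "utf-16"):
        print(name, round_trip(sample, name) == sample)
```

The same characters survive in both encodings, which matches the requirement that conforming processors support UTF-8 and UTF-16.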

File type signifiers

Tag type | Value | Note
Filename extension | xml | Common practice for XML document instances is to use the .xml extension. The particular XML Schema or DTD should be declared within the document.
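
Because the filename extension alone does not identify the schema, a tool may want to read the declaration inside the document. The following is a rough sketch, assuming the DTD is named in a DOCTYPE declaration and the XML Schema in an xsi:schemaLocation attribute on the root element; other conventions exist. The sample documents, the example.org URLs, and the function name declared_schemas are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

# Namespace of the XML Schema instance attributes (xsi:schemaLocation).
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical sample documents.
DTD_DOC = '<!DOCTYPE book SYSTEM "book.dtd"><book/>'
XSD_DOC = (
    '<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://example.org/ns http://example.org/ns.xsd"/>'
)


def declared_schemas(xml_text: str) -> dict:
    """Report the DTD system identifier and schemaLocation, if declared."""
    found = {"doctype_system_id": None, "schema_location": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called by expat when it encounters a DOCTYPE declaration.
        found["doctype_system_id"] = system_id

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)

    # ElementTree exposes namespaced attributes as {namespace}localname.
    root = ET.fromstring(xml_text)
    found["schema_location"] = root.get("{%s}schemaLocation" % XSI)
    return found


if __name__ == "__main__":
    print(declared_schemas(DTD_DOC))
    print(declared_schemas(XSD_DOC))
```

The sketch only reports what the document declares; it does not fetch the DTD or schema and does not validate against it.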

Notes

General 
History: "XML is primarily intended to meet the requirements of large-scale Web content providers for industry-specific markup, vendor-neutral data exchange, media-independent publishing, one-on-one marketing, workflow management in collaborative authoring environments, and the processing of Web documents by intelligent clients. It is also expected to find use in certain metadata applications. XML is fully internationalized for both European and Asian languages, with all conforming processors required to support the Unicode character set in both its UTF-8 and UTF-16 encodings. The language is designed for the quickest possible client-side processing consistent with its primary purpose as an electronic publishing and data interchange format." [from 1997-12-08 W3C press release]

Format specifications

URLs
http://www.w3.org/TR/xml11/. Extensible Markup Language (XML) 1.1, W3C Recommendation 04 February 2004

Useful references

URLs
http://www.w3.org/XML/


Last updated Monday, 06-Mar-2006 07:38:53 EST