ATTACHMENT 1
THE MICROFILM COLLECTIONS
The information in this section is intended to provide an overview of the microfilm which is to be scanned and some problems which the films may present. The camera master negatives and printing masters (or duplicate negatives) which form the Library of Congress master microform collection are held in the Library's microform vaults. They are currently the custodial responsibility of the LC Photoduplication Service. Service copies (positive copies) are available to researchers in the public reading rooms.
1.1 Terminology for Collection Formats
The following terminology is used in this solicitation to describe the collections formats of Library microfilm which will be scanned. The typical level of physical and bibliographic access available for the original materials, which may also appear on the microfilm, is also noted.
Manuscripts
Unique documents like letters or typed reports, typically cataloged by broad collection title, and typically organized by series, subseries, containers and folders. The original documents are physically housed in file folders which are then placed into containers or boxes. Access to these various levels and the individual documents on the microfilm is through printed or machine finding aids or indexes which are often filmed with the materials.
Monographs
Books and pamphlets, typically cataloged as separate entities. Access is through unique, separate monograph records, or sometimes through collections-level records. When available, catalog records often appear on the microfilm.
Serials
Periodicals and journals, including magazines, typically cataloged by title but not by issue or article. When serials are microfilmed, detailed descriptions of them, often to the issue level, may be provided in collation records or guides to contents of the microfilm. Typically, these appear on the microfilm. The microfilms may also include cumulative indexes prepared by the publishers of the serials.
1.2 Content of LC Microfilm
Capturing the text and other content of documents contained on the microfilm is the major objective of the NDL Program. However, there is considerable material filmed at the beginning, within, and at the end of a reel of microfilm which serves to inform the viewer of the contents of a reel or a collection, as well as any anomalies or irregularities in the original material which are also reproduced on the microfilm. The Library will provide guidelines regarding what explanatory material and targets are to be scanned for each
Explanatory material - - Preservation microfilm contains explanatory material at the beginning of a reel (termed head-of-reel information) and also at the end of a reel (termed end-of-reel information). The explanatory information at the end of the reel often repeats some of the frames which appear at the head of the reel. These materials include targets (termed eye-legible when they can be read without magnification) and other information: overviews, guides to the contents of collections, narrative descriptions, bibliographic information, catalog records or cards, finding aids, copyright information, and other associated information which serves to inform the reader of the microfilm about the content, extent and sequencing of the entire collection or of the material contained on that reel. Generally speaking, head-of-reel and end-of-reel explanatory information will not be scanned.
Technical targets - - The frames at the head-of-the-reel also include technical targets or resolution targets which are filmed to provide a method to measure the resolution or the line pattern resolved on the film. Current practice requires that a resolution target also appear at the end of the reel. However, a review of some of the film being considered for scanning under the NDL Program shows that resolution targets were not routinely filmed.
NOTE: Technical targets may be important when the film receives a pre-scanning analysis, but scanned images of technical targets will not be a part of the final digital collection.
Irregularities targets - - Anomalies and irregularities in the original material which was filmed are noted by filming targets which identify the problem, for example, targets noting that material is missing, that the original is in poor or deteriorated condition, or that there are defects in the original.
1.3 Categories of Microfilm and Filenaming Structure
This solicitation presents five different categories or materials formats which are expected to cover the microfilm materials selected for scanning. These categories are identified not only by format but most importantly by the filenaming systems which have been devised for the resulting digital images. Section C.4. provides detailed information about the filenaming procedures and directory structure which the contractor is required to provide with the images.
A sample frame sequence for each of the five categories which includes head-of-reel and end-of-reel information is included on pages J-7 - J-11 of this attachment.
The categories are:
1.4 Special Problems--Variations in Filming Practice
As noted above, there is considerable variation in LC Photoduplication Service filming practices because procedures were revised or enhanced over the years and they also differ according to the collections format converted to film (monographs, serials, manuscripts). Examples of complex microfilm frames are illustrated on pages J-15 and J-16. Explanatory information appearing on the film and the technical specifications used in filming can vary from collection-to-collection and from reel-to-reel. However, there is also much consistency in the microfilm, if the original material was also consistent in size and content and when established filming practices were used.
Due to variance in the film, it is likely that adjustments will be required routinely at scanning so that the information in the microfilm frames can be successfully captured and the Library's filenaming requirements are accurately completed. It shall be essential for contractor staff to fully evaluate the selected microfilm and recognize the features and characteristics which are discussed in Section C.4.3. Some of the variations in filming practice, which need to be considered when scanning, include the following:
LIST OF SELECTED MICROFILM
TARGETS THAT CAN APPEAR ON LC FILM
START
END
End of Reel/
Please Rewind
FILMED AS BOUND
Some Pages in the Original Contain Flaws and
Other Defects Which Appear on the Film
Blank Pages Not Filmed
REEL NO:
Volume(s) Missing
Page(s) Missing
Issue(s) Missing
Continued on next reel
Series No.
Title
Container
Best Copy Available
Material listed as missing, if located at a later time
may be added to the end of the reel.
There were in the original file some
pages containing mutilations and other
defects. These unavoidably constitute part
of the filmed file.
ATTACHMENT 2
RESEARCH USE OF LIBRARY OF CONGRESS IMAGE COLLECTIONS
2.1 Display-screen Viewing and Printed Output
The students and researchers who use Library of Congress collections online desire the ability to view the images on their computer display screens and to print copies, typically on a laser printer. Most students and researchers use current-generation color-capable display systems with resolutions of 1024 x 768 or 1280 x 1024 pixels; their printers are likely to be capable of printing at settings of 300 or 600 dpi.
For the foreseeable future, access to Library of Congress collections will be provided using software associated with the World Wide Web protocols for Internet.
Informal experiments by the Library of Congress suggest that the image type that works best for display may not be the type that works best for printing. Display systems often produce the greatest legibility (and thus the best results) with a grayscale image. But printers often do best with a bitonal image. When a grayscale image is printed it must be "halftoned" and this tends to break up small features like fine print.
Generally speaking, students and researchers using Library of Congress text-based (as opposed to pictorial) collections place greater importance on the printed output than on screen display. They do not always view a document page as an end in itself, but typically will use the information that they find in the documents when they write their own articles or reports. Although some researchers may "carry away the document" on a floppy disk, most will prefer to print it and carry away a sheet of paper for later reference.
In past paper-scanning projects, the preference for printed output over screen display has led the Library to favor bitonal images. More recent explorations, however, have shown that a laser printer's representation of a grayscale image can be very good. In one informal experiment, for example, some manuscript pages were scanned from the original paper at 150 and 300 dpi. Using graphic-arts software, the laser printer was set for 600 dpi output (which affected the way in which the halftoning occurred) and the resulting paper copy was very legible.
For this procurement, the Library seeks proposals to create images that will display and print successfully for researchers working in contexts like those described above, with the greatest emphasis placed on successful printing.
2.2 Scaling at Output Time and Capture Resolution for High-detail Content
The researchers who access Library of Congress collections via Internet employ a variety of software packages, ranging from modest freeware associated with World Wide Web browsers to sophisticated graphic arts software for image handling. With varying degrees of effectiveness, this software scales (changes the sizes of) the images at display and print time. As noted above, the Library findings thus far suggest that, for documents (as distinct from pictorial matter), printed output is of greater importance to users than screen display. Typically, a researcher's personal computer will have a laser printer as a peripheral device; the Library's digital images must be conveniently printed within such a system.
The Library recognizes the emergent state of software associated with the World Wide Web and is well aware of the shortage of available tools for certain image types, in particular viewing and printing software appropriate for bitonal images with TIFF headers and CCITT group 4 compression. In fact, the Library is planning to make a special arrangement to offer viewing and printing software for this purpose to researchers who wish to use Library collections via the World Wide Web.
Although some researchers wishing to print document images may be limited to software like
that described in the preceding paragraph, many others will have additional graphic arts or other
software (not intended for use "within" the World Wide Web environment) capable of handling
raster-scanned images.
ATTACHMENT 3
IMAGE RESOLUTION AND IMAGE QUALITY
3.1 The Analysis of Spatial Resolution
Various methods may be used to determine the appropriate levels of spatial resolution for digital images. A starting point, of course, will be a determination of the actual "delivered" resolution of the film itself.
The determination of the film's delivered resolution may result from an examination of resolution targets appearing on the film or other features that permit actual measurement. Since not many Library of Congress films produced during the period under discussion include images of resolution targets, in many cases, the contractor's analysis will have to be based on the creation and comparison of images produced in different ways and/or at different levels of scanning resolution. In such tests, for example, lines of representative text (that is lines of printed or written characters) may be scanned and examined to determine the actual level of film spatial resolution.
For reference, the Library offers this brief summary of informal tests carried out with a handful of Library microfilms during 1995. The films were scanned with a device that was reported to apply resolutions ranging from 1200 to 4000 dpi to the film. Both grayscale and bitonal images were produced. In an informal review of the resulting images, significant differences were observed when the 1200 and 2700 dpi images were compared; the differences between the 2700 and 4000 dpi images were insignificant or negligible. This informal finding suggests that many Library films may not contain recoverable data beyond the level of about 3000 dpi. For materials at a reduction ratio of 12:1, this suggests that the recoverable resolution in terms of the original document may be on the order of 250 dpi.
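The arithmetic behind the 250 dpi estimate can be written out as a short calculation. The following Python fragment is a minimal sketch only; the function name is hypothetical, and it assumes the reduction ratio is linear (a 12:1 ratio means each dimension of the document is reduced twelvefold on the film).

```python
def document_referenced_dpi(film_dpi: float, reduction_ratio: float) -> float:
    """Convert a resolution measured on the film into the equivalent
    resolution in terms of the original document.

    A reduction ratio of 12:1 is passed as 12. The ratio is assumed to be
    linear, i.e., applied to each dimension of the document.
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction ratio must be positive")
    return film_dpi / reduction_ratio


# About 3000 dpi on the film at a 12:1 reduction ratio
print(document_referenced_dpi(3000, 12))  # 250.0
```

The same calculation applies to any job for which the reduction ratio is known; where it is not known, the result can only be an estimate.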
Based on the analysis performed on the sample films prepared for any given job under terms of this procurement, the contractor will recommend a course of action for the job to the Library. This course of action, of course, will take into account such factors as whether the delivered images will be grayscale (for which matching the spatial resolution of the film may be recommended) or bitonal (for which higher resolution may be recommended to compensate for the reduction of scanned data to one bit-per-pixel).
The analysis and recommendations will be reviewed by the Library's project leader and work will proceed after the leader gives his or her approval of the proposal.
3.2 Genuine, Interpolated, and Nominal Spatial Resolution
The reduction ratios for Library microfilms are not always known nor is it always possible to state the original dimensions of the documents on the microfilms. For this reason it will often be difficult to state the spatial resolution of the digital images in reference to the original documents.
It will, however, be possible to state the resolution in terms of the film image itself, as suggested by the preceding section. The numerical value of this film-reference resolution, however, should not be recorded in image file headers or their equivalent if such recordation will cause printers to output "postage stamp" or other deviant-sized hard copy. The resolution in terms of the film shall be provided in the analytic documents created before a job begins and in the documentation that accompanies the digital images, e.g., in or in association with the scanning log.
When this film-reference resolution is given, it shall be stated in genuine terms, i.e., the actual optically achieved spatial resolution of the image. The numerical value shall not be based upon interpolation, i.e., the achievement of high levels of resolution by the use of computer algorithms that "fill in" missing pixels.
The resolution stated in the delivered file headers or their equivalent shall be a nominal resolution that represents an approximation of the resolution as referenced to the original document. When the film's reduction ratio or the original document size is known, then the nominal resolution shall be as accurately rendered as is practical. When the reduction ratio and original document size are unknown, a reasonable estimate shall be provided. In every case, the nominal resolution shall be agreed upon during the analysis of the film that precedes a given job.
As noted in Section C.3.1.2, the digital image headers or their equivalent shall be such as to permit easy printing with a standard-type laser printer. If this means that it is inappropriate to place the nominal resolution value in such a header or equivalent, then the contractor shall report the nominal resolution in the scanning log or other report and place in the header the resolution value that will yield the desired outcome when printing the image.
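The "postage stamp" problem described in this section follows from the way printing software commonly derives physical size from the header resolution: pixels divided by dots per inch. The Python fragment below is a minimal sketch of that relationship. The function name and the 2400-pixel frame width are hypothetical values chosen for illustration, and it assumes the printing software honors the header resolution without rescaling.

```python
def printed_width_inches(pixel_width: int, header_dpi: float) -> float:
    """Width of the hard copy when the printing software honors the
    resolution value recorded in the image header."""
    if header_dpi <= 0:
        raise ValueError("header resolution must be positive")
    return pixel_width / header_dpi


# Hypothetical frame, 2400 pixels wide, scanned at a genuine 3000 dpi
# in terms of the film, from film with a 12:1 reduction ratio.
pixels = 2400

# Film-reference resolution in the header: a "postage stamp" print.
print(printed_width_inches(pixels, 3000))  # 0.8

# Nominal resolution (3000 / 12 = 250 dpi) in the header: a print at
# approximately the size of the original document.
print(printed_width_inches(pixels, 250))   # 9.6
```

This is why the genuine film-reference value belongs in the scanning log or other documentation, while the header carries the value that yields usable printed output.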
3.3 Suppressing Print Through
To the degree possible, the Library desires images in which legibility of front-of-sheet writing is enhanced by the suppression of printing or other marks that may show through from the back of the sheet. This print-through is often of a lighter tone than the ink or data on the facing side of the sheet and, on the film, the human eye can "tune it out." However, in scanning--especially to produce a bitonal image--there is a severe risk that the threshold setting will render the lighter-tone print-through as black, i.e., at the same tonal value as the desired text. The resulting mix of desired characters and undesired "noise" degrades the legibility of the page.
The capture of legible images of handwritten documents may also be made difficult by show-through, presenting the same risk as described for printed matter. Handwritten documents also present the challenge of tonal information: some marks (e.g., ink) may be darker than others (e.g., pencil). If a bitonal image is produced, sophisticated thresholding is required to capture both light and dark marks. It is understood that a grayscale digital image will retain the tonal characteristics of the microfilm.
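The threshold risk described in this section can be illustrated with a few pixel values. The Python fragment below is a minimal sketch, not a recommended thresholding method. All pixel values are hypothetical, and it assumes 8-bit grayscale with positive polarity (0 is black, 255 is white) and a single global threshold.

```python
def to_bitonal(gray_row, threshold):
    """Reduce 8-bit grayscale values to one bit per pixel.

    Values darker than the threshold become black (1); all others become
    white (0). Assumes 0 = black and 255 = white.
    """
    return [1 if value < threshold else 0 for value in gray_row]


# Hypothetical scan line: paper is about 245-252, front-of-sheet ink is
# about 40-45, and print-through from the back is about 165-170.
row = [250, 40, 245, 170, 248, 45, 165, 252]

# A threshold set too high renders the print-through as black,
# at the same tonal value as the desired text.
print(to_bitonal(row, 200))  # [0, 1, 0, 1, 0, 1, 1, 0]

# A lower threshold suppresses the print-through and keeps the ink.
print(to_bitonal(row, 100))  # [0, 1, 0, 0, 0, 1, 0, 0]
```

A lower global threshold would also drop any light pencil marks of about the same tone as the print-through, which is why more sophisticated thresholding is needed when both light and dark marks must be captured in a bitonal image.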
3.4 Suppressing Moire Pattern Interference for Printed-Halftones
The Library desires "clean/clear" reproductions of illustrative matter on the microfilm. It is understood that high-quality capture of illustrative material can be difficult when the source microfilms do not reproduce illustrations very well.
The production of clean digital images when the microfilm reproduces printed halftones in a book can be especially difficult. The digital images may be marred by moire patterns, caused when the "frequency" of the original printed-halftone (resolution in lines per inch) encounters the implicit grid of the scanning device, with its own frequency (resolution in dots per inch).
Approaches that exist to solve this problem include at least the following: the use of a dithering algorithm at scan time to "randomize" the implicit grid produced by the scanner, the use of grayscale imaging (although this may only defer moire problems to the point of image printing), and the use of a de-screening and rescreening algorithm such as that employed by certain Xerox scanners.
ATTACHMENT 4
IMAGE FILENAMES AND DELIVERY DIRECTORIES
4.1 Naming Files and Directories
The contractor shall assign a digital-image filename to each image captured as part of the initial image-capture process, and deliver these files to the Library in a certain arrangement of delivery directories and subdirectories, each containing no more than 300 files. Directory names and filenames shall conform to DOS naming conventions and, when alphabet letters are used, these shall be lower case.
The Library will specify what is called an identifier for the name of a delivery directory. An identifier is the prefix or left-side (right-truncated) portion of a name that may contain as many as eight characters. (See section C.4.1)
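A delivery can be checked against these naming rules before it is shipped. The Python fragment below is a minimal sketch of such a check. The function names are hypothetical, and it assumes that "DOS naming conventions" means a base name of one to eight characters with an optional extension of one to three characters, restricted here to lower-case letters and digits (DOS itself permits some additional characters).

```python
import re

# Assumption: names are limited to lower-case letters and digits,
# in the 8.3 pattern (base of 1-8 characters, optional 1-3 character
# extension).
_DOS_NAME = re.compile(r"[a-z0-9]{1,8}(\.[a-z0-9]{1,3})?")

MAX_FILES_PER_DIRECTORY = 300


def is_valid_delivery_name(name: str) -> bool:
    """True if a file or directory name fits the assumed 8.3 pattern."""
    return _DOS_NAME.fullmatch(name) is not None


def directory_within_limit(filenames) -> bool:
    """True if a delivery directory holds no more than 300 files."""
    return len(filenames) <= MAX_FILES_PER_DIRECTORY


print(is_valid_delivery_name("000435a.tif"))    # True
print(is_valid_delivery_name("lp000100"))       # True
print(is_valid_delivery_name("LP000100"))       # False (upper case)
print(is_valid_delivery_name("0004350ab.tif"))  # False (nine-character base)
```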
4.2 Naming files and directories: Five Structures
Different collections will require different structures for assigning filenames and naming directories. The Library identifies the five structures listed below for this contract.
4.3 FILENAME/DIRECTORY STRUCTURE 1: NUMBERED DOCUMENT STRUCTURE
Generally speaking, the numbered document structure applies to certain manuscript collections, e.g., the presidential papers collections. Every leaf (an individual sheet of paper) received a sequential number when the collections were processed in the 1930s, 1940s, and 1950s. The number was stamped on the leaf with a rubber stamp and called a leaf number. In some cases, the documents were then mounted on larger sheets in bound volumes and the leaf number (still on the document proper) is then called a mounting number.
Many leaves have writing on both sides; a leaf may thus "contain" two pages. On the microfilms of manuscript collections, the back side has been filmed if it contains a marking of any kind. The back side of a leaf, of course, always appears on the film following the front side. The rubber-stamped leaf number, however, does not appear on the back side.
The contractor shall assign filenames based upon the leaf or mounting number. Depending upon the collection, this number may reach six digits, e.g., 140862. The filename consists of the mounting number, with leading zeros added as needed to create a six-digit expression. In addition, the letter a is added to the six-digit expression to indicate that this image reproduces the front or numbered side of the leaf. For example, the image of the front of leaf 435 shall be assigned the filename 000435a.jpg (or 000435a.tif); the image of the front of leaf 140862 shall be assigned 140862a.jpg (or 140862a.tif).
Digital images shall be created for all back sides that appear on the film. These shall receive the same number as the front, with the substitution of the letter b for the letter a as the seventh character in the filename. For example, if the microfilm contains images for the front and back of leaf number 435, the two images shall be assigned the filenames 000435a.jpg (or 000435a.tif) and 000435b.jpg (or 000435b.tif).
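The numbered-document filename rule can be expressed compactly. The Python fragment below is a minimal sketch; the function name and its parameters are hypothetical, and the extension is passed in because the examples above show both jpg and tif.

```python
def leaf_filename(leaf_number: int, side: str = "a", ext: str = "tif") -> str:
    """Build a Structure 1 filename from a leaf or mounting number.

    side is "a" for the front (numbered) side of the leaf and "b" for
    the back side.
    """
    if side not in ("a", "b"):
        raise ValueError("side must be 'a' (front) or 'b' (back)")
    if not 1 <= leaf_number <= 999999:
        raise ValueError("leaf number must fit in six digits")
    # Leading zeros pad the number to a six-digit expression.
    return f"{leaf_number:06d}{side}.{ext}"


print(leaf_filename(435, "a", "jpg"))     # 000435a.jpg
print(leaf_filename(435, "b", "jpg"))     # 000435b.jpg
print(leaf_filename(140862, "a", "tif"))  # 140862a.tif
```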
At the time that a job for a particular numbered-document collection is assigned, the Library will provide written instructions, a copy of the finding aid (in print and/or in machine-readable form), and will also mark sample reels to show typical patterns for head-of-reel information and similar features.
Manuscript collections with leaf numbers are typically numbered consecutively throughout. Sets of pages (images) not to exceed 200 (100 leaves) shall be grouped in directories for delivery. Thus the name assigned to each directory will indicate the leaf numbers included within. The quantity limit has been set to facilitate ease of handling at the Library. The table that follows outlines a directory structure for the Lincoln Papers, to be created by the contractor when the images are delivered.
Directory name assigned by contractor | Images |
lp000000 (Note: "lp" stands for "Lincoln Papers") | All images for leaves through number 99 (e.g., 000001.tif through 000099.tif) |
lp000100 | Leaves 100-199 |
lp000200 | Leaves 200-299 |
continues as needed | |
lp045000 | Leaves 45,000-45,099 |
In this structure, missing, repeated, and unscannable pages or documents shall be recorded in the scanning log.
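The directory name for any leaf follows from the grouping of 100 leaves per directory. The Python fragment below is a minimal sketch; the function name is hypothetical, and it assumes the directory name is the Library's identifier followed by the six-digit number of the first leaf in that group of 100, as the table rows lp000000, lp000100, and lp000200 suggest.

```python
def leaf_directory(identifier: str, leaf_number: int,
                   leaves_per_directory: int = 100) -> str:
    """Name of the delivery directory that holds a given leaf.

    The numeric part is the leaf number rounded down to the start of
    its group, padded to six digits.
    """
    if leaf_number < 0 or leaf_number > 999999:
        raise ValueError("leaf number must fit in six digits")
    first_leaf = (leaf_number // leaves_per_directory) * leaves_per_directory
    return f"{identifier}{first_leaf:06d}"


print(leaf_directory("lp", 57))     # lp000000
print(leaf_directory("lp", 100))    # lp000100
print(leaf_directory("lp", 199))    # lp000100
print(leaf_directory("lp", 45012))  # lp045000
```

With a two-letter identifier the resulting name is eight characters long, which fits the DOS limit for directory names.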
4.4 FILENAME/DIRECTORY STRUCTURE 2: UNNUMBERED DOCUMENTS IN FOLDER STRUCTURE
Generally speaking, this structure applies to certain manuscript collections, e.g., the Booker T. Washington Papers. The documents in these collections have been placed in separate file folders within certain logical elements: series and subseries. Each folder, series, and subseries represents units that cohere intellectually. In addition, the folders are stored in containers (boxes), in sequence. Each collection's organization, including a list of series, containers, and folders, is found in the collection's printed finding aid. The following table illustrates this form of organization:
Collection
Note: The folders are placed in containers (boxes), until each container is filled. Often, these containers are numbered. Thus, a researcher may identify materials within a collection logically, i.e., by series, subseries, and folder; or physically, by container number and folder name or number.
The Library's digital presentation of these collections will be organized in the logical structure, i.e., by series and folder. Researchers will access the collections by means of an online finding aid (based on the existing printed finding aid) to be produced by the Library. Although the containers do not represent logical units, their numbers may be employed in directory naming in order to retain the sequence of folders in the collection.
The Library will provide the contractor with a copy of the printed finding aid (or an equivalent list) at the time of scanning. If containers and folders have not been previously numbered, the Library will assign numbers by marking the finding aid or list. The marked-up finding aid or list will indicate the identifier for each series and folder. The Library will also indicate typical patterns for head-of-reel information and similar features.
For collections in this category, the contractor shall deliver the images in a combination of directories and subdirectories. The highest level delivery directories will represent collection series, with lower-level directories representing folders. The individual files will reproduce the document pages within the folders, as captured on the microfilm.
Within the folders, the images receive sequential numbers. Folders generally contain a few hundred pages. Since they never contain more than 10,000 leaves (and thus will not exceed 9,999 pages to be imaged) the Library requires the use of a four-digit number (including leading zeroes) for page-image naming. The contractor shall assign filenames sequentially within the folder, i.e., 0001, 0002, 0003, 0004, etcetera.
4.4.1 Recognizing and marking new documents
A requirement associated with folder-based manuscript collections is the recognition of "new documents." (See also section C.4.3.2) Documents in folders tend to be letters, reports, and other written or printed items. In order to aid future researchers, each image that represents the start of a new document shall be marked by adding the letter d to the filename in the last position before the filename extension, e.g., 0001d.jpg.
Recognizing new documents means observing that the next image represents the start of a letter (which may be indicated by letterhead, date, salutation, etc.) or the first page of a report (which may be indicated by a title, author's name, page 1, etc.). Miscellaneous pieces of paper (for example, scribbled notes, 3 x 5 slips, or groups of small items, etc.) shall also be treated as new documents.
4.4.2 Recognizing and marking missing, unscannable, or repeating pages
In the unnumbered document/file folder structure, missing or unscannable documents shall (1) be recorded in the scanning log and (2) the specific image numbers shall be left unassigned to permit the insertion of the missing images in the future. The Library requires 100 percent accuracy in the identification of missing and unscannable documents.
If any repeating images are identified by the contractor (50 percent accuracy requirement) and scanned, the image number shall not increment and the letter p shall be added to the filename for the second occurrence of that page, e.g., 0008p.jpg. As stated in section C.4.3 above, the contractor may choose not to scan any repeating images that are identified.
4.4.3 Example of unnumbered document/file folder collection
The table that follows offers an example of a directory and filename structure for a folder-based manuscript collection.
Finding Aid information | Identifier for directory provided by LC | Name assigned to directory by contractor | Identifier for folder provided by LC | Name assigned to folder sub-directory by contractor | Image filenames assigned by contractor |
Series: Correspondence | gpcor | gpcor001 (may be more directories if large series) | | | |
Container: 816 Folder (no. 23): "Letters, January-March 1876" | | | 81623 | 81623 | |
Image number 1, start of first document, feature recognized by contractor | | | | | 0001d.jpg |
Remaining pages of first document; image nos. 2-5 | | | | | 0002.jpg 0003.jpg 0004.jpg 0005.jpg |
Image number 6, first page of new document, feature recognized by contractor | | | | | 0006d.jpg |
Remaining pages of second document; image numbers 7-10. One page appears on the microfilm twice and is scanned (contractor's option) | | | | | 0007.jpg 0008.jpg 0008p.jpg 0009.jpg 0010.jpg |
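The numbering rules in sections 4.4.1 and 4.4.2 can be combined into one pass over the frames of a folder. The Python fragment below is a minimal sketch that reproduces the filenames in the table above. The Frame class and its fields are hypothetical stand-ins for the contractor's own observations of each film frame, and the sketch assumes a repeated page is never also marked as the start of a new document.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One film frame in a folder, as observed by the operator."""
    new_document: bool = False  # first page of a letter, report, etc.
    repeat: bool = False        # second occurrence of the preceding page
    missing: bool = False       # missing or unscannable


def folder_filenames(frames, ext="jpg"):
    """Assign four-digit sequential filenames within one folder.

    Returns one entry per frame; None marks a number left unassigned
    for a missing or unscannable page (to be recorded in the scanning log).
    """
    names = []
    number = 0
    for frame in frames:
        if frame.repeat:
            if number == 0:
                raise ValueError("a repeat cannot be the first frame")
            # The image number does not increment for a repeated page.
            names.append(f"{number:04d}p.{ext}")
            continue
        number += 1
        if number > 9999:
            raise ValueError("folder exceeds the four-digit limit")
        if frame.missing:
            names.append(None)
            continue
        suffix = "d" if frame.new_document else ""
        names.append(f"{number:04d}{suffix}.{ext}")
    return names


frames = ([Frame(new_document=True)] + [Frame()] * 4
          + [Frame(new_document=True), Frame(), Frame(),
             Frame(repeat=True), Frame(), Frame()])
print(folder_filenames(frames))
# ['0001d.jpg', '0002.jpg', '0003.jpg', '0004.jpg', '0005.jpg',
#  '0006d.jpg', '0007.jpg', '0008.jpg', '0008p.jpg', '0009.jpg', '0010.jpg']
```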
4.5 FILENAME/DIRECTORY STRUCTURE 3: BIBLIOGRAPHIC RECORD/PRINT-PAGE NUMBER STRUCTURE
This structure will generally be used for monographs (books and pamphlets). The Library will supply a list or a simple database for the group of monographs to be scanned. The key elements on this list or database will be:
Collection name
Example: Western Travel in Rare Books
Collection identifier
Example: wtrb
Book author and/or title
Example: White, Michael Claringbud. California all the way back to 1828.
LCCN: the Library of Congress catalog card number (a unique number)
Example: ca4-14356 (stored in computer as 04014356)
Book identifier
Example: 014356
Text conversion yes/no
Example: No
Before scanning, the contractor shall verify that the next item on the microfilm is the correct one by examining the filmed target, filmed bibliographic record ("catalog card") if present, or monograph title page. (See pages J-37 and J-38 for a sample catalog card and record) This target and/or bibliographic record will provide the book's author, title, and LCCN. The filmed target and/or bibliographic record is to be scanned to permit the verification of the item during the Library's quality review process.
All of the images of book pages shall be delivered to the Library in a directory assigned the name of the book identifier, 014356 in the first example above.
The individual image filenames shall be assigned as outlined in the following two sections.
At the time that a task for a particular printed matter collection is assigned, the Library will provide written instructions, a copy of the finding aid or equivalent (in print and/or in machine-readable form), and will also identify typical patterns for head-of-reel information and similar features.
Specific instructions about scanning blank pages (pages with no marks of any kind) will be developed for each collection. In general, the rule will be to omit blank pages from the image set and numbering structure. When confronting two-page spreads, both sides would have to be devoid of marking before the omit-page rule took effect.
4.5.1 Structure 3A -- Filenames for book page images when printed page numbers are tracked
Overall name pattern: cccppppf
NOTE: ccc means image control number, pppp means print page number, and f means feature.
ccc Image control number
The first three digits are used to assign a set of sequential numbers to all of the images for the book. The ccc for the image of the target is 000. The first actual image from the book is assigned control number 001. This image may reproduce the book cover, title page, or page preceding the title page, depending upon the microfilm.
If the contractor encounters missing or unscannable film frames within a specific issue, the relevant control number shall be left unassigned to permit future capture and insertion of the image in the set. A note of this discovery shall also be made in the scanning log. If repeating film images are identified and scanned (contractor's option), the control number shall increment in the usual way and (as noted below) the repeat noted as a feature.
pppp Printed page number
FOR SINGLE-PAGE IMAGE SETS. Digits four through seven carry the actual printed page number for the page reproduced. The page number is to be represented with leading zeros. The contractor must determine this number by examining the image itself.
If the printed number is arabic, then it is simply keyed in, with leading zeros.
If the number is roman, the lead digit (first of this set of four; fourth digit in the overall filename) shall be r, and the remaining three digits in this set (digits five, six, and seven in the overall filename) will represent the arabic translation of the roman number.
If there is no printed page number, then 0000 shall be keyed.
FOR TWO-PAGE-SPREAD IMAGE SETS. Digits four through seven carry the actual printed page number for the left-hand page of the pair reproduced.
If the printed number on the left is arabic, then it is simply keyed in, with leading zeros.
If the number on the left is roman, the lead digit (first of this set of four; fourth digit in the overall filename) shall be r, and the remaining three digits in this set (digits five, six, and seven in the overall filename) will represent the arabic translation of the roman number.
If there is no printed page number on the left, then 0000 shall be keyed.
f Feature
Digit eight indicates that the page or pages contain a special feature. The contractor must recognize the feature by examining the image itself. The abbreviations for the features to be indicated are as follows:
g Title Page (if the work has more than one title page, indicate the main title page if that can be easily determined; if not, indicate the first)
n Table of Contents (if more than one page, indicate all pages)
l List of Illustrations (if more than one page, indicate all pages)
x Index (if more than one page, indicate all pages)
p Repeating page image (see section C.4.3)
y Irregularity target (see section C.4.3)
s Segmented material (see section C.4.3)
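The cccppppf pattern described above can be illustrated with a short Python sketch. The function names (roman_to_int, page_field, book_page_name) are hypothetical and are not part of this solicitation. The sketch covers only images that carry one of the listed feature codes, because this section does not state what is keyed in digit eight when a page has no special feature.

```python
from typing import Optional

# Feature codes listed for Structure 3A.
BOOK_FEATURES = set("gnlxpys")

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral: str) -> int:
    """Translate a roman numeral (either case) into an arabic integer."""
    total = 0
    previous = 0
    for ch in reversed(numeral.lower()):
        if ch not in ROMAN_VALUES:
            raise ValueError(f"not a roman numeral: {numeral!r}")
        value = ROMAN_VALUES[ch]
        if value < previous:
            total -= value  # subtractive notation, e.g. the i in iv
        else:
            total += value
            previous = value
    return total


def page_field(printed: Optional[str]) -> str:
    """Return the four-character pppp field for a printed page number."""
    if not printed:
        return "0000"  # no printed page number
    if printed.isdecimal():
        number = int(printed)
        if number > 9999:
            raise ValueError("arabic page number does not fit in four digits")
        return f"{number:04d}"
    number = roman_to_int(printed)
    if number > 999:
        raise ValueError("roman page number does not fit in three digits")
    return f"r{number:03d}"


def book_page_name(control: int, printed: Optional[str], feature: str) -> str:
    """Assemble ccc + pppp + f for one page image."""
    if not 0 <= control <= 999:
        raise ValueError("control number must fit in three digits")
    if feature not in BOOK_FEATURES:
        raise ValueError(f"unknown feature code: {feature!r}")
    return f"{control:03d}{page_field(printed)}{feature}"


print(book_page_name(1, None, "g"))    # 0010000g
print(book_page_name(12, "xiv", "n"))  # 012r014n
```

For a two-page spread, the same call applies with the printed number of the left-hand page passed as the second argument.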
4.5.2 Structure 3B -- Filenames for book page images when printed page numbers are not tracked
Generally speaking, this approach will be used when text conversion is planned. The converted texts will include SGML markup that will indicate the relationship between image control numbers and printed page numbers thus making it unnecessary to capture this information in the filenames.
Overall name pattern: cccf
ccc Image control number
The first three digits are used to assign a set of serial numbers to all of the images for the book.
The ccc for the image of the target is 000. The first actual page image from the book is assigned 001.
If the contractor encounters missing or unscannable film frames, the relevant control number shall be left unassigned to permit future capture and insertion of the image in the set. If repeating film images are identified and scanned (contractor's option), the control number shall increment in the usual way and the repeat noted as a feature.
f Feature
g Title Page (if the work has more than one title page, indicate the main title page if that can be easily determined; if not, indicate the first)
n Table of Contents (if more than one page, indicate all pages)
l List of Illustrations (if more than one page, indicate all pages)
x Index (if more than one page, indicate all pages)
p Repeating page image
y Irregularity target
s Segmented material
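Because control numbers for missing or unscannable frames are left unassigned, the gaps in a delivered image set can be listed mechanically, for example when cross-checking the scanning log. The following Python sketch is illustrative only; the function name unassigned_control_numbers is hypothetical. It reads only the first three characters of each name, so it makes no assumption about what follows the control number.

```python
from typing import Iterable, List


def unassigned_control_numbers(names: Iterable[str]) -> List[int]:
    """List control numbers skipped between the lowest and highest in use."""
    used = set()
    for name in names:
        prefix = name[:3]
        if len(prefix) != 3 or not prefix.isdecimal():
            raise ValueError(f"name does not start with three digits: {name!r}")
        used.add(int(prefix))
    if not used:
        return []
    return [n for n in range(min(used), max(used) + 1) if n not in used]


# Control numbers 003, 005 and 006 were left unassigned in this set.
print(unassigned_control_numbers(["001g", "002n", "004x", "007p"]))  # [3, 5, 6]
```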
4.6 FILENAME/DIRECTORY STRUCTURE 4: SERIAL STRUCTURE
This structure will be used for serials (e.g., periodicals, journals, magazines). The Library will supply a list or a simple database for the serials to be scanned. The key elements on this list or database will be:
Collection name
Example: Magazines for Children
Collection abbreviation
Example: mcgc
Serial title
Example: Wee Winkle Monthly
LCCN: the Library of Congress catalog card number (a unique number)
Example: 07-53986
ISSN: the International Standard Serial Number (a unique number)
Example: 45670923
Serial identifier
Example: 45670923
Issue enumeration
Example: January - December, 1918
Issue identifiers
Example: 191801, 191802, 191803, etc.
Cumulative index
Example: For 1918; appears at end of reel 14
Cumulative index identifier
Example: 1918in
Guide to Contents (Collation record identifier)
Example: mcgcgd
Before scanning, the contractor shall verify that the next item on the microfilm is the correct one by examining the "collation record" or "guide to contents of microfilm" that precedes the film frames at the head-of-the-reel that reproduce the pages of the serial. This collation record or guide, which may be more than one page long, shall also be scanned to permit the verification of the item during the Library's quality review process.
All of the subdirectories containing the images for each serial shall be delivered to the Library in a directory assigned the name of the serial identifier, 45670923 in the example above. New subdirectories shall be created for each issue (e.g., 191801, 191802, 191803), for collation records (e.g., 1918cl), and for cumulative indexes (e.g., 1918in).
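The directory layout just described can be sketched in Python as follows. The function names are hypothetical. The year-plus-two-digit-month form of the issue identifiers is an assumption inferred from the examples 191801, 191802, 191803; the actual identifiers are those supplied on the Library's list or database.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def monthly_issue_ids(year: int) -> List[str]:
    """Assumed yyyymm issue identifiers for a monthly serial."""
    return [f"{year}{month:02d}" for month in range(1, 13)]


def serial_subdirectories(
    serial_id: str,
    issue_ids: Iterable[str],
    other_ids: Iterable[str] = (),
) -> List[PurePosixPath]:
    """Place every issue, collation-record and index subdirectory
    under the directory named for the serial identifier."""
    root = PurePosixPath(serial_id)
    return [root / name for name in [*issue_ids, *other_ids]]


paths = serial_subdirectories(
    "45670923", monthly_issue_ids(1918), ["1918cl", "1918in"]
)
print(paths[0])   # 45670923/191801
print(paths[-1])  # 45670923/1918in
```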
If the film includes a double-issue serial-cover frame (see section C.4.3), the contractor shall create two images of each double-issue serial-cover frame, and place one in each of the directories for the issues involved. A notation of the action shall be made in the scanning log.
Any missing issues (typically indicated by the presence of an irregularities target) will result in a gap in the series of issue-level subdirectories. Contractors shall note in the scanning log the presence of any missing-issue targets. This note shall include the specific identification of the missing issues, if provided in the target. If the missing issue is not identified on the target, then the contractor shall note the identification of the next issue actually appearing on the film.
At the time that a task for a particular serial collection is assigned, the Library will provide written instructions, a copy of the finding aid (in print and/or in machine-readable form), and will also indicate typical patterns for head-of-reel information and similar features.
4.6.1 Structure 4A -- Filenames for serial page images when printed page numbers are tracked
The individual image filenames for actual serial pages shall be assigned as follows. (Note that a separate requirement for the collation-record and cumulative index images is stated below.)
Overall name pattern: cccppppf
ccc Image control number
The first three digits are used to assign a set of serial numbers to all of the images for the issue of the serial. The first actual image is assigned 001.
If the contractor encounters missing or unscannable film frames within a specific issue, the relevant control number shall be left unassigned to permit future capture and insertion of the image in the set. A note of this discovery shall also be made in the scanning log. If repeating film images are identified and scanned (contractor's option), the control number shall increment in the usual way and (as noted below) the repeat noted as a feature.
In the case of double-issue serial-cover frames (see notes above), two digital images are to be created. The image to be added to the "last issue" directory will receive the highest ccc image control number for that directory; the other copy of that same image will receive image control number 001 in the directory for the next issue.
pppp Printed page number
FOR SINGLE-PAGE IMAGE SETS. Digits four through seven carry the actual printed page number for the page reproduced. The page number is to be represented with leading zeros. The contractor must determine this number by examining the image itself.
If the printed number is arabic, then it is simply keyed in, with leading zeros.
If the number is roman, the lead digit (first of this set of four; fourth digit in the overall filename) shall be r, and the remaining three digits in this set (digits five, six, and seven in the overall filename) will represent the arabic translation of the roman number.
If there is no printed page number, then 0000 shall be keyed.
FOR TWO-PAGE-SPREAD IMAGE SETS. Digits four through seven carry the actual printed page number for the left-hand page of the pair reproduced.
If the printed number on the left is arabic, then it is simply keyed in, with leading zeros.
If the number on the left is roman, the lead digit (first of this set of four; fourth digit in the overall filename) shall be r, and the remaining three digits in this set (digits five, six, and seven in the overall filename) will represent the arabic translation of the roman number.
If there is no printed page number on the left, then 0000 shall be keyed.
Generally speaking, serial covers are not page numbered and most double-issue serial-cover frames will carry pppp data 0000.
f Feature and pair/spread indicator
Digit eight indicates that the page or pages contain a special feature. The contractor must recognize the feature by examining the image itself. The abbreviations for the features to be indicated are as follows:
c Cover (if the work has more than one cover, indicate the main cover if that can be easily determined; if not, indicate the first)
n Table of Contents (if more than one page, indicate all pages)
l List of Illustrations, if any
x Index, if any
p Repeating page
y Irregularity target
s Segmented material
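The handling of a double-issue serial-cover frame under Structure 4A can be illustrated with a brief Python sketch. The function name double_issue_cover_names is hypothetical. The sketch assumes the frame carries no printed page number (pppp of 0000, which the text above says is the usual case) and is keyed with the cover feature code c.

```python
from typing import Tuple


def double_issue_cover_names(highest_page_control: int) -> Tuple[str, str]:
    """Return the two names for a double-issue serial-cover frame.

    highest_page_control is the highest control number already used for
    the ordinary images of the last issue. The copy filed with the last
    issue takes the next number, so it ends up as the highest in that
    directory; the copy filed with the next issue is always 001.
    """
    closing = highest_page_control + 1
    if not 1 <= closing <= 999:
        raise ValueError("control number must fit in three digits")
    # Assumption: no printed page number (0000) and feature code c (Cover).
    return f"{closing:03d}0000c", "0010000c"


last_issue_name, next_issue_name = double_issue_cover_names(48)
print(last_issue_name)  # 0490000c
print(next_issue_name)  # 0010000c
```

Both copies are the same digital image; only the directory and the control number differ, and the action is still to be recorded in the scanning log.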
4.6.2 Structure 4B -- Filenames for serial page images when printed page numbers are not tracked
The individual image filenames for actual serial pages shall be assigned as follows. (Note that a separate requirement for the collation-record and cumulative index images is stated below.)
Overall name pattern: cccf
ccc Image control number
The first three digits are used to assign a set of serial numbers to all of the images for the issue of the serial. The first actual image is assigned 001.
If the contractor encounters missing or unscannable film frames within a specific issue, the relevant control number shall be left unassigned to permit future capture and insertion of the image in the set. A note of this discovery shall also be made in the scanning log. If repeating film images are identified and scanned (contractor's option), the control number shall increment in the usual way and (as noted below) the repeat noted as a feature.
In the case of double-issue serial-cover frames (see notes above), two digital images are to be created. The image to be added to the "last issue" directory will receive the highest ccc image control number for that directory; the other copy of that same image will receive image control number 001 in the directory for the next issue.
f Feature and pair/spread indicator
Digit four indicates that the page or pages contain a special feature. The contractor must recognize the feature by examining the image itself. The abbreviations for the features to be indicated are as follows:
c Cover (if the work has more than one cover, indicate the main cover if that can be easily determined; if not, indicate the first)
n Table of Contents (if more than one page, indicate all pages)
l List of Illustrations, if any
x Index, if any
p Repeating page
y Irregularity target
s Segmented material
4.6.3 Structure 4C -- Filenames for collation records and/or cumulative indexes for serials
Overall name pattern: cccppppf
ccc Image control number
The first three digits are used to assign a set of serial numbers to all of the images in the index or collation. The first image is assigned 001. If the contractor encounters missing or unscannable film frames, the relevant control number shall be left unassigned to permit future capture and insertion of the image in the set. A note of this discovery shall also be made in the scanning log. If repeating film images are identified and scanned (contractor's option), the control number shall increment in the usual way and (as noted below) the repeat noted as a feature.
pppp Printed page number
Digits four through seven carry the actual printed page number for the page reproduced. The page number is to be represented with leading zeros. The contractor must determine this number by examining the image itself. If the printed number is arabic, then it is simply keyed in, with leading zeros. If the number is roman, the lead digit (first of this set of four; fourth digit in the overall filename) shall be r, and the remaining three digits in this set (digits five, six, and seven in the overall filename) will represent the arabic translation of the roman number. If there is no printed page number, then 0000 shall be keyed.
f Feature and pair/spread indicator
p Repeating page (see Section C.4.3)
4.7 FILENAME/DIRECTORY STRUCTURE 5: COPYRIGHT REGISTRATION AND TECHNICAL DOCUMENT NUMBER STRUCTURE
The copyright-registration-number/technical-document structure applies to two classes of material. First and foremost, it will be used for collections deposited at the Library in years past, as part of the copyright registration process and often left uncataloged. Some of these are printed matter, e.g., the nineteenth century sheet music collections, while others are manuscripts (including typescripts), e.g., the collection of unpublished early twentieth century plays. Second, it will be used for separate, short items like technical reports. These are typically offset-printed reports, many prepared for such agencies as the Department of Defense, running about 20-30 pages each.
Every document in the copyright collections received a registration number when the collections were copyrighted. The number is generally stamped on the cover or title page, often with a rubber stamp or written into a blank portion of a rubber stamp. In a few cases, the first part of the number is rendered in roman numerals and the latter part in arabic, e.g., the rubber stamp indicates registration number xxc 14, for registration number 8014. Technical reports also tend to have a unique number assigned by the agency that prepared them.
The contractor shall assign directory names based upon the registration or report number. Depending upon the collection, this number may reach five or more digits, e.g., 56872. The Library will provide a list of identifiers based on this number. One example is the Library's sheet music collection. For this collection, the identifier for the directory will consist of the copyright registration number, with added leading zeros sufficient to create a five-digit expression and prefixed with the collection abbreviation, e.g., SM for the sheet music collection. Thus the directory for the sheet music item registered under the number 8692 shall be named sm08692.
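The sheet music example above can be reproduced with a few lines of Python. This is a sketch only: the function name registration_directory and its width parameter are hypothetical, and the Library's list of identifiers remains the authority for the actual directory names.

```python
def registration_directory(abbreviation: str, number: int, width: int = 5) -> str:
    """Prefix the lower-cased collection abbreviation to the
    registration or report number, padded with leading zeros."""
    if number < 0:
        raise ValueError("registration number must not be negative")
    return f"{abbreviation.lower()}{number:0{width}d}"


print(registration_directory("SM", 8692))   # sm08692
print(registration_directory("SM", 56872))  # sm56872
```

A number longer than the chosen width is left unpadded rather than truncated, which matches the note that some collections run to five or more digits.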
At the time that a task for a particular collection is assigned, the Library will provide written instructions, a copy of the finding aid (in print and/or in machine-readable form), and will also identify typical patterns for head-of-reel information and similar features. The individual image filenames for actual page images shall be assigned as follows.
Copyright/tech report structure page-image name pattern: cccppppf
ccc Image control number
The first three digits are used to assign a set of serial numbers to all of the images for the item. The first actual image is assigned 001. If the contractor encounters missing or unscannable film frames, the relevant control number shall be left unassigned to permit future capture and insertion of the image in the set. A note of this discovery shall also be made in the scanning log. If repeating film images are identified and scanned (contractor's option), the control number shall increment in the usual way and (as noted below) the repeat noted as a feature.
pppp Printed page number
Digits four through seven carry the actual printed page number for the page reproduced. The page number is to be represented with leading zeros. The contractor must determine this number by examining the image itself. If the printed number is arabic, then it is simply keyed in, with leading zeros. If the number is roman, the lead digit (first of this set of four; fourth digit in the overall filename) shall be r, and the remaining three digits in this set (digits five, six, and seven in the overall filename) will represent the arabic translation of the roman number. If there is no printed page number, then 0000 shall be keyed.
f Feature
c Cover (if the work has more than one cover, indicate the main cover if that can be easily determined; if not, indicate the first)
n Table of Contents (if more than one page, indicate all pages)
l List of Illustrations, if any
x Index, if any
p Repeating page
y Irregularity target
s Segmented material
ATTACHMENT 7
GENERAL REFERENCES
Library of Congress. Specifications for the Microfilming of Books and Pamphlets in the Library of Congress. Washington, D.C. 1973.
Library of Congress. Specifications for Microfilming Manuscripts. Washington, D.C. 1980.
Preservation Microfilming: A Guide for Librarians and Archivists. ed. by Nancy Gwinn. Chicago, American Library Association. 1987.
RLG Preservation Microfilming Handbook. ed. by Nancy E. Elkington. Mountain View, California, Research Libraries Group. 1992.
RLG Archives Microfilming Manual. ed. by Nancy E. Elkington. Mountain View, California, Research Libraries Group. 1994.