LC collection materials being included in the National Digital Library are batched for digitization and quality review processing. Depending on the material involved, digitization may be performed by either contractors or LC personnel. In either case, the electronic files produced by digitization are submitted for LC quality review in batches analogous to those in which the source collection items were processed. Currently (Spring 1998), such batches of digitized files are submitted for quality review on CD-ROM; however, file transfer protocol (FTP) and other means may also be used. To keep track of what is actually being submitted, each batch of digitized files submitted for quality review will be accompanied by an electronic Batch Manifest.
The Batch Manifest identifies the digital files that comprise a digitized batch being delivered to the Library of Congress for quality review and incorporation into the LC digital repository. It also serves as the delivery vehicle for structural metadata attributes that were identified during each file's digitization process.
NOTE: Field defaults indicated as "None" must be explicitly input. Field defaults indicated as "n/a" will be set to hash marks (#).
Field | Field Name | Field ID | Length | Format | Default Setting | Required? |
---|---|---|---|---|---|---|
1 | Batch ID | BID | 8 | Character String | None | Yes |
2 | Batch Extension | EXT | 3 | Character String | Blanks | Yes |
3 | Aggregate Name | AGG | 8 | Character String | None | Yes |
4 | Upper Convenience Group | UCG | 30 | Character String | None | No |
5 | Item ID | ITEM | 10 | Character String | None | Yes |
6 | Lower Convenience Group | LCG | 30 | Character String | None | No |
7 | File Name | FN | 10 | Character String | None | Yes |
8 | File Extension | FXT | 4 | Character String | None | Yes |
9 | File Size | FSIZE | 9 | Number | None | Yes |
10 | File Size Unit | FSU | 2 | KB = Kilobytes; MB = Megabytes; GB = Gigabytes; TB = Terabytes; PB = Petabytes | Megabytes (MB) | Yes |
11 | Date Created | DATE | 8 | YYYYMMDD | Current date (e.g., 19981031) | Yes |
12 | Creator ID | CRE | 9 | See Creator Code Table | None | Yes |
13 | Operator ID | OPR | 4 | Character String | n/a | No |
14 | Equipment ID | EQU | 5 | Character String | None | Yes |
15 | Original Content Type | OCT | 4 | 0001 = Audio; 0002 = Text; 0003 = Motion Visual; 0004 = Non-motion Visual | None | Yes |
16 | Digitized Content Use | DCU | 2 | 01 = Archival; 02 = Service; 03 = Preview | None | Yes |
17 | Description | DESC | 60 | Character String | None | Yes |
18 | Side Digitized | SIDE | 1 | 0 = Front 1 = Back 2 = Other | n/a | No |
19 | Presentation Sequence Number | PSEQ | 3 | Character String (right justified; leading blanks) | n/a | No |
20 | Grid Coordinate - Alphabetic (Column) | GCL | 3 | Alphabetic Character String (right justified; leading blanks) | n/a | No |
21 | Grid Coordinate - Numeric (Row) | GRW | 3 | Numeric Character String (right justified; leading zeros) | n/a | No |
22 | Detail | DTL | 3 | Lower case "x" followed by a "one-up count" number (right justified; leading blanks) | n/a | No |
23 | Printed Page Number | PPN | 20 | Character String (right justified; leading blanks) | n/a | No |
24 | Printed Sheet Number | PSN | 25 | Character String (right justified; leading blanks) | n/a | No |
25 | Printed Plate Number | PPL | 25 | Character String (right justified; leading blanks) | n/a | No |
26 | Cover | CVR | 4 | 0001 = Front-Cover-Outside; 0002 = Front-Cover-Inside; 0003 = Back-Cover-Outside; 0004 = Back-Cover-Inside; 0005 = Full-Cover-Outside; 0006 = Full-Cover-Inside; 0000 = Other | n/a | No |
27 | Page Feature | PFEA | 4 | 0029 = Addendum; 0023 = Advertisement(s); 0024 = Bibliography; 0024 = Blank Page; 0007 = Editorial Page; 0008 = End Paper; 0026 = Errata; 0009 = Illustration List; 0027 = Production Note; 0028 = References; 0016 = Table List; 0017 = Table of Contents; 0022 = Title Page | n/a | No |
28 | Index | INDX | 4 | 0010 = Index; 0013 = Author or Name Index; 0011 = Comprehensive Index; 0015 = Mixed Text and Illustration Index; 0014 = Special Index; 0012 = Subject Index | n/a | No |
29 | Target | TRGT | 4 | 0018 = Content Target; 0019 = Identifying Target; 0020 = Irregularity Target; 0021 = Scanning Target | n/a | No |
30 | Scanning Orientation | ORI | 3 | 000 = 0 degrees; 090 = 90 degrees; 180 = 180 degrees; 270 = 270 degrees | n/a | No |
31 | Associated File Type | AFT | 3 | 001 = Catalog File; 002 = Document Type Definition; 003 = Entity File; 004 = Entities; 005 = SGML Catalog File; 006 = Entityrc File; 007 = Navigator File; 008 = SGML Packet; 009 = RealAudio Streaming File; 010 = Style Sheet | n/a | No |
32 | Associated File Name | AFN | 30 | Character String | n/a | No |
Additional metadata attribute fields may be included in the Batch Manifest File but are not required. The presence of such additional fields will vary according to the nature of the digitized material found in each batch.
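The padding rules in the field table can be illustrated with a small sketch. The helper name `format_field` and its parameters are hypothetical, and left justification with trailing blanks for plain character strings is an assumption drawn from the sample record rather than a stated rule; only the hash-mark fill for "n/a" and the right-justified formats come from the table itself.

```python
# Sketch only: format_field and its parameters are hypothetical names.
def format_field(value, length, justify="left", pad=" "):
    """Render one Batch Manifest field at its fixed width.

    value=None stands for "not applicable"; the NOTE on defaults says
    such fields are set to hash marks (#), here across the full width.
    """
    if value is None:
        return "#" * length
    text = str(value)
    if len(text) > length:
        raise ValueError(f"{text!r} exceeds field length {length}")
    if justify == "right":
        return text.rjust(length, pad)
    # Assumption: plain character strings are left justified.
    return text.ljust(length, pad)


print(repr(format_field(None, 4)))              # OPR, not applicable
print(repr(format_field("12", 3, "right")))     # PSEQ, leading blanks
print(repr(format_field(7, 3, "right", "0")))   # GRW, leading zeros
```

Rejecting over-long values rather than truncating them is a design choice of this sketch; the specification does not say how such values should be handled.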
Batch Extension: This field indicates the "rework" status of a batch of digitized objects. If the batch is being processed for the first time, this field is empty. If the batch contains a first iteration of reworked digital objects (i.e., rescanned or otherwise corrected), this field contains the letters "rwk". For any subsequent rework iteration, this field contains the letters "rw" followed by a sequence number (the second rework iteration is "rw1", the third is "rw2", and so on). If the batch is being completely replaced, this field contains the letters "red".
Examples:
sighh004 = Initial version of batch sighh004
sighh004rwk = First "rework" iteration of batch sighh004
sighh004rw1 = Second "rework" iteration of batch sighh004
sighh004rw2 = Third "rework" iteration of batch sighh004
: : : : :
sighh004rwn = Nth "rework" iteration of batch sighh004
sighh004red = Redelivery of batch sighh004 in its entirety
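As a rough illustration of the extension values listed in the examples, the following sketch classifies a Batch Extension. The function name `describe_batch_extension` and the returned strings are hypothetical. The numbering follows the numeric examples ("rw1" is the second rework, "rw2" the third), and restricting the suffix to a single digit is an assumption based on the 3-character field length.

```python
import re


# Sketch only: the function name and return strings are hypothetical.
def describe_batch_extension(ext):
    """Translate a Batch Extension (EXT) value into a readable status."""
    ext = ext.strip()  # a first-time batch carries blanks in this field
    if ext == "":
        return "initial delivery"
    if ext == "rwk":
        return "rework iteration 1"
    if ext == "red":
        return "complete redelivery"
    match = re.fullmatch(r"rw(\d)", ext)
    if match:
        # Per the examples, rw1 is the second rework, rw2 the third.
        return f"rework iteration {int(match.group(1)) + 1}"
    raise ValueError(f"unrecognized batch extension: {ext!r}")


# Batch ID is 8 characters, so the extension is whatever follows it.
name = "sighh004rw1"
print(name[:8], "->", describe_batch_extension(name[8:]))
```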
Field Name | Length |
---|---|
ID of Batch Manifest File data field | 5 |
Name of Batch Manifest File data field | 25 |
Length of Batch Manifest File data field | 3 |
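A minimal sketch of reading one Attribute File record, assuming the three fields are stored back to back at the widths given in the table (5, 25, and 3 characters), padded with blanks, and with no separators. The function name `parse_attribute_record` is hypothetical, and the exact on-disk layout should be confirmed against a real Attribute File.

```python
# Sketch only: assumes fixed-width fields of 5, 25, and 3 characters.
def parse_attribute_record(line):
    """Split one Attribute File record into (field ID, field name, length)."""
    if len(line) < 33:
        raise ValueError("record is shorter than 33 characters")
    field_id = line[0:5].strip()
    field_name = line[5:30].strip()
    field_length = int(line[30:33])
    return field_id, field_name, field_length


# Hypothetical record built to the assumed layout.
record = "BID".ljust(5) + "Batch ID".ljust(25) + "008"
print(parse_attribute_record(record))
```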
Examples of Attribute File and Batch Manifest File Data
BID Batch ID 008
AGG Aggregate Name 008
UCG Upper Convenience Group 030
ITEM Item ID 010
LCG Lower Convenience Group 030
FN File Name 010
FXT File Extension 004
FSIZE File Size 009
FSU File Size Unit 002
DATE Date Created 008
CRE Creator ID 009
OPR Operator ID 004
EQU Equipment ID 005
OCT Original Content Type 002
DCU Digitized Content Use 002
SIDE Side Digitized 001
PSEQ Sequence Number 003
PPN Printed Page Number 003
PSN Printed Sheet Number 025
PPL Printed Plate Number 025
CVR Cover 004
PFEA Page Feature 004
INDX Index 004
TRGT Target 004
ORI Orientation 003
AFT Associated File Type 015
AFN Associated File Name 030
Note: In this example, blanks for padding are represented by the "dollar" sign and every other field has been underlined for the sake of clarity.

sighh004habshaer$$$$(30)$$$la1064$$$$$$$(30)$$$filenametif0000056MB199810310000007987612345000402Description(60 bytes)003358$XIX$$$####################000
The underlining in this sample is for illustrative purposes only. Do not include underlining in actual data!
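Because a manifest record is a run of fixed-width fields, it can be split by walking the field lengths in order. The sketch below assumes the layout is supplied as (field ID, length) pairs, for instance as read from the Attribute File. The function name `split_record`, the four-field layout, and the sample record are hypothetical and much shorter than a real record.

```python
# Sketch only: split_record and the sample layout are hypothetical.
def split_record(record, layout):
    """Slice a fixed-width record into a dict keyed by field ID."""
    expected = sum(length for _, length in layout)
    if len(record) != expected:
        raise ValueError(f"expected {expected} characters, got {len(record)}")
    fields = {}
    position = 0
    for field_id, length in layout:
        fields[field_id] = record[position:position + length]
        position += length
    return fields


layout = [("BID", 8), ("AGG", 8), ("ITEM", 10), ("OPR", 4)]
record = "sighh004" + "habshaer" + "la1064    " + "####"
for field_id, value in split_record(record, layout).items():
    print(field_id, repr(value))
```

Checking the total length first catches records that were truncated or padded incorrectly before any field is misread.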
Rework iterations of a batch will contain only the files of those digitized objects that have been rescanned or otherwise corrected. Objects that were correct in earlier iterations of the batch do not need to be resupplied by the vendor.
Digitized Content Use:
Indicates how the digitized content carried in a file will be used.
Digitized content uses defined to date (02/98) include:
Archival | - | File contains the richest, most complete representation of the data content that is available. |
Service | - | File contains data content that will be used to respond to, or service, user requests for data presentation (e.g., display, or printing). Such data content usually is less than archival but greater than preview in its level of completeness and quality. |
Preview | - | File contains data content that will be used to provide a summary presentation of the data. Such data content usually is the smallest, shortest, briefest representation of the data that is available. Data content that is used for preview purposes includes thumbnail images, film clips, and sound bites. |
Grid Coordinates: Alpha-numeric grid coordinates identify the position of a digital object within a larger digital object where the larger object is a nonmotion visual item (e.g., raster map). The alphabetic coordinate indicates the vertical (column) position; the numeric indicates the horizontal (row) position. Both are relative to the upper left corner of the whole, which has coordinate values a:1. Note that if one coordinate is present, the other must be also.
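The rule that both grid coordinates must be present together, combined with the justification rules from the field table (GCL right justified with leading blanks, GRW right justified with leading zeros, hash marks when not applicable), can be sketched as follows. The function name `format_grid` and the range check on the row number are assumptions of this sketch.

```python
# Sketch only: format_grid is a hypothetical helper.
def format_grid(column=None, row=None):
    """Return the (GCL, GRW) field values for a grid position."""
    if (column is None) != (row is None):
        raise ValueError("grid coordinates must both be present or both absent")
    if column is None:
        # Not applicable: both 3-character fields are hash-filled.
        return "###", "###"
    if not column.isalpha() or len(column) > 3:
        raise ValueError("column must be 1 to 3 alphabetic characters")
    if not 1 <= row <= 999:
        raise ValueError("row must fit in 3 digits")
    return column.rjust(3), str(row).zfill(3)


print(format_grid("a", 1))   # upper left corner of the whole, a:1
print(format_grid())         # no grid position recorded
```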
Original Content Type: Indicates the general physical nature of the original collection item from which a portion of digitized content was produced. The purpose of the Original Content Type attribute is to facilitate digital object presentation. Original Content Type does not indicate how the digital object was produced (e.g., scanned, encoded), nor does it indicate the genre or subject content of the digital object.
Printed Page Number: The page number (printed or inscribed) that actually appears on the original content from which a portion of digital content was derived. This number is not necessarily the same as the number of the digital content in a presentation sequence. For example, a book may have a table of contents or introductory material such as a prologue that is numbered using roman numerals. Thus the page marked as page number 1 might actually be the ninth page in the sequence of pages. Similarly, the table of contents might start on the ninth page of a book but bear the page number "ix."
Printed Plate Number: The number used on the original content from which a portion of digital content was derived to identify it as a particular plate in the original work. Plates are separately numbered; they do not form part of either the preliminary or the main sequences of pages or leaves in the original work.
Printed Sheet Number: The number used on the original content from which a portion of digital content was derived to identify it as a particular sheet in the original work. Sheets are separately numbered; they do not form part of either the preliminary or the main sequences of pages or leaves in the original work.
Created: 12/4/97 Revised: 10/5/99 |
National Digital Library Program - Digital Repository Development Project |
Comments: vvit@loc.gov |