National Information Standards Organization

Digital Talking Book Standards Committee
Document Navigation Features List

Contents

Background

1. Basic Navigation

1.1 Basic Movement Through Text

1.2 More Sophisticated Movement

2. Fast Forward and Fast Reverse

3. Reading at Variable Speeds

4. Treatment of the Table of Contents

5. Navigation Control File

5.1 Moving Between the Navigation Control File and the Actual Book

6. Notes

7. Cross-Reference Access

8. Index Navigation

9. Bookmarks

10. Highlighting

11. Excerpt Capability

12. Searching

13. Spell-Out Capability

14. Text Attributes and Punctuation

15. Tables

16. Nested Lists

17. Text Elements

18. Skipping User-Selected Text Elements

19. Location Information

20. Summary and Reporting Information

21. Science and Mathematics

22. Other Kinds of Visual Representations

Background

A digital talking book (DTB) is envisioned to be, in its fullest implementation, a group of digitally encoded files containing (a) an audio portion recorded in human speech; (b) the full text of the work in electronic form, marked with the tags of a descriptive markup language; and (c) a linking file that synchronizes the text and audio portions. It may also include images. As this document illustrates, such a structure will offer the DTB user a broad range of capabilities not possible with current cassette-based talking books.

The National Information Standards Organization (NISO) is a nonprofit association accredited as a standards developer by the American National Standards Institute. NISO Standards Committee AQ was formed to develop an American national standard for a digital talking book for blind, visually impaired, physically handicapped, and otherwise print-disabled readers. As part of that work, the Committee has created a comprehensive list of the features and functions envisioned for the most complex book accessed through the most sophisticated playback device conceived for digital talking books. This document presents that list, which is deliberately comprehensive to ensure that the underlying file structure being developed by the Committee will be able to support every conceivable feature.

In all likelihood, the most sophisticated playback device for a DTB will be a personal computer running special software. A device equipped in this way will be able not only to play the audio portion of a DTB but also to display images and text files in appropriate font sizes. However, we recognize that most DTB users will, in fact, want a far simpler method of reading talking books. At least three levels of device are envisioned: (a) a very simple unit suitable for users who primarily listen to books or magazines straight through; (b) a more complex, but still portable, device with a user interface that allows sophisticated navigation through a document; and (c) software running on a PC, as mentioned above.

In developing this document, we envisioned that most digital talking books will contain an audio file of recorded human speech. Some will also include the full text of the book in electronic form; the proportion that does will depend mostly on the costs associated with acquiring and marking up text files. Some number will consist only of the marked-up text. Users will be able to access such text files through synthetic speech, screen magnification systems, or braille displays. However, in this document, unless otherwise specified, when mention is made of "reading" or "hearing," the reference is to listening to the sound track of recorded human speech.

Also note that in the following discussion, the term "book" includes a wide variety of documents besides books themselves, e.g., magazines, journals, reference works, etc.


1. Basic Navigation

Many of the navigation features that should be available in a digital talking book of the advanced variety will of necessity correspond to the navigation features available in today's personal computers. Blind people who are sophisticated users of screen-access technology, word processors, or book-reading software have already been exposed to many of the navigation features discussed here. Moreover, for purposes of discussion, it is assumed that users of the advanced digital talking book text navigation features possess a high degree of technological sophistication.


1.1 Basic Movement Through Text

The advanced digital talking book should provide users with the capability to move through text one character, word, line, sentence, paragraph, or page (corresponding to the printed page, if present) at a time. In addition, the user should be able to jump to a specific page in the book (e.g., go to printed page 55) and any specific line or paragraph on that page.

A user should be able to read the entire publication--from beginning to end--without having to jump up and down a hierarchical tree structure (e.g., moving in and out of the Table of Contents to go to the next chapter).

A user should also be able to move back and forth through the book by time segments specified by the user. For example, a user might choose to move ten seconds or five minutes in each jump.
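
The time-based movement described above can be sketched as follows. This is an illustration only, not part of the feature list: the function name `jump_by_time` and its parameters are hypothetical, and positions are assumed to be measured in seconds from the start of the audio.

```python
def jump_by_time(position_s: float, step_s: float, duration_s: float) -> float:
    """Return the new playback position after a jump of step_s seconds.

    A negative step_s moves backward. The result is clamped so that it
    never falls before the start or past the end of the book.
    """
    return max(0.0, min(duration_s, position_s + step_s))


# A user-chosen step of ten seconds in a one-hour book.
print(jump_by_time(95.0, 10.0, 3600.0))   # 105.0
print(jump_by_time(4.0, -10.0, 3600.0))   # 0.0 (clamped at the start)
```

Clamping at both ends is one possible design choice; a device might instead announce that the beginning or end of the book has been reached.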


1.2 More Sophisticated Movement

The user should be able to jump to specific chapters, sections, headings, and other segments of the digital talking book, for example, "Go to next chapter," "Go to next subheading," "Go to next section," "Go to Chapter 5, Section 1," etc. This feature may be linked to a hierarchical, collapsible "Navigation Control File" (discussed later), but the user should also have the choice of jumping directly to a specific part of the book if its number or title is already known.


2. Fast Forward and Fast Reverse

A simple tape-recorder-type navigation feature (cue and review function) would be useful for DTBs. For example, a slider-like control or pushbuttons would allow a user to fast-forward or fast-reverse through a book at high speed. As text is traversed, speech could be generated at a high speed using some form of time scale modification. In the process, readers can pick up clues about the structure of the text that is passing. For example, lists can be detected as a series of short, staccato bursts. Paragraphs, chapter headings, etc. could be indicated by strategically-generated tones. This gives users the choice of zipping forward or backward through the book rather than typing commands to accomplish the same tasks. For some individuals, this interface would be much simpler to use. It might also be much more efficient in a document that is long and does not have particularly good titling or sectioning.

An alternative method of allowing the user to skim a document would be to have the playback device read the types of text elements that are passed. For example, the user might hear, "part, chapter, section, paragraph, paragraph,...; section, paragraph, paragraph,...; table, paragraph, paragraph,...; sidebar, etc."

It is recommended that the fast forward and reverse feature allow the book to be traversed at 10 to 25 times the normal (real-time) reading speed.


3. Reading at Variable Speeds

It should be possible to read the digital talking book at speeds faster or slower than the normal listening rate. This variable speed feature is necessary to enable playback at a speed that is comfortable and efficient for a wide range of readers. A top speed of three times the normal "real-time" rate should be possible, and the slowest speed should be around one-third the real-time reading rate.

The device should offer "Time-Scale Modification" (TSM), that is, the capability to maintain constant pitch while the playback speed is varied. This feature should be optional, however, so that the user can choose to have the pitch change as the playback speed changes. The TSM system should not produce audible chopping, burble, or reverberation and should not skip over significant units of sound at high playback speeds.
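
The speed range suggested in this section can be illustrated with a small sketch. It assumes the limits of one-third and three times the real-time rate given above; the names `clamp_rate` and `listening_minutes` are hypothetical, and the sketch covers only the arithmetic of playback rate, not the TSM signal processing itself.

```python
MIN_RATE = 1.0 / 3.0  # slowest suggested rate (one-third real time)
MAX_RATE = 3.0        # fastest suggested rate (three times real time)


def clamp_rate(requested: float) -> float:
    """Limit a requested playback rate to the suggested range."""
    return max(MIN_RATE, min(MAX_RATE, requested))


def listening_minutes(recorded_minutes: float, requested_rate: float) -> float:
    """Time needed to hear a passage at the (clamped) playback rate."""
    return recorded_minutes / clamp_rate(requested_rate)


print(clamp_rate(5.0))               # 3.0 (request exceeds the maximum)
print(listening_minutes(60.0, 3.0))  # 20.0
```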


4. Treatment of the Table of Contents

Most, if not all, books are supplied with what we traditionally call a table of contents. In printed works, the table of contents represents the primary way for the reader to locate specific parts of the book. In a digital talking book, this function can be performed by both the table of contents and another tool called the Navigation Control File for XML applications (NCX), with the NCX offering significantly more capabilities (see next section). If the original book contains a table of contents per se, it should be presented in the digital talking book exactly as it is in the printed work. When focusing in on the table of contents, the reader would hear a list of headings with associated page numbers (if present in the original work). From each heading representing a specific segment of the book, it should be possible to jump directly to the segment and from the segment back to the table of contents. In addition, the user should be able to use the function described in Section 16, "Nested Lists" to determine at which level within the table of contents a given heading falls.


5. Navigation Control File

The digital talking book should have incorporated into its design a Navigation Control File (NCX), which allows the user to easily obtain an overview of the material in the book while, at the same time, providing a convenient means for navigating through the book. This NCX should appear to the user to be a dynamic outline that can be collapsed or expanded easily.

The structure represented in the NCX should be the structure of the book as defined by the author. The table of contents prepared by the author can serve as a means of determining the basic structure. Additional levels of information can then be added based upon the headings and hierarchy provided by the author in the book itself (which usually goes beyond that reflected in the author-supplied table of contents). Talking-book producers should not reorganize or restructure the book but instead use the NCX as a means of enhancing the structure already defined by the author.

The most detailed level of the NCX should incorporate all of the components of the book including:

The user should be able to decide on the level of granularity to be used to examine the book with the NCX. Take as an example a book divided into chapters, sections, subsections, etc. Some users may want to read every single heading listed in the NCX regardless of its level while others may be interested in reading only chapter titles and hearing the titles of items contained within a specific chapter only when focusing in on that chapter. The digital talking book should provide users with these capabilities.
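
One way to picture this choice of granularity is as a depth limit applied to an outline. The sketch below is an assumption made for illustration: the `NavPoint` class and the `visible_headings` function are hypothetical names and do not describe the actual NCX file format.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class NavPoint:
    """One heading in the outline (hypothetical structure)."""
    title: str
    children: List["NavPoint"] = field(default_factory=list)


def visible_headings(points: List[NavPoint], max_depth: int,
                     depth: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (depth, title) for each heading no deeper than max_depth."""
    for point in points:
        yield depth, point.title
        if depth < max_depth:
            yield from visible_headings(point.children, max_depth, depth + 1)


book = [
    NavPoint("Chapter 1", [NavPoint("Section 1.1"), NavPoint("Section 1.2")]),
    NavPoint("Chapter 2"),
]

# Chapter titles only (outline collapsed to the first level).
print(list(visible_headings(book, max_depth=1)))
# [(1, 'Chapter 1'), (1, 'Chapter 2')]
```

Raising `max_depth` to 2 in this example would also yield the two sections of Chapter 1, which corresponds to expanding the outline by one level.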

The NCX is also an ideal place to list footnote references. The user should be able to read the actual footnote when, while moving through the NCX, its reference is spoken.

The user should have the choice of whether to read the NCX in a circular (top-to-bottom-to-top) fashion or in a single-pass (top-to-bottom only) fashion.

In addition to any labeling that may have been placed in the book by the original author, there should be additional standardized labeling available in the NCX to aid in the determination of the heading level being examined (see Section 16, "Nested Lists").


5.1 Moving Between the Navigation Control File and the Actual Book

While the user is examining the Navigation Control File, it should be possible to jump immediately to the beginning of the section, chapter, or heading whose title is being spoken. Once the actual text of the desired item has been read, the user should be able to either continue to the next item or return to the NCX. In so doing, the user should be able to choose where to end up within the NCX. One option would be to return to the place in the NCX that was last read--in other words, the jumping-off point. Another option would allow the user to return to the NCX at a spot that corresponds logically to the location of the text that the user was just reading.
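
The two return options can be sketched as follows. The sketch assumes that each NCX entry is associated with a start offset in the book (for example, in seconds) and that the offsets are sorted; the function name `return_index` and the mode names are hypothetical.

```python
from bisect import bisect_right


def return_index(entry_starts, jump_off_index, reading_position, mode):
    """Pick the NCX entry to land on when the user returns from the text.

    entry_starts: sorted start offsets of the NCX entries.
    mode: "jump_off" returns to the entry the user left from;
          "logical" returns to the entry containing reading_position.
    """
    if mode == "jump_off":
        return jump_off_index
    if mode == "logical":
        # Last entry that starts at or before the current position.
        return max(0, bisect_right(entry_starts, reading_position) - 1)
    raise ValueError(f"unknown mode: {mode}")


starts = [0, 600, 1500, 2400]
# The user left the NCX at entry 1 and read on into the next entry.
print(return_index(starts, 1, 1720, "jump_off"))  # 1
print(return_index(starts, 1, 1720, "logical"))   # 2
```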


6. Notes

If the digital talking book contains notes, e.g., footnotes or endnotes, there should be flexibility in how they are presented as the book is read. The user should be able to choose to hear the note references and the notes, the note references only, or neither. If the user has chosen to hear only the note references, he or she should be able to override the current setting and hear a given note.
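
The three presentation choices can be illustrated with a short sketch. It assumes the book has already been split into text and note segments; the dictionary layout, the mode names, and the function `spoken_stream` are hypothetical.

```python
MODES = ("references_and_notes", "references_only", "none")


def spoken_stream(segments, mode):
    """Return the items that would be spoken under the chosen note mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    spoken = []
    for segment in segments:
        if segment["kind"] == "text":
            spoken.append(segment["text"])
        elif mode == "references_and_notes":
            spoken.append(f"note {segment['ref']}: {segment['text']}")
        elif mode == "references_only":
            spoken.append(f"note {segment['ref']}")
        # mode == "none": the note is passed over silently
    return spoken


segments = [
    {"kind": "text", "text": "First sentence."},
    {"kind": "note", "ref": "1", "text": "Body of the note."},
    {"kind": "text", "text": "Second sentence."},
]
print(spoken_stream(segments, "references_only"))
# ['First sentence.', 'note 1', 'Second sentence.']
```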

At any time during the reading of a note, the user should be able to return to the point in the text immediately following the point of departure. For example, if the note were read at the end of the sentence, the user interrupting the reading of the note should be returned to the beginning of the next sentence.

If after reading a passage without listening to the notes contained within it, the user wishes to hear the notes and their context, he or she should be able to go to each note reference, back up a short distance, and listen to that portion of the text and to the note.


7. Cross-Reference Access

All cross-references in the digital talking book should be set up as hypertext links, that is, links that, when triggered by the user, move the user immediately to the target location. An example of a cross-reference is "For additional information, see Chapter 5."

The user should have the option of being notified via an audible signal when a cross-reference is encountered while reading a book. If the user chooses to be signalled, further options should include "enable" (default), "disable," and a choice among several audible indicators. The user should also be able to determine whether a given link is to an internal or external target, since the decision to follow a link may depend on the target's location. If the playback device is connected to the Internet or other network, the user should be able to follow external links.

When the user prompts the playback device to follow a link, the device should follow the most recently encountered link, that is, the nearest link preceding the current reading position.

Having followed a cross-reference to a target item, the user then should be able to return from that item to the cross-reference source. The target item itself may contain cross-referenced text which points to another chapter, topic, appendix, or table; and the user should be able to follow this cross-reference as well. It should also be possible for the user to retrace the cross-reference path back to the original cross-referenced text.

Consider the following example. A paragraph describing the steps necessary to install a word processing software package talks about creating a DOS batch file. The text "create a batch file" is highlighted and linked to an appendix at the end of the book called "DOS Batch File Creation and Syntax." In the appendix, a reference is made to loading a program into high memory. The phrase "high memory" links to another chapter in the book called "Use of High Memory." From there, the user should be able to trace back to the appendix and then back to the installation instructions for the word processing software.
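
The retracing behavior in this example resembles the history kept by a web browser and can be sketched with a simple stack. The class name `LinkTrail` and its methods are hypothetical and serve only to illustrate the idea.

```python
class LinkTrail:
    """Remembers the path of followed cross-references so it can be retraced."""

    def __init__(self, start):
        self.current = start
        self._history = []  # locations the user jumped away from

    def follow(self, target):
        """Follow a cross-reference from the current location to target."""
        self._history.append(self.current)
        self.current = target

    def back(self):
        """Return to the source of the most recently followed reference."""
        if self._history:
            self.current = self._history.pop()
        return self.current


trail = LinkTrail("Installation instructions")
trail.follow("DOS Batch File Creation and Syntax")
trail.follow("Use of High Memory")
print(trail.back())  # DOS Batch File Creation and Syntax
print(trail.back())  # Installation instructions
```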


8. Index Navigation

If the book has an index, the user should be able to jump directly from an index entry to the top of the page where the index term occurs, and then back to the index. This differs from a cross-reference or hypertext link in that the link from the index is to the top of a page and not to any specific text.

The index of the digital talking book can be conceived of as a simple text page with each page reference in the index acting as a hypertext link to the top of a print page in the book. Where multiple pages are given for a word or phrase, each of the individual page references would be a hypertext link pointing to the top of the referenced page. Where a page range is given for a word or phrase, the hypertext link would point to the top of the first page in the range.
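
The mapping from index page references to link targets can be sketched as follows. The sketch assumes that page references are written as comma-separated Arabic numbers with hyphenated ranges (for example, "12, 45-48, 101"); the function name `index_link_targets` is hypothetical.

```python
def index_link_targets(page_refs):
    """Return the page number each reference in an index entry links to.

    A single page links to the top of that page; a page range links to
    the top of the first page in the range.
    """
    targets = []
    for ref in page_refs.split(","):
        ref = ref.strip()
        if not ref:
            continue
        first_page = ref.split("-")[0]
        targets.append(int(first_page))
    return targets


print(index_link_targets("12, 45-48, 101"))  # [12, 45, 101]
```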


9. Bookmarks

The user should be able to set a large number of bookmarks within the book. These should not actually be inserted into the book itself but saved in a separate file that would be synchronized with the book. This file should be capable of being exported and used on other compatible devices.

Each bookmark should be capable of being tagged by a text or voice label the user can search for. For example, the user might want to locate a bookmark containing the label "Scientific Discoveries," which is either a text label or one recorded with the user's voice. There should be enough storage available for labeling so that the bookmark can be used for annotation purposes. The user should be able to browse through all bookmarks that may have been set for the book, regardless of any label that may be associated with a particular bookmark, and jump to any of them. The user should be able to assign the same label to multiple bookmarks to create a set of related bookmarks. A separate set of bookmarks is maintained for each book being read.

Any time a user stops playback, an unlabeled bookmark should be automatically placed at that point. The user can choose to disable this feature.
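
A bookmark file kept separate from the book might be organized along the following lines. This is a sketch under stated assumptions: JSON is chosen here only for convenience, positions are assumed to be offsets in seconds, voice labels are omitted, and the class `BookmarkStore` and its methods are hypothetical.

```python
import json


class BookmarkStore:
    """Bookmarks held outside the book, with one set per book."""

    def __init__(self):
        self._marks = {}  # book_id -> list of {"position", "label"}

    def add(self, book_id, position, label=None):
        """Set a bookmark; label=None models an unlabeled bookmark."""
        entry = {"position": position, "label": label}
        self._marks.setdefault(book_id, []).append(entry)

    def find(self, book_id, label):
        """Positions of all bookmarks in a book that carry this label."""
        return [m["position"] for m in self._marks.get(book_id, [])
                if m["label"] == label]

    def browse(self, book_id):
        """Positions of every bookmark in a book, labeled or not."""
        return sorted(m["position"] for m in self._marks.get(book_id, []))

    def export(self):
        """Serialize all bookmarks for use on another device."""
        return json.dumps(self._marks)

    @classmethod
    def load(cls, text):
        store = cls()
        store._marks = json.loads(text)
        return store


store = BookmarkStore()
store.add("book-1", 120.5, "Scientific Discoveries")
store.add("book-1", 30.0)  # e.g. placed automatically when playback stops
store.add("book-1", 900.0, "Scientific Discoveries")
print(store.find("book-1", "Scientific Discoveries"))  # [120.5, 900.0]
print(store.browse("book-1"))                          # [30.0, 120.5, 900.0]
```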


10. Highlighting

It should be possible to highlight portions of the digital talking book, assigning similar or different characteristics (labels) to each section highlighted. The user should be able to create text- or voice-input labels. For example, the designator "Professor Smith's ideas" could be used to designate highlighted passages of text which a student thinks are important for an examination to be given by Professor Smith. Another category, "Final Exam Notes" could be used to designate highlighted passages of text which a student might want to review in preparation for the final exam.

As with bookmarks, it should be possible for a user to browse through all highlighted text, regardless of the designator used to identify the highlighting; and there should be a sufficient amount of storage for identifiers to enable the user to include notes to be associated with the highlighted text.

The highlighting feature should enable a user to jump to a specific highlight, identified with a text or voice label, or to review a list of all highlighted text items. Also, as a user is reading along in the book, the DTB should indicate when highlighted text (provided by the author or inserted by the reader) is encountered. The user should be able to learn what label (if any) has been attached to the highlighted text as well.

Like bookmarks, information about text highlighted by the user should be stored in a file that is separate from the actual book itself. The difference between highlighting and bookmarking is that the former lays down two markers, one each at the beginning and end of a segment, while the latter lays down only a single marker.
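
The distinction between one marker and two can be made concrete with a short sketch. The record types `Bookmark` and `Highlight` and the function `labels_at` are hypothetical, and positions are assumed to be numeric offsets into the book.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bookmark:
    position: float              # a single marker
    label: Optional[str] = None


@dataclass(frozen=True)
class Highlight:
    start: float                 # marker at the beginning of the segment
    end: float                   # marker at the end of the segment
    label: Optional[str] = None

    def contains(self, position: float) -> bool:
        return self.start <= position <= self.end


def labels_at(highlights, position):
    """Labels of every highlight that covers the current reading position."""
    return [h.label for h in highlights if h.contains(position)]


highlights = [
    Highlight(100.0, 160.0, "Professor Smith's ideas"),
    Highlight(150.0, 200.0, "Final Exam Notes"),
]
print(labels_at(highlights, 120.0))  # ["Professor Smith's ideas"]
```

A lookup of this kind is one way a device could indicate, during reading, that highlighted text has been encountered and which label is attached to it.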


11. Excerpt Capability

If an individual is doing any type of research or just making notes, he or she will find it useful to be able to copy a portion of text from a digital talking book and paste it into another location. Some type of mechanism that would allow this capability, within the bounds of copyright law, should be provided.


12. Searching

It should be possible to search the book for specific text strings. It should also be possible to search the book and move to specific structural and other tags. For example, the user should be able to search for and jump to all of the pictures in the book (assuming that they are described), all of the sidebars, and all of the footnotes, or any other text or structural element in the book (e.g., search for headings of level 1 type, search for ordered lists, search for unordered lists, jump to next unordered list, jump to next ordered list).
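
Both kinds of search can be sketched against a small marked-up sample using Python's standard `xml.etree.ElementTree` module. The tag names in the sample (`h1`, `p`, `sidebar`, `list`, `item`) are assumptions chosen for illustration and are not taken from any DTB document type definition; the helper names `find_elements` and `find_text` are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<book>
  <h1>Chapter 1</h1>
  <p>Opening paragraph.</p>
  <sidebar>A note on batch files.</sidebar>
  <h1>Chapter 2</h1>
  <list type="ordered"><item>First</item><item>Second</item></list>
</book>"""

root = ET.fromstring(SAMPLE)


def find_elements(root, tag):
    """All elements with the given tag, in document order."""
    return list(root.iter(tag))


def find_text(root, needle):
    """All elements whose own text contains the search string."""
    return [el for el in root.iter() if el.text and needle in el.text]


print([el.text for el in find_elements(root, "h1")])  # ['Chapter 1', 'Chapter 2']
print([el.tag for el in find_text(root, "batch")])    # ['sidebar']
```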


13. Spell-Out Capability

It should be possible to have individual words spelled. If the user is listening to a digital recording of human speech as opposed to the text file rendered in synthesized speech, a mechanism must be in place to synchronize the speech with the text file so that the user can ask for a word to be spelled as soon as it is spoken.


14. Text Attributes and Punctuation

The user needs to be able to know when and where bolded, italicized, or otherwise emphasized text and such elements as subscripts and superscripts occur within the digital talking book. Any feature that provides this information should be one that the user can easily turn on and off. Character sets supporting international characters (e.g., double-byte codes used in Asian languages or extended ASCII characters) should be accommodated. The user should also have the ability to identify text attributes such as background and foreground color, font type and size.

When listening to a synthesized speech rendering of the text file, the user should also be able to control how much punctuation is spoken. For some publications, it is important to hear every single comma, period, space, or exclamation point. For others, the user may want to hear only full words spoken.


15. Tables

It should be possible for the user to choose a variety of ways to read a table whether reading the text file of a digital talking book via synthetic speech, large print, or braille, or listening to the human speech recording. Possible ways to read a table include:

1. Reading the table one row at a time. The user would hear the row heading and the contents of the row. Optionally, column headings could be spoken before reading any item in the row, or spoken only when reading items in the first row.

2. Reading the table one column at a time. The user would hear the column heading and the contents of the column. Optionally, row headings could be spoken before reading any item in the column or spoken only when reading items in the first column.

3. Locating a given cell in a table and hearing the value of the item in the cell. This might be accomplished by allowing the user to traverse the list of row or column headings until the desired row or column is found, then to traverse the table in the other direction, hearing headings until the appropriate cell is found.
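
The three reading methods listed above can be sketched as follows. The table layout (a list of column headings plus a mapping from row headings to cell values), the sample data, and the function names are all hypothetical.

```python
table = {
    "columns": ["Q1", "Q2"],
    "rows": {"North": [10, 12], "South": [7, 9]},
}


def read_by_row(table, announce_columns=True):
    """One spoken line per row: the row heading, then each cell."""
    lines = []
    for row_heading, cells in table["rows"].items():
        parts = []
        for col_heading, cell in zip(table["columns"], cells):
            parts.append(f"{col_heading}: {cell}" if announce_columns else str(cell))
        lines.append(f"{row_heading}. " + ", ".join(parts))
    return lines


def read_by_column(table):
    """One spoken line per column: the column heading, then each cell."""
    lines = []
    for i, col_heading in enumerate(table["columns"]):
        parts = [f"{row}: {cells[i]}" for row, cells in table["rows"].items()]
        lines.append(f"{col_heading}. " + ", ".join(parts))
    return lines


def cell_value(table, row_heading, col_heading):
    """The value at the intersection of a row heading and a column heading."""
    col = table["columns"].index(col_heading)
    return table["rows"][row_heading][col]


print(read_by_row(table))     # ['North. Q1: 10, Q2: 12', 'South. Q1: 7, Q2: 9']
print(read_by_column(table))  # ['Q1. North: 10, South: 7', 'Q2. North: 12, South: 9']
print(cell_value(table, "South", "Q2"))  # 9
```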


16. Nested Lists

When an item in a list contains its own secondary list of items, that second list is said to be "nested" inside the main list. Users should be able to invoke a function that assists with the comprehension of the layout of nested lists. The user needs to be able to determine at what level within a nested list a given item falls. One approach would be to apply a numbering scheme such as that used for legal documents (e.g., 4.2.3, 4.2.3.6) that will tell the user precisely where in a list the current item falls.
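
The numbering approach mentioned above can be sketched with a short recursive function. The representation of a list as (text, children) pairs and the name `number_items` are hypothetical.

```python
def number_items(items, prefix=()):
    """Yield (label, text) pairs such as ('1.2.1', 'Fuji') for a nested list.

    items is a list of (text, children) pairs, where children is itself
    a list of the same form.
    """
    for i, (text, children) in enumerate(items, start=1):
        path = prefix + (i,)
        yield ".".join(map(str, path)), text
        yield from number_items(children, path)


outline = [
    ("Fruit", [("Apples", [("Fuji", []), ("Gala", [])]), ("Pears", [])]),
    ("Vegetables", []),
]
for label, text in number_items(outline):
    print(label, text)
# 1 Fruit
# 1.1 Apples
# 1.1.1 Fuji
# 1.1.2 Gala
# 1.2 Pears
# 2 Vegetables
```

The number of components in a label gives the nesting level of the item directly.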


17. Text Elements

Any book, whether electronic or paper, contains a variety of elements--some physical and some logical. The page is an example of a physical element in a book written on paper, whereas a chapter heading is an example of a logical element which is similar in character to any other piece of text in the book but which has been given the logical designator "heading." In many cases, it is not as important to understand the physical format of a given piece of text as it is to understand the logical elements of which it is composed. In other situations, it is important to keep text in its original format. These principles have been recognized in the use of descriptive markup languages and other tagging schemes used in conjunction with electronic text.

Given that an electronic version of a book will likely contain markup tags (e.g., to designate headings, paragraphs, and other elements), it should be possible to search the book for specific tags. Where tags are not present--as in the case of pre-formatted text--there should be provision in the navigation system to give the reader information about layout and structure of text that may not be provided by tags.


18. Skipping User-Selected Text Elements

The user should be able to instruct the playback device to skip over specified elements of the document such as picture captions, optional producer's notes, tables, sidebars, etc.
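
Skipping can be viewed as a filter applied to the sequence of elements before playback. In the sketch below, the element type names and the function `without_skipped` are hypothetical.

```python
def without_skipped(elements, skip_types):
    """Yield only the elements whose type the user has not asked to skip."""
    for element_type, text in elements:
        if element_type not in skip_types:
            yield element_type, text


elements = [
    ("paragraph", "Main text."),
    ("caption", "Figure caption."),
    ("sidebar", "Side discussion."),
    ("paragraph", "More main text."),
]
user_skips = {"caption", "sidebar"}
print([text for _, text in without_skipped(elements, user_skips)])
# ['Main text.', 'More main text.']
```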


19. Location Information

If a sighted person can obtain information on his or her current location within a printed document by an examination of the document, then that information should be available to the digital talking book user. Location information should be relative to the particular publication. Three types of location information suggested are:

1. Logical. Where are you with respect to the chapter, what chapter are you in, what section, subsection, etc.?

2. Physical. What page of the book are you in and on what line (where applicable)?

3. Temporal. How much time remains in the chapter and the book?

The user interface could be set up to provide more detailed information in response to more presses of a button. That is, the first press might elicit the title, the second the chapter, the third the page number, etc.
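
The repeated-press behavior can be sketched as a lookup keyed by the number of presses. The order of detail levels follows the example above (title, chapter, page); the function name `location_report`, the dictionary layout, and the sample values are hypothetical.

```python
DETAIL_LEVELS = ["title", "chapter", "page"]


def location_report(location, presses):
    """Return the detail announced on the given press (1 = first press).

    Presses beyond the last defined level repeat the most detailed one.
    """
    if presses < 1:
        raise ValueError("presses must be at least 1")
    key = DETAIL_LEVELS[min(presses, len(DETAIL_LEVELS)) - 1]
    return f"{key}: {location[key]}"


location = {"title": "A Sample Book", "chapter": "Chapter 5", "page": 55}
print(location_report(location, 1))  # title: A Sample Book
print(location_report(location, 3))  # page: 55
```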

The user should have access to enough information to create meaningful footnote references when extracting information from the book.


20. Summary and Reporting Information

A feature should be available that permits the user to obtain a quick overview of the book (e.g., 563 printed pages, 9 hours of play, 4 levels of headings, 2 parts, 12 chapters, 5 tables). Suggested information includes title, author, playing time, number of major logical elements (parts, chapters), and, if applicable, number of printed pages.

Reporting should include a dynamic summary--that is, if the user is in a numbered part of a book, the number of chapters contained within that part should be indicated. If the user is in a chapter, he or she might be able to learn the number of sections and maybe the number of pages.


21. Science and Mathematics

While it has traditionally been difficult if not impossible to render scientific and mathematical information audibly in a meaningful way, and while the traditional ASCII character set as we know it creates difficulties for the encoding of such information in a digital form, it is nevertheless essential to consider the issue in terms of the digital talking book and text navigation. It is possible that in time, an approach to rendering math and science information audibly in a meaningful way will be developed. In designing the navigation software, therefore, provisions should be made to accommodate any such approach, should it become available.


22. Other Kinds of Visual Representations

It should be possible to present other kinds of visual representations, such as organizational charts, flow charts, and family trees, using specialized presentation techniques, depending on the information available in the original document.
