
Last updated: June 18, 2008

Staff Bibliography



Challenges in Providing Access to Digitized Xrays over the Internet


L. E. Berman
R. Long
G. R. Thoma
#2368-22
Proceedings of the 23rd AIPR Workshop
Oct. 12-14, 1994
Cosmos Club, Washington, D.C.


ABSTRACT

As part of a collaborative project with other government agencies, the National Library of Medicine (NLM) is engaged in the development of an electronic archive of digitized cervical and lumbar spine xrays taken in the course of nationwide health and nutrition examination surveys. One goal of the project is to provide access to the images via a client/server system specifically designed to enable radiologists located anywhere on the Internet to read them and enter their readings into a database at the server located at NLM.

Another key goal is to provide general (public) access to these images, the radiologists' readings, and other collateral data taken during the survey. The system developed for such general access is based on a public domain server, the World Wide Web (WWW), and NCSA Mosaic, a distributed hypermedia client system designed for information retrieval over the Internet. This paper describes the design of the client/server software, the storage environment for the x-ray archive, the user interface, the communications software, and the public access archive. Design issues include file format, image resolution (both spatial and contrast), compression alternatives, linking collateral data with images, and the role of staging and prefetching.

1. INTRODUCTION

The Lister Hill National Center for Biomedical Communications, a research and development division of the NLM, is actively engaged in projects involving the archiving of digitized radiographs, scanned journal articles and rare documents, and digitized video [see Table 1]. This work is conducted as part of ongoing research and development that supports the notion of the virtual desktop library (VDL), one of whose goals is the delivery of related data of different types from various sources. We envision VDL as providing the biomedical end user access to remotely located data sources with speed and relative ease.

Initial research in this area led to the development of the System for Automated Interlibrary Loan (SAIL)[1]. This program investigated the technical feasibility and role of automated document delivery to meet the requirements of the NLM's interlibrary loan service. This system is operated in a pilot test mode to assess performance and cost issues. SAIL proved the technical feasibility of automated document delivery by mail and fax. Research and development is continuing on a follow-up system called DocView[2], which allows remote end users access to stores of document images over the Internet and to display and manipulate the images on the desktop.

An archive of digitized xrays is being developed by the NLM, in collaboration with the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) and the National Center for Health Statistics (NCHS). The xrays in this archive constitute part of the data collected since 1971 by the National Health and Nutrition Examination Surveys (NHANES I, II, and III) which are conducted periodically by the NCHS. These surveys help to assess the nation's health by collecting biomedical and demographic data on a representative sampling of the United States population.

Table 1. NLM/LHNCBC Imaging R&D Projects
Project                | Storage Class       | Data File Format         | Storage Requirement | Throughput Needed | Storage Media
-----------------------|---------------------|--------------------------|---------------------|-------------------|----------------------
DAVE                   | Video               | MPEG I & II, motion JPEG | TBD[a]              | 230 KB/s[b]       | optical/magnetic
DocView                | Journal Articles    | TIFF, JBIG               | TBD                 | variable          | optical/tape
DXPNET                 | Digitized Xrays     | GIF, JPEG, Flat          | 145 GB[c]           | variable          | optical/magnetic/tape
HMD                    | Rare Manuscripts    | GIF, JPEG                | TBD                 | variable          | optical
NHANES Collateral Data | Relational Database | alphanumeric             | < 200 MB            | variable          | magnetic

  • a. To be determined.
  • b. Based on an image size of 320x240x8bits at 30 frames/s with 10:1 compression.
  • c. This includes NHANES II digitized images only.
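
The throughput figure in footnote b can be reproduced with a few lines of arithmetic. The sketch below is illustrative only; the function name is hypothetical, and it assumes 1 KB = 1000 bytes.

```python
def video_throughput_bytes_per_s(width, height, bits_per_pixel, fps, compression_ratio):
    """Return the sustained data rate in bytes/s for compressed video."""
    raw_bytes_per_frame = width * height * bits_per_pixel / 8
    return raw_bytes_per_frame * fps / compression_ratio


# Parameters taken from footnote b of Table 1.
rate = video_throughput_bytes_per_s(320, 240, 8, 30, 10)
print(f"{rate / 1000:.0f} KB/s")  # prints "230 KB/s"
```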

1.1 Why archive the xrays?

Traditionally the NHANES demographic and biomedical data ("collateral data") have been used for epidemiological research. Questions related to arthritis and musculoskeletal diseases[3], the association of weight, race, and occupation with osteoarthritis of the knee[4], the prevalence of scoliosis[5], breast cancer[6], heart disease[7], and other health questions[8] have been addressed. Data to conduct studies such as these are available from NCHS in the form of public use data tapes[9].

The x-ray films in the NHANES data sets are of the cervical and lumbar spine (NHANES II), PA and lateral chest (NHANES I), hands and feet (NHANES I, III), hips (NHANES I), and sacroiliac region (NHANES I)[9]. Conducting research studies with any of these film sets is difficult for a variety of reasons. First, the logistics for obtaining the x-ray films can be overwhelming. Other issues such as shipping and receiving, loss, degradation due to environmental conditions, and theft all combine to discourage wide access to these valuable data resources; as a result, the films have been borrowed from the NCHS record center only nine times since 1974[10],[11]. In contrast, a recent search through MEDLINE returned over 800 citations corresponding to studies using the NHANES collateral data[11]. If the films were digitized, the images could be made readily available to support epidemiological studies; establish population norms; develop radiographic atlases; train radiologists, rheumatologists, and orthopedists in uniform reading of x-ray images with the use of a standardized radiographic atlas; conduct research in image processing, feature classification, and image database management; and educate medical students and x-ray technicians[12].

2. Archive Development

The development of a public access x-ray archive consists of several steps. First, the digitized images must undergo a rigorous quality control procedure to ensure the accuracy of the film-to-digital image conversion and the removal of any information identifying the subject. Second, in parallel with this effort, a prototype system-level architecture must be developed that maps the presumed needs of a diverse user community to different levels of a storage hierarchy. The underlying assumption is that a majority of the users will be viewing thumbnail images and low resolution representations of the original image. Their needs must be met with rapid response, while a presumed smaller audience interested in the raw data will incur longer response times for image retrieval. Third, the collateral data must be indexed appropriately to respond to likely user requests in a timely manner. Linking the collateral data to the images is important for focusing on specific research issues and reducing the data flow from the archive to the remote user.

2.1 Digitization and Quality Control

The NHANES II radiographs are being digitized by the University of California at San Francisco and the Radix Corporation. All radiographs have been digitized on either a Lumisys 100 or 150 laser spot scanner, with a spot size of 175 microns. The cervical and lumbar spine images have a resolution of 1463x1755x12 bits (5 MBytes) and 2048x2487x12 bits (10 MBytes), respectively. After each image has passed a three-tiered quality control procedure, the data is stored on erasable optical disk and is ready for inclusion in the archive's optical jukebox.
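
The quoted file sizes are consistent with each 12-bit pixel occupying a whole number of bytes on disk. The following sketch checks this; the two-bytes-per-pixel storage layout is an assumption made here for illustration (the text gives only the resulting sizes), and the function name is hypothetical.

```python
import math


def raw_image_bytes(width, height, bits_per_pixel):
    """Size of an uncompressed image, assuming each pixel is padded to whole bytes."""
    bytes_per_pixel = math.ceil(bits_per_pixel / 8)  # 12 bits -> 2 bytes (assumed layout)
    return width * height * bytes_per_pixel


cervical = raw_image_bytes(1463, 1755, 12)
lumbar = raw_image_bytes(2048, 2487, 12)
print(f"cervical: {cervical / 1e6:.1f} MB, lumbar: {lumbar / 1e6:.1f} MB")
```

Under this assumption the results land close to the 5 MByte and 10 MByte figures given above; tightly packed 12-bit samples would give noticeably smaller files.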

Quality control (QC) of the NHANES II x-ray images consists of three independent stages. The QC done at each stage is as follows. Stage 1 is done by a trained computer operator and laser scanner technician. The following operations are performed:

Stage 1 (UCSF/Radix)

  • re-calibrate laser scanner with each scan
  • clean optics every 2-3 months
  • use step-wedge films to check scanner calibration every 2-3 months
  • clean pinch rollers regularly
  • visually check each image for general contrast, image alignment, and for removal of identification tags

Stage 2 is done by a non-medical person trained to filter out images that do not meet the following criteria:

Stage 2 (NCHS)

  • inspect each image to ensure that identification tags are not visible
  • check for sufficient contrast
  • check for correct image orientation

Stage 3 is being done at NLM by a trained physician under contract to NCHS. The physician answers the following questions concerning each image:

Stage 3 (NLM)

  • is the digitized image acceptable?
  • is the digitized image worse, same, or better than radiograph?
  • would a reader be able to detect and score the extent of osteophytes, subluxation, sclerosis, or disk space narrowing?

If an image is rejected at stage 1, the radiograph is re-digitized. Rejection at stage 2 or 3 eliminates the image from the archive, although such images might prove useful for future work in automating quality control. A separate database to archive these rejected images might be developed.

2.2 Storage Architecture

The NCHS releases NHANES collateral data primarily through its own publications and through a series of 9-track public-use data tapes[13]. From the user's viewpoint these tapes have drawbacks: they are difficult to work with due to the media and methods necessary to extract data, are not easily accessible to large numbers of people, and do not contain any of the x-ray images. Also, the logistics for acquiring the radiographs can be overwhelming due to the size of the collection and the physical storage space required. In contrast, we are developing a non-migratory hierarchical storage management (NHSM) data delivery system that will map user-required response time and image quality (raw, lossy compressed, reduced resolution) to appropriate levels of the hierarchy. NHSM differs from the common notion of migratory hierarchical storage management[14] in that increasing age or decreasing demand are not used as criteria to send data "downstream" to lower levels of the hierarchy.

There are various application-dependent trade-offs to be considered in system implementation. Possible users include epidemiologists, electrical engineers, computer scientists, and statisticians interested in the raw uncompressed data, which translates to longer transmission times and potentially large numbers of images. Medical students, technicians, K-12 students interested in anatomy, and commercial businesses might need real-time access, which would suggest high-quality lossy compressed images. The NHSM addresses these diverse needs with two levels of data storage in which progressively more data is stored and access time is increased as the levels are descended [see figure 1]. Level 0 of the NHSM system consists of a SUN SPARCStorage unit, a RAID system. This unit contains 18 1.2 GByte SCSI-2 hard drives and six independent fast buffered SCSI-2 buses, and is connected to an Sbus card hosted in a SPARC 20 model 612 via a 25 MByte/s fiber channel connector (upgradeable to 100 MByte/s), under Solaris 2.3.

Level 1 consists of an optical jukebox with 144 5-1/4" erasable optical platters and a stand-alone optical drive. Each platter side maintains a unix file system and has a capacity of 283 MBytes, for a total jukebox capacity of 82 GBytes. The jukebox has one picker arm and four optical disk drives capable of delivering data at 700 KBytes/s. A SUN 670MP running Solaris 2.3 serves as the jukebox host and a fast-buffered SCSI-2 Sbus host adapter card connects the jukebox. Near-line storage for any platter not contained in the jukebox is accessible by using the stand-alone optical drive. We have procured the equipment for levels 0 and 1 of the NHSM.
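
As a quick check on the capacities quoted for the two levels, the arithmetic can be sketched as follows. The constant names are hypothetical, and the sketch assumes two usable sides per platter and 1 GByte = 1000 MBytes; both are assumptions, although they reproduce the 82 GByte total stated above.

```python
# Level 1: optical jukebox
PLATTERS = 144
SIDES_PER_PLATTER = 2      # assumed: both sides of each platter hold a file system
MB_PER_SIDE = 283

# Level 0: RAID unit
DRIVES = 18
GB_PER_DRIVE = 1.2

jukebox_mb = PLATTERS * SIDES_PER_PLATTER * MB_PER_SIDE
raid_gb = DRIVES * GB_PER_DRIVE  # raw capacity, before any RAID overhead

print(f"jukebox: {jukebox_mb} MB (about {jukebox_mb / 1000:.0f} GB)")
print(f"RAID:    {raid_gb:.1f} GB raw")
```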

Figure 1. Non-migratory hierarchical storage management system for digital xrays.

The time required to deliver data from the archive to a remote user depends on the following factors:

  • image resolution required (contrast and spatial)
  • number of images returned from a query
  • Internet traffic
  • number of users trying to access the archive (server load)
  • throughput of archive devices

2.3 Image Data Sets

At Level 0 of the NHSM, four related but unique sets of images will be found. Sets one and two contain a thumbnail representation of each cervical and lumbar spine x-ray image, totalling 287 MBytes of storage [refer to Table 2]. Sets three and four contain each cervical and lumbar spine x-ray spatially reduced (both horizontally and vertically) by a factor of two from the original image and compressed using JPEG at 10:1, occupying 1.84 GBytes of disk space. The images are reduced and compressed with a lossy algorithm for three reasons: information loss is acceptable for the intended use of these sets, this resolution can be viewed on most display devices without excessive scrolling in the graphical user interface, and the smaller files speed the delivery of the data. Consider that if the transmission rate for an Internet connection were as low as 8 KBytes/s, a remote user would still receive one of these cervical or lumbar spine images in 8 or 16 seconds, respectively. Included in each compressed image in sets three and four will be an identifying mark in the upper right-hand corner to ensure that users will recognize these images as lossy representations of the original. JPEG has been selected because it is a well recognized standard, has widespread use in both commercial and public domain software, and has been shown to be effective in a limited in-house study[15]. Other compression alternatives will be considered as they become viable and usable by commercial and public domain software packages.

Table 2. Image Set Characteristics
Image Set               | File Format | Spatial Resolution (W x H) | Bits/Pixel | JPEG Compressed | Individual File Size | Total Set Size
------------------------|-------------|----------------------------|------------|-----------------|----------------------|---------------
Cervical Thumbnail (#1) | GIF         | 91x109                     | 8          | No              | ~10 KB               | 51 MB
Lumbar Thumbnail (#2)   | GIF         | 128x155                    | 8          | No              | ~20 KB               | 236 MB
Low Res Cervical (#3)   | JFIF        | 723x878                    | 8[a]       | Yes, lossy      | ~64 KB               | 324 MB
Low Res Lumbar (#4)     | JFIF        | 1024x1244                  | 8[a]       | Yes, lossy      | ~128 KB              | 1.52 GB
Raw Cervical (#5)       | raw         | 1463x1755                  | 12         | No              | 5 MB                 | 27 GB
Raw Lumbar (#6)         | raw         | 2048x2487                  | 12         | No              | 10 MB                | 122 GB

  • a. Reduced to 8-bits per pixel before compression.
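
The delivery times implied by Table 2 for a slow connection can be estimated directly from the individual file sizes. This is a rough sketch that ignores protocol overhead and server load; the names are hypothetical, the 8 KBytes/s rate is the worst-case figure used in the text, and 1 MB is taken as 1024 KB.

```python
def transfer_seconds(file_size_kb, rate_kb_per_s):
    """Idealized transfer time: file size divided by sustained link rate."""
    return file_size_kb / rate_kb_per_s


# Approximate individual file sizes from Table 2, in KB.
FILE_SIZES_KB = {
    "cervical thumbnail (#1)": 10,
    "lumbar thumbnail (#2)": 20,
    "low res cervical (#3)": 64,
    "low res lumbar (#4)": 128,
    "raw cervical (#5)": 5 * 1024,
    "raw lumbar (#6)": 10 * 1024,
}

LINK_RATE_KB_PER_S = 8

for name, size_kb in FILE_SIZES_KB.items():
    print(f"{name}: {transfer_seconds(size_kb, LINK_RATE_KB_PER_S):.0f} s")
```

At this rate a raw image takes on the order of ten to twenty minutes, which illustrates why the raw sets are kept apart from the sets intended for interactive browsing.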

The optical jukebox [refer to section 2.2] resides on level 1 and will house the raw NHANES II x-ray images. Currently, we are limited to 82 GBytes of storage space in the jukebox. Since the full resolution, uncompressed NHANES II digitized xrays (sets #5 and #6) require 149 GBytes of storage space, under current operational procedures we would need an operator to intervene whenever a request is made for an off-line x-ray image. Several alternatives are under consideration to counter the shortage of mass storage. First, lossless compression of about 2:1, if it can be achieved with these data sets, would be an acceptable alternative. Second, it is possible to upgrade the jukebox drives and platters to double the storage capacity, albeit at greater cost. Combining these two alternatives would provide the most flexibility in that it would allow for future growth of the archive. The trade-off to be resolved is the cost of new equipment versus the acceptability of a method that will losslessly compress the images at about 2:1.
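
The alternatives above can be compared with a small calculation. The sketch below uses only the figures quoted in this section and in Table 2; the names are hypothetical, and the 2:1 ratio is the hoped-for lossless compression, not a measured result.

```python
JUKEBOX_GB = 82
RAW_SETS_GB = {"raw cervical (#5)": 27, "raw lumbar (#6)": 122}


def fits(required_gb, capacity_gb):
    """True if the data set can be held entirely on-line in the jukebox."""
    return required_gb <= capacity_gb


raw_total_gb = sum(RAW_SETS_GB.values())

# (storage required, capacity available) for each alternative
options = {
    "current configuration": (raw_total_gb, JUKEBOX_GB),
    "2:1 lossless compression": (raw_total_gb / 2, JUKEBOX_GB),
    "doubled jukebox capacity": (raw_total_gb, 2 * JUKEBOX_GB),
    "compression and doubled capacity": (raw_total_gb / 2, 2 * JUKEBOX_GB),
}

for name, (need, have) in options.items():
    headroom = have - need
    print(f"{name}: need {need:g} GB, have {have} GB, "
          f"fits={fits(need, have)}, headroom={headroom:g} GB")
```

Either alternative alone would bring the raw sets on-line, but only the combination leaves substantial headroom for growth.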

2.4 Collateral Data

The collateral data used with this archive serves three purposes: 1) it will be used to narrow the subset of images returned from a query, 2) it will be used for background information on each image, and 3) it can be used for epidemiological research. The design of this database will be based on an object-oriented/relational model. A commercial database package named Illustra was chosen because it is robust, uses SQL, has features that provide for security and data protection, is well supported, and is relatively inexpensive when compared with other commercial database packages. The database schema for the entire collateral data set is currently being designed and is intended to be indexed on several attributes such as age, sex, ethnicity, height, weight, and geography. Further indexing may result in partitioning based on results of the physical exam and/or lab tests contained in the collateral data[16].

3. Public Access

Over the past two years limited access has been provided to small sets of these x-ray images. Initially anonymous FTP access from a CD-ROM drive on a PC was used. Although this proved useful to researchers, access was slow due to the PC architecture and speed of the media hardware. This archive is still used occasionally, but is being phased out. We have also made available a test set of images that have been archived on 8mm tapes. Copies of this tape are distributed as requested. As demand for this test set has increased, we have recognized the need for wider and faster access to the data.

The public access x-ray archive we are developing has as a primary component the National Center for Supercomputing Applications' (NCSA) hypertext transfer protocol daemon (HTTPD). The protocol it implements, HTTP, is a generic stateless object-oriented protocol for a distributed collaborative hypermedia information system. It can be used for many tasks such as name servers, object-oriented systems, and information servers[17]. Our HTTPD server will run on a SUN 670MP and deliver data to remote clients using browsing viewers such as the NCSA's Mosaic. To access our archive and view images through our HTTPD, a remote user will need the following:

  • NCSA Mosaic or another suitable browser
  • color monitor
  • connection to the Internet.

3.1 Image Retrieval

Due to the large size of the archive, the inherent limitations of the jukebox, the user's needs, and the demands that could be placed on a public access server, we are proposing the delivery of these images with a staged approach. Initially the user will query the collateral database to focus on a subset of the image database. In a typical scenario, a user will be presented with a query form in the browser. The nature of the query will be simple: for example, a logical combination of age, ethnicity, gender, and national origin. When the user submits the query, the browser transmits it to HTTPD, which passes the query on to an Illustra gateway interface. The gateway is responsible for formulating a Structured Query Language (SQL) macro to submit to the Illustra database engine [see figure 2]. The number of images matching the query will reflect the narrowness of the question posed. For example, asking for the images of all Hispanic females over the age of 60 will result in a smaller image set than a query for all females.
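
The gateway step can be illustrated with a minimal sketch. It is not the Illustra gateway itself: Python's built-in sqlite3 module stands in for the Illustra engine, and the table name, column names, form field names, and sample rows are all invented for illustration.

```python
import sqlite3

# Hypothetical mapping from query-form fields to SQL conditions.
# Only whitelisted fields are accepted, and values are bound as parameters.
CONDITIONS = {
    "sex": "sex = ?",
    "ethnicity": "ethnicity = ?",
    "older_than": "age > ?",
}


def build_query(form):
    """Translate submitted form fields into a parameterized SQL query."""
    clauses, params = [], []
    for field, condition in CONDITIONS.items():
        if field in form:
            clauses.append(condition)
            params.append(form[field])
    sql = "SELECT image_id FROM subjects"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY image_id", params


# Made-up collateral data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subjects (image_id TEXT, age INTEGER, sex TEXT, ethnicity TEXT)")
conn.executemany(
    "INSERT INTO subjects VALUES (?, ?, ?, ?)",
    [
        ("C0001", 65, "F", "Hispanic"),
        ("C0002", 42, "F", "Hispanic"),
        ("C0003", 71, "F", "White"),
        ("C0004", 68, "M", "Hispanic"),
    ],
)

narrow_sql, narrow_params = build_query({"sex": "F", "ethnicity": "Hispanic", "older_than": 60})
broad_sql, broad_params = build_query({"sex": "F"})

narrow_ids = [row[0] for row in conn.execute(narrow_sql, narrow_params)]
broad_ids = [row[0] for row in conn.execute(broad_sql, broad_params)]
print(len(narrow_ids), "image(s) for the narrow query;", len(broad_ids), "for the broad one")
```

As in the example in the text, the narrower question returns the smaller image set.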

Viewing of this image set, which we shall name the active image set (AIS), involves several trade-offs related to graphical user interface (GUI) considerations, image size, traffic on the Internet, and location of the images in the NHSM. The browser could present the user with a scrollable page containing all the thumbnail images in the AIS and some limited collateral data. However, this would be prohibitive if the AIS is large, due to the duration of image transmission. An alternative is to use a state-preserving approach as in the On-Line Imaging (OLI) system used for the NLM's historic image collection[18].

Since HTTP is a stateless protocol, the HTTPD server does not retain information such as which images from the AIS have already been delivered. As with OLI, the current state of a user's AIS will be maintained locally at NLM in the Illustra database. This state data will consist of a user identification to preserve the integrity of each individual's AIS, time of last connection, and the current delivery status of each image in the user's AIS. The database engine will partition the AIS into contiguous groups of images and will maintain the state of image delivery for each image in the user's AIS.

Figure 2. Data flow through the public access archive.

For example, if a query resulted in a list of 600 images, a user-defined group set size will determine the number of images delivered with successive user requests. Embedded in the information returned to the browser will be hyper-text links to the level 0 and 1 images from the NHSM for images in the current group, giving the user the flexibility to choose the image appropriate for their work. Selection of either hyper-text link will cause the HTTPD server to download the appropriate image to the client for viewing with an appropriate image viewer.
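
The state kept for each user's AIS can be sketched as a small data structure. This is an in-memory illustration only; in the design described above the state would live in the Illustra database, and the class, field, and method names here are hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ActiveImageSet:
    """Per-user state for one active image set (AIS)."""
    user_id: str
    image_ids: list
    group_size: int
    last_connection: float = field(default_factory=time.time)
    delivered: set = field(default_factory=set)

    def groups(self):
        """Partition the AIS into contiguous groups of at most group_size images."""
        return [
            self.image_ids[i:i + self.group_size]
            for i in range(0, len(self.image_ids), self.group_size)
        ]

    def next_group(self):
        """Return the first group with undelivered images and mark them delivered."""
        self.last_connection = time.time()
        for group in self.groups():
            pending = [image_id for image_id in group if image_id not in self.delivered]
            if pending:
                self.delivered.update(pending)
                return pending
        return []


# A query returning 600 images, delivered 50 at a time.
ais = ActiveImageSet("user-1", [f"img{n:03d}" for n in range(600)], group_size=50)
first = ais.next_group()
second = ais.next_group()
print(len(ais.groups()), "groups;", len(ais.delivered), "images delivered so far")
```

Because the delivery status is recorded on the server side, each successive request can be answered without the stateless protocol having to carry any history.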

Currently the only known method for viewing the raw data is to use our public domain Imview package available for SUN workstations. Imview was developed, under X-windows for a SUN workstation, to read an image off a local unix file system and display it on an 8-bit color monitor. It has image processing features for image enhancement and thresholding and can do some basic image arithmetic, logic, and statistics. Enhancing Imview to include an in-house developed application-level technique for sending data over the Internet would decrease the amount of time it takes to receive an image from the HTTPD server. With this paradigm HTTPD would spawn off a server process to handle data transmission with the client Imview process. In-house experiments have shown a factor of three[19] improvement in transmission time by sending multiple streams of image data to the client over a varying number of unique TCP/IP socket connections. As shown in figure 3, the client receives each stream of image data with a unique forked process through a virtual channel connection (S_sx, V_sx -> V_cy, S_cy) with the server. The client accepts multiple streams of data, I1-In, from the server simultaneously and reassembles the data based on a stream identifier which maps the stream to the proper spatial location in the image.

Figure 3. A multi-socket connection for sending image data consists of image segments I1-In, sockets S1-Sn, virtual channel connections V1-Vn, and physical connections Ps and Pc.
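
The segmentation and reassembly idea can be sketched in a few dozen lines. This is not the in-house implementation described in [19]: threads stand in for the forked processes, socket.socketpair() stands in for TCP/IP connections across the Internet, and the 6-byte header layout and all function names are assumptions made for the example.

```python
import socket
import threading


def split_image(data, n_streams):
    """Split image bytes into contiguous segments, each tagged with a stream identifier."""
    seg_len = -(-len(data) // n_streams)  # ceiling division
    return [(i, data[i * seg_len:(i + 1) * seg_len]) for i in range(n_streams)]


def send_segment(sock, stream_id, payload):
    """Server side: send a header (stream id, payload length) followed by the payload."""
    header = stream_id.to_bytes(2, "big") + len(payload).to_bytes(4, "big")
    sock.sendall(header + payload)
    sock.close()


def recv_exact(sock, n):
    """Read exactly n bytes from a socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("stream closed early")
        buf.extend(chunk)
    return bytes(buf)


def recv_segment(sock, received):
    """Client side: read one segment and file it under its stream identifier."""
    stream_id = int.from_bytes(recv_exact(sock, 2), "big")
    length = int.from_bytes(recv_exact(sock, 4), "big")
    received[stream_id] = recv_exact(sock, length)
    sock.close()


def transfer(data, n_streams):
    """Send data over n_streams concurrent connections and reassemble it in order."""
    received = {}
    threads = []
    for stream_id, payload in split_image(data, n_streams):
        server_end, client_end = socket.socketpair()
        threads.append(threading.Thread(target=send_segment, args=(server_end, stream_id, payload)))
        threads.append(threading.Thread(target=recv_segment, args=(client_end, received)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The stream identifier determines where each segment belongs in the image.
    return b"".join(received[i] for i in sorted(received))


image = bytes(range(256)) * 100  # stand-in for image data
assert transfer(image, 4) == image
```

Segments may arrive in any order; because each carries its stream identifier, the client can place it correctly without relying on arrival order.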

Several precautions will be taken at the server to prevent any particular user from placing undue load on the server. In particular, a time-out period will be imposed so that after a certain amount of inactivity the user's AIS will be removed from the Illustra database, thus preventing unused data from being saved at the server. Also, a limit on group size will be imposed at the server to prevent any user from requesting that too much data be sent at any one time. One exception to this might be researchers interested in looking at volumes of images for in-depth research studies. They might be served by a separate process that pre-fetches images and fulfills a request at off-peak hours or periods of low demand.
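
Both precautions reduce to simple server-side checks, sketched below. The limit of 50 images per group and the 30-minute time-out are placeholder values chosen for the example; the text does not specify either, and the function names are hypothetical.

```python
MAX_GROUP_SIZE = 50     # placeholder limit on images per request
TIMEOUT_SECONDS = 1800  # placeholder inactivity limit


def clamp_group_size(requested, limit=MAX_GROUP_SIZE):
    """Cap a user-requested group size at the server-imposed limit."""
    return max(1, min(requested, limit))


def expire_sessions(last_connection, now, timeout=TIMEOUT_SECONDS):
    """Remove every AIS whose owner has been idle longer than the time-out.

    last_connection maps user id -> time of last connection, in seconds.
    Returns the list of user ids that were removed.
    """
    expired = [user for user, last in last_connection.items() if now - last > timeout]
    for user in expired:
        del last_connection[user]
    return expired


sessions = {"user-a": 0, "user-b": 1000}
removed = expire_sessions(sessions, now=2000)
print("removed:", removed, "remaining:", list(sessions))
```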

4. Summary and Conclusions

This paper outlines the integration of public domain tools and in-house developed and commercial software for the development of a multi-purpose public access x-ray archive. By building on the WWW with freely distributable tools such as HTTPD and Mosaic, a diverse target audience can be addressed at minimal expense to the user. We have shown the design for a prototype storage architecture that integrates fast magnetic media and slower optical media that maps to the user's needs. This system will provide for future growth in the archive with suitable image compression and hardware upgrades and provides for a method to deliver the raw data to the end user at data rates higher than normally found on the Internet.

REFERENCES

1. System for Automated Interlibrary Loan: System and Operations Description. Internal technical report. Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, Nov 1992.

2. F. L. Walker, G. R. Thoma, "Access to Document Images over the Internet," Proc. 9th Integrated Online Systems Meeting, New York, pp. 185-97, May 1994.

3. R. C. Lawrence, D. F. Everett, M. C. Hochberg, "Chapter 7: Arthritis," Health Status and Well-Being of the Elderly: National Health and Nutrition Examination Survey-I Epidemiologic Follow-up Study, J. C. Cornoni-Huntley, R. R. Huntley, J. J. Feldman (Editors), pp. 136-151, Oxford University Press, New York, 1990.

4. J. J. Anderson, D. T. Felson, "Factors associated with osteoarthritis of the knee in the first national health and nutrition examination survey (NHANES I). Evidence for an association with overweight, race, and physical demands of work," Am. J. of Epidemiol., 128 (1, Jul), 179-189, 1988.

5. O. D. Carter, S. G. Haynes, "Prevalence rates for scoliosis in US adults: results from the first National Health and Nutrition Examination Survey," Int. J. Epidemiol., 16 (4, Dec), 537-544, 1987.

6. C. A. Swanson, D. Y. Jones, A. Schatzkin, L. A. Brinton, R. G. Ziegler, "Breast cancer risk assessed by anthropometry in the NHANES I epidemiological follow-up study," Cancer Res. 48 (18, Sep 15), 5363-5367, 1988.

7. R. S. Cooper, E. Ford, "Comparability of risk factors for coronary heart disease among blacks and whites in the NHANES-I epidemiologic follow-up study," Ann. Epidemiol. 2(5), 637-645, 1992.

8. Vital and Health Statistics: Data Systems of the National Center for Health Statistics, Series 1 No. 23. March 1989, U.S. Department of Health and Human Services, Public Health Service, Centers for Disease Control, 14-16.

9. Vital and Health Statistics: Data Systems of the National Center for Health Statistics, Series 1-Number 10a, "Plan and Operation of the Health and Nutrition Examination Survey: United States 1971-1973," February 1973, U.S. Department of Health, Education, and Welfare, Public Health Service, National Center for Health Statistics, page 8.

10. Private communications, J. Findlay, National Center for Health Statistics, July 26, 1994.

11. Private communications, D. Blogett, National Center for Health Statistics, July 28, 1994.

12. R. C. Lawrence, "Getting the Message Out: Using Digitized Radiographs from NHANES II & III," Memorandum to Digitized Radiographic Images: Challenges and Opportunities Workshop, June 2-3, 1993, Bethesda, MD.

13. Vital and Health Statistics: Data Systems of the National Center for Health Statistics, Series 1 No. 23. March 1989, U.S. Department of Health and Human Services, Public Health Service, Centers for Disease Control, page 3.

14. S. Ranade, "Archive Storage Media Alternatives," Optical Information Systems, Vol. 10, Num. 1, 7-13, Jan-Feb 1990.

15. L. E. Berman, R. Long, S. R. Pillemer, "Effects of Quantization Table Manipulation on JPEG Compression of Cervical Radiographs," Society for Information Display, 1993 International Symposium, Seminar, & Exhibition, Seattle, WA, May 16-21, 1993.

16. Private communications, S. R. Pillemer, National Institutes of Health, May 3, 1994.

17. Internet Draft on HTTP, obtained from http://hoohoo.ncsa.uiuc.edu/docs/FAQ.html#whatis

18. R. P. C. Rodgers, S. Srinivasan, "On-line Images from the History of Medicine (OLI): Creating a Large Searchable Image Database for Distribution via World-Wide Web," Proceedings of the First International World-Wide Web Conference, pp. 423-431, Geneva, 1994.

19. R. Long, L. E. Berman, L. Neve, G. Roy, G. R. Thoma, "An Application-level Technique for Faster Transmission of Large Images on the Internet," accepted as poster to Multimedia Computing and Networking 1995.

 
