LBNL Image Library -- Project Design

Overview

A goal of the Image Library is to provide tools to help researchers who have many large image files organize, search and retrieve specific information and images from their collection. Since such collections of digital image files are likely to be stored on the Mass Storage System, the Image Library provides tools to interface to the MSS.

To allow quick searches of large images, a set of small thumbnail versions of each large image is derived and a text description of each image is entered. The thumbnails can be browsed, and when one appears to be of interest the larger image can be retrieved. The text files are indexed so that any word or combination of words can be searched for efficiently. A search displays a small image and the matching text for every file that matched the query. From there, larger images and the complete text can be retrieved.
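
The word search described above can be pictured as an inverted index over the tag text. The following is only a minimal sketch of that idea; it is not the indexing tool the ILS uses, and the tag contents and function names are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(tag_files):
    """Map each word to the set of image names whose tag text contains it.

    tag_files: dict of image name -> descriptive text (hypothetical layout).
    """
    index = defaultdict(set)
    for name, text in tag_files.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the images whose text contains every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical tag text, for illustration only.
tags = {
    "Fig.1": "Electron micrograph of a yeast cell",
    "Fig.2": "Aerial photograph of the laboratory",
    "Fig.3": "Micrograph of a bacterial cell wall",
}
index = build_index(tags)
print(sorted(search(index, "micrograph cell")))
```

A query for a combination of words is answered by intersecting the per-word sets, so only the index is consulted and the text files themselves are not rescanned.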

The Image Library is implemented by two standard NCSA HTTP servers and a set of HTML documents and scripts. One server runs on the Mass Storage System, where it performs three basic functions: staging one or more files, returning one file at a time, and listing the files in a directory. This server will be referred to as the MSS Server. It uses the Unitree interface to stage or read the requested file(s) as efficiently as possible. The MSS server is also responsible for verifying that a user who requests a file has been authorized to do so by the file's owner. To facilitate the owner's control over exporting files, it provides an HTML forms interface which allows the owner to define users and grant them read privileges to all or parts of the collection.

The other server, which is referred to as the Image Library Server or ILS, currently runs on a Solaris workstation and is responsible for defining an image collection, creating the small thumbnail images, keeping the descriptive text files indexed, and providing search and browse functions. The ILS controls access to all the derived images and text. When a user wants to access one of the high resolution files, the ILS provides a URL to the file. All the functionality of the ILS is provided by a set of HTML documents, scripts and standard HTTP access files, which could be copied to any other machine that would also like to provide these functions via its own HTTP server. There can be multiple ILS servers running on different machines, but each server manages only the collections which it has defined and whose derived database is accessible via filesystem calls. No attempt has been made to have multiple ILSs cooperate with each other. However, each one does cooperate with the single MSS server.

The access control domains of the two servers are completely independent. The MSS server runs on a trusted machine and must be careful not to return any files that have not had HTTP access explicitly permitted. The ILS software may run on any user's workstation and so cannot be granted trusted access by the MSS server. The ILS controls access to the parts of the collection it created: the derived images and text files. When a client requests access to a high resolution file, the ILS returns a URL to the file. If the high resolution image resides on the MSS, that URL will contain a reference to the MSS server, so the file retrieval takes place directly between the client and the MSS. The MSS checks the authorization of the client before performing the requested function. The MSS provides an HTML forms interface through which an MSS user can grant read access to files in directories that the user owns to people who may not have accounts on the MSS. Since the MSS interface must function independently of the ILS, it can be used to provide a simple, platform independent, read-only interface to Mass Storage that complements the usual NFS or FTP interfaces.

The naming domains of the two servers are mostly independent. The ILS implements the collection abstraction, in which the user names collections and subcollections and then enters images into a collection. Each subcollection keeps the complete pathname of the source directory of the high resolution images, but this name is only used to simplify the addition of images to a collection. When an image is entered in a collection, the complete pathname of the original image is saved. This name is used in any subsequent reference to the high resolution original. The filename (i.e. the last component of the pathname) of the derived file is set to be the same as the filename of the original file. The image source directory for a collection may be changed without affecting files that have already been loaded if, for example, the owner wants to add images from different directories to one collection.
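
The naming rule above can be sketched as a small record per image. This is an illustration under assumptions: the ImageEntry type, the enter_image function and the second source directory are hypothetical, and the design does not specify how the ILS actually stores these names.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class ImageEntry:
    original_path: str   # complete pathname of the high resolution original
    derived_name: str    # filename shared by the derived file and the original


def enter_image(source_dir, filename):
    """Record an image using the subcollection's current source directory."""
    original = posixpath.join(source_dir, filename)
    return ImageEntry(original_path=original,
                      derived_name=posixpath.basename(original))


entries = [enter_image("/home/mss/icsd/mrt", "Fig.1")]
# Changing the source directory later does not touch earlier entries,
# because each entry already stores its own complete pathname.
# The directory below is hypothetical.
entries.append(enter_image("/home/mss/icsd/other", "Fig.2"))
for entry in entries:
    print(entry.derived_name, "->", entry.original_path)
```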

This document describes the general functions and structure of the Image Library software. There is a User's Guide which documents the steps a user must take to use this system.

Defining a Collection

A collection consists of a set of original images which may be archived on the Mass Storage System, a set of smaller thumbnail versions of these images, pointers to the original images, and a descriptive text file, called a tag file, for each image. Summary information is also kept for each collection. A collection is created, and new images are added to an existing collection, via an HTML forms interface. The user does not need direct access to the machine on which the Image Library Server (ILS) is running, but the ILS does need access to the original images, either by having the images local or NFS mounted, or by reaching them through the Mass Storage HTTP server. Collections are organized hierarchically. A given ILS may manage multiple collections for different users. Each such collection is named and its data is stored within the ILS's COLLECTIONS directory. A user may create independent collections or may make subcollections as part of one collection. Subcollections have two advantages: as they are created they inherit tag file defaults and access permissions from their parent collection, and they can be searched as groups. Index files are created for each subcollection and for each tree of collections, but there is no index spanning all the top level collections, since they are presumed to be independent of each other.
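
One way to picture the hierarchy and the inheritance at creation time is the sketch below. The Collection class, its field names and the sample values are assumptions made for illustration; the design stores this information in files rather than in objects like these.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    name: str
    tag_defaults: dict = field(default_factory=dict)
    readers: set = field(default_factory=set)
    children: list = field(default_factory=list)

    def add_subcollection(self, name):
        """Create a child that starts with copies of the parent's settings."""
        child = Collection(name,
                           tag_defaults=dict(self.tag_defaults),
                           readers=set(self.readers))
        self.children.append(child)
        return child

    def tree(self):
        """Yield this collection and all of its descendants."""
        yield self
        for child in self.children:
            yield from child.tree()


# Hypothetical collection names, tag fields and user.
top = Collection("MICROSCOPY", {"Photographer": "unknown"}, {"alice"})
sub = top.add_subcollection("YEAST")
sub.tag_defaults["Organism"] = "yeast"   # later changes stay local to the child
print([c.name for c in top.tree()])
```

Walking the tree from a top level collection gives the group that a tree-wide index would cover, while each node alone corresponds to a per-subcollection index.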

Mass Storage HTTP Server

Functions

An HTTP server is run directly on the mass storage system for two reasons. First, so that requests for the high resolution image files can be sent directly from the Unitree storage interface to the client rather than being transferred via the Image Library Server and the NFS interface. Second, so that access to MSS files is controlled by a trusted server running on the MSS machine. The functions of the MSS server are staging one or more files, returning one file at a time, and listing the files in a directory.

These functions are performed by CGI scripts which are called by a standard NCSA HTTP server. The files are accessed by the same names that the NFS view of the Mass Storage System presents. The script checks access rights to the file based on the parent directory and then calls the Unitree interface to read the file. As the file is read, it is written directly to the STDOUT pipe that the server returns to the client. The server will run as root while checking access and then change its userid to that of the directory owner before accessing the file. This way the usual Unitree access protections double check that the file should be exported via this request.
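
The read-and-write-through behaviour can be sketched as a chunked copy. This is a minimal illustration, not the actual script: the chunk size and the content type are assumptions, an in-memory buffer stands in for the Unitree read interface, and the change of userid is left out.

```python
import io

CHUNK_SIZE = 64 * 1024  # assumed buffer size; the design does not specify one


def stream_file(source, sink, content_type="application/octet-stream"):
    """Write a CGI header, then copy source to sink one chunk at a time."""
    sink.write(b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n")
    total = 0
    while True:
        chunk = source.read(CHUNK_SIZE)
        if not chunk:
            break
        sink.write(chunk)
        total += len(chunk)
    sink.flush()
    return total


# Stand-ins for the Unitree file and for the STDOUT pipe to the server.
source = io.BytesIO(b"x" * 200_000)
sink = io.BytesIO()
sent = stream_file(source, sink, "image/tiff")
print(sent, "bytes copied")
```

In a real CGI script the sink would be the process's standard output (sys.stdout.buffer in Python), so the file never has to be held in memory as a whole.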

Access Control

Since many different projects store data on the MSS, the security constraints of the server and how they are implemented must be clearly understood. As both the Mass Storage file system and HTTP control access to files only at the granularity of directories, the MSS Server controls access only at the directory level. Files that are not contained in directories that have been explicitly allowed for HTTP access shall not be readable via the MSS Server. Each subdirectory must be explicitly allowed for HTTP export. No symbolic links will be followed when exporting a file. In order for a user to export files via the MSS server, the user must first contact the MSS administrator and ask to be registered as exporter for some subtree of the Mass Storage hierarchy. The user must have read and write access to the top of the subtree for this to work. Once registered as an exporter for a specified subtree, the user may export any subdirectory of that subtree for which the user has read/write access, and may define new HTTP users and grant them permission to those directories.
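
The three rules above (directory granularity, no implied export of subdirectories, no symbolic links) can be expressed as a single check. The sketch below assumes a POSIX file system and a hypothetical set of exported directory names; the real server relies on the NCSA access mechanism rather than on a function like this.

```python
import os
import tempfile


def may_export(path, exported_dirs):
    """Return True if path is a regular file directly inside an exported directory.

    exported_dirs: set of absolute directory names explicitly allowed for
    HTTP export. Subdirectories are not implied by their parents.
    """
    path = os.path.normpath(path)
    if os.path.dirname(path) not in exported_dirs:
        return False
    # Refuse if any component of the path is a symbolic link.
    current = path
    while True:
        if os.path.islink(current):
            return False
        up = os.path.dirname(current)
        if up == current:
            break
        current = up
    return os.path.isfile(path)


# Build a small throwaway tree: one exported directory, one subdirectory
# that has not been exported, and a symbolic link.
root = os.path.realpath(tempfile.mkdtemp())
exported = os.path.join(root, "mrt")
private = os.path.join(exported, "drafts")
os.makedirs(private)
for directory in (exported, private):
    with open(os.path.join(directory, "Fig.1"), "wb") as f:
        f.write(b"image data")
os.symlink(os.path.join(private, "Fig.1"), os.path.join(exported, "link"))

allowed = {exported}
print(may_export(os.path.join(exported, "Fig.1"), allowed))  # file in exported directory
print(may_export(os.path.join(private, "Fig.1"), allowed))   # subdirectory not exported
print(may_export(os.path.join(exported, "link"), allowed))   # symbolic link
```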

As much as possible, the standard NCSA HTTPD access control mechanisms are used. A directory tree is created relative to the cgi-bin root of the server that mirrors the NFS directory hierarchy of the directories from which files are to be exported. The access information for each exported directory is stored in the htaccess file in the mirror directory, along with the userid of the exporter. The ILS creates URLs that access a file through the fetch or stage scripts called via the file's parent mirror directory. This way the HTTP server checks the access to the parent directory and prompts the client for a user name and password, ensuring that the script is not called unless the user has the required access. The server then sets the environment variable REMOTE_USER before calling the script. The file name that is passed to the script is interpreted as relative to the directory for which access has already been checked. The script sets its uid to that of the directory exporter, checks that it still has read/write access to the directory, and then reads the file. Thus an owner can only export files as long as the owner's MSS access continues to exist.
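
The mapping from an NFS pathname to a URL through the mirror directory can be sketched as follows. The function names are hypothetical, and the host name is the one used in this document's example URL; how the ILS actually assembles its URLs is not specified here.

```python
import posixpath
from urllib.parse import quote

MSS_HOST = "www-mss.lbl.gov"  # host taken from the example URL in this document


def mirror_dir(nfs_dir):
    """Return the mirror directory, relative to the HTTP root, for an NFS directory."""
    return posixpath.join("cgi-bin", nfs_dir.lstrip("/"))


def file_url(original_path, script="fetch"):
    """Build the URL that calls a script through the file's parent mirror directory."""
    parent, filename = posixpath.split(original_path)
    return "http://%s/%s/%s/%s" % (
        MSS_HOST, quote(mirror_dir(parent)), script, quote(filename))


print(file_url("/home/mss/icsd/mrt/Fig.1"))
print(file_url("/home/mss/icsd/mrt/Fig.1", script="stage"))
```

Because the script name sits below the mirror directory in the URL, the server applies that directory's access rules before the script ever runs.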

An example follows: A user is using the Image Library Server to browse a collection whose high resolution source images are stored in the MSS directory /home/mss/icsd/mrt. The owner of those files (i.e. a person who has write access to that directory) has previously told the MSS administrator that she wishes to export files from that directory via the MSS server. At that time the administrator created a directory, cgi-bin/home/mss/icsd/mrt, relative to the HTTP root directory and put two files there:

    owner, containing the owner's uid
    htaccess, allowing the owner to access the directory by password

Once this is done, the owner may call the MSS HTML forms interface to specify what additional users and groups may read these files. The MSS server will add these users to the htaccess file in cgi-bin/home/mss/icsd/mrt. When a user decides to retrieve an image named Fig.1 from this collection, the URL he accesses is: http://www-mss.lbl.gov/cgi-bin/home/mss/icsd/mrt/fetch/Fig.1. The MSS HTTP server checks the user's access to /cgi-bin/home/mss/icsd/mrt before executing the fetch command and passing it Fig.1 in the PATH_INFO variable. The script parses ARGV[0] to find the name it was called by, then prepends its directory name, minus cgi-bin, to the file name to create the Unitree name of the file, e.g. /home/mss/icsd/mrt/Fig.1. It then sets its uid to that of the collection owner, checks that it still has read/write access to the directory, and reads the file from the Unitree interface.
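
The name derivation in this example can be sketched as below. The function name is hypothetical, and rejecting file names that contain further path components is an assumption that follows from access having been checked for one directory only; the actual script may handle this differently.

```python
import posixpath


def unitree_name(script_name, path_info):
    """Derive the Unitree file name from the script's own name and PATH_INFO.

    script_name: name the script was called by, for example
                 "/cgi-bin/home/mss/icsd/mrt/fetch"
    path_info:   the PATH_INFO value, for example "/Fig.1"
    """
    directory = posixpath.dirname(script_name)
    parts = [p for p in directory.split("/") if p]
    if "cgi-bin" not in parts:
        raise ValueError("script is not under cgi-bin: %r" % script_name)
    # Keep only the components that follow cgi-bin.
    parts = parts[parts.index("cgi-bin") + 1:]
    filename = path_info.lstrip("/")
    # Assumption: access was checked for one directory only, so the
    # file name must not lead into another directory.
    if not filename or "/" in filename or filename in (".", ".."):
        raise ValueError("bad file name: %r" % path_info)
    return "/" + "/".join(parts + [filename])


print(unitree_name("/cgi-bin/home/mss/icsd/mrt/fetch", "/Fig.1"))
```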

Image Library Server

The ILS is implemented by running a standard NCSA HTTP 1.4 server, using NCSA access files, CGI scripts and HTML documents. All the scripts, documents and derived collection information, i.e. images and tag files, are assumed to be stored in a directory named ImgLib relative to the HTTP server's document root. All the collection information is stored in directories relative to ImgLib/COLLECTIONS, in subdirectories named as upper case versions of the collection name. Each subcollection contains directories named forms, hi-res, images and tags. The forms directory contains the general description of the collection, the default tag file and a .htaccess file that controls write access to the collection. The hi-res directory contains pointers to all the high resolution images in the collection. The images directory contains all the derived images. The tags directory contains the tag files for each image and the glimpse-index files. The collection directories are protected by htaccess files whenever the collection is protected.
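
The directory layout can be summarized by a pair of path helpers. This sketch assumes that a subcollection's directory is nested inside its parent's directory, which the text implies but does not state outright; the document root and the collection names are hypothetical.

```python
import posixpath

SUBDIRS = ("forms", "hi-res", "images", "tags")


def collection_root(doc_root, *names):
    """Return the directory of a collection or nested subcollection."""
    return posixpath.join(doc_root, "ImgLib", "COLLECTIONS",
                          *[n.upper() for n in names])


def collection_dirs(doc_root, *names):
    """Map each standard subdirectory name to its full path."""
    root = collection_root(doc_root, *names)
    return {sub: posixpath.join(root, sub) for sub in SUBDIRS}


# Hypothetical document root and collection names.
dirs = collection_dirs("/usr/local/httpd/htdocs", "microscopy", "yeast")
for name in SUBDIRS:
    print(name, "->", dirs[name])
```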

Functions

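The Image Library Server provides the following functions: defining an image collection, creating the small thumbnail images, keeping the descriptive text files indexed, and providing search and browse functions.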

These functions are all provided through HTML forms interfaces which should be fairly self-explanatory. For more details see the User's Guide.

Access Control

The ILS may be a dedicated server that serves only the Image Library functions, or it may be a generic HTTP server that serves all the HTTP requests directed to a machine. All that is needed for the Image Library software to work is that scripts in the directory subtree cgi-bin/ImgLib may be executed and that .htaccess files in the ImgLib subtree may override access.conf directives. The script access can be conveniently allowed by letting cgi-bin/ImgLib be a symbolic link from the server's script directory to the ImgLib directory in the document root directory and allowing symlinks to be followed in cgi-bin. The .htaccess files use the Limit and AuthConfig directives, so those at least must be permitted by the access.conf file. Currently the user/group that the ILS runs as needs to have write permission in the ImgLib/COLLECTIONS subtree. If it is desired to run the server as a less privileged user, the scripts that create files could be called from a setuid wrapper program.

Read access control is implemented by creating the appropriate .htaccess files in a protected collection directory and referencing the scripts that display information through this directory. This causes the ILS to check access and prompt for a user name and password in its normal fashion, so that by the time a display script is executed for a given directory, it can assume that the user has the required access. Write access is implemented by having the scripts check the user against the .htaccess file in the collection's forms directory whenever a request is made to modify the collection.
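
The write access check can be sketched as a lookup of the authenticated user in the forms directory's .htaccess file. This is a simplified illustration: it looks only at "require user" lines and ignores groups and other directives, and the file contents and user names shown are hypothetical.

```python
def allowed_users(htaccess_text):
    """Collect the user names listed on 'require user' lines."""
    users = set()
    for line in htaccess_text.splitlines():
        words = line.split()
        if (len(words) >= 2 and words[0].lower() == "require"
                and words[1].lower() == "user"):
            users.update(words[2:])
    return users


def may_modify(remote_user, htaccess_text):
    """Return True if the authenticated user is named in the forms .htaccess."""
    return bool(remote_user) and remote_user in allowed_users(htaccess_text)


# Hypothetical contents of a collection's forms/.htaccess file.
FORMS_HTACCESS = """\
AuthType Basic
AuthName ImgLib
<Limit GET POST>
require user alice bob
</Limit>
"""

print(may_modify("alice", FORMS_HTACCESS))
print(may_modify("mallory", FORMS_HTACCESS))
```

A script would take the user name from the REMOTE_USER environment variable that the server sets after authentication.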

