System Description ACT/DB Architecture The ACT/DB architecture is summarized in
Figure 1. It is divided into
server and client components. The server holds four kinds of tables: | Figure 1A diagram of the ACT/DB architecture. The database server and the forms
server reside on different machines because they require different operating
systems (UNIX and Windows NT, respectively). On the database server, the
dictionary tables are central (more ...) |
- Dictionary tables, which are study-independent, contain
definitions of the parameters and their logical grouping and ordering for the
purposes of display and reporting of data.
- Vocabulary tables contain externally derived controlled
vocabularies. An example of a controlled vocabulary in ACT/DB is COSTART
(Common Standard Thesaurus of Adverse Reaction Terms, currently used for
reporting by the FDA, and shortly to be superseded by an international
system).
- EAV data are stored in a “normalized” form (i.e., with
minimal data redundancy). Two forms of EAV data are managed: data that have a
common structure across all studies, and data whose structure varies across
studies. Examples of the former are inclusion and exclusion criteria for
adding a patient to a trial, and treatment with standard therapeutic agents
defined as part of the study protocol. Here, the data are segregated into
individual tables based on function (e.g., there is a table for on-protocol
treatment history), and there are few such tables. Data whose structure varies
across studies include all the evaluation parameters recorded during a study.
(Many of these parameters are study-specific.) Here, information common to all
the parameters in a single clinical event (patient ID, study ID, start and end
event times) is stored in higher-level tables, while information about the
individual parameters gathered during that event is stored in
datatype-specific tables, which are described below.
- Non-EAV data, such as basic patient demographics (name, sex,
etc.), are stored in orthodox relational form. Such information is homogeneous
in structure and less of a natural fit to EAV structuring. Also, we wished to
explore ways in which EAV and non-EAV information can coexist. Production
systems will always have legacy non-EAV data on which existing applications
depend, and such data must be usable without conversion to EAV form.
The client component consists of display tables to transiently
capture EAV data, the forms associated with these tables and, most important,
the code to manage client—server communication. Forms are managed
through a network-accessible Windows NT—based Version Control
server. EAV-based Design Issues in ACT/DB Most mainstream relational database engines have an upper limit (typically
255) on the number of columns per table. With an orthodox database design (one
column per fact) numerous tables would be needed, and more would constantly
need to be added as the number of recorded parameters increased. Only a
relatively small number of positive and significant negative findings,
however, are actually applicable (and recorded) for a given patient. As a
result, the vast majority of columns would be empty (null), resulting in
considerable wastage of space. Furthermore, any attempt at presenting
consolidated information for a single patient (into a report, for example)
would require navigating all these tables and looking for non-null fields. This situation is similar to the computer-science problem of sparse array
representation. The equivalent database design solution is row
modeling. Here, data on a single entity (e.g., patient) are conceptually
represented as multiple rows in a single table with few columns, rather than
as a single row of data spread across multiple tables and columns. This design
is also called the entity—attribute—value (EAV) design,
because each row holds information on the entity (e.g., patient ID,
visit, date), the attribute (the name or ID of the parameter being
recorded), and the value of the parameter. To retrieve all facts on a
patient, one simply searches the entity columns for the patient ID, ordering
all rows by date if necessary. Because of its structural simplicity, the EAV design is popular in clinical
databases. Production databases using EAV components include the HELP
system, 14 the
Columbia-Presbyterian Medical Center's (CPMC's) clinical
repository, 15 a
system for tracking the care provided to a homeless population in the greater
Boston area, 16
DEC-RAD, 17 a
radiology package, and Oracle
Clinical, 13 a
commercial package for clinical trials management. Attribute—value pairs
are also used in World Wide Web
“cookies,” 18
which store state on the client and circumvent the stateless nature of WWW
communication. Successful operation of an EAV database requires management of the
attributes. This is accomplished through a set of tables that make up an
application-specific (as opposed to vendor-supplied) data dictionary. The
dictionary component itself is often structured in EAV fashion. For example,
the CPMC system uses the Medical Entities Dictionary
(MED). 19 While EAV
design is storage-efficient, it introduces complications with respect to table
representation, display of data, query of data, and data input. In our system
description below, we describe each problem briefly and discuss our design in
the light of existing approaches. Table Representation While EAV information is stored conceptually in a single table, in
reality databases are strongly typed. That is, a column can store only data of
a single datatype (e.g., integer, floating-point, string, or date). Some
databases (e.g., Oracle Clinical and early versions of HELP) use a single
structure for all EAV information. Such a design achieves simplicity by
coercing all the other simple datatypes into the least common denominator
(i.e., string). This prevents efficient indexed searching based on parameter
values. Value-based indexing can be important in queries aimed at rapidly
identifying patients with parameter values lying within or beyond a specified
range. It also becomes important when an EAV design is used as the basis of a
large-scale data warehousing application and needs to support fast value-based
searching. (An index on numeric values that have been coerced into strings is
not very useful, because of a different sorting order: the ASCII string
“123” is less than the string “23,” even though it is
numerically greater.) Furthermore, this design deliberately sacrifices the ability to store long
text (so called “Memo” fields) or complex Binary Large OBject
(BLOB) data such as a histopathology slide or a chest
radiograph. † In today's
Web-oriented, multimedia-influenced world, with increasingly affordable disk
storage costs (10 cents per megabyte or less) and desktop computing power,
there is little justification for not supporting BLOBs when the underlying
database engine is capable of handling such data. ACT/DB segregates the EAV data into six tables, based on the datatype of
the attribute: integer, floating-point, dates, short text (less than 256
characters), long text (up to 32,000 characters), and BLOB. (The inclusion of
dates as EAV data is, in general, discouraged because ACT/DB has another means
of tracking date-time information, as described later. EAV dates should be
used only for isolated facts that will not be used for chronologic reporting,
such as “date of menarche” in a gynecologic history.) All six tables have an identical structure except for the datatype of the
value field. The first four EAV tables have an index on the value column. The
integer table is also used to store Boolean data and enumerated data. (We
discuss enumerated and BLOB data in more detail below.) The ACT/DB data
dictionary records the datatype of each parameter so that the appropriate EAV
table may be accessed for querying. Attribute Grouping. As described above, all attributes are
atomic. However, many attributes are related to each other because
they refer to the same clinical event. Consider, for example, radiotherapy.
For each radiotherapy course (a separate event), the attributes of interest
are the dose, the site, and the best response. One attribute by itself is
relatively meaningless without the others. Therefore, the study designer can aggregate attributes into groups
and can also specify whether the group occurs an indefinite number of times
(as in the above example) or just once in a given phase of the clinical trial.
Groups are further aggregated into clusters, one cluster
corresponding to all the groups that are to be displayed, or edited, on a
single form. The designer specifies the ordering of groups within the cluster
and the ordering of attributes within the group. This ordering information is
used by ACT/DB's form generator, described later. There is a many-to-many relationship between the attributes, groups, and
clusters. This is because in different studies, different clusters may share
the same attributes. In the interests of standardization as well as study
design throughput, however, designers are encouraged to reuse existing
clusters and groups rather than create new groups with existing attributes. In
practice, therefore, most of the data follows a hierarchic relationship. Data Display It is generally desirable to present EAV data as though they were stored in
orthodox format—one column per fact. Apart from being easier to look at,
such a format is required for analysis by statistical and spreadsheet
packages. If EAV storage is regarded as a column-to-row transformation, then
display generally needs the reverse (row-to-column) transformation. If such
transformation needs to be done regularly, some form of automation is
desirable. One way to automate reverse transformation is through database
views (predefined queries), with the EAV tables participating in multiple
self-joins. Johnson et al., in their description of the CPMC system
architecture, 20
mention the possibility of predefined views customized to individual classes
of users. Packages like Oracle Clinical allow a database manager to generate
“canned” views for sets of parameters. The major limitation of study-specific views in an EAV database is that the
number of views needed by different users may increase nonlinearly with the
number of parameters in the database. (This problem is noted in Johnson et
al. 20) Furthermore,
in the case of clinical trials, each time a new study is defined in the
database, additional canned views need to be created. In a situation where
dozens of studies are active, simply managing such views can be a major
undertaking. When numerous clinical parameters need to be extracted in a single
operation, an additional problem arises. When the same EAV table is used
multiple times in a view by repeatedly joining with itself, each additional
use of the table is treated by the database engine as if it were a new table.
Most database engines limit the number of tables participating in a multitable
join. (For example, Sybase has the rather small limit of 16.) Working around
such limits manually requires inelegant solutions, such as creating temporary
tables in multiple steps to hold intermediate results and joining these
temporary tables to each other. Automating such operations then requires
writing study-specific stored procedures (server-based subroutines), which
again become increasingly hard to manage as they increase in number. (Very few
packages support SQL-3′s package mechanism for modularization of server
subroutines.) Predefined queries run somewhat faster than dynamic SQL queries because
they are “compiled” on the server into a query execution plan.
Such performance enhancements are not, however, guaranteed to be dramatic.
Complex joins involving a large number of tables may not run significantly
faster than a number of simple queries, each involving only a few
tables. 21
Especially if the total volume of the returned data is modest, it may be as
efficient (or more efficient) to issue a series of simple queries and assemble
the returned data on the
client. 22In designing ACT/DB, we decided that the improved performance of
study-specific queries involved too much maintenance overhead. As a result,
ACT/DB uses a small number of study-independent views and lets the user select
parameters, or sets of parameters, through a list-selection interface. The
result is generation of dynamic SQL, which is sent to the server. The captured
data are sent in EAV form to the client, where they are converted to columnar
form for output. ACT/DB displays data to the user in two ways. Parameter values gathered
during an individual clinical event are shown on forms, where they can be
edited. Data for an entire study are presented through reports of several
kinds. Some of these reports (e.g., those relating to a single patient)
present the data in attribute—value form, while others (e.g., reports on
numerous patients in a single study) use columnar form. Although ACT/DB is not
primarily an analytic tool, it can export columnar data into text files for
direct import into spreadsheet and statistical packages. Data Input Input requirements for EAV data are similar to display requirements. It is
desirable to create the illusion of an orthodox database design. The
“forms” design components of commercial database packages are in
fact geared toward such a design, with one editable form object (e.g., text
box, check box) corresponding to one column in a table. Because each fact
describing a patient is stored as a separate row in an EAV table (with all
attributes and values in the same conceptual columns), the EAV design is not
well matched for straightforward form-based data entry per se. Two approaches have been used to address this problem. In one, protocol- or
department-specific orthodox tables are the basis of forms for data entry.
Such tables may also serve the purpose of operational use within a department.
The information thus supplied is then translated through custom-designed (but
straightforward) programs into EAV form for storage at a remote (typically, an
institution-wide) repository. The second approach, in effect, allows direct
data entry and editing of the EAV data. ACT/DB's Form Generation Architecture. ACT/DB uses the second
approach. The protocol designer specifies what facts are to be gathered in a
form. (The designer does this by editing the data dictionary contents through
forms that are accessible to users with design privileges.) ACT/DB uses tables
that reside only on the Access client to transiently capture EAV data from the
server for an event and present it to the user in a “flattened”
(i.e., orthodox) view. These tables are viewed through the generated
forms. One such table is used for the main form, and up to five tables can be used
for subforms that are embedded in the main form. Subforms are used for groups
of parameters that may be repeated a definite or indefinite number of times in
a single case report form. For example, in a cancer therapy protocol when we
record “past therapy,” we may wish to record multiple instances of
past surgery, past radiotherapy, and past chemotherapy, with details specific
to each. Each of these is entered in its own subform, because a given patient
may have zero or more instances of a particular form of therapy. Another
example is a pharmacokinetic study where blood samples are collected at a
fixed number of intervals, the actual number and timing of the samples
depending on the study protocol. These client tables have fields with a simple naming convention: a single
letter indicating their datatype and two numbers indicating a sequential
number. (Thus, the second string field has the name S02.) For a given form,
the mapping between individual parameters to be entered or viewed on the form,
and the fields on the client that will transiently hold the parameter values,
is generated automatically during form creation. (This mapping is stored in a
server table.) When new clinical data are added or edited, the mapping allows
the flattened data in the file to be translated into EAV form and sent to the
server. When existing clinical data are to be viewed, EAV data are translated,
again through the mapping, into flattened data that are transiently captured
in the client tables. ACT/DB allows the protocol designer to generate a form with a single button
click after the parameters for that form, and the order in which they are to
be presented, have been specified. These forms have standard GUI components
(text boxes with vertical scrolling, check boxes, pull-down menus, subforms),
are reasonably esthetic, and have a consistent appearance. Furthermore, the
default fonts, character sizes, and colors for form titles, form section
headers, labels, and text boxes can be altered (by interactive editing of
entries in a “preferences” table) if the user so desires. The
forms should require modest, if any, cosmetic alteration before they can be
used in simulation or production mode. If necessary, they can be altered using
Microsoft Access form design tools. A limitation of our approach is that the number of objects on the form that
correspond to parameters is limited to the maximal number of fields that an
Access table can hold, which is 255. (In practice, this is significantly less:
for example, the “main form” table can store up to sixty integer
parameters but only three BLOB parameters and ten long-text parameters.) This
is not a significant limitation, because all forms map to the same display
tables, and there is no limit to the number of forms per study. Part of a form, and the mapping between fields on the form and records in
the EAV tables, are illustrated in Figure
2. | Figure 2Two-way mapping between the fields on a form and the EAV tables in the
database. The top half shows part of a form (in Form Design mode, without
data). The lower half shows the data that will be created when the fields are
populated. Note that 1) The (more ...) |
Ad Hoc Query Closely coupled to the task of EAV data display is ad hoc query. No existng
package seems to support ad hoc query of EAV data on arbitrary, complex
Boolean criteria. This is understandable, because query of EAV data is not
very efficient compared to query of tables in orthodox columnar form. Because
facts on multiple parameters are stored in a single table or structurally
similar tables, Boolean searches that combine multiple parameters must perform
numerous self-joins on the EAV tables through relatively expensive row-based
set unions, intersections, and differences for the And, Or, and Not operations
respectively. These operations have not received much support, let alone
efforts at optimization, among commercial database engines. Keyword support
for intersection and differences was added to the SQL standard only in
SQL-92. 23 Among
mainstream database engines, there is very limited support for all three
operators: even Oracle 7 uses different keywords for intersection and
difference. 24We are currently working on an ad hoc query module for ACT/DB. This module
is currently in prototype stage, and it is too early to tell whether our
efforts will be entirely successful. Our approach is based on a GUT rather
than a query language. (While potentially less powerful than a language, GUIs
are simpler to implement.) The interface lets the user query the EAV data as
though they were a single orthodox table with a very large number of columns.
The system converts the user's complex Boolean operators into the appropriate
set operators, with each clause in the query being converted into a SELECT on
the EAV table corresponding to the parameter in the clause. There are numerous complications in the ad hoc query of EAV data. For
example, semantic correctness of queries must be ensured by restricting
relational or aggregate operators to those appropriate to a given parameter's
datatype, and a query generator usually needs to create intermediate temporary
tables to process highly complex queries. These issues are beyond the scope of
this paper. Data Management in ACT/DB We now describe how ACT/DB manages different kinds of data, as well as
controlled vocabularies and the contents of its data dictionary. Management of Enumerated Data and Controlled Vocabularies Many clinical parameters are coded or scored for ease of subsequent
analysis. For example, a blood product transfusion may be classified as whole
blood, plasma, RBC, WBC, platelet, or other. A study designer might choose to
codify severity of pain as absent, mild, moderate, or severe. In each case,
each identifying phrase is associated with an integer (for example, the codes
on the pain scale might be 0, 1, 2, and 3, respectively). “Blood
transfusion type” is a nominal parameter (individual codes can only be
compared with test if they are the same or different), whereas “pain
severity” is ordinal (codes can also be compared for relative
magnitude). While numeric codification is useful, it is generally preferable,
in the interests of transcription accuracy, to let data-entry persons view and
select descriptive phrases. ACT/DB lets the user designate “choice sets.” Each member of a
choice set has two components, a numeric code and a descriptive phrase. A
choice set can be associated by the designer with a parameter. During form
generation, ACT/DB creates a pull-down menu (“combo box” in MS
Access parlance) for that parameter. When the pull-down arrow is clocked, the
list of descriptive phrases appears and the user can select any one. (This
list is dynamic and initialized through a SQL query. If the list is
subsequently modified after the form has been created, the list of values
appearing when the form is used is current.) The designer can control whether
codes and phrases, or only phrases, are displayed in the pull-down menus. When ACT/DB generates reports, the user has the option of looking at
enumerated values as either integer codes or descriptive phrases. The latter
are easier to understand, whereas the former more suitable for statistical
analysis. Managing Large Vocabularies. Designer-defined choice sets are
adequate when the list of values is reasonably small. However, a pull-down
menu interface is neither ergonomic nor efficient over a network when the list
of possible values is very large and is derived from an existing controlled
vocabulary such as COSTART or SNOMED. To permit the user to search such
vocabularies through the data-entry form, ACT/DB lets the designer associate a
“search” button on the form with a generic routine. This routine activates a standard predefined form through which the user
can search any controlled vocabulary table or view based on keywords in the
descriptive text associated with the code. The user can search for words
beginning with or containing a phrase; searches can be combined in complex
Boolean fashion using the And, Or, and Not operators. (The user's choices
result in generation of SQL that is sent to the server to retrieve matching
entries. The code foundation for this routine is based partly on a program
developed by the first author called Concept Locator, which was originally
built to search the UMLS
Metathesaurus. 25) Computed Parameters The designer can designate parameters whose value is set through a
computation based on other fields and define formulae for such
parameters. For example, body surface area, used widely in medical oncology
for chemotherapy dosing, is a function of height and weight. The designer is
given assistance in formula construction (through scrolling lists of available
parameters). Nonetheless, specifying such a formula requires some knowledge of
Visual Basic programming syntax (even though the “program” is a
single-line expression), because the formula is inserted verbatim into the
computed field in the generated form. For example, exponentiation in Visual
BASIC uses the caret rather than the double asterisk of FORTRAN. This is the
one of the rare instance in ACT/DB where the designer may need programmer
assistance. There are times when a value must be computed through a complex subroutine
involving branching and looping logic (although we have not encountered such a
situation in our present protocols). Microsoft Access is fully orthogonal in
that there are no restrictions on the function that may be used in a computed
field. Arbitrarily complex functions can be used in a field's formula or even
as expressions in queries. The designer may also allow direct data entry for such parameters. This is
a useful alternative when data are being transcribed into the electronic form
from an existing, filled-out paper case report. Here, the computed value might
have already been entered on a paper case report, but one or more of the
parameters required for the computation are missing from the paper form. For
example, body surface area is commonly computed at the patient's bedside
through nomograms and entered directly on paper. Laboratory Data and Ranges As part of its schema, ACT/DB stores information on laboratory tests
employed within Yale and the normal values (occasionally stratified by age and
sex) for results of each test. These data were gathered from the Yale Clinical
Pathology System for work that has been previously described by Kannry et
al., 26 and is used
in two circumstances: - During from generation, ACT/DB creates status bar text, a data entry aid,
for each form object representing a parameter. (This text appears on the
bottom line of the screen whenever the cursor is in that object.) For most
parameters this text is specified by the designer through the data dictionary.
For laboratory test parameters, however, ACT/DB generates the text
automatically. This text summarizes the units and range of normal values (for
male and female subjects). If the normal range varies with age and sex, the
maximal and minimal values over the entire range are computed. The status bar
text then displays this range and indicates that the range varies with age and
sex.
- Based on the age and sex of individual patients (age being computed from
date of birth), ACT/DB can generate a report that identifies values that lie
within and beyond the normal range.
Our current handling of laboratory values is somewhat limited. For example,
normal values may depend on physiologic conditions such as pregnancy and
lactation, which are not accounted for by our system. Further, the range of
normal values depends to some extent on the laboratory where a test was done
(a value of “upper normal” in one lab may mean “high”
in another). The Columbia MED employs an object inheritance mechanism whereby
tests performed by different labs are different entities within the database,
but related entities share a common parent (a “generic” prostatic
acid phosphatase, for example). We have not yet had to deal with this level of
complexity, but we may have to if ACT/DB ever scales up to handle
collaborative interinstitutional trials. Data Dictionary Management The ACT/DB data dictionary, which contains the definitions of study
protocols, the clusters, and the individual parameters, is constantly
consulted during almost every one of ACT/DB's operations. Therefore, in order
to scale up to handle a large number of studies across the institution (which
implies reuse of existing parameters and forms as far as possible), it must
come with tools for management. Our approach is that the data dictionary
tables are another vocabulary and therefore should be managed in the same way
as external vocabularies. The vocabulary searcher described earlier is used in
multiple ways—to identify parameter descriptions containing one or more
keywords, all form clusters containing a parameter, and so on. Further, each cluster is associated with one or more user-defined keywords
(which we call classifiers) that are used to categorize the cluster.
There are no restrictions on what a classifier might be. For now, the
classifiers table mostly contains the names of clinical fields (to identify
forms that are field-specific, like “oncology”), subject names
(like “chemistry”) or terms such as “general.” When
setting up a new study, reuse of existing clusters is facilitated by letting
the designer narrow down to a list of clusters described by one or more
classifier keywords. Managing Time-oriented Data Most clinical databases attach one or more time stamps to every fact
entering the database. One kind of time stamp, which is system-generated,
records when the fact was entered into the database. Another kind of time
stamp, which is user-supplied, records the date/time when the event described
by the fact occurred. (In some databases, the fact may be tagged with an
additional time stamp that records when the event ended.) Currently ACT/DB does not meet the definition of a time-oriented database,
but it has the basic data structures on which such features may be built. As
mentioned above, attributes are aggregated into groups when they apply to the
same clinical event. These groups are tagged with start and end date/times. In
the data dictionary entry for the group, the protocol designer specifies
whether the event recorded is a period event (both start and end
times) or an instant event (start time only; end time is null). For
example, a radiotherapy course is a period event. Through a view linking the
group with individual parameters in the group, every clinically related fact
in an EAV table is, in effect, time stamped. This allows straightforward
generation of a chronologic report for a single patient by sorting all events
on start and end date/times. The protocol designer also specifies the prompts for the starting (and,
optionally, the ending) date/times for the group and the format of the
events—date and time, or date only. This information is used by ACT/DB
when generating forms. Either one or two text boxes (depending on the type of
event) are created with these labels for date/time entry. For questionnaire-based outcomes research, as opposed to long-term clinical
trials, time stamps as defined here are often not needed on many data items
(e.g., “smoking history”), and prompts for their entry on a
generated form do not serve any purpose. ACT/DB therefore lets a designer
indicate, for individual groups, that no date/time information should
be associated with them. No prompts for time stamps will be generated for such
groups during form creation. (In the database, nulls will be stored against
both starting and ending event date/times.) For clinical studies, the use of
groups without time stamp information should, in general, be kept to a
minimum. ACT/DB currently lacks support for time-oriented operators and joins, like
those described for Das and Musen's CHRONUS
system. 27,28
Certainly, adding support for time-based ad hoc queries on EAV data (as
opposed to the non-EAV data in CHRONUS) will pose an additional challenge. Managing BLOB Data BLOB data, such as histopathology images, are important in patient records
and clinical trials: it is convenient to have data online for reference or
publication. ACT/DB manages BLOB data display and editing through the
object-linking
mechanism. 29 For a
given BLOB attribute, the protocol designer specifies the application that
will handle the BLOB by specifying the file extension to be appended to BLOBs
downloaded from the server. (A single BLOB attribute is therefore limited to a
single kind of BLOB data. This limitation is not serious.) BLOBs are uploaded to the server by pasting them into “object
frames,” which are areas in a form for displaying and editing the BLOB.
However, it is inadvisable to send the bytes in the frame itself to the
server. Instead, dramatic savings in storage requirements are achieved by
locating the BLOB's original file and sending the bytes in this file to the
server. This is because disk files storing multimedia BLOB data usually use
highly efficient compression techniques (e.g., Joint Photographic Experts
Group, or JPEG, images often use compression ratios of 30:1 or more). BLOB data can also be viewed in the columnar report. This requires viewing
the file through Microsoft Excel 97 (which is also commonly used for simple
statistical analysis and graphing). When the user includes a BLOB parameter
among the list of parameters to be viewed, ACT/DB downloads all BLOB data into
a series of sequentially numbered files into a temporary directory. Within the
text file itself, each cell representing a BLOB data item contains a formula
that uses the hyperlink function to link to the pathname representing the
corresponding file. (This function can also link to Web URLs in a similar
fashion.) An illustration of BLOB data in a form is shown in
Figure 3. | Figure 3 A sample data-entry form showing a BLOB data field (an echocardiographic
still image) along with fields for related descriptive parameters.
(Echocardiogram obtained from Dr. Ira Cohen of the West Haven, CT, Veterans
Administration Medical Center.) |
Forms Library Management In an institution-wide clinical trials database, the forms (data-entry
screens) can number several hundred, making them unmanageable without
automated assistance. Forms must be stored within the Microsoft Access client
application in order to be used in an application. Because forms are developed
for use within individual departments, an individual client machine needs to
store only those forms actively being used on that machine. Still, there must
be a central repository for all forms, especially if forms have been modified
with developer assistance. Individual forms may exist in multiple versions,
and a version-control system is needed to track these versions. ACT/DB uses Visual SourceSafe, Microsoft's source code control system, to
maintain a forms library on a network-accessible Windows machine that acts as
a forms server. (Access 97 is the first version of Access to have hooks to a
version-control package.) When a developer needs to change a form, the form is
checked out from the server, altered, and checked back in. When an individual
client is configured, it can incorporate the current versions of the forms
that are needed. (Currently, this process is manual and cannot be done through
program code. Fortunately, it needs to be done very infrequently.) Visual
SourceSafe also manages version control for program modules, facilitating
coordination between multiple developers working on ACT/DB. Streamlining Client-Server Communication Certain tables in ACT/DB exist only on the client, and certain tables exist
only on the server, while certain server tables of modest size that change
very rarely are replicated on the client. The mechanism to coordinate client
and server interaction uses a subroutine and template library called
SQLGEN, 30 which was
originally created to facilitate Sybase (server) and Macintosh 4th Dimension
client development. Two components of SQLGEN are particularly important for ACT/DB. One is an
upsizing tool that automatically converts an existing Access schema to an
operational Oracle schema. (Microsoft distributes an Access-to-SQL server
upsizing tool, but one for Oracle is not, as far as we know, commercially
available.) Our tool goes somewhat beyond Microsoft's in functionality. For
example, ACT/DB makes heavy use of sequentially generated long-integer unique
identifiers (UIs) as primary keys for most tables. (Microsoft Access has a
field data type called “autonumber,” which increments
automatically for each new record.) To port such tables, ACT/DB creates an
insert trigger on each table. This trigger fetches and uses the next available
sequence number whenever a new record is added. ACT/DB also automatically
generates referential integrity constraints. SQLGEN also facilitates searching of data through a query-by-form (QBF)
interface, where the user fills in one or more criteria matching the desired
data. The appropriate SQL is then generated and sent to the server. (Microsoft
Access's built-in QBF is powerful but not oriented to client-server or
networked operation. This is because for every field in the form, a set of
unique values is composed by a search of the data to be made available through
pull-down menus. This can consume a considerable amount of server capacity as
well as network bandwidth.) The details of the SQLGEN QBF facility are
described by Nadkarni et
al. 30Use of Enabling Technology Many of the interesting features of ACT/DB are possible because of enabling
technology that has become available only recently. To cite only a partial
list: - ACT/DB generates forms with GUI components whose visibility and appearance
can be programmatically controlled, and associates form component events
(e.g., double-clicking) with developer-defined routines. This was not possible
with early versions of Microsoft Access.
- For generating standard reports, transformation of EAV server data
(irrespective of data type) into client arrays that represent an orthodox
columnar view of data is simplified by the availability of the
“variant”
datatype,17 which
can store any kind of data except BLOB. Variants have become available only
recently in several languages on the Microsoft Windows platform. Their
availability provides the “typeless” convenience that LISP and
Smalltalk programmers have long taken for granted.
- Object linking makes it feasible to store, display, and edit efficiently
any kind of BLOB data without knowledge of their format or the
hardware/software codecs (compression-decompression modules) required to
manipulate them. It is possible to operate programmatically on the contents of
such objects if the object's parent application has been designed for
programmatic manipulation through an OLE client. We do not intend to do this
within ACT/DB, but freedom to do so might prove important in specialized
applications for handling particular kinds of BLOB data.
|
References 1. Rector A. Art and science: problems and solutions (editorial).
Methods Inf Med.
1996; 35:(3):
181-4. [PubMed]. 2. Cappelleri J, Ioannidis J, Schmid C, et al. Large trials vs
meta-analysis of smaller trials: how do their results compare? (review).
JAMA. 1996; 276:(16):
1332-8. [PubMed]. 3. Pampallona S. A model to control data flow in multicenter clinical
trials. Methods Inf Med.
1995; 34:(3):
283-8. [PubMed]. 4. Sim IGR. A trial bank model for the publication of clinical trials.
Proc 19th Annu Symp Comput App Med Care. Philadelphia, PA:
Hanley & Belfus, 1995:
863-7. 5. Blumenstein B. Verifying keyed medical research data. Stat
Med. 1993; 12:(17):
1535-42. [PubMed]. 6. The Inter-company Clinical Quality Assurance Working Group.
Computer validation: methods at investigator sites. Appl Clin
Trials. 1997;July:
36-40. 7. Stokes T. Computer systems validation, part 6: a survive-and-thrive
approach to audits and inspections. Appl Clin Trials.
1997;Aug: 40-4. 8. Marshall B, Hoffman S, Babadzhov V, Babadzhov M, McCallum R. The
Automatic Patient Symptom Monitor (APSM): a voice mail system for clinical
research. Proc 17th Annu Symp Comput App Med Care. New York:
McGraw-Hill, 1993: 32-6. 9. Carlson R, Tu S, Lane N, et al. Computer-based screening of
patients with HIV/AIDS for clinical-trial eligibility. Online J Curr
Clin Trials. 1995;Mar 28:Doc 179. 10. Pradhan E, Katz J, LeClerq S, West KJ. Data management for large
community trials in Nepal. Control Clin Trials.
1994;15:(3). 11. Othman R. Interactive database management (IDM). Comput
Methods Programs Biomed.
1995; 47:(3):
221-7. [PubMed]. 13. Oracle Clinical Version 3.0: User's Guide. Redwood
Shores, CA: Oracle Corporation, 1996. 14. Huff SM, Haug DJ, Stevens LE, Dupont CC. A PT.HELP the next
generation: a new client-server architecture. Proc 18th Symp Comput App
Med Care. Philadelphia, PA: Hanley & Belfus, 1994:
271-5. 15. Friedman C, Hripcsak G, Johnson S, Cimino J, Clayton P. A
generalized relational schema for an integrated clinical patient database.
Proc 14th Symp Comput App Med Care. Washington, DC: IEEE
Computer Society Press, 1990:
335-9. 16. Chueh HC, Barnett GO. Client-server, distributed database
strategies in a healthcare record system for a homeless population. J
Am Med Inform Assoc. 1994; 1:(2):
186-98. [PubMed]. 17. Niedner C. Use of SQL with an entity-attribute-value database.
MUG Quarterly. 1991;21:(3):
40-5. 18. Dwight J, Erwin M (eds). Using CGI (special
edition). Indianapolis, IN: Que Corporation, 1996. 19. Cimino JJ, Clayton PD, Hripcsak G, Johnson SB. Knowledge-based
approaches to the maintenance of a large controlled medical terminology.
J Am Med Inform Assoc.
1994; 1:
35-50. [PubMed]. 20. Johnson S, Cimino J, Friedman C, Hripcsak G, Clayton P. Using
metadata to integrate medical knowledge in a clinical information system.
Proc 14th Symp Comput App Med Care. Washington, DC: IEEE
Computer Society Press, 1990:
340-4. 21. Celko J. Everything you know is wrong. DBMS
Magazine. 1996;9:(9):
18-20. 22. Kimball R. The data Warehousing Toolkit. New York:
John Wiley & Sons, 1997. 23. Melton J, Simon AR. Understanding the new SQL: a complete
guide. San Mateo, CA: Morgan Kaufman, 1993. 24. Oracle Corporation. Oracle Version 7: PL-SQL Programmer's
Guide. Redwood Shores, CA: Oracle Corporation,
1995. 25. Nadkarni PM. Concept locator: a client-server application for
retrieval of UMLS Metathesaurus concepts through complex Boolean query.
Comput Biomed Res. 1997; in press. 26. Kannry JL, Wright L, Shifman M, Silverstein S, Miller PL.
Portability issues for a structured clinical vocabulary: mapping from Yale to
the Columbia Medical Entities Dictionary. J Am Med Inform
Assoc. 1996; 3:
66-78. [PubMed]. 27. Das AK, Musen MA. A temporal query system for protocol-directed
decision support. Meth Inform Med.
1994; 33:(4):
358-70. [PubMed]. 28. Das AK, Musen MA. A comparison of the temporal expressiveness of
three database query methods. Proc 19th Annu Symp Comput App Med
Care. Philadelphia, PA: Hanley & Belfus, 1995:
331-7. 29. Microsoft Access 97 User's Guide. Redmond, WA:
Microsoft Corporation, 1997. 30. Nadkarni PM, Cheung KH. SQLGEN: an environment for rapid
client-server database application development. Comput Biomed
Res. 1995; 28:(12):
479-99. [PubMed]. 31. Kimball R. Dealing with dirty data: the science of maintaining
clean data in your warehouse, and why nobody talks about it. DBMS
Magazine. 1996;9:(10):
55-6. 32. O'Neil P, Graefe G. Multi-table joins through bitmapped join
indices. SIGMOD Record.
1995;24:(3). 33. Bontempo CJ, Saracco CM. Accelerated indexing techniques.
Database Programming and Design.
1996;9:(July):
36-43. 34. Snodgrass RT, Ahn I, Ariav G, et al. TSQL2 language specification.
ACM SIGMOD Record
1994;23:(1):
65-86. 35. Campbell KE, Das AK, Musen MA. A logical foundation for
representation of clinical data. J Am Med Inform Assoc.
1994; 1:(3):
218-32. [PubMed]. |