METHODS

OVERVIEW OF METHODOLOGY

Applying the critical incident technique to the problem of defining the impact of MEDLINE imposed several methodological requirements. First, an appropriate sample of MEDLINE users had to be defined. The sample had to contain both professionals who search MEDLINE themselves to obtain information (the so-called "end users") and persons who have searches carried out for them by a trained search intermediary. The sample also had to be broad enough to encompass the widest possible range of professional activities and, hence, the widest possible range of reasons for searching. Specific procedures for sampling and recruiting the desired interviewees had to be defined.

The second requirement was the development of an interview protocol that would elicit the desired information on why the search was carried out, how it was carried out, what its results were, and what impact the information obtained had. In particular, the questions and probes in the interview protocol had to elicit very detailed and specific information, including highly technical details, so that each search report could be analyzed from several different perspectives and so that the report was clear and unambiguous. A related problem was determining the specific form in which the interview narrative was to be captured in a written "incident report," which would constitute the actual raw data for subsequent analysis.

The "frames of reference" from which the data were to be analyzed also had to be defined. Each frame of reference involved looking at the incident with a specific question or purpose in mind and constructing an inventory or outline (often referred to as a "taxonomy") within which all of the incident reports could be classified. In the present study, it was eventually decided that there were three distinct frames of reference of interest.
These were (1) "Why was the information needed?", (2) "How did the information obtained impact the decision-making of the individual who needed the information?", and (3) "How did the information obtained impact the outcome of the clinical or other situation that occasioned the search?" Once the three frames of reference had been distinguished, procedures had to be specified for carrying out the analysis and developing the taxonomies.

Interspersed among these other requirements was the need to define procedures for training and evaluating the interviewers; for carrying out and monitoring the interviews, both those conducted by AIR and those conducted by regional medical library staff; for investigating the validity of the interviewers' translation of each interview into a written incident report; and for investigating the accuracy of certain facts reported in the interviews concerning the execution of the MEDLINE searches. Verifying the interview documentation involved preparing a sample of verbatim interview transcripts and comparing them in detail with the corresponding incident reports. Investigating the accuracy of the interviewees' reports of search details involved retrieving and analyzing MEDLINE "traffic logs" for the end user searches reported. Traffic logs are the electronic records of MEDLINE searches that NLM retains for a relatively brief time. The limitations of this verification procedure also had to be defined, both in terms of the number of searches for which matched logs could potentially be obtained and in terms of the specific criteria to be used in evaluating the correspondence between incident reports and logs. Finally, a special effort was undertaken to shed light on the searches reported to have been ineffective from the users' perspective: why the desired results had not been obtained and what might have been retrieved.
This required defining the procedures to be used by an expert searcher in carrying out a search with the same purpose and, more importantly, the procedures for comparing the results of the end user and expert searches. The following sections describe how each of these aspects of the study was carried out.

SAMPLE

Sample selection. The population of interest for this evaluation consists of physicians and others who search MEDLINE, either themselves (the so-called "end users") or via a trained intermediary ("mediated users"). For a critical incident study, the goal in establishing the sample is to incorporate the diversity of perspectives and experience necessary to ensure that the incidents collected encompass all of the types of impact that information obtained via MEDLINE might produce. The sample need not be randomly selected from the population of interest; in fact, it is often important that certain types of experience, and hence certain types of individuals, be over-represented. In any event, although quantitative data were gathered that can be used to describe the incidents and the sample from which they were collected, these data, weighted or unweighted, do not yield unbiased estimates of the characteristics of the entire population of MEDLINE users, and it would be inappropriate to draw such inferences from them.

The intent in this study was to collect approximately 1,200 reports of MEDLINE searches constituting "critical incidents" in the sense that they were helpful or not helpful to the user in carrying out his or her professional activities. Since experience suggested that individuals generally provide two to three incidents each, in a wide variety of contexts, it was anticipated that approximately 600 interviewees would be required.
Further, it was intended that approximately two-thirds of the final sample (400 persons) be end users and one-third (200) be persons who typically request mediated searches. The assistance of the University of Texas and UCLA Regional Medical Libraries was sought in identifying mediated users who could be interviewed.

Construction of the end user sample began with a listing of all 4,311 persons identified as potential or probable NLM end users as of August 1, 1987. In the fall of 1987, NLM surveyed these individuals concerning their MEDLINE use. Of the 2,037 who responded and indicated a willingness to participate in further studies of MEDLINE, 563 were randomly selected to be invited to participate in the present study: 80 in the pretest phase and 483 in the main study.

End users are currently being added to NLM's system at a rate of over 200 per month. To include such recent additions, and thus ensure that some persons with less searching experience would be included, a second sample of 200 was drawn randomly from the 2,135 individuals identified as NLM end users involved in direct patient care or biomedical research who were issued access codes between August 1, 1987 and September 30, 1988. None of these individuals had been contacted previously by NLM regarding their MEDLINE use. Of these "new users," 20 were selected for the pretest phase and 180 for the main study. This sample was drawn using the same algorithm as that used for the survey respondents. In addition to this initial sample of 200 new users, 20 individuals were randomly selected from among the 114 new users employed in health sciences education, bringing the total number of new users invited to participate in the study to 220.
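The planning arithmetic in the sample-selection paragraphs can be checked with a short Python sketch. It assumes the lower end of the "two to three incidents" range, which is the figure that reproduces the planned 600 interviewees; the variable names are illustrative only.

```python
# Planning figures taken from the text; variable names are illustrative.
TARGET_INCIDENTS = 1200
INCIDENTS_PER_INTERVIEWEE = 2  # assumed: low end of the "2-3 incidents" range

interviewees = TARGET_INCIDENTS // INCIDENTS_PER_INTERVIEWEE
end_users = interviewees * 2 // 3          # about two-thirds
mediated_users = interviewees - end_users  # about one-third

# End user invitees drawn from the 1987 survey respondents.
survey_pretest, survey_main = 80, 483
survey_invitees = survey_pretest + survey_main

# "New users": pretest + main study + health sciences education subsample.
new_pretest, new_main, new_education = 20, 180, 20
new_user_invitees = new_pretest + new_main + new_education

print(interviewees, end_users, mediated_users)  # 600 400 200
print(survey_invitees, new_user_invitees)       # 563 220
print(survey_invitees + new_user_invitees)      # 783 end user invitees in total
```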
The pretest interviews suggested that the number of physicians in a simple random sample of MEDLINE end users would be sufficient to ensure enough incidents related to patient care to support development of a comprehensive taxonomy of outcomes in this area; hence, end users with an M.D. degree were not oversampled. The pretest also suggested that participation rates of approximately 60% for survey respondents and 40% for new users could be expected. The number of end user invitees for the entire study was therefore set at 783 (563 survey respondents and 220 new users), with the goal of achieving the desired final sample of approximately 400 end users, consisting of 320 survey respondents and 80 new users.

Samples of mediated users were obtained in a somewhat similar manner. From its records of persons who had requested two or more literature searches within the preceding 11 1/2 months, the University of Texas (UTexas) Regional Medical Library selected all 128 of those with an M.D. degree who had any involvement in patient care, plus 60 others (40 MDs, MD/PhDs, or PhDs involved only in research, and 20 nurses or other health professionals), creating a sample of 188 potential interviewees from which it was hoped that 90-100 actual interviews would be obtained. In this case, because relatively few physicians had requested two or more searches, physicians were oversampled. The UCLA Regional Medical Library canvassed 21 community hospitals in its region and requested from each hospital names and contact information for 10 persons who had requested literature searches in the preceding month (or the preceding two months, if necessary to obtain 10 names). Each hospital was given an explicit protocol to ensure random selection when more than 10 names were available.
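The hospital selection rule can be illustrated with a minimal Python sketch. The actual protocol given to the hospitals is not reproduced here, so `select_names`, the 30- and 60-day approximations of "one month" and "two months," and the sample requester records are all hypothetical; the sketch only shows one way to widen the look-back window and then draw a simple random sample.

```python
import random
from datetime import date, timedelta

def select_names(requests, today, k=10, rng=None):
    """Hypothetical sketch of a per-hospital selection rule.

    requests: list of (name, request_date) tuples.
    Look back one month (assumed: 30 days); if that yields fewer than k
    distinct requesters, widen the window to two months (assumed: 60 days).
    If more than k names remain, choose k of them uniformly at random.
    """
    rng = rng or random.Random()
    for days in (30, 60):
        cutoff = today - timedelta(days=days)
        # Sort so that a seeded generator gives reproducible draws.
        names = sorted({name for name, when in requests if when >= cutoff})
        if len(names) >= k:
            break
    if len(names) <= k:
        return names  # not enough names even after widening: take them all
    return rng.sample(names, k)

# Invented requester records for one hospital.
today = date(1988, 11, 1)
requests = [(f"requester-{i}", today - timedelta(days=3 * i)) for i in range(25)]
chosen = select_names(requests, today, rng=random.Random(1988))
print(len(chosen))  # 10
```

Seeding the generator makes a given draw reproducible, which is useful if the selection ever has to be audited.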
From the lists provided by 20 of these hospitals, a sample of 199 individuals was selected to be invited to participate, in the expectation that half (100) would actually be interviewed. Most of these individuals were expected to be physicians in community-based practice.

Recruitment. In the case of the end user sample, invitation letters were sent over the signature of Elliot Siegel, Ph.D., Director of Planning and Evaluation/NLM, to 563 survey respondents and 219 new users. The letter (see Appendix A) described the purpose of the study and the importance of the recipient's participation. For the UCLA-selected sample of mediated users, the letter also mentioned the name of the librarian in the invitee's local hospital who had provided his or her name. In both cases, the mailing included a response form on which invitees could indicate to NLM whether they were willing to participate and their preferred times for an interview. A second mailing was sent to invitees who did not respond within two weeks, and nonrespondents were subsequently followed up by telephone until the cutoff date. Recruitment of the UTexas sample of mediated users used an invitation letter essentially the same as that sent to the survey respondent/end user sample, except that it was sent under the signature of Ms. Jean Miller, the Director of the Library, with a copy of a letter of support from Dr. Siegel. Instead of a mail-back response form, the interviewer telephoned each invitee to schedule a telephone interview.

Response rates. Of the 782 invitation letters mailed by AIR to 563 survey respondents and 219 new users, 494 response forms were returned. Of these, 8 came from persons inappropriate to the sample (librarians) and 48 were "regrets." Of those agreeing to an interview, 11 ultimately declined, leaving 427 potential interviewees, an overall agreement rate of 55% of the known eligible invitees.
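The end user response funnel can be reproduced with a few lines of Python. The sketch assumes that "known eligible invitees" means all letters mailed minus the returns identified as ineligible, so that invitees who never responded are counted as eligible. That reading is an interpretation, but it is the one that yields the reported 55%.

```python
# Figures from the "Response rates" paragraph (AIR end user sample).
letters_mailed = 563 + 219   # survey respondents + new users
returned = 494
ineligible = 8               # librarians
regrets = 48
later_declined = 11

agreed = returned - ineligible - regrets
potential_interviewees = agreed - later_declined
# Assumption: non-respondents are treated as eligible.
known_eligible = letters_mailed - ineligible
agreement_rate = potential_interviewees / known_eligible

print(letters_mailed, potential_interviewees, known_eligible)  # 782 427 774
print(f"{agreement_rate:.0%}")  # 55%
```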
Of the 427 potential interviewees, 361 were actually interviewed before the cutoff date. Of the 199 invitation letters mailed to UCLA invitees, 125 response forms were returned. Of these, one came from a person ineligible for the study and 7 were "regrets." The remaining 117 potential interviewees represent an agreement rate of 59% of known eligible invitees. Of these 117, 110 were actually interviewed before the cutoff. The remaining 74 persons in the NLM and UCLA samples who had responded and agreed to an interview were called repeatedly, both at their preferred times and subsequently, but could not be reached. Hence the ultimate interview rate for all 972 known eligible persons in the samples to be interviewed by AIR was 471/972, or 48%.

Of the 188 UTexas invitees, 41 declined to be interviewed. Fifty were never reached by telephone or declined to be interviewed when contacted at the scheduled time, and sixteen letters were returned as undeliverable. A total of 81 individuals were eventually interviewed, an interview success rate of 43%.

In total, 552 interviews were completed, 48% of all known eligible interviewees (N=1,160).

DATA COLLECTION PROCEDURES

Protocol development. An early pilot version of the protocol had been developed and used by NLM to evaluate the feasibility of the study and aid in its design. The initial protocol for the present study was based on this early version but incorporated revisions based on input from AIR and NLM. The new protocols included a standard introduction to be read by interviewers, with interview forms tailored to the type of search (end user or mediated) and to the type of incident, i.e., a search deemed by the interviewee to have been effective or ineffective in terms of its impact on his or her professional activities. AIR staff then conducted an initial pretest from the NLM offices in October 1988.
During this effort, which involved 38 interviews, NLM and AIR closely monitored the effectiveness of the protocols, and some immediate changes were made to the protocol as a result. A total of 58 pretest interviews were then completed from the NLM and AIR offices. After preliminary analysis of the resulting data, more substantial changes were made to the protocol, because it was evident to AIR and NLM that information was often lacking, especially on the outcome of the situation but also on the exact information being sought and on how the information obtained affected the individual's decisions and actions. The changes were made primarily to ensure that interviewers would obtain sufficiently detailed and specific information on all key aspects of each incident to allow the creation of three different taxonomies. To this end, additional questions and suggested probes were added, and their importance and use were stressed in interviewer training. The revised protocols, as used in the subsequent data collection, are contained in Appendix B.

In this protocol, the respondent was asked to recall specific MEDLINE searches, either effective or ineffective from his or her point of view, that had a significant effect on his or her professional activities. For each such search, a series of open-ended questions was then asked, designed to elicit a detailed description of the situation that occasioned the search, what information was being sought, how the search was carried out, what information was obtained, how it was helpful or (if appropriate) what information not obtained would have been more helpful, how the information (or lack thereof) affected the individual's decision-making, and what the final outcome of the situation precipitating the search was.
For each incident/search, a series of precoded questions was asked to ensure uniform data on such items as when, where, and by whom the search was conducted, the setting in which the need for information arose, and certain attributes of the search process. An additional series of precoded questions about the interviewee also appeared on each incident report form, but these questions were asked only once of each interviewee. They requested specific facts on the individual's professional activities, work setting, and size of the community served by his or her hospital or practice, and on the nature and extent of his or her experience in searching MEDLINE.

Interviewer training. Procedures for training the critical incident interviewers included familiarization with the critical incident technique, discussion of the specific application at hand, supervised practice interviewing, and careful review of samples of written records of subsequent interviews. Familiarization with the critical incident technique included a review of Flanagan's initial paper describing the technique and its uses,³ a review of previous applications of the technique in the health professions, and discussion of the general principles that apply to all applications. During training, the discussion of the critical incident technique in the current study of MEDLINE usage focused on the goals of the study and how they related to both the introduction and the questions in the critical incident protocol. Supervised practice interviews were monitored by a trainer who listened to the interview in person or to a tape recording of it. The trainer then reviewed the written incidents produced, edited them, and discussed them with the interviewer, with the goal of improving both the questioning and the written record. An AIR staff interviewer spent three days in Dallas training University of Texas RML staff in this manner.
Initial interviews and written incidents were critiqued on site, and AIR staff subsequently compared tape recordings with written reports for 25% of the interviews conducted in Texas and provided detailed guidance and feedback. To ensure their familiarity with MEDLINE searching, AIR interviewers attended the three-day NLM MEDLINE training offered at the Regional Medical Library in Omaha, Nebraska, before conducting any interviews.

Interviewing procedures. All of the non-pretest interviews were conducted between January 16 and April 28, 1989. AIR interviewers used the invitees' response forms to schedule telephone contact times, doing their best to accommodate the interviewees' schedules. At the outset of each interview, the interviewer gave a standard introduction. Permission to tape-record the interview was requested and was granted in virtually all cases. The interviewer then asked the interviewee to describe, in detail, instances in which information obtained via a MEDLINE search was especially helpful or not helpful in his or her professional practice, and recorded the interviewee's responses on the interview forms. Interviewers also recorded, on a separate sheet, any miscellaneous comments offered about experiences with MEDLINE or suggestions for improving the service. Reports of additional searches were requested, and the interview continued until the interviewee had no further significant searches to report or could not continue because of time constraints. In general, the interviewers experienced a high degree of cooperation from the individuals contacted. Interviewees reported more effective (i.e., helpful) than ineffective (i.e., not helpful) incidents, despite efforts to encourage them to relate any ineffective experiences with MEDLINE.
Many initially felt that ineffective incidents would not be of interest to the study because they believed the problems they had experienced were due to their own inexperience or ineffectiveness in using MEDLINE. However, they were encouraged to report any such MEDLINE experiences that had an impact on their professional activities, regardless of what they believed to be the cause of the problem.

To ensure the quality of the interviews and the written incident reports, AIR and NLM listened to a sample of the tape recordings (5% of the AIR and 20% of the UTexas interviews) and compared them with the incident reports. This review was concentrated at the beginning of the interviewing period so that useful feedback could be given. Complete transcripts of 6 interviews were created and compared with the incident reports in order to assess the accuracy and completeness of the reports. Appendix C contains the transcripts of two randomly selected interviews, along with the incident reports resulting from them. The only apparent error or inconsistency between the first report and its transcript occurred in item 14 on the back of the form: the interviewer recorded that the interviewee had previously accessed MEDLINE via mediated searches, which the interviewee did state, but should have checked "Other" in response to the interviewee's statement that he had used Knowledge Index. One apparent error in the first incident resulting from the second interview is the substitution of "nutritional support" for "nutritional work" in the response to the question about what information was needed. The two terms differ slightly in meaning (the respondent's term, "nutritional work," suggests a broader type of information), but the difference has no special significance for the use of the data. A second error appears to have been made in indicating that the first search reported was done in the hospital library.
The respondent is a family practice resident at a large metropolitan medical center and medical school but has rotations to a hospital and clinic in a rural area some distance away. The first search was done at the medical school library, not the local hospital; the second does appear to have been done at the hospital. Finally, the interviewer appears to have incorrectly assigned the respondent's location to a small SMSA on both incident reports. It should have been recorded either as "non-metro 50+ K" (since the town is not part of the very large SMSA in which the medical center is located) or as "SMSA 1+ million," rather than as "SMSA <100 K." There is some inherent ambiguity, in that the respondent is located in the large SMSA at times, and it is unclear whether the Crohn's patient was seen there or in the smaller community. In general, the level of inconsistency between the tapes/transcripts and the incident reports was small, and interviewers were given feedback on the specific errors and types of discrepancies identified.

Data processing. Following each interview, the interviewer expanded the notes taken into a complete incident report, referring to the tape recording as needed. Interviewers able to do so prepared the final text of the incident report as a word processing document; all other interviews were typed into a word processing system by a typist from the interviewer's handwritten draft. A typed copy of each incident was affixed to the original data collection form and proofed, corrected, and revised by the interviewer, and the final draft was reviewed by the Project Coordinator or Project Director for inclusion in the analysis. During this review, 8 incidents were identified as too vague or incomplete to be included and were discarded. A computerized database of all invitees, each with a unique study ID number, was set up at the start of the study. When a response form was returned, the receipt date was logged and the respondent was assigned to an interviewer.
When the interview had been completed and written up, the interview date, interviewer number, and number of incidents obtained were entered into the database. The backs of the incident forms were checked for completeness, photocopied, and sent in batches to a keytaping service, where they were keyed to an ASCII file on floppy disk. Fields were included for NLM ID number; AIR ID number; respondent group (survey respondent, new user, UCLA, or Texas); data collection round and mailing wave assignment; and other identifying data, to facilitate tracking of progress and analysis.

DATA ANALYSIS

Critical incident data. The primary analysis of the critical incident data consisted of qualitative analysis of the incident text in order to create three taxonomies (hierarchically organized inventories) of (1) the reasons why information is sought from MEDLINE, (2) the effects of the information obtained on the decisions and actions of the originator of the search, and (3) the ultimate impact of having (or not having) the desired information on the outcome of the situation that occasioned the search. An analysis was also carried out of the reasons given for choosing to do a MEDLINE search instead of, or in addition to, asking colleagues, consulting textbooks, or searching the individual's own reprint files.

The initial analysis of the critical incident data was designed to help clarify the frames of reference to be used in developing the taxonomies. This involved writing three brief (one-sentence) statements summarizing each incident report, one for each of the three frames of reference being considered: (1) why the individual needed the information, (2) the impact of the information on the individual's decision-making, and (3) the impact of the information on the outcome of the situation.
For any given frame of reference, statements that were essentially identical were placed together, but the basic aim was to preserve as many potentially useful distinctions among the statements as possible and to organize them into an outline in which the detailed statements were grouped into successively broader functional categories. Successive sets of statements were then examined, noting those that were unique and those that were identical to previously sorted statements. Unique statements were added at appropriate points in the developing outline, which led to periodic reorganization of the outline's structure and number of levels. Eventually, statements were no longer written for every incident, but the process of grouping similar incidents and organizing the dissimilar sets continued until all incidents had been incorporated within a single outline.

Both the effective and the ineffective incidents were used in constructing the three taxonomies. In the first taxonomy (why the information was needed), there is no useful distinction to be made between the two types; an issue arises only when the impact of not getting the desired search result is considered. In the taxonomy of decision-making impact, the impact of ineffective searches could be stated, but in the negative, e.g., "was unable to decide whether surgery or medical treatment was more appropriate; had to obtain the information elsewhere." Such statements were typically the reverse of corresponding positive impacts, so both types of incidents were classified together, with the resulting statement written in the positive form. In the two instances in which the search was ineffective and only a negative outcome was reported, the resulting statement was written in the negative.
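The outline-building procedure described above (merging identical statements, adding unique ones at the appropriate point, and grouping them under successively broader categories) can be pictured as a tree. The following Python sketch is an illustration, not the procedure actually used: the `Node` class and the category labels are invented, and the periodic manual reorganization of levels is not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One category (or leaf statement) in a hierarchical outline."""
    label: str
    children: dict = field(default_factory=dict)
    incident_ids: list = field(default_factory=list)

    def add(self, path, incident_id):
        """File an incident under a category path, broadest category first.

        Identical statements (same path) land on the same node; a unique
        statement simply creates the missing levels of the outline.
        """
        node = self
        for label in path:
            node = node.children.setdefault(label, Node(label))
        node.incident_ids.append(incident_id)

    def count(self):
        """Number of incidents filed at or below this node."""
        return len(self.incident_ids) + sum(c.count() for c in self.children.values())

    def render(self, depth=0):
        """Indented outline lines with incident counts; the root is not printed."""
        lines = [] if depth == 0 else ["  " * (depth - 1) + f"{self.label} ({self.count()})"]
        for child in self.children.values():
            lines.extend(child.render(depth + 1))
        return lines

# Invented category labels and incident numbers, for illustration only.
taxonomy = Node("Why was the information needed?")
taxonomy.add(["Patient care", "Choose a treatment", "Surgery vs medical treatment"], 101)
taxonomy.add(["Patient care", "Choose a treatment", "Surgery vs medical treatment"], 117)
taxonomy.add(["Patient care", "Confirm a diagnosis"], 205)
taxonomy.add(["Research", "Design a study protocol"], 311)
print("\n".join(taxonomy.render()))
```

Keeping incident identifiers at the leaves means that each category's count can always be traced back to the incident reports that support it.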
It is theoretically possible that a search could have been judged ineffective and that the resulting lack of information could have been pivotal in causing an adverse medical outcome for a patient, the use of an ineffective research protocol, and so on. In practice, however, physicians, researchers, and other professionals typically make every effort not to allow the failure to get information from a particular source to determine what happens to their patients, their research, and so on. As a result, the impact of an "ineffective" search was either to cause the individual to pursue some other (possibly more onerous) avenue to get the information or to decide to redo the search later, with no discernible impact of the lack of information (or the delay) on the outcome of the situation that generated the search. Such reports were therefore treated as having "no outcome" and did not contribute to the outcome taxonomy. Had any untoward outcomes attributable to ineffective searches been observed, they would have been highlighted as noted above.

A few searches termed "ineffective" were in fact ones in which nothing was retrieved, from which the searcher concluded that the desired information (case reports, etc.) did not exist. For classification, this was treated as the retrieval of information, and its impact on the outcome was handled in the same manner as for so-called "effective" searches. In some other "ineffective" instances, some useful citations were retrieved and had a beneficial effect, even though the search did not accomplish everything the user wanted. Such incidents were likewise treated in the same manner as effective incidents with similar impact.

Respondent and search characteristics. A second type of analysis consisted of straightforward tabulations and cross-tabulations of the information recorded on the back of each incident form, concerning both the search and the interviewee.
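A cross-tabulation of this kind can be sketched with Python's `collections.Counter`. The records below are invented, and the field names `sample` and `search_type` are hypothetical stand-ins for items on the back of the incident form. The toy data include an end user reporting a mediated search and a mediated user reporting an end user search, which is exactly the kind of mismatch a cross-tabulation makes visible.

```python
from collections import Counter

# Invented example records; one dict per incident form.
incidents = [
    {"sample": "End user (AIR)", "search_type": "end user"},
    {"sample": "End user (AIR)", "search_type": "end user"},
    {"sample": "End user (AIR)", "search_type": "mediated"},
    {"sample": "Mediated (UTexas)", "search_type": "mediated"},
    {"sample": "Mediated (UCLA)", "search_type": "mediated"},
    {"sample": "Mediated (UCLA)", "search_type": "end user"},
]

def crosstab(records, row_key, col_key):
    """Count records for every (row value, column value) pair."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    rows = sorted({r[row_key] for r in records})
    cols = sorted({r[col_key] for r in records})
    return rows, cols, counts

rows, cols, counts = crosstab(incidents, "sample", "search_type")
print(f"{'':20}" + "".join(f"{c:>10}" for c in cols))
for row in rows:
    # Counter returns 0 for combinations that never occur.
    print(f"{row:20}" + "".join(f"{counts[(row, c)]:>10}" for c in cols))
```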
Data concerning respondent characteristics were examined separately for the end user (AIR), mediated user-UTexas, and mediated user-UCLA-community samples. Data concerning search characteristics were examined separately for searches conducted by the respondents personally (end user searches) and those done by an intermediary, typically a medical librarian (mediated searches). Note that a few "end users" reported on searches done for them by an intermediary and, conversely, a few "mediated users" reported on searches they had carried out themselves.

Verification. A third type of analysis was aimed at verifying the data collection process and the incident reports. This had two components: (1) transcription of tapes of a small proportion of the interviews and comparison of the transcripts with the incident reports (described above), and (2) comparison of MEDLINE transaction logs with the incident reports, for incidents occurring within a time window where this was possible. Comparing incident reports with transaction logs required advance permission from respondents to examine their logs; this permission was requested in the initial letter and was granted in all cases. At the end of each time period (three-week intervals for Round 1 and two-week intervals for Round 2), the AIR ID codes of persons interviewed during that period were provided to NLM, where they were matched to NLM access codes, and NLM traffic files reflecting each user's search activity over the preceding 15 weeks were retrieved. Fifteen weeks is the usual period for which traffic files are retained at NLM. All search activity within the time window was identified for each user, and NLM staff attempted to match each incident report with a corresponding search recorded in the traffic log. Transaction logs that were exact or possible matches for incident reports were forwarded to AIR for further analysis.
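The 15-week retention window determines whether a reported search could appear in the retrieved traffic files at all. A minimal Python sketch follows. It assumes the window is counted back from the date the files were retrieved, and the function name `in_log_window` and the example dates are hypothetical.

```python
from datetime import date, timedelta

RETENTION = timedelta(weeks=15)  # usual NLM traffic-file retention, per the text

def in_log_window(search_date, retrieval_date, retention=RETENTION):
    """True if a search on search_date should still be present in traffic
    files retrieved on retrieval_date (assumed window: the preceding
    `retention` period, inclusive at both ends)."""
    return retrieval_date - retention <= search_date <= retrieval_date

# Hypothetical dates: files retrieved on 10 March 1989.
retrieved = date(1989, 3, 10)
print(in_log_window(date(1988, 12, 1), retrieved))  # True
print(in_log_window(date(1988, 11, 1), retrieved))  # False
```

Because respondents often dated their searches only approximately (for example, "within the previous three months"), a check like this is reliable only for reported dates well inside or well outside the window.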
Of the 704 end user searches examined (out of a total of 762), 304 occurred outside the time window for which transaction logs were available. Of the remaining 400 incidents, 188 (47%) were matched by NLM staff to a corresponding search in the transaction log; 185 (46%) were not matched; and 27 (7%) could not be matched precisely because of ambiguity in the incident report (e.g., an author search with no specific name provided). Of the 185 unmatched incidents, 75 (41%) were reported to have occurred within the previous three months but may in fact have taken place somewhat earlier, outside the time window of the available transaction logs; 12% were carried out on a system other than NLM's; and 29 (15%) were not matched for a variety of clerical, processing, and recording reasons. Sixty of the incidents (32%) occurred well within the time window, and the failure to match them remains unexplained. Possible explanations, other than inaccurate recollection of the substance of the search incident, include the possibility that the search was in fact done on a non-NLM system, that the respondent did the search under a different access code, or that some of the transaction log data were missing.

A sample of 135 of the matched incident reports and corresponding transaction logs was compared by a professional librarian on a number of objective features of the search, to determine how the features reported by the respondent related to those reflected in the log. The features examined included the use of MeSH headings, textwords, Boolean operators, and search qualifiers (English only, reviews only, backfiles and time periods, humans only), and whether the search was carried out iteratively.
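The objective part of the report-versus-log comparison amounts to checking each feature for presence in the report and in the log. The Python sketch below shows one way to tabulate agreements and discrepancies. The short feature codes, such as folding "backfiles and time periods" into a single `backfiles` flag, and the example sets are assumptions made for illustration. The librarian's subjective overall judgment is not modeled.

```python
# Hypothetical short codes for the features listed in the text.
FEATURES = ("mesh", "textwords", "boolean", "english_only", "reviews_only",
            "backfiles", "humans_only", "iterative")

def compare_features(reported, logged):
    """Compare features named in an incident report with those in its log.

    A feature "agrees" when it is present in both or absent from both.
    """
    reported, logged = set(reported), set(logged)
    unknown = (reported | logged) - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown feature names: {sorted(unknown)}")
    return {
        "agree": [f for f in FEATURES if (f in reported) == (f in logged)],
        "reported_only": [f for f in FEATURES if f in reported and f not in logged],
        "logged_only": [f for f in FEATURES if f in logged and f not in reported],
    }

# Invented example: the respondent forgot the textword and the iteration,
# and recalled an English-only limit the log does not show.
result = compare_features(
    reported={"mesh", "boolean", "english_only"},
    logged={"mesh", "boolean", "textwords", "iterative"},
)
print(result["reported_only"])  # ['english_only']
print(result["logged_only"])    # ['textwords', 'iterative']
print(len(result["agree"]))     # 5
```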
In addition, the librarian made a more subjective judgment during each comparison as to whether the incident report was an exact or essentially accurate description of the search that occurred; whether there were some differences but the report was accurate in all respects important to its interpretation for the present purpose; or whether the differences were significant enough to cast doubt on the match or on the validity and usefulness of the incident report.

Analysis of reports and logs of ineffective searches. The analysis of ineffective searches was carried one step further. Matched or possibly matched transaction logs were located for 135 such searches. Two professional librarians reviewed each incident report, and in the verification process 10 logs were determined not to be matches. The analysis of the remaining 125 concentrated on what the respondent said he or she was looking for, how the respondent recalled the search having been carried out, and the results obtained. The librarian analysts then conducted their own MEDLINE search aimed at getting the desired information and compared the new results with those reported by the respondent. For each such comparison, a conclusion was drawn as to whether the respondent had obtained essentially all of the available citations, i.e., whether the search had been conducted properly even if its results were ineffective in the sense of not meeting the respondent's needs. If the respondent's search had not obtained all the available citations, the analysts went on to determine what aspect of the searcher's strategy or execution had caused it to miss available and apparently relevant citations. These search-by-search judgments were then summarized across searches to identify common search problems.
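The question "did the respondent obtain essentially all of the available citations?" can be framed as a set comparison between the respondent's results and the expert re-search. In the Python sketch below, the 90% threshold for "essentially all" is an assumption, since the study's judgment was qualitative, and the citation identifiers are invented. Citations retrieved only by the respondent do not affect the coverage figure.

```python
def compare_with_expert(user_hits, expert_hits, threshold=0.9):
    """Compare citations retrieved by the user with an expert re-search.

    Returns (coverage, missed, essentially_all), where coverage is the share
    of expert-retrieved citations the user also retrieved. The threshold
    for "essentially all" is an illustrative assumption.
    """
    user_hits, expert_hits = set(user_hits), set(expert_hits)
    if not expert_hits:
        # The expert found nothing either, so the user could not have missed anything.
        return 1.0, [], True
    coverage = len(user_hits & expert_hits) / len(expert_hits)
    missed = sorted(expert_hits - user_hits)
    return coverage, missed, coverage >= threshold

# Invented citation identifiers.
coverage, missed, essentially_all = compare_with_expert(
    user_hits=["A1", "A2", "A3"],
    expert_hits=["A1", "A2", "A3", "A4", "A5"],
)
print(coverage, missed, essentially_all)  # 0.6 ['A4', 'A5'] False
```

The `missed` list is the natural starting point for the next step described in the text, diagnosing which aspect of the strategy or its execution caused those citations to be missed.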