(Slide #1) Institutional Repositories in the Physical Sciences: Enhancing Searchability
Walter Warnick, Ph.D., Director
DOE Office of Scientific and Technical Information

It's an honor to address the Federal Depository Library Conference again this year.

(Slide #2) Ten days ago, a former professor of mine passed away. You may well have heard of him--Stephen Ambrose. He wrote over 30 books about American history; several became best sellers, including "Undaunted Courage" about the Lewis and Clark expedition. I studied American History under Professor Ambrose at Johns Hopkins 35 years ago, long before he became famous. Even at that early stage of his career, Ambrose was a remarkable teacher. He had an uncanny ability to make history come alive.

I bring this up here because Ambrose had something especially noteworthy to say about the community of folks represented in this room. Throughout his career, he often said, and I quote, "My favorite people are archivists and librarians." Note he did not say that his favorite people were his fellow historians. Nor did he say that his favorite people were the great leaders and heroes about whom he wrote books. Ambrose would go on to explain that it was only by the work of archivists and librarians that historians like him could access the source material so essential for their work. Archivists populate the public domain. Librarians make it retrievable.

Well, I cannot claim to be a librarian, but the organization I lead at the Department of Energy has an important role to play in archiving. OSTI is the one place in all of government responsible for archiving all the scientific and technical output of DOE and predecessor agencies going all the way back to the Manhattan Project. We also play a role in making information retrievable. Among other things, OSTI provides DOE documents to librarians at Depository Libraries.
So, I am proud, finally, to include myself and my close colleagues both here and at DOE among Professor Ambrose's favorite people.

(Slide #3) Securing our Future. While Professor Ambrose enjoyed fabulous success, he did have detractors, just as many of us have. Ambrose was sometimes chastised by his fellow historians for "Triumphalism," which is the portrayal of the U.S. as a great country achieving great things and vanquishing its foes. Some historians prefer that scholarly works of history be more dispassionate. Well, today, given the current threats to our homeland security, we could well make use of a few more triumphs and vanquishing of our foes. And here, too, the community represented in this room plays a key role, for the war on terror is a perfect example of the essential role that science and technical knowledge play in securing our future.

What underlies our military advantages in the war on terror? Is it the superior ferocity of our soldiers? Is it the superior intellect of our generals? Perhaps, but we have no real evidence to support these notions. Rather, our advantages stem, in major part, from superior technology. And what is technology? Isn't it simply the physical expression of scientific and technical knowledge?

Just as the military has specialists--the army to project force on land and the navy to project force on the seas--the information community has specialists, too. At OSTI, we are scientific and technical information specialists, and our role is to project the FORCE OF KNOWLEDGE. In the words of our enabling legislation, "The dissemination of scientific and technical information relating to energy should be permitted and encouraged so as to provide free interchange of ideas and criticism which is essential to scientific and industrial progress...."

(Slide #4) Premiums of Encouragement. This concept is hardly new.
I have noticed that whenever I come upon an idea I think is particularly profound, I can usually go back and find it addressed by Thomas Jefferson, only he expressed the idea with unequalled eloquence. Jefferson wrote, "The value of science to a republican people, the security it gives to liberty by enlightening the minds of its citizens, the protection it affords against foreign power, the virtue it inculcates, the just emulation of the distinction it confers on nations foremost in it; in short, its identification with power, morals, order and happiness (which merits to it premiums of encouragement rather than repressive taxes), are considerations that should always be present and bear with their just weight."

(Slide #5) A Public Good. What OSTI does for DOE, our sister organizations do for their agencies. For example, the archivist and information disseminator for the Department of Defense is the Defense Technical Information Center. Similarly, the Department of Agriculture has the National Agricultural Library. Altogether, there are at least ten such organizations in government. They are distinguished one from another by the disciplines they pursue. For DOE, such disciplines include the great bulk of physics, materials, and chemistry, as well as portions of biology, environmental sciences and nuclear medicine. Each of these agencies has a mission to conduct research, and some major fraction of the research output is suitable for the public domain.

Freely disseminated information coming out of research fits the definition of what economists call a public good. Agencies give their information away, but that does not mean that they are left with less of it. Indeed, the information has value only if it is shared with those who can use it. Science progresses only if knowledge is shared. While information is oftentimes a public good, it is not always so.
Over a period of decades and even centuries, some manifestations of information have assumed the status of a private good, much like a commonplace commodity. Copyright laws bestow upon certain kinds of works the status of a private good. Private information goods have value to the owner only if the owner PREVENTS free dissemination. The information technology revolution--which is the Internet--is causing the demarcation between public information goods and private information goods to be reexamined. Already, laws in Europe have been revised, and revision of laws in America is being seriously considered. While copyright is an important topic, I do not intend to review it here. I leave that to another forum. In addition, copyright protection is not the only means by which information can become a private good. All that is necessary for the triumph of information as a private good is to prevent its free dissemination.

(Slide #6) Collecting the Knowledge. In recent years, Professor Ambrose often remarked how technology has changed and facilitated scholarly research. Today, to help us in the war between knowledge and ignorance, we are blessed with a revolutionary new tool--the Internet. Far more than any other DOE entity, OSTI's job is to use the Internet to share knowledge essential to progress. We are deploying tools for the Internet as fast as we can.

Here is how the information system works in DOE. From its research contracts and grants, the Department requires deliverables which come to OSTI for further dissemination. In addition, such grants and contracts often produce journal articles, which OSTI does not disseminate, as they are protected by copyright. Rather, OSTI merely announces such articles and helps to make the announcements searchable. Announcement is entirely consistent with copyright laws.
Both non-copyright-protected documents produced through DOE and announcements of journal articles are made available to the public through GPO and the National Technical Information Service (NTIS).

(Slide #7) DOE promotes energy production through applied R&D. Just as energy production requires oil wells and gas wells and windmills to produce energy, and a distribution system such as pipelines or transmission lines to move energy to individual users, information requires a production system, too. We need INFORMATION PRODUCERS. In the case of DOE, they are researchers at Labs and universities. We need INFORMATION DISTRIBUTORS, which include OSTI and folks in this room. We have PATRONS who eagerly put the information to use. Thus, we in this room are part of our own production system; we are all engaged in Cognitive Production.

We at OSTI provide four important "Pipelines" through GPO Access: Information Bridge www.osti.gov/bridge (which hosts 70,000 technical reports issued since January 1995 in full text and searchable by every word), Energy Citations Database www.osti.gov/energycitations (which hosts 2 million bibliographic citations going all the way back to the Manhattan Project), Graylit Network www.osti.gov/graylit (which includes full text technical documents from DOD, EPA, NASA, and DOE), and Federal R&D Project Summaries www.osti.gov/fedrnd (which hosts descriptions of research in progress at NSF, NIH, and DOE).

We know we are doing our job because there were 450,000 full-text documents, over 15 million pages, downloaded this year from the Information Bridge alone. This figure does not include single page views, nor does it include downloads of announcements of journal articles. If upward trends continue, we estimate that there will be about 10 million information transactions during 2003 on OSTI web tools. We are making progress but we want to do better.

(Slide #8) New Directions for Federal Information.
Opportunities abound for increasing the public understanding of the Department's contribution to science. We seek to advance information finding tools. Undoubtedly, a well-known finding tool today is Google. Too often, people believe that Google is the only finding tool anyone needs. As all of us in this room know, this is misguided. Few people outside our own information community realize that vast information resources are not accessible via familiar search engines. Such resources take the form of huge databases, like PubMed at NIH, which reside in the Deep Web. Content from such databases is accessed only through search engines accompanying those databases. The content is not accessible by web crawlers such as those used by familiar search engines. The Deep Web is estimated to be orders of magnitude larger than the collection of web pages that make up the Surface Web, which is accessible via web crawlers. To access various sources on the Deep Web, another kind of finding tool is needed.

OSTI is using Distributed Search to improve retrievability and discovery from the Deep Web. In anticipation of the needs of web patrons, familiar search engines like Google crawl the Surface Web and create a massive index. When a patron uses these tools, the search is done against this massive index. Distributed Search works quite differently. When a patron launches a Distributed Search, his or her inquiry is forwarded directly to multiple databases in real time, where the search engines of each of those databases are triggered. The search results from each database are returned to the Distributed Search engine, where they are compiled for the patron.

We are utilizing the Distributed Search across many agencies to make the most important federal information collections accessible and easier to find. OSTI, working with Deep Web Technologies, deployed one of the first, if not THE first, government Distributed Search system when, on April 21, 1999, we launched EnergyFiles.
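To make the contrast with a crawled index concrete, the Distributed Search flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the EnergyFiles or Science.gov implementation: the three in-memory "databases", their contents, and the function names `search_one` and `distributed_search` are all hypothetical stand-ins for real databases that each run their own search engine.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for independent Deep Web databases.
# In a real system each would be a remote service with its own search engine.
DATABASES = {
    "reports": ["Fusion reactor materials report", "Solar cell efficiency study"],
    "citations": ["Citation: fusion plasma diagnostics", "Citation: wind turbine design"],
    "projects": ["Project summary: fusion energy research"],
}


def search_one(name, query):
    """Simulate one database's own search engine answering the query."""
    hits = [doc for doc in DATABASES[name] if query.lower() in doc.lower()]
    return [(name, doc) for doc in hits]


def distributed_search(query):
    """Forward the query to every database at once and compile the results."""
    names = sorted(DATABASES)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # pool.map returns per-database result lists in the order of `names`.
        per_database = pool.map(lambda n: search_one(n, query), names)
        compiled = [hit for hits in per_database for hit in hits]
    return compiled


results = distributed_search("fusion")
for source, title in results:
    print(f"[{source}] {title}")
```

Note that no index is built ahead of time: nothing is searched until the patron's query arrives, and each source answers with whatever it holds at that moment.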
Since then, Distributed Search has been more broadly deployed. Most recently, it is a featured tool on Science.gov. Science.gov was the topic of a presentation here on Monday by Gail Hodge. I encourage you to try it. There is tremendous potential for further application of Distributed Search and we are deploying it further as I speak.

(Slide #9) A Searchable Gateway to Preprint Servers. The information business is in a state of rapid change. Among the most recent developments is the emergence of Institutional Repositories, whereby institutions, mostly universities and research departments, post documents and other output created at the institution. In disciplines of concern to DOE, the great bulk of this output is government funded. Seeing this evolution, even before the phrase "Institutional Repository" was coined, we at OSTI set out to help make these sites searchable.

The PrePRINT Network provides access to these Repositories, and makes them searchable. Really, the term preprints does not do justice to the Network, as it includes preprints, post prints, and never prints--any kind of relevant STI hosted by the Institution. In disciplines of concern to DOE, we have identified about 10,000 Institutional Repositories, which we think is a fairly comprehensive collection. The Network integrates 10,000 isolated islands of information into a searchable whole, much as if they were all part of one gigantic journal. Right now, the site is undergoing an upgrade, which will be completed soon. The PrePRINT Network is an example of the new directions OSTI is taking. I encourage you to check it out for yourself.

(Slide #10) Parallel Archiving Opportunities. Institutional Repositories are gaining momentum. They will become more and more useful as they adopt standards, such as those championed by the Open Archives Initiative (OAI). There is now agreement on metadata tagging standards that make the contents of Institutional Repositories harvestable by others.
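As a rough illustration of what "harvestable" means in practice, the sketch below builds a harvesting request and reads Dublin Core fields out of a response, using the `ListRecords` verb and `oai_dc` metadata prefix defined by the OAI Protocol for Metadata Harvesting. It is a minimal sketch under stated assumptions: the repository address, the sample record, and the function names are hypothetical, and the response is embedded as a string so that no network access is needed.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint, used only to show the request shape.
BASE_URL = "https://repository.example.org/oai"

# XML namespaces used by OAI-PMH responses and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample response containing a single record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example preprint on materials physics</dc:title>
          <dc:creator>Doe, J.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a Service Provider would request from a Data Provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def harvest(xml_text):
    """Extract identifier, title, and creators from each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "creators": [c.text for c in rec.findall(".//dc:creator", NS)],
        })
    return records


url = list_records_url(BASE_URL)
records = harvest(SAMPLE_RESPONSE)
print(url)
print(records)
```

Because every compliant repository answers the same request in the same format, one harvester written this way could, in principle, gather metadata from many repositories.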
Software is now freely available for interested parties to build tools and thus become Service Providers. Here, too, opportunities abound. Currently, dozens of OAI projects are being launched all over the world.

OAI is not limited just to Institutional Repositories. OSTI and other government information organizations are ideally positioned to use the OAI as both a federal repository (which OAI terms a Data Provider) and as a user resource (which OAI terms a Service Provider). Already, OSTI has successfully adopted OAI in a proof-of-principle exercise. If OSTI were to fully adopt OAI, we would need to make each of our databases compliant with the OAI protocol. Such a task is doable. The Information Bridge is now being moved to Dublin Core format, which is a major step.

As we at OSTI continue to fulfill our mandate to make DOE research information available, our function as the pipeline to this information will continue to evolve. We at OSTI are eager to shape the future, reassured that, as we do more and more, we continue to advance the goals of archivists and librarians. We are confident that my former Professor Stephen Ambrose would continue to regard us among his favorite people.

(Slide #11) This month, DOE is celebrating its 25th Anniversary. Please visit the DOE web site http://www.energy.gov to learn more about what DOE has done.