Workshop on Data Derivation and Provenance

Chicago, October 17-18, 2002

Organizing Committee: Peter Buneman, Ian Foster


Provenance and data derivation are important to many aspects of scientific computation. In molecular biology, where data is repeatedly copied, corrected, and transformed as it passes through numerous genomic databases, understanding where data has come from and how it arrived in the user's database is crucial to the trust a scientist will put in that data, yet this information is seldom captured properly. In astronomy, useful results may have been obtained by filtering, transforming, and analyzing some base data with a complex assemblage of programs, yet we lack good tools for recording how these programs were connected and the context in which they were run.

The importance of provenance goes well beyond verification; it is closely related to archiving and annotation, which are also important in the context of scientific data. Moreover, it may be used in data discovery: knowing the provenance of a data item may help the biologist to make connections with other useful data. The astronomer may want to understand a derivation in order to repeat it with modified parameters, and being able to describe a derivation may help a researcher to discover whether a particular kind of analysis has already been performed.

The purpose of this workshop is to bring together a group of researchers who have confronted these issues either in specific situations or in the development of generic principles and technology. The workshop is informal and will consist of a mixture of presentations and discussions.

Provisional Schedule and Agenda

The workshop will start with coffee and stuff at 8am on Thursday, October 17th, and end at 4pm on Friday, October 18th. We will arrange a group dinner on Thursday and, if there is interest, on Friday also. The location is the Sheraton hotel (address below), Superior 1 and 2.

Thursday, October 17th
    08:00               Coffee etc.
    09:00-10:30    Introduction and scene-setting talks; review goals and format
    10:30-11:00    Break
    11:00-12:30    Session 1: Requirements and applications in the biological sciences

    14:00-15:30    Session 2: Provenance and annotations
    15:30-16:00    Break
    16:00-17:30    Session 3: Workflow and derivation
    17:30-18:30    Discussion

Friday, October 18th
    08:30-10:00    Session 4: Requirements and applications in other sciences
    10:00-10:30    Break
    10:30-12:00    Session 5: Archiving and versioning

    13:30-14:30    Open mike
    14:30-16:00    Discussion to synthesize conclusions

Position Papers are now available online (they are still coming in...)

Location and Accommodation Information (NB: Hotel Block Only Good Until September 26th)

The meeting will take place at the same hotel as the Global Grid Forum, i.e.
    Sheraton Chicago Hotel & Towers
    CityFront Center
    301 East North Water Street
    Chicago, Illinois, USA

We have negotiated a group rate as follows:
    Single or Double Occupancy $155.00 US
    Triple Occupancy $185.00 US
    Quad Occupancy $215.00 US

a) If you are attending the Global Grid Forum meeting, then register in the usual way at http://www.gridforum.org/Meetings/ggf6/

b) If you are only attending the workshop (which is the case for most of you), then book directly by calling 877-242-2558 (or 312-464-1000) and indicate that you are with the ANL/Data Workshop. (NOTE: This should now be working, despite earlier difficulties.)

Participant List

Malcolm Atkinson, U. Glasgow
Bruce Barkstrom, NASA
Raj Bose, UC Santa Barbara
Peter Buneman, U. Edinburgh
Maria Cláudia Reis Cavalcanti, UFRJ, Brazil
Rick Cavanaugh, U. Florida
Vassilis Christophides, FORTH
Umeshwar Dayal, HP Labs
Ewa Deelman, USC Information Sciences Institute
Ian Foster, Argonne/U. Chicago
Peter Fox, National Center for Atmospheric Research
Mike Franklin, UC Berkeley
Jim Frew, UC Santa Barbara
Rob Gardner, U. Chicago
Michael Gertz, UC Davis
Carole Goble, U. Manchester
Greg Graham, FermiLab
Bill Howe, Oregon Graduate Institute
Yannis Ioannidis, University of Athens
Sanjeev Khanna, U. Pennsylvania
Christoph Koch, U. Edinburgh
Michael Lesk, Bell Labs
Miron Livny, U. Wisconsin Madison
David Maier, Oregon Graduate Institute
Natalia Maltsev, Argonne National Laboratory
Bob Mann, U. Edinburgh
Marta Mattoso, UFRJ, Brazil
Jim Myers, Pacific Northwest National Laboratory
Norman Paton, U. Manchester
Carmen Pancerella, Sandia National Laboratory
Dave Pearson, Oracle UK
Larry Rahn, Sandia National Laboratory
Joel Saltz, Ohio State University
Alex Szalay, Johns Hopkins University
Wang-Chiew Tan, UC Santa Cruz
Jens Voeckler, U. Chicago
Mike Wilde, Argonne National Laboratory
Yong Zhao, U. Chicago

Thanks to our Sponsors

www.griphyn.org www.nsf.gov www.sc.doe.gov/ascr/mics