From David Alexander
September 4, 2003

Hello All,

In response to the phone meeting today, as Joseph suggested, I'm putting down some ideas to start an email discussion. I think David Adams' paper is one of the most comprehensive attempts at a definition of dataset that speaks to an application developer who wants to design a piece of software that uses datasets. I hope I'm not being too bold in suggesting some semantic changes (and maybe one new idea) to the defining properties of a dataset to spark more off-line discussion. I wanted to (1) see if I understand what David Adams' definition of a dataset is, and (2) hopefully clarify it for others.

***********************************************
DEFINING PROPERTIES OF A DATASET:

If an object has the first four of the following properties (the last two are optional), then the object is defined as a dataset.

0. Unique Identifier
1. Data Description
2. Dataset Reference
3. Essential Provenance
4. Additional Production Details (Optional)
5. Additional Metadata (Optional)
*********************************************

Where the changes are:

0. Same as David's, except that uniqueness is emphasized.

1. "Content" alone was too easily confused with the data itself. In fact, David refers to "content description" in the Mapping paragraph, so I like "Data Description" here to distinguish it as indeed being metadata.

2. Here I merged David's "2. Location" and "3. Mapping" into "Dataset Reference". What I mean by "Dataset Reference" is either a data location map (which combines David's two ideas) or a catalog reference (which is the new idea). There are two lines of reasoning here.

The first is that if the mapping is non-trivial, then the location list by itself isn't very useful for getting particular records or subsets of data from the dataset, so why not combine them into a "location map", which will always include the locations in addition to the more complicated mapping.
The second line of reasoning is that for datasets that are indeed virtual and do not physically exist before they are requested, there *is* no location, and one can only get the dataset from a catalog (or some other generating engine, I suppose). "Reference" is then a generic term meaning either "location map" or "catalog", but the main point is that with the reference you can actually get the data.

3. Just added the word "essential" to distinguish it more from what David was calling "History".

4. Emphasized here that the history is really about production details. Any other types of information can go in the catch-all metadata.

5. Here "Additional" reminds us that all the properties are really metadata, but there are other types of metadata to include.

Is this any better? Does it make sense?

dave
_________________________________________________
David Alexander        desk: 303-448-7751
Tech-X Corporation     fax: 303-448-7756
Boulder, CO            email: alexanda@txcorp.com
(303)448-7751          http: www.techxhome.com
_______________________________________________
Ppdg-idat mailing list
Ppdg-idat@ppdg.net
http://www.ppdg.net/mailman/listinfo/ppdg-idat