Revised EDM Model Notes
18-25 February 1999
Rob Kennedy

  I)   Collections API
  II)  Using the Handle/Body idiom to minimize copies
  III) Root Branches and I/O

---------------------------

I) Collections API

The revised EDM incorporates many of the Collection/Predicate concepts
developed to support the CDF Track and TrackCollection classes. The use
cases here provide a more generic description of the StorableCollection
API, and explore the result of adding that API to the EventRecord itself.
To avoid confusion with the CDFTrack design, I will use a MuonSet as the
basis for the use cases.

As with the CDFTrackCollection, each StorableCollection has, if appropriate:

  1) forward iteration (no universal support for reverse iteration)
  2) begin(), end()
  3) Iterator::peek_ahead(), is_valid(), prefix and postfix operator ++()
  4) insert(), append(), erase(), removeIf(), clear()
  5) operator [](), front(), back(), size(), empty()
  6) sort(), find(), print()
  7) Accessors returning a const/non-const reference to an element
  8) Support binary comparisons for element sorting
  9) Support unary predicates for element selection

StorableCollections store by value. Collections which store by reference
and which do not own the objects referred to are StorableCollectionViews.
Collections which store by reference and which do own the objects referred
to are StorableRefCollections.

To support this in the EventRecord, StorableObjects must have methods to
return their name, number, version (of object data format), CreatorId
(Module class), AbsParmSetId (Module instance), and any other desirable
selection or sorting criteria.

  EventRecord::RankByName                  : StorableObject::Comparison
  EventRecord::RankByNameNumber            : StorableObject::Comparison
  EventRecord::RankByCreatorId             : StorableObject::Comparison

  EventRecord::SelectAll(void)             : StorableObject::Predicate
  EventRecord::SelectByName(string)        : StorableObject::Predicate
  EventRecord::SelectByNumber(int)         : StorableObject::Predicate
  EventRecord::SelectByVersion(int)        : StorableObject::Predicate
  EventRecord::SelectByCreatorId(int)      : StorableObject::Predicate
  EventRecord::SelectByAbsParmSetId(int)   : StorableObject::Predicate

Example 1) Create a StorableMuonSet

  class StorableMuonSet: public StorableSet<Muon> ;
  // typedef StorableSet<Muon> StorableMuonSet

  StorableMuonSet mygoldmus ;

  // BEGIN Loop over tracks, to determine if a muon candidate
    if (current_track.is_golden_muon_candidate()) {
      Muon tempmuon(current_track) ;
      mygoldmus.insert(tempmuon) ;
    }
  // END LOOP... mygoldmus now contains muons by value

Example 2) Select Muons in a collection

  for (Iterator iter = mygoldmus.begin(Muon::PtGreaterThan(4.0)) ;
       iter != mygoldmus.end() ;
       ++iter) {
    // treat muons with Pt > 4.0
  }

Do we really want to approach selection by supplying arguments to begin()?
Consider the alternative approach I implemented in Trybos:

  for (Iterator iter = mygoldmus.find_first(Muon::PtGreaterThan(4.0)) ;
       iter.is_valid() ;
       iter = mygoldmus.find_next(iter, Muon::PtGreaterThan(4.0))) {
    // treat muons with Pt > 4.0
  }

The Trybos approach seems clumsier to me. It has the advantage that the
iterator's role is clearly defined at all times, but then the iterator does
even less than a pointer, since it does not iterate by itself. I solved this
for EventRecord iterators by creating two iterator classes. I could just as
easily have used the CDFTrackCollection approach instead, permitting
iteration with one iterator class over all banks, same-name banks, or more
exotic criteria. I can easily re-implement the old iterator classes with
the one new-style iterator class.
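
The begin(predicate) style of Example 2 can be sketched with a single
iterator class that carries its selection predicate and skips rejected
elements on construction and on each increment. This is only a minimal
illustration in modern C++, not the CDFTrackCollection or EventRecord code:
the Muon struct, the MuonSet class, and the helper count_hard_muons() are
hypothetical stand-ins, and the use of std::function and std::vector is an
assumption made for brevity.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Muon class of the notes.
struct Muon {
    double pt;

    // Builds a unary predicate selecting muons above a Pt threshold.
    static std::function<bool(const Muon&)> PtGreaterThan(double cut) {
        return [cut](const Muon& m) { return m.pt > cut; };
    }
};

// Hypothetical by-value collection with one predicate-aware iterator class.
class MuonSet {
public:
    using Predicate = std::function<bool(const Muon&)>;

    class Iterator {
    public:
        Iterator(const std::vector<Muon>* items, std::size_t pos, Predicate pred)
            : _items(items), _pos(pos), _pred(std::move(pred)) { skip(); }

        bool is_valid() const { return _pos < _items->size(); }
        const Muon& operator*() const { return (*_items)[_pos]; }
        const Muon* operator->() const { return &(*_items)[_pos]; }

        // Prefix increment: step once, then skip rejected elements.
        Iterator& operator++() { ++_pos; skip(); return *this; }

        // Position-only comparison, so any iterator can be tested against end().
        bool operator!=(const Iterator& other) const { return _pos != other._pos; }

    private:
        // Advance until the predicate accepts the element or the end is reached.
        void skip() {
            while (_pos < _items->size() && _pred && !_pred((*_items)[_pos])) ++_pos;
        }

        const std::vector<Muon>* _items;
        std::size_t _pos;
        Predicate _pred;   // an empty predicate means "select all"
    };

    void insert(const Muon& m) { _items.push_back(m); }   // stores a copy

    Iterator begin(Predicate pred = Predicate()) const {
        return Iterator(&_items, 0, std::move(pred));
    }
    Iterator end() const { return Iterator(&_items, _items.size(), Predicate()); }

private:
    std::vector<Muon> _items;
};

// Usage: the loop has the same shape as Example 2.
std::size_t count_hard_muons(const MuonSet& muons, double cut) {
    std::size_t n = 0;
    for (MuonSet::Iterator iter = muons.begin(Muon::PtGreaterThan(cut));
         iter != muons.end(); ++iter) {
        ++n;
    }
    return n;
}
```

Because the predicate lives inside the iterator, the same class serves
"all elements" (empty predicate) and any selection, which is the property
that would let one new-style iterator replace two old-style ones.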

Example 3) Select Muons in a collection, capture as a view

  MuonSetView myhardgoldmus = mygoldmus.createView(Muon::PtGreaterThan(4.0)) ;

Example 4) Sort Muons in a collection

  mygoldmus.sort(Muon::RankByPt) ;

Example 5) Sort Muons in a view, underlying collection left undisturbed

  myhardgoldmus.sort(Muon::RankByPt) ;

Example 6) Sort Muons in a collection, affects dependent view

  Suppose mygoldmus is initially not ordered by Pt...

  MuonSetView myhardgoldmus = mygoldmus.createView(Muon::PtGreaterThan(4.0)) ;
  mygoldmus.sort(Muon::RankByPt) ;

Now myhardgoldmus contains pointers to values which have changed. It is no
longer in a well-defined state, especially if a Muon is dropped from the
MuonSet. The view must be re-formed in order to be in a usable state.

This points out the advantages and limitations of Views. A View minimizes
storage space for overlapping collections by sharing elements. Views can
also minimize the CPU time used to re-order elements in some data
structures. They are most useful when a master collection by value is used
as a static (for the lifetime of the dependent Views) repository of all
elements.

Example 7) Loop over distinct pairs of Muons in a collection

  for (Iterator iter1 = my_muon_set.begin(Muon::PtGreaterThan(4.0)) ;
       iter1 != my_muon_set.end() ;
       ++iter1) {
    for (Iterator iter2 = iter1.peek_ahead() ;  // called copy_next() in Trybos
         iter2 != my_muon_set.end() ;
         ++iter2) {
      // treat distinct muon pairs, for muons with Pt > 4.0
    }
  }

To make this work, the peek_ahead() method must transfer both the data
representing the next value of iter1 and the iteration predicate. The
CDFTrack implementation of Iterator stores a pointer to the predicate, so
this is not a problem.

-------------------------------------------------------------------------------

II) Using the Handle/Body idiom to minimize copies

Suppose there is a class Muon which contains data members of built-in data
types and a one-way association to a class Track.
What options are there to make this class storable with efficient
insert/retrieve operations?

  Option 1) Adapter class - StorableMuon (described in previous notes)
  Option 2) Minimize copies with SmartRCPointer and StorablePointer

-------------

First, some related observations:

  Track mytrack(...) ;
  StorableTrack mystorabletrack(mytrack) ;
  StorableLink mylink = p_record->insert(mystorabletrack) ; // results in a copy
  // StorableLink mylink = p_record->insert(&mystorabletrack) ;

If I pass the address of the object to insert(), then I am referring to a
stack variable that will go out of scope soon. So I cannot create handles
on the fly using this minor change to the API, unless the user always
allocates the StorableObject on the heap. How can I guarantee that this is
done???

--------------------------------------

  StorableHandle myhandle(mystorabletrack) ;
  p_record->insert(myhandle) ;

Again, I have the problem of ensuring that the StorableObject is actually
allocated on the heap and not on the stack.

--------------

Option 1 results in a copy from the Muon instance to a StorableMuon
instance, and then a copy from the local StorableMuon to the recorded
StorableMuon.

--------------

Option 2 minimizes copies.

Suppose the Muon class contains a smart StorablePointer instance:

  template <class TYPE>
  class StorablePointer {
    EventRecord* _p_record ;   // TRANSIENT
    TYPE*        _p_item ;     // TRANSIENT
    ObjectNumber _number ;

    static ObjectName name(void)
      { return(ObjectName("TYPE")) ; }   // or equivalent

    stream_read() ;
    stream_write() ;
    activate(TYPE*) ;
    deactivate(ObjectNumber) ;

    // Can have _p_item active, _number active, or both
    // _p_record is NULL: object pointed to is not stored, so _number inactive
  } ;

  class Muon {
    StorablePointer<Track> _p_track ;
    activate(void) ;
    deactivate(const ObjectId&) ;
  } ;

The non-pointer data members of Muon never have to be copied. The
transient-to-storable transition is simply an in-place translation of the
pointers to dumb data (ObjectId or ObjectNumber).
The storable-to-stored transition is simply the insertion of one smart
pointer into the event record (and setting the write-lock flag). The
stored-to-transient transition re-activates the pointer....

  Track aTrack(...) ;   // Track derived from StorableObject
  SmartRCPointer<Muon> p_muon = new Muon(&aTrack, ...) ;
  // p_muon._p_track._p_item is set, _number is undefined

  // Override operator->() to "inherit" Muon API
  float pt = p_muon->pt() ;

  // Store the track
  ObjectId trackId = p_record->insert(aTrack) ;
  // This insert() allocates a new track and copies aTrack's data members

  // Set the p_muon._p_track._number
  p_muon->deactivate(trackId) ;
      { _p_track.deactivate(trackId) ; }

  ObjectId muonId = p_record->insert(p_muon) ;
  // SmartRCPointer does *not* inherit from StorableObject, but it does inherit
  // from a base class StorableObjectPointer. This avoids a template member
  // function insert(), a method with much code (?), but generic operations.
  // This insert() knows it just copies the smart pointer, not the muon
  // The record now owns the memory at p_muon

  RecordIterator muonIter = p_record->find_first("Muon") ;
  SmartRCPointer<Muon> p_muon(muonIter) ;

  // Override operator->() to "inherit" Muon API
  // Leave p_track as a link (null pointer)
  float pt = p_muon->pt() ;

  p_muon->activate() ;
      { RecordIterator trackIter =
          p_record->find(p_muon->p_track().objectId()) ;
        _p_track.activate(trackIter) ; }

The method activate(void) is simple enough, but deactivate() must receive
the ObjectIds of the stored objects, so its signature may vary...

  virtual bool activate(void) = 0 ;
  virtual bool deactivate(const ObjectId* p_objectId = 0,
                          const int n_objectId = 0) = 0 ;

It may be desirable to have an activate() and deactivate() in Muon for each
StorablePointer, to allow users to selectively re-activate a muon. Suppose,
for instance, I want to activate the track, but do not want to activate a
pointer to an MCTruth object (if there were such a data member in Muon).
This cannot be enforced by the type system, since signatures will vary.

  Muon::activateTrack(void) ;
  Muon::deactivateTrack(const ObjectId&) ;

Another question is whether we will have to store the ObjectName with each
pointer, or use a trick to recover the name information via the template
argument or RTTI. Ideally we would store the ObjectName as a cross-check,
but with arbitrary-length ObjectName == class name, this can consume quite
a bit of storage space.

  // The EventRecord is a restricted list of StorableObjectPointers.
  // SmartRCPointer is derived from StorableObjectPointer, which contains a
  // StorableObject pointer.

  SmartRCPointer<Muon> p_muon = new Muon(&aTrack, ...) ;
  ObjectId muonId = p_record->insert(p_muon) ;

  bool EventRecord::insert(const StorableObjectPointer& p_item) {
    _list.insert(p_item) ;
    return(true) ;
  }

There is a conflict here. On the one hand, I want to store
StorableObjectPointers in the EventRecord, generic things that do not have
the API of the items to which they point. On the other hand, I want to have
SmartRCPointer handles which have the API of the thing they point to, since
they overload operator->(). Unless TYPE derives from StorableObject, these
two classes may not be translatable to each other. In particular, how do I
set the StorableObject* _p_object in StorableObjectPointer from the TYPE*
in SmartRCPointer? I may have to use the constrained template parameter
trick in Barton & Nackman's "Scientific and Engineering C++" book. Or, I
can use run-time tests. If so constrained, I should rename these to
GenericStorableObjectPointer and SpecificStorableObjectPointer.

  class GenericStorableObjectPointer: public SmartRefCntPointer {
    StorableObject* _p_generic_object ;

    constructor(const SpecificStorableObjectPointer& p_specific_object)
      { _p_generic_object = static_cast<StorableObject*>(p_specific_object) ; }

    operator *() ;
    operator ->() ;   // StorableObject API only...
                      //   objectName(), objectNumber()
    bool is_valid(void) const ;

    // Smart RefCnt pointer machinery
  } ;

  template <class TYPE>
  class SpecificStorableObjectPointer: public SmartRefCntPointer {
    TYPE* _p_specific_object ;

    constructor(const GenericStorableObjectPointer& p_generic_object)
      { _p_specific_object = dynamic_cast<TYPE*>(p_generic_object) ; }
    // TYPE must be child of StorableObject for dynamic_cast to work

    operator *() ;
    operator ->() ;   // TYPE's API available
    bool is_valid(void) const ;

    // Smart RefCnt pointer machinery
  } ;

Finally, suppose I find aMuon and activate its link to aTrack. In order to
do this without creating a transient aMuon, I would have to modify the
stored aMuon object, violating the strict policy that objects are read-only
once stored. We should consider this policy to refer to the logical state
of the objects in the EventRecord, not the physical state. PointerLinks
will be modified in stored objects, but the modifications do not change the
logical state of the links, only the specific data members used to
implement the links.

-------------------------------------------------------------------------------

III) Root Branches and I/O

ROOT Branches are only manipulated at event data I/O. Only the input and
output modules manipulate branches, using input and output lists as
guidance. Users may manipulate the I/O lists, but do not manipulate
branches directly. So, we have only to amend the input and output filtering
policy to account for Branch organization in our EDM.

Input filtering: A BranchInputList is consulted by the input module to
determine which ROOT branches are to be read in. It is possible to specify
"all branches" in the input list. The specified branches are read in. Then
all objects and classes of objects referred to in the ``drop upon input''
ObjectList are removed from the EventRecord by AC++. The ``actual input''
ObjectList is then created, which lists all the branches and classes which
were input.
The association of classes to branches is also stored in the actual input
list.

Output filtering: Only objects referred to in the ``object output''
ObjectList are written to a particular stream. Each output stream has its
own such list. Each list is initialized with the ``actual input''
ObjectList, which includes the branch association for all entries.
References to branches, classes, or object instances in a ``drop upon
output'' list are then removed from the ``object output'' list. References
to objects or classes of objects in a ``keep upon output'' list are then
added to the ``object output'' list. The keep list includes the branch
association for each entry. Only those branches, classes, or object
instances that remain in that list are written to the output stream.
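
The order of operations in the output filtering policy (start from the
actual input list, remove the drop entries, then add the keep entries) can
be sketched as a small function. This is a simplified illustration under
stated assumptions, not the AC++ implementation: the ObjectList is modelled
as a hypothetical map from class name to branch name, object instances are
left out, and the name build_output_list() is invented for this sketch.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical ObjectList: class name -> name of the ROOT branch holding it.
using ObjectList = std::map<std::string, std::string>;

// Builds the "object output" list for one output stream.
//   actual_input : what was actually read in, with branch associations
//   drop_names   : "drop upon output" entries, each a class or a branch name
//   keep_list    : "keep upon output" entries, with branch associations
ObjectList build_output_list(const ObjectList& actual_input,
                             const std::set<std::string>& drop_names,
                             const ObjectList& keep_list) {
    ObjectList output;

    // Initialize from the actual input list, leaving out any entry whose
    // class name or branch name appears in the drop list.
    for (const auto& entry : actual_input) {
        const bool dropped = drop_names.count(entry.first) > 0 ||
                             drop_names.count(entry.second) > 0;
        if (!dropped) output.insert(entry);
    }

    // Keep entries are added after the drops have been applied, so an entry
    // named in both lists ends up in the output list.
    for (const auto& entry : keep_list) {
        output[entry.first] = entry.second;
    }
    return output;
}
```

With this ordering a class can be dropped wholesale by branch and then
selectively restored through the keep list, which is why the keep list has
to carry its own branch association for each entry.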