Revised EDM Model Notes
                               18-25 February 1999
                                   Rob Kennedy


I) Collections API
II) Using the Handle/Body idiom to minimize copies
III) Root Branches and I/O

---------------------------

I) Collections API

	The revised EDM incorporates many of the Collection/Predicate concepts
developed to support the CDF Track and TrackCollection classes. The use cases
here provide a more generic description of the StorableCollection APIs and
explore the result of adding that API to the EventRecord itself. To avoid
confusion with the CDFTrack design, I will use a MuonSet as the basis for the
use cases.

As with the CDFTrackCollection, each StorableCollection has, if appropriate:

	1) forward iteration (no universal support for reverse iteration)
	2) begin(), end()
	3) Iterator::peek_ahead(), is_valid(), prefix and postfix operator ++()
	4) insert(), append(), erase(), removeIf(), clear()
	5) [](), front(), back(), size(), empty()
	6) sort(), find(), print()
	7) Accessors returning a const/non-const reference to an element
	8) Support binary comparisons for element sorting
	9) Support unary predicates for element selection

StorableCollections store by value. Collections which store by reference and
which do not own the objects referred to are StorableCollectionViews.
Collections which store by reference and which do own the objects referred to
are StorableRefCollections.

To support this in the EventRecord, StorableObjects must have methods to return
their name, number, version (of object data format), CreatorId (Module class),
AbsParmSetId (Module instance), and any other desirable selection or sorting
criteria.

EventRecord::RankByName       : StorableObject::Comparison
EventRecord::RankByNameNumber : StorableObject::Comparison
EventRecord::RankByCreatorId  : StorableObject::Comparison

EventRecord::SelectAll(void)        : StorableObject::Predicate
EventRecord::SelectByName(string)   : StorableObject::Predicate
EventRecord::SelectByNumber(int)    : StorableObject::Predicate
EventRecord::SelectByVersion(int)   : StorableObject::Predicate
EventRecord::SelectByCreatorId(int) : StorableObject::Predicate
EventRecord::SelectByAbsParmSetId(int) : StorableObject::Predicate
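
A minimal sketch of how such predicates and comparisons could look as function
objects. The StorableObject struct here is a hypothetical stand-in with only
three of the identifying fields, the record is assumed to be a plain
std::vector, and count_selected() is an illustrative helper, not part of the
EDM.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal StorableObject: only the identifying fields.
struct StorableObject
{
  std::string name ;
  int         number ;
  int         creatorId ;
} ;

// Unary predicate: true for objects with the requested name.
class SelectByName
{
public:
  explicit SelectByName(const std::string& name) : _name(name) {}
  bool operator()(const StorableObject& obj) const
    { return obj.name == _name ; }
private:
  std::string _name ;
} ;

// Binary comparison: order by name, then by number within a name.
struct RankByNameNumber
{
  bool operator()(const StorableObject& a, const StorableObject& b) const
    {
      if (a.name != b.name) return a.name < b.name ;
      return a.number < b.number ;
    }
} ;

// Count the objects in a record that satisfy a predicate.
template <class Predicate>
int count_selected(const std::vector<StorableObject>& record, Predicate pred)
{
  return static_cast<int>(std::count_if(record.begin(), record.end(), pred)) ;
}
```

The same RankByNameNumber object can be handed to std::sort() to order a record,
which is the role a StorableObject::Comparison would play in sort().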


Example 1) Create a StorableMuonSet

typedef StorableSet<Muon> StorableMuonSet ;  // or a class derived from StorableSet<Muon>
StorableMuonSet mygoldmus ;

// BEGIN Loop over tracks, to determine if a muon candidate
  if (current_track.is_golden_muon_candidate())
    { Muon tempmuon(current_track) ;
      mygoldmus.insert(tempmuon) ;
    }

// END LOOP... mygoldmus now contains muons by value

Example 2) Select Muons in a collection

for (Iterator iter = mygoldmus.begin(Muon::PtGreaterThan(4.0)) ;
     iter != mygoldmus.end() ;
     ++iter)
  {
    // treat muons with Pt > 4.0
  }

	Do we really want to approach selection by supplying arguments to
begin()? Consider the alternative approach I implemented in Trybos:

for (Iterator iter = mygoldmus.find_first(Muon::PtGreaterThan(4.0)) ;
     iter.is_valid() ;
     iter = mygoldmus.find_next(iter, Muon::PtGreaterThan(4.0)))
  {
    // treat muons with Pt > 4.0
  }

The Trybos approach seems clumsier to me. It has the advantage that the
iterator is clearly defined in its role at all times, but then it does even less
than a pointer since it does not iterate itself. I solved this with EventRecord
iterators by creating two iterator classes. I could just as easily have used the
CdfTrackCollection approach instead, permitting iteration using one iteration
class over all banks, same name banks, or more exotic criteria. I can easily
re-implement the old iterator classes with the one new-style iterator class.
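
A minimal sketch of the single-iterator approach, in which the iterator holds
the selection predicate and skips rejected elements as it advances. All names
(SelectIterator, the Muon struct, sum_selected_pt) are hypothetical, the
container is assumed to be a std::vector, and the loop tests is_valid() rather
than comparing against end() to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Muon { double pt ; } ;   // hypothetical element type

// Hypothetical predicate, playing the role of Muon::PtGreaterThan.
struct PtGreaterThan
{
  explicit PtGreaterThan(double min_pt) : _min_pt(min_pt) {}
  bool operator()(const Muon& m) const { return m.pt > _min_pt ; }
  double _min_pt ;
} ;

// Forward iterator that skips elements failing its predicate.
class SelectIterator
{
public:
  SelectIterator(const std::vector<Muon>& data, const PtGreaterThan& pred)
    : _p_data(&data), _pos(0), _pred(pred) { skip_rejected() ; }

  bool is_valid(void) const { return _pos < _p_data->size() ; }

  const Muon& operator*() const
    { assert(is_valid()) ; return (*_p_data)[_pos] ; }

  SelectIterator& operator++()
    { ++_pos ; skip_rejected() ; return *this ; }

private:
  // Advance until the current element is selected or the end is reached.
  void skip_rejected(void)
    { while (is_valid() && !_pred((*_p_data)[_pos])) ++_pos ; }

  const std::vector<Muon>* _p_data ;
  std::size_t              _pos ;
  PtGreaterThan            _pred ;   // stored by value in this sketch
} ;

// Sum the Pt of all muons selected by the predicate.
inline double sum_selected_pt(const std::vector<Muon>& muons, double min_pt)
{
  double sum = 0.0 ;
  for (SelectIterator it(muons, PtGreaterThan(min_pt)) ; it.is_valid() ; ++it)
    sum += (*it).pt ;
  return sum ;
}
```

With this shape, iterating over "all elements" is just the same iterator with a
predicate that always returns true.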


Example 3) Select Muons in a collection, capture as a view

MuonSetView myhardgoldmus = mygoldmus.createView(Muon::PtGreaterThan(4.0));

Example 4) Sort Muons in a collection

mygoldmus.sort(Muon::RankByPt) ;

Example 5) Sort Muons in a view, underlying collection left undisturbed

myhardgoldmus.sort(Muon::RankByPt) ;

Example 6) Sort Muons in a collection, affects dependent view

Suppose mygoldmus is initially not sorted by Pt...

MuonSetView myhardgoldmus = mygoldmus.createView(Muon::PtGreaterThan(4.0));
mygoldmus.sort(Muon::RankByPt) ;

	Now, myhardgoldmus contains pointers to values which have changed. It
is no longer in a well-defined state... especially if a Muon is dropped from
the MuonSet. The view must be reformed in order to be in a usable state. This
points out the advantages and limitations of Views. A View minimizes storage
space for overlapping collections by sharing elements. Views can also minimize
the CPU time used to re-order elements in some data structures. They are most
useful when a master collection by value is used as a static (for the lifetime
of the dependent Views) repository of all elements.
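
The hazard can be reproduced with standard containers. This is a sketch only:
the view is assumed to be a vector of raw pointers into a std::vector master,
and create_view(), rank_by_pt(), and view_still_selects() are hypothetical
helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Muon { int id ; double pt ; } ;   // hypothetical element type

inline bool rank_by_pt(const Muon& a, const Muon& b) { return a.pt < b.pt ; }

// Build a pointer-based view of the muons with Pt above a threshold.
inline std::vector<const Muon*> create_view(const std::vector<Muon>& master,
                                            double min_pt)
{
  std::vector<const Muon*> view ;
  for (std::size_t i = 0 ; i < master.size() ; ++i)
    if (master[i].pt > min_pt) view.push_back(&master[i]) ;
  return view ;
}

// Sort the master in place, then report whether the old view still satisfies
// its selection. Sorting does not move the vector's storage, so the pointers
// stay dereferenceable, but they may now see different muons.
inline bool view_still_selects(std::vector<Muon>& master,
                               const std::vector<const Muon*>& view,
                               double min_pt)
{
  std::sort(master.begin(), master.end(), rank_by_pt) ;
  for (std::size_t i = 0 ; i < view.size() ; ++i)
    if (!(view[i]->pt > min_pt)) return false ;
  return true ;
}
```

Erasing from or appending to the master is worse than sorting it: the pointers
held by the view may then dangle, so the view has to be rebuilt with
create_view() before it is used again.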


Example 7) Loop over distinct pairs of Muons in a collection


for (Iterator iter1 = my_muon_set.begin(Muon::PtGreaterThan(4.0)) ;
     iter1 != my_muon_set.end() ;
     ++iter1)
  {
    for (Iterator iter2 = iter1.peek_ahead() ; // called copy_next() in Trybos
         iter2 != my_muon_set.end() ;
         ++iter2)
      {
        // treat distinct muon pairs, for muons with Pt > 4.0
      }
  }

To make this work, then, the peek_ahead() method must transfer both the data
representing the next value of iter1 and the iteration predicate. The CDFTrack
implementation of Iterator stores a pointer to the predicate, so this is not a
problem.
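
A minimal sketch of a peek_ahead() that carries the predicate along, assuming
the iterator stores a pointer to an abstract predicate as described above.
PairIterator, MuonPredicate, and count_pairs() are hypothetical names, and
is_valid() stands in for the comparison against end().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Muon { double pt ; } ;   // hypothetical element type

// Abstract unary predicate; iterators hold a pointer to one of these.
struct MuonPredicate
{
  virtual ~MuonPredicate() {}
  virtual bool operator()(const Muon& m) const = 0 ;
} ;

struct PtGreaterThan : public MuonPredicate
{
  explicit PtGreaterThan(double min_pt) : _min_pt(min_pt) {}
  virtual bool operator()(const Muon& m) const { return m.pt > _min_pt ; }
  double _min_pt ;
} ;

class PairIterator
{
public:
  PairIterator(const std::vector<Muon>& data, const MuonPredicate& pred)
    : _p_data(&data), _p_pred(&pred), _pos(0) { skip_rejected() ; }

  bool is_valid(void) const { return _pos < _p_data->size() ; }

  PairIterator& operator++()
    { assert(is_valid()) ; ++_pos ; skip_rejected() ; return *this ; }

  // Copy the position AND the predicate pointer, then advance the copy.
  PairIterator peek_ahead(void) const
    { PairIterator next(*this) ; ++next ; return next ; }

private:
  void skip_rejected(void)
    { while (is_valid() && !(*_p_pred)((*_p_data)[_pos])) ++_pos ; }

  const std::vector<Muon>* _p_data ;
  const MuonPredicate*     _p_pred ;   // not owned; must outlive the iterator
  std::size_t              _pos ;
} ;

// Count distinct pairs of muons that both satisfy the predicate.
inline int count_pairs(const std::vector<Muon>& muons, const MuonPredicate& pred)
{
  int n_pairs = 0 ;
  for (PairIterator iter1(muons, pred) ; iter1.is_valid() ; ++iter1)
    for (PairIterator iter2 = iter1.peek_ahead() ; iter2.is_valid() ; ++iter2)
      ++n_pairs ;
  return n_pairs ;
}
```

Because the copy shares the predicate pointer, the inner loop visits only
selected muons that come after iter1, which is what makes the pairs distinct.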

-------------------------------------------------------------------------------

II) Using the Handle/Body idiom to minimize copies

	Suppose there is a class Muon which contains data members of built-in data
type and a one-way association to a class Track. What options are there to make
this class storable with efficient insert/retrieve operations?

Option 1) Adapter class - StorableMuon (described in previous notes)
Option 2) Minimize copies with SmartRCPointer and StorablePointer

-------------

First, some related observations:

Track         mytrack(...) ;
StorableTrack mystorabletrack(mytrack) ;
StorableLink  mylink = p_record->insert(mystorabletrack) ; // results in a copy
// StorableLink  mylink = p_record->insert(&mystorabletrack) ;

If I pass the address of the object to insert, then I am referring to a stack
variable that will go out of scope soon. So I cannot create handles on the fly
using this minor change to the API, unless the user always allocates the
StorableObject on the heap. How can I guarantee that this is done???
                            --------------------------------------
StorableHandle myhandle(mystorabletrack) ;
p_record->insert(myhandle) ;

Again, I have the problem of ensuring that the StorableObject is actually
allocated on the heap and not on the stack.
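
One common way to get this guarantee, sketched here under assumptions rather
than proposed as the EDM design, is to make the constructors private and hand
out instances only through a factory that returns a counted handle.
std::shared_ptr stands in for a reference-counted smart pointer; the create()
factory and this simplified EventRecord are hypothetical.

```cpp
#include <cassert>
#include <memory>

// Hypothetical storable class whose instances can only live on the heap.
class StorableTrack
{
public:
  // The only way to obtain an instance: a factory returning a counted handle.
  static std::shared_ptr<StorableTrack> create(double pt)
    { return std::shared_ptr<StorableTrack>(new StorableTrack(pt)) ; }

  double pt(void) const { return _pt ; }

private:
  explicit StorableTrack(double pt) : _pt(pt) {}    // no stack construction
  StorableTrack(const StorableTrack&) ;             // declared, never defined:
  StorableTrack& operator=(const StorableTrack&) ;  // copying is disabled
  double _pt ;
} ;

// Hypothetical record that shares ownership of whatever is inserted.
class EventRecord
{
public:
  void insert(const std::shared_ptr<StorableTrack>& handle)
    { assert(handle) ; _p_track = handle ; }
  const StorableTrack& track(void) const
    { assert(_p_track) ; return *_p_track ; }
private:
  std::shared_ptr<StorableTrack> _p_track ;
} ;
```

With this arrangement "StorableTrack mytrack(...) ;" on the stack does not
compile, so insert() can never be handed the address of a local variable.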

--------------

Option 1 results in a copy from Muon instance to StorableMuon instance, and then
a copy from local StorableMuon to recorded StorableMuon.

--------------

Option 2 minimizes copies

Suppose the Muon class contains a smart StorablePointer instance:

template<class TYPE>
class StorablePointer
{ EventRecord* _p_record ; // TRANSIENT
  TYPE*        _p_item ;   // TRANSIENT
  ObjectNumber _number ;
  static ObjectName name(void) { return(ObjectName("TYPE")) ; } // or equivalent
       stream_read() ;
       stream_write() ;
  activate(TYPE*) ;
  deactivate(ObjectNumber) ;
  // Can have _p_item active, _number active, or both
  // _p_record is NULL: object pointed to is not stored, so _number inactive
} ;

class Muon
{ StorablePointer<Track> _p_track ;
  activate(void) ;
  deactivate(const ObjectId&) ;
} ;

    The non-pointer data members of Muon never have to be copied. The transient
to storable transition is simply an in-place translation of the pointers to dumb
data (ObjectId or ObjectNumber). The storable to stored transition is simply the
insertion of one smart pointer into the event record (and setting the write-lock
flag). The stored to transient transition re-activates the pointer....
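
A minimal, non-template sketch of the pointer-to-number translation and back.
Several things here are assumptions made only for illustration: ObjectNumber is
a plain index, this EventRecord stores tracks by value in a std::vector, and
deactivate() also receives the record pointer so that activate() has something
to look the number up in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Track { double pt ; } ;   // hypothetical element type
typedef int ObjectNumber ;       // assumption: a plain index into the record

// Hypothetical record: stores tracks by value and returns their index.
class EventRecord
{
public:
  ObjectNumber insert(const Track& t)
    {
      _tracks.push_back(t) ;
      return static_cast<ObjectNumber>(_tracks.size()) - 1 ;
    }
  Track* find(ObjectNumber n)
    {
      if (n < 0 || static_cast<std::size_t>(n) >= _tracks.size()) return 0 ;
      return &_tracks[n] ;
    }
private:
  std::vector<Track> _tracks ;
} ;

// Either the transient pointer, the stored number, or both may be set.
class StorableTrackPointer
{
public:
  explicit StorableTrackPointer(Track* p_item)
    : _p_record(0), _p_item(p_item), _number(-1) {}

  // Transient -> storable: remember where the target was stored and drop
  // the raw pointer, leaving only plain data to write out.
  void deactivate(EventRecord* p_record, ObjectNumber number)
    {
      assert(p_record != 0) ;
      _p_record = p_record ;
      _number   = number ;
      _p_item   = 0 ;
    }

  // Stored -> transient: look the target up again by number.
  bool activate(void)
    {
      if (_p_record == 0) return false ;
      _p_item = _p_record->find(_number) ;
      return _p_item != 0 ;
    }

  bool is_active(void) const { return _p_item != 0 ; }
  Track* operator->() const { assert(_p_item != 0) ; return _p_item ; }

private:
  EventRecord* _p_record ;   // transient
  Track*       _p_item ;     // transient
  ObjectNumber _number ;     // the only member that would be written out
} ;
```

In this sketch a later insert() can reallocate the vector, so a re-activated
pointer is only safe while the record is no longer growing.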


Track aTrack(...) ; // Track derived from StorableObject
SmartRCPointer<Muon> p_muon = new Muon(&aTrack, ...) ;
// p_muon._p_track._p_item is set, _number is undefined
// Override operator->() to "inherit" Muon API

float pt = p_muon->pt() ;

// Store the track
ObjectId trackId = p_record->insert(aTrack) ;
// This insert() allocates a new track and copies aTrack's data members

// Set the p_muon._p_track._number
p_muon->deactivate(trackId) ;
  {
    _p_track.deactivate(trackId) ;
  }

ObjectId muonId = p_record->insert(p_muon) ;
// SmartRCPointer does *not* inherit from StorableObject, but it does inherit
// from a base class StorableObjectPointer. This avoids a template member
// function insert(), a method with much code (?), but generic operations.
// This insert() knows it just copies the smart pointer, not the muon
// The record now owns the memory at p_muon

RecordIterator muonIter = p_record->find_first("Muon") ;
SmartRCPointer<Muon> p_muon(muonIter) ;
// Override operator->() to "inherit" Muon API
// Leave p_track as a link (null pointer)
float pt = p_muon->pt() ;

p_muon->activate() ;
  {
    RecordIterator trackIter = p_record->find(p_muon->p_track().objectId()) ;
    _p_track.activate(trackIter) ;
  }


        The method activate(void) is simple enough, but deactivate() must
receive the ObjectIds of the stored objects, so its signature may vary...

virtual bool activate(void) = 0 ;
virtual bool deactivate(const ObjectId* p_objectId = 0,
                        const int       n_objectId = 0) = 0 ;

        It may be desirable to have an activate() and deactivate() in Muon for
each StorablePointer, to allow users to selectively re-activate a muon. Suppose,
for instance, I want to activate the track, but do not want to activate a
pointer to a MCTruth object (if there were such a data member in Muon). This
cannot be enforced by the type system, since signatures will vary.

Muon::activateTrack(void) ;
Muon::deactivateTrack(const ObjectId&) ;

        Another question is whether we will have to store the ObjectName with
each pointer, or use a trick to recover the name information via the template
argument or RTTI. Ideally we would store the ObjectName as a cross-check, but
with arbitrary length ObjectName == class name, this can consume quite a bit of
storage space.


// The EventRecord is a restricted list of StorableObjectPointers.
// SmartRCPointer<TYPE> is derived from StorableObjectPointer, which contains a
// StorableObject pointer.
SmartRCPointer<Muon> p_muon = new Muon(&aTrack, ...) ;
ObjectId muonId = p_record->insert(p_muon) ;

bool EventRecord::insert(const StorableObjectPointer& p_item)
  {
    _list.insert(p_item) ;
    return(true) ;
  }

        There is a conflict here. On the one hand, I want to store
StorableObjectPointers in the EventRecord, generic things that do not have the
API of the items to which they point. On the other hand, I want to have
SmartRCPointer<TYPE> handles which have the API of the thing they point to since
they overload operator->(). Unless TYPE derives from StorableObject, these two
classes may not be translatable to each other. In particular, how do I set the
StorableObject* _p_object in StorableObjectPointer from the TYPE* in
SmartRCPointer? I may have to use the constrained template parameter trick in
Barton & Nackman's "Scientific and Engineering C++" book. Or, I can use run-time
tests. If so constrained, I should rename these to GenericStorableObjectPointer
and SpecificStorableObjectPointer<TYPE>.

class GenericStorableObjectPointer: public SmartRefCntPointer
  { StorableObject* _p_generic_object ;
    constructor(const SpecificStorableObjectPointer<TYPE>& p_specific_object)
      { _p_generic_object = static_cast<StorableObject*>(p_specific_object) ;
      }
    operator *() ;
    operator ->() ;  // StorableObject API only... objectName(), objectNumber()
    bool is_valid(void) const ;
    // Smart RefCnt pointer machinery
  } ;

template <class TYPE>
class SpecificStorableObjectPointer: public SmartRefCntPointer
  { TYPE* _p_specific_object ;
    constructor(const GenericStorableObjectPointer& p_generic_object)
      { _p_specific_object = dynamic_cast<TYPE*>(p_generic_object) ;
      } // TYPE must be child of StorableObject for dynamic_cast to work
    operator *() ;
    operator ->() ;  // TYPE's API available
    bool is_valid(void) const ;
    // Smart RefCnt pointer machinery
  } ;
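
A compilable sketch of the two conversions, with the reference-counting
machinery left out. The shortened names GenericPointer and SpecificPointer and
the get() accessor are hypothetical. The specific-to-generic direction relies
on an implicit pointer upcast, so it is rejected at compile time when TYPE does
not derive from StorableObject; the generic-to-specific direction uses the
run-time test.

```cpp
#include <cassert>

// Hypothetical base class; it needs a virtual member for dynamic_cast.
class StorableObject
{
public:
  virtual ~StorableObject() {}
} ;

class Muon : public StorableObject
{
public:
  explicit Muon(double pt) : _pt(pt) {}
  double pt(void) const { return _pt ; }
private:
  double _pt ;
} ;

class Track : public StorableObject {} ;

template <class TYPE> class SpecificPointer ;

// Generic handle: exposes only the StorableObject API.
class GenericPointer
{
public:
  explicit GenericPointer(StorableObject* p_object) : _p_object(p_object) {}

  // Compiles only if TYPE* converts to StorableObject*.
  template <class TYPE>
  GenericPointer(const SpecificPointer<TYPE>& p_specific)
    : _p_object(p_specific.get()) {}

  StorableObject* get(void) const { return _p_object ; }
  bool is_valid(void) const { return _p_object != 0 ; }
private:
  StorableObject* _p_object ;
} ;

// Specific handle: exposes TYPE's API through operator->().
template <class TYPE>
class SpecificPointer
{
public:
  explicit SpecificPointer(TYPE* p_object) : _p_object(p_object) {}

  // Run-time test: null if the generic handle points to some other class.
  explicit SpecificPointer(const GenericPointer& p_generic)
    : _p_object(dynamic_cast<TYPE*>(p_generic.get())) {}

  TYPE* get(void) const { return _p_object ; }
  TYPE* operator->() const { assert(_p_object != 0) ; return _p_object ; }
  bool is_valid(void) const { return _p_object != 0 ; }
private:
  TYPE* _p_object ;
} ;
```

A failed down-conversion leaves the specific handle invalid instead of
throwing, so callers are expected to check is_valid() before using
operator->().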

    Finally, suppose I find aMuon and activate its link to aTrack. In order to
do this, without creating a transient aMuon, I would have to modify the stored
aMuon object, violating the strict policy that objects are read-only once
stored. We should consider this policy to refer to the logical state of the
objects in the EventRecord, not the physical state. PointerLinks will be
modified in stored objects, but the modifications do not change the logical
state of the links, only the specific data members used to implement the links.
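
In C++ this logical/physical distinction maps onto the "mutable" keyword. A
minimal sketch, assuming the link's logical state is an object number and the
raw pointer is only a cache; this Muon and its activateTrack() signature are
hypothetical.

```cpp
#include <cassert>

struct Track { double pt ; } ;   // hypothetical element type

// Hypothetical stored Muon. Its logical state is "linked to track number N";
// the cached raw pointer is only an implementation detail of that link.
class Muon
{
public:
  explicit Muon(int track_number)
    : _track_number(track_number), _p_track(0) {}

  int trackNumber(void) const { return _track_number ; }

  // Callable on a const (stored, read-only) Muon: only the cache changes.
  void activateTrack(const Track* p_track) const { _p_track = p_track ; }

  bool trackIsActive(void) const { return _p_track != 0 ; }

  const Track& track(void) const
    { assert(_p_track != 0) ; return *_p_track ; }

private:
  int                  _track_number ;   // logical state, never modified
  mutable const Track* _p_track ;        // physical state, may be refreshed
} ;
```

The record can then hand out stored objects as const, and the compiler enforces
that only the members marked mutable change after insertion.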

-------------------------------------------------------------------------------

III) Root Branches and I/O

        ROOT Branches are only manipulated at event data I/O. Only the input and
output modules manipulate branches using input and output lists as guidance.
Users may manipulate the I/O lists, but do not manipulate branches directly. So,
we have only to amend the input and output filtering policy to account for
Branch organization in our EDM.

Input filtering: A BranchInputList is consulted by the input module to
determine which ROOT branches are to be read in. It is possible to specify "all
branches" in the input list. The specified branches are read in.  Then all
objects and classes of objects referred to in the ``drop upon input'' ObjectList
are removed from the EventRecord by AC++. The ``actual input'' ObjectList is
then created which lists all the branches and classes which were input. The
association of classes to branches is also stored in the actual input list.

Output filtering: Only objects referred to in the ``object output'' ObjectList
are written to a particular stream. Each output stream has its own such list.
Each list is initialized with the ``actual input'' ObjectList which includes the
branch association for all entries. References to branches, classes, or object
instances in a ``drop upon output'' list are then removed from the ``object
output'' list.  References to objects or classes of objects in a ``keep upon
output'' list are then added to the ``object output'' list. The keep list
includes the branch association for each entry. Only those branches, classes, or
object instances that remain in that list are written to the output stream.
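
The order of operations for one output stream can be sketched with sets of
names. This is an illustration under assumptions: an ObjectList is reduced to a
std::set of strings, the branch association carried by each entry is omitted,
and make_output_list() is a hypothetical helper.

```cpp
#include <cassert>
#include <set>
#include <string>

typedef std::set<std::string> ObjectList ;   // assumption: entries are names

// Build the "object output" list for one stream: start from the actual input
// list, remove the dropped entries, then add the kept entries.
inline ObjectList make_output_list(const ObjectList& actual_input,
                                   const ObjectList& drop_upon_output,
                                   const ObjectList& keep_upon_output)
{
  ObjectList output(actual_input) ;
  for (ObjectList::const_iterator it = drop_upon_output.begin() ;
       it != drop_upon_output.end() ; ++it)
    output.erase(*it) ;
  output.insert(keep_upon_output.begin(), keep_upon_output.end()) ;
  return output ;
}
```

Because the keep list is applied after the drop list, an entry named in both
lists is written out.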