Event Data Model Working Group (Informal) Meeting
06 September 2000
Last Edited: 19 Sept 2000
Rob Kennedy

In attendance: Rob K, Marge S, Jim K, Marc P, Liz S-K, Rick S.

I) Overview of Current EDM Issues

A) I produced an EDM-oriented response to the HLO/Physics Object Review document, which Marc Paterno has posted. A number of EDM issues brought up in the HLO/Physics Objects Review have been dealt with as recommended, or will be as time permits. This includes the typedefs requested in Handle and StorableContainers (also now in EventRecord), improvements to Link to permit clients to be decoupled from objects to which they contain a Link, and documentation. An EDM FAQ (in progress) answers most of the basic questions folks have asked recently. The FAQ topics will all be filled in soon, the content ported to LaTeX, the HTML reproduced with latex2html, and examples linked in which are part of a (not-yet-existent) EDM validation suite. Once this is stable, I will continue to work on the Intro document, which is still incomplete.

Discussion: Marc P. recommended looking into the use of "tth" as an alternative to latex2html. He uses it for the Special Projects web pages.

B) const-safety: Numerous instances in Offline software have been found where code is capable of changing values of objects in the event. Most of these (not all) have been traced to an interface and design flaw of the EDM's Link class. A constructor is defined which takes a ConstHandle as an argument, but one can then access a non-const pointer by way of Link::operator->(). I have recently performed numerous experiments modifying the Link, Handle, and EventRecord classes in order to uncover where these flaws are in Offline, and what can be done in the EDM design to discourage them in the future.

Discussion: This has led to a proposal to redesign some of the EDM internals. See below for a brief introduction.
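The loophole described in (B) can be illustrated with a minimal sketch. None of this is the real EDM code: Track, LeakyLink, and modifyThroughLeakyLink are hypothetical names, and the ConstHandle and ConstLink shown here are simplified stand-ins, assumed only to wrap a raw pointer. The sketch shows how a link built from a const handle can still hand out a writable pointer, and how returning a const pointer closes that hole.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical event object; stands in for any object stored in the event.
struct Track { int nHits = 0; };

// Simplified stand-in for a read-only handle (assumption: wraps a raw pointer).
template <class T>
class ConstHandle {
public:
    explicit ConstHandle(const T* p) : ptr_(p) {}
    const T* operator->() const { return ptr_; }
    const T* get() const { return ptr_; }
private:
    const T* ptr_;
};

// The flawed pattern: constructed from a ConstHandle, yet operator->()
// returns a non-const pointer, so const-ness is silently discarded.
template <class T>
class LeakyLink {
public:
    explicit LeakyLink(const ConstHandle<T>& h) : ptr_(const_cast<T*>(h.get())) {}
    T* operator->() const { return ptr_; }
private:
    T* ptr_;
};

// A const-safe link: it never exposes a non-const pointer.
template <class T>
class ConstLink {
public:
    explicit ConstLink(const ConstHandle<T>& h) : ptr_(h.get()) {}
    const T* operator->() const { return ptr_; }
private:
    const T* ptr_;
};

// Returns the hit count after a client writes through the leaky link.
// The Track itself is non-const here so that the write is well defined.
inline int modifyThroughLeakyLink() {
    Track t;
    ConstHandle<Track> h(&t);   // the client was only given read access
    LeakyLink<Track> link(h);
    link->nHits = 42;           // compiles: the object is modified anyway
    return t.nHits;
}

// With ConstLink the same write would not compile, because operator->()
// yields a pointer to const.
static_assert(
    std::is_same<decltype(std::declval<const ConstLink<Track>&>().operator->()),
                 const Track*>::value,
    "ConstLink must only expose const access");
```

With the const-safe variant, an attempted `link->nHits = 42;` is rejected by the compiler, which is the kind of discouragement the experiments above were looking for.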
C) A number of people have suggested combining the Handle, EventIter, and perhaps Link classes to simplify the manipulations required to get access to objects in an event. The combination of Handle and EventIter is not well regarded in design discussions, where C++ experts find the two concepts too different to be one entity. I have looked into these combinations to see if the user code really is simplified, and what impact they would have on the EDM and Offline code.

Discussion: In the new proposal, ConstLink is a sub-class of ConstHandle, since it adds the ability to store/restore its state. This keeps Handle and ConstHandle symmetric (there is no Link class in the proposal since it requires an undesirable const_cast to restore its state).

D) How a RefVector's objects are handled is not considered desirable. Currently, they are not placed in the event until prewrite(). So algorithms which expect to search for these elements will work if the RefVector is read in, but may fail to find the objects if the RefVector was just created. The ideal solution IMHO would be for EventRecord::append() to trigger the RefVector to append its contents to the event as soon as it is appended itself. I have not found a desirable code or design approach which accomplishes this. It has also been suggested that all containers should hide their contents, storing them in a separate compartment in the event. This has a problem with algorithms that expect to work on CdfTracks, but which will work or not depending on how the CdfTracks are put into the event: individually or as part of the CdfTrackCollection.

Discussion: Marc P. suggested trying a template append coupled with traits, as are used in the C++ Stdlib, in order to differentiate an append of a non-composite object from an append of a composite object (which would then append its components to the event immediately).
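A minimal sketch of the traits-based append suggested above, written with modern C++ tag dispatch. This is an illustration under stated assumptions, not the EDM implementation: the IsComposite trait, the appendImpl helpers, the public tracks member, and the pointer-based storage inside this toy EventRecord are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for the EDM base class and two event objects.
struct StorableObject { virtual ~StorableObject() = default; };
struct CdfTrack : StorableObject {};
struct CdfTrackCollection : StorableObject {
    std::vector<CdfTrack> tracks;   // assumed layout, for illustration only
};

// Trait: a type is non-composite unless it says otherwise.
template <class T> struct IsComposite : std::false_type {};
template <> struct IsComposite<CdfTrackCollection> : std::true_type {};

// Toy event record; it stores non-owning pointers to keep the sketch short.
class EventRecord {
public:
    // One public append; the trait selects the implementation at compile time.
    template <class T>
    void append(const T& obj) { appendImpl(obj, IsComposite<T>{}); }

    std::size_t size() const { return entries_.size(); }

private:
    // Non-composite object: store the object itself.
    template <class T>
    void appendImpl(const T& obj, std::false_type) { entries_.push_back(&obj); }

    // Composite object: store it, then append its components immediately,
    // so searches succeed without waiting for prewrite().
    template <class T>
    void appendImpl(const T& obj, std::true_type) {
        entries_.push_back(&obj);
        for (const auto& component : obj.tracks) append(component);
    }

    std::vector<const StorableObject*> entries_;
};
```

Appending one standalone CdfTrack and then a collection of three tracks leaves five entries in this toy record: the track, the collection, and the collection's three components.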
There was no discussion of where the components would go, since that matter is independent of the immediate-append issue.

E) EventRecord::findOne/All<>: A proposal for this is in the Physics Objects Review document. I have not yet developed a more detailed proposal.

F) Polymorphism: Handles, as they are now, are difficult to use with polymorphic algorithms. Fixes for this are included in the new Handle classes used in the const-safety tests. If one writes an algorithm that can work with "any" CdfJet class, then one should be able to write that algorithm based on ConstHandles to the CdfJet base class (whatever it is actually called).

G) postread/prewrite elimination. We found during I/O benchmarking some months ago that a substantial fraction of the CPU consumed in an I/O-intensive process was consumed in object transformations occurring in the postread/prewrite methods. Placed there, the transformations could not be controlled by the user, and this seriously threatened our ability to process data at the expected rates at several points in the Production Farms. We decided to pursue a long-term goal of eliminating the postread/prewrite calls altogether. In order to accomplish this:
   1) Links must be restorable without a postread() call
   2) RefVectors must store their contents without a prewrite() call
   3) All other Offline code in postread/prewrite must be migrated to other class methods invoked by user-controllable AC++ modules

Discussion: Marge S. stated that she would strongly prefer any API changes to occur as soon as possible, even if the implementation does not change to match right away. In other words, as soon as we decide on how to achieve (G1), we should force the change in the ConstLink API right away, even if we still call postread() to do the actual restore of links. This makes sense as a matter of migration to no postread/prewrite calls anyway, since not all objects may be ready for this change even though the new EDM APIs are ready.

H) Version numbers in StreamableObjects.
Lightweight StreamableObjects cannot afford the data space to contain version numbers. Examples are Id, CdfHit, and so on. Heavyweight StreamableObjects clearly can afford a version number, which in ROOT code is the two-byte data type Version_t. However, all of the support that ROOT provides for version numbers, in ClassDef(), ClassImp(), and the Streamer utilities, has to be recreated to simplify the effort and enforce a style that folks will be familiar with from working with StorableObjects. I do not think there is any technical challenge here, just a lack of staff-time to work on StreamableObject and the macros needed. This issue takes on more importance now that the HLO/Physics Object Review has tentatively recommended that all Physics Objects be Streamable. (Later discussions seemed to conclude that the fundamental point was that they should all at least have the same level of indirection when used in a collection, not necessarily all be StreamableObjects.)

Discussion: The issue of concern, though, is large collections of lightweight objects such as a CdfHitCollection == ValueVector. There is a coupling problem if CdfHitCollection expects ValueVector to know about its version number. Rob K. suggested that perhaps we can have ValueVector treat version numbers after all, leaving it up to the CdfHitCollection to set that version number, in order to reduce the coupling issue to CdfHitCollection-CdfHit. With a special (and default) value of '0', the ValueVector will stream itself out as it does now. With a greater value, it will stream out the version number and transmit it to the Streamer of the contained element, CdfHit. Liz S-K will look into this again.

I) Are associations amongst objects copied absolutely or relatively? I think this topic died for lack of an alternative to the current design/policy. Links are copied absolutely. I have proposed a means of copying associations relatively, when that is desired, which can treat complicated association structures.
Developers who want the relative behavior while compressing or trimming events outside of the ROOT gzip-a-branch approach have to treat this issue as they think best.

Discussion: Part of the criticism of links being copied absolutely, as we discussed, is that this is a change of behavior relative to Run I code. Bank numbers played an equivalent role to links in Run I, and could be re-used if so desired. This had the benefit of allowing summary objects to easily maintain relative associations, but the negative consequence that absolute associations were easily and often broken. We should resolve this soon, as summary objects and collections are being defined now.

II) An EDM Design Change Proposal

The main thrust of this proposal (still being refined and under testing) is to close existing const-safety loopholes in links, enable polymorphic treatment of handles, simplify common user interactions with the EDM, and reduce the amount of redundant code in the EDM. I have tried a number of approaches in large test releases to determine what is feasible with an "acceptable" level of disruption to Offline code (a matter for debate, of course).

1) Eliminate the GenericConstHandle class. A typedef to ConstHandle is provided for backward compatibility. This has proven to require just a few extra #include "Edm/GenericConstHandle.hh" in code outside the EDM.

2) Re-implement the StorableContainers to store ConstLinks. Use a collection_type typedef (as recommended by the HLO Review) to ensure that client code, once it uses the typedef, will no longer have to change if StorableContainer's underlying container changes in small ways.

3) Eliminate the Link<> class in favor of the ConstLink<> class. All user code referring to or containing a Link<> will have to be changed to use a ConstLink<>. StorableContainers are re-implemented to use ConstLink<>, as mentioned.
4) Revise the interfaces of Handle, ConstHandle, and ConstLink to be const-safe and to permit polymorphism with method template constructors and assignment.

5) Re-implement Handle and ConstHandle to be reference-counting smart pointers using a new reference count hidden in StorableObject. This will mean that code which now extracts const pointers from handles, which seems safe to do though not recommended, will defeat the reference-counting mechanism.

6) As a result, re-organize the Handles as:

     AppendableHandle      CountingHandle      AppendableConstHandle
            ^                    ^                      ^
           /-\                  /-\                    /-\
            |                    |                      |
            |          |---------+---------|            |
            |          |                   |            |
            +------ Handle            ConstHandle ------+
                                           ^
                                          /-\
                                           |
         (no more Link<> class)        ConstLink

   The reference-counting protocol is implemented by:

     StorableObject                  CountingHandle
     --------------------------      --------------------------
     1) attach_reference()           1) attach()
     2) detach_reference()           2) detach() (can call destroy())
     3) reference_count()            3) pointer to TYPE derived
     4) size_t _reference_count         from StorableObject

   The initialization/assignment/conversion relationships are now:

                          Handle
                            ^
                            |
                            V
              TYPE*  ->   Handle
                          |   |      ConstEventIter (== ConstHandle*)
                          |   |         |   |
                          V   V         v   v
       const TYPE*  ->  ConstHandle <-> ConstLink  <-  const TYPE*
                            ^               ^
                            |               |
                            V               V
                       ConstHandle      ConstLink

   CountingHandle at present is absorbed into Handle<> and ConstHandle<>, but I do not think it has to be. CountedBody (a potential base class of StorableObject) must be absorbed into StorableObject, I think, to ensure that StorableObject is still the base class for all things that can go directly into the EventRecord. The AppendableHandle classes are there to implement a polymorphic EventRecord::append(), which may be replaced by a template append in the future.

7) In order to untangle physical dependencies and coupling, remove ConstIterator from the EventRecord class to become the ConstEventIter class. Provide a typedef from EventRecord::ConstIterator to the new ConstEventIter.
8) As a result of the above, EventRecord copies are simply copies of ConstHandles. No more StorableObject::clone() methods are required or will be used by the EDM. The EventRecord destructor and clear() methods are much simpler, as they only need to deal with handles now, not the direct destruction of objects. Users also no longer need to destroy an object if it is not put into the event... just let the handle they have go out of scope and the reference counting will take care of the rest.

9) On-demand Link restoration: Support for on-demand restoration begins with EventRecord::Streamer() storing the event's location in a PrivateChunnelA (a private data tunnel, so to speak, which can only be accessed by two specific classes, EventRecord and ConstLink). Inside ConstLink<>'s Streamer, that location is cached for use in restore(). Whenever one dereferences a link or calls restore() directly, that link is restored if it has not been already.

ITEMS ALSO BEING CONSIDERED... but not yet thought out

10) Consider making ConstEventIter templated (as are ConstHandle and ConstLink), and consider its usefulness with findOne<> and findAll<>.

11) Consider using template append methods for Handle<> and ConstHandle<>, eliminating the need for the handle abstract base classes. To avoid code bloat, these append<> methods might use a common implementation method. The new append(ConstHandle<>&) is harmless and may help with storing StorableContainers (which are implemented with ConstLinks). Define a CompositeObject/NonCompositeObject trait to allow append() to store the contents of StorableContainers.

Discussion: This may seem like a large change to the EDM, but much of it is change to the internal design and implementation of the EDM. The most visible changes to user code are Link->ConstLink, including the StorableContainers implementation, and the newly re-emphasized recommendation against extracting bare pointers to objects from handles and links.
The latter has been done in some cases in order to exploit polymorphism, which should now be possible with the new handles and links.
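To make items 4, 5, and 8 of the proposal concrete, here is a minimal sketch of a reference-counting ConstHandle with a method template constructor. It is an illustration under stated assumptions, not the proposed EDM code: the method names attach_reference(), detach_reference(), and reference_count() come from the protocol listed in item 6, but their signatures, the boolean return of detach_reference(), the mutable counter, and the ConeJet class with its liveConeJets instrumentation are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for StorableObject carrying the hidden reference count.
class StorableObject {
public:
    StorableObject() : refCount_(0) {}
    StorableObject(const StorableObject&) : refCount_(0) {}   // counts are never copied
    StorableObject& operator=(const StorableObject&) { return *this; }
    virtual ~StorableObject() {}
    void attach_reference() const { ++refCount_; }
    // Assumption: returns true when the last reference has been released.
    bool detach_reference() const { return --refCount_ == 0; }
    std::size_t reference_count() const { return refCount_; }
private:
    mutable std::size_t refCount_;   // mutable so const objects can be counted
};

template <class T>
class ConstHandle {
public:
    ConstHandle() : ptr_(nullptr) {}
    explicit ConstHandle(const T* p) : ptr_(p) { attach(); }
    ConstHandle(const ConstHandle& o) : ptr_(o.ptr_) { attach(); }
    // Method template constructor: ConstHandle<Derived> -> ConstHandle<Base>.
    template <class U>
    ConstHandle(const ConstHandle<U>& o) : ptr_(o.get()) { attach(); }
    // Copy-and-swap assignment; the by-value parameter releases the old object.
    ConstHandle& operator=(ConstHandle o) {
        const T* tmp = ptr_; ptr_ = o.ptr_; o.ptr_ = tmp;
        return *this;
    }
    ~ConstHandle() { detach(); }
    const T* operator->() const { return ptr_; }
    const T* get() const { return ptr_; }
private:
    void attach() { if (ptr_) ptr_->attach_reference(); }
    void detach() {
        if (ptr_ && ptr_->detach_reference()) delete ptr_;   // last handle destroys
        ptr_ = nullptr;
    }
    const T* ptr_;
};

int liveConeJets = 0;   // instrumentation for this example only

struct CdfJet : StorableObject { virtual double et() const = 0; };
struct ConeJet : CdfJet {
    ConeJet() { ++liveConeJets; }
    ~ConeJet() override { --liveConeJets; }
    double et() const override { return 25.0; }
};

// An algorithm written once against handles to the base class.
inline double jetEt(const ConstHandle<CdfJet>& jet) { return jet->et(); }
```

A ConstHandle<ConeJet> can be passed directly to jetEt(), and the object is destroyed when the last handle goes out of scope, so user code needs neither a bare pointer nor an explicit delete. Extracting get() and keeping the raw pointer past the lifetime of the handles would defeat the counting, which is the reason for the recommendation above.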