                 -----------------------------
                 -----------------------------
                  The ZOOM ErrorLogger Package
                 -----------------------------
                 -----------------------------

                Design notes   version 3   7/22/98

The following syntaxes and semantics are based on:

 * The original Run1 Error Loggers of D0 (supplied by Jim Linnemann)
   and CDF (supplied by Liz Sexton).
 * The semantics agreements that came from the meeting of May 21, with
   Jim Linnemann (D0) and Liz Sexton Kennedy (CDF).
 * Meeting of May 27 with the D0 framework people (Jim Linnemann, Gordon
   Watts and Jim K having input) and including a note from Jim Linnemann.
 * The ZOOM meeting the afternoon of May 27, including Liz Sexton Kennedy
   and Gordon Watts.
 * An extensive list of comments subsequent to that meeting, from the
   aforementioned people, plus Wyatt Merritt, Q. Li, Al Lee, and others.
 * A pair of extensive discussions with Jim Kowalkowsi on 6/23 and 6/25,
   addressing framework issues and a few technical strategies. Jim had by
   that point been designated as speaking for D0, and is also
   knowledgeable about the CDF framework.
 * The ZOOM meeting of 7/1/98 at which both framework writers made "last"
   perturbations.

This document has been modified to reflect all relevant input. Unless
unexpected implementation technicalities force a change, these design
notes define the intended ErrorLogger.

At this point, as of 7/27, a prototype of the package will have been
implemented. This will allow frameworkers and users to code to the full
spectrum and syntax of features, including controlling the ELoutput,
ELstatistics, and ELsaveBuffer destinations. Most of these features,
including anything having to do with limit throttling and all features
of ELstatistics and ELsaveBuffer, are not yet implemented. ELoutput does
work, however, and will give sensible if limited logging behavior.

The intent is that neither the physicist NOR THE FRAMEWORKER should need
to change any code as the features get implemented.
Moreover, none of the headers these users would include are expected to
change further, so user code should not be forced to recompile as more
ErrorLogger features are enabled.

Various portions of these design notes are suitable for extraction as
users' guides. The users' guide for physicists would consist of the "How
the Physicist Logs Errors" sections. The guide for frameworkers would be
everything up to the "Customization Hooks" chapter.

Corrections and changes appearing since version 2:

 * Significant change to what was previously the Global ErrorLog
   Instantiation section. This now addresses all issues having to do
   with utilities outside the set of modules issuing to the log.
 * editErrorObj(ErrorObj &) placed in the context supplier interface, as
   a way to make arbitrary changes in the message info.
 * In "How to instantiate an ELerrorObj," it should have been (and is
   changed to) ErrorObj.
 ~ Insertion of canonical example code.
 * setPackage() equivalent to setModule().
 ~ Section on intended order of implementation of various stubs.

   (~ means the change will be put here but has not yet been.)

Corrections and changes appearing since version 1:

 * The "To Get Started Immediately" section is relegated to a rejected
   alternative. Instead, the "Early Implementation" section describes
   what we will do quickly, such that the full interface is in place and
   **neither** the framework nor the physicist code need change as the
   full implementation is fleshed out.
 * We describe a macro ERRLOG(x,y) which implements the Babar idea of
   having __FILE__ output with the logged information.
 * We now provide a section describing the design patterns used. This
   will be fleshed out in the course of the early implementation effort.
 * The ELcontext structure goes away in favor of giving the
   ELcontextSupplier three functions: context(), summaryContext(),
   fullContext().
 * When doing sample implementations, I noticed collisions between the
   variable name log and the math.h function of the same name. Though
   you can survive this by judicious use of namespaces, it will save some
   headaches to dodge the problem altogether. Therefore, I now assume the
   Module class calls its ErrorLog instance "errlog" and not "log."
 * There is no good way, in our Module base class scenario, for the
   module name information in an ErrorLog object to be set in its
   constructor. The way we can get this information in is by the
   setModule() method. This is reflected in the examples.
 * We point out in section (1) that nothing extra need be included by
   the physicist in order to use errlog.
 * We provide for output of a statistics summary upon deletion of the
   ELstatistics object attached to the logger.
 * Obviated stats->summary() with no argument as going to all
   destinations; this was too convoluted.
 * ELdestinationI contains public bools to indicate whether this
   instance: will deal with immediate items as they come in or with the
   entire message when complete (or both); accepts or ignores summary
   information; and needs (standard) formatting for its messages or will
   do custom formatting. This relates to these two earlier changes:
     * ELsaveBuffer Local Semantics is modified to clarify that though
       operator<< is ignored, the contents of the entire message do get
       saved.
     * ELdestinationI must be capable of accepting (though it may ignore)
       msgStart() and msgEnd().
 * ELdestinationI has setPreamble and setNewline methods to customize
   those without having to build a new class.
 * ELdestinationI must have a newCopy method. There is a note about the
   semantics of ownership of files and such.
 * In note 7, there is now a simplified way to set up the module name.
   The old version is kept in, in case this has some flaw.
 * We define, in note 9, what happens if a user supplies a fileName for
   ELoutput and we can't open that file.
 * We define in (25) how an ELdestinationI can impose a firm limit on the
   sizes of its limits and counts tables, and what happens if this is
   exceeded.
 * We describe how the framework can log directly to one destination.
 * A correction: setSubroutine is in ErrorLog, not ErrorLogger::.
 * There is a brief discussion about custom severity levels.


Contents
--------

Fundamentals
      Purpose and Scope
      Design Considerations
      Classes and Concepts

ErrorLogger Semantics and Syntax

   How the Physicist Logs Errors
       1) How the Physicist has access to the logger
       2) How a physicist issues a log message
       3) How a physicist can issue a log message in multiple statements
       4) How an instance of ErrorObj may be formed
       5) What will the error message output look like
       6) What summary information will be available
       7) How the physicist can indicate what is being done
       8) A list of the available severity levels

   Frameworker's Basics of Setting up and Using Error Loggers
       9) How the frameworker sets up for logging
      10) How the Module (or Package) Sets Up errlog
      11) Global ErrorLog Instantiation
      12) ELcontextSupplier: How the run and event numbers are indicated
      13) Basics of Thresholds and Limits
      14) Controlling abort behavior
      15) Obtaining error statistics summaries
      16) Clearing statistics and/or limits

   Further Options for the Frameworker
      17) Checking on errors
      18) Error counts
      19) Control over whether traces are logged
      20) Timespan in conjunction with limits
      21) ELsaveBuffer Local Semantics
      22) Accessing Saved Error Messages
      23) How ELstatistics information is kept
      24) Formatting control available in ELoutput

   Customization Hooks and Issues Relating to Collective Logging
      25) What an ELdestinationI Must Do
      26) ELsaveBuffer Semantics and Layout
      27) Avoiding lists, strings, streams, and templates
      28) Table Limitations
      29) Direct logging to one destination
      30) Custom Severity Levels

Technical Design
      Design Patterns and C++ Techniques Used
      To Get Started Immediately
      Rejected Alternatives
      Minutia and Notes
      Known Concerns


******************
*  Fundamentals  *
******************

Purpose and Scope
-----------------

The purpose of this package is:

 1) to let user code issue ("log") error and warning messages;
 2) to provide a uniform syntax and set of concepts for logging, across
    the different experiments and different groups in an experiment;
 3) to provide a uniform and sensible logger behavior, in terms of
    information output and formatting;
 4) to provide means of controlling multiple log destinations, output
    limitations, and other behavior;
 5) to allow (to the greatest reasonable extent) for customizations
    tailored to various environmental and system considerations.

Equally important is the time scale for achieving this:

 6) Items 1, 2 and a basic semblance of behavior 3 should be in place
    within a few weeks, so that physicists can begin inserting error
    logging into their code, without concern that those lines will have
    to change later.
 7) At that point of release, all header files defining the interface,
    which would be included either by physicist code, module base
    classes, or the framework, should be frozen. This avoids the burden
    of extra compilations of user code, and guarantees stability of the
    interface.
 8) The full behavior 3 and 4 should be in place within a month of that
    point. This has no particular justification in terms of coding
    needs, but provides a criterion by which to limit the complexity of
    the feature set.
 9) Tailoring for special environments, and other non-essential support
    work, may proceed after the package has been delivered. This support
    would include integrating traceback capabilities into the log
    information, and creation of ZMexception subclasses that log via
    this ErrorLogger mechanism.

                                                 -- Purpose and Scope --

In terms of concepts explained below, a "physicist" inserts logging
statements into the code. These work because a "frameworker" has set up
the logger, specified one or more log destinations, and established
various behavior controls.
A standard form of behavior is performed when errors occur or summaries
are requested.

Ease and cost of use is critical:

10) Overall, if only the standard features and basic controls are used,
    the package ought to be very easy for the physicist to use, clear and
    simple for the frameworker to set up, and not a heavy burden in terms
    of code to compile or executable code pulled in.

A "customizer" may, by introducing derived classes, change that
behavior. For instance, the formatting of the error text may be modified
beyond the ways the framework controls permit; or the customization may
avoid C++ features and system support assumptions that may not be
desired in a Level 2 context.

Having said what the package intends to do, we set some boundaries by
listing issues this package does not intend to address:

 A) Beyond providing clear ways to derive classes for customization, the
    issue of implementing these customizations is not addressed.
 B) This package does not address issues of "collective logging," that
    is, coordination by some central entity of error log destinations
    stemming from multiple processes. The extent to which this issue is
    respected is:
      i)  The provided "ELsaveBuffer" destination is meant to be a way to
          get the error information into memory in a controlled format,
          and as such may help with collective logging;
      ii) Since there will be a clear way to provide customized
          destination classes, behavior which will coordinate with
          collective logging can be smoothly integrated with the rest of
          the ErrorLogger package.

Design Considerations
---------------------

In the text of this section, whenever a class which is part of this
package is mentioned for the first time, we indicate this by surrounding
its name with double hyphens, as in --ELadministrator--. All classes are
briefly described in the next section ("Classes and Concepts") and
detailed later in this document.

Let us define a "physicist" as anyone coding "on top of" some framework
level, whose code may want to issue log messages.
And define a "frameworker" as the person providing the means for the
physicists to do things (like error logging) WITHOUT extraneous
knowledge of the mechanism; the frameworker, in a sense, controls the
flow of the job.

The structure of the framework we have in mind is that there is a
collection of classes, which for the purpose of this document we shall
refer to as "modules" (D0 calls these packages). Each module is a class
derived from a standard Module class; and the overall framework has some
way of registering all the modules and seeing that some key method of
each one is invoked when appropriate. Anything in the framework above the
level of modules, or in the Module base class itself, we will say is in
the realm of the frameworker; anything in methods of the derived
specific modules is said to be physicist code.

The fundamental idea is that the framework instantiates an
--ELadministrator-- object, and through its methods attaches various
forms of sinks derived from --ELdestinationI--. Then the base Module
class will have an instance of --ErrorLog-- as a protected (or public)
variable; for illustrations in this document we assume that it is named
"errlog." Since this is a variable at module scope, all methods of the
specific module class can use errlog, but methods of different modules
will be using different ErrorLog instances; that is, ErrorLog can own
information about the module.

The variable "errlog" makes the logger available to the user, who can
issue errors in a syntax like

  errlog (ELerror, "Too much energy") << "E = " << totalEnergy << endmsg;

The frameworker can control aspects of the behavior of the logger and
the individual destinations, such as limits on how many times a given
message is to be output.

Those fundamentals, combined with the details of what aspects of
behavior can be controlled, would dictate the (simplest) design of the
package, were it not for the requirement that customization be
accommodated.
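The division of labor between the logger and its sinks can be pictured with a minimal sketch of the destination side, assuming a single pure virtual log() method and a logger that simply loops over its attached sinks. All names here (SketchDestination, SketchOutput, SketchSaveBuffer, dispatch) are hypothetical; the real ELdestinationI interface is richer (msgStart/msgEnd, summaries, thresholds and limits).

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for ELdestinationI: the base class fixes WHAT a
// destination must do; derived classes decide HOW.
class SketchDestination {
public:
  virtual ~SketchDestination() {}
  // Must be implemented: accept one complete, formatted message.
  virtual void log(const std::string & msg) = 0;
  // May be "ignored": by default, summary requests do nothing.
  virtual void summary(std::ostream &) {}
};

// In the spirit of ELoutput: write each message to an ostream.
class SketchOutput : public SketchDestination {
public:
  explicit SketchOutput(std::ostream & os) : os_(os) {}
  void log(const std::string & msg) override { os_ << msg << '\n'; }
private:
  std::ostream & os_;
};

// In the spirit of ELsaveBuffer: keep up to `capacity` messages in memory.
// Dropping the overflow is an assumption made for this sketch only.
class SketchSaveBuffer : public SketchDestination {
public:
  explicit SketchSaveBuffer(std::size_t capacity) : capacity_(capacity) {}
  void log(const std::string & msg) override {
    if (saved_.size() < capacity_) saved_.push_back(msg);
  }
  const std::vector<std::string> & saved() const { return saved_; }
private:
  std::size_t capacity_;
  std::vector<std::string> saved_;
};

// The logger sees only the base class, so a new kind of sink (say, one
// feeding a central collector) needs no change here.
void dispatch(const std::vector<SketchDestination *> & dests,
              const std::string & msg) {
  for (SketchDestination * d : dests) d->log(msg);
}
```

Because dispatch() depends only on the base class, a customizer could add a sink for centralized log collection as one more derived class without touching the logger itself.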
Three possible customizations we have in mind are:

 1) It should be possible to link in error logging without requiring
    true streams and STL streams, and without "post-initialization" use
    of memory allocation on the heap. (But this should not stop the
    normal form of the package from using these useful concepts!)
 2) It should be possible to derive from the ELdestinationI class in
    such a way that things like centralized log collection can be
    implemented.
 3) To the farthest extent practical (BUT NO FURTHER), the design should
    allow for customization of such behaviors as message formatting,
    save buffering, and so forth.

It turns out that the same design philosophy helps with all of these. In
areas where flexibility is wanted, we define classes that encapsulate
the use and behavior of each concept, and derive from these classes
that implement our intended behaviors. For example, ELdestinationI
defines what we mean by a destination object, while --ELoutput-- and
--ELsaveBuffer-- are two specific classes derived from that base.

The encapsulation classes will carefully specify:

 * Which methods must be implemented with behavior agreeing with our
   expectations;
 * Which methods must be present but may be "ignored" when invoked, or
   may safely be implemented with very different behavior than what we
   had in mind;
 * Guarantees concerning the ways the package will use the classes.

Also, in doing our implementations, we will try to isolate behaviors
that we anticipate a customizer may want to modify, such as the use of
std::strings, from the bulk of the code, which the customizer can likely
make use of.

Classes and Concepts
--------------------

Each of these is discussed in the Syntax and Semantics discussion, but
here I wish to enumerate them to orient the reader:

ELadministrator    Object providing methods for overall control of the
                   error logger. This class uses the "Singleton" pattern.
                   The framework should obtain the one ELadministrator
                   via instance(), and through it attach various
                   ELdestinationI sinks for log messages to flow to.

ErrorLog           Encapsulation of the fundamental behavior supporting
                   issuing of messages.

  Pointed to by ErrorLog:

  ELerrorLog       Fundamental object supporting issuing of messages to
                   the error logger. The ErrorLog class provided will use
                   templating to allow the physicist to use operator<<
                   with arbitrary classes; a different derived class
                   might avoid templates and be more restrictive.

ELdestinationI     Encapsulation of the behavior of a "sink" for the
                   logged information to go to.

  Derived from ELdestinationI:

  ELoutput         A specific class implementing our agreed behavior and
                   formatting for sending messages to an ostream (which
                   could be cout, cerr, or an ofstream for a file).

  ELsaveBuffer     A specific class implementing storage of messages in a
                   defined set of buffers, in a publicly known format.
                   The save buffers might, for example, be read out in a
                   reconstruction job to write log message information
                   out with an individual event.

  ELstatistics     A specific class implementing storage and output of
                   message frequency statistics and summary tables.

ELdestControl      An object to act as a proxy for an ELdestinationI which
                   is attached to the logger, faithfully providing the
                   full spectrum of public methods available from the
                   destination object.

ELseverityLevel    A class allowing for the definition of a fixed set of
                   severity levels. The package provides the levels. Each
                   severity level is an instance of this class, and
                   contains its unique level number, symbol, and name.

                                              -- Classes and Concepts --

ErrorObj           An error message object. While the ErrorObj class
                   provided uses templating to allow operator<< with
                   arbitrary classes, a custom substitute might provide
                   operator<< only from an ELstring.

  ELerrorObj       Provides implementation and non-interface material so
                   that ErrorObj may be frozen at an early stage.
ELcontextSupplier  An interface for an object providing three methods
                   (which can be equivalent) to obtain info about run,
                   event, and other framework-wide concepts.

ELstring           Because quite a few methods throughout this package
                   work with strings of characters, it is very useful to
                   have a data type with the semantics of std::string. To
                   allow for substitution of a different class, the
                   interfaces use ELstring instead; our provided classes
                   will typedef this to std::string.

ELlimitTable       Class implementing the logic and table storage needed
                   to allow a destination to decide whether or not to
                   ignore an error based on having seen too many of that
                   type of error.

ELextendedID       The combination of id, severity, process, module, and
                   subroutine. This defines one type of message, for the
                   purposes of summary statistics. It does NOT include
                   additional text, timestamp, or context.


****************************************
* The ErrorLogger Syntax and Semantics *
****************************************

How the Physicist Logs Errors
-----------------------------

1) How the Physicist has access to the logger
---------------------------------------------

Both experiments have the concept of an organizational step in the
course of processing, under control of some responsible person or group.
One experiment calls this a "package," the other a "module"; but in both
cases it is implemented in code as a set of structural rules that the
framework requires its various major components to follow. In
particular, there is an object encapsulating each entire package (or
module); and this is derived from some framework-provided base class
which we will call Module.

Each module has an object at scope of the Module class, which, for
illustration's sake, we will name "errlog." The question of what the
writer of the base Module class must do to establish this log object is
answered below in "How the Module Sets Up errlog."
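The arrangement of one errlog per module can be sketched as follows, assuming a Module base class that simply holds an ErrorLog-like member and a single shared sink standing in for the ELadministrator. Every name (SketchErrorLog, SketchSeverity, skEndmsg, sharedSink, CalorimeterModule) is hypothetical; the sketch also previews the call syntax and the rules spelled out in sections 2 and 3 (the 20-character ID key, endmsg, and unspecified severity for a bare <<).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins; level numbers follow the order of section 8.
struct SketchSeverity { int level; const char * symbol; };
const SketchSeverity SkUnspecified = { 9, "??" };
const SketchSeverity SkError       = { 6, "-e" };

struct SketchEndMsg {};                  // plays the role of endmsg
const SketchEndMsg skEndmsg = {};

// One completed message, collected in a single shared place that stands
// in for the one ELadministrator all ErrorLog shells attach to.
struct SketchRecord { std::string module, key, text; };
std::vector<SketchRecord> & sharedSink() {
  static std::vector<SketchRecord> sink;
  return sink;
}

class SketchErrorLog {
public:
  void setModule(const std::string & m) { module_ = m; }
  // errlog(sev, id) starts a new message, closing any open one.
  SketchErrorLog & operator()(const SketchSeverity & sev, const std::string & id) {
    close();
    key_ = id.substr(0, 20);             // ID for statistics and limits:
    key_.resize(20, ' ');                // leading 20 chars, blank padded
    text_.str("");
    text_ << "%ERLOG" << sev.symbol << ' ' << id << ": ";
    open_ = true;
    return *this;
  }
  // Anything that can be streamed to an ostream can be appended.
  template <class T> SketchErrorLog & operator<<(const T & t) {
    if (!open_) (*this)(SkUnspecified, "");  // no errlog(sev, id) first
    text_ << t;
    return *this;
  }
  // endmsg closes the message (non-template overload wins).
  SketchErrorLog & operator<<(const SketchEndMsg &) { close(); return *this; }
private:
  void close() {
    if (!open_) return;
    SketchRecord r = { module_, key_, text_.str() };
    sharedSink().push_back(r);
    open_ = false;
  }
  std::string module_, key_;
  std::ostringstream text_;
  bool open_ = false;
};

// Framework-provided base class: every module gets its own errlog.
class Module {
protected:
  SketchErrorLog errlog;
};

// Physicist code: a method of a specific module just uses errlog.
class CalorimeterModule : public Module {
public:
  CalorimeterModule() { errlog.setModule("CALmodule"); }
  void process(double totalEnergy) {
    if (totalEnergy > 500)
      errlog(SkError, "Too much energy") << "E = " << totalEnergy << skEndmsg;
  }
};
```

Each module object carries its own name in its errlog, while all completed messages end up in one place, which mirrors the "thin shells attached to a single logger" picture.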
The physicist working in the methods of that module thus sees "errlog"
as an instance of --ErrorLog--. A physicist coding for a different
module will see some other instance of ErrorLog (with the same name
"errlog" if things are set up as we recommend). But these are thin
shells attached to a single instance of an --ELadministrator-- object
that represents the actual logger.

A logger can be associated with one or more destinations. Each
destination represents a "sink" for error information; it is a class
with methods to accept and deal with error message information, and to
allow the frameworker to control various behavior aspects.

In any event, the frameworker has established the logger and set up the
destinations (possibly driven by information from a job control file).
The physicist logging errors need not be concerned with the destinations
which have been set up.

To get the headers allowing the use of ErrorLog, the compilation unit
has to include one header:

  #include "ErrorLogger/ErrorLog.h"

However, since the physicist code is going to be part of a class that is
derived from a Module base class, and the Module base sets up errlog,
that will already have included the necessary ErrorLog.h. So in normal
use the physicist does not have to include that line; nothing extra need
be included to get access to errlog.

Also, since this product is namespace protected, the user must specify
that namespace zmel is in general use:

  using namespace zmel;

ZOOM provides a mechanism for users who want namespace use to be enabled
or disabled depending on a define.
Instead of "using namespace zmel":

  ZM_USING_NAMESPACE( zmel )    /* using namespace zmel; */

                                    -- How the Physicist Logs Errors --

2) How a physicist issues a log message:
----------------------------------------

  errlog (ELerror, "Too much energy") << "E = " << totalEnergy << endmsg;
  a       b        c                     d         e              f

Here is the breakdown of this syntax:

 a) The ErrorLog object has an operator() method which returns something
    you can treat like an ostream, that is, you can do << to it. It takes
    two mandatory arguments -- a severity and a message ID. The presence
    of these arguments has the effect of saying that this is the start
    of an error message.

 b) ELerror is one of the ErrorLogger severity levels. They all start
    with EL (they are listed below). They are all instances (provided by
    the ErrorLogger package) of the class --ELseverityLevel--. Invoking
    this operator() of the ErrorLog object has the effect of starting a
    new error message, and establishing the severity of that error.

 c) The second argument is a string (or char*), and determines the
    message identifier (ID). Although the entire content of the string
    will go into the output text, the message ID is considered, for
    statistics and limits purposes, to be the leading 20 characters of
    this string, padded with trailing blanks.

 d) There follow an arbitrary number of further outputs to the message.
    These will together form an informational string, which most
    destinations will print after the ID, with suitable line-break
    insertions. These further outputs are optional.

 e) Notice that the further arguments need not be strings (although you
    could, if you wish, format them all up into one string by using a
    stringstream). In the example, the double totalEnergy is put to the
    stream. Any data type or class that could be streamed to cout can be
    sent to the log message in this way.

 f) The endmsg at the end of the message is treated specially, and
    indicates that the message is over.
    Only one endmsg should be placed in a message; the user is free to
    insert explicit "\n" characters and/or endl should control of
    multiple lines be desired. See Note 1 about endmsg.

An alternative to doing errlog (sev, id) is to use the provided ERRLOG
macro:

  ERRLOG ( sev, id )

is equivalent to

  errlog ( sev, id ) << __FILE__ << ":" << __LINE__ << " "

This assumes that the ErrorLog available is named errlog; a second
macro is

  ERRLOGTO ( logname, sev, id )

which is equivalent to

  logname ( sev, id ) << __FILE__ << ":" << __LINE__ << " "

3) How a physicist can issue a log message in multiple statements:
------------------------------------------------------------------

  errlog (ELerror, "Too much energy") << "E = " << totalEnergy;
  if ( condition ) {
    errlog << "more stuff";
  }
  errlog << "yet more stuff" << endmsg;
  g                             h

 g) The ErrorLog object also has a direct << method. This streams more
    into its message string, without declaring a new error or
    establishing a severity level. Thus you may continue to build up the
    message, by having repeated lines which do not use endmsg until the
    last one.

 h) Although key error information (id, severity, timestamp, context,
    and so forth) is captured immediately when the errlog () method is
    done, the --ELerrorObj-- object remains "open" to additional data
    until either the endmsg is encountered, or some other invocation of
    errlog () occurs.

If the physicist forgets to start a new message with
errlog (severity, id) and instead does errlog << stuff, then, assuming
the previous message was closed off with an endmsg (or this is the first
one for this log), a new message will be opened. The severity assigned
to such a message will be ELunspecified.

4) How an instance of ErrorObj may be formed
----------------------------------------------

An --ErrorObj-- is an object representing an error message.
Although normally neither the physicist nor the frameworker works with
an ErrorObj directly, one can form such an object and later send it to a
log. This might be done, for example, if you want to build up some
history of what was done, but only log it if some ghastly condition
later occurs.

  ErrorObj myMsg ( ELwarning, "Suspicious Pt" );
  j                k          l
  errlog (myMsg);
  m
  myMsg = ErrorObj ( ELsevere, "Out of space" );
  myMsg << "was doing step" << 20 ;
        n                         p
  errlog (myMsg);

 j)   You can instantiate some pre-defined ErrorObj and, when and if an
      error happens, use it.

 k,l) The constructor for ErrorObj takes the two mandatory fields:
      severity and id.

 m)   We support sending one of these formed messages to the log.

 n)   ErrorObj has operator<< so you can pump further info into it, just
      as in the case of pumping more into the log.

 p)   Notice that neither message sent here ends with endmsg. endmsg is
      ignored in ErrorObjs. When an ErrorObj is supplied to a log (as in
      errlog(myMsg)), it is known to be complete, and the implied endmsg
      is supplied automatically.

5) What will the error message output look like:
------------------------------------------------

Although each possible destination may do different things with an
error message, the ErrorLogger package supplies a standard --ELoutput--
destination which will output (to a stream or file) as follows:

  %ERLOG-w Too much energy: E = 834.750032 d0L2proc5 CTCDRVmodule CTCTRKsubr
           10-Jul-1999 14:49:03 run=234 event=543

This is the default formatting; if some portion of the message would
overrun the end of an 80-column line, it will instead be started on the
next line, indented to align with the first character of the id.

The frameworker can control some aspects of the format produced by
ELoutput (discussed below). A custom ELdestinationI can make more
complex format changes.
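The default line-breaking rule can be sketched as a small function, assuming the message has already been split into items (id, text, process, module, subroutine, timestamp, context) that are separated by single blanks. The name layoutMessage and the item split are hypothetical; the real ELoutput formatting is controllable, as discussed later.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Place items on one line, separated by a blank; an item that would run
// past column `width` starts a new line, indented to align with the id
// (that is, by the length of the "%ERLOG-w " prefix).
std::string layoutMessage(const std::string & prefix,
                          const std::vector<std::string> & items,
                          std::size_t width = 80) {
  const std::string indent(prefix.size(), ' ');
  std::string out = prefix;
  std::size_t col = prefix.size();
  bool lineEmpty = true;                 // nothing placed after prefix/indent yet
  for (const std::string & item : items) {
    if (!lineEmpty && col + 1 + item.size() > width) {
      out += '\n';                       // would overrun: wrap
      out += indent;
      col = indent.size();
      lineEmpty = true;
    }
    if (!lineEmpty) { out += ' '; ++col; }
    out += item;                         // an over-long item is still emitted whole
    col += item.size();
    lineEmpty = false;
  }
  return out;
}
```

With the items of the example above, everything through the subroutine name fits in 74 columns, and the timestamp would reach column 86, so it and the context move to a second line indented by nine blanks.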
6) What summary information will be available
---------------------------------------------

There will be a method (normally invoked by the framework at the end of
a job or run) to deliver a summary, either to one or more of the log
destinations or to another ostream or file.

The summary information will be a table giving the count for each
message type (message type means the combination of ID, process, module,
subroutine, and severity). If there were any occurrences of a message
that were not logged anywhere due to limits, that is indicated by an
asterisk. A second part supplies up to 3 example contexts for each
message type: the first two occurrences, and the last one. In a third
part there will be information about total counts at each severity
level. Counts are given since the last clear, and also as a total for
the whole job.

The responsibility of triggering the output of summary information
belongs to the frameworker rather than the individual physicists. An
illustration of the format of the summary information is given in the
section on "Obtaining error statistics summaries."

7) How the physicist can indicate what is being done:
-----------------------------------------------------

The physicist may OPTIONALLY declare the name of the subroutine
currently executing. When an error is logged, this name will go into the
message, and also into the overall statistics. The user will generally
only declare fairly large steps, to avoid some overhead.

  errlog.setSubroutine( "myName" );

An alternative is available: if, after the ID string, the next item is a
string of the form "@SUB=myname", then myname is treated as the
subroutine name, regardless of what setSubroutine has set up.

  errlog ( ELsevere, "Bank Confusion" ) << "@SUB=prepare_IO" << endmsg;

8) A list of the available severity levels:
-------------------------------------------

Each severity level has a corresponding ELseverityLevel object, for
example, ELwarning.
Each such object has two relevant behaviors:

 a) There are methods to get its symbol and full name; the destinations
    will use these methods to prepare text.
 b) There is a comparison to ask whether one ELseverityLevel is more
    severe than another. So the concept of logging everything above a
    certain severity is meaningful.

The ELseverityLevel objects -- instantiated globally by the package --
are:

  Severity object    Symbol  Full name  Intention
  ---------------    ------  ---------  ---------
  ELzeroSeverity     --      --
  ELincidental       ..      ..         flash this on a screen
  ELsuccess          -!      SUCCESS    report reaching a milestone
  ELinfo             -i      INFO       information
  ELwarning          -w      WARNING    warning
  ELwarning2         -W      WARNING!   more serious warning
  ELerror            -e      ERROR      error detected
  ELerror2           -E      ERROR!     more serious error
  ELnextEvent        -n      NEXT       advise to skip to next event
  ELunspecified      ??      ??         severity was not specified
  ELsevere           -s      SEVERE     future results are suspect
  ELsevere2          -S      SEVERE!    more severe
  ELabort            -A      ABORT!     suggest aborting
  ELfatal            -F      FATAL!     strongly suggest aborting!
  ELhighestSeverity  !!      !!

The frameworker can control whether declaring severe, abort, or fatal
errors actually will abort the job. The intentions listed for all the
error types are only advisory; it is up to the framework to do what the
experiment intends.

ELzeroSeverity and ELhighestSeverity are not supposed to be used in
forming error messages but are available to the frameworker when setting
up various thresholds.

Frameworker's Basics of Setting up and Using Error Loggers
----------------------------------------------------------

9) How the frameworker sets up for logging
-------------------------------------------

The ErrorLog objects that modules set up and that users see require that
an ELadministrator exist. This encapsulates all the framework control
over how logging is done; in particular, the ELadministrator holds the
information as to what destinations should be used, and the way to get
context information when needed.
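As a rough model of the ELadministrator just described, here is a minimal sketch of a singleton that remembers a context supplier and a process name. The names are hypothetical, the supplier methods are shown as const (an assumption), and the administrator merely keeps a pointer to a supplier that the framework keeps alive; the real package may manage the supplier differently.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for ELcontextSupplier: three ways to describe
// "where we are" in the job.
class SketchContextSupplier {
public:
  virtual ~SketchContextSupplier() {}
  virtual std::string context() const = 0;
  virtual std::string summaryContext() const = 0;
  virtual std::string fullContext() const = 0;
};

// Hypothetical stand-in for ELadministrator, using a function-local
// static so every caller of instance() gets the same object.
class SketchAdministrator {
public:
  static SketchAdministrator * instance() {
    static SketchAdministrator theAdmin;   // created on first call
    return &theAdmin;
  }
  SketchAdministrator(const SketchAdministrator &) = delete;
  SketchAdministrator & operator=(const SketchAdministrator &) = delete;

  // Keeps a pointer: the framework must keep the supplier alive.
  void setContextSupplier(const SketchContextSupplier & s) { supplier_ = &s; }
  void setProcess(const std::string & p) { process_ = p; }
  const std::string & process() const { return process_; }

  // Context attached to a message; empty when no supplier was provided.
  std::string currentContext() const {
    return supplier_ ? supplier_->context() : std::string();
  }
private:
  SketchAdministrator() : supplier_(nullptr) {}
  const SketchContextSupplier * supplier_;
  std::string process_;
};

// Framework-specific supplier; run and event are plain members here,
// whereas a real framework would ask its own event loop.
class SketchContextObj : public SketchContextSupplier {
public:
  int run = 234;
  int event = 543;
  std::string context() const override {
    std::ostringstream ost;
    ost << "run=" << run << " event=" << event;
    return ost.str();
  }
  std::string summaryContext() const override {
    std::ostringstream ost;
    ost << run << "/" << event;
    return ost.str();
  }
  std::string fullContext() const override { return context(); }
};
```

Because the context is asked for at the moment a message is logged, the same supplier object reports the current event as the job advances.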
The framework, outside the context of any ordinary module, should get
an instance by invoking the ELadministrator::instance() method. It will
then use methods of this instance to establish how to get context
information, and to attach various ELdestinationI sinks.

So first the framework instantiates the ELadministrator:

  #include "ErrorLogger/ELadministrator.h"
  ZM_USING_NAMESPACE( zmel )    /* using namespace zmel; */

  ELadministrator * logger = ELadministrator::instance();

Notice that the Singleton pattern returns a pointer to the single
instance.

Next the framework provides the ELcontextSupplier. This is done by
creating a class derived from ELcontextSupplier and overriding the pure
virtual methods such as context(). An instance of this class is passed
to logger->setContextSupplier().

  MyContextObj contx;          // See below for how the derived context
                               // supplier looks.
  logger->setContextSupplier (contx);

Providing a context supplier is optional; it provides a way for the
logger to get at the run and event numbers when an error is logged. If
no supplier is provided, no run/event information will be associated
with messages. Further details are given below in "ELcontextSupplier:
How the run and event numbers are indicated."

The frameworker may also declare the name of the overall process (or
job, or node in a farm situation). When an error is logged, this name
will go into the message. This could be used to distinguish among many
cooperating nodes which share the same output for destinations.

  logger->setProcess( "PCfarm81" );

                                              -- Frameworker's Basics --

Having instantiated logger and told it how to get context information,
the framework must attach one or more destinations -- sinks for the
messages and statistics to be sent to. An error logger with no sinks is
probably useless. The logged information can go to each destination
attached to the logger.
But not every message will be acted on by every destination. Each
destination is subject to thresholds and limits, as discussed below in
"Basics of Thresholds and Limits."

Each destination must be derived from ELdestinationI. The ErrorLogger
package provides three classes derived from ELdestinationI:

  --ELoutput--  is constructed taking an ostream. It is the typical way to
        associate a log with cout, cerr, or an ofstream for a file. It has
        two forms of constructor:

        ELoutput xx(ostream &);   // An ostream (e.g. cerr).
        ELoutput xx("xfile");     // A file name.
                                  // This is mostly a convenience, since
                                  // one could have supplied an ofstream.
                                  // See Note 9: "ELoutput (fileName)"
                                  // for details.

  --ELsaveBuffer--  may be constructed taking a size limit (see the
        discussion below of ELsaveBuffer semantics). A frameworker would
        normally put one ELsaveBuffer on the list so the framework can
        examine the ErrorObjs that were logged.

  --ELstatistics--  is a table of message frequencies and sample contexts,
        with methods for generating run summaries. It may also have an
        associated ostream, to which unreported statistics are sent when
        the job terminates.

The usual framework will use ELstatistics, ELsaveBuffer, and one or more
ELoutput sinks.

To attach a destination, the frameworker must instantiate it and then attach
it to the ELadministrator. Attaching the destination returns an
--ELdestControl-- object, which is a handle for controlling the behavior of
that destination.

        ELoutput logfileD("myfile.log");
        ELdestControl logfile = logger->attach( logfileD );

Later, the ELdestControl may be used to control the behavior of this
destination, as in

        logfile.setLimit("*", 20);

Object ownership and related details are covered in Note 5: "ELdestControl
details." The important point is that every action that applies to the
ELdestinationI class can be invoked through the ELdestControl; it is a
faithful proxy.
Such methods include setting limits, thresholds, and timespans, triggering
summary output, and even sending an ErrorObj directly.

For completeness, there is a way to remove a destination from the list of
those the logger will dispatch to:

        logger->detach ( logfile );

In principle, one could temporarily detach an ELdestinationI and later
re-attach the same type of object. However, re-attaching would copy in a new
ELdestinationI, so any statistics or limits previously set would be lost. It
is better to turn a destination off temporarily by

        logfile.setThreshold(ELhighestSeverity);

A typical example of the entire setup code in a framework might look like the
following. It has a log to a file, one to the screen, a statistics table, and
a save buffer (see Note 8 about includes):

        ZM_USING_NAMESPACE( zmel )      /* using namespace zmel; */

        ELadministrator * logger = ELadministrator::instance();

        class MyContextObj : public ELcontextSupplier {
        public:
          // getRun() and getEvent() stand for whatever the framework
          // uses to find the current run and event numbers.
          ELstring context () {
            ostringstream ost;          // needs <sstream>
            ost << "run= " << getRun() << " event= " << getEvent();
            return ELstring(ost.str());
          }
          ELstring summaryContext () {
            ostringstream ost;
            ost << getRun() << "/" << getEvent();
            return ELstring(ost.str());
          }
          ELstring fullContext () { return context(); }
        };

        MyContextObj contx;
        logger->setContextSupplier (contx);

        ELdestControl logcerr  = logger->attach ( ELoutput(cerr) );
        ELdestControl logfile  = logger->attach ( ELoutput("myFileName.log") );
        ELdestControl logstats = logger->attach ( ELstatistics(5000) );
        ELdestControl errbuf   = logger->attach ( ELsaveBuffer(20000) );

        logfile.setThreshold(ELwarning).setLimit("*", 10);
        logcerr.setThreshold(ELinfo).setLimit("*", 20);
        logstats.setThreshold(ELwarning);
        errbuf.setThreshold(ELerror);
                        // See "Basics of Thresholds and Limits"

Many modules may instantiate many ErrorLog objects, but only one instance of
ELadministrator will ever be formed. This is achieved by making
ELadministrator a Singleton; see Note 4: "ErrorLog and the Singleton
ELadministrator" for details.
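The chained calls in the setup example, such as logfile.setThreshold(ELwarning).setLimit("*", 10), suggest that each ELdestControl method returns a reference to the control object. The sketch below illustrates that proxy arrangement under this assumption. DestState, DestHandle and the Sev enumeration are hypothetical names, and the real ELdestControl may be built differently (see Note 5).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical ordered severities, in the same order as the severity table.
enum class Sev { zero, incidental, success, info, warning, warning2, error,
                 error2, nextEvent, unspecified, severe, severe2, abort,
                 fatal, highest };

// What a destination keeps: a threshold, and limits keyed by id (or "*").
struct DestState {
  Sev threshold = Sev::warning;            // default threshold per these notes
  std::map<std::string, int> limits;
  bool accepts(Sev s) const { return s >= threshold; }
};

// Hypothetical proxy: copies share one destination, and each setter returns
// *this so that calls can be chained as in the setup example.
class DestHandle {
 public:
  explicit DestHandle(DestState* d) : d_(d) {}
  DestHandle& setThreshold(Sev s) { d_->threshold = s; return *this; }
  DestHandle& setLimit(const std::string& id, int n) {
    d_->limits[id] = n;
    return *this;
  }

 private:
  DestState* d_;   // not owned; the administrator owns the destination
};
```

Setting the threshold to the highest level makes the destination accept nothing, because no message severity reaches it. This is the "temporarily turn a destination off" idiom mentioned above.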
In the setup example above, the last two destinations attached were the
ELstatistics table and then the save buffer. This is the recommended order.
See "How error statistics are kept" for an explanation of why ELstatistics
should come after the ordinary destinations.

10) How the Module (or Package) Sets Up errlog
----------------------------------------------

(These design notes use the CDF term "Module" for the base class with which
a package of physicists' code is associated. In D0 the same concept is
called a Package.)

Each module class should let its methods log errors by containing a data
member of type ErrorLog. This is done by giving the base class Module, from
which all module classes derive, an instance of ErrorLog. The various
physicists coding parts of the module need to know the name of this object;
we assume it will be "errlog." It should be placed in the protected section,
not the private one, so that methods of the derived module classes can use
errlog.

The ErrorLog instance contains a small amount of module-specific information
and depends heavily on the ELadministrator set up by the framework. For
instance, the ELadministrator holds the list of destinations for messages
sent to the log. Since there is inherently only one instance of
ELadministrator, the many instances of ErrorLog send to destinations in a
coherent and unified way.

After instantiating errlog, the Module class should provide a module name to
use when methods of this class issue an error message to the log. (Module
classes in the framework generally have a way to know their name.)
Destinations provided in the ErrorLogger package use the first 16 characters
of this name wherever a brief module identification is wanted. (An optional
second argument can supply a more verbose module identification string.
If this is present, ELoutput will use it as part of the error message text,
and other custom loggers may parse it or use it as they see fit.)

Thus in the Module base class, the lines setting up the ErrorLog will look
something like:

        class Module {
        public:
          virtual string getModuleName();
          Module(module_selection) {
            ...
            // Caution: a virtual call made from a base-class constructor
            // runs Module's own getModuleName(), not a derived override;
            // see Note 7 for how module_selection supplies the name.
            errlog.setModule(getModuleName());
          }
        protected:
          ...
          ErrorLog errlog;
        };

(An additional routine, setPackage(name), is identical to setModule(name).)

If this setup is used, the classes derived from Module, which contain the
physicists' code, need do nothing further to be able to issue messages to
errlog. There will be only one ELadministrator for the whole process.

The mechanism by which the base class Module learns the name of the derived
class being instantiated is discussed in Note 7: "Module Name." This is the
mysterious module_selection in the example above.

11) Global ErrorLog Instantiation
---------------------------------

Here we discuss how to have an error log for code that is not underneath a
module. We show how to do this whether (a) you want to assign some universal
"utility" module name to such a log, or (b) you want the module name to
change depending on the actual Module that originated the call to that
external "utility code." We also show (c) how to hook the cout and cerr used
in the utility (or anywhere in your code) to the log, so that lines streamed
to them come out through the error logger. Treat (c) as a way to avoid
scanning through large chunks of code that use cout or cerr. It is not a
general alternative to the error log mechanism.

(a) Illustrations in this document (other than this section) assume the code
is structured into Module classes, with each Module instantiating an
ErrorLog. However, if necessary, one can instantiate a log at global scope.
For programs not completely organized into Modules, this permits any piece
of code to issue log messages.
The penalty for working this way is that it bypasses the mechanism that pulls
in a different module identification string for each class where a log
message might originate. Whatever module is indicated for the global ErrorLog
will appear for every use of that log.

Of course, the techniques can be combined. Suppose a framework involves
several modules, plus universally available "utilities" that are shared
among modules and are not encapsulated in a class of their own. The
utilities might use a global ErrorLog to issue error messages.

(In a program with multiple compilation units, each unit can instantiate its
own ErrorLog. Alternatively, sources outside the compilation unit that
instantiates the ErrorLog can declare it extern. Because of the underlying
Singleton pattern for the ELadministrator, the effect is the same.)

        #include "ErrorLogger/ErrorLog.h"

        ErrorLog errlog;

        int main() {            // The main framework
          ...
          errlog.setModule ("Framework Level");
          errlog (ELsevere, "Event Sequence Bad")
                << "Events went from " << prior_event
                << " to " << event << endmsg;
          ...
        }

-- Frameworker's Basics --

(b) The problem with using a global errlog as shown above can be seen from
the following scenario. Module Tracker (with name "TRACKER") calls the
utility chiSqFit, which is at global scope (not a member function of
Tracker). chiSqFit detects a problem and, having been instrumented for
logging, does

        errlog (ELwarning, "bad fit") << ndegrees << endmsg;

Here errlog is the global-scope logger, which we assume has been set up with
errlog.setModule("Framework Level"). This message will appear in the log as
having come from module "Framework Level" rather than module "TRACKER." The
frameworker may well judge that the identity of the originating module is
much more important to know, particularly since the utility can already
identify the subroutine in which the problem occurred.
The solution to this dilemma is to call setModule(name) on the global errlog
explicitly when entering a new module. (You could conceivably automate this
by calling ::errlog.setModule in the Module code before the key function is
invoked; for simplicity we illustrate doing it explicitly.) Thus:

        int main() {            // The main framework
          ...
          errlog.setModule ("Framework Level");
          errlog (ELsevere, "Event Sequence Bad")
                << "Events went from " << prior_event
                << " to " << event << endmsg;
          ...
          errlog.setModule("FindTracks");  // In case the global errlog is used
          FindTracks.doit(event);          // The FindTracks module.
          errlog.setModule("DoPhysics");   // In case the global errlog is used
          DoPhysics.doit(event);           // The DoPhysics module.
          ...
        }

These setModule calls are unnecessary if all the error logging happens in
places where the errlog in scope is that of the module. In that case, the
automatic mechanism gets the proper module name with no extra concern for
the frameworker.

(c) The ErrorLogger package is meant to augment rather than replace cout and
cerr. Even so, the package supports hooking cout and/or cerr to the logger.
The short answer to how to do this is:

  - Typically at global scope, declare new variables:

        ELcout cout (ELinfo);
        ELcout cerr (ELerror2);

  - Then, for all code where these are in scope, anything streamed to cout
    or cerr will instead go to the log. It reaches the destinations as a
    special sort of "output" message. Some destinations may be set to
    ignore such messages; others will typically write the message out.

This ability is meant to help the user whose existing code does a lot of
ordinary cout output. Such a user can make one change for a whole
compilation unit and get all the cout and cerr output into ErrorLogs. There
is no need to change each cout to some errlog (ELwarning, "cout: ") or the
like, and no need to insert endmsg after each message is complete.
The downside is that the ELstatistics and ELsaveBuffer destinations ignore
these outputs. They are also not time-stamped, not limited by id, and do not
get any of the other sophisticated features. We recommend this only as an
expedient for dealing with existing code that streams to ostreams directly.
New code should use the actual errlog syntax when issuing messages that
should go into the log!

Note 10: "ELcout: Replacing cout and/or cerr" discusses several open issues
and the subtleties of usage and behavior.

12) ELcontextSupplier: How the run and event numbers are indicated
------------------------------------------------------------------

The run and event are printed in the text of every error message. However,
the framework does not call some setEvent() method every time a new event is
started. Errors might be logged in only a tiny fraction of events, so to
avoid overhead on every event, the framework does not pre-supply this
information. Instead, when an error is being logged, the logger asks for the
run and event numbers at that point. It asks by invoking the context()
function of the --ELcontextSupplier-- object set up for the logger.

The frameworker should create a class derived from ELcontextSupplier and pass
an instance of that class to the setContextSupplier() method of the
ELadministrator, as shown above ("How the frameworker sets up for logging").

An ELcontextSupplier defines the following simple interface:

        class ELcontextSupplier {
        public:
          virtual ELstring context        () = 0;
          virtual ELstring summaryContext () = 0;
          virtual ELstring fullContext    () = 0;
          virtual void editErrorObj (ErrorObj & msg) { }  // optional hook
        };

As this shows, the class derived from ELcontextSupplier **must** define the
three virtual methods returning ELstring, though some may be identical to
others.
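The interface above can be exercised with a small stand-alone sketch. It assumes that ELstring behaves like std::string. The names ContextSupplierSketch, MyContextObj, FrameworkState, ErrorObjSketch and clipTo16 are hypothetical. The sketch shows that the framework does no per-event work for the logger: the supplier only reads state the framework already keeps, and it builds strings only when asked.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Assumption: ELstring behaves like std::string; all names are hypothetical.
struct ErrorObjSketch { std::string module, id; };

class ContextSupplierSketch {                 // mirrors the interface above
 public:
  virtual ~ContextSupplierSketch() = default;
  virtual std::string context() = 0;
  virtual std::string summaryContext() = 0;
  virtual std::string fullContext() = 0;
  virtual void editErrorObj(ErrorObjSketch&) {}   // optional; does nothing
};

// State the framework maintains anyway; nothing extra is done per event.
struct FrameworkState { int run = 0; int event = 0; };

class MyContextObj : public ContextSupplierSketch {
 public:
  explicit MyContextObj(const FrameworkState& fw) : fw_(fw) {}
  std::string context() override {            // for ELoutput-style logs
    std::ostringstream ost;
    ost << "run= " << fw_.run << " event= " << fw_.event;
    return ost.str();
  }
  std::string summaryContext() override {     // short form for tables
    std::ostringstream ost;
    ost << fw_.run << "/" << fw_.event;
    return ost.str();
  }
  std::string fullContext() override { return context(); }

 private:
  const FrameworkState& fw_;                  // read only when asked
};

// How a statistics table might clip a context to a 16-character column.
inline std::string clipTo16(const std::string& s) { return s.substr(0, 16); }
```

clipTo16 mimics the 16-character truncation that the statistics summary applies to summaryContext(). A real ELstatistics may truncate differently.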
Destinations are free to ask for and use whichever context string they wish,
but the three forms are intended as follows:

  context()         is the form used by the typical output-to-a-log-file
                    destination, for example ELoutput. The string ought not
                    to be too long, to avoid clutter in the log, but no
                    limitation or truncation is imposed per se. A typical
                    string would be "run= 1234 event= 12345".

  summaryContext()  is the form used by ELstatistics to get a string
                    suitable for insertion into a table. This is
                    length-critical, and will be truncated at 16 characters
                    in the statistics summary. A typical string would be
                    "1234/12345".

  fullContext()     is intended for use where extra information may be
                    useful and length is not a big consideration. ELoutput
                    can be set to use fullContext() instead of context() on
                    a per-destination basis. fullContext() would typically
                    just return context(), but another example might be
                    "run= 1234 event= 12345 reco version 3.2".

If a custom form of ELstring with a fixed length limit is used, the context
supplier should protect against writing past the end of the available space.

The editErrorObj() routine is called when the message is started. It
provides a hook to modify the module or subroutine information, the id, or
any part of the message other than the actual text items (which will not
have been established yet). This method is optional; the editErrorObj()
method of the base ELcontextSupplier class is an inline method that does
nothing.

The context supplier is specified via the setContextSupplier() method of
ELadministrator. If this method is not invoked, no context information will
be associated with error messages.

13) Basics of Thresholds and Limits
-----------------------------------

Some nomenclature: thresholds and limits both apply to particular
destinations.

By "limit" we always mean a count of messages of a particular description,
beyond which some action will cease to happen.
A limit can be specified by message id. It can also cover multiple ids in
two ways: as a general wildcard, or as all messages of some severity level.

By "threshold" we always mean an ELseverityLevel at or above which some
action will happen.

One can set a limit or a threshold for each individual ELdestinationI. To do
this, use methods of the associated ELdestControl, which we will call "dest"
in these examples. The effects are as follows:

     Method                          Effect
     ------                          ------
     dest.setThreshold (severity)    Suppress logging or acting on messages
                                     below this severity, for this
                                     destination.

     dest.setLimit (id, n)           For this destination, don't log past
     dest.setLimit (severity, n)     n instances of any given message id
     dest.setLimit ("*", n)          matching the specified type.

If two or more applicable limits have been established, the limit used is
the most specific applicable case: a specified ID takes precedence over a
severity level, and both take precedence over the wildcard "*". (See Note 6:
"Limit Semantics" for further details.)

As the chart above implies, each ELdestinationI owns a threshold level, a
general limits table (indexed by severity level), and a limits table
(indexed by message id). Each limits table holds the limit and the count.

Notice that the ELsaveBuffer is just a particular destination. So filtering
by severity and throttling by frequency for that buffer are set up via
setThreshold and setLimit, just as for other destinations.

ELstatistics is also a particular destination, so filtering by severity is
likewise set up via setThreshold. For an ELstatistics destination, however,
setLimit has no effect. Once an error message id is in that table,
incrementing its count costs nothing, so a limit would be useless.

The typical framework code related to these routines might look like:

        stats.setThreshold ( ELerror );      // Don't keep stats on messages
                                             // with severity below ELerror.
        screen.setThreshold (ELincidental);  // Send messages of severity
                                             // ELincidental and up to the
                                             // screen.
        logfile.setThreshold (ELwarning);    // Don't send to logfile unless
                                             // severity is at least ELwarning.
        logfile.setLimit ( "*", 10 );        // Generally don't log past the
                                             // 10th occurrence of a given
                                             // message id.
        logfile.setLimit ("Bank short", 50); // But log up to 50 occurrences
                                             // of "Bank short" for the logfile.
        savebuf.setLimit ( ELwarning, 6 );   // Don't save more than 6
                                             // occurrences of any given
                                             // warning in the save buffer.

A limit may or may not have been set; if no limit applies, no messages are
throttled out. The threshold severity level, by contrast, always has a
value. By default, each individual destination starts with a threshold of
ELwarning.

For convenience, the ELadministrator has setThresholds() and setLimits()
methods. These invoke the corresponding method for EVERY destination
currently attached to the logger. (This may change some limits established
earlier.)

        logger->setThresholds (severity)
        logger->setLimits (id, n)
        logger->setLimits (severity, n)
        logger->setLimits ("*", n)

Finally, two features slightly soften the concept of throttling via a limit.

First, once a count is past its limit, action does not cease completely. Let
the applicable limit for a given error be L and the excess be
E = count - L. Every instance for which E/L = 2**N, for some non-negative
integer N, will still be logged. Thus if the limit is 5, you will see
occurrences 1, 2, 3, 4, 5, 10, 15, 25, 45, 85, and so forth.

Second, a large bunch of errors of some type may be followed by a long period
with no such errors. In that case the count toward the limit may be reset,
so that a new cluster of these errors can again be logged. This is explained
below in "Timespan in conjunction with limits."
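The two throttling rules above can be sketched together: choosing the most specific applicable limit, and the 2**N rule for occurrences past the limit. LimitTable, shouldLog, loggedOccurrences and the Sev subset are hypothetical names. Using -1 for "no limit" is an assumption. The treatment of a zero limit is also an assumption, since the notes do not say what E/L means when L is 0.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class Sev { info, warning, error, severe };   // hypothetical subset

// Hypothetical per-destination limits; -1 means "no limit applies".
struct LimitTable {
  std::map<std::string, int> byId;   // specific message ids
  std::map<Sev, int> bySeverity;     // every id of one severity
  int wildcard = -1;                 // the "*" limit

  // Most specific applicable limit: id, then severity, then "*".
  int applicable(const std::string& id, Sev s) const {
    auto i = byId.find(id);
    if (i != byId.end()) return i->second;
    auto j = bySeverity.find(s);
    if (j != bySeverity.end()) return j->second;
    return wildcard;
  }
};

// Is the count-th occurrence (counting from 1) logged under limit L?
// Within the limit: yes. Past it: only if E = count - L gives E/L = 2**N.
inline bool shouldLog(long count, long L) {
  if (L < 0 || count <= L) return true;   // no limit, or still within it
  if (L == 0) return false;               // assumption: zero limit logs nothing
  long E = count - L;
  if (E % L != 0) return false;
  long q = E / L;                         // q >= 1 here
  return (q & (q - 1)) == 0;              // true when q is a power of two
}

// The occurrence numbers, up to upTo, that would be logged under limit L.
inline std::vector<long> loggedOccurrences(long L, long upTo) {
  std::vector<long> out;
  for (long c = 1; c <= upTo; ++c)
    if (shouldLog(c, L)) out.push_back(c);
  return out;
}
```

For a limit of 5, loggedOccurrences(5, 100) yields 1, 2, 3, 4, 5, 10, 15, 25, 45, 85, the same sequence quoted in the text.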
14) Controlling abort behavior
------------------------------

Physicists use the higher severity levels to indicate that they consider an
error severe enough to warrant terminating the job. However, ultimate
control of this rests with the framework. The control is associated with the
logger (the ELadministrator) rather than with any specific destination.

        logger->setAbortThreshold (ELfatal);

By default, the abort threshold is ELabort. If the physicist issues a log
message with severity at or above the abort threshold, the logger will
terminate the job by invoking exit (severityLevel). This happens UPON
COMPLETION OF THE LOG MESSAGE and its dispatch to the various destinations.
If the framework has established an atexit() handler, that handler will be
called as the job exits.

A plausible philosophy is to set the abort threshold at ELhighestSeverity,
so that NO user message aborts the job. In that case, we advise periodically
checking for the presence of severe errors (see "Checking on errors").

15) Obtaining error statistics summaries
----------------------------------------

If the frameworker has attached an instance of the provided ELstatistics
destination to the logger, statistics summaries may be obtained. ELstatistics
has no visible reaction to being sent an error message. Instead it places
information into a table, which can be formatted and output. In the
illustrations below, we assume that the logger has attached an ELstatistics
destination with an ELdestControl called "stats."

The summary of error statistics can be obtained as a string. It can also be
sent to a destination as a series of summary lines; ELoutput writes these
lines to its associated stream. Other destinations may treat these lines
differently; in particular, ELsaveBuffer and ELstatistics ignore them.
To obtain the summary:

        stats.summary(dest);  // Sends the summary to this ELdestinationI.
        stats.summary(os);    // Sends the whole summary string as a char*
                              // to some arbitrary ostream os.
        stats.summary(s);     // Sends the whole summary string to ELstring s
                              // (passed as an ELstring &).

Except when the framework supplies an ELstring & s, the summary is sent using
char* arguments, one line (with a fixed maximum length) at a time.

Each of the above forms also has an optional last argument: a char* to
insert into the first line of the summary.

        stats.summary(dest, "summary title");
        stats.summary(os,   "summary title");
        stats.summary(s,    "summary title");

ELoutput puts up to 40 characters of this title in the first line of the
summary output. (Custom destinations should protect themselves against
misbehaving if sent an arbitrarily long title.)

There is an additional way to have the summary sent to an ostream. If the
constructor of ELstatistics is supplied an ostream argument, then when the
job completes, the destructor for that ELstatistics checks whether any
information has changed since the last summary request. If so, one final
summary is automatically sent to that ostream. (This is safe as long as the
user does not explicitly delete the ostream before the statistics
destination goes away.)

The lines sent to an ELdestinationI are preceded by a startSummary(title)
call. They are then sent using as many summaryLine(char *) calls as needed,
and terminated by endSummary().

Not every message will become a summary entry. The frameworker can set a
threshold for the ELstatistics destination, filtering out errors below
(say) ELerror. Also, ELstatistics can be constructed with a maximum number
of error IDs it will keep track of.

The summary information has three parts. The first part lists the errors
which have occurred, with their frequencies.
The second part re-lists just the identification. It supplies the contexts
(that is, the run/event, or up to 16 characters of the context string) of
the first two and the last occurrences of each error. The third part lists
the count of occurrences of each severity level. To illustrate:

  %ERLOG =============== SUMMARY === summary title ==========================

  Process job123a

   type message id           sev module           subroutine         count   total
  ----- -------------------- --- ---------------- ---------------- ------- -------
      1 Bank Error           E   central tracker  refine_candidate     97       97
      2 Bank Error           E   central tracker  prepare_sigma         5        5
      3 Bank Error           E   muon tracker     prepare_sigma         5        5
      4 Energy overflow      S   CALhadron        sanityChecks          1        1
      5 Too many tracks      w   track matching   start_seeds          11*      11
      6 Too many stray trax  w   CTKtrack         countTracks          32       32
      7 Too many stray trax  E   CTKtrack         checkTracks           4        4

  * Some occurrences of this message were suppressed in all logs, due to limits.

  Process job123b

   type message id           sev module/package   subroutine         count   total
  ----- -------------------- --- ---------------- ---------------- ------- -------
      8 Bank Error           E   central tracker  refine_candidate     91       91
    ...
  12345 abcdefghijklmnopqrst -w  abcdefghijklmnop abcdefghijklmnop 123456* 1234567


                               Examples:
   type message id           run/evt          run/evt          run/evt
  ----- -------------------- ---------------- ---------------- ----------------
      1 Bank Error           14231/54101      14231/54103      14239/59350
      2 Bank Error           14231/54215      14232/55506      14238/58190
      3 Bank Error           14231/54200      14232/55606      14239/59195
      4 Energy overflow      14235/60305
      5 Too many tracks      14231/54101      14231/54506      14238/59190
      6 Too many stray trax  14231/54164      14231/54171      14239/59195
      7 Too many stray trax  14231/54100      14231/54191      14239/60012
      8 Bank Error           15282/64102      15282/64106      15288/69358
  12345 abcdefghijklmnopqrst ABCDEFGHIJKLMNOP ABCDEFGHIJKLMNOP ABCDEFGHIJKLMNOP


  Severity   Number of occurrences      Total
  ---------  ---------------------  ---------
  ..                        275984     512902
  SUCCESS                    21000      42000
  INFO                        2100       4200
  WARNING                      325        604
  ERROR                          4          6
  ERROR!                         1          1
  SEVERE                         1          1

The last lines of the first two parts in this illustration show the maximum
number of characters each field will display. A customizer is free to alter
this format by providing a custom destination similar to ELstatistics.

16) Clearing statistics and/or limits
-------------------------------------

The method that clears the counts in the statistics kept by the destination
whose ELdestControl is "stats" is

        stats.clearSummary();

This clears both the individual statistics (kept by message ID) and the
counts for the various severity levels. It would normally be used after
invoking stats.summary() in some form to stream the information somewhere.

clearSummary() does not zero the aggregate counts for individual error IDs
or for severity levels. It also does not affect any limits set, and it does
not wipe the knowledge of the message IDs from the statistics tables. (Thus
a number of errors with counts of zero remain; they do not appear when
statistics are output.)

There is a more sweeping method:

        stats.wipe();

This clears everything: counts and aggregate counts for individual IDs and
for severity levels. The statistics table is wiped clean. This may matter if
memory management issues have imposed a limit on the number of message ID
entries in the statistics table.

The same routine also wipes out all information in the limits table. This
includes values supplied by setLimit() (or setTimespan()) and the counts of
individual instances of each message ID. For the ELstatistics destination
this is moot, since limits do not apply. For ELoutput or ELsaveBuffer it may
be useful when a maximum number of message ID entries in the limits table
has been imposed.

        dest.wipe();
        logger->wipe();       // Applies wipe() to all attached destinations

This clears everything: counts and aggregate counts for severity levels and
for individual IDs, as well as any limits established.
(This includes the "*" limit for all messages.) The table is wiped clean,
which may matter if memory management issues impose a limit on the number
of message ID entries in the table.

Finally, there is a way to zero the counts toward the limits for all IDs in
a destination. This does not affect ELstatistics (which does not use limits)
and does not compactify the limits table (see Note 6: "Limit Semantics"). It
also does not affect aggregate counts.

        dest.zero();


****************************************
*  Further Options for the Frameworker *
****************************************

17) Checking on errors
----------------------

The severity levels at or above ELnextEvent advise that the flow of
processing be modified or stopped. It is therefore advisable (in all but the
most time-critical applications) for the framework to check for such errors
between modules or between events. To make this inexpensive, a routine
checkSeverity() is provided.

        ErrorSeverity highest = logger->checkSeverity();
        if ( highest >= ELnextEvent ) { /* do whatever */ }

This routine returns the severity level of the highest error declared since
the last checkSeverity(), or ELzeroSeverity if none have been declared since
the last check.

18) Error counts
----------------

You can obtain a cumulative count of errors of a given severity, including
those that occurred but were not sent to any destination or saved.

        logger->severityCount (ELerror);
        logger->severityCount (ELsevere, ELfatal);

The two-argument form adds the counts of the severities from the first level
through the second. In this example that means ELsevere, ELsevere2, ELabort
and ELfatal.

These counts can be reset to zero by

        logger->resetSeverityCount(ELerror);
        logger->resetSeverityCount(ELerror, ELfatal);
        logger->resetSeverityCount();

The last form resets all the severity counts.

Note that these methods of the ELadministrator "logger" have nothing to do
with the counts kept in ELstatistics for summary purposes.
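checkSeverity() and the severity counts amount to a per-severity counter plus a running maximum that is cleared on each check. The following is a minimal sketch, assuming the severities can be indexed in table order. SeverityTracker and the sZero...sHighest enumerators are hypothetical names, not the package's API.

```cpp
#include <array>
#include <cassert>

// Hypothetical severities in table order; sCount is only the array size.
enum Sev { sZero, sIncidental, sSuccess, sInfo, sWarning, sWarning2, sError,
           sError2, sNextEvent, sUnspecified, sSevere, sSevere2, sAbort,
           sFatal, sHighest, sCount };

// Hypothetical administrator-side bookkeeping, separate from ELstatistics.
class SeverityTracker {
 public:
  // Called for every message, whether or not any destination logs it.
  void record(Sev s) {
    ++counts_[s];
    if (s > highestSinceCheck_) highestSinceCheck_ = s;
  }
  // Highest severity since the previous call, or sZero if none.
  Sev checkSeverity() {
    Sev h = highestSinceCheck_;
    highestSinceCheck_ = sZero;
    return h;
  }
  long severityCount(Sev s) const { return counts_[s]; }
  long severityCount(Sev from, Sev to) const {     // inclusive range
    long sum = 0;
    for (int i = from; i <= to; ++i) sum += counts_[i];
    return sum;
  }
  void resetSeverityCount(Sev s) { counts_[s] = 0; }
  void resetSeverityCount(Sev from, Sev to) {
    for (int i = from; i <= to; ++i) counts_[i] = 0;
  }
  void resetSeverityCount() { counts_.fill(0); }

 private:
  std::array<long, sCount> counts_{};
  Sev highestSinceCheck_ = sZero;
};
```

Because record() sees every message, the counts include errors that no destination chose to log. This matches the statement above that severityCount() includes errors not sent anywhere.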
19) Control over whether traces are logged
------------------------------------------

If a message is not logged at some destination (because it misses the
severity threshold or has occurred more times than its limit), no trace is
sent to that destination either. But even when the message is logged, you
may not want the additional information (and clutter) of a trace unless the
message is sufficiently severe:

        dest.setTrace(severity)   -- For errors logged to this destination,
                                     include the trace (if available) if the
                                     severity is at least this level.

The default for ELoutput and ELsaveBuffer is setTrace(ELerror).

-- Further Options --

20) Timespan in conjunction with limits
---------------------------------------

Sometimes a large bunch of errors is logged in a short period of time, due to
a specific problem within, say, a peculiar event. The limit mechanism keeps
an output destination from being swamped with more than N of these identical
messages. If the messages keep arriving regularly, this throttling should
continue to apply. But sometimes there is a burst of errors that hits the
limit, then a long hiatus, then another instance or burst of that type of
error. In that case, you may well wish to see the errors immediately after
the dry spell.

This is provided by the concept of a timespan. A timespan is a number of
seconds associated with a given error type. ELoutput and ELsaveBuffer use
the same ELlimitTable class to implement their throttling behavior. When a
message arrives at either of them, its count toward the limit may be zeroed.
The count is zeroed if the number of seconds between the previous occurrence
of this type of error and the present occurrence exceeds the timespan for
this type of error. Once the count is zeroed, this error and the next N-1
errors of that type (N in all) will not be suppressed. (On systems where
getting time information is impractical, timespans will be moot.)
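The timespan rule can be sketched for a single limit and a single timespan per message id. TimespanLimiter and TimedCounter are hypothetical names. Times are plain doubles in seconds supplied by the caller, and the 2**N rule is left out for brevity.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-id record: how many occurrences, and when the last was.
struct TimedCounter {
  long count = 0;
  double lastTime = 0.0;
  bool seen = false;
};

// Hypothetical limiter with one limit and one timespan (in seconds).
class TimespanLimiter {
 public:
  TimespanLimiter(long limit, double timespan)
      : limit_(limit), timespan_(timespan) {}

  // Returns true if this occurrence falls within the limit. `now` is in
  // seconds from any fixed origin; the 2**N rule is omitted for brevity.
  bool occur(const std::string& id, double now) {
    TimedCounter& c = table_[id];
    if (c.seen && now - c.lastTime > timespan_) c.count = 0;   // dry spell
    c.seen = true;
    c.lastTime = now;
    ++c.count;
    return c.count <= limit_;
  }

 private:
  long limit_;
  double timespan_;
  std::map<std::string, TimedCounter> table_;
};
```

The gap is measured from the previous occurrence of the id, whether or not that occurrence was itself logged. With a limit of 3, the burst after a long gap again lets exactly 3 messages through.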
The semantics of setting timespans, and of choosing which timespan applies,
are exactly the same as for limits. However, a timespan is expressed as a
float number of seconds t.

        dest.setTimespan (id, t)
        dest.setTimespan (severity, t)
        dest.setTimespan ("*", t)

        logger->setTimespans (id, t)
        logger->setTimespans (severity, t)
        logger->setTimespans ("*", t)

A timespan (or a limit) can be "unset" by supplying a honking large value
that will never be reached.

21) ELsaveBuffer Local Semantics
--------------------------------

The save buffer is intended to fill two roles.

First, in Run I, at least one experiment had the framework examine errors
AFTER an event was completed. One could imagine storing the error
information in a data bank tied to the event, for example.

Second, the save buffer is designed to be suitable for a "boss" process to
pull error information for collection. (Depending on the strategy chosen,
collective logging may still require some custom destination.)

This section describes the ELsaveBuffer destination only from the point of
view of the local process, which is the first of those two roles.

Locally, the buffer can be viewed as a single list of LENGTH ErrorObj
objects. The fundamental operation, done by the logger, is to add an
ErrorObj at the logical end of the collection. Locally, we will implement
semantics patterned on those of list, with push_back(), pop_front(), and
clear() as the implemented concepts. Direct const_iterator manipulation is
also supported so that the program can examine the contents (see the next
section, "Accessing Saved Error Messages").

The various pieces of control information needed to implement these
operations are kept in a state block. Via the constructor, it is possible to
force that state block and the actual data area to be located at some given
address. See "ELsaveBuffer Semantics" for details about this state block,
which will be essential in the collective role.
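The list semantics above (push_back, pop_front, clear, and const_iterator access) can be sketched over std::list. SaveBufferSketch and SavedMsg are hypothetical names. The fixed capacity, and the way an overflow warning replaces the newest stored message, are one assumed reading of the overflow rule given for a finite buffer. The state block for collective use is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical stand-in for a saved ErrorObj.
struct SavedMsg {
  std::string id;
  int sequenceNumber = 0;
};

// Hypothetical bounded buffer with list-like operations.
class SaveBufferSketch {
 public:
  using const_iterator = std::list<SavedMsg>::const_iterator;

  explicit SaveBufferSketch(std::size_t capacity) : cap_(capacity) {}

  // Number every message offered, so a reader can detect gaps later.
  void push_back(SavedMsg m) {
    m.sequenceNumber = ++nextSeq_;
    if (items_.size() < cap_) {
      items_.push_back(m);
    } else if (!items_.empty()) {
      // Full: drop m, and replace the newest stored message with a warning.
      items_.back() = SavedMsg{"messages skipped", m.sequenceNumber};
    }
  }
  void pop_front() { if (!items_.empty()) items_.pop_front(); }
  void clear() { items_.clear(); }            // what flush() would call

  const_iterator begin() const { return items_.begin(); }
  const_iterator end() const { return items_.end(); }
  std::size_t size() const { return items_.size(); }

 private:
  std::size_t cap_;
  int nextSeq_ = 0;
  std::list<SavedMsg> items_;
};
```

In this sketch, with a capacity of 3, pushing a fourth message keeps the first two and replaces the third with a "messages skipped" entry carrying sequence number 4.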
The operations which come from the user or frameworker to affect
ELsaveBuffer mostly stem from the fact that it is an ELdestinationI
subclass:

    setThreshold()       Thresholds, limits and timespans are set
    setLimit()           as for any other ELdestinationI class.
    setTimespan()

    log(errorObjI &)     This ends up calling push_back() to place an
                         error object into the save buffer.  The error
                         object contains the extended ID plus the entire
                         text of items sent to that error message.

    summaryLine(char*)   Summary lines are stored in some way in a summary
                         buffer which some external entity can read out.
                         We have yet to flesh out this mechanism, but it
                         will be much like the message mechanism.

****

And particular to ELsaveBuffer:

    flush()              This ends up calling clear().  If you also wish
                         to zero the counts toward limits (which you might
                         do if error info is stored with each independent
                         event), call flush() then zero().

The ELsaveBuffer destination ignores individual operator<< operations that
send one item at a time, in favor of waiting until the entire message is
logged.  Of course, no information is lost, since the ErrorObj contains
the full text of all the items logged in.

The logical list can be of finite extent, and in fact, when used in the
collective mode, it is possible that even with a small amount of room
available, there will be no way to add a message without blocking.  Thus,
adding a message has peculiar semantics: it is NOT GUARANTEED that a given
message added to the list will actually be remembered; messages may be
dropped if they overflow.  If that happens, the last message before the
overflow will also be lost, replaced by a message warning of skipped
messages.  In any case, long jobs ought to flush the save buffer
periodically; otherwise a large collection of occasional messages can
become a large memory hog.

If the logical list is of indefinite extent, the implementation may use a
true list, and in that case no messages can be lost.
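The overflow rule for a list of finite extent can be illustrated with a
small sketch.  It is hypothetical (BoundedSaveBuffer is not an
ErrorLogger class), stores plain strings instead of ErrorObj objects, and
uses std::deque rather than the fixed, remote-accessible storage a real
ELsaveBuffer would need.  It also assumes the non-blocking choice: a
message that does not fit is dropped, and the last stored message is
overwritten by a notice counting everything lost.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical sketch (not the ELsaveBuffer implementation) of a save
// buffer of finite extent that never blocks.
class BoundedSaveBuffer {
public:
    explicit BoundedSaveBuffer(std::size_t capacity) : capacity_(capacity) {}

    // Returns true if msg was stored, false if it was dropped on overflow.
    bool push_back(const std::string& msg) {
        if (items_.size() < capacity_) {
            items_.push_back(msg);
            noticeActive_ = false;         // a later overflow starts a new notice
            return true;
        }
        if (items_.empty()) return false;  // capacity 0: nothing to replace
        if (!noticeActive_) {
            lost_ = 1;                     // the last message before the overflow
            noticeActive_ = true;
        }
        ++lost_;                           // the incoming message
        items_.back() = "*** " + std::to_string(lost_) + " messages skipped ***";
        return false;
    }
    void pop_front() { items_.pop_front(); }
    void clear() { items_.clear(); noticeActive_ = false; lost_ = 0; }
    std::size_t size() const { return items_.size(); }
    const std::string& back() const { return items_.back(); }

private:
    std::size_t capacity_;
    std::deque<std::string> items_;
    bool noticeActive_ = false;  // is items_.back() currently a skip notice?
    int lost_ = 0;               // messages counted by the current notice
};
```

With a capacity of three, pushing five messages leaves the first two
intact and replaces the third with a notice that three messages (the
third, fourth and fifth) were skipped.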
If collective use of the ELsaveBuffer is desired, a limit should be
imposed.  If a true list is used, the state block used for remote control
and access becomes meaningless.

22) Accessing Saved Error Messages
----------------------------------

For the program to get at information contained in saved error objects,
it needs to know how to get at the items in the ELsaveBuffer list and,
having gotten an ErrorObj, how to get at its individual fields of data.

The ELsaveBuffer list sticks with standard container semantics: to get at
an item, you declare a const_iterator and use it.

    ELsaveBuffer::const_iterator err;
    for ( err = savebuf->begin(); err != savebuf->end(); ++err ) {
      if ( err->severity() == ELabort ) {
        think_about_aborting();
      }
    }

The methods for forming and adding information to an ErrorObj were
already described in "How an instance of ErrorObj may be formed."
ErrorObj also has methods to get the various pieces of information, both
those captured when the object was created and those supplied directly by
a user:

    ELseverityLevel  severity();
    ELstring         id();
    ELstring         text();
    ELstring         process();
    ELstring         module();
    ELstring         subroutine();
    ELstring         context();
    ELstring         verboseContext();
    time_t           timeStamp();
    ELstring         fullMessage();
    int              sequenceNumber();

The last item bears explaining: ErrorObjs sent to and not ignored by
ELsaveBuffer are assigned sequential numbers, so that a non-local entity
can detect any skipped messages.  Only ErrorObjs obtained from the save
buffer have a meaningful sequence number; if a user just constructs an
ErrorObj, it will have sequence number 0.

Notice that only information associated with the error message is
available in this way.  This does not include information describing how
a specific destination (including an ELstatistics destination) would
treat the error.  In particular, there is no way to get the count of how
many times this type of error has occurred directly from an ErrorObj.
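A non-local reader could use sequenceNumber() to detect gaps along the
lines of the hypothetical helper below.  It assumes, beyond what is
stated above, that numbers start at 1 and increase by exactly one per
saved message; it skips sequence number 0 (user-constructed ErrorObjs)
and cannot detect messages missing before the first one it receives.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: given the sequence numbers of received ErrorObjs,
// in arrival order, count how many numbers were skipped between them.
int countSkipped(const std::vector<int>& seqs) {
    int skipped = 0;
    int prev = 0;                      // 0 means "nothing seen yet"
    for (int s : seqs) {
        if (s == 0) continue;          // not from a save buffer: no information
        if (prev != 0 && s > prev + 1) skipped += s - prev - 1;
        prev = s;
    }
    return skipped;
}
```

For instance, receiving 1, 2, 5, 6, 9 reveals four missing messages
(3, 4, 7 and 8).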
23) How ELstatistics information is kept
----------------------------------------

The error statistics kept by the ELstatistics destination are logically a
map.  The combination of 20-byte message id, severity, process, module and
subroutine, which for convenience we combine to form ELextendedID, acts as
the key for this map.  The data is a (zero-able) count and an aggregate
count for each type of message, plus the (brief) contexts (run/event) of
the first two instances and the latest instance of such a message with
non-null contexts.  A last piece of data for each entry is a flag telling
whether any message of this type has been throttled out of all the
destinations because of limits.

In "How the framework sets up for logging," it was mentioned that
ELstatistics should be attached after all the ordinary destinations:

    ELdestControl logcerr  = logger->attach ( ELoutput(cerr) );
    ELdestControl logfile  = logger->attach ( ELoutput("myFileName.log") );
    ELdestControl logstats = logger->attach ( ELstatistics(5000) );
    ELdestControl errbuf   = logger->attach ( ELsaveBuffer(20000) );

This is the recommended order, for the following reason: each destination
indicates whether it has ignored a given message because of a limit or
threshold.  An ELstatistics, when passed an error object, notes whether
any destination has YET actually logged the error message; if not, it
will mark that error type as having an instance that was ignored by every
destination due to limits.  So if you want that asterisk in the
statistics summary to be meaningful, attach ELstatistics after all the
destinations which might output the message.

In the above example, we attached errbuf after ELstatistics because we
want the * to appear in the summary if an error was logged to neither
logfile nor logcerr.  But of course the frameworker has flexibility.
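The per-entry bookkeeping described in this section might look roughly
like the sketch below.  All names here (ExtendedID, StatsEntry,
StatsTable, record) are hypothetical stand-ins for ELextendedID and the
ELstatistics internals, severity is reduced to a plain int, and the
loggedByEarlierDest argument stands for whatever the logger reports about
the destinations attached before the statistics destination.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical stand-in for ELextendedID: the map key.
struct ExtendedID {
    std::string id;              // message id (at most 20 bytes in the design)
    int severity;
    std::string process, module, subroutine;
    bool operator<(const ExtendedID& o) const {
        return std::tie(id, severity, process, module, subroutine) <
               std::tie(o.id, o.severity, o.process, o.module, o.subroutine);
    }
};

struct StatsEntry {
    int count = 0;                   // zero-able count
    int aggregateCount = 0;          // never zeroed
    std::string firstContext[2];     // brief contexts of the first two instances
    std::string lastContext;         // brief context of the latest instance
    bool ignoredEverywhere = false;  // some instance was logged nowhere
};

class StatsTable {
public:
    void record(const ExtendedID& xid, const std::string& context,
                bool loggedByEarlierDest) {
        StatsEntry& e = table_[xid];
        ++e.count;
        ++e.aggregateCount;
        if (!context.empty()) {      // only non-null contexts are remembered
            if (e.firstContext[0].empty()) e.firstContext[0] = context;
            else if (e.firstContext[1].empty()) e.firstContext[1] = context;
            e.lastContext = context;
        }
        if (!loggedByEarlierDest) e.ignoredEverywhere = true;
    }
    void zero() { for (auto& kv : table_) kv.second.count = 0; }
    const StatsEntry& at(const ExtendedID& xid) const { return table_.at(xid); }

private:
    std::map<ExtendedID, StatsEntry> table_;
};
```

Because the flag is derived only from destinations attached earlier, the
attachment order chosen by the frameworker determines what it means.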
For instance, one could imagine attaching a scrolling screen that will get
millions of messages AFTER logstats, reasoning that if an instance was
output only there, we want to note in the summary that it appears in no
destination log.

In addition to the information kept in the individual error table, a
count and an aggregate count of errors by severity level are kept, so
that this information may be supplied in the summary.

You can access the ELstatistics information via the stats.summary()
methods.

24) Formatting control available in ELoutput
--------------------------------------------

The ELoutput destination provided allows the frameworker to control some
aspects of the format output for each error message.  The methods in the
second column of this chart reflect the default behavior.

    dest->suppressTime()         dest->includeTime()
    dest->suppressModule()       dest->includeModule()
    dest->suppressSubroutine()   dest->includeSubroutine()
    dest->suppressText()         dest->includeText()
    dest->suppressContext()      dest->includeContext()
    dest->usefullContext()       dest->useContext()

For further flexibility, a custom ELdestinationI must be created.

*****************************************************************
* Customization Hooks and Issues Relating to Collective Logging *
*****************************************************************

25) What an ELdestinationI Must Do
----------------------------------

In order for a custom ELdestinationI to work properly, given how the
physicists and frameworker can use it and how the logger will interact
with it, the destination class (and its associated ELdestControl) must
support the following methods.

The destination class must have:

    bool log ( ErrorObj )        This is allowed (and encouraged, based on
                                 limits and thresholds) to decide to ignore
                                 or act upon this message.  If the message
                                 is acted upon, log() should return true so
                                 that statistics can know somebody logged it.

    bool msgStart(ErrorObj)      These methods can be null methods.  On the
    bool msgItem(ELstring)       other hand, if a destination chooses to
    bool msgEnd()                output partial information before the
                                 termination of the error message, it can.
                                 The msgStart() method provides an ErrorObj
                                 with an empty item list, so that ELoutput
                                 can get at the message "pre-amble" and
                                 "post-amble."  These return true if the msg
                                 is being logged rather than filtered.

    bool output(ELstring,        Comes from ELcout.  If the severity level
             ELseverityLevel)    is at or above threshold, write this item
                                 out (into the log).

    startSummary(const ELstring & title="");
    summaryLine(const ELstring & line);
    endSummary();
                                 These are present for the purpose of
                                 feeding a formatted summary to a
                                 destination, a line at a time.  A given
                                 destination is free to ignore these.

    ELdestinationI * newCopy();  Pure virtual in ELdestinationI; must be
                                 supplied.  This must new a copy of the
                                 destination object and return its address.
                                 See Note 5, "ELdestControl details," for
                                 the issue of destination ownership.

Also, the following public logicals are false by default; they should be
set appropriately by the derived class constructor:

    bool skipImmediateItems      This destination will wait for the message
                                 to be complete and the full message to be
                                 sent.  Thus there is no need for individual
                                 items to be sent.

    bool skipMessageObject       This destination is acting on individual
                                 items and will not need to be sent the
                                 full ErrorObj when completed.

    At least one of skipMessageObject and skipImmediateItems should be
    false (both could be false).

    bool skipSummary             This destination will not output summary
                                 info.

    bool skipOutputItems         This destination will not react to items
                                 streamed into an ELcout.

-- Customization Hooks --

Even if skipMessageObject is set true, the destination's
operator()(ErrorObj) ought to check the message's serial number and be
prepared to act on it if it does not match that of the message being
output.  The reason is that a physicist can construct a message object and
do errlog(msg).
In that case, skipMessageObject is ignored, the msg is sent to each
destination, and no immediate items are sent.  If operator()(ErrorObj)
were trivial, such messages would never be acted upon by the destination.

The ELdestControl object has, and the destination class must support:

    setThreshold (ELseverityLevel sv)        The destination is free to
    setLimit (ELstring s, int n)             ignore these, as does
    setLimit (ELseverityLevel sv, int n)     ELstatistics, which does not
    setTimespan (ELstring s, int n)          apply limits or timespans.  If
    setTimespan (ELseverityLevel sv, int n)  these are NOT ignored, then the
                                             string "*" must be treated as a
                                             wildcard, and the semantics of
                                             precedence described in this
                                             document must apply.

    setTableLimit (int n)                    Impose a limit on the number of
                                             entries in any given table.  See
                                             Note 3.

    clearSummary();                          Ignore clearSummary except for
    wipe();                                  the statistics destination; see
    zero();                                  the "clearing Statistics and/or
                                             Limits" behavior definitions.

    setPreamble (ELstring preamble)          Establishes a substitute for the
                                             default %ERLOG start of message
                                             output.  Pre-ambles of up to 6
                                             chars will work properly with
                                             the standard formatter.

    setNewline (ELstring nl)                 Establishes a substitute for the
                                             default \n as a newline
                                             character in the log.  Provided
                                             in case NT needs this.

The control object for a destination intended as a statistics gatherer
must also support:

    summary(dest, char* title=0);            These methods should be ignored
    summary(os, char* title=0);              except in the case of the
    summary(s&, char* title=0);              controller of a statistics
                                             destination, which will output
                                             the summary as directed.

The ELdestControl class has two protected pointers,

    ELdestinationI * d;
    ELdestIX       * x;

It is possible to derive from this class, placing pointers to instances
of customized subclasses of ELdestinationI and/or ELdestIX into these
pointers in the constructor of the derived ELdestControl subclass.
This would minimize the amount of new code one needs to write, and may
also, in some cases, eliminate the need to recompile framework
compilation units when the header of the new customized class changes.

26) ELsaveBuffer Semantics and Layout
-------------------------------------

The semantics of the ELsaveBuffer are dominated by the idea that it ought
to be usable in the context of a collective system, in which a single
"boss" may need to extract the saved messages, a chunk at a time.

The buffer is logically a set of three limited heaps; the boss may
designate one heap as being read out and therefore "frozen" to additional
inserts.

The semantics of overflow are that it is possible to lose the last
messages past an overflow condition; if that happens, the last message
before the overflow will also be lost, replaced by a message warning of
skipped messages.

The control variables relevant to this boss communication are as follows:

    FrozenBuffer - boss writable   The buffer unavailable due to being read
                                   out by the boss.  A, B, C or 0.

    NoFlush - boss writable        If this flag is set, then instead of
                                   replacing any message with a skip notice,
                                   BLOCK until the next buffer is unfrozen.
                                   Ensures no message skips, at the cost of
                                   potential local blocking.

    Dump - boss writable           If a non-zero integer, instructs the
                                   system to flush() the buffers and then to
                                   copy that integer into DumpResponse.

    ActiveBuffer                   The buffer being written at this instant
                                   (and thus unavailable to read even though
                                   the boss may have just said it should be
                                   frozen).  If this is A, B or C, it should
                                   quickly return to 0.

    DumpResponse                   Starts at 0.  Assuming the boss sends
                                   sequential integers to Dump, this lets
                                   the boss see whether a given dump has been
                                   done, to avoid accidentally missing or
                                   doing unwanted dumps.
When a buffer has been frozen, and then a new buffer is frozen (or
FrozenBuffer is changed to 0), the ELsaveBuffer, upon noting this (which
it will do whenever any message comes in), will immediately clear the
previously frozen buffer.

So the general procedure for the boss is:

    At leisure, decide to read out some messages.
    Having kept track, you know the first unread messages are in A.
    Write A to FrozenBuffer.
    Read ActiveBuffer and the state block (see above), which are
      contiguous.
    Almost certainly, ActiveBuffer will not indicate A, but if it does,
      re-read until it does not.
    Now read out A, and use the information in the state block to
      understand it.
    Then either write 0 to FrozenBuffer or, if you want to read B
      immediately, write B to FrozenBuffer and repeat the process with
      buffer B.

The state block will look like:

    ActiveBuffer
    DumpResponse
    startOfA
    startOfB
    startOfC
    nA
    nB
    nC
    capacityA
    capacityB
    capacityC
    NoFlush
    FrozenBuffer
    Dump

There will also be a summary buffer.  Again, one will need to define
procedures for the boss to read this out without fearing that it will
change while being read.  We have not yet defined these additions to the
state.

****

To support this sort of (interprocess) communication, we provide a way of
imposing a limit on the size of the ELsaveBuffer as well as a way to force
the buffer to reside at a given address:

    ELsaveBuffer ()                    // Usual default constructor, presenting
                                       // unlimited room using std::list.
    ELsaveBuffer (int)                 // Limit to that many bytes.  Divide into
                                       // buffers A, B and C with 1/3 to each.
    ELsaveBuffer (ELsaveBuffer*, int)  // Place at address, and limit to that
                                       // many bytes.
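The freeze, clear and dump interplay can be modelled in a single process
as follows.  This is a hypothetical sketch, not the ELsaveBuffer layout:
buffers are indexed 1 to 3 for A to C, capacities are counted in messages
rather than bytes, only the state-block fields needed for the protocol
are included, and the shared-memory placement and memory-ordering issues
of a real boss process are not modelled.  When all three heaps are
unavailable, add() simply reports failure where a real buffer would write
a skip notice or, with NoFlush set, block.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical single-process model of the A/B/C protocol.
enum Buf { NONE = 0, A = 1, B = 2, C = 3 };

struct StateBlock {
    int activeBuffer = NONE;   // written by the logger
    int dumpResponse = 0;      // written by the logger
    int noFlush = 0;           // boss writable (not acted on in this sketch)
    int frozenBuffer = NONE;   // boss writable
    int dump = 0;              // boss writable
};

class ThreeHeapBuffer {
public:
    explicit ThreeHeapBuffer(std::size_t capacityEach) : cap_(capacityEach) {}
    StateBlock state;

    // Logger side: called for every incoming message.
    bool add(const std::string& msg) {
        checkBossRequests();
        for (int tries = 0; tries < 3; ++tries) {
            if (fill_ != state.frozenBuffer && heap(fill_).size() < cap_) {
                state.activeBuffer = fill_;      // mark the write in progress
                heap(fill_).push_back(msg);
                state.activeBuffer = NONE;
                return true;
            }
            fill_ = fill_ % 3 + 1;               // A -> B -> C -> A
        }
        return false;   // real version: skip notice, or block if noFlush
    }

    // Boss side: freeze a buffer, then copy it out.
    std::vector<std::string> bossRead(int which) {
        state.frozenBuffer = which;
        // A real boss would re-read until activeBuffer != which.
        assert(state.activeBuffer != which);
        return heap(which);
    }

    std::vector<std::string>& heap(int which) { return heaps_[which - 1]; }

private:
    void checkBossRequests() {
        // A buffer that was frozen and no longer is has been read: clear it.
        if (lastFrozen_ != NONE && lastFrozen_ != state.frozenBuffer)
            heap(lastFrozen_).clear();
        lastFrozen_ = state.frozenBuffer;
        // A new dump request flushes everything and is acknowledged.
        if (state.dump != 0 && state.dump != state.dumpResponse) {
            for (auto& h : heaps_) h.clear();
            state.dumpResponse = state.dump;
        }
    }

    std::size_t cap_;
    std::array<std::vector<std::string>, 3> heaps_;
    int fill_ = A;
    int lastFrozen_ = NONE;
};
```

As in the prose procedure, clearing a buffer the boss has finished with
happens lazily, on the next message to arrive, so the boss never has to
write into the message area itself.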
27) Avoiding lists, maps, strings, streams, and templates
----------------------------------------------------------

When we first designed this with D0 L2 in mind, the following concerns
came up.  Due to the lack of post-startup memory allocation, and the
desired avoidance of C++ library classes, there are several types of
worries:

    A   Structures that may in principle grow indefinitely.
    B   Use of C++ strings.
    C   Use of stringstreams to build strings.
    D   Use of streams other than stringstreams.
    E   Use of maps.
    F   Use of templates penetrating to the user interface.

Ideally, we would like to feel free to use all of these in a basic
implementation, yet have clean ways to avoid using any of them if
necessary.  Here are the places where there can be concern:

    A1 - The errorStats table could have unlimited entries
    A2 - The errorLimits table could have unlimited entries
    A3 - The save buffer ELbuffer might grow arbitrarily
    A4 - The control structure for ELbuffer might grow arbitrarily
    A5 - Capture buffers used to flush the ELbuffer need to be limited
    A6 - The list of destinations could grow indefinitely
    B1 - An item sent to log by << can be arbitrarily long
    B2 - The formatted message string can be arbitrarily long
    B3 - The date/time might be a C++ string
    B4 - Part of the Context object might be a string
    B5 - Objects contained in a class (e.g. ELerrorStats) might be strings
    C1 - Use of stringstream to implement operator<<
    D1 - An ELdestination might inherently be associated with a stream
    E1 - Use of maps in the limitsTable
    E2 - Use of maps in ELstatistics
    F1 - The ErrorLog and ErrorObj classes use templates for operator<<.

We have followed several design principles to allow a customizer to
resolve any or all of these concerns.  These include keeping the
functionality that we do insist on implementing via full C++ in
subclasses below interface definitions; the customizer can replace any of
those subclasses which violate the conditions of the limited environment.
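As an illustration of this principle, the hypothetical sketch below puts
a counting table behind an abstract interface, with one implementation
using std::map and std::string and another using fixed arrays and C
strings that never allocate after construction.  The names
(CountTableI, MapCountTable, FixedCountTable) and the sizes are
assumptions; keys are truncated to 20 bytes to mirror the 20-byte message
id.  The fixed version returns false when full, leaving the caller free
to apply an overflow policy such as those suggested in "Table
Limitations."

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <string>

// Hypothetical interface: the logger would see only this.
class CountTableI {
public:
    virtual ~CountTableI() = default;
    virtual bool increment(const char* key) = 0;  // false if key can't be added
    virtual int count(const char* key) const = 0;
};

// Full C++ version: may grow indefinitely and allocates on insert.
class MapCountTable : public CountTableI {
public:
    bool increment(const char* key) override { ++m_[key]; return true; }
    int count(const char* key) const override {
        auto it = m_.find(key);
        return it == m_.end() ? 0 : it->second;
    }
private:
    std::map<std::string, int> m_;
};

// Restricted version: fixed storage, no strings, no allocation.
class FixedCountTable : public CountTableI {
public:
    bool increment(const char* key) override {
        int i = find(key);
        if (i >= 0) { ++counts_[i]; return true; }
        if (used_ == kMaxEntries) return false;   // table full
        std::strncpy(keys_[used_], key, kKeyLen);
        keys_[used_][kKeyLen] = '\0';             // always terminated
        counts_[used_++] = 1;
        return true;
    }
    int count(const char* key) const override {
        int i = find(key);
        return i >= 0 ? counts_[i] : 0;
    }
private:
    static constexpr int kMaxEntries = 4;
    static constexpr int kKeyLen = 20;
    int find(const char* key) const {             // compares at most kKeyLen chars
        for (int i = 0; i < used_; ++i)
            if (std::strncmp(keys_[i], key, kKeyLen) == 0) return i;
        return -1;
    }
    char keys_[kMaxEntries][kKeyLen + 1] = {};
    int counts_[kMaxEntries] = {};
    int used_ = 0;
};
```

Code written against CountTableI works unchanged with either version,
which is the property the customizer relies on.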
The following techniques can be followed to avoid (or avoid using in a
post-init dynamic memory manner) the std:: concepts of list, string, and
stream:

-- To avoid using lists and maps:

We use various ELlist_XXX types typedef-ed to std::list, for instance,

    typedef std::list ELlist_destI

so one option is to replace those with your own container classes, but we
do assume list semantics.  Similarly, we use various sorts of ELmap_XXX
typedef-ed to std::map, and there we assume map semantics.

To try to avoid or put limits on those semantics:

    ELadministrator   To avoid lists altogether you could supply a custom
                      ELadministrator.  But if you can tolerate lists as
                      long as they do not grow after init time, using the
                      provided one should be fine.

    ELstatsTable      The errorLogger will have an additional set of
                      constructors taking an instance of ELstatsTableI &,
                      which will allow substituting a derived class that
                      does not do any undesired operations.

    ELlimitsTable     The ELdestinationI derived classes can utilize any
                      limits table or mechanism desired.

    ELsavebuffer      A custom ELdestination may easily be put in place of
                      ELsaveBuffer on the list of sinks (or however the
                      ErrorLoggerI accepts sinks).

-- To avoid using strings:

The ErrorLogger package uses ELstring, which is typedef-ed to
std::string, so one can easily substitute any class with similar
semantics.  In particular, a string class with some hacked allocator to
avoid dynamic memory allocation after some startup epoch would be a
plausible substitute.

The following classes would be places to check for whether your string
semantics are adequate:

    ErrorLog          This one is easy: all string-like arguments are
                      passed as C-strings via char*, and all internal
                      strings, with the exception of the error summary,
                      are stored as ELstrings with specified maximum
                      lengths.

    ErrorSummary      When a summary is sent to a destination, it is sent
                      as one fixed-length char* representing a single line
                      at a time.
                      So unless the framework explicitly does
                      log.summary(string&), no strings or arbitrarily long
                      char arrays which might have to be malloc-ed are used.

    ELerrorObj        Our provided class does use a string to hold the
                      formatted error message string; however, the
                      customizer can substitute a different class or use a
                      different form of ELstring.

    ELlimitsTable
    ELstatsTable      Our provided classes for these do not use strings.

    ELdestination     Our provided class for ELdestination uses strings in
                      an essential way.  One could derive from
                      ELdestinationI to form sinks that do not use strings,
                      or rely on a different form of ELstring.

    ELsavebuffer      Our ELsaveBuffer does not use strings, since it
                      strives for a clean and remote-accessible fixed
                      format.

-- To avoid using streams:

    ErrorLogger       ErrorLogger does not actually use streams, though the
                      syntax is patterned to some extent after the stream
                      << syntax.

    ELdestination     Our provided ELdestination uses streams extensively.
                      A customizer would have to create a custom derived
                      class to avoid this.

-- To avoid using templates:

Templates are a trickier matter.  We have avoided using templates in all
but two essential circumstances:

    a) std::list and std::map make use of templates, so to avoid templates
       altogether one would have to derive custom ELlist_dest and similar
       classes defined in ELlist.h and ELmap.h.

    b) ErrorLog accepts items of information supplied to it by operator<<.
       It formats these into ELstrings, using an ostringstream to format
       the message, and sends them to the list of ELdestinationI and to the
       error message object.  Both the accepting and the formatting via
       ostringstream use templated operator<<, and this HAS TO BE, because
       otherwise the user would not be free to << any type of string,
       number or object to the logger!

       To avoid the use of templates, one would have to substitute a custom
       file for ErrorLog.icc, supporting the creation of ELstrings from some
       restricted set of object types.
       By our rules, an ErrorLog must have << from at least char*, so at
       least that trivial one must be present.  Similarly, one would have
       to substitute a custom file for ErrorObj.icc, supporting operator<<
       from presumably the same restricted set of types.

28) Table Limitations
---------------------

In implementations that allocate a fixed amount of space for tables, such
as a fixed number of ids in limit tables, the customizer needs to define
the behavior when these limits are exceeded.  We suggest the following:

    - When a statistics table (which is a pure map) cannot add another
      key of error, log that fact (the first time) as an ELwarning2 and
      ignore the overflow.

    - When the id-only part of a limits table gets full, additional
      setLimit commands log an ELwarning2 and become no-ops.

    - When the count part of a limits table, which is keyed off the
      extended id, becomes full, log the first time as an ELerror2.
      Additional types of errors are simply not limited.

29) Direct logging to one destination
--------------------------------------

Although in general all logging is done through ErrorLog objects, which
work through the ELadministrator, the frameworker may log a formed
ErrorObj directly to an ELdestControl:

    bool reacted = logfile.log ( myMsg );    // myMsg is an ErrorObj

As implied, the destination's thresholds and limits still apply; the
log() method returns false if the destination did not react to this
ErrorObj.

If this backdoor mechanism is used, it bypasses the ErrorLog functions.
In particular, ErrorLog normally provides module and subroutine strings.
To restore this ability, the ErrorObj class has two further methods:

    myMsg.setModule ( "whatever" );
    myMsg.setSubroutine ( "whatever" );

Note that (unless the destination happens to be ELstatistics) error
messages logged in this manner will not be reflected in the statistics.
Generally, it is a better idea to send things to all destinations using

    errorlog (myMsg);

and let each destination filter out what it does not want.

30) Custom Severity Levels
--------------------------

The intent is that all assigned severity levels fall into one of the 14
levels provided; in fact, such fine granularity is normally met with
derision.  We discourage but do not preclude inventing one's own levels.

To allow for this, we establish the level_ variable for the provided
ELseverityLevels at intervals of 256.  Thus one could do things like

    const ELseverityLevel ELspecialWarning (
        ELseverityLevel::ELsev_Spacing * ELseverityLevel::ELsev_warning2 + 128,
        "-Q", "Special WARNING!" );

at global level, and use this as you would any of the provided
severities.

Be warned, however, that to avoid some nasty space or time overheads and
excessive use of the map container, ELadministrator, ELstatistics, and
ELlimitsTable will all treat level_ as chopped to the nearest lower
multiple of 256.  Thus severityCount and the statistics for counting
severities would lump this new ELspecialWarning in with ELwarning2.
Similarly, limits set by severity level would lump these together, but
thresholds would distinguish between them.

You must not create levels lower than ELsev_zeroSeverity or higher than
ELsev_highestSeverity; this may cause problems and would not be caught at
compilation time.

********************
* Technical Design *
********************

Design Patterns and C++ Techniques Used
---------------------------------------

ELadministrator "Destructable Singleton"
----------------------------------------

Blind Adapters at the User and Framework Interfaces
----------------------------------------------------

One of the goals of the earliest implementation was that the headers seen
by both the physicist and the frameworker should not change after the
initial limited prototype release.
The concern is that recompiling all the framework units is a big deal; so
even if the interface remains stable, changing anything in the headers is
bad.  (Re-linking as more functions are enabled, on the other hand, is
OK.)  Since we had (by consensus or by fiat) firm definitions of the
functionality, this ought to be possible by defining interface headers in
the right way.

The problem that arose with the naive approach was as follows: the header
file for the class contains public methods, and protected data and
perhaps supporting methods.  While the methods available to the user can
be considered absolutely stable, it would be foolish to suppose that,
before implementing 90% of the features, we can anticipate every piece of
protected data required.

The first reaction to this is to say that the actual class is derived
from the pure interface class; the user includes just the interface
header.  The drawback to this is that now the constructor is in a header
that the user is not supposed to be concerned with or even include!  For
instance, if ErrorLog is the interface and the user has to declare an
ELErrorLog, that is bad.

The classic solution is to use the C++ implementation of the Adapter
pattern suggested in Gamma+3.  There ErrorLog would inherit publicly from
ErrorLogI and privately from ErrorLogX.  Aside from solvable issues of
multiple inheritance, this is a no-go for us: when ErrorLogX changes, the
user compilation units have to be recompiled.

The solution chosen is what I call a "Blind Adapter" pattern.
The header seen by the user provides an interface class (ErrorLog in our
example) with ONLY:

    * The agreed public methods
    * A protected pointer to a struct of some type the user should not be
      concerned with (ELerrorLog)

Note that this pointer is declared via a forward declaration:

    class ErrorLog {
    public:
      methods();
    protected:
      struct ELerrorLog;
      ELerrorLog * X;
    };

Now ErrorLog does not need to include ELerrorLog.h, so when something
changes in the data structure needed, user and framework code is
COMPLETELY ISOLATED from those changes in terms of recompilation.  The
cost is an extra level of indirection in some dispatched methods.  This
cost is quite small, comparable to the cost of virtual methods, though
the compiler is more likely to find optimizations in the latter case.

I call this pattern a blind adapter, but it is also the "Cheshire Cat"
special case of the "Bridge" pattern in Gamma+3.  As such, it has further
possibilities; for example, run-time choice of implementation is in
principle possible.

This pattern is used for all the classes physicist or framework code
ought to talk to directly:

    ELadministrator     ELadminX
    ErrorLog            ELerrorLog, ELadministrator, ELadminX {1}

    {1} The ErrorLog is itself a decorated proxy for ELadministrator.
        Rather than go through yet another level of indirection, we put the
        ELadminX pointer into the ErrorLog protected area as well.

There is one shortfall of this pattern, where implementation actually must
be in ErrorLog.h: the ability to accept << from any type implies a
template mechanism, and this has to be defined in a file that the user
program including ErrorLog.h will depend upon.  What that means is that
the template has to be implemented very carefully, to get it right the
first time.  Fortunately, it is quite a simple template.
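A minimal sketch of the blind adapter, with the header and implementation
parts shown in one block.  The class and member names (Logger, Impl,
addItem, message) are hypothetical, not the ErrorLog interface.  It shows
the two points made above: the data layout lives only in the
implementation part, behind a forward-declared struct, and the one
template that must stay in the header is small, formatting any streamable
item through an ostringstream and handing the resulting string to a
non-template member.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// ---- What the user-visible header would contain (hypothetical) ----
class Logger {
public:
    Logger();
    ~Logger();
    Logger(const Logger&) = delete;             // owns Impl via a raw pointer
    Logger& operator=(const Logger&) = delete;

    // The only implementation that must live in the header: format any
    // streamable item and pass the text to a non-template member.
    template <class T>
    Logger& operator<<(const T& item) {
        std::ostringstream os;
        os << item;
        addItem(os.str());
        return *this;
    }
    std::string message() const;

protected:
    struct Impl;      // forward declaration only: layout hidden from users
    Impl* x_;

private:
    void addItem(const std::string& s);
};

// ---- What the implementation file would contain ----
struct Logger::Impl {
    std::vector<std::string> items;   // may change without recompiling users
};

Logger::Logger() : x_(new Impl) {}
Logger::~Logger() { delete x_; }
void Logger::addItem(const std::string& s) { x_->items.push_back(s); }
std::string Logger::message() const {
    std::string out;
    for (const auto& s : x_->items) out += s;
    return out;
}
```

Because Logger owns its Impl through a raw pointer, copying is disabled
here; a real class would have to decide its copy semantics explicitly.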
Early Implementation
--------------------

The minimal set of code allowing the physicist to use the ErrorLog
syntax, and the frameworker to use the actual ultimate syntax in at least
the Module class and preferably throughout, is:

    1) Complete headers for the following classes:

    2) Implementation of the constructor and any applicable
       singleton/facade patterns for:

    3) Implementation of the attach method for the ELadministrator, and
       the associated ELdestControl skeleton.

    4) Implementation of log(ErrorObj) in the ErrorLog class.

    5) Implementation of operator<< to the ErrorLog.

    6) Basic implementations and full interfaces of the following
       destinations: ELoutput, ELstatistics, ELsaveBuffer.

    7) Stubs for remaining methods in all headers.

Rejected Alternatives
---------------------

The particular syntaxes and features selected were picked in concert with
the CDF and D0 people, and represent a consensus.  We may have omitted
things that are worth doing.  However, we do not wish to retrace ground
covered by decisions already made once.

Here, I list some of the proposed syntaxes, so that when somebody says
"why not change it to do this," we can say we considered that and
rejected it.

*** How a physicist issues a log message

We considered

    log << ELerror << "id" << ...

This was narrowly rejected because the accepted form makes it tough to
accidentally use a data-dependent string for the id, e.g.

    log << ELerror << energy << " is too much energy" <setContext(co);
    ELdestControl logfile = logger->attach(logfileD);

Note 5: ELdestControl details
-----------------------------

To avoid any possibility of a frameworker attaching a destination object
to the logger, then deleting the destination object and trapping the
logger into faulting when the next message tries to invoke methods of the
deleted object, the logger has to make a copy of the destination object.
But then, the framework can't directly talk to the instance of the object
in use by the logger!
So logger->attach() returns an ELdestControl object, a hook to control
the destination.  The destination instantiated by the user can be deleted
or allowed to go out of scope.  The control object ought to stick around,
though the only consequence of letting it go out of scope would be that
the framework no longer has a way to affect the behavior of the
destination.

We ensure that the control object faithfully accepts all methods
applicable to the ELdestinationI.  This includes setting limits and
directly sending a log message.

If a custom destination contains methods outside the normal set defined
by ELdestinationI, then the customizer will also need to provide a custom
control object for that type of destination, inheriting from
ELdestControl but adding dispatchers for the new custom methods.  Then
when attach returns the ELdestControl, the framework can use that to
construct an instance of the derived control class.

The ELdestControl contains, of course, an ELdestinationI* pointing to the
particular destination object that was created by the newCopy method when
logger->attach was called.  All of its methods dispatch through this
pointer.  Since they are all virtual methods in class ELdestinationI,
they will go through the derived class's virtual table, hence getting the
correct versions.

Note 6: Limit Semantics
-----------------------

If an instance of a message with a particular ID is sent to a destination
before setLimit(ID) has been invoked for that ID, but while there is an
applicable wildcard limit (for either "*" or the message's severity
level), then that ID is assigned an entry in the limits table, and the
limit used IS THE APPLICABLE WILDCARD LIMIT.  Changing a wildcard limit
will not affect this entry once it is in the table.
For example, dest->setLimit ("*", 8); log ( ELerror, "Bank Limit"); // will use 8 as limit for Bank Error dest->setLimit ("*", 2); log ( ELerror, "Bank Limit"); // eight instances of Bank Error will log ( ELerror, "Bank Limit"); // be output -- not just 2. log ( ELerror, "Bank Limit"); ... log ( ELerror, "Bank Limit"); A consequence of this is that when dest->zero() is called, the entry for Bank Error would remain in the limits table (still with a value of 8) as if dest->setLimit("Bank Error", 8) had been called. Note 7: Module Name -------------------- In the example in "How the Module Sets Up errlog" we assumed that the Module class has a constructor taking as an argument some information that would allow it to know the name by which this (derived) instance is to be known. We called this information module_selection; it could be as simple as a string, or involve other information that the error logger would not care about. At any rate, the info would be saved and the Module::getModuleName() method would use this info to produce the name. Thus we are assuming that each particular class is providing that information in the initializer list of its constructor, as in: class CentralTracker : public Module { public: CentralTracker() : Module("CENTRAL TRACKER") { // any other construction needed } virtual doTheWork(Event& e){ ... } // whatever other methods and data CentralTracker needs }; class Module { public: Module (string name) { errlog.setModule(name); // This line was added } virtual doTheWork(Event& e) = 0; protected: Errorlog errlog; // This line was added }; Note that very few lines were added to make this Module class support the ErrorLogger mechanism. Initially, I thought a bit more complexity was needed. Here is text from an earlier version of this note: class CentralTracker : public Module { public: CentralTracker() : Module("CENTRAL TRACKER") { // any other construction needed } virtual doTheWork(Event& e){ ... 
      }
      // whatever other methods and data CentralTracker needs
    };

    class Module {
    public:
      Module (const char* name) : name_(name) {
        errlog.setModule(getModuleName());
      }
      ELstring getModuleName() { return name_; }
      virtual void doTheWork(Event& e) = 0;
    protected:
      ELstring name_;
      ErrorLog errlog;
    };

There are more clever-looking ways to accomplish this, but I have found none that actually work. For example, you can't directly supply the module name to the instance of ErrorLog instantiated as a member variable in the base class Module -- it has no way of knowing that name at that point! Nor could you make getModuleName() a pure virtual in Module, relying on each derived module class to supply its own version that simply returns the name you want: at the point this function is called, you are still constructing the Module base-class part of the derived object, so the virtual mechanism will not dispatch to the derived class's version.

Note 8: Includes
-----------------

People using the ErrorLogger classes should include the header for each class they use directly. Thus the physicist and the Module base class need only

    #include "ErrorLogger/ErrorLog.h"

while the framework probably needs a collection like

    #include "ErrorLogger/ELadministrator.h"
    #include "ErrorLogger/ELcontextSupplier.h"
    #include "ErrorLogger/ELdestControl.h"
    #include "ErrorLogger/ELoutput.h"
    #include "ErrorLogger/ELstatistics.h"
    #include "ErrorLogger/ELsaveBuffer.h"
    #include "ErrorLogger/ErrorLog.h"

And everyone has to say

    using namespace zmel;

or

    ZM_USING_NAMESPACE( zmel )  /* using namespace zmel; */

Note 9: ELoutput (fileName)
----------------------------

Constructing ELoutput from a string holding the file name is a convenience, but it presents several subtle issues:

  - What mode should the file be opened in (trunc to overwrite any existing file, app to append to the end if a file exists, or fail if the file exists)?
  - If the file is opened for append, how do we keep logs from two jobs from looking like one longer error log?
  - What do we do if the attempt to open the file fails?
  - Who owns (and deletes when appropriate) the ofstream created to represent this file?

We take the following positions:

  + The file is opened for append. (If the file does not exist, it is effectively opened for simple writing.)
  + To separate two logs, we always place lines of ====== as divider strings before a log starts (even if it is the first thing in the file).
  + If the attempt to open the file fails, this destination instead logs to cerr; we output a "meta error message" to indicate this switch.
  + If the ELoutput is constructed from a file name, it makes a new ofstream to represent that file. It owns that ofstream and is responsible for deleting it at the end.
  + When a destination is attached to the ELadministrator, a new copy is made of the ELoutput object (so that no user can cause a segfault in another user's error message by wiping out the destination object). The newCopy() method ***also transfers ownership of the ofstream*** if the ELoutput was constructed from a file name. So the copy owned by the ELadministrator is the one that will delete the ofstream when it goes away.

When ELoutput is constructed from an ostream (which can of course be an ofstream), it is assumed that the framework has opened that stream with the desired mode (app or trunc) and, if appending, has sent any desired divider strings. It is also assumed that any deleting or destructing of the stream is handled by the frameworker or the system.

Note 10: ELcout: Replacing cout and/or cerr
-------------------------------------------

Although the ErrorLogger package is meant to augment rather than replace cout and cerr, the package supports hooking cout and/or cerr to the logger instead. In section (11) we outlined how to do this by declaring cout and/or cerr to be variables of type ELcout.

Obviously, ELcout has to react appropriately to the << syntax, routing the string it gets to ELdestinations. That sweeps a lot of questions under the rug.
For example, there is no natural analogue, in the context of ostream semantics, of supplying the severity level and id. Here we answer those questions:

a) What are the constructors for ELcout?

   The constructor accepts one argument: an ELseverityLevel. The default constructor uses a level of ELunspecified. This severity level is used (at least by the standard logging destinations) only for comparison against each destination's threshold, to decide whether to react to or ignore an output item sent to it.

b) How does the code use this mechanism (after constructing the ELcout)?

   This is dictated by the very purpose of the mechanism: you leave all your code that sends things to cout (or cerr) unchanged. Thus the ELcout type accepts everything that can be streamed to an ostream.

c) What about special operations on the cout stream?

   The intent is to help the user whose existing code does a lot of ordinary cout output. Manipulators, which have meaning when sent via << to ostringstreams, will work. Direct methods on cout (and in fact all operations other than <<) are NOT SUPPORTED for ELcout. For example, if cout is an ELcout, then cout.flush(), cout.good(), and cout.setf(ios_base::showbase) will not compile. Almost nobody uses these routinely.

d) How do we get the right cout in scope?

   If a compilation unit contains, at global scope (in the source file itself or in any file it includes), the declaration

       ELcout cout (level);

   instead of

       using std::cout;

   then the ELcout form is available to all code in that compilation unit.

   Some compilers and/or headers may violate the standard and either place cout in the global namespace, or put using std::cout; into a header where it should not be. Either of these would clash with the ELcout cout declaration. One possible fix is to put that declaration into a namespace (say "mine") and explicitly say using mine::cout;. A last-ditch, brute-force approach if a name clash becomes bothersome is to declare

       ELcout myCout (level);

   and globally substitute myCout for cout.
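Items a) through d) describe a type that accepts anything streamable via << and supports nothing else. Below is a minimal C++ sketch of that shape, not the actual ELcout: SketchCout, Level, Sink and coutDemo() are hypothetical names, and unlike the real one-argument constructor, the sketch also takes a sink callback that stands in for the route to the attached ELdestinations, so that it is self-contained.

```cpp
#include <cassert>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real package uses ELseverityLevel and
// forwards items to the attached ELdestinationI objects.
enum class Level { Unspecified, Info, Warning, Error };
using Sink = std::function<void(Level, const std::string&)>;

// ELcout-like proxy: the only supported operation is <<.
class SketchCout {
public:
  explicit SketchCout(Sink sink, Level lev = Level::Unspecified)
      : sink_(std::move(sink)), level_(lev) {}

  // Anything streamable to an ostream, including ios_base manipulators
  // such as std::hex, whose format flags persist in buf_.
  template <class T>
  SketchCout& operator<<(const T& item) {
    buf_ << item;
    forward();
    return *this;
  }

  // std::endl and std::flush are function templates, so they need an
  // explicit overload; endl leaves "\n" in buf_, which is forwarded.
  SketchCout& operator<<(std::ostream& (*manip)(std::ostream&)) {
    manip(buf_);
    forward();
    return *this;
  }

private:
  // Sends whatever the last << produced as one output item.
  void forward() {
    std::string text = buf_.str();
    buf_.str("");
    if (!text.empty()) sink_(level_, text);
  }

  Sink sink_;
  Level level_;
  std::ostringstream buf_;
};

// Collects the items a destination's output() method would receive.
std::vector<std::string> coutDemo() {
  std::vector<std::string> seen;
  SketchCout cout([&seen](Level, const std::string& s) { seen.push_back(s); },
                  Level::Warning);
  cout << "x = " << std::hex << 255 << std::endl;
  return seen;
}
```

coutDemo() yields three items: "x = ", "ff" and "\n". Because SketchCout exposes no member functions other than operator<<, a call such as cout.flush() fails to compile, which is the restriction item c) describes.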
e) How do we route an ELoutput destination to the REAL cout?

   In the compilation unit that includes the main framework, the frameworker will often be instantiating an ELoutput destination with the ostream cout in its constructor. If cout is declared as an ELcout, there is the potential for a terrible mess: the logger sends a message to ELoutput, which outputs text to cout, which turns into output sent to ELoutput, and so forth.

   The solution is to say std::cout explicitly in the constructor of ELoutput:

       ELoutput outputD ( std::cout );

   To avoid utter confusion in case of a mistake, ELoutput is given a constructor taking an ELcout -- and that constructor will send an explanation to std::cerr before exiting the job.

f) How do the standard destinations react to these?

   Of the standard destinations, ELstatistics and ELsaveBuffer ignore items sent via the output() method. Only ELoutput reacts to them: it sends the unvarnished text of every item it gets in output() to its associated ostream.

g) Will things sent to ELcout appear in multiple places?

   If multiple destinations are attached to the log, a single cout<< can appear in each one. However, two things may prevent a given output item from appearing at a given destination:

     - If the severity level given in the constructor of the ELcout is lower than the threshold for that destination, the item will not appear.
     - The base class ELdestinationI has a public bool data member skipOutputItems. If this is set true, no output items will appear at that destination.

h) What about endl?

   Each item sent to an ELcout is put into an ostringstream via <<. The endl behaves in a sensible way: it inserts \n or whatever the appropriate end-of-line sequence is for the particular machine. Since ELoutput merely sends on output() items as it gets them, the cout behavior is "what you do is what you get". That is, if many outputs are done without any endl's or \n's, they will be strung together into one long line.
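Items f) and g), together with the no-op output() that item j) asks of the base class, describe how one output item is filtered per destination. The following C++ sketch models that logic under stated assumptions: SketchDestination, SketchOutput, routeOutputItem() and filterDemo() are hypothetical names, and ordering severities with an enum is an illustrative simplification of ELseverityLevel comparisons.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch of items f), g) and j); names are not the real API.
enum class Level { Unspecified, Info, Warning, Error };

struct SketchDestination {
  Level threshold = Level::Unspecified;
  bool skipOutputItems = false;  // true => ignore all output items
  virtual ~SketchDestination() = default;
  // No-op in the base class, as ELstatistics and ELsaveBuffer behave.
  virtual void output(const std::string&) {}
};

// ELoutput-like: sends the unvarnished text on to its ostream.
struct SketchOutput : SketchDestination {
  explicit SketchOutput(std::ostream& os) : os_(os) {}
  void output(const std::string& text) override { os_ << text; }

private:
  std::ostream& os_;
};

// Delivers one output item to every attached destination that accepts it.
void routeOutputItem(Level lev, const std::string& text,
                     const std::vector<SketchDestination*>& dests) {
  for (SketchDestination* d : dests) {
    if (d->skipOutputItems) continue;  // destination opted out
    if (lev < d->threshold) continue;  // below this destination's threshold
    d->output(text);
  }
}

std::string filterDemo() {
  std::ostringstream sink;
  SketchOutput out(sink);
  out.threshold = Level::Warning;
  SketchDestination stats;  // stands in for ELstatistics
  std::vector<SketchDestination*> dests{&out, &stats};
  routeOutputItem(Level::Info, "dropped\n", dests);   // below threshold
  routeOutputItem(Level::Error, "kept\n", dests);     // reaches the ostream
  out.skipOutputItems = true;
  routeOutputItem(Level::Error, "skipped\n", dests);  // destination opted out
  return sink.str();
}
```

Only "kept\n" reaches the stream: the Info item falls below the Warning threshold, the statistics stand-in silently ignores everything, and the last item is blocked by skipOutputItems. Since items are written as they arrive, with no added line breaks, this also reflects the "what you do is what you get" behavior of item h).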
i) What about building messages in multiple steps?

   Again, ELoutput vomits one output() item at a time. It makes no difference whether these are chained on one line or scattered all over the code.

   However, it is not a good idea to mix cout into the multi-step formation of a true message to errlog. For instance:

       ELcout cout;
       errlog ( ELwarning, "track length" ) << "too long: ";
       cout << track.length() << "explanation";
         // better would be:  errlog << track.length() << "explanation";
       errlog << endmsg;

   would embed the cout stuff into the error message, but might screw up the decisions about formatting, line breaks, and so forth.

j) What does this imply about the nature of a generic ELdestinationI?

   The burden on ELdestinationI is that it now has a bool skipOutputItems, and must have an output(const ELstring &) method -- which will be a no-op in the base class and can be overridden in ELoutput.

k) Does this type of output produce real ErrorObjs?

   NO. Even though a severity is implied, it serves only for ELoutput to decide whether to suppress the items because of its threshold. We thought about creating a true message if the first word sent ends in a colon (and treating that word as the id); this opens up too large a can of worms.

l) How does the initialization of the ELcout instance work?

   Since each severity level is an instance of ELseverityLevel, and since ELcout cout (ELerror) would be done at global scope in most cases, one has to worry that the severity object (ELerror in this case) may not have been constructed before the ELcout is instantiated.

   To take care of this worry, the constructor actually takes and stores a reference rather than an object: ELcout (const ELseverityLevel &). The various levels are declared as externs in ELseverityLevel.h. So ELcout cout (ELerror) produces a reference to what at compilation time is an unresolved external; this is resolved at link time.
   At the instant cout is constructed, this reference points to the memory set aside for ELerror, which may not yet hold meaningful information. But by the time main() is entered (and hence by the time ordinary logging code runs), ELerror will have been constructed, and thus the reference will be valid.

m) What is going to happen before ELcout is fully implemented?

   In order to avoid forcing user code that does ELcout cout (level) to recompile as ELcout evolves, we will keep ELcout as a proxy, with ELcoutX dealing with the implementation needs. However, even at the start this proxy had better send its stuff to ELoutput (and ELoutput had better output it); otherwise the stub behavior would be to "swallow up" everything streamed to an ELcout, which would be unacceptable.
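Items l) and m) combine two techniques: storing the severity by const reference to an extern object, and hiding all other state behind a pointer to an implementation class so that the user-visible header need not change. The sketch below shows both in one file, with comments marking what would be header and what would be library source. ProxyCout, ProxyCoutX, Severity, sketchError and proxyDemo() are hypothetical names standing in for ELcout, ELcoutX, ELseverityLevel and ELerror; this is an illustration of the pattern, not the package's code.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>

// ---- Would be in the user-visible header (hypothetical names) ----
struct Severity {
  std::string name;
  int level;
};
extern const Severity sketchError;  // like the externs in ELseverityLevel.h

class ProxyCoutX;  // implementation class; its layout stays hidden

class ProxyCout {
public:
  explicit ProxyCout(const Severity& sev);  // keeps a reference (item l)
  ~ProxyCout();
  ProxyCout& operator<<(const std::string& text);
  std::string collected() const;  // demo-only accessor

private:
  const Severity& sev_;               // read at output time, not copied now
  std::unique_ptr<ProxyCoutX> impl_;  // all other state lives here (item m)
};

// ---- Would be in the library's source file ----
const Severity sketchError{"ERROR", 3};

class ProxyCoutX {
public:
  std::ostringstream sent;  // stands in for forwarding to ELoutput
};

ProxyCout::ProxyCout(const Severity& sev)
    : sev_(sev), impl_(new ProxyCoutX) {}

ProxyCout::~ProxyCout() = default;  // defined where ProxyCoutX is complete

ProxyCout& ProxyCout::operator<<(const std::string& text) {
  // The severity is consulted only now, long after construction.
  impl_->sent << '[' << sev_.name << "] " << text;
  return *this;
}

std::string ProxyCout::collected() const { return impl_->sent.str(); }

// Usage: the caller relies only on the header part above.
std::string proxyDemo() {
  ProxyCout cout(sketchError);
  cout << "bank missing\n";
  return cout.collected();
}
```

Because ProxyCout's only data members are a reference and a pointer, ProxyCoutX can gain members without changing ProxyCout's layout, which is what lets code that declares such an object avoid recompiling as the implementation evolves.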