Foreword (J. Linnemann)

The offline verification consisted of running the L12SIM suite over data samples supplied by physics groups, and running the output through the trigger EXAMINE. Thus, the quantities stared at were the pass rates and the histograms, which were a superset of those seen by the GM shifters. Some of the histogram comparisons were automated by Rich Genik and Terry.

Comments on L2 verification procedures for the trigger review.
Terry Geld

General observations:

- There were instances where the L2 offline verification procedure caught bugs before an exe went online. There were also instances where the verification procedure was not enough to keep bugs from going online or from getting into the simulator.

- A complete verification system should be in place prior to the start of the run, unlike in Run I. Each trigger level will need verification, and there needs to be good communication between the trigger groups.

- More experts/personnel are needed. This includes the entire range of people involved, from experts associated with a particular detector, to the trigger experts involved in verification and releases, to people analyzing data offline to be sure we're getting the data we should be.

- A wide-ranging set of data/Monte Carlo input is important. Much of the initial testing by authors or detector groups is done on good data samples, which may not contain sufficient statistics in all regions of phase space to reveal any strange behavior. The large TOP MC sample and the QCD sample were useful for their large statistics. Reasonably up-to-date samples are also necessary. There was one instance where we didn't catch a SAMUS bug because we didn't have post-shutdown muon data. A dedicated disk area for data samples would speed up the process. The robot tape mounting system is a fine idea, if it is more reliable than the FCC operators.

- Communication with trigger/simulator novices. It took a while for me to understand that the trigger/simulator use different (stable!) libraries than does the usual analysis code. There was at least one instance of someone who wrote buggy code because it was based on non-production libraries (this was easily caught by the verification process, which automatically picked up the correct libraries).

- The baseline comparison method (which was the verification method used in Run I) is a good one; a sketch of this kind of exact comparison is appended at the end of these comments. Any differences between versions were supposed to be thoroughly understood before a new version was passed. However, there were several instances of problems. The most common problem was a change in STP parameters which led to small changes in results. When verifying a new exe that included an STP change, there was a tendency to blame any unresolved changes on the STP changes. Breaking the update into more steps (first change the STP, then change the code) was usually sufficient to resolve any questions about STP effects. Another problem arose with the update to V7.01: the changes were so drastic that the baseline was useless and bugs were allowed to creep in. I'm not quite sure how to solve these types of problems. Looking at pass rates is useless unless there is some understanding of what those rates should be, hence the baseline. More diligence is always a good idea, but it would be useful to come up with a method that could be programmed into the process.

- The final verification process achieved is a good model.
  This consisted of:
    1) complete verification by the author(s) of new code
    2) complete verification by the relevant subdetector/subgroup
    3) final verification by the Level 1/2/3 group
    4) any bug-fixing resulting from steps 1-3 to be done by the author(s)
  The L2 experts wasted too much time debugging other people's code. Not an appropriate use of resources. Suggestions?

- There are two issues for bug-finding: finding bugs in the code which translate to online bugs, and finding bugs in the code which are specific to the simulator (not online).

- Most of the bugs which went online were found by physics groups looking at their data. There was often a time lapse in finding those bugs. One idea would be to incorporate an official offline step in the verification procedure to catch those bugs more quickly, perhaps based on a monitor stream? The problem with this is that the data sample would not be constant, so the comparison would be a statistical comparison to some baseline (e.g., data from the last exe or data from yesterday's running) rather than an exact comparison (see the second sketch appended below). Subtle differences may not be easily detectable on a statistical basis.

- As for the simulator-only bugs, most were eventually caught by individual users. Maybe improving the types of information we look at would help to find bugs. Perhaps looking at electron efficiency rather than just the distribution of electrons in Et, eta, phi.
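
First sketch: a minimal illustration of the exact baseline comparison described above, for a fixed input sample. It assumes each exe's output has been reduced to pass counts per filter plus histograms stored as lists of bin contents; the data layout and the name compare_to_baseline are hypothetical choices for illustration, not any existing D0 tool.

# Illustrative only: exact (bin-by-bin) comparison of a candidate exe's
# summary against a baseline produced from the same fixed input sample.
# Any difference at all is reported, since an identical sample should
# give identical pass counts and histograms.

def compare_to_baseline(baseline, candidate):
    """Return a list of human-readable differences; an empty list means identical."""
    diffs = []

    # Pass counts per filter: for a fixed input sample these should match exactly.
    for filt in sorted(set(baseline["passes"]) | set(candidate["passes"])):
        old = baseline["passes"].get(filt)
        new = candidate["passes"].get(filt)
        if old != new:
            diffs.append(f"filter {filt}: pass count {old} -> {new}")

    # Histograms: exact bin-by-bin comparison.
    for name in sorted(set(baseline["histos"]) | set(candidate["histos"])):
        old = baseline["histos"].get(name)
        new = candidate["histos"].get(name)
        if old is None or new is None:
            diffs.append(f"histogram {name}: present in only one version")
        elif len(old) != len(new):
            diffs.append(f"histogram {name}: binning changed")
        elif old != new:
            nbad = sum(1 for a, b in zip(old, new) if a != b)
            diffs.append(f"histogram {name}: {nbad} bins differ")

    return diffs


if __name__ == "__main__":
    baseline = {"passes": {"EM_HI": 412, "MU_JET": 88},
                "histos": {"em_et": [0, 5, 9, 3], "mu_eta": [2, 4, 4, 1]}}
    candidate = {"passes": {"EM_HI": 412, "MU_JET": 91},
                 "histos": {"em_et": [0, 5, 9, 3], "mu_eta": [2, 4, 5, 0]}}
    for d in compare_to_baseline(baseline, candidate):
        print(d)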
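
Second sketch: a minimal illustration of a statistical comparison for a non-constant sample (e.g., yesterday's monitor-stream data versus today's), where an exact match is not expected. It uses the standard two-sample chi-square for unweighted histograms and flags a histogram when chi2 per degree of freedom exceeds a threshold; the threshold and the name histos_compatible are arbitrary choices for illustration, not a prescription.

# Illustrative only: crude shape comparison of two lists of bin counts
# taken from different, statistically independent data samples.

def histos_compatible(h_ref, h_new, max_chi2_per_dof=3.0):
    """Two-sample chi-square test; returns (compatible?, chi2 per dof)."""
    n_ref = sum(h_ref)
    n_new = sum(h_new)
    if n_ref == 0 or n_new == 0:
        return False, float("inf")

    chi2 = 0.0
    ndof = -1  # one constraint from normalizing the two totals to each other
    for a, b in zip(h_ref, h_new):
        if a + b == 0:
            continue  # an empty bin carries no information
        ndof += 1
        chi2 += (n_new * a - n_ref * b) ** 2 / (n_ref * n_new * (a + b))

    if ndof <= 0:
        return True, 0.0
    return chi2 / ndof <= max_chi2_per_dof, chi2 / ndof


if __name__ == "__main__":
    yesterday = [120, 340, 410, 220, 90, 30]
    today = [115, 355, 395, 230, 85, 40]
    ok, c = histos_compatible(yesterday, today)
    print("compatible" if ok else "FLAG", f"(chi2/ndof = {c:.2f})")

A test like this catches gross shape changes but, as noted above, subtle differences may still sit below any reasonable threshold. The same kind of comparison could also be applied to derived quantities such as an electron efficiency versus Et, along the lines of the last observation, rather than only to the raw Et, eta, phi distributions.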