Foreword (J. Linnemann)

The offline verification consisted of running the L12SIM suite over data samples supplied by physics groups, and running the output through the trigger EXAMINE. Thus, the quantities stared at were the pass rates and the histograms, which were a superset of those seen by the GM shifters. Some of the histogram comparisons were automated by Rich Genik and Terry.

Comments on L2 verification procedures for the trigger review.
Terry Geld

General observations:

- There were instances where the L2 offline verification procedure caught bugs before an exe went online. There were also instances where the verification procedure was not enough to keep bugs from going online or from getting into the simulator.

- A complete verification system should be in place prior to the start of the run, unlike in Run I. Each trigger level will need verification, and there needs to be good communication between the trigger groups.

- More experts/personnel are needed. This includes the entire range of people involved, from experts associated with a particular detector, to the trigger experts involved in verification and releases, to people analyzing data offline to be sure we're getting the data we should be.

- A wide-ranging set of data/Monte Carlo input is important. Much of the initial testing by authors or detector groups is done on good data samples, which may not contain sufficient statistics in all regions of phase space to reveal any strange behavior. The large TOP MC sample and the QCD sample were useful for their large statistics. Reasonably up-to-date samples are also necessary. There was one instance where we didn't catch a SAMUS bug because we didn't have post-shutdown muon data. A dedicated disk area for data samples would speed up the process. The robot tape mounting system is a fine idea, if it is more reliable than the FCC operators.

- Communication with trigger/simulator novices. It took a while for me to understand that the trigger/simulator use different (stable!) libraries than does the usual analysis code. There was at least one instance of someone who wrote buggy code because it was based on non-production libraries (this was easily caught by the verification process, which automatically picked up the correct libraries).

- The baseline comparison method (which was the verification method used in Run I) is a good one; a sketch of this kind of exact comparison is appended at the end of these comments. Any differences between versions were supposed to be thoroughly understood before a new version was passed. However, there were several instances of problems. The most common problem was a change in STP parameters which led to small changes in results. When verifying a new exe that included an STP change, there was a tendency to blame any unresolved changes on the STP changes. Breaking the update into more steps (first change the STP, then change the code) was usually sufficient to resolve any questions about STP effects. Another problem arose with the update to V7.01: the changes were so drastic that the baseline was useless and bugs were allowed to creep in. I'm not quite sure how to solve these types of problems. Looking at pass rates is useless unless there is some understanding of what those rates should be, hence the baseline. More diligence is always a good idea, but it would be useful to come up with a method that could be programmed into the process.

- The final verification process achieved is a good model.
  This consisted of:
    1) complete verification by the author(s) of new code
    2) complete verification by the relevant subdetector/subgroup
    3) final verification by the Level 1/2/3 group
    4) any bug-fixing resulting from steps 1-3 to be done by the author(s)
  The L2 experts wasted too much time debugging other people's code. Not an appropriate use of resources. Suggestions?

- There are two issues for bug-finding: finding bugs in the code which translate to online bugs, and finding bugs in the code which are specific to the simulator (not online).

- Most of the bugs which went online were found by physics groups looking at their data. There was often a time lapse in finding those bugs. One idea would be to incorporate an official offline step in the verification procedure to catch those bugs more quickly, perhaps based on a monitor stream? The problem with this is that the data sample would not be constant, so the comparison would be a statistical comparison to some baseline (e.g., data from the last exe or data from yesterday's running) rather than an exact comparison (see the second sketch appended below). Subtle differences may not be easily detectable on a statistical basis.

- As for the simulator-only bugs, most were eventually caught by individual users. Maybe improving the types of information we look at would help to find bugs. Perhaps looking at electron efficiency rather than just the distribution of electrons in Et, eta, phi.
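
First sketch: a minimal illustration of the exact baseline comparison described above, for a fixed input sample. It assumes each exe's output has been reduced to pass counts per filter plus histograms stored as lists of bin contents; the data layout and the name compare_to_baseline are hypothetical choices for illustration, not any existing D0 tool.

# Illustrative only: exact (bin-by-bin) comparison of a candidate exe's
# summary against a baseline produced from the same fixed input sample.
# Any difference at all is reported, since an identical sample should
# give identical pass counts and histograms.

def compare_to_baseline(baseline, candidate):
    """Return a list of human-readable differences; an empty list means identical."""
    diffs = []

    # Pass counts per filter: for a fixed input sample these should match exactly.
    for filt in sorted(set(baseline["passes"]) | set(candidate["passes"])):
        old = baseline["passes"].get(filt)
        new = candidate["passes"].get(filt)
        if old != new:
            diffs.append(f"filter {filt}: pass count {old} -> {new}")

    # Histograms: exact bin-by-bin comparison.
    for name in sorted(set(baseline["histos"]) | set(candidate["histos"])):
        old = baseline["histos"].get(name)
        new = candidate["histos"].get(name)
        if old is None or new is None:
            diffs.append(f"histogram {name}: present in only one version")
        elif len(old) != len(new):
            diffs.append(f"histogram {name}: binning changed")
        elif old != new:
            nbad = sum(1 for a, b in zip(old, new) if a != b)
            diffs.append(f"histogram {name}: {nbad} bins differ")

    return diffs


if __name__ == "__main__":
    baseline = {"passes": {"EM_HI": 412, "MU_JET": 88},
                "histos": {"em_et": [0, 5, 9, 3], "mu_eta": [2, 4, 4, 1]}}
    candidate = {"passes": {"EM_HI": 412, "MU_JET": 91},
                 "histos": {"em_et": [0, 5, 9, 3], "mu_eta": [2, 4, 5, 0]}}
    for d in compare_to_baseline(baseline, candidate):
        print(d)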
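
Second sketch: a minimal illustration of a statistical comparison for a non-constant sample (e.g., yesterday's monitor-stream data versus today's), where an exact match is not expected. It uses the standard two-sample chi-square for unweighted histograms and flags a histogram when chi2 per degree of freedom exceeds a threshold; the threshold and the name histos_compatible are arbitrary choices for illustration, not a prescription.

# Illustrative only: crude shape comparison of two lists of bin counts
# taken from different, statistically independent data samples.

def histos_compatible(h_ref, h_new, max_chi2_per_dof=3.0):
    """Two-sample chi-square test; returns (compatible?, chi2 per dof)."""
    n_ref = sum(h_ref)
    n_new = sum(h_new)
    if n_ref == 0 or n_new == 0:
        return False, float("inf")

    chi2 = 0.0
    ndof = -1  # one constraint from normalizing the two totals to each other
    for a, b in zip(h_ref, h_new):
        if a + b == 0:
            continue  # an empty bin carries no information
        ndof += 1
        chi2 += (n_new * a - n_ref * b) ** 2 / (n_ref * n_new * (a + b))

    if ndof <= 0:
        return True, 0.0
    return chi2 / ndof <= max_chi2_per_dof, chi2 / ndof


if __name__ == "__main__":
    yesterday = [120, 340, 410, 220, 90, 30]
    today = [115, 355, 395, 230, 85, 40]
    ok, c = histos_compatible(yesterday, today)
    print("compatible" if ok else "FLAG", f"(chi2/ndof = {c:.2f})")

A test like this catches gross shape changes but, as noted above, subtle differences may still sit below any reasonable threshold. The same kind of comparison could also be applied to derived quantities such as an electron efficiency versus Et, along the lines of the last observation, rather than only to the raw Et, eta, phi distributions.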