---------- Forwarded message ----------
Date: Wed, 11 Oct 2000 18:03:24 -0500
From: Gerry Bauer
To: Michael Schmidt
Cc: BauerG@fnal.gov, giorgio.chiarelli@pi.infn.it, Chlebana@fnal.gov, Frisch@fnal.gov, ptl@fnal.gov, mlindgre@fnal.gov, ohsugi@fnal.gov, luciano@fnal.gov, spalding@fnal.gov, yagil@fnal.gov, bed@fnal.gov, goshaw@fnal.gov, bauerg@mit1.fnal.gov
Subject: MORE DAQ thoughts (non-Evt/L3) for Run 2b

hi folks,

the following are some thoughts from Steve Vejcik on 2b...

Part I:
--------

Hi,

I've attached another wordy core dump of ideas for upgrading the DAQ circa 2005, with an attempt to lay out possible arguments and explanations for upgrading different parts of the L2 system. It's broad and motivated by the unknowns: what limits our throughput now and what will limit it later? What parts will be obsolete and in need of upkeep?

Here are some guesses on my part, using some numbers from the L3/EVB info I was sent and some experience from the past few years of the Run 2 upgrade. These are in descending order of desirability*feasibility (a new L2 crate would be great, but is it realistic?). The first number is cost (in 1000$) and the second is the number of man-years. The first column is for a full-scale upgrade for better capability in Run 2b, and the second column is for just maintaining current capability through Run 2b. As pointed out in the core dump, some areas overlap systems that others outside the DAQ group have developed for Run II.

                                        Improve       Maintain
                                        for Run 2b    through 2007
Online Computing                        50/2          150/8
Host side software                      30/10         10/20
  (includes JAVA, readout code,
   monitoring tools, display
   packages, etc.)
New Crate OS/Software                   50/4          -/4
SCRAMnet-TriggerSupervisor Interface    75/2          -
New Crate CPU's                         800/4         80/4
New L2 Crate                            40/8          -
TRACER upgrades                         650/10        -
Total                                   1,695/40      240/32

Part II:
--------

Run 2B DAQ Core Dump, Revision 2
==================================

In parallel with the L3/EVB folks, I've attempted some classification of possible DAQ changes over the next several years. It is not clear exactly what the current bottlenecks in the DAQ system throughput are. This is being closely examined now. Even after better understanding the system performance, accurately extrapolating to a Run 2B-like environment cannot be done without significant uncertainty. However, one can imagine that it may be in the interest of maintenance, performance, and physics to improve parts of the system. The notes below sketch out what would be needed to undertake improvements in various components and which ones clearly cannot be changed.

In some areas the line blurs between the DAQ system and the components that use it, such as the front end electronics. I've generally taken an inclusive approach. I've also semi-intentionally left out Silicon-specific pieces. Finally, I've tried to clarify which areas might need maintenance-type upgrades vs. improvement-type upgrades, along the lines of what the EVB/L3 group has done.

1. Front End Crates

a. Hardware

i. The DAQ system uses Motorola MVME-2301 CPUs in most of the front end crates. While they easily meet the requirements for readout processing and VME back-plane operations, one may want a faster CPU, a different operating system, or more processing power, perhaps to implement some distributed triggering within the front end crates. Real-time processor development is certainly a "hot" field. Replacing 130+ CPUs would currently work out to ~130*5000$=$650,000.
The chief difficulty would be ensuring a compatible software environment, in particular one that can accommodate the SmartSockets type scheme used by the DAQ. This may be a motivation for going to a Linux-based system instead.

ii. Crate Electronics

a. The number of custom-designed VME boards in the experiment makes it prohibitively difficult and expensive to imagine any change to these components. However, it is conceivable, though perhaps not likely, that one could upgrade the TRACER boards that populate every non-Silicon front end crate. For example, new TRACER boards could carry additional buffering that would allow for more localized data processing. Cost of 130 new TRACER2 boards, assuming $5000 per board including development cost: $650k.

b. Another exception to the rule of thumb that no boards will be changed might be the L2 system, the L2 processing in particular. In Run 1, the situation arose that the processors were no longer supported, and the then-new Alpha chip based boards had to be implemented ahead of schedule. It is not clear whether something similar might happen in the next several years. At the same time, as pointed out w.r.t. VME crate CPUs, processor technology is a very rapidly evolving area right now, and not just the chips themselves but what applications can use them. Although we don't know the throughput bottlenecks for the high luminosity era, past experience tells us that advancing the capability and intelligence of the trigger system can pay big dividends and is an eventuality we ought to be prepared for.

Within the DAQ, implementing a new VME-based processing system for the L2 Trigger is achievable. The essential requirements that need to be thought through are the interfaces:

1. Getting data into the processors
The current scheme accomplishes this via custom boards in the L2 System. They require an interface with the L2 Processor boards that is specific to the Alpha-based system currently in use.
If a non-Alpha based upgrade were to be considered, this would likely pose the greatest difficulty, as it might necessitate a redesign of the entire L2 Decision crate, currently comprised of 8 VME boards of which 4 are the processors themselves.

2. Communication with the Trigger Supervisor
The current L2 System relies on a custom set of boards in the Trigger Supervisor system to mediate the communication of Level 2 decisions between the processors and the Trigger Supervisors. The latter then synchronize the transmission of trigger information to the front end crates.

3. Communication with the Host environment
The current design relies on communication with Run Control through the Crate CPU, and with a host computer, through the L2 Processor board itself, to obtain parameters and algorithms stored in the database. These interfaces are not likely to pose significant challenges.

Cost: interface boards, maybe 4 x $3000 each = $12k; processor board(s), maybe 4 x $5000 = $20k. I'm assuming that the new processors would be cheaper for more capability as technology improves. Otherwise, it's not as clear we'd do it.

b. Software

i. Operating System
Non-VxWorks real-time operating systems are currently viable but do not yet exhibit the same interrupt responsiveness. It might turn out that improving the speed of this would be desirable. This would also be important as VxWorks is not guaranteed to always be a supported system; it could be that support for our hardware goes away, for instance. Changes to the operating system would have implications for the control/communications (SmartSockets) facilities, the front-end (readout) code, and the choice of crate CPU (currently MVME-2301's).

ii. Readout Code
The CDF readout code is written entirely in C. It is unlikely that this would change. If a new operating system and/or hardware is desirable, the most difficult aspect would be accommodating the control/communications network, which currently is framed around SmartSockets and CDF-supported DAQMSG packaging.
It is not easy to envisage this being changed. However, since SmartSockets is supported on Unix/Linux systems as well as VxWorks, it seems that supportable flexibility in the system exists.

4. EVB/DAQ Interface

a. Trigger Manager Interface
Coordination of the Front End system (including the Trigger) with the Event Builder is accomplished with the Trigger Manager system. This facility provides a fast communication network (SCRAMnet) for messaging within the event builder and between the event builder and the Trigger Supervisor. The latter is the central point through which the status of events is maintained, in particular whether they are marked for uploading to the event builder. This system introduces an as yet unmeasured but finite overhead to the readout time for the DAQ. It could be desirable to think about improving it or altering it. The basic element is the SCRAMnet VME communication modules.

b. Front-End VRB Data Interface
This is currently based on serial optical links. It is unlikely that these would need to be replaced. However, there is little data on the longevity of either the fiber medium or the TAXI interface chips.

5. System Software

a. Monitoring Tools
Event stream monitoring is accomplished with offline algorithms, tools, and the ROOT package. Monitoring of the DAQ system also relies on status information transmitted via the DAQ communication/control network with SmartSockets/DAQmsg facilities. This relies heavily on JAVA. Additionally, a still-evolving tool for the DAQ is a JAVA based error handler. Important upgrade issues for these elements are 1) upgrades to the computers themselves and to the software languages (such as JAVA). The software maintenance will need to be tracked carefully, as changes to it will almost certainly be needed and will touch many areas of the DAQ system. Fortunately, this will not likely incur a heavy drain on resources.
Performance upgrades might also benefit the error handling system, which could use a more intelligent design that is aware of problems and can apply sophisticated algorithms in an automated way. This is conceivably an area where outside development may provide solutions.

Other software-related areas that may need maintenance/upgrade include:
- Oracle Database and associated tools
- 3rd party (non-ROOT) display visualization packages (for event stream monitoring)

Of these different areas, the use and upgrade of the JAVA programming language has the greatest impact. It is not expensive but needs the attention of dedicated personnel. In fact, for all of the software changes discussed, the cost is mostly guesswork, but is certainly small. The main need is for people to develop, test, and implement.

6. Infrastructure

a. Network
The B0 ethernet-based network is used to communicate with system crates. Performance beyond its current capability is not terribly significant for higher luminosity configurations, since the network mainly serves as the medium for status and monitoring information. There are certain utilities that are more reliant on this medium, e.g. the software event-builder used with the calibration of the Shower Max System. If this really needed to be improved, it is probably more likely that changes in the hardware event builder would be accommodated first. It is unclear what sort of maintenance-type upgrades might be needed for the network switch facilities, currently a CISCO system.

b. Computing
The online system currently maintains approximately 40 Linux PCs, 12 SGI workstations and a couple of Windows NT based PCs. It is likely that some evolution of this system on the time scale of Run II would be of use. While the path for such a change is not clear, the cost can be estimated in a rough way using current prices. Using a similar configuration to the current one would cost ~50x3k$=150k$.
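
As a cross-check, the per-unit estimates quoted in these notes can be tallied with a short Python sketch. The item labels and helper names (ESTIMATES, cost_in_k, summarize) are made up for illustration; the quantities and unit prices are just the rough guesses given in the text, not firm quotes.

```python
# Unit-cost guesses quoted in the notes: (label, quantity, cost per unit in $).
# Labels are illustrative only, not official system names.
ESTIMATES = [
    ("front end crate CPUs", 130, 5000),
    ("TRACER2 boards", 130, 5000),
    ("L2 interface boards", 4, 3000),
    ("L2 processor boards", 4, 5000),
    ("online computers", 50, 3000),
]


def cost_in_k(quantity, unit_cost):
    """Return the total cost in units of $1000."""
    return quantity * unit_cost / 1000


def summarize(estimates):
    """Map each label to its total cost in k$."""
    return {name: cost_in_k(qty, unit) for name, qty, unit in estimates}


if __name__ == "__main__":
    for name, total in summarize(ESTIMATES).items():
        print(f"{name:24s} {total:7.0f} k$")
```

Changing a quantity or unit price in ESTIMATES gives a quick feel for how sensitive each line item is to the underlying guess.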