From bauerg@mit1.fnal.gov Tue Oct 10 13:34:20 2000
Date: Tue, 10 Oct 2000 11:17:36 -0500
From: Gerry Bauer
To: Michael Schmidt
Cc: BauerG@fnal.gov, giorgio.chiarelli@pi.infn.it, Chlebana@fnal.gov, Frisch@fnal.gov, ptl@fnal.gov, mlindgre@fnal.gov, ohsugi@fnal.gov, luciano@fnal.gov, spalding@fnal.gov, yagil@fnal.gov, bauerg@mit1.fnal.gov
Subject: Evt Builder & L3 in 2b

hi folks,

  the following is the advertised response from the Evt Builder & L3 folks to some Run 2b questions (note: fkw = Frank Wuerthwein):

---------------------------------------------------------------

Q: Would you provide some information on issues for the Run 2b event builder/L3? There are two levels of issues: 1) possible *performance* upgrades that might be needed, and 2) "upgrades" just to MAINTAIN the system.

item 1): Right now we have no idea what may be required for improved performance, BUT you could hopefully make a statement of the data rate the current system can reasonably expect to handle.

fkw answers:
-------------

-> We demonstrated close to 600Hz @ 15*16kB = 240kB event size 9 months ago. This was not in a realistic environment but a benchmarking test with fake data.

-> Ideal scaling behaviour would give us roughly 15*65Hz = 1kHz @ 15*16kB = 240kB event size.

=> With some work I would expect us to be able to reach 600-800Hz @ 240kB, i.e. roughly 60-80% of perfect scaling behaviour. This assumes that the load per SCPU crate can be balanced!!!

If we wanted to increase the total bandwidth, I see the following options:

(a) We switch from the present OC-3 to OC-12 and buy the new ASX 4000 switch from Marconi. On the Marconi web pages there is a cost estimate of $500k for their product in its 64-port OC-12 configuration, i.e. they quote $8k per port. We would need only a 32-port switch, but also 32 cards on the other side of the links, so the 64-port estimate of $500k is probably accurate enough for now.

OC-12 provides us with 4 times the link bandwidth we presently have with OC-3, i.e. a theoretical limit of 4000Hz @ 240kB per event. The system limitations are then likely to be not the ATM switch but instead:

-> VRB readout bandwidth via VME: for 16kB event fragments per SCPU, 4000Hz would amount to 64MB/sec, which we won't be able to achieve. We benchmarked this some time ago at close to 50MB/sec. Assume a fudge factor of 60-80% and you might expect to be limited to something like 30-40MB/sec, which translates into a maximum L2 accept rate of roughly 2-2.5kHz (a back-of-envelope check of these numbers appears below, after option (b)).

-> SCRAMNet maximum message passing rate: to reach the 2kHz mentioned above we would have to change the message passing protocol. This requires some work but isn't too unreasonable to expect on the timescale for Run 2b.

-> New ATM driver development for OC-12 cards under Linux as well as VxWorks !!! This would most likely be the largest development effort in this scenario. Even if we could get commercial ATM drivers for both VxWorks and Linux, we would almost certainly need to make changes to the drivers because our application is quite different from what is done in the telephony industry.

(b) Abandon ATM and go to Gigabit Ethernet using TCP/IP. The performance increase would be a factor of at least 5. We'd stick with the SCPU crates we have now. Cost guesstimate, being generous: 16*$5k for VME cards, 16*$2k for PCI cards, $20k for the switch; and we might as well send the control messages over Gigabit Ethernet too. I.e. the total is < $150k. Drawback: this would require a significant amount of development !!!

It's not obvious to me that we would want to do this, because the costs in human resources completely dominate. One would have to look into this carefully if one thought that a 2kHz L2 accept rate is not sufficient.
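A quick back-of-envelope check of the rates and costs quoted above, as a sketch in Python; the inputs (crate count, fragment size, per-link rate, the 50MB/sec VRB benchmark, the 60-80% fudge factors, and the Gigabit Ethernet card/switch guesses) are the figures given in the text, and the variable names are illustrative only:

    # Back-of-envelope rates for the EVB options above; inputs are the
    # figures quoted in the text, variable names are illustrative only.
    n_scpu   = 15                                # SCPU crates feeding the event builder
    frag_kb  = 16.0                              # event fragment size per SCPU crate, kB
    event_kb = n_scpu * frag_kb                  # = 240 kB per event

    ideal_hz = n_scpu * 65                       # ~1 kHz with perfect OC-3 scaling
    print("today, at 60-80%% of ideal: ~%.0f-%.0f Hz @ %.0f kB"
          % (round(0.6 * ideal_hz, -2), round(0.8 * ideal_hz, -2), event_kb))

    oc12_ceiling_hz = 4 * ideal_hz               # OC-12 = 4x OC-3 link bandwidth -> ~4 kHz theoretical

    # VRB readout over VME caps each SCPU crate well below that ceiling:
    vrb_benchmark_mb = 50.0                      # benchmarked VME readout, MB/sec
    for fudge in (0.6, 0.8):
        usable_mb = fudge * vrb_benchmark_mb     # 30-40 MB/sec usable
        l2_hz = usable_mb * 1000.0 / frag_kb     # -> roughly 1.9-2.5 kHz max L2 accept
        print("fudge %.0f%%: %.0f MB/s -> %.0f Hz L2 accept" % (100 * fudge, usable_mb, l2_hz))

    # Gigabit Ethernet option (b), hardware only:
    gige_cost_k = 16 * 5 + 16 * 2 + 20           # VME cards + PCI cards + switch = $132k, i.e. < $150k
    print("GigE hardware guesstimate: $%dk" % gige_cost_k)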
Bottom line:

=> The present system will provide 600-800Hz @ 240kB event size.

=> An upgrade to 2kHz @ 240kB event size would cost roughly $500k and at least 2-3 person-years of development.

=> An upgrade to 3kHz or more could probably cost a bit less in hardware but would be much more expensive in person-years of development. It's far from clear that this would make sense on the time scale we're talking about for Run 2b unless we really NEED the bandwidth and the rest of the DAQ can deliver it!!!

Now to your second question:
--------------------------

Q: item 2): What hardware would you imagine needing/wanting to replace around 2003-4 JUST to keep CDF going "as is" into 2007???

fkw answers:
--------------

Obsolete Hardware:
------------------

-> The ATM cards in the SCPU crates are obsolete, but we bought enough of them to be prepared for a long run. If we don't have an unreasonably large failure rate this shouldn't be a problem.

-> The ATM cards in the Linux PCs are currently going out of production. We have far fewer spares for these and will need to watch their availability very closely. It is likely that we'd want to buy some soon just to be safe.

-> ATM switch. The ASX 1000 switch we have has a huge installed base, and Marconi has repeatedly said that they will back this product indefinitely. I don't think this is a problem.

-> SCRAMNet hardware. In the previous run SCRAMNet was very stable after the initial shakedown. We would probably want to buy somewhat more spares than we presently have, but I don't think this is of real concern.

Obsolete software:
------------------

-> We want to upgrade the ATM driver on the Linux side because it is already dated; i.e. we are presently running a different version of Linux on the converter nodes than on the processor nodes. It is likely that we would have to go through this one more time for Run 2b.

Bottom line: I'd guess we'd want to spend roughly $100k on spare parts and at least one person-year on software development just to keep going with the same system at a 600-800Hz L2 accept rate for 240kB event size.

All of the above is EVB only. In addition, we'd presumably want the option of upgrading the L3 farm for roughly the same amount of money it cost in the first place. I'd assume we would want another $300k in the budget to be able to upgrade the existing L3 system.

Many thanks,
Frank

Frank Wuerthwein
MIT
Tel: (617) 452 2705
http://mit.fnal.gov/~fkw