Title |
The LHCb Farm Monitoring and Control System |
Submitted |
21-JAN-07 05:04 (UTC -06:00) |
Classification |
Controls and Monitoring Systems |
Modified |
19-MAR-07 16:45 (UTC -05:00) |
Session |
CM-Exist |
Presentation |
Oral |
Speaker |
Domenico Galli |
Paper ID |
CM-Exist03 |
Author(s) |
Domenico Galli, Daniele Gregori (Bologna University, Bologna), Clara Gaspar, Eric van Herwijnen (CERN/LHCb, Geneva), Federico Bonifazi, Angelo Carbone (CNAF, Bologna), Umberto Marconi, Gianluca Peco, Vincenzo Maria Vagnoni (INFN-Bologna, Bologna) |
Abstract |
The LHCb experiment at CERN will have an online trigger farm composed of up to 2000 PCs. To monitor and control each PC and to supervise the overall status of the farm, a Farm Monitoring and Control (FMC) application was developed. The FMC is based on DIM(*) and is accessible through both a command-line interface and a PVSS graphical interface. The FMC consists of: a Logger, which collects application messages and can work in either no-drop or congestion-proof mode, with filtering and duplicate-suppression capabilities; an IPMI Power Manager, which switches the farm nodes on and off and monitors their physical parameters; a Task Manager, which starts and stops processes and can manage real-time schedulers, notify process termination in real time, and redirect application stdout/stderr to the FMC Logger; a Process Controller, which manages automatic process respawn; and a detailed but lightweight Monitoring system. The FMC is an integral part of LHCb's Experiment Control System, which is in charge of monitoring and controlling all online components: it uses the same tools (DIM, PVSS, FSM, etc.) to guarantee its complete integration and a coherent look and feel throughout the control system. |
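The two Logger modes and the duplicate suppression named in the abstract can be illustrated with a minimal Python sketch. It is not the FMC implementation or its API: the class `SketchLogger`, the exception `BufferFull`, the summary text, and the reading of "no-drop" as "refuse when the buffer is full" and "congestion-proof" as "discard when the buffer is full" are all assumptions made for illustration.

```python
from collections import deque


class BufferFull(Exception):
    """Hypothetical signal for no-drop mode; a real logger would likely block the sender."""


class SketchLogger:
    """Illustrative bounded message buffer (assumed design, not the FMC Logger)."""

    def __init__(self, capacity, no_drop):
        self.capacity = capacity
        self.no_drop = no_drop
        self.queue = deque()
        self.dropped = 0      # messages discarded in congestion-proof mode
        self._last = None     # last distinct message seen
        self._repeats = 0     # consecutive duplicates of _last

    def _enqueue(self, text):
        if len(self.queue) >= self.capacity:
            if self.no_drop:
                # No-drop: never lose a message; the sender must retry later.
                raise BufferFull(text)
            # Congestion-proof: discard so that the sender is never stalled.
            self.dropped += 1
            return
        self.queue.append(text)

    def log(self, text):
        if text == self._last:
            # Duplicate suppression: count the repeat instead of storing it.
            self._repeats += 1
            return
        if self._repeats:
            self._enqueue(f"last message repeated {self._repeats} times")
            self._repeats = 0
        self._enqueue(text)
        self._last = text

    def drain(self):
        """Return and clear all buffered messages, oldest first."""
        items = list(self.queue)
        self.queue.clear()
        return items
```

In this sketch, congestion-proof mode trades completeness for a sender that never waits, while no-drop mode keeps every message at the cost of back-pressure on the sender.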
|
Footnote |
(*) C. Gaspar and M. Donszelmann, "DIM, Distributed Information Management System, for the DELPHI Experiment at CERN", IEEE RT'93 (Vancouver, June 8-11 1993). |