# Charged Particle Track Processors for Hardware Triggering

Levan Babukhadia, SUNY at Stony Brook

# **Fast Triggering on Tracks**

- Trigger Systems in HEP collider experiments are typically multi-level
- W/Z-boson, Top, Higgs, SUSY, other exotic physics require highly efficient triggering on high transverse momentum electron, muon, and tau leptons
- Bottom Physics program requires triggering on low transverse momentum tracks
- This needs to be done in Level 1, the first and fastest trigger stage typically implemented in hardware
- Very fast track finding, algorithms tightly connected with the detector architecture

- Need to find and either report or store tracks every bunch crossing (e.g. 396 or 132 ns at the Tevatron)
- Good resolution in transverse momentum and azimuth required
- Cruder tracks can be used in Level 1, but a more detailed list of tracks needs to be provided to later stages (e.g. silicon-based detached vertex triggers)
- Often matching with triggers from other detectors such as Calorimeter and Muon system is needed

In this lecture, I will review fast hardware digital charged-particle trigger systems as implemented in currently running HEP collider experiments: DØ and CDF at the Tevatron and BABAR at PEP-II

# **Physics Challenges → The Upgraded Tevatron**

#### Physics goals for Run 2

- precision studies of weak bosons, top, QCD, B-physics
- searches for Higgs, supersymmetry, extra dimensions, other new phenomena

#### These require

- electron, muon, and tau identification
- jets and missing transverse energy
- flavor tagging through displaced vertices and leptons
- luminosity, luminosity, luminosity...

|                                          | Run 1b                | Run 2a                | Run 2b                |
|------------------------------------------|-----------------------|-----------------------|-----------------------|
| Bunches in turn                          | 6 × 6                 | 36 × 36               | 140 × 103             |
| √s (TeV)                                 | 1.8                   | 1.96                  | 1.96                  |
| Typical L (cm<sup>-2</sup>s<sup>-1</sup>) | 1.6 × 10<sup>30</sup> | 8.6 × 10<sup>31</sup> | 5.2 × 10<sup>32</sup> |
| ∫L dt (pb<sup>-1</sup>/week)             | 3.2                   | 17.3                  | 105                   |
| Bunch crossing (ns)                      | 3500                  | 396                   | 132                   |
| Interactions / crossing                  | 2.5                   | 2.3                   | 4.8                   |

Integrated luminosity, Run 1 → Run 2a → Run 2b: 0.1 fb<sup>-1</sup> → 2–4 fb<sup>-1</sup> → 15 fb<sup>-1</sup>

IEEE NSS/MIC 2002, Norfolk, Virginia, 10 – 16 November, 2002

#### FERMILAB'S ACCELERATOR CHAIN



**Collider Run IIA Peak Luminosity**

Figure: peak luminosity per store (cm<sup>-2</sup>s<sup>-1</sup>, scale 0 to 2.0 × 10<sup>31</sup>) vs. relative store number, with peak and average values, as of 05/27/02.

- Peak luminosity of over 2 × 10<sup>31</sup> cm<sup>-2</sup>s<sup>-1</sup> achieved
- Planned to reach the Run 2a design luminosity by Spring 2003

# The Upgraded DØ Detector



- New tracking devices, Silicon (SMT) and Fiber Tracker (CFT), placed in a 2 T magnetic field
- Added PreShower detectors, Central (CPS) and Forward (FPS)
- Significantly improved Muon System
- Upgraded Calorimeter electronics readout and trigger
- New forward proton spectrometer (FPD)
- Entirely new Trigger System and DAQ to handle the higher event rate





## **DØ Run II Tracking Detectors**

Run I: no magnet; drift chamber tracking with TRD for electron ID



- Silicon Tracker
  - Four layer barrels (double/single sided)
  - Interspersed double sided disks
  - \$40,00 channels
- Fiber Tracker
  - Eight layers of sci-fi ribbon doublets (z-u-v, or z/
  - 74,000 830 μm fibers with VLPC readout



#### **The DØ Central Fiber Tracker**

Scintillating fibers:

- Up to |η| = 1.7
- 20 cm < R < 51 cm
- 8 double layers
- CFT: 77,000 channels




#### Zoom in to run 143769, event # 2777821



Technology Focus: Subatomic Physics

# 500+ Xilinx FPGAs Search for Elusive Higgs Boson at 1.5 Terabytes per Second

With an array of more than 500 Virtex and Spartan FPGAs processing 1.5 terabytes of real-time data per second, scientists at the Fermi National Accelerator Laboratory hope to track down the last subatomic particle — the Higgs boson.

http://www.xilinx.com/publications/xcellonline/partners/xc\_pdf/xc\_higgs44.pdf (November 2002)


"We find ourselves in a bewildering world. We want to make sense of what we see around us and ask: what is the nature of the universe?" - Stephen Hawking, Lucasian Professor of Mathematics at Cambridge University

by Mark Hovener, Science Writer, Software Consultant

and Nick Hatt, Gold FAE, Avnet Design Services

Inside the four-mile-long Tevatron, the world's most powerful particle accelerator, protons and antiprotons collide at nearly the speed of light, creating bursts of energy and showers of millions of subatomic particles. If theoretical predictions are correct, over the next five years a million billion collisions (10<sup>15</sup>) will produce only 120 events with the characteristic pattern most easily recognizable as evidence of the existence of the Higgs boson.

Discovery of the Higgs boson will verify the "Standard Model" theory that is the foundation of modern particle physics. Finding a Higgs boson needle in this haystack of particles, however, requires a digital signal processing (DSP) system capable of gathering and processing 1.5 terabytes of data per second.



#### **DØ Track and Preshower Trigger FPGAs**

- To provide EM-id in |η| < 1 in Level 1:
  - Find clusters of CPS axial scintillator strips in 80 azimuthal 4.5° sectors
  - CPS is likely to be used in J/Ψ triggers, and in W/Z triggers only at really high luminosities
- To provide charged lepton id in |η| < 1 in Level 1:
  - Find tracks in 4 p<sub>T</sub> bins in the axial fibers of 80 azimuthal 4.5° CFT sectors
  - Match CFT tracks with CPS clusters in Level 1 within a pre-optimized window
- To help with muon id in Level 1:
  - About 900 ns after the collision, send the CFT tracks to L1 Muon
  - This allows p<sub>T</sub> thresholds in single-muon triggers
- For a possible handle on multiple interactions:
  - Provide the average hit count of CFT axial fibers
- To provide EM-id in the forward regions (|η| < 2.6) in Level 1:
  - Find clusters in FPS scintillator strips in 16 north / 16 south 22.5° azimuthal sectors
- To provide charged lepton id in the forward regions in Level 1:
  - Match FPS clusters in downstream strips with upstream scintillator hits

Figure: quarter-detector (1/4) view showing the CAL towers, preshower, and CFT layers.



#### **DØ Track and Preshower Trigger FPGAs**

- To control rates in Level 1:
  - Match preshower and calorimeter at the quadrant level via p-terms in the L1TFW
- To provide τ identification in Level 1:
  - Use track isolation, exploiting the fact that τ's decay dominantly hadronically but, unlike QCD, give pencil-like jets (L1Cal resolution is poorer)
- To help refine charged lepton id in Level 2:
  - Upon L1 Accept, send detailed lists of CFT tracks and CPS/FPS clusters to the Level 2 preprocessors
  - Perform track p<sub>T</sub> sorting and cluster selection so as to avoid biases from truncation
- To help with displaced vertex id in the Level 2 STT:
  - Send a p<sub>T</sub>-ordered list of tracks in SMT sextants
  - The STT uses CFT tracks and SMT hit clusters to perform a global track fit in L2, significantly improving the track p<sub>T</sub> resolution and the capability to trigger on high impact parameter tracks


#### **CTT Digital Front End Motherboard**



DFE Motherboard, Revision A: uniform throughout the system

Front view of a DFE sub-rack with custom backplane and a DFE Motherboard installed




#### **CTT Digital Front End Daughterboards (1)**



- Single-wide DB: 5 FPGAs in BG432 footprint (XCV600, XCV400, XCV400, XCV400, XCV300)


#### **CTT Digital Front End Daughterboards (2)**



- Double-wide DB: 3 FPGAs in BG560 footprint (XCV600 or XCV1000E; XCV600 or XCV1000E; XCV600 or empty)

#### **CTT Digital Front End FPGAs**

- Creative and innovative high-speed trigger algorithms designed for FPGAs
- FPGAs run at RF clock ~ 53 MHz
- Need to synchronize all input records in all FPGAs to cross the two clock domains
- Receive all data, correct for single-bit transmission errors, re-map, and present the data to the processing algorithm
- Format the output according to the protocols to transfer on either LVDS, G-Link, or FSCL
- Implement a 36-event-deep L1 pipeline; send the inputs to L3 upon L1 Accept
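The bullets above mention correcting single-bit transmission errors but do not say which code the CTT links use. Purely to illustrate the principle, the sketch below uses a Hamming(7,4) single-error-correcting code; that choice, and the names `hammingEncode` and `hammingDecode`, are assumptions, and the real link word width and code may differ. In an FPGA the syndrome would be a handful of XOR gates evaluated within one clock.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: Hamming(7,4) single-error correction. Codeword bit (pos-1)
// holds position pos = 1..7; parity bits sit at positions 1, 2 and 4.
std::uint8_t hammingEncode(std::uint8_t data) {
  assert(data < 16);  // 4 data bits
  const int d1 = data & 1, d2 = (data >> 1) & 1, d3 = (data >> 2) & 1, d4 = (data >> 3) & 1;
  const int p1 = d1 ^ d2 ^ d4;  // covers positions 3, 5, 7
  const int p2 = d1 ^ d3 ^ d4;  // covers positions 3, 6, 7
  const int p4 = d2 ^ d3 ^ d4;  // covers positions 5, 6, 7
  // Layout by position: 1=p1 2=p2 3=d1 4=p4 5=d2 6=d3 7=d4
  return static_cast<std::uint8_t>(p1 | p2 << 1 | d1 << 2 | p4 << 3 | d2 << 4 | d3 << 5 | d4 << 6);
}

// The syndrome (XOR of the positions of all set bits) is 0 for a valid word
// and equals the position of the flipped bit after a single-bit error.
std::uint8_t hammingDecode(std::uint8_t code) {
  int syndrome = 0;
  for (int pos = 1; pos <= 7; ++pos)
    if ((code >> (pos - 1)) & 1) syndrome ^= pos;
  if (syndrome != 0) code ^= static_cast<std::uint8_t>(1u << (syndrome - 1));  // correct it
  return static_cast<std::uint8_t>(((code >> 2) & 1) | ((code >> 4) & 1) << 1 |
                                   ((code >> 5) & 1) << 2 | ((code >> 6) & 1) << 3);
}
```

Every one of the seven possible single-bit flips of a codeword decodes back to the original 4 data bits; two simultaneous flips would be miscorrected, which is why such schemes are described as single-bit correction.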

- Xilinx Virtex FPGAs throughout the digital CTT system
- XCV400: 0.5M gates; XCV1000E: 1.5M gates
- The same footprint in different sizes gives the needed flexibility in board layout and design
- Each FPGA has 4 global clock and ~22 secondary clock nets; important for synchronization
- Flexible Virtex RAM architecture
- Large dual-port memories, important for synchronization and pipelining of events
- Good price-to-performance ratio
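The L1 pipeline held in the dual-port memories can be modeled as a ring buffer: one input record is written per bunch crossing, and on L1 Accept the record from a fixed number of crossings earlier is read back. This is a minimal sketch assuming an arbitrary record type and a compile-time depth; `L1Pipeline` is a hypothetical name, not the CTT firmware.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of an L1 pipeline: a ring buffer that keeps the input
// records of the last Depth crossings (Depth is a template parameter here).
template <typename Record, std::size_t Depth>
class L1Pipeline {
 public:
  // Called once per bunch crossing: overwrite the oldest slot.
  void push(const Record& r) {
    buf_[head_] = r;
    head_ = (head_ + 1) % Depth;
    if (filled_ < Depth) ++filled_;
  }

  // On L1 Accept: read the record written `age` crossings ago (0 = newest).
  // In firmware this would be a dual-port RAM read port trailing the write port.
  const Record& lookback(std::size_t age) const {
    assert(age < filled_);
    return buf_[(head_ + Depth - 1 - age) % Depth];
  }

 private:
  std::array<Record, Depth> buf_{};
  std::size_t head_ = 0;    // next slot to write
  std::size_t filled_ = 0;  // number of valid records (saturates at Depth)
};
```

For example, with `L1Pipeline<int, 36>` after pushing the values 0 to 99, `lookback(0)` returns 99 and `lookback(35)` returns 64, the oldest record still held.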

#### VHDL

- An offshoot of the Very High Speed Integrated Circuits (VHSIC) program, sponsored by the Department of Defense in the late 1970s and early 1980s
- However, describing complex integrated circuits with hundreds of thousands of gates was very difficult using only gate-level tools
- A new hardware description language was proposed in 1981, called VHSIC Hardware Description Language, or VHDL
- In 1986, VHDL was proposed as an IEEE standard and, after a number of revisions, adopted as the IEEE 1076 standard in 1987
- In some ways it is like a high level programming language but in *many* ways it is very different
- Most of the people in the DØ team, including myself, did not know any VHDL before this project
- I will review just a few of many examples of algorithm implementation in VHDL in the CTT system
- VHDL firmware was designed, simulated, and implemented using advanced CAD/CAE tools: Xilinx ISE, Aldec Active-HDL, Synopsys FPGA Express (Xilinx Edition), Synplicity

#### **VHDL vs Software (1)**

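Imagine that you want to find "clusters" of contiguous ones in a bitstream of ones and zeros, such as `00110000011110101110000`. Here is an example of accomplishing this task in software (C++); the cluster-finder code fits on one slide. The strip count `N_SHO` is an assumed value for illustration:

```cpp
#include <array>

// Number of strips in one FPS wedge layer (assumed value, for illustration only)
constexpr int N_SHO = 144;

class FPSWedge {
 public:
  std::array<int, N_SHO + 1> ShoL{};    // strip hit flags, indexed 1..N_SHO
  int NClu = 0;                         // number of clusters found
  std::array<int, N_SHO + 1> CluSta{};  // first strip of each cluster (1-based)
  std::array<int, N_SHO + 1> CluWid{};  // width of each cluster in strips
  void CluFind();
};

void FPSWedge::CluFind() {
  int NewCluster = 1;  // 1 while outside a cluster, 0 while inside one
  for (int IStrip = 1; IStrip <= N_SHO; IStrip++) {
    if (ShoL[IStrip] == 1) {
      if (NewCluster == 1) {  // hit after a gap: open a new cluster
        NewCluster = 0;
        NClu++;
        CluSta[NClu] = IStrip;
        CluWid[NClu] = 1;
      } else {                // hit inside a cluster: widen it
        CluWid[NClu]++;
      }
    } else if (NewCluster == 0) {
      NewCluster = 1;         // an empty strip closes the current cluster
    }
  }
}
```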

#### **VHDL vs Software (2)**

Figure: the input string Z is compared with copies of itself shifted by one strip. A strip starts a cluster (S) if it is hit and its lower neighbor is not, and finishes a cluster (F) if it is hit and its upper neighbor is not:

$$S_i = Z_i \cdot \overline{Z_{i-1}}, \qquad F_i = Z_i \cdot \overline{Z_{i+1}}$$

The strip string is divided into three regions, each of which reports cluster start and finish positions:

| Region 3 start | Region 3 finish | Region 2 start | Region 2 finish | Region 1 start | Region 1 finish |
|----------------|-----------------|----------------|-----------------|----------------|-----------------|
| 0              | 0               | 0              | 0               | 0              | 0               |
| 0              | 0               | 0              | 0               | 0              | 0               |
| 0              | 0               | 0              | 0               | 0              | 0               |
| 8              | 10              | 0              | 0               | 0              | 0               |
| 4              | 4               | 12             | 12              | 7              | 8               |
| 1              | 1               | 5              | 8               | 1              | 4               |
| 1              | 1               | 5              | 8               | 1              | 4               |
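The firmware approach tests every strip at once instead of looping. The sketch below mimics that with word-wide bit operations on a 64-bit integer, applying the start and finish rules to all strips in a single expression each. Packing strip i+1 into bit i of a `std::uint64_t` is an assumption for illustration (the FPGA simply wires each strip to its neighbors), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Pack a '0'/'1' strip pattern into a word: character i -> bit i (strip i+1).
std::uint64_t packStrips(const std::string& s) {
  assert(s.size() <= 64);
  std::uint64_t z = 0;
  for (std::size_t i = 0; i < s.size(); ++i)
    if (s[i] == '1') z |= std::uint64_t{1} << i;
  return z;
}

// Every strip is tested at once, like the FPGA's parallel gates:
// start  = hit AND NOT(hit on the strip below)
// finish = hit AND NOT(hit on the strip above)
std::uint64_t clusterStarts(std::uint64_t z) { return z & ~(z << 1); }
std::uint64_t clusterFinishes(std::uint64_t z) { return z & ~(z >> 1); }

// Readout step: pair each start with the next finish as (first, last) strip, 1-based.
std::vector<std::pair<int, int>> clusters(std::uint64_t z) {
  const std::uint64_t s = clusterStarts(z), f = clusterFinishes(z);
  std::vector<std::pair<int, int>> out;
  int first = 0;
  for (int i = 0; i < 64; ++i) {
    if ((s >> i) & 1) first = i + 1;
    if ((f >> i) & 1) out.emplace_back(first, i + 1);
  }
  return out;
}
```

Applied to the string from the software example, `clusters(packStrips("00110000011110101110000"))` yields (3, 4), (10, 13), (15, 15) and (17, 19), the same four clusters the loop finds. The two mask expressions take constant time whatever the hit pattern, which mirrors the "regardless of cluster pattern" timing of the firmware.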

#### **VHDL vs Software (3)**




#### Firmware vs Software?

- Sorting 10 lists of 24 tracks in order to report the 24 highest-p<sub>T</sub> tracks (e.g. in L2CTOC) takes on the order of 30 RF clock ticks, or approximately 500 ns
  - A rough estimate for sorting N objects in software goes at best as N log<sub>2</sub>N operations; with N = 240 and four 1 ns cycles per basic operation this gives 240 × 7.9 × 4 ns ≈ 7600 ns, an order of magnitude more
- Cluster finding in the 144 strips of each of the FPS U and V layers, <u>regardless</u> of the cluster pattern, takes 8 RF clock ticks, or approximately 150 ns
  - A rough estimate for the same task in software, assuming the worst-case "...101010..." pattern, goes as 4·N/2 + 7·N/2 = 11·N/2 ns, with N the total number of strips. For U and V together (N = 2 × 144) this gives 11 × 144 ≈ 1600 ns, again about an order of magnitude more
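The firmware time does not grow like N log<sub>2</sub>N because comparisons can run in parallel. One hardware-friendly technique, shown here as an assumption (the slides do not say how L2CTOC sorts), is rank-based selection: every pair of tracks is compared, each track counts how many others beat it, and that count is its output slot. The sketch uses integer p<sub>T</sub> codes as stand-ins for track records; `topKByRank` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Rank-based selection: a track's output slot is the number of tracks that beat it.
// In hardware all n*(n-1) comparisons can be made at once; here they run in loops.
std::vector<int> topKByRank(const std::vector<int>& pt, std::size_t k) {
  const std::size_t n = pt.size();
  std::vector<int> out(std::min(k, n));
  for (std::size_t i = 0; i < n; ++i) {
    std::size_t rank = 0;
    for (std::size_t j = 0; j < n; ++j)
      // j beats i on higher pT, or on equal pT with a lower index (ranks stay unique)
      if (pt[j] > pt[i] || (pt[j] == pt[i] && j < i)) ++rank;
    assert(rank < n);
    if (rank < out.size()) out[rank] = pt[i];  // keep only the k best, highest first
  }
  return out;
}
```

With 10 × 24 = 240 input tracks, `topKByRank(pts, 24)` returns the 24 highest values in descending order. The price is O(N²) comparators, which an FPGA can afford for N = 240 in exchange for a fixed, short latency.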

#### **Firmware Design and Certification (1)**

- Design algorithms for board / functionality
- Write VHDL code, pipelined design
- Behavioral Simulations: SoftBench, Test-Vectors
- Synthesis/constraints (GClk buffers, clock to out, ...)
- Implementation/constraints (board layout, clocks, skews on nets, etc.), resources, speed
- Timing Simulations: same SoftBench, Test-Vectors (!)
- TestStand download in target hardware and run with the same TestVectors (!)
- Board/Functionality certified
- SoftBench for multi-board/FPGA chain, propagate TestVectors
- Chain TestStand with the same Test-Vectors
- Firmware certified

## Firmware Design and Certification (2)



#### Plus:

- Download and run in target hardware (Test Stand)
- VHDL TestBench for multi-board/FPGA chain
- Chain Test Stand Firmware certified!

#### Firmware is a Complex IC in a Chip



#### Firmware is Hardware "Built" in a Chip



Figure: FPGA Editor view of one placed and routed Virtex slice (site CLB\_R24C39.S0, component OCC\_TTM\_inst/C216/C0/C2) in a v600bg560-5 device, showing the slice's LUT and storage elements.

## L1 CFT/CPSax Chain of CTT



Normal L1 acquisition mode:

- DFEA: in each of the 80 CFT/CPSax sectors, find tracks and clusters, match them, and report these counts to the collector boards (CTOC); also send the 6 highest-p<sub>T</sub> tracks to L1MUON
- CTOC: sum up the counts from 10 DFEAs, find isolated tracks in octants, ...
- CTTT: based on the various counts from 8 CTOCs, form up to 96 trigger terms (currently 64)
- DFEA/CTOC/CTTT: maintain a 32-event-deep L1 pipeline for the inputs, and more ...

#### Upon L1 Accept:

- DFEA: pull the data for the corresponding event out of the pipeline and either reprocess it or send a list of tracks (up to 24) and clusters (up to 8) to the CTOC
- CTOC: sort the lists of tracks from 10 DFEAs in p<sub>T</sub> and report up to 24 highest-p<sub>T</sub> tracks to the CTQD
- CTQD: sort the lists of tracks from 2 CTOCs in p<sub>T</sub> and report up to 46 tracks to L2CFT
- CTOC/CTTT: send all of the inputs for the L1-accepted event to L3 via G-Link

#### **CFT 3-Track FPGA TestVector Event**

Figure: CFT fiber hit maps for the test-vector event (odd-numbered sectors).

Doublets used by the three tracks, given as doublet ID (fiber numbers feeding the NOT, AND, OR inputs):

| Track 1         | Track 2        | Track 3         |
|-----------------|----------------|-----------------|
| 75 (61, 62, 63) | 46 (3, 4, 5)   | 55 (21, 22, 23) |
| 67 (53, 54, 55) | 42 (3, 4, 5)   | 51 (21, 22, 23) |
| 59 (45, 46, 47) | 39 (5, 6, 7)   | 50 (27, 28, 29) |
| 52 (39, 40, 41) | 36 (7, 8, 9)   | 48 (31, 32, 33) |
| 44 (31, 32, 33) | 33 (9, 10, 11) | 45 (33, 34, 35) |
| 37 (25, 26, 27) | 29 (9, 10, 11) | 41 (33, 34, 35) |
| 30 (19, 20, 21) | 25 (9, 10, 11) | 36 (31, 32, 33) |
| 24 (15, 16, 17) | 21 (9, 10, 11) | 31 (29, 30, 31) |

Doublet = [NOT(m-1) AND m] OR (m+1)
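Each doublet entry names the three fibers that feed the formula; for example, doublet 75 is built from fibers 61 (NOT), 62 (AND) and 63 (OR). A literal C++ transcription, assuming fiber hits are held in a `std::vector<bool>` indexed by fiber number (the representation and the name `doubletHit` are hypothetical):

```cpp
#include <cassert>
#include <vector>

// Literal transcription of the doublet formula:
//   Doublet = [NOT(m-1) AND m] OR (m+1)
// `fiber` holds one hit flag per fiber number (assumed representation).
bool doubletHit(const std::vector<bool>& fiber, int m) {
  assert(m >= 1 && m + 1 < static_cast<int>(fiber.size()));
  return (!fiber[m - 1] && fiber[m]) || fiber[m + 1];
}
```

With only fiber 62 hit, `doubletHit(f, 62)` is true; an additional hit on fiber 61 suppresses it, unless fiber 63 is also hit.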

#### 9 AFE-MIXER links & 3 MIXER-DFEA links


1. High p<sub>T</sub> bin #294
   - ExtPt: 001, PHI 31
2. Medium p<sub>T</sub> bin #133
   - ExtPt: 101, PHI 2
   - ExtPt: 011, PHI 11
3. Low p<sub>T</sub> bin #406
   - SS: 1, TS: 1, C: 0, PT: 01
   - SS: 1, TS: 1, C: 1, PT: 10
   - SS: 1, TS: 1, C: 1, PT: 11

# **DFEA Firmware (1)**

- Finds CFT tracks from fiber doublet hits
- 4 p<sub>T</sub> bins, each with 4 sub-bins (8 sub-bins in the lowest p<sub>T</sub> bin)





- Number of track equations $N_{eq} \sim 1/p_T$ per sector (~16K equations in total)
  - ~8.5K / 3.3K / 2.5K / 2.5K in the four p<sub>T</sub> bins, lowest to highest
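Each track equation is a fixed road through the eight axial doublet layers. A minimal sketch, assuming a simplified encoding (one doublet index per layer) and that an equation fires only when all eight of its doublets are hit; `TrackEquation`, `fires` and `countFiring` are hypothetical names, and the p<sub>T</sub> bin and sub-bin bookkeeping is omitted.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr int kLayers = 8;  // CFT axial doublet layers

// Hypothetical encoding: the doublet index this road uses in each layer.
struct TrackEquation {
  std::array<int, kLayers> doublet;
};

using LayerHits = std::array<std::vector<bool>, kLayers>;  // doublet hits per layer

// Assumed simplification: an equation fires only if all 8 of its doublets are hit
// (an 8-input AND in firmware).
bool fires(const TrackEquation& eq, const LayerHits& hits) {
  for (int layer = 0; layer < kLayers; ++layer) {
    const int d = eq.doublet[layer];
    assert(d >= 0 && d < static_cast<int>(hits[layer].size()));
    if (!hits[layer][d]) return false;
  }
  return true;
}

// Firmware evaluates all equations of a sector in parallel; this loop is sequential.
int countFiring(const std::vector<TrackEquation>& eqs, const LayerHits& hits) {
  int n = 0;
  for (const auto& eq : eqs) n += fires(eq, hits) ? 1 : 0;
  return n;
}
```

Because each equation is a pure AND of hit bits, thousands of them map onto FPGA logic with no sequencing; the number of equations, not the number of hits, sets the resource cost, which is why $N_{eq}$ per bin matters.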

sector N+1 (a.k.a. left hand sector)

# **DFEA Firmware (2)**

- Automated VHDL code generation was necessary to handle the large number of equations
- Started off with ideal-geometry track equations, but easily evolved to the "as-built" geometry
- Four track p<sub>T</sub> bins, with edges at:
   [ 1.5 || 3.0 || 5.0 || 10.0 || ∞ ] GeV
- And more sub-bins:

| p<sub>T</sub> bin name | PTBIN | EXT PT | Offset | p<sub>T</sub> min (GeV) | p<sub>T</sub> max (GeV) | p<sub>T</sub> average (GeV) |
|------------------------|-------|--------|--------|-------------------------|-------------------------|-----------------------------|
| LOW                    | 11    | 111    | 14     |                         |                         |                             |
|                        | 11    | 110    | 13     |                         |                         |                             |
|                        | 11    | 101    | 12     |                         |                         |                             |
|                        | 11    | 100    | 11     |                         |                         |                             |
|                        | 11    | 011    | 10     |                         |                         |                             |
|                        | 11    | 010    | 9      |                         |                         |                             |
|                        | 11    | 001    | 8      |                         |                         |                             |
|                        | 11    | 000    | 7      |                         |                         |                             |
| MEDIUM                 | 10    | 011    | 6      |                         |                         |                             |
|                        | 10    | 010    | 5      |                         |                         |                             |
|                        | 10    | 001    | 4      |                         |                         |                             |
|                        | 10    | 000    | 3      |                         |                         |                             |
| HIGH                   | 01    | 011    |        | 5.00000                 | 5.71429                 | 5.33333                     |
|                        | 01    | 010    |        | 5.71429                 | 6.66667                 | 6.15385                     |
|                        | 01    | 001    |        | 6.66667                 | 8.00000                 | 7.27273                     |
|                        | 01    | 000    |        | 8.00000                 | 10.00000                | 8.88889                     |
| MAX                    | 00    | 011    |        | 10.00000                | 13.33333                | 11.42857                    |
|                        | 00    | 010    |        | 13.33333                | 20.00000                | 16.00000                    |
|                        | 00    | 001    |        | 20.00000                | 40.00000                | 26.66667                    |
|                        | 00    | 000    |        | 40.00000                | infinite                | 80.00000                    |
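In the HIGH and MAX rows the sub-bin edges are evenly spaced in 1/p<sub>T</sub> (steps of 0.025 GeV<sup>-1</sup>), and each "p<sub>T</sub> average" is the p<sub>T</sub> at the midpoint in 1/p<sub>T</sub>; for example 1/0.1875 GeV<sup>-1</sup> = 5.33333 GeV. The sketch below reproduces those rows. Applying the same rule to the LOW and MEDIUM bins would be an extrapolation the table does not confirm, and the half-open bin boundaries in `coarseBin` are an assumption; both function names are hypothetical.

```cpp
#include <cassert>
#include <limits>
#include <string>
#include <vector>

constexpr double kInf = std::numeric_limits<double>::infinity();

struct SubBin { double ptMin, ptMax, ptAverage; };

// Split [ptLow, ptHigh) into n sub-bins of equal width in 1/pT (curvature).
// The "average" is the pT at the sub-bin's midpoint in 1/pT.
std::vector<SubBin> curvatureSubBins(double ptLow, double ptHigh, int n) {
  assert(ptLow > 0 && ptHigh > ptLow && n > 0);
  const double invLow = 1.0 / ptLow, invHigh = 1.0 / ptHigh;  // 1/inf == 0
  const double step = (invLow - invHigh) / n;
  std::vector<SubBin> bins;
  for (int k = 0; k < n; ++k) {
    const double a = invLow - k * step, b = invLow - (k + 1) * step;
    bins.push_back({1.0 / a, (k + 1 == n) ? ptHigh : 1.0 / b,
                    1.0 / (invLow - (k + 0.5) * step)});
  }
  return bins;
}

// Coarse L1 bins with edges 1.5, 3, 5, 10, inf GeV (PTBIN codes from the table).
std::string coarseBin(double pt) {
  if (pt < 1.5) return "below threshold";
  if (pt < 3.0) return "LOW";     // PTBIN 11
  if (pt < 5.0) return "MEDIUM";  // PTBIN 10
  if (pt < 10.0) return "HIGH";   // PTBIN 01
  return "MAX";                   // PTBIN 00
}
```

`curvatureSubBins(5.0, 10.0, 4)` gives edges 5, 5.71429, 6.66667, 8, 10 GeV, and `curvatureSubBins(10.0, kInf, 4)` gives 10, 13.3333, 20, 40, ∞ GeV with an average of 80 GeV in the last sub-bin, matching the table. Equal steps in curvature correspond to equal steps in track sagitta, so each sub-bin spans a similar range of fiber offsets.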

 p<sub>T</sub> binning also gives sharper "turn on" than offset binning



- In L1, cruder p<sub>T</sub> bins are used
- More detailed track information is sent to L2, where performance is critical, e.g. in the STT

| Muon p<sub>T</sub> (GeV)  | $\Delta p_T / p_T^2$ | $\Delta \phi_0$ (mrad) | IP resolution (μm) |
|---------------------------|----------------------|----------------------|------------|
| 2                         | 1%                   | 1                    | 40         |
| 5                         | 0.7%                 | 0.6                  | 25         |
| 50                        | 0.3%                 | 0.4                  | 20         |

#### **DFEA Firmware (3)**

- Finds CFT tracks in four FPGAs, one per p<sub>T</sub> bin
- Finds CPS clusters and performs track/cluster matching in the back-end FPGA
- Reports counts in L1 and more detailed lists of tracks and clusters in L2
- Also reports 6 highest p<sub>T</sub> tracks to L1 Muon

#### CFT/CPS AXIAL Trigger Daughter Board Dataflow



## **DFEA Firmware (4)**

In each p<sub>T</sub> bin, the track equations are represented as a two-dimensional array 44 wide (φ) and 8 tall (±4 sub-bins).

- In the above array, 8 tracks are found; however, only six tracks are reported, as follows:
  - Track A is reported
  - Track B is reported; C is pushed on the stack
  - Track D is reported
  - Track E is reported; G is pushed on the stack; F is lost
  - Track H is reported
  - Track G is popped from the stack and reported; the stack is then cleared

#### **DFEA Firmware (5)**

- The CPSax cluster finder sees 16 strips from the home sector and 8 strips from each of the previous and following neighboring trigger sectors
- It finds up to 6 clusters in each of the two 16-strip halves
- Depending on its p<sub>T</sub> bin, each CFT track is extrapolated out to the CPSax radius
- If the extrapolated CFT track passes within a given number of strips of the CPS cluster centroid, the track and cluster are considered matched; the size of the matching "window" is p<sub>T</sub>-bin dependent
- The DFEA also calculates the number of doublet hits in the sector, which is used in the Hit / Occupancy Level Trigger
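A minimal sketch of the match test, assuming the extrapolated track position and the cluster are both expressed in strip units at the CPSax radius, and that the cluster centroid is the midpoint of its first and last strips (working in half-strips keeps everything integer, as firmware would). The window half-widths are placeholders, not the pre-optimized DØ values, and the p<sub>T</sub>-dependent extrapolation itself is left out; all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// PTBIN codes from the sub-bin table: 00 = MAX, 01 = HIGH, 10 = MEDIUM, 11 = LOW.
enum class PtBin { Max = 0, High = 1, Medium = 2, Low = 3 };

// Placeholder window half-widths in half-strip units (NOT the real values).
// Lower-pT tracks bend more and extrapolate less precisely, hence wider windows.
constexpr std::array<int, 4> kWindowHalfStrips = {1, 2, 3, 4};

// Cluster centroid in half-strip units: first + last strip = 2 x centroid.
int centroidHalfStrips(int firstStrip, int lastStrip) {
  assert(lastStrip >= firstStrip);
  return firstStrip + lastStrip;
}

// trackHalfStrips: CFT track extrapolated to the CPSax radius, in half strips.
bool trackMatchesCluster(int trackHalfStrips, int firstStrip, int lastStrip, PtBin bin) {
  const int d = std::abs(trackHalfStrips - centroidHalfStrips(firstStrip, lastStrip));
  return d <= kWindowHalfStrips[static_cast<std::size_t>(bin)];
}
```

For a cluster spanning strips 5 to 8 (centroid 13 half-strips), a track at 16 half-strips fails the narrow MAX window but passes the wider LOW window.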




#### **CFT/CPSax Timing Diagram**




#### **Sample of Protocols in L1CTT**



#### Information provided to the CTT (Central Trigger Term generator)

#### 96 Possible Trigger Terms are generated in the CTT.

32 of these are selected to go to the Trigger Framework

| Byte 0 (bits 07–00)     | Byte 1 (bits 15–08)     | Content                    |
|-------------------------|-------------------------|----------------------------|
| 07 06 05 04 03 02 01 00 | 15 14 13 12 11 10 09 08 |                            |
| 07 06 05 04 03 02 01 P  | 15 14 13 12 11 10 09 08 | First set of trigger terms  |
| 07 06 05 04 03 02 01 00 | 15 14 13 12 11 10 09 08 | Second set of trigger terms |
| 07 06 05 04 03 02 01 00 | 15 14 13 12 11 10 09 08 | Third set of trigger terms  |
| 07 06 05 04 03 02 01 00 | 15 14 13 12 11 10 09 08 | Fourth set of trigger terms |
| 07 06 05 04 03 02 01 00 | 15 14 13 12 11 10 09 08 | Fifth set of trigger terms  |
| 07 06 05 04 03 02 01 00 | 15 14 13 12 11 10 09 08 | Sixth set of trigger terms  |
| Longitudinal parity     |                         | Trailer                    |
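The format carries six 16-bit words of trigger terms (6 × 16 = 96) followed by a longitudinal-parity trailer. Longitudinal parity is conventionally the column-wise XOR of all data words, so the receiver can check that data plus trailer XOR to zero. This sketch assumes that term t maps to word t/16, bit t mod 16, and ignores the header word and the "P" bit shown in the first set; `packTerms` and `longitudinalParity` are hypothetical names.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed mapping: trigger term t goes to word t / 16, bit t % 16
// (bits 07..00 in byte 0, bits 15..08 in byte 1).
std::array<std::uint16_t, 6> packTerms(const std::bitset<96>& terms) {
  std::array<std::uint16_t, 6> words{};
  for (std::size_t t = 0; t < terms.size(); ++t)
    if (terms[t]) words[t / 16] |= static_cast<std::uint16_t>(1u << (t % 16));
  return words;
}

// Longitudinal parity: bitwise XOR of all data words, column by column.
template <std::size_t N>
std::uint16_t longitudinalParity(const std::array<std::uint16_t, N>& words) {
  std::uint16_t p = 0;
  for (std::uint16_t w : words) p ^= w;
  return p;
}
```

With terms 0, 17 and 95 set, the words are 0x0001, 0x0002 and 0x8000 in sets one, two and six, and the trailer is their XOR, 0x8003. Unlike the single-bit correction on the links, this check only detects errors; it cannot say which word was corrupted.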

# Sample of L1CTT Trigger Terms

| CFT+PS                          |                             |                           |                       |          |
|---------------------------------|-----------------------------|---------------------------|-----------------------|----------|
| TTK(n,p)                        | CFT Track trigger           | n=#tracks                 | n=1,2                 | 8 terms  |
|                                 |                             | p=pt threshold            | p=1,2,3,4             |          |
| TEL(p,ps)                       | CFT Track+CPS cluster       | p= pt threshold           | p=3 values (1.5,5,10  | 6 terms  |
|                                 |                             | ps=ps threshold           | ps=2 values (I,h)     |          |
| TPQ(n,q)                        | CFT Track+CPS /quad         | n=#tracks                 | n=1,2                 | 8 terms  |
|                                 |                             | p=1.5GeV / t=L thresh.    |                       |          |
|                                 |                             | q=quadrant                | q=1,2,3,4             |          |
| TNQ(n,q)                        | PS match/quad               | n=#clusters, t=high thres | n=1,2                 | 8 terms  |
|                                 |                             | q=quadrant                | q=1,2,3,4             |          |
| TDL(p,s)                        | 2 CFT tracks + CPS Match    | s=charge sign             | s=ss/os/ns (ns for F) | 6 terms  |
|                                 |                             | p=mom. Threshold          | p=1.5 or 5            |          |
| TIS(p)                          | CFT isolated (4.5) deg trac | p=mom. Threshold          | p=5,10 GeV            | 2 terms  |
| TDS(s)                          | 2 CFT isolated tracks       | s=charge sign             | s=ss/os               | 2 terms  |
|                                 |                             | p=mom. Threshold          | p=5 GeV               |          |
| THT(h)                          | CFT total Hits              | h= #hit threshold         | h=2 values            | 2 terms  |
|                                 | (Occupancy trigger)         |                           |                       |          |
| TAC(f)                          | CFT track jets              | f=dphi cut                | A<35 sectors          | 2 terms  |
|                                 | Take 2 highest Pt octants   | acoplanar or coplanar     | C>35 sectors          |          |
|                                 | with ptsum>5 GeV            |                           |                       |          |
|                                 | count # sectors between     |                           |                       |          |
|                                 | highest pt sectors          |                           |                       |          |
| TIQ(q)                          | CFT isolated / quadrant     | q=quadrant                | q=1,2,3,4 p=5 GeV     | 4 terms  |
| TOC(n,p)                        | Octant above thresh         | n=number                  | n=1,2                 | 4 terms  |
|                                 |                             | p=threshold               | p=3, 5 GeV            |          |
| TTA(n)                          | CFT for tau loose match     | n=#clusters, t=L thresh   | n=1,2                 | 2 terms  |
|                                 |                             | p=5 GeV track only        |                       |          |
| TIL                             | Low Local Occupancy         |                           |                       | 1 terms  |
| ТМО                             | Minimum Occupancy (>noi     | se)                       |                       | 1 terms  |
| spares                          | Terms like TLQ              |                           |                       | 8 terms  |
|                                 |                             |                           |                       | 64 terms |
| FPS                             |                             |                           |                       |          |
| FPN(n,t,c)                      | FPS North cluster           | n= EM cluster             | n=1,2                 | 8 terms  |
|                                 |                             | t =threshold              | t=H,L                 |          |
|                                 |                             | c = e(lectron), s(hower)  | c=e,s                 |          |
| FPS(n,t,c)                      | FPS South cluster           | n= EM cluster             | n=1,2                 | 8 terms  |
|                                 |                             | t =threshold              | t=H,L                 |          |
|                                 |                             | c = e(lectron), s(hower)  | c=e,s                 |          |
| FQN(n,q)                        | FPS North cluster/quadrant  | n= EM cluster, L thresh.  | n=1,2                 | 8 terms  |
|                                 |                             | q= quadrant               | q=1,2,3,4             |          |
| FQS(n,q)                        | FPS South cluster/quadrant  | n= EM cluster, L thresh.  | n=1,2                 | 8 terms  |
|                                 |                             | q= quadrant               | q=1,2,3,4             |          |
| Note: $FQ(n,q) = FQN(n,q) + FQS(n,q)$ (the number of ORs in the expression is recomputed accordingly) | | | | |
|                                 |                             |                           |                       | 32 terms |

#### **DØ Central Track and Preshower Trigger**

- The L1 CTT digital trigger is unique in the experiment in that it is heavily based on reprogrammable devices, both for physics algorithms and for addressing hardware issues (e.g. synchronization, transmission bit-error correction)
- It is therefore an extremely flexible system, allowing continual improvement and development of algorithms
- Extensive firmware testing with a good set of test vectors and a software test bench is vital; excellent design and simulation tools are available
- The entire L1 CTT trigger is emulated in C++ so that it can be plugged into the DØ software; this is needed to run on Monte Carlo samples to understand performance, generate test vectors, etc.
- This raises unique issues such as storage and versioning of the VHDL source code: not only the VHDL must be archived, but also the binary downloadables and the simulation, synthesis, and place & route tools
- In the very last stages of inclusion in global physics running...

#### **The Upgraded CDF Detector**



- New tracking devices, Silicon (SVX II) and Central Outer Tracker (COT), placed in 1.4 T magnetic field
- New fast, scintillating tile Endplug calorimeter
- Muon System extensions

- Front-end electronics, buffered data
- Entirely new Trigger System and DAQ to handle higher event rate
  - Tracking at Level 1 (XFT)
  - Pipelined
  - Three-level trigger system

# **CDF DAQ/Trigger System**



- Pipelined readout
- Data sampled every 132 ns (TDCs, calorimetry, silicon)
- New Level 1 trigger decision every 132 ns; latency ~5.5 μs (pipelined)
- Data  $\rightarrow$  Level 2 buffer
- Level 2 decision: asynchronous, 20 μs
- Readout  $\rightarrow$  Level 3 farm
- Accept rates 10× higher than in Run I
  - Level 1: < 50 kHz
  - Level 2: 300 Hz
  - Level 3: 50 Hz $\rightarrow$ tape
- Design: 90% live at 90% maximum bandwidth
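A quick back-of-envelope check shows why the front end must be pipelined: with a new crossing every 132 ns and a Level 1 latency of ~5.5 μs, data from each crossing must be held for several tens of crossings. The sketch below computes that number; `pipeline_depth` is a hypothetical helper, and the result is only a lower bound on buffer depth, not the actual CDF hardware figure.

```python
import math


def pipeline_depth(latency_ns: float, crossing_ns: float) -> int:
    """Minimum number of pipeline slots so that the data from every crossing
    is still buffered when its Level 1 decision arrives (lower bound only)."""
    return math.ceil(latency_ns / crossing_ns)


# Numbers quoted above: ~5.5 us Level 1 latency, 132 ns between crossings
print(pipeline_depth(5500, 132))
# Same latency with 396 ns bunch spacing
print(pipeline_depth(5500, 396))
```

Shorter bunch spacing at fixed latency means proportionally deeper pipelines, which is why the design targets the 132 ns case.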

# **CDF Trigger System**



- Trigger combines primitives from tracking, muons, EM and HAD calorimeters, SVX II, etc.
- Similar in concept to the Run I trigger, multilevel, flexible, programmable, etc., but now all information must be pipelined
- New central tracking information available at Level 1, the eXtremely Fast Tracker (XFT)
- Impact parameter information at Level 2 from SVT

#### **CDF Central Outer Tracker**



- The previous chamber (CTC) had to be replaced: its drift times were too long and it had aged
- New chamber (COT) covers radial region 44 to 132 cm
- Small drift cells, ~ 2 cm wide, a factor of 4 smaller than in the Run I tracker
- Fast gas, drift times < ~130 ns

- COT cell has 12 sense wires oriented in a plane, at ~ 35° with respect to radial direction for Lorentz drift
- A group of such cells at given radius forms a superlayer (SL)
- 8 alternating superlayers of 4 stereo (~3°) and 4 axial wire planes

# **COT Design**



- Basic Cell:
  - 12 sense, 17 potential wires
  - 40 μm diameter gold-plated tungsten (W)
  - Cathode: 350 Å gold on 0.25 mil Mylar
- Drift trajectories very uniform over most of the cell
- Cell tilted 35° for Lorentz angle
- Construction:
  - Use winding machine
  - 29 wires per PC board, precision length
  - Snap-in assembly, fast compared with wire stringing
  - 30,240 sense wires vs 6156 in CTC
  - Total wires 73,000 vs 36,504 in CTC

#### **CDF eXtremely Fast Tracker**

- Find tracks in Level 1 trigger, parallel processing, pipelined data
- Must report results for every event, every 132 ns: hence "eXtremely Fast" (XFT)
- Requirements include low fake rate, high efficiency for  $|\eta| < 1.0$ , excellent momentum resolution



SL1-3 Finder Board looks for segments in axial superlayers 1 and 3

One Linker Board covers 15° of the chamber. Each board has 12 linker chips, each of which finds tracks for 1.25° of the chamber

SL2-4 Finder Board looks for segments in axial superlayers 2 and 4

- Stage 1: Look for hits on the COT wires in 4 axial SLs (Mezzanine Card)
- Stage 2: Group the hits in each SL and construct track segments (Finder)
- Stage 3: Combine the segments across SLs, track momentum (Linker)

#### **XFT Implementation**




#### **The Time to Digital Converter Mezzanine Card**

"Prompt" and "delayed" hits:

Tracks passing through each superlayer of the COT generate "hits" on each of its 12 wire layers.

The mezzanine card is responsible for classifying each hit on a wire as prompt and/or delayed. There are a total of 16,128 axial wires, so after this classification the number of bits representing hits is doubled (to 32,256).

The definition of prompt and delayed depends on the Tevatron bunch spacing:

- For 132 ns bunch crossing: prompt = drift time 0-44 ns, delayed = drift time 45-132 ns
- For 396 ns bunch crossing: prompt = drift time 0-66 ns, delayed = drift time 67-220 ns
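The prompt/delayed classification can be sketched in software as below. This is an illustrative model, not the mezzanine-card firmware: the function name `classify_wire` is hypothetical, drift times are assumed to be integer nanoseconds measured from the bunch crossing, and the window edges are taken directly from the ranges above.

```python
from typing import Dict, Iterable, Tuple

Window = Tuple[int, int]  # inclusive drift-time range in ns

# (prompt, delayed) windows from the text, keyed by bunch spacing in ns
WINDOWS: Dict[int, Tuple[Window, Window]] = {
    132: ((0, 44), (45, 132)),
    396: ((0, 66), (67, 220)),
}


def classify_wire(drift_times_ns: Iterable[int], spacing_ns: int) -> Tuple[bool, bool]:
    """Return the (prompt, delayed) bits for one wire in one crossing.

    A wire with several hits can set both bits; a hit outside both
    windows sets neither. (Hypothetical software model.)"""
    (p_lo, p_hi), (d_lo, d_hi) = WINDOWS[spacing_ns]
    prompt = delayed = False
    for t in drift_times_ns:
        if p_lo <= t <= p_hi:
            prompt = True
        elif d_lo <= t <= d_hi:
            delayed = True
    return prompt, delayed


print(classify_wire([12, 80], 132))
print(classify_wire([300], 396))
```

Each wire thus produces two bits per crossing, which is the doubling of the hit bit count mentioned above.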

#### **The TDC and the Mezzanine Card**



IEEE NSS/MIC 2002, Norfolk, Virginia, 10 – 16 November, 2002

#### **The Finder**

Track segments are found by comparing hit patterns in a given superlayer to a list of valid patterns, or "masks". The design can allow up to 3 misses; a 2-miss design is presently used to obtain high efficiency.

A mask is a specific pattern of prompt and delayed hits on the 12 wires of an axial superlayer:

- Inner layers: 1 of 12 pixel positions
- Outer layers: 1 of 6 pixel positions and 1 of 3 slopes (low $p_T^+$, low $p_T^-$, high $p_T$)
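The mask comparison with tolerated misses can be illustrated with a small software model. This is a sketch under stated assumptions, not the Finder firmware: a hit pattern is modeled as a set of hypothetical `(wire, kind)` pairs, where `kind` is `"P"` (prompt) or `"D"` (delayed), and a mask is the set of pairs it requires. A mask fires when at most `max_misses` of its required hits are absent (2 by default, as in the present design). The two toy masks are chosen only to make misses easy to count; real masks encode track position and slope.

```python
from typing import FrozenSet, List, Set, Tuple

Hit = Tuple[int, str]  # (wire index 0-11, "P" = prompt or "D" = delayed)


def matching_masks(hits: Set[Hit], masks: List[FrozenSet[Hit]],
                   max_misses: int = 2) -> List[int]:
    """Indices of the masks satisfied by `hits`, allowing up to
    `max_misses` required hits to be missing."""
    return [i for i, mask in enumerate(masks) if len(mask - hits) <= max_misses]


# Two toy masks: all 12 wires prompt, or all 12 wires delayed.
all_prompt = frozenset((w, "P") for w in range(12))
all_delayed = frozenset((w, "D") for w in range(12))
masks = [all_prompt, all_delayed]

two_missing = {(w, "P") for w in range(12) if w not in (3, 7)}
three_missing = two_missing - {(5, "P")}

print(matching_masks(two_missing, masks))
print(matching_masks(three_missing, masks))
print(matching_masks(three_missing, masks, max_misses=3))
```

Raising `max_misses` trades a higher fake-segment rate for efficiency, which is the choice behind the 2-miss design.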




The algorithm is implemented in a programmable logic device (the "Finder chip"). Chips within a superlayer are identical, and each chip is responsible for four adjacent cells (336 Altera 10K50 chips in total).



#### **The Finder Board**



- Two types of modules: SL13 and SL24
- Each module handles 15° azimuth
- Input Alignment Xilinx FPGAs latch in and align the COT wire data
- SL13: 2 SL1 and 4 SL3 Altera FPGAs
- SL24: 3 SL2 and 5 SL4 Altera FPGAs
- The Finder FPGAs also hold Level 1 pipeline and L2 buffers for the input COT hit information and for the found pixel information
- Total Finder logic latency is ~560 ns
- Pixel Driver Altera FPGAs ship the found pixel data to the Linker over 10 ft LVDS cables clocked at 30.3 MHz
- The Finder module also has RAM for loading PLD designs, circuitry for boundary scans, and ports for loading design from a PC serial port
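As a consistency check on the chip counts, assuming one SL13 and one SL24 module per 15° sector, the Finder FPGA totals add up to the 336 Altera 10K50 chips quoted above:

$$
N_{\text{modules}} = \frac{360^\circ}{15^\circ} = 24 \ \text{per type}, \qquad
N_{\text{Finder chips}} = 24\,(2+4) + 24\,(3+5) = 144 + 192 = 336
$$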

#### **The Linker**

- Tracks are found by comparing pixels in all 4 layers to a list of valid pixel patterns or "roads"
- Each chip contains all the roads needed (2400) to find tracks with transverse momentum > 1.5 GeV/c


- Can generate roads for any beam spot position, sensitive to > 1 mm changes
- Presently using a design with a 4 mm offset at 105°
- Number of roads is proportional to 1/p<sub>T,min</sub>
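Road matching reduces to a lookup: a road names one pixel in each axial superlayer, and a track candidate exists when all four pixels were found by the Finder. The sketch below is a simplified software model (hypothetical `find_tracks` helper; pixels are plain integer indices, and the slope information carried by outer-layer pixels is ignored), not the Linker firmware, which evaluates its ~2400 roads in parallel.

```python
from typing import List, Sequence, Set, Tuple

Road = Tuple[int, int, int, int]  # pixel index in SL1, SL2, SL3, SL4


def find_tracks(pixels: Sequence[Set[int]], roads: List[Road]) -> List[Road]:
    """Return every road whose pixel is present in all four axial superlayers."""
    return [road for road in roads
            if all(p in pixels[sl] for sl, p in enumerate(road))]


found = [{3}, {5, 9}, {7}, {10}]                 # pixels reported per superlayer
roads = [(3, 5, 7, 10), (3, 9, 7, 11), (2, 5, 7, 10)]
print(find_tracks(found, roads))
```

Intuitively, lowering the p<sub>T</sub> threshold admits more strongly curved tracks, so each inner pixel must be paired with a wider range of outer pixels; since the sagitta scales as 1/p<sub>T</sub>, this is consistent with the road count growing as 1/p<sub>T,min</sub>.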





#### **The Linker Board**



- Each Linker covers 15° in azimuth, hence 24 of them (in 3 crates)
- LVDS receivers capture the track segment data from the Finder
- 6 Input Formatter Altera FPGAs latch and synchronize the data with onboard clock
- Then 12 Linker Altera FPGAs (each handling 1.25° in azimuth), running on a 30.3 MHz clock, search for the best track
- The Linker FPGAs send data at 7.6 MHz to 2 Output Formatter FPGAs
- LVDS drivers send data to the XTRP system over 50 ft of cabling
- Total Linker module latency is ~730 ns
- The Linker module too has RAM for loading PLD designs, circuitry for boundary scans, and ports for loading design from a PC serial port
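The azimuthal partitioning (24 boards × 15°, each with 12 Linker chips × 1.25°) can be expressed as a simple mapping. This is a hypothetical sketch: it assumes board 0, chip 0 starts at φ = 0 and ignores any overlap between neighboring chips, which the real system may need at sector boundaries.

```python
from typing import Tuple

N_BOARDS = 24
CHIPS_PER_BOARD = 12
BOARD_SPAN_DEG = 360.0 / N_BOARDS                 # 15 deg per Linker board
CHIP_SPAN_DEG = BOARD_SPAN_DEG / CHIPS_PER_BOARD  # 1.25 deg per Linker chip


def linker_location(phi_deg: float) -> Tuple[int, int]:
    """Map an azimuth in degrees to a hypothetical (board, chip) index."""
    phi = phi_deg % 360.0
    # min() guards against floating-point rounding at the upper edges
    board = min(int(phi // BOARD_SPAN_DEG), N_BOARDS - 1)
    chip = min(int((phi - board * BOARD_SPAN_DEG) // CHIP_SPAN_DEG),
               CHIPS_PER_BOARD - 1)
    return board, chip


print(linker_location(16.25))
print(linker_location(-1.0))
```

In total this gives 24 × 12 = 288 Linker chips around the chamber.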

### **XFT Performance (1)**

- Events from 10 GeV Jet trigger
- CDF reconstructed tracks:
  - Hits >24 in axial and stereo layers
  - p<sub>T</sub> >1.5 GeV/c
  - Fiducial
- Match if XFT track within 10 pixels (about 1.5°) in at least 3 layers
- Find XFT track for 96.1±0.1% of these reconstructed tracks
- Azimuthal coverage flat
  - only 20 / 16,128 COT wires off
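The matching criterion used for this efficiency measurement can be written down directly. Below is a hypothetical helper, `xft_matches`, that compares the per-superlayer pixel positions of an XFT track and an offline track. Pixel indices are assumed to be on a common scale in each of the 4 axial superlayers, and φ wrap-around is ignored for brevity.

```python
from typing import Sequence


def xft_matches(xft_pixels: Sequence[int], offline_pixels: Sequence[int],
                window: int = 10, min_layers: int = 3) -> bool:
    """True if the XFT track is within `window` pixels of the offline track
    in at least `min_layers` axial superlayers."""
    close = sum(abs(x - o) <= window for x, o in zip(xft_pixels, offline_pixels))
    return close >= min_layers


print(xft_matches([100, 200, 300, 400], [105, 195, 312, 400]))  # 3 layers close
print(xft_matches([100, 200, 300, 400], [120, 195, 312, 400]))  # only 2 close
```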



#### **XFT Performance (2)**



- Transverse momentum resolution: 1.64±0.01 %/GeV/c (specification: < 2 %/GeV/c)
- Angular resolution at COT SL3: 5.09±0.03 mrad (specification: < 8 mrad)
- Meets design specifications!
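Assuming the quoted momentum figure is the usual curvature resolution $\sigma(p_T)/p_T^2$ (the natural reading of the %/GeV/c unit), the fractional resolution grows linearly with $p_T$:

$$
\frac{\sigma(p_T)}{p_T^2} \approx 0.0164\ (\mathrm{GeV}/c)^{-1}
\;\Rightarrow\;
\frac{\sigma(p_T)}{p_T} \approx 0.0164 \times p_T\,[\mathrm{GeV}/c]
$$

This gives about 2.5% at the 1.5 GeV/c threshold and about 16% at 10 GeV/c. Equivalently, $\sigma(1/p_T) = \sigma(p_T)/p_T^2$ is constant, which makes $1/p_T$ a natural variable for studying trigger thresholds.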

# **XFT Performance (3)**

- Sharp threshold at  $p_T=1.5$  GeV/c
  - Important for B physics L1 trigger rate
    - Run 1 threshold was 2.2 GeV/c at Level 2
  - Thresholds look same in  $1/p_T$
- Fake XFT track fraction ~3% at low p<sub>T</sub>
- Fake XFT track fraction ~6% in 8 GeV electron triggers
- Single track trigger cross section with p<sub>T</sub> >1.5 GeV/c is ~11 mb, close to extrapolations from Run I data



### **The PEP-II Storage Ring and BABAR Detector**



- 3.1 GeV e<sup>+</sup> on 9 GeV e<sup>-</sup>
- $e^+e^-$  CM boost  $\langle\beta\gamma\rangle = 0.55$
- Peak luminosity $4.6\times10^{33}\ \mathrm{cm^{-2}\,s^{-1}}$
- Number of bunches 800
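The boost follows from the asymmetric beam energies. Treating the beams as massless and using the rounded energies above ($E_- = 9$ GeV, $E_+ = 3.1$ GeV):

$$
\sqrt{s} = 2\sqrt{E_-E_+} = 2\sqrt{9 \times 3.1}\ \mathrm{GeV} \approx 10.56\ \mathrm{GeV},
\qquad
\beta\gamma = \frac{E_- - E_+}{\sqrt{s}} = \frac{5.9}{10.56} \approx 0.56
$$

This is in line with the quoted $\langle\beta\gamma\rangle = 0.55$, with $\sqrt{s}$ close to the $\Upsilon(4S)$ mass.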



- DCH: 40 axial and stereo layers, tracking magnetic field 1.5 T
- Tracking:  $\sigma(p_T)/p_T \sim 0.13\%\,p_T + 0.45\%$ ,  $\sigma(z_0) \sim 65\ \mu\mathrm{m}$  at 1 GeV/c
- DIRC: 144 quartz bars
- EMC: 6580 CsI(Tl) crystals, $\sigma(E)/E \sim 2.3\%\,E^{-1/4} \oplus 1.9\%$
- IFR: 19 RPC layers, muon and $K_L$ id
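The ⊕ in the EMC resolution denotes addition in quadrature. Below is a minimal sketch evaluating that parametrization. The function name is hypothetical, E is assumed to be in GeV, and the formula is only meaningful over the calorimeter's working energy range.

```python
import math


def emc_resolution(e_gev: float) -> float:
    """Fractional resolution sigma(E)/E = 2.3% * E^(-1/4) (+) 1.9%,
    where (+) is addition in quadrature; E in GeV."""
    if e_gev <= 0:
        raise ValueError("energy must be positive")
    stochastic = 0.023 * e_gev ** -0.25  # energy-dependent term
    constant = 0.019                     # constant term
    return math.hypot(stochastic, constant)


for e in (0.1, 1.0, 5.0):
    print(f"E = {e:4.1f} GeV: sigma/E = {100 * emc_resolution(e):.2f}%")
```

The constant term dominates at high energy, while the $E^{-1/4}$ term dominates for the low-energy photons typical of B decays.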



#### **The BABAR Level 1 Trigger**



- PEP-II beams cross at 238 MHz, so collisions are effectively continuous
- Multilevel trigger system, no Level 2
- Only the DCH and calorimeter participate in Level 1
- Similar to, or more complex than, the Belle and CLEO triggers

- The input data to the DCT consist of one bit for each of the 7104 DCH cells
- Sampled every 269 ns, the bits convey time information from amplitude discriminators on each cell's wire signal

#### **The BABAR Drift Chamber**

- 40-layer small-hex-cell chamber
  - Cells are  $12 \times 18 \text{ mm}^2$  in size
  - 7104 drift cells with hexagonal field wire pattern
  - 96 to 256 cells per layer
  - 80 and 120 μm gold-plated aluminum field wires
- Layers organized into superlayers with same orientation
  - Each superlayer has 4 consecutive layers; superlayer wire directions follow an axial, U-stereo, V-stereo sequence
  - Required for fast reduction of input to Level 1 trigger via segment finding
  - Transition field shaping voltages to maintain reasonably uniform performance



#### **The BABAR Drift Chamber Level 1 Trigger**




#### **The Track Segment Finder**



- In an 8-cell pivot group, the cells are numbered 0 through 7, with cell 4 being the pivot point
- The shape was chosen to correspond only to stiff tracks from the interaction point
- There are a total of 1776 pivot groups

- The hits from each of the 7104 DCT channels are sent to the 24 TSF modules via G-Links
- Each TSF module extracts track segments formed by a set of contiguous hits within a group of neighboring cells
- Segments are assigned weights based on the number of hits within a pivot group and the quality of the data, according to the associated LUTs
- The complete Segment Finder has 1776 track segment finder engines
- Depending on where a track passed through a cell, it takes 1 to 4 ticks of the 3.7 MHz clock for the charge to drift to the wire
- These discrete steps in sampling time are used by TSF for better position resolution and event time determination

### **The Track Segment Finder**

use drift time information to better determine track position and event time



- Track hits are close in time, with hits in at least three of the four layers within the SL (allowing for cell inefficiencies)
- A 2-bit counter for each cell captures hits in 4 time slices at 269 ns intervals
- Each pattern is given by a 16-bit address (8 cells × 2 bits), i.e. 65,536 possible addresses




- Look-up-table address $\leftrightarrow$ track position and time
- Each address is translated into a 2-bit weight by the preloaded LUTs
- The weight indicates: no, low-quality, 3-layer, or 4-layer segment candidate
- Based on the pivot cell, the tick with best or highest weight is identified
- The found segment is then the best recent pattern
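The address formation and LUT lookup can be modeled in a few lines. This is a sketch under assumptions: each of the 8 cells of a pivot group contributes its 2-bit time-slice code (cell 0 in the least significant bits, an assumed ordering), giving the 16-bit LUT address. Weights 0 to 3 are assumed to mean no, low-quality, 3-layer, and 4-layer candidate. The LUT contents shown are placeholders, since the real tables are generated from the DCH wire tables, and ties between ticks of equal weight are resolved in favor of the most recent one.

```python
from typing import Sequence

N_CELLS = 8        # cells per pivot group
BITS_PER_CELL = 2  # time-slice code per cell


def pivot_address(codes: Sequence[int]) -> int:
    """Pack eight 2-bit cell codes into a 16-bit LUT address."""
    if len(codes) != N_CELLS:
        raise ValueError("a pivot group has 8 cells")
    addr = 0
    for cell, code in enumerate(codes):
        if not 0 <= code < (1 << BITS_PER_CELL):
            raise ValueError("each cell carries a 2-bit code")
        addr |= code << (BITS_PER_CELL * cell)
    return addr


def best_tick(addresses: Sequence[int], lut: bytes) -> int:
    """Index of the tick with the highest LUT weight; ties go to the most
    recent tick (the largest index)."""
    return max(range(len(addresses)), key=lambda i: (lut[addresses[i]], i))


lut = bytearray(1 << (N_CELLS * BITS_PER_CELL))  # 65,536 entries, all weight 0
a4 = pivot_address([1, 1, 1, 1, 0, 0, 0, 0])     # placeholder "4-layer" pattern
a3 = pivot_address([2, 2, 2, 0, 0, 0, 0, 0])     # placeholder "3-layer" pattern
lut[a4], lut[a3] = 3, 2
print(best_tick([a3, a4, 0], lut))
```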

#### **The Track Segment Finder Module**



- TSF system is housed in two Euro crates
- Separated into processing (9U×400mm) and interfacing (6U×220mm) boards
- The DCH data are received at 1.2 Gb/s on a G-Link and shipped off differentially at 30 MHz
- 72 or 75 Segment Finder Engines (depending on the DCH region) housed in 13 Lucent OR2C-series FPGAs (0.9M gates, 20K FFs)
- FPGAs connected to 13 64K × 16 LUT memories
- The board runs at 30 MHz
- Input and processing clock domains are crossed using an input decoupling FIFO
- Each board has 7 Mb of SRAM for input and output diagnostic memories
- Firmware written in VHDL. Generation of LUT code automated from the DCH's wire tables

#### **The Binary Link Tracker**

- DCH tracks reaching the outer layer (SL A10) are classified as type "A". Tracks that reach the middle layers (SL U5) are labeled type "B".
- A single BLT module processes 360 MB/s of segment hit data from the entire DCH, reformatted into 10 radial SLs and 32 azimuthal sectors (supercells)
- A programmable mask of the input data allows dead or inefficient cells to be activated, regaining tracking efficiency
- Linker starts from the innermost SL and moves radially outward
- A track is found if
  - there is a segment hit in every layer, and
  - segments in two consecutive layers are within a certain window of supercells (3 or 5, depending on SL type)



- The track linker algorithm is the extension of a CLEO II trigger algorithm, but allows for up to two SLs to be inefficient
- The data are compressed and output to GLT as two 32-bit words corresponding to either "A" or "B" tracks respectively

#### **The Binary Link Tracker Module**



- One 9U×400mm module
- The board has equivalent of 75K gates of programmable logic
- Five Lucent ORCA-2C series FPGAs represent: Fast Control, Operations Control, Channel Select, Track Linker, and Post-Processor units
- Board runs at ~60 MHz
- Memory buffers are attached to both the input and output data streams; they allow injecting test vectors and checking the response, enabling *in-situ* debugging
- The Track Linker consists of an array of logic blocks, 32 columns wide by 10 rows
- For a given SL, segment hits from 3 or 5 consecutive supercells are OR'ed (the wider window is used for stereo-to-stereo SL transitions)
- The output of the OR is then AND'ed with the hit signal from the like numbered supercell of the next SL. (Allowing missing segments is a more complex algorithm.)
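The OR/AND linking of the Track Linker array can be modeled with one 32-bit word per superlayer (bit k = a segment hit in supercell k). This sketch follows the simple no-miss version described above. The superlayer order A-U-V (consistent with the SL U5 and SL A10 labels), the wrap-around in φ, and the helper names are assumptions, and the tolerance for up to two inefficient SLs is deliberately left out.

```python
from typing import List, Tuple

N_SUPERCELLS = 32
WORD = (1 << N_SUPERCELLS) - 1
SL_TYPES = "AUVAUVAUVA"  # assumed orientation of SL1..SL10


def rotate(bits: int, k: int) -> int:
    """Rotate a 32-bit supercell word by k positions (phi wrap-around)."""
    k %= N_SUPERCELLS
    return ((bits << k) | (bits >> (N_SUPERCELLS - k))) & WORD


def widen(bits: int, width: int) -> int:
    """OR each hit over a window of `width` consecutive supercells."""
    half = width // 2
    out = 0
    for k in range(-half, half + 1):
        out |= rotate(bits, k)
    return out


def link(seg_hits: List[int]) -> Tuple[bool, bool]:
    """Propagate from the innermost SL outward; return (type B, type A)."""
    if len(seg_hits) != len(SL_TYPES):
        raise ValueError("expected one 32-bit word per superlayer")
    reach = seg_hits[0] & WORD
    reached = [reach]
    for sl in range(1, len(SL_TYPES)):
        stereo_to_stereo = SL_TYPES[sl - 1] != "A" and SL_TYPES[sl] != "A"
        width = 5 if stereo_to_stereo else 3
        # OR over the window, then AND with the like-numbered supercell
        reach = widen(reach, width) & seg_hits[sl]
        reached.append(reach)
    return reached[4] != 0, reached[9] != 0  # SL U5, SL A10


print(link([1 << 7] * 10))                        # straight track
print(link([1 << 7] * 6 + [0] * 4))               # stops after SL6
print(link([1 << (2 * sl) for sl in range(10)]))  # bends too fast
```

In this model a type "A" track is always also a type "B" track, since it must pass through SL U5 to reach SL A10.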

# **The P<sub>T</sub> Discriminator**



- Each of 8 PTD boards receives data from six TSFs (one DCH quadrant)
- Only axial superlayers are used
- The processing in each board is subdivided in 8 logic engines, one for each of 8 "seed" areas



| Action                    | Cycle (of 8) |
|---------------------------|--------------|
| Latch TSFM data           | 1            |
| Test for seed             | 2            |
| Read mask-word            |              |
| a: Mask && Data           | 4            |
| Read limits-mask          |              |
| b: L-mask && Data         | 5            |
| Read limits-word          | 5            |
| c: Data within limits     | 6            |
| all (b && c) for each SL  | 7            |
| Sum SL results (3/4)      | 8            |


- Good-resolution TSF segments  $\rightarrow$  seeds
- All segments in cells within the track envelope for a given seed are counted for each SL
- Segments at the borders are added
- All SLs with a nonzero segment count are added
- Mask-words give patterns in the three layers other than the seed layer
- Limitsmask is used to include cells on the boundaries
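The per-seed counting can be sketched as follows. This is a hypothetical software model, not the PTD firmware. Segments are represented as a map from cell to a fine position within the cell, an assumption standing in for the TSF's position information. The mask-word is modeled as a set of cells fully inside the envelope, and the limits-mask/limits-word pair as an accepted position range for each boundary cell. The seed superlayer is counted as satisfied, and at least 3 of the 4 axial SLs are required.

```python
from typing import Dict, Set, Tuple

Segments = Dict[int, Dict[int, float]]  # sl -> {cell: fine position in cell}


def ptd_accept(segments: Segments,
               envelope: Dict[int, Set[int]],
               limits: Dict[int, Dict[int, Tuple[float, float]]],
               min_sl: int = 3) -> bool:
    """Decide whether a seed has enough supporting axial superlayers.

    envelope[sl]: cells fully inside the track envelope (mask-word) for
        each axial SL other than the seed SL.
    limits[sl][cell]: accepted (lo, hi) fine-position range for cells on
        the envelope boundary (limits-mask + limits-word)."""
    n_ok = 1  # the seed SL itself
    for sl, cells in envelope.items():
        hits = segments.get(sl, {})
        count = sum(1 for cell in cells if cell in hits)
        for cell, (lo, hi) in limits.get(sl, {}).items():
            if cell in hits and lo <= hits[cell] <= hi:
                count += 1
        if count > 0:
            n_ok += 1
    return n_ok >= min_sl


envelope = {1: {10, 11}, 2: {20, 21}, 3: {30}}
limits = {3: {31: (0.0, 0.5)}}
print(ptd_accept({1: {10: 0.3}, 2: {21: 0.7}}, envelope, limits))
print(ptd_accept({1: {10: 0.3}, 3: {31: 0.8}}, envelope, limits))
print(ptd_accept({1: {10: 0.3}, 3: {31: 0.4}}, envelope, limits))
```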

#### **The P<sub>T</sub> Discriminator Module**



- Eight 9U×400mm PTD modules
- 11 FPGAs on one PTD board with a total of 400K logic gates
- Logic written in VHDL

- Eight P<sub>T</sub> Engines are the main data processing units, receiving data from 6 TSFs at 30 MHz
- LUT memories hold the 96-bit masks defining the track envelope, and the limits-masks for handling boundaries
- Post Processing completes processing and reformats output to be sent to GLT where PTD data get combined with the BLT and the L1 Cal data
- DAQ Memory holds data temporarily to allow run-time monitoring
- Play/Record Memories store test vector data for *in-situ* board debugging, verification, and calibration

#### **General Observations**

- Fast response tracking devices occupying radial volume around IP, several layers, axial/stereo, are used to detect charged particles
- Two technologies are in use: drift chambers and scintillating fibers
- A tracking magnetic field is required to bend charged particles, allowing direct measurement of track  $p_{\rm T}$
- Two algorithmic approaches: track equations vs. segment/track building
- Reprogrammable logic devices make fast charged-particle identification possible even in very high rate environments, which is vital in present-day and future high energy experiments
- FPGAs offer powerful parallel processing capabilities, a wide range of processing power and gate counts, fast clock speeds, fast I/O, and configurable, flexible memories
- Input/output data pipelines must be built to avoid trigger dead time
- Built-in debugging features (embedded test vectors, buffers to store data)
- Remote downloads, operation, and control are critical, as front-end electronics are often in the collision hall and not easily accessible
- FPGAs with embedded CPUs, such as the Xilinx Virtex-II Pro Platform FPGA series, may allow marrying software with hardware/parallel algorithms and implementation
