# 5 Data Acquisition and Readout

## 5.1 Prototype and Final Design

We are designing the readout and data acquisition system for the HFT in two stages that follow the development of the silicon sensors in our project. The initial prototype readout system is designed to read out the MIMOSTAR4 detectors, which have analog outputs and a 4 ms readout time. The second stage is for use with the final Ultimate series sensors, which have a digital output and a 1 ms readout time. The goal of the first stage is to develop much of the infrastructure for cluster finding and data sparsification, the interfaces to Trigger and DAQ, and the mechanical readout structures, and to assemble a working prototype detector with the MIMOSTAR4 sensors. The second stage will build on the development done with the prototype detector and integrate the Ultimate series sensors with the developed readout system.

## 5.2 Requirements and Prototype Design

The requirements for the prototype and final readout system are very similar. They include:

- Provide a triggered detector system that fits into the existing STAR infrastructure and interfaces to the existing Trigger and DAQ systems.
- Deliver full frame events to STAR DAQ for event building at approximately the same rate as the TPC.
- Reduce the total data rate of the detector to a manageable level (below the TPC rate).

We have designed the prototype data acquisition system to read out the large body of data from the individual MIMOSTAR4 sensors, to digitize the signals, to perform data compression, and to deliver the sparsified data to an event building and storage device. A summary of the specifications and requirements is provided in Table 15.

| Parameter                                      | Value                            |
|------------------------------------------------|----------------------------------|
| Total number of pixels                         | $98 \times 10^{6}$               |
| Number of pixels per chip                      | $640 \times 640$                 |
| Pixel readout rate (analog output)             | $2 \times 50 \text{ MHz}$ / chip |
| Readout time per frame                         | 4 ms                             |
| Frame integration time                         | 4 ms                             |
| Fixed pattern noise                            | $2000 e^{-}$                     |
| Noise after Correlated Double Sampling         | $10 e^{-}$                       |
| Maximum signal                                 | $900 e^{-}$                      |
| Dynamic range after Correlated Double Sampling | 8 bits                           |
| Total power consumption (24 ladders)           | 90 W                             |

 Table 15: Prototype Stage Requirement Summary - constraints for the MIMOSTAR4 APS.

Digitizing the analog signal on each pixel into a 10 bit digital signal yields approximately 1 Gb/s per sensor chip when read out in 4 ms. Thus, the total front end data rate is  $\sim$ 240 Gb/s. Clearly, the volume of data must be reduced before being passed to the DAQ event builder and written to storage.
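The arithmetic behind these figures can be checked with a short script. This is a minimal sketch that uses only the numbers quoted in Table 15 and in the text (640 × 640 pixels, 10-bit digitization, 4 ms frame time, 24 ladders of 10 chips); the constant and function names are illustrative and not part of any HFT software.

```python
# Raw front-end data rate for the prototype stage (values from Table 15).
PIXELS_PER_CHIP = 640 * 640   # pixel array of one MIMOSTAR4 chip
ADC_BITS = 10                 # digitization depth per pixel
FRAME_TIME_S = 4e-3           # readout time per frame
CHIPS_PER_LADDER = 10
LADDERS = 24


def chip_rate_bps(pixels=PIXELS_PER_CHIP, bits=ADC_BITS, frame_time_s=FRAME_TIME_S):
    """Bits per second produced by one chip read out once per frame time."""
    return pixels * bits / frame_time_s


def total_rate_bps():
    """Bits per second summed over all chips on all ladders."""
    return chip_rate_bps() * CHIPS_PER_LADDER * LADDERS


print(f"per chip: {chip_rate_bps() / 1e9:.2f} Gb/s")
print(f"total:    {total_rate_bps() / 1e9:.0f} Gb/s")
```

The per-chip value comes out slightly above 1 Gb/s and the total near 246 Gb/s, consistent with the rounded values of 1 Gb/s and ~240 Gb/s used in the text.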

Data compression is achieved by performing correlated double sampling (CDS) and then leakage current subtraction, i.e. subtraction of two consecutive frames followed by zero suppression. CDS cancels out fixed pattern and reset noise and reduces 1/f noise. The fixed pattern noise corresponds to the spread of the baseline voltage in all pixels. It has been measured on the MIMOSA-5 chip to be 2000 electrons. The noise remaining after CDS must be on the order of  $14 e^-$  to guarantee an efficiency of greater than 98%. The maximum signal is estimated from a dE/dx calculation and by measuring how the charge spreads over pixels. The signal can be truncated above 900  $e^-$  without compromising either the efficiency or the position resolution, so 8 bits is a sufficient dynamic range for signal storage. A synchronous cluster finding algorithm and the reduction of the data to the addresses of cluster center pixels reduce the data to a manageable rate.
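The per-pixel arithmetic of this reduction step can be illustrated in software. The sketch below is a simplified model, not the FPGA implementation: it assumes frames arrive as flat lists of ADC counts in raster order, that the leakage current contribution is available as a per-pixel pedestal in the same units, and that truncation to 8 bits is a simple clip at 255 counts. The function name, the threshold, and the sample values are all hypothetical.

```python
def cds_zero_suppress(prev_frame, curr_frame, pedestal, threshold, max_value=255):
    """Apply CDS, pedestal subtraction, 8-bit truncation and zero suppression.

    All inputs are equal-length sequences of ADC counts in raster order.
    Returns a list with one 8-bit value per pixel (0 where suppressed).
    """
    reduced = []
    for prev, curr, ped in zip(prev_frame, curr_frame, pedestal):
        signal = (curr - prev) - ped               # CDS, then leakage/pedestal subtraction
        signal = max(0, min(max_value, signal))    # truncate to the 8-bit range
        reduced.append(signal if signal >= threshold else 0)  # zero suppression
    return reduced


# Hypothetical 6-pixel example: large pixel-to-pixel baseline spread (fixed
# pattern), a small leakage contribution, and two pixels carrying signal.
baseline = [512, 300, 640, 455, 610, 380]
leakage = [2, 3, 2, 1, 2, 3]
prev_frame = baseline
curr_frame = [b + l for b, l in zip(baseline, leakage)]
curr_frame[3] += 40    # ordinary hit
curr_frame[4] += 400   # large signal, truncated to 255

print(cds_zero_suppress(prev_frame, curr_frame, leakage, threshold=5))
# [0, 0, 0, 40, 255, 0]
```

The example shows why the baseline spread does not need to be represented in the stored data: it cancels in the frame difference, leaving only values that fit in 8 bits.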

## 5.3 Architecture for the Prototype System

The basic flow of a ladder data path starts with the APS sensors. An HFT ladder has 10 MIMOSTAR4 APS chips, each with a 640 by 640 pixel array. Each chip is divided in half into two sectors, each with a separate analog, differential current output buffer. The chips are clocked continuously at 50 MHz, and the data is read out by stepping serially through all the pixels and connecting each in turn to the output buffer. This readout runs continuously while the MIMOSTAR detectors on the HFT ladder are operating. Analog data is carried from the two 50 MHz outputs of each sensor in parallel on a low mass ladder flex printed circuit board to discrete electronics at the end of the ladder, outside the low mass detector region. These electronics perform current to voltage conversion and contain buffers and drivers for the clocks and other control signals needed for ladder operation.

Each MIMOSTAR detector requires a JTAG connection for configuration of the chip, power, ground, and a 50 MHz readout clock. These signals and power, as well as the analog outputs from the detectors, are carried via a low mass twisted pair cable from the discrete electronics at the end of the ladder to the readout electronics located about 1 meter from the HFT ladders. All ladders and their associated readout boards are identical, and there is one readout board per HFT ladder. A functional diagram of an HFT ladder and a description of the data flow are shown in Figure 38.



Figure 38: Ladder Layout - sketch of the readout-topology on a detector ladder. This figure shows the ten APS and the corresponding current to voltage conversion and driver electronics. The drivers will be located out of the low mass region of the detector and may require additional cooling.

The readout electronics consist of a motherboard and daughter card configuration. A functional block diagram is shown in Figure 39. There are 5 daughter cards per motherboard and each daughter card services 2 of the MIMOSTAR sensors on the ladder. The analog signals are carried to the daughter cards where they are digitized with a 10-bit ADC at 50 MHz. Following digitization, the 10-bit ADC values are passed synchronously to an FPGA for CDS. Performing CDS and pedestal subtraction requires a data sample to be stored for each pixel of the detector. This drives the need for external RAM on the daughter cards. After CDS and pedestal subtraction, 8 bits can represent the data. The data is then transferred to the next stage for hit finding and data reduction.
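A rough sizing of the daughter card buffering follows from the numbers above. The sketch below assumes that one reference sample per pixel is stored at the full 10-bit ADC width and that each sector holds 640 × 320 pixels (half of a chip); the alternative of padding each sample to a 16-bit memory word is an assumption added for illustration, since the actual RAM organization is not specified here.

```python
# Daughter card sizing from the figures quoted in the text.
PIXELS_PER_SECTOR = 640 * 320   # half of a 640 x 640 chip
SECTORS_PER_SENSOR = 2
SENSORS_PER_CARD = 2
PIXEL_CLOCK_HZ = 50e6           # readout clock per sector output
ADC_BITS = 10


def sector_scan_time_s():
    """Time to step once through every pixel of a sector at the pixel clock."""
    return PIXELS_PER_SECTOR / PIXEL_CLOCK_HZ


def reference_frame_bytes(bits_per_sample=ADC_BITS):
    """Memory needed to hold one stored sample per pixel for one daughter card."""
    pixels = PIXELS_PER_SECTOR * SECTORS_PER_SENSOR * SENSORS_PER_CARD
    return pixels * bits_per_sample / 8


print(f"sector scan time: {sector_scan_time_s() * 1e3:.3f} ms")
print(f"reference frame, packed 10-bit: {reference_frame_bytes() / 1e6:.2f} MB")
print(f"reference frame, 16-bit words:  {reference_frame_bytes(16) / 1e6:.2f} MB")
```

The scan time of about 4.1 ms matches the quoted 4 ms frame readout, and a reference frame of roughly 1 to 1.6 MB per daughter card is consistent with the need for external RAM.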



Figure 39: Prototype DAQ Layout: schematic of DAQ system for a single MIMOSTAR4 ladder. Analog data is carried as differential current on the low mass cable at 50 MHz. The signals are driven in parallel over short (~1m) twisted pair cables to the motherboard. Analog to digital conversion, CDS and data reduction are performed in the Motherboard / Daughter cards. The reduced hit data is transferred digitally to the SIU and carried to Linux based readout PCs via an optical fiber. Control, synchronization, and event ID tagging are accomplished in the Synch/Trigger FPGA on the motherboard.

The 8-bit data exiting the CDS stage is resorted on the fly into a traditional raster scan through the pixels of the sector. This stream of rasterized data can then be passed to the cluster finder. We are currently investigating methods of hit finding and data reduction for use on the motherboard. Our default approach is a simple readout of the address of a center pixel that exceeds the high threshold, provided the surrounding 8 pixels meet additional cluster selection criteria, such as at least 1 pixel over the low threshold. This can be implemented in an FPGA and run as a pipeline that fills the output buffer with center pixel address values. A simple example of an FPGA logic diagram that accomplishes this can be found in Figure 40. We are also investigating a number of cluster selection methods, including summing algorithms around different thresholds and center pixel determination by geometric pattern with high and low thresholds. A preliminary study of some simple, FPGA-implementable cluster finding algorithms shows promising results for efficiency and noise rejection. A sample of these results can be seen in Figure 41. An implementation document using this method of hit finding, which saves only the center pixel address of the cluster, is available as an appendix.



Figure 40: A simple cluster finding algorithm for the HFT detector. ADC data from two MIMOSTAR detector columns + 3 pixels are sent to a high/low threshold discriminator. The resulting 2 bits are fed sequentially into a 2-bit wide shift register. The center pixel of a  $3 \times 3$  pixel window is compared to a high threshold with each clock tick. If the threshold is exceeded, the additional cluster identification criteria are checked for the  $3 \times 3$  pixel window. If the results meet the criteria for a cluster, the center pixel address is stored into a readout FIFO. This method is extendable to allow for multiple simultaneous thresholds and geometric pattern triggers.
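The selection logic of the default approach can be modeled in software as follows. This is a minimal sketch under stated assumptions: the frame is held as a list of rows rather than streamed through a shift register, the address is taken to be the raster index `row * n_cols + col`, edge pixels without a full 3 × 3 window are skipped, and the criterion is "center above the high threshold and at least `min_neighbors` of the 8 neighbors above the low threshold". The function name, thresholds, and test frame are hypothetical.

```python
def find_cluster_centers(frame, high, low, min_neighbors=1):
    """Return raster addresses of center pixels that pass the 3x3 cluster test.

    frame: list of equal-length rows of pixel values after CDS.
    """
    n_rows, n_cols = len(frame), len(frame[0])
    addresses = []
    for r in range(1, n_rows - 1):
        for c in range(1, n_cols - 1):
            if frame[r][c] <= high:
                continue  # center pixel must exceed the high threshold
            # Count the 8 surrounding pixels that exceed the low threshold.
            neighbors_over_low = sum(
                1
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0) and frame[r + dr][c + dc] > low
            )
            if neighbors_over_low >= min_neighbors:
                addresses.append(r * n_cols + c)
    return addresses


frame = [
    [0, 0, 0, 0, 0, 0],
    [0, 3, 9, 0, 0, 0],
    [0, 0, 30, 4, 0, 0],   # 30 is a center candidate with a neighbor of 9
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 25, 0],   # 25 is isolated: no neighbor over the low threshold
    [0, 0, 0, 0, 0, 0],
]
print(find_cluster_centers(frame, high=20, low=5))  # [14]
```

In this sketch two adjacent pixels that both exceed the high threshold would each be reported; a real implementation would need an additional rule if a single address per cluster is required.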



Figure 41: Efficiency versus accidentals for cluster finding algorithms run on cluster data from a MIMOSA5 detector. Note that some parameter combinations of this algorithm are already over 98% efficient with an accidentals rate of 1-2 hits /  $cm^2$ .

The reduced data is then buffered and transferred to the STAR DAQ system over a high-speed bi-directional fiber link. We intend to use the Source Interface Unit (SIU) and Readout Receiver Cards (RORC) developed for ALICE as our optical link hardware to transfer data to and from the STAR DAQ system. These links have been chosen as the primary readout connections for the new STAR TPC FEE. Leveraging existing hardware and expertise in STAR allows for a faster and more reliable design than developing our own custom solution. The complete system consists of 24 parallel, separate ladder readout chains.

## 5.4 Data Synchronization, Readout and Latency

The readout of the prototype HFT sensors is continuous, and hit and cluster finding is always in operation during the normal running of the detector. The receipt of a trigger initiates the saving of the found clusters into a FIFO for 1 frame (204,800 pixels). The HFT detector as a whole will be triggered via the standard STAR TCD module. Since 4 ms are required to read out the complete frame of interest, the data will be passed to DAQ for event building  $\sim 4$  ms after the trigger is received. In order to service multiple triggers within that 4 ms readout time, we will provide multiple buffers that allow the capture of temporally overlapping complete frames. A functional block diagram of this system is shown in Figure 42. In this system, the cluster data is fanned out to 5 Event FIFOs. A separate Event FIFO is enabled for the duration of one frame upon the receipt of a trigger from the TCD. Subsequent triggers enable additional Event FIFOs until all of the Event FIFOs are full and the system goes busy. The resulting separate complete frames are then passed to STAR DAQ as they are completed in the Event FIFOs. This multiple stream buffering gives a system that can be triggered up to the expected rate of the STAR TPC (approximately 1 kHz) after the DAQ1K upgrade. This will result in the duplication of some data in frames that overlap in time, but our data rate is low, and the duplication allows for contiguous event building in the STAR DAQ, which greatly eases the offline analysis. In addition, synchronization between the ladders/boards must be maintained. The HFT will receive a clock via the standard STAR TCD and will derive its internal clocks from the RHIC strobe. We will provide functionality to allow the motherboards to be synchronized at startup and at any point thereafter.



Figure 42: Multiple event FIFOs are fed in parallel from the cluster finder. A separate Event FIFO is enabled for one frame upon the receipt of a trigger from the TCD. The resulting separate complete frames are then passed to STAR DAQ as they are completed in the Event FIFOs.
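The buffering scheme can be modeled with a few lines of code to show how overlapping frames duplicate cluster data and when the system goes busy. This is an illustrative sketch only: it treats clusters and triggers as timestamps in milliseconds, uses the 5 Event FIFOs and 4 ms frame time from the text, and assumes a FIFO becomes available again as soon as its frame is complete (the time to drain it to DAQ is ignored). The function name and sample timestamps are hypothetical.

```python
FRAME_TIME_MS = 4.0
N_FIFOS = 5


def capture_frames(cluster_times_ms, trigger_times_ms,
                   n_fifos=N_FIFOS, frame_ms=FRAME_TIME_MS):
    """Model trigger handling with a fixed number of Event FIFOs.

    Returns (frames, rejected): frames is a list of (trigger_time, clusters)
    and rejected lists the triggers that arrived while all FIFOs were enabled.
    """
    busy_until = []            # end-of-frame times of FIFOs currently enabled
    frames, rejected = [], []
    for t in sorted(trigger_times_ms):
        # Release FIFOs whose frame finished before this trigger arrived.
        busy_until = [end for end in busy_until if end > t]
        if len(busy_until) >= n_fifos:
            rejected.append(t)  # system is busy
            continue
        busy_until.append(t + frame_ms)
        # The cluster stream is fanned out, so every enabled FIFO sees it.
        captured = [c for c in cluster_times_ms if t <= c < t + frame_ms]
        frames.append((t, captured))
    return frames, rejected


clusters = [0.5, 1.5, 3.0, 5.0, 7.5]
triggers = [0.0, 1.0, 2.0, 2.5, 3.0, 3.5, 4.5]
frames, rejected = capture_frames(clusters, triggers)
for trigger_time, captured in frames:
    print(trigger_time, captured)
print("rejected:", rejected)
```

In this example the cluster at 3.0 ms is stored in five overlapping frames, and the trigger at 3.5 ms is rejected because all five FIFOs are still enabled.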

## 5.5 Data Rates for the Prototype RDO

| Item                                      | Number |
|-------------------------------------------|--------|
| bits/address                              | 18     |
| inner ladders                             | 6      |
| outer ladders                             | 18     |
| MIMOSTAR sectors per ladder               | 20     |
| average hits/sector, inner, $L = 10^{27}$ | 245    |
| average hits/sector, outer, $L = 10^{27}$ | 49     |

Table 16: Data rate calculation parameters

The data rate from each  $640 \times 640$  MIMOSTAR detector is thus approximately 1 Gb/s. The total rate of raw data entering the processing chain in the detector is thus approximately 240 Gb/s. After CDS, the data can be represented by 8 bits. Pixel addressing within a sector requires 18 bits. The sector-in-ladder address will be carried as address words in the data stream. The ladder address will be added at the DAQ receivers. Together these cover the address space needed to map the detector pixel space. Each cluster word stored in the FIFO contains the 18 address bits of a cluster central pixel. Combining this with the occupancy per layer and the readout rate of 1 kHz gives an event size of 106 kB and a data rate from the detector of 106 MB/s. Figure 43 shows this graphically.
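The event size follows directly from the parameters in Table 16. The sketch below reproduces it, assuming one 18-bit address word per cluster and ignoring the sector address words and any headers (their size is not given here); the function names are illustrative.

```python
import math

READOUT_RATE_HZ = 1000  # ~1 kHz trigger rate


def address_bits(pixels_per_sector):
    """Smallest number of bits that can address every pixel in a sector."""
    return math.ceil(math.log2(pixels_per_sector))


def event_size_bytes(inner_ladders, outer_ladders, sectors_per_ladder,
                     hits_inner, hits_outer, bits_per_address):
    """Event size when each cluster is stored as a single address word."""
    hits = sectors_per_ladder * (inner_ladders * hits_inner
                                 + outer_ladders * hits_outer)
    return hits * bits_per_address / 8


bits = address_bits(640 * 320)   # one sector is half of a 640 x 640 chip
size = event_size_bytes(6, 18, 20, 245, 49, bits)
print(f"address bits: {bits}")
print(f"event size:   {size / 1e3:.0f} kB")
print(f"data rate:    {size * READOUT_RATE_HZ / 1e6:.0f} MB/s")
```

With 47,040 clusters per event this gives about 106 kB per event and about 106 MB/s at 1 kHz.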



Figure 43: Data rates at the various stages of the Prototype MIMOSTAR4 readout chain.

## 5.6 Requirements for the Ultimate Design

The Ultimate series of APS detectors will incorporate several changes from the previous MIMOSTAR versions. The primary changes include on-pixel CDS and a two-level programmable discriminator applied to the CDS output for each chip. The Ultimate chip will be read out digitally in 2-bit words / pixel through 4 LVDS outputs / chip. The control functions for the chip are still via the JTAG interface. A summary of the new specifications is provided in Table 17. The basic requirements are as indicated previously.

| Parameter                                               | Value                            |
|---------------------------------------------------------|----------------------------------|
| Total number of pixels                                  | $98 \times 10^{6}$               |
| Number of pixels per chip                               | $640 \times 640$                 |
| Pixel readout rate                                      | $4 \times 250$ Mb/s LVDS / chip  |
| Readout time per frame                                  | 1 ms                             |
| Frame integration time                                  | 200 µs                           |
| Internal configurable discriminators (post internal CDS) | 2 bits                          |
| Raw data from one sensor                                | 820 Mb/s                         |
| Total power consumption                                 | 90 W                             |

Table 17: Final Stage Specification Summary - constraints for the Ultimate series APS.

The readout for this system differs somewhat from the previous MIMOSTAR4-based readout, but most components are the same.

## 5.7 Architecture for the Ultimate System

In this system, much of the functionality of the daughter cards has been moved into the Ultimate sensors themselves. The correlated double sampling and dual level discriminator functionality are now integrated onto the sensor, and there are 4 LVDS readout lines / chip. The rest of the system, however, remains substantially the same. A revised functional block diagram is shown below in Figure 44.



Figure 44: Functional block diagram for Ultimate sensor based readout system.

In this system, digital 2-bit words representing high / low discriminator threshold crossings are fed into the cluster finding FPGAs via 4 LVDS lines per sensor. The data will still be delivered in a sorted raster scan, so our cluster finding algorithms will process the data in the same way as before. The cluster center addresses are again passed into the Event FIFOs for readout to DAQ. The way triggers are processed does differ. At the current level of design, this will be a triggered system that goes dead for the 1 ms readout time required to move the data to and through the cluster finding and into the Event FIFOs. Although we will still employ multiple event buffering, the system will not run continuously with data always being processed and Event FIFOs filling simultaneously. Instead, the Event FIFOs will be filled sequentially as triggers are received and read out by DAQ as they are filled. The event rate requirement of ~1 kHz is still met by this modified design.

## 5.8 Data Rates for the Ultimate RDO

| Item                                      | Number |
|-------------------------------------------|--------|
| bits/address                              | 18     |
| inner ladders                             | 6      |
| outer ladders                             | 18     |
| Ultimate sectors per ladder               | 40     |
| average hits/sector, inner, $L = 10^{27}$ | 6.1    |
| average hits/sector, outer, $L = 10^{27}$ | 1.2    |

Table 18: Data rate calculation parameters

The data rate from each  $640 \times 640$  Ultimate detector is 102 MB/s. With 240 sensors, the total rate of raw data entering the processing chain in the detector is thus approximately 24.5 GB/s. Pixel addressing within a sector still requires 18 bits. Combining this with the occupancy per layer and the readout rate of 1 kHz gives an event size of 5.24 kB and a data rate from the detector of 5.24 MB/s. Figure 45 shows this graphically.
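These figures can be reproduced from Tables 17 and 18. The sketch below assumes 240 sensors in total (24 ladders of 10 chips, as in the prototype), 2 bits per pixel, a 1 ms frame readout, and one 18-bit address word per cluster with no allowance for headers; the names are illustrative.

```python
PIXELS_PER_SENSOR = 640 * 640
BITS_PER_PIXEL = 2          # high/low discriminator outputs
FRAME_TIME_S = 1e-3         # readout time per frame
SENSORS = 24 * 10           # assumed: 24 ladders of 10 sensors
BITS_PER_ADDRESS = 18
READOUT_RATE_HZ = 1000


def sensor_rate_bytes_per_s():
    """Raw output of one sensor in bytes per second."""
    return PIXELS_PER_SENSOR * BITS_PER_PIXEL / FRAME_TIME_S / 8


def event_size_bytes(inner_ladders=6, outer_ladders=18, sectors_per_ladder=40,
                     hits_inner=6.1, hits_outer=1.2):
    """Event size when each cluster is stored as a single address word."""
    hits = sectors_per_ladder * (inner_ladders * hits_inner
                                 + outer_ladders * hits_outer)
    return hits * BITS_PER_ADDRESS / 8


print(f"per sensor: {sensor_rate_bytes_per_s() / 1e6:.1f} MB/s")
print(f"total raw:  {SENSORS * sensor_rate_bytes_per_s() / 1e9:.1f} GB/s")
print(f"event size: {event_size_bytes() / 1e3:.2f} kB")
print(f"data rate:  {event_size_bytes() * READOUT_RATE_HZ / 1e6:.2f} MB/s")
```

The per-sensor rate of about 102 MB/s is the same quantity as the 820 Mb/s listed in Table 17, and the reduced event size is about 5.24 kB.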



Figure 45: Data rates in ULTIMATE HFT readout.

## 5.9 Prototypes

Several different prototype readout electronics boards have been constructed and tested that are very similar to the proposed prototype readout electronics described above. In Figure 46 one can see the prototyping results of a low mass flex PCB on a prototype ladder with MIMOSA5 detectors.



Figure 46: A prototype ladder showing low mass PCB, MIMOSA5 detectors and driver electronics bonded to a mechanical carbon fiber and reticulated vitreous carbon foam based carrier.

A prototype readout system for reading the MIMOSA5 detectors was also constructed and used. A functional schematic is shown below in Figure 47.



Figure 47: Functional component diagram of the prototype readout system constructed for the readout on a MIMOSA5 based ladder.

One can see the great similarity between what we have constructed for this test and the prototype MIMOSTAR4 readout design. The basic data flow is the same. The details of hit finding and final readout to DAQ are not present in this system, but the motherboard / daughter card concept and the ADC, FPGA, and SDRAM for CDS are present. A photograph showing the motherboard with a daughter card attached is shown in Figure 48.



Figure 48: Early Prototype motherboard and daughter card used for reading out MIMOSA5 detectors.

This system is working and the basic elements of FPGA based control and deserialization of the ADC outputs, SDRAM memory interface and CDS are implemented on the daughter cards seen above. We will use these VHDL elements in our final design. The daughter card design above is quite flexible and the same physical boards may be suitable for the prototype HFT readout system.