Phenix Technical Note 369

GL1, GTM, DC FEM and DCM
Chain Test

Author

Stephen Adler

Abstract

This note describes the setup and first tests of the readout of a DC FEM unit via a DCM, with the GL1 throttling the level 1 accept signals, which are generated by a NIM pulser unit. The major tests performed with this system were to verify the operation of the GL1 dead-on-4 circuit, the GTM count 5 busy and the DCM buffer full busy. Timing diagrams captured by a logic analyzer and histograms of level 1 and beam clock counters are presented which demonstrate the proper functionality of the busy system.
Introduction:

A test was performed of the PHENIX DAQ system in which an early version of the Global Level 1 (GL1) was hooked up to one Granule Timing Module (GTM), which in turn was hooked up to a Drift Chamber Front End Module (DC FEM). The DC FEM was in turn hooked up to one Data Collection Module (DCM). This setup could be described as a minimal PHENIX DAQ system in which one component of each major system was used in the DAQ chain. Two major components were left out due to lack of hardware: the Local Level 1 system (LL1) and the Master Timing Module (MTM). The GTM has an internal clock which was used instead of the external RHIC clock coming from the MTM. The local level 1 was simulated by a NIM pulser unit which fed TTL pulses directly into the reduced bit input bus of the GL1. A logic analyzer was used to monitor the data coming off the receiving and transmitting Glink daughter cards on the DC FEM, as well as the output of the NIM pulser, the level 1 accept coming out of the front panel of the GTM, and the DCM/FEM busy which also comes out of the front panel of the GTM. Figure 1 shows a diagram of the setup.
 


Figure 1: Diagram of the DAQ test setup

Theory of Operation:

The basic mechanism of the data flow and throttling is the following. The NIM pulser, simulating a LL1, sends trigger pulses to the GL1. The GL1 processes this information and passes the level 1 pulse on to the front panel of the GTM, which fires the level 1 accept circuit. The L1 accept signal is then sent over the Glink fiber to the FEM, which causes the FEM to pull data out of its short term "ring" buffers and push it onto its long term "stack" memory, which is 5 events deep. The FEM then proceeds to pop the data off its "stack" memory and transfer it over the Glink fiber link to the DCM, where it is stored in DSP memory and eventually read out by the VME crate controller.
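A minimal sketch of this event path follows. It models only the buffering scheme described above, not the actual FEM firmware; the class and method names and the ring-buffer depth are illustrative.

```python
from collections import deque

STACK_DEPTH = 5          # long term "stack" memory holds up to 5 events

class FemModel:
    def __init__(self, ring_depth=64):
        self.ring = deque(maxlen=ring_depth)   # short term "ring" buffers
        self.stack = deque()                   # long term "stack" memory

    def beam_clock_tick(self, adc_word):
        # The ring buffers are continuously overwritten on every beam clock.
        self.ring.append(adc_word)

    def level1_accept(self):
        # An L1 accept moves a snapshot of the ring buffers onto the stack.
        if len(self.stack) >= STACK_DEPTH:
            raise RuntimeError("stack full: the busy system should have throttled this L1")
        self.stack.append(list(self.ring))

    def pop_event_to_dcm(self):
        # One event is popped per ENDAT sequence and shipped over the Glink to the DCM.
        return self.stack.popleft() if self.stack else None
```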

To throttle the data flow, the DCM has a busy output which is asserted when its local memory exceeds some capacity. The GTM, which sends level 1 accept signals to the FEM, keeps track of the number of accepts and of the ensuing ENDAT0 and ENDAT1 pulses. If the difference between the number of level 1 accepts and the number of ENDAT1 pulses reaches 5, the GTM asserts a "Count 5 Busy" signal for a fixed amount of time, during which the FEM pops all 5 data events off its long term "stack" memory and sends them to the DCM.
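The bookkeeping behind the "Count 5 Busy" can be sketched as follows (the real logic is implemented in GTM firmware; the names here are illustrative):

```python
class GtmCount5Busy:
    def __init__(self, depth=5):
        self.depth = depth
        self.n_lvl1 = 0      # level 1 accepts forwarded to the FEM
        self.n_endat1 = 0    # ENDAT1 pulses issued (one per event read out)

    def on_level1_accept(self):
        self.n_lvl1 += 1

    def on_endat1(self):
        self.n_endat1 += 1

    def busy(self):
        # Busy asserts when 5 accepted events are pending readout from the FEM stack.
        return (self.n_lvl1 - self.n_endat1) >= self.depth
```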

The DCM sends its busy signal to the GTM. The GTM ORs the external DCM busy with its internal count 5 busy, and the OR of these two signals is fed to the GL1. The GL1, when it detects that the GTM/DCM busy signal is asserted, prevents further level 1 accepts from being issued to the GTM. As a reminder, the GL1 also prevents a level 1 accept from being issued if it follows a previous level 1 trigger within 4 clock cycles (roughly 400 ns). To summarize, there are three sources of busy: the GL1 dead-on-4, the GTM count 5 and the DCM DSP buffer full.
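The combined decision can be sketched as below: the GTM ORs the DCM busy with its count 5 busy and feeds the result to the GL1, which additionally suppresses triggers arriving within 4 clocks of the last accept. This is an illustration of the logic, not the actual GL1 implementation.

```python
DEAD_CLOCKS = 4   # roughly 400 ns at a ~100 ns beam clock

def gl1_issues_accept(clock, last_accept_clock, gtm_count5_busy, dcm_buffer_busy):
    """Return True if the GL1 would pass an LL1 trigger arriving on this clock tick."""
    gtm_dcm_busy = gtm_count5_busy or dcm_buffer_busy        # OR formed in the GTM
    dead_on_4 = (last_accept_clock is not None
                 and clock - last_accept_clock <= DEAD_CLOCKS)
    return not (gtm_dcm_busy or dead_on_4)
```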

Setup and Data Taking:

The NIM pulser was set up to deliver a 150 ns wide pulse every 200 ns. This high rate of pulses is needed to exercise the dead-on-4 circuitry which the GL1 provides. One issue to note: the GTM latches the level 1 accept input signal from the GL1 on the beam clock edge. Therefore, in order not to miss level 1 accept pulses, the pulse has to be at least as wide as one clock cycle. The clock cycle is on the order of 100 ns, so the pulse width was set to 150 ns. Setting the period of the pulses to 200 ns provides for 1 or 2 extra LL1 pulses to be issued during the dead-on-4 period, depending on the phase of the pulser relative to the beam clock. With this high rate pulser feeding LL1 trigger pulses to the GL1, DC FEM data were taken to disk for further processing. The DC FEM data packets contain, as part of their headers, the value of the FEM's internal beam clock counter at the time the L1 accept signal was received, as well as the L1 accept counter (event counter). This allows one to measure the time between successive L1 accept pulses and to verify that all DC FEM data packets were properly sent to and received by the DCM. There is also a packet ID word, which was searched for at a particular position within the data stream and helped verify the integrity of the data.
 


Figure 2: Histogram plots of data packet id check, time between successive events and event numbers.
Several files of DC FEM data were taken, one of which had 100K events. This file was run through a program which generated histograms showing the results of the packet id checks, the time difference between successive events in units of beam crossings, and the event number difference between successive events. These plots can be seen in figure 2. The first plot is simply the packet id minus 0x80dc111; the result is 0 each time, confirming to zeroth order the integrity of the data. The second and third plots show the time difference between successive events in units of beam clocks; the only difference between the two is the X axis scale. From these one can see that the minimum time difference is 5 beam clocks, which is consistent with the dead-on-4 busy logic of the GL1. The fact that the difference is sometimes 6 beam clocks is due to the relative phase of the pulser with respect to the GTM internal clock. The final plot is the event number difference between successive events. If the difference is 1, then a 1 is filled in the histogram, otherwise a 0 is filled. Thus the height of the 0 bin of this histogram is the number of times the event counter difference between successive events is not 1. One can see that this occurs a very small fraction of the time, on the order of 1 part in 1e4. This can be accounted for by the fact that the GTM sends a level 1 counter reset mode bit command to the FEM every 100 seconds. Since it takes several minutes to collect 100K DC FEM events, the height of the 0 bin in this histogram is consistent with the number of L1 counter resets sent by the GTM during the data taking period.
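A rough sketch of these consistency checks is given below, assuming each decoded packet provides a packet id, a beam clock counter and an event counter. The field names are hypothetical and this is not the original analysis program.

```python
EXPECTED_PACKET_ID = 0x80DC111

def check_packets(packets):
    id_check, clock_diffs, event_diff_ok = [], [], []
    prev = None
    for pkt in packets:
        # Should be 0 for every packet if the id word is where it is expected.
        id_check.append(pkt["packet_id"] - EXPECTED_PACKET_ID)
        if prev is not None:
            # Time between successive events in beam crossings; >= 5 expected
            # because of the GL1 dead-on-4 logic.
            clock_diffs.append(pkt["beam_clock"] - prev["beam_clock"])
            # 1 if the event counter incremented by exactly one, else 0; the 0 bin
            # then counts skips (e.g. the periodic L1 counter reset from the GTM).
            event_diff_ok.append(1 if pkt["event_number"] - prev["event_number"] == 1 else 0)
        prev = pkt
    return id_check, clock_diffs, event_diff_ok
```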

Figure 3: Logic analyzer display with full time scale, 115ms/div.

A logic analyzer was hooked up to the input and output Glink daughter cards on the DC FEM to view the data as it came in from the GTM and went out to the DCM. Three other signals were fed into the logic analyzer: the GTM count 5/DCM busy level, the level 1 accept gated with the beam clock (Sync L1 Accept) and the raw local level 1 pulses simulated by the NIM pulser. Refer to figure 3 for the following discussion. On the left hand side of the logic analyzer plot there are 15 labels. The names of the labels denote the meaning of each signal, but for reference's sake the most important ones are discussed here. The first entry, labeled modebits all, is the set of mode bits recorded on the Rx Glink bus. The entry labeled lv1GTM all is the external level 1 accept signal coming out of the front panel of the GTM. The next one, labeled ALLBusy all, is the OR of the GTM count 5 busy and the DCM buffer full busy. The next one, labeled LV1Accept all, is the level 1 accept signal as seen on the Rx Glink bus. EnDat0 all and EnDat1 all refer to the enable data signals as seen on the Rx Glink bus. The definition of the next 5 signals is unknown. The last 4 signals deal with the data being transmitted to the DCM via the Tx Glink daughter card. CAVNTX all and DAVNTX all are the two signals used by the DCM compressor (Rx Glink) cards to tag the data. d0c13 all and d000b all refer to the 20 bits of data being sent to the DCM: d000b all are the first 12 bits and d0c13 all are the last 8 bits.

The first thing to notice about the logic signals in figure 3 is the sparsity of the data. lv1GTM all is an inverted signal which, when low, indicates that a level 1 accept was sent to the DC FEM. (The LV1Accept all signal has the normal polarity; lv1GTM all is in essence an inverted copy of LV1Accept all.) The time differences marked by the Tr, G1 and G2 markers show time scales on the order of 80 ms and a large gap on the order of 400 ms. What one is seeing here is the time it takes for the crate controller to read out 5 events from the DCM (80 ms time scale, ~450 Kbytes/sec) and the time it takes to write 15 events to an NFS mounted disk (400 ms time scale, ~300 Kbytes/sec). One can infer that the ALLBusy all signal is driven for the most part by the DCM on these time scales.
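As a rough cross-check of these numbers (the per-event size of roughly 7 to 8 kbytes used below is inferred from the quoted rates and time scales, not measured independently):

\[
\frac{5 \times 7.2\ \mathrm{kbytes}}{80\ \mathrm{ms}} \approx 450\ \mathrm{kbytes/sec},
\qquad
\frac{15 \times 8\ \mathrm{kbytes}}{400\ \mathrm{ms}} \approx 300\ \mathrm{kbytes/sec}.
\]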
 


Figure 4: First 5 Endat pulses. Time scale of 72us/div

The next logic analyzer plot (figure 4) is a zoom in around the first "non-busy" part of the previous logic analyzer plot. Here one can see the "handshake" between the GTM and the DCM. One can see the ALLBusy all signal go low for a very short period of time on the time scale of this plot, followed by 5 endat0 and 5 endat1 pulses. Between each of the endatN pulses, one can see the CAVNTX all, DAVNTX all, d0c13 all and d000b all signals toggle. This shows how the GTM holds the busy to the GL1 while the 5 events stored in the long term stack memory of the DC FEM are being sent over to the DCM. The fact that one sees only 5 events, and that the ALLBusy all signal stays high after they are read out, indicates that the GTM held the busy high, released it after the 5 events were sent to the DCM, and that the DCM then took over as the busy issuer since it had a full event buffer which needed to be flushed out.
 


Figure 5: Zoom in around first 2 Endat pulses. 15us/Div





The logic analyzer plot in figure 5 zooms in around the level 1 accepts and the first pair of endat pulses. This shows the time scale of the conversion time and of the endat pulses, which were all programmed to be about 37 us. One can also see that the time it takes to send the data over to the DCM is well under the 37 us time window provided by the endat pulse. This is an indication that one could trim the readout time by shortening the endat pulse width. On this time scale, one can make a rough measurement of the average data rate from the FEM to the DCM, which is on the order of 65 Mbytes/sec if one considers that only 20 bits of data per word are being sent on each clock cycle.
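For reference, 20 bits of data per word is 2.5 bytes per transferred word, so the quoted rate corresponds to a word transfer rate of roughly

\[
\frac{65\ \mathrm{Mbytes/sec}}{2.5\ \mathrm{bytes/word}} \approx 26 \times 10^{6}\ \mathrm{words/sec},
\]

i.e. a transfer clock in the few tens of MHz range; this figure is inferred from the quoted rate rather than measured directly.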
 


Figure 6: Close look at the DAVNTX and CAVNTX signals. 5us/Div

Figure 7: Beginning of DAVNTX pulse. 19ns/div

Figure 8: End of DAVNTX pulse. 19ns/Div

The logic analyzer plot in figure 6 is a close zoom in around the first ENDAT0 pulse. The time marked by the G1 and G2 markers measures the 37 us pulse width as programmed in the GTM. The CAVNTX pulse is too narrow to resolve its timing relation with the DAVNTX pulse, but one can clearly see two CAVNTX pulses framing the DAVNTX pulse, one at either end. This shows that the handshaking between the FEM and the DCM follows the specification. One can also start to see the header and trailer data words of the DC FEM data packet on lines d0c13 all and d000b all. Figures 7 and 8 zoom in around the beginning and the end of the DAVNTX all pulse respectively. In figure 7, one can see the required condition of all data bits being on while the CAVNTX all pulse goes low, followed by the packet id of 0xDC111. In figure 8, one can see that the required condition of all bits being off while the trailing CAVNTX all pulse goes low is satisfied, but somewhat marginally: it takes 8 ns for the data bits to settle to 0x00000. This may be due to a sampling artifact of the logic analyzer or an inherent problem with the FPGA code in the DC FEM.
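A hedged sketch of this framing rule is given below: the data word sampled while the leading CAVNTX pulse is low should have all bits set, and the word sampled while the trailing CAVNTX pulse is low should have all bits cleared. The 20 bit word mask and the helper name are assumptions for illustration.

```python
WORD_MASK = 0xFFFFF  # 20 data bits per transferred word

def framing_ok(word_at_leading_cavntx, word_at_trailing_cavntx):
    # All bits on during the leading CAVNTX pulse, all bits off during the trailing one.
    all_on = (word_at_leading_cavntx & WORD_MASK) == WORD_MASK
    all_off = (word_at_trailing_cavntx & WORD_MASK) == 0
    return all_on and all_off
```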

Figure 9: 5 L1Accept pulses.

Figure 9, the final logic analyzer plot, zooms in around the LV1Accept pulses. One can see 5 level 1 accept pulses being issued to the GTM, separated by about 500 ns. The time difference between the first and second pulses is 580 ns, and the average time difference between the 2nd and 5th pulses is about 500 ns, which is consistent with the dead-on-4 circuitry of the GL1. What is missing from this plot are the raw pulses being fed into the GL1 by the NIM pulser. The high rate of pulses generated by the pulser unit limited the logic analyzer to recording data for only several microseconds, so it could not take data long enough to show all 3 levels of busy of the DAQ system. One can see the ALLBusy signal going low for the amount of time needed for the GTM to receive 5 level 1 accept pulses from the GL1, another indication of the proper operation of the GTM.
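The expected minimum spacing follows directly from the dead-on-4 requirement, given a beam clock of roughly 100 ns:

\[
(4 + 1)\ \mathrm{clocks} \times {\sim}100\ \mathrm{ns/clock} \approx 500\ \mathrm{ns}.
\]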

Conclusion:

From the analysis of the data recorded to disk and of the logic analyzer plots, one can conclude that this early version of the PHENIX DAQ system is working properly. This includes the propagation of level 1 accept signals to the DC FEM, data transmission from the FEM to the DCM, and the proper handling of the busy data throttling mechanism by the GL1, the GTM and the DCM.

Further analysis of the data flow mechanism of this DAQ architecture indicates several race condition holes inherent in its clocked architecture. The first problem arose from the initially narrow pulse width of the NIM pulser unit. The pulse width was set to 80 ns, and from the logic analyzer plots one could clearly tell that the GTM was dropping about 20% of the level 1 accept signals being sent to it by the GL1. Hence the pulse width of the pulser unit was readjusted to 150 ns to make sure the GTM would latch every external level 1 accept signal. The other race condition which needs to be studied is the feedback time of the DCM and GTM busy assertion to the GL1. There will be a propagation time between the moment the DCM or GTM asserts its busy and the moment the GL1 receives it to prevent further level 1's. Without proper care, a local level 1 signal could be sent to the GL1 after the DCM or GTM asserts its busy, but before the GL1 receives the busy signal. This would cause an unwanted level 1 accept to be sent to the GTM. This would be a rather complicated problem to solve if one allows random local level 1 signals to be sent to the GL1 without proper synchronization to the clock driving the DAQ front and back ends.
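A small sketch of this race window is given below; the propagation delay t_prop and the function name are illustrative, and the actual delay in the installed system was not measured in this test.

```python
# Illustrative sketch of the busy feedback race window. t_prop is the (unknown)
# propagation delay from busy assertion at the DCM/GTM to its arrival at the GL1.
def accept_slips_through(t_trigger, t_busy_asserted, t_prop):
    """True if an asynchronous LL1 trigger arrives while the downstream busy is
    already asserted but the GL1 has not yet seen it, producing an unwanted accept."""
    return t_busy_asserted <= t_trigger < t_busy_asserted + t_prop
```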

One more word of caution. The use of the logic analyzer to record the data going into and out of the DC FEM proved to be very useful in documenting the proper operation of the DAQ system. In this particular setup, the Glink daughter cards had the proper test points, which made it possible to hook up the logic analyzer to them. Other FEM boards do not have this facility, which will make it harder to diagnose problems. Finally, in the final setup of the full PHENIX DAQ system, with all the FEM boards sitting inside the collision hall, it will be even harder to get at key signals to diagnose any possible problems due to rare race conditions. It would be a good idea to design in a method of accessing these signals, if at all possible at this late stage in the design and construction of the PHENIX DAQ system.