LA-UR-03-6122

Approved for public release; distribution is unlimited.

| Title:        | Viewgraphs:<br>Consequences and Categories of SRAM FPGA<br>Configuration SEUs                                                                                                                       |
|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Author(s):    | Paul Graham,<br>Michael Caffrey,<br>Jason Zimmerman,<br>Darrel E. Johnson,<br>Los Alamos National Laboratory, Los Alamos NM<br>Prasanna Sundararajan, Cameron Patterson<br>Xilinx Inc., San Jose CA |
| Submitted to: | Military and Aerospace Programmable Logic Devices<br>International Conference, Washington DC<br>9/9-9/11/2003                                                                                       |



Los Alamos National Laboratory, an affirmative action/equal opportunity employer, is operated by the University of California for the U.S. Department of Energy under contract W-7405-ENG-36. By acceptance of this article, the publisher recognizes that the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or to allow others to do so, for U.S. Government purposes. Los Alamos National Laboratory requests that the publisher identify this article as work performed under the auspices of the U.S. Department of Energy. Los Alamos National Laboratory strongly supports academic freedom and a researcher's right to publish; as an institution, however, the Laboratory does not endorse the viewpoint of a publication or guarantee its technical correctness.

Consequences and Categories of SRAM FPGA Configuration SEUs

Paul Graham, Michael Caffrey, Jason Zimmerman, Eric Johnson



Ideas That Change the World

Prasanna Sundararajan, Cameron Patterson




#### Outline

- Why care about FPGA SEU types?
- Methodology for classification
- Design studies
- Conclusions and future work



Graham

C6

#### Why care about FPGA SEU types?

- We want to understand what SEU mitigation techniques work while being cost effective.
- Assumption: Some designs can allow lower reliability for a reduced mitigation cost.
  - A spectrum of reliability and cost points is possible.



#### FPGAs aren't ASICs

- Fault models for ASICs don't necessarily apply to FPGAs.
  - SEUs in an FPGA can change the *design*, not just user data.
- Not all SEUs have an effect.
  - Only a subset of resources is used by each design.
- SEU mitigation techniques for ASICs may not work for FPGAs.
- FPGA-specific mitigation techniques seem possible and may be more cost effective.



#### Virtex FPGA Architecture: General





NOTE: Not drawn to scale




#### Virtex FPGA Architecture: Slice Input/Output Muxes



#### Virtex FPGA Architecture: CLB Routing Resources

- Single-length wires use programmable interconnect points (PIPs) for switching.
- Hex-length wires use muxes and buffers.
- Resources used for switching *between* routing types vary.




#### Bitstream SEU Failure Classification Methodology

- Find sensitive bits in a design using SEU simulator
- Identify each sensitive bit's function
- Classify each sensitivity based on
  - Type of resource upset
  - Whether the bit was "on" or "off" in the original bitstream





#### SLAAC-1V SEU Simulation Platform (BYU/LANL)

- Design loaded into X1 and X2 and run synchronously
- Inject faults into X1's bitstream
- X0 provides test vectors and compares outputs to identify when a specific bitstream upset affects design operation
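
The fault-injection loop described above can be sketched in software. This is a toy model only, not the SLAAC-1V hardware or any of its interfaces: `find_sensitive_bits`, `toy_design`, the 8-bit "bitstream", and the test vectors are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Sequence


def find_sensitive_bits(
    golden_bits: Sequence[int],
    run_design: Callable[[Sequence[int], int], int],
    test_vectors: Sequence[int],
) -> List[int]:
    """Return indices of configuration bits whose upset changes any output."""
    # Reference outputs from the unmodified configuration (the role of X2).
    expected = [run_design(golden_bits, v) for v in test_vectors]
    sensitive = []
    for i in range(len(golden_bits)):
        upset = list(golden_bits)
        upset[i] ^= 1  # inject a single-bit upset (the role of X1)
        # Apply the same vectors and compare outputs (the role of X0).
        observed = [run_design(upset, v) for v in test_vectors]
        if observed != expected:
            sensitive.append(i)
    return sensitive


# Hypothetical toy "design": bits 0-3 hold a 2-input LUT truth table,
# bits 4-7 stand for configuration bits the design does not use.
def toy_design(bits: Sequence[int], vector: int) -> int:
    return bits[vector & 0b11]


golden = [0, 0, 0, 1, 0, 0, 0, 0]  # LUT contents implement a 2-input AND
sensitive_bits = find_sensitive_bits(golden, toy_design, range(4))
print(sensitive_bits)
```

In this toy case only the four truth-table bits are reported as sensitive, which mirrors the earlier point that not every configuration upset affects a given design.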





#### Identifying Configuration Bitstream Functions

- Configuration bitstream function definitions were based on information provided by Xilinx under a nondisclosure agreement.
- Additionally, JBits' documentation and APIs provided a useful model for understanding the low-level architecture of the Virtex devices.



#### Classification of circuit failures

- Takes into account the type of resource and the state of the resource in the original bitstream
- Failure modes
  - Mux select upsets
  - Programmable interconnect point (PIP) upsets
    - Opens, shorts, loading
  - Buffer upsets (on/off)
  - Look-up table (LUT) value changes
  - Control bit changes
  - Unclassified (other failures)
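
A minimal sketch of this two-factor classification, assuming each sensitive bit has already been tagged with a resource type and its value in the original bitstream. The resource-type strings, the `FailureMode` names, and the rule that an "on" PIP or buffer fails open while an "off" one fails closed are illustrative assumptions, not the authors' tool.

```python
from collections import Counter
from enum import Enum


class FailureMode(Enum):
    MUX_SELECT = "mux select upset"
    PIP_OPEN = "PIP open"
    PIP_SHORT_OR_LOAD = "PIP short or loading"
    BUFFER_OFF = "buffer off"
    BUFFER_ON = "buffer on"
    LUT_VALUE = "LUT value change"
    CONTROL = "control bit change"
    UNCLASSIFIED = "unclassified"


def classify(resource: str, was_on: bool) -> FailureMode:
    """Map (resource type, original bit state) to a failure mode."""
    if resource == "pip":
        # Clearing an "on" PIP breaks a connection; setting an "off" PIP makes one.
        return FailureMode.PIP_OPEN if was_on else FailureMode.PIP_SHORT_OR_LOAD
    if resource == "buffer":
        return FailureMode.BUFFER_OFF if was_on else FailureMode.BUFFER_ON
    simple = {
        "mux": FailureMode.MUX_SELECT,
        "lut": FailureMode.LUT_VALUE,
        "control": FailureMode.CONTROL,
    }
    # Anything not recognized (e.g. clock routing) falls into "unclassified".
    return simple.get(resource, FailureMode.UNCLASSIFIED)


# Hypothetical sensitive bits: (resource type, bit value in the original bitstream).
sensitive = [
    ("mux", True), ("mux", False), ("pip", True), ("pip", False),
    ("buffer", False), ("lut", True), ("clock", False),
]
tally = Counter(classify(resource, on) for resource, on in sensitive)
for mode, count in tally.items():
    print(f"{mode.value}: {count}")
```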




#### Failure Mode Examples: Mux Select Upsets



*(a) Initial mux configuration* 

*(b) Mux after bitstream SEU* 
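
To make the mux select failure mode concrete, here is a small sketch assuming a 4:1 mux with two binary-encoded select bits. The encoding and the signal names are hypothetical; the actual Virtex mux encodings are not given in these slides.

```python
def mux_output(inputs, select_bits):
    """Return the input chosen by binary-encoded select bits (LSB first)."""
    index = sum(bit << i for i, bit in enumerate(select_bits))
    return inputs[index]


inputs = ["net_a", "net_b", "net_c", "net_d"]  # hypothetical signal names
original_select = [1, 0]            # selects index 1
upset_select = list(original_select)
upset_select[1] ^= 1                # an SEU flips the second select bit

before = mux_output(inputs, original_select)
after = mux_output(inputs, upset_select)
print(before, "->", after)
```

A single flipped select bit silently swaps the intended source for a different one, whether or not that other input carries an active signal.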




#### Failure Mode Examples: PIP Short Upset



*(a) Intended PIP configuration*

*(b) PIP shorting two independent, active wires* 




#### Failure Mode Examples: PIP Open Upset



*(a) PIP connecting source wire to load wire* 

*(b) PIP upset disconnecting source wire from load wire* 





#### Failure Mode Examples: Buffer Off Upset

*(a) Buffer connecting source and load wires* 

*(b) Buffer upset disconnecting source and load wires* 

#### Failure Mode Examples: Buffer On Upset

*(a) Unused buffer* 

*(b) Buffer on between two unrelated active wires* 




#### Failure Mode Examples: LUT Value Change



*(a) LUT implementing a 4-input AND* 

*(b) LUT bit upset causing the LUT to implement a constant "zero" function* 
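
The LUT example can be checked with a short sketch that treats a 4-input LUT as a 16-bit truth table. The helper `lut4` and its address bit ordering are assumptions made for illustration; for a symmetric function such as AND the ordering does not affect the result.

```python
def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a 4-input LUT whose 16-bit truth table is `init`."""
    address = a | (b << 1) | (c << 2) | (d << 3)
    return (init >> address) & 1


AND4 = 0x8000              # only address 15 (all inputs high) holds a 1
upset = AND4 ^ (1 << 15)   # an SEU flips that single truth-table bit

all_inputs = [
    (a, b, c, d)
    for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1)
]
and_outputs = [lut4(AND4, *x) for x in all_inputs]
upset_outputs = [lut4(upset, *x) for x in all_inputs]
print(sum(and_outputs), sum(upset_outputs))
```

Because a 4-input AND stores a single 1 in its truth table, one upset is enough to turn it into a constant zero. Upsets in any of the other fifteen bits would instead add a spurious 1 for one input combination.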

#### Failure Mode Examples: Control Bit Changes (Slice)




- Bits V, E, F, and G control the programmable inversion of inputs
- The *T* bits control whether the LUTs perform as LUTs, RAMs (16x1, dual-ported, 32x1), or shift registers



#### Examples of Failure Modes Based on Resource Type

| Failure Mode      | Resource Type Examples                                                                       |
|-------------------|----------------------------------------------------------------------------------------------|
| Mux select change | Slice/IOB input and output muxes, internal slice and IOB muxes, hex wire routing muxes, etc. |
| PIP upset         | PIPs for single wires, edge routing, etc.                                                    |
| Buffer upset      | Bi-directional hex wire buffers, mux output buffers, etc.                                    |
| LUT change        | F and G LUTs in slice/CLB resources                                                          |
| Control change    | Slice and IOB control bits (programmable inversions, flip-flop configuration, etc.)          |
| Unclassified      | Clock routing and control, configuration register upsets, etc.                               |

#### Sample Design Analysis

- Designs
  - Simple: 8-bit counter
  - Control-like: multiple 72-bit linear feedback shift registers (LFSRs)
  - Data path: eight 36-bit multipliers and a summing adder tree
- The designs do not use IOB flip-flop structures, Block SelectRAM, or advanced clocking features.
- Upsets in user-accessible configuration registers were not modeled.



#### Classification of Failures: 8-bit Counter (unweighted)




#### Classification of Failures: 72-bit LFSRs (unweighted)

- Control upsets: 13%
- LUT upsets: 2%
- Mux select upsets: 73%
- PIP upsets: 10%
- Buffer upsets: 1.8%
- Unclassified: <1%
- Routing: 84.8%
- LUT/Control: 15%
- Total failure bits: 392,166

*Chart: All Bits by Failure (392,166 bits); slice\_imux: 50%*
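
The two grouped figures appear to be sums of the individual categories. The sketch below assumes that "Routing" covers mux select, PIP, and buffer upsets and that "LUT/Control" covers LUT and control upsets; that grouping is inferred from the numbers and is not stated on the slide.

```python
# Unweighted percentages for the 72-bit LFSR design, as listed above.
unweighted = {
    "control": 13.0,
    "lut": 2.0,
    "mux_select": 73.0,
    "pip": 10.0,
    "buffer": 1.8,
}

# Assumed grouping of categories (an inference, see the note above).
routing = unweighted["mux_select"] + unweighted["pip"] + unweighted["buffer"]
lut_control = unweighted["control"] + unweighted["lut"]

print(f"Routing: {routing:.1f}%")          # matches the 84.8% listed
print(f"LUT/Control: {lut_control:.1f}%")  # matches the 15% listed
```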





#### Classification of Failures: 72-bit LFSRs (weighted)



#### Classification of Failures: Multiply/Add Design (unweighted)



#### Classification of Failures: Multiply/Add Design (weighted)




#### Conclusions

- The SEU failure modes for FPGA designs can be complex, more so than just stuck-at-1, stuck-at-0, open, and short failures.
- Mux select change failures dominate the failure types.
  - Muxes are abundant, are often controlled by multiple configuration bits, and are generally affected by a change in any of these bits.
- Eliminating a single class of failures will not lead to an order-of-magnitude (10x) improvement in reliability.
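
A rough worked example of the last point, assuming the failure rate scales with the number of sensitive bits and using the 73% unweighted mux select share reported for the 72-bit LFSR design. The helper `improvement_factor` is hypothetical and the model is deliberately simple.

```python
def improvement_factor(fraction_removed: float) -> float:
    """Reliability gain if a class holding this fraction of failures vanishes."""
    return 1.0 / (1.0 - fraction_removed)


# Perfectly eliminating the largest class (mux select, 73% of failure bits).
mux_only = improvement_factor(0.73)

# Fraction of failure bits that must be removed to reach a 10x gain.
needed_for_10x = 1.0 - 1.0 / 10.0

print(f"Removing all mux select failures: {mux_only:.1f}x")
print(f"Fraction needed for 10x: {needed_for_10x:.0%}")
```

Under these assumptions, even a perfect fix for the dominant class yields less than a 4x gain, while a 10x gain requires removing 90% of the failure bits, more than any single class accounts for in this design.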



#### Future Work

- Low-level, architecture-specific mitigation techniques, for example:
  - Use abundant routing resources to create redundant routing
  - Employ unused LUT and slice inputs to provide input redundancy
- Evaluate non-architecture-specific SEU mitigation techniques for each type of failure



#### Back-up slides




#### Why Use SRAM FPGAs in Space?

- Performance: 100x vs. a radiation-hardened μP (for fixed volume, power, and weight); continuous processing at 100+ MS/s
- On-orbit processing: can improve system sensitivity and reduce communication bandwidth
- On-orbit reprogrammability: counteract mission obsolescence and on-orbit faults
- *Cost*: cheaper than low-volume ASICs
- *Lead time*: no ASIC design, fab, and test
- Challenge: SEU sensitivities




#### Classification of Failures: 72-bit LFSRs (100% failure bits)



#### Classification of Failures: Multiply/Add Design (100% failure bits)

