pavels edited this page Nov 3, 2010 · 8 revisions

Description

This collection of tools is intended for simulating SEU (single event upset) errors in Xilinx FPGAs. It is a fully automated system that randomly injects errors into the contents of LUTs. It uses real hardware (currently Spartan3 is supported and tested) and partial reconfiguration techniques.

The tools are divided into 3 parts:

  1. Fault Injection Script
  2. Test Control Script
  3. Test Harness

Fault Injection Script

It is a Perl script. The following libraries are required; all of them can be installed through CPAN:

Expect
threads
IO::Socket::INET
Getopt::Long
Pod::Usage

The Expect library is used for communication between the script and the Xilinx tools. The script relies on the following tools:

  1. fpga_edline – Xilinx ISE tool used to manipulate netlists. It can read and modify the proprietary .ncd netlists.
  2. bitgen – Xilinx ISE tool used to create partial bitstreams.
  3. xc3sprog – Custom tool, explained below in the section on programming partial bitstreams.

Because bitgen takes time to process a bitstream, programming the FPGA and preparing the next bitstream run in parallel.

The script has a configuration section at the beginning of the file. It contains the paths to the tools used in the process. You must set them correctly.

The script also takes command line arguments as follows:

--ncd <file.ncd>       = NCD file from Xilinx tools
--bit <file.bit>       = Initial bitstream
--module <module_name> = Name of the module (instance) into which faults are injected
[--faultcount <num>]   = How many faults to inject simultaneously
--help                 = This help

The script performs the following operations:

  1. Loads the .ncd netlist
  2. Parses the netlist and finds all relevant LUTs using module_name. It uses a wildcard search, so if you use mod as the module name, instances such as mod1 or modX are also considered for fault injection. If you want an exact match instead, place a dot at the end, as in mod. (with the trailing dot).
  3. Programs the initial bitstream into the FPGA
  4. Starts two processes, which do the following in parallel:
    1. Generates a partial bitstream containing the fault and a partial bitstream that corrects this fault
    2. Programs the faulty bitstream into the FPGA, waits for the test to complete and then programs the correction bitstream. At the end of each programming round the FPGA is back in its initial state, so the next round always starts from a clean design.
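
The parallel arrangement in step 4 can be pictured as a producer/consumer pipeline. Below is a minimal Python sketch of that idea (the actual script is written in Perl and is not reproduced here). The function names, the file names and the queue depth of one are assumptions made for illustration, and the bitgen and programming steps are replaced by placeholders.

```python
import queue
import threading


def generate_bitstreams(rounds, out_q):
    """Producer: stands in for the bitgen step (placeholder, no real tools run)."""
    for i in range(rounds):
        faulty = f"faulty_{i}.bit"   # hypothetical file names
        fix = f"fix_{i}.bit"
        out_q.put((faulty, fix))     # blocks while the consumer is still busy
    out_q.put(None)                  # sentinel: no more rounds


def program_and_test(in_q, log):
    """Consumer: stands in for programming the FPGA and waiting for the test."""
    while True:
        item = in_q.get()
        if item is None:
            break
        faulty, fix = item
        log.append(("program", faulty))   # inject the fault
        log.append(("test", faulty))      # wait for the external test
        log.append(("program", fix))      # restore the clean design


def run_pipeline(rounds):
    """Run both stages concurrently; one bitstream pair is prepared ahead."""
    q = queue.Queue(maxsize=1)
    log = []
    producer = threading.Thread(target=generate_bitstreams, args=(rounds, q))
    consumer = threading.Thread(target=program_and_test, args=(q, log))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return log
```

With a bounded queue the generator never runs more than one round ahead, so the slow bitstream generation overlaps with the programming and testing of the previous round.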

SEU simulation is done by randomly flipping bits in LUTs. The number of faults generated at once can be set on the command line. This is very useful for testing SEU mitigation techniques. For example, if you have a technique that can withstand one fault and you choose to generate 2 or more, you should see your system fail. If it does not, something is really wrong. If more than one fault is set, the faults are randomly distributed across the design. More than one fault can appear in a single LUT.
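
To make the bit-flipping idea concrete, here is a small Python sketch that works on a dictionary of LUT contents instead of a real netlist. It assumes 4-input LUTs holding 16 bits each; the LUT names, the data layout and the choice to pick distinct bit positions (so that two faults never cancel each other) are assumptions of this sketch, not a description of the actual script.

```python
import random

LUT_BITS = 16  # a 4-input LUT holds 16 configuration bits


def inject_faults(luts, fault_count, rng=random):
    """Flip fault_count distinct, randomly chosen LUT bits.

    luts maps a (hypothetical) LUT name to its 16-bit content.
    Returns the faulty contents and the list of (name, bit) faults.
    """
    positions = [(name, bit) for name in sorted(luts) for bit in range(LUT_BITS)]
    faults = rng.sample(positions, fault_count)
    faulty = dict(luts)
    for name, bit in faults:
        faulty[name] ^= 1 << bit   # several faults may hit the same LUT
    return faulty, faults


def correction(luts, faulty):
    """Original values of the LUTs that changed (what a correcting bitstream restores)."""
    return {name: luts[name] for name in luts if luts[name] != faulty[name]}
```

Applying the correction on top of the faulty contents gives back the original dictionary, which mirrors how each round ends with the design in its initial state.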

The fault injection script doesn’t handle any tests; it only injects faults. This lets you use the script without modification on a wide variety of designs. The tests are controlled by an external program, the so-called Test Control Script. Communication between the two uses UDP. The port can be set in the configuration section at the beginning of the injection script source. The main tasks of the injection script in this communication are the following:

  1. Wait for the Test Control Script to connect – the fault injection script is always run first, then the test control
  2. After each programming of a faulty bitstream, transmit the list of injected faults to the test control. It is up to the test control script what to do with this information. You can use it in test procedures, log it to a file, or simply throw it away.
  3. Tell the test control that the FPGA is ready for tests (programming completed) and wait until the test control replies that the tests are completed and the whole operation can continue

The communication protocol uses ASCII messages with a few keywords. The list of keywords and their meanings follows:

from fault injection to test control
OK   - sent after the test control (testbench driver) has connected successfully
FSET - the list of generated faults follows this command
       each fault is sent as a separate packet
       the list is terminated with the “###” escape sequence
FGEN - the target device is programmed and ready for test

from test control to fault injection
CONNECT  - sent immediately after connecting to the fault generator
CONTINUE - sent as a reply to FGEN after all tests are finished
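
The test control side of this protocol can be sketched as a small state machine. The following Python example is only an illustration under stated assumptions: the shipped testbench scripts are Perl, the class and function names are invented here, the format of a fault packet is not specified above (it is treated as an opaque string), and messages are assumed to arrive one per UDP datagram, possibly with trailing whitespace.

```python
import socket


class ControlSession:
    """Protocol state of the test control side, using the keywords listed above."""

    def __init__(self, run_tests):
        self.run_tests = run_tests   # hypothetical callback: list of faults -> result
        self.faults = []
        self.in_list = False
        self.results = []

    def handle(self, msg):
        """Process one received message; return the reply to send, or None."""
        if self.in_list:
            if msg == "###":             # end of the fault list
                self.in_list = False
            else:
                self.faults.append(msg)  # one fault per packet
            return None
        if msg == "FSET":                # a new fault list starts
            self.faults, self.in_list = [], True
        elif msg == "FGEN":              # FPGA programmed: run tests, then release it
            self.results.append((self.faults, self.run_tests(self.faults)))
            return "CONTINUE"
        return None                      # "OK" needs no reply


def run_test_control(host, port, run_tests, rounds):
    """Connect to the fault injection script and serve a fixed number of rounds."""
    session = ControlSession(run_tests)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(b"CONNECT", (host, port))
        while len(session.results) < rounds:
            data, peer = sock.recvfrom(4096)
            reply = session.handle(data.decode("ascii").strip())
            if reply:
                sock.sendto(reply.encode("ascii"), peer)
    return session.results
```

Keeping the message handling separate from the socket loop makes it possible to exercise the protocol logic without hardware or a running fault generator. The host, the port and the number of rounds are parameters of the sketch; the real port is whatever is set in the injection script's configuration section.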

Programming of partial bitstream

The main problem is that iMPACT can’t program a bitstream without first erasing the old one. Because of this, it can’t be used to program partial bitstreams.

I used xc3sprog from http://sourceforge.net/projects/xc3sprog/ and patched it so that it doesn’t reset the FPGA before programming the bitstream. The patch is simple:

--- xc3sprog-r216/progalgxc3s.cpp	2009-07-02 11:05:13.000000000 +0200
+++ xc3sprog-r216-patched/progalgxc3s.cpp	2009-11-12 15:49:56.000000000 +0100
@@ -132,10 +131,10 @@
 }
 void ProgAlgXC3S::array_program(BitFile &file)
 {
-  flow_enable();
+//  flow_enable();
   /* JPROGAM: Trigerr reconfiguration, not explained in ug332, but
      DS099 Figure 28:  Boundary-Scan Configuration Flow Diagram (p.49) */
-  jtag->shiftIR(&JPROGRAM);
+//  jtag->shiftIR(&JPROGRAM);
 
   switch(family)
     {
@@ -159,5 +158,5 @@
   /* use leagcy, if large transfers are faster then chunks */
   flow_program_legacy(file);
   /*flow_array_program(file);*/
-  flow_disable();
+//  flow_disable();
 }

An already patched version is included in the distribution package.

With this patch, xc3sprog stops the operation of the FPGA (disconnects the clock signal and forces all outputs to high impedance), but the contents of the configuration memory, BRAMs and D flip-flops are left intact. Then it programs the partial bitstream and starts the FPGA again. This procedure suits Spartan3, where a partial bitstream always reprograms a full column by shifting the new configuration in, which generates lots of glitches and corrupts flip-flop and BRAM contents if the FPGA is not stopped before programming. The Virtex family doesn’t behave like that; all unchanged logic is kept completely intact, so no glitches or corruption appear even if the FPGA is running.

This process was tested with the Xilinx Parallel JTAG cable.

Test Control Script

This is the design-dependent part. It can control the test itself, or it can just trigger a BIST and then collect the results.

Test Harness

This part is located in the FPGA and can be anything from a simple register reader/writer to a full-blown BIST.

Working Example – Tutorial

The example was developed and tested under Linux (Gentoo). All instructions are for Linux. This doesn’t mean it won’t run under Windows, but that wasn’t tested.

All examples are designed for the Nexys board from Digilent with a Spartan3 and an RS232 PMOD.

Prerequisites

Perl libraries: Expect, threads, IO::Socket, IO::Socket::INET, Getopt::Long, Pod::Usage, Device::SerialPort

To install these libraries, use either your distribution’s package manager or CPAN.

The simplest way is CPAN. To install through CPAN, issue the following command:

pavel@pigster-pc ~ $ sudo cpan install Expect threads IO::Socket IO::Socket::INET Getopt::Long Pod::Usage Device::SerialPort

ISE WebPack – get it from http://www.xilinx.com/webpack

GIT – get it through your package manager or from http://git-scm.com/

ISE Make Based build system – more information on installation and configuration here http://github.com/pavels/ise-make-system

Download and Installation

  1. Get the sources from GIT:
    pavel@pigster-pc ~ $ mkdir Xilinx-SEU-Simulator
    pavel@pigster-pc ~ $ cd Xilinx-SEU-Simulator
    pavel@pigster-pc ~/Xilinx-SEU-Simulator $ git init
    pavel@pigster-pc ~/Xilinx-SEU-Simulator $ git pull git://github.com/pavels/Xilinx-SEU-Simulator.git
  2. Build HW examples
    pavel@pigster-pc ~/Xilinx-SEU-Simulator $ cd reconf_demo_mult
    pavel@pigster-pc ~/Xilinx-SEU-Simulator/reconf_demo_mult $ make
    pavel@pigster-pc ~/Xilinx-SEU-Simulator/reconf_demo_mult $ cd ..
    pavel@pigster-pc ~/Xilinx-SEU-Simulator $ cd reconf_demo_mult_tmr
    pavel@pigster-pc ~/Xilinx-SEU-Simulator/reconf_demo_mult_tmr $ make
    pavel@pigster-pc ~/Xilinx-SEU-Simulator/reconf_demo_mult_tmr $ cd ..

Configuration

If your ISE tools are in $PATH, there is no need to configure anything in the generator. Otherwise, check the configuration at the top of fault-generator.pl.

Check the serial port settings in the testbench scripts (testbench.pl in reconf_demo_mult and reconf_demo_mult_tmr).

Find this line:

$port = Device::SerialPort->new("/dev/ttyS0");

and correct the path to the serial port device to fit your system.

xc3sprog Installation

In the fault-generator directory you can find a precompiled xc3sprog binary. Check whether it works for you.

With the Xilinx Parallel Cable connected to the board and the power switched ON, try to run this command:

pavel@pigster-pc ~/Xilinx-SEU-Simulator/fault-generator $ ./xc3sprog -j

It should detect your FPGA. If it does, skip the rest of this section and go straight to the next one. If it fails, you may try to recompile xc3sprog from source.

To recompile xc3sprog, do the following:

  1. Get the source from SourceForge – http://sourceforge.net/projects/xc3sprog/files/xc3sprog/v216/xc3sprog-r216.tar.gz/download – always get release r216; other releases may not work.
  2. Extract the source and apply the patch that is part of the Xilinx-SEU-Simulator package:
    pavel@pigster-pc ~/Xilinx-SEU-Simulator $ tar -xzf xc3sprog-r216.tar.gz 
    pavel@pigster-pc ~/Xilinx-SEU-Simulator $ cp xc3sprog.patch xc3sprog-r216
    pavel@pigster-pc ~/Xilinx-SEU-Simulator $ cd xc3sprog-r216
    pavel@pigster-pc ~/Xilinx-SEU-Simulator/xc3sprog-r216 $ patch -p1 < xc3sprog.patch 
  3. Build xc3sprog:
    pavel@pigster-pc ~/Xilinx-SEU-Simulator/xc3sprog-r216 $ mkdir build
    pavel@pigster-pc ~/Xilinx-SEU-Simulator/xc3sprog-r216 $ cd build
    pavel@pigster-pc ~/Xilinx-SEU-Simulator/xc3sprog-r216/build $ cmake ..
    pavel@pigster-pc ~/Xilinx-SEU-Simulator/xc3sprog-r216/build $ make
  4. Copy the resulting binary to fault-generator:
    pavel@pigster-pc ~/Xilinx-SEU-Simulator/xc3sprog-r216/build $ cp xc3sprog ../../fault-generator/

Running simulation

First of all, we need to prepare the test HW.

  1. Connect the RS232 PMOD adapter to the leftmost port on the Nexys board (JA)
  2. Connect an RS232 cable between the adapter and the PC
  3. Connect the parallel JTAG cable to the Nexys board and the PC
  4. Connect power to the Nexys board and turn the power switch ON
  5. Set the Nexys board for JTAG programming

At this point, we are ready to run the first test.

  1. Copy the .ncd and .bit files from reconf_demo_mult to fault-generator
  2. From one terminal window, execute:
    pavel@pigster-pc ~/Xilinx-SEU-Simulator/fault-generator $ ./fault-generator.pl --ncd reconf_demo_mult.ncd --bit reconf_demo_mult.bit --module datapath

    If all goes well, you will see Waiting for controller to connect
  3. From a second terminal window, execute:
    pavel@pigster-pc ~/Xilinx-SEU-Simulator/reconf_demo_mult $ ./testbench.pl

Now the test should be running. As reconf_demo_mult is not resistant to SEUs in any way, you should see a lot of errors each time the test is run with a faulty bitstream.

Next, you can repeat the same procedure, but use the .ncd and .bit files from reconf_demo_mult_tmr. This is the same 8-bit multiplier, but this time it is protected by TMR (triple modular redundancy). You should see no errors now – the system is secure. Now try to run this test with a modified fault-generator command:

pavel@pigster-pc ~/Xilinx-SEU-Simulator/fault-generator $ ./fault-generator.pl --ncd reconf_demo_mult.ncd --bit reconf_demo_mult.bit --module datapath --faultcount 2

Now you should sometimes see errors during the test. TMR can’t handle more than one error when the errors are distributed among different redundant parts.
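
Why two faults only sometimes break TMR can be illustrated with a toy model. The Python sketch below is not derived from the example's VHDL: it assumes a bitwise 2-of-3 majority voter and models each fault simply as a bit flip on one replica's output, whereas a real LUT fault changes a logic function and shows up only for some inputs.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority voter."""
    return (a & b) | (a & c) | (b & c)


def tmr_multiply(x, y, flips=(0, 0, 0)):
    """Multiply in three replicas; flips[i] is XORed into the output of replica i."""
    outputs = [(x * y) ^ f for f in flips]
    return majority(*outputs)


# One faulty replica is outvoted by the two healthy ones.
single_fault = tmr_multiply(7, 9, flips=(1, 0, 0))
# The same output bit corrupted in two replicas wins the vote.
double_fault = tmr_multiply(7, 9, flips=(1, 1, 0))
```

In this model `single_fault` is still 63, while `double_fault` is 62. Two flips inside the same replica, or flips on different bits of different replicas, are still masked here, which matches the observation that errors appear only some of the time.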

Example Description

The VHDL code for our examples is very simple. Basically, it is just an 8-bit multiplier that is forced to be synthesized as gates. In addition, there is a test controller and a UART controller.

The test controller takes two or three bytes from the serial port and acts accordingly. The first byte is always the command, followed by its parameters.

R(0x52) <A> - reads content of register A, registers are numbered from 0
W(0x57) <A> <B> - writes value B to register A
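
As an illustration of the two commands, the following Python helpers build the byte sequences to send over the serial port. The helper names are invented for this sketch; the only facts taken from the description above are the command bytes and the one-byte parameters.

```python
CMD_READ = 0x52    # ASCII 'R'
CMD_WRITE = 0x57   # ASCII 'W'


def encode_read(reg):
    """Two-byte command: read the content of register reg."""
    return bytes([CMD_READ, reg])


def encode_write(reg, value):
    """Three-byte command: write value to register reg."""
    return bytes([CMD_WRITE, reg, value])
```

Because `bytes()` rejects values outside 0-255, a register number or value that does not fit into one byte raises a `ValueError` instead of being sent truncated.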

The test bench is very simple. It just writes all possible combinations of operands in the range 0-32 and checks the results using the test controller in the FPGA through the UART. The range of values is limited because the UART is slow. It is meant just as an example.
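
The loop of such a test bench could look roughly like the Python sketch below (the shipped testbench.pl is Perl and may differ). The register map is purely hypothetical: registers 0 and 1 are assumed to hold the operands and registers 2 and 3 the low and high byte of the product. A read is assumed to return a single byte, the range is taken as inclusive, and a fake device replaces the serial port so that the sketch runs without hardware.

```python
class FakeDevice:
    """Stand-in for the FPGA behind the UART (hypothetical register map)."""

    def __init__(self):
        self.regs = [0, 0, 0, 0]   # operand A, operand B, product low, product high

    def transfer(self, command):
        """Accept one R/W command; return the reply bytes (empty for writes)."""
        if command[0] == 0x57:                      # W <A> <B>
            self.regs[command[1]] = command[2]
            product = self.regs[0] * self.regs[1]
            self.regs[2], self.regs[3] = product & 0xFF, product >> 8
            return b""
        return bytes([self.regs[command[1]]])       # R <A>


def run_testbench(device, limit=32):
    """Try every operand pair in 0..limit; return the list of mismatches."""
    errors = []
    for a in range(limit + 1):
        for b in range(limit + 1):
            device.transfer(bytes([0x57, 0, a]))    # write operand A
            device.transfer(bytes([0x57, 1, b]))    # write operand B
            low = device.transfer(bytes([0x52, 2]))[0]
            high = device.transfer(bytes([0x52, 3]))[0]
            result = (high << 8) | low
            if result != a * b:
                errors.append((a, b, result))
    return errors
```

With the fault-free `FakeDevice` the returned list is empty; against real hardware, `transfer` would write the command to the serial port and read back the reply.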