neorv32_fifo Vivado implementation issue #753
Comments
Hey @doud3157. Thanks for finding this! I know Vivado sometimes has issues with multi-dimensional arrays (and a record could be seen as such), but I think the problem is somewhere else. This "dual-access" to the FIFO RAM might be an issue:

if (fifo_depth_c = 1) then
  rdata_o <= fifo.data(0);
else
  rdata_o <= fifo.data(to_integer(unsigned(fifo.r_pnt(fifo.r_pnt'left-1 downto 0))));
end if;

Anyway, I will do some tests and update the FIFO module. |
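As an aside, here is a minimal, purely illustrative sketch of what a single-access read could look like, assuming a synchronous read (which block RAM inference needs anyway); it reuses the names from the snippet above plus a made-up helper variable addr_v and is not the actual neorv32 code:

-- illustrative only: the FIFO memory is indexed in exactly one place
single_access_read: process(clk_i)
  variable addr_v : integer range 0 to fifo_depth_c-1;
begin
  if rising_edge(clk_i) then
    if (fifo_depth_c = 1) then
      addr_v := 0; -- only one entry exists
    else
      addr_v := to_integer(unsigned(fifo.r_pnt(fifo.r_pnt'left-1 downto 0)));
    end if;
    rdata_o <= fifo.data(addr_v); -- single read access into the array
  end if;
end process single_access_read;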
It looks like this might still be an issue, at least with respect to the SLINK module, where it generates a 33-bit wide FIFO. If I set the FIFO depths to 1024 words for RX and 512 for TX, for example, it seems not to infer block RAM but 33-bit registers instead, which uses a large amount of logic resources. |
I've just tested that using Vivado - and you are right: Vivado is trying to build the SLINK's FIFOs from FF primitives. It seems like Vivado is not happy with these size checks: neorv32/rtl/core/neorv32_fifo.vhd, lines 190 to 196 (commit 4155b72)
I think that somehow Vivado is not capable of identifying these checks as "runtime static". Maybe it is better to use … @mikaelsky, what do you think (I think we added those checks in #778)? |
@stnolting when implementing TLAST, did you attach/concat it to each word/value before sending it to the FIFOs? If that is the case, the problem may be Vivado not being able to use BRAMs with 33-bit words. Since I believe the main use case of TLAST is signaling the end of a vector, maybe it can be modelled as a peripheral attached to the end of the SLINK FIFOs, which asserts TLAST every N transfers, where N is configurable through a memory-mapped (Wishbone) control bus. So, the peripheral would have three interfaces:
Then, when sending a matrix (or a set of vectors of equal length), the user only needs to define the width/height. EDIT Indeed: |
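To make that proposal a bit more concrete, here is a small, entirely hypothetical sketch of such a "TLAST generator" sitting at the FIFO output: it counts accepted transfers and asserts tlast_o on every N-th one, with N (assumed to be at least 1) coming from a memory-mapped register that is not shown. None of these names exist in neorv32; this is just one way the idea could look:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tlast_gen is
  port (
    clk_i       : in  std_logic;
    rstn_i      : in  std_logic;             -- async, low-active reset
    num_words_i : in  unsigned(31 downto 0); -- N, written via the control bus (not shown)
    valid_i     : in  std_logic;             -- stream handshake at the FIFO output
    ready_i     : in  std_logic;
    tlast_o     : out std_logic              -- high during the N-th transfer of each packet
  );
end entity tlast_gen;

architecture rtl of tlast_gen is
  signal cnt : unsigned(31 downto 0); -- transfers seen in the current packet
begin
  process(clk_i, rstn_i)
  begin
    if (rstn_i = '0') then
      cnt <= (others => '0');
    elsif rising_edge(clk_i) then
      if (valid_i = '1') and (ready_i = '1') then -- one transfer accepted
        if (cnt = num_words_i - 1) then
          cnt <= (others => '0'); -- packet complete, start over
        else
          cnt <= cnt + 1;
        end if;
      end if;
    end if;
  end process;

  tlast_o <= '1' when (cnt = num_words_i - 1) else '0';
end architecture rtl;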
That "should" be no problem as the BRAM primitives can operate in 512*36-bit mode - so they are wide enough. The "tlast" identifier is just another "data" bit written to the FIFO.
That would be a nice work-around here, but at the cost of additional hardware and (software) complexity 🙈 |
@stnolting check pages 16 and 17 of https://docs.xilinx.com/v/u/en-US/ug473_7Series_Memory_Resources:
So, it is true that it can hold up to 36 bits per word, but I'm not sure it can deal with them unless you use two separate ports.
Then, on page 31:
|
I would be surprised if it's the boundary checks, mostly as they fall out as compile-time constants. If we believe that is the case, we would need to wrap them in a conditional generate statement that results in either an N-deep memory or a 1-deep memory. What I did do is read up on inferred block RAM (https://docs.xilinx.com/r/en-US/ug901-vivado-synthesis/Inferring-UltraRAM-in-Vivado-Synthesis), which provides examples of how to infer a RAM vs. instantiating one directly. Vivado will try to map the memory into a plurality of block RAMs with a width resolution of 18 bits, e.g. 18, 36, 54, etc. The one thing Vivado can't do is infer a FIFO, which would have been neat for this use case. |
@mikaelsky I would expect BRAMs to be used, rather than UltraRAM. https://docs.xilinx.com/r/en-US/am007-versal-memory/Block-RAM-Resources |
@umarcor True, true :) The UltraRAM is more an example of how you can ensure that Vivado can infer RAMs (block or ultra). The chapter you are referring to just provides examples of direct block RAM usage. In the application note you will find the following limitations on block RAM inference. The same application note contains multiple examples of how to infer a block RAM, e.g. https://docs.xilinx.com/r/en-US/ug901-vivado-synthesis/Single-Port-Block-RAM-with-Resettable-Data-Output-VHDL. What is important here is that there are limits to how you can write your code to ensure that Vivado infers a block RAM. If Vivado can't understand the code, it will instead infer flops. |
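For reference, those UG901 templates boil down to a coding style like the one below; this is a generic re-creation of that style (not the neorv32 FIFO, and not a verbatim copy of the Xilinx example): one array signal, one clocked process, at most one write and one synchronous read per port:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- generic single-port RAM written in an inference-friendly style
entity bram_sp is
  generic (
    ADDR_WIDTH : natural := 10; -- 1024 entries
    DATA_WIDTH : natural := 33  -- e.g. 32 data bits + tlast
  );
  port (
    clk_i  : in  std_logic;
    we_i   : in  std_logic;
    addr_i : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    din_i  : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    dout_o : out std_logic_vector(DATA_WIDTH-1 downto 0)
  );
end entity bram_sp;

architecture rtl of bram_sp is
  type ram_t is array (0 to (2**ADDR_WIDTH)-1) of std_logic_vector(DATA_WIDTH-1 downto 0);
  signal ram : ram_t;
begin
  process(clk_i)
  begin
    if rising_edge(clk_i) then
      if (we_i = '1') then
        ram(to_integer(unsigned(addr_i))) <= din_i;
      end if;
      dout_o <= ram(to_integer(unsigned(addr_i))); -- synchronous, read-first
    end if;
  end process;
end architecture rtl;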
My point is that I believe using a single 32-bit port allows Vivado to infer a BRAM, but using a 33-bit port does not. That's because the block and port description in the chapter I referred to explains that there is a 32-bit port and a 4-bit port (which is meant for parity but can also be used for extra bits). Unfortunately, I don't have Vivado at hand at the moment, so I cannot test it. |
I get where you are going. But if you read the little snippet I copied in, Vivado does support "any size and data width" and "Vivado synthesis maps the memory description into one or several RAM". If we look at the 7-series block RAM data sheet: https://docs.xilinx.com/v/u/en-US/ug473_7Series_Memory_Resources
From the block RAM data sheet we get:

Basically what I'm hinting at is that I think you are running down the wrong path with the 33-bit vs. 32-bit question. Vivado will do whatever it needs to and waste block RAM bits. So, e.g., a 33-bit word will indeed fly in a 36-bit block RAM. The 4 extra bits are there for ECC or parity, but only if you enable it; otherwise they are generally available.

Now, where things do get hairy with "33 bits vs. 32 bits" is if we are using byte write enables, as the byte writes are tied to the parity bits as well, resulting in a 9-bit byte. In this case you are absolutely right that you only get 32 bits per 36 bits, which will result in a 32-bit block RAM plus an additional 1- or 2-bit block RAM to get a combined 33/34-bit wide block RAM.

I will highlight that they always refer to "maximum bit width" for the data ports. This is because you can indeed freely define the port width, with the above-mentioned caveat that if the chosen bit width fits badly into the power-of-2 scheme that Vivado uses, you will waste RAM bits, e.g. a width of 3 will result in a 4-bit wide block RAM, wasting 25% of that RAM's bits in the process. |
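As a concrete example of that waste, assuming the 36 Kb primitives described in UG473: a 1024-deep, 33-bit wide FIFO needs 1024 x 33 = 33,792 data bits, which fits a single 36 Kb block RAM in its 1K x 36 configuration, leaving 3 of the 36 bits per entry (roughly 8%) unused.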
What a great discussion that dives deep into FPGAs! :) I think that the synthesis tool should have no issues with mapping an NxM RAM if …
I just did some experiments.
That is the point! We need to specify two distinct signals here: one NxM-wide array used if the FIFO depth is > 1, and one M-wide signal used if the FIFO depth is = 1. With these changes, synthesis happily infers an Nx33-bit block RAM:
I'll draft a PR for this. |
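A rough sketch of what that split could look like (the label, signal and generic names below are made up, and the pointer/status logic is omitted, so this is only an illustration of the idea, not the actual patch): the depth-1 case and the N-deep case get their own storage signals, selected by conditional generates, so synthesis only ever sees the plain BRAM-style access pattern in the N-deep case.

-- illustration only: two distinct storage signals behind conditional generates
fifo_memory_small:
if (fifo_depth_c = 1) generate
  signal fifo_reg : std_ulogic_vector(FIFO_WIDTH-1 downto 0); -- single entry = plain register
begin
  process(clk_i)
  begin
    if rising_edge(clk_i) then
      if (we = '1') then
        fifo_reg <= wdata_i;
      end if;
    end if;
  end process;
  rdata_o <= fifo_reg;
end generate fifo_memory_small;

fifo_memory_large:
if (fifo_depth_c > 1) generate
  type fifo_mem_t is array (0 to fifo_depth_c-1) of std_ulogic_vector(FIFO_WIDTH-1 downto 0);
  signal fifo_mem : fifo_mem_t; -- N entries = inferable memory array
begin
  process(clk_i)
  begin
    if rising_edge(clk_i) then
      if (we = '1') then
        fifo_mem(to_integer(w_addr)) <= wdata_i; -- w_addr/r_addr assumed unsigned here
      end if;
      rdata_o <= fifo_mem(to_integer(r_addr)); -- synchronous read, BRAM friendly
    end if;
  end process;
end generate fifo_memory_large;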
@mikaelsky you are correct. I could run some tests with Vivado v2022.2 on some CI, and I can confirm that …
For completeness, I ran the test with the dirty fix on top of main, using SLINK depth 4:
|
Describe the bug
With Vivado 2023.1, the implementation of neorv32_fifo is inefficient: it uses CLB registers instead of BRAM.
To Reproduce
Synthesize neorv32_fifo standalone with a depth of 1024 entries (for example).
Expected behavior
The neorv32_fifo should use BRAM and not CLB registers when DEPTH > 1.
Screenshots
NA
Environment:
Hardware:
Additional context
The problem is due to the usage of the record fifo_t. The data member (type fifo_data_t) should be defined outside of the type fifo_t to help Vivado. The processes (sync_read and async_read) should be modified to use the new definition of data.
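A minimal sketch of that suggestion, with names approximating (but not copied from) neorv32_fifo.vhd: the data array becomes a standalone signal and the record keeps only its pointer and status fields, so sync_read and async_read index fifo_data directly instead of fifo.data.

-- sketch only: memory array declared outside the record
type fifo_data_t is array (0 to fifo_depth_c-1) of std_ulogic_vector(FIFO_WIDTH-1 downto 0);
signal fifo_data : fifo_data_t; -- was fifo.data
-- in sync_read / async_read the access then becomes, for example:
--   rdata_o <= fifo_data(to_integer(unsigned(fifo.r_pnt(fifo.r_pnt'left-1 downto 0))));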