neorv32_fifo Vivado implementation issue #753
Comments
Hey @doud3157. Thanks for finding this! I know Vivado sometimes has issues with multi-dimensional arrays (and a record could be seen as such), but I think the problem is somewhere else. This "dual-access" to the FIFO RAM might be an issue:

if (fifo_depth_c = 1) then
  rdata_o <= fifo.data(0);
else
  rdata_o <= fifo.data(to_integer(unsigned(fifo.r_pnt(fifo.r_pnt'left-1 downto 0))));
end if;

Anyway, I will do some tests and update the FIFO module. |
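As an aside, here is a minimal, purely illustrative sketch of what a single-access read could look like, assuming a synchronous read (which block RAM inference needs anyway); it reuses the names from the snippet above plus a made-up helper variable addr_v and is not the actual neorv32 code:

-- illustrative only: the FIFO memory is indexed in exactly one place
single_access_read: process(clk_i)
  variable addr_v : integer range 0 to fifo_depth_c-1;
begin
  if rising_edge(clk_i) then
    if (fifo_depth_c = 1) then
      addr_v := 0; -- only one entry exists
    else
      addr_v := to_integer(unsigned(fifo.r_pnt(fifo.r_pnt'left-1 downto 0)));
    end if;
    rdata_o <= fifo.data(addr_v); -- single read access into the array
  end if;
end process single_access_read;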
It looks like this might still be an issue, at least with respect to the SLINK module, where it generates a 33-bit wide FIFO. If I set the FIFO depths to 1024 words for RX and 512 for TX, for example, it seems not to infer block RAM but 33-bit registers instead, which uses a large amount of logic resources. |
I've just tested that using Vivado - and you are right: Vivado is trying to build the SLINK's FIFOs from FF primitives. It seems like Vivado is not happy with these size checks: neorv32/rtl/core/neorv32_fifo.vhd, lines 190 to 196 (commit 4155b72)
I think that somehow Vivado is not capable of identifying these checks as "runtime static". Maybe it is better to use … @mikaelsky, what do you think (I think we added those checks in #778)? |
@stnolting when implementing TLAST, did you attach/concat it to each word/value before sending it to the FIFOs? If that is the case, the problem may be Vivado not being able to use BRAMs with 33-bit words. Since I believe the main use case of TLAST is signaling the end of a vector, maybe it can be modelled as a peripheral attached to the end of the SLINK FIFOs, which asserts TLAST every N transfers, where N is configurable through a memory-mapped (Wishbone) control bus. So, the peripheral would have three interfaces:
Then, when sending a matrix (or a set of vectors of equal length), the user only needs to define the width/height. EDIT Indeed: |
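To make that proposal a bit more concrete, here is a small, entirely hypothetical sketch of such a "TLAST generator" sitting at the FIFO output: it counts accepted transfers and asserts tlast_o on every N-th one, with N (assumed to be at least 1) coming from a memory-mapped register that is not shown. None of these names exist in neorv32; this is just one way the idea could look:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tlast_gen is
  port (
    clk_i       : in  std_logic;
    rstn_i      : in  std_logic;             -- async, low-active reset
    num_words_i : in  unsigned(31 downto 0); -- N, written via the control bus (not shown)
    valid_i     : in  std_logic;             -- stream handshake at the FIFO output
    ready_i     : in  std_logic;
    tlast_o     : out std_logic              -- high during the N-th transfer of each packet
  );
end entity tlast_gen;

architecture rtl of tlast_gen is
  signal cnt : unsigned(31 downto 0); -- transfers seen in the current packet
begin
  process(clk_i, rstn_i)
  begin
    if (rstn_i = '0') then
      cnt <= (others => '0');
    elsif rising_edge(clk_i) then
      if (valid_i = '1') and (ready_i = '1') then -- one transfer accepted
        if (cnt = num_words_i - 1) then
          cnt <= (others => '0'); -- packet complete, start over
        else
          cnt <= cnt + 1;
        end if;
      end if;
    end if;
  end process;

  tlast_o <= '1' when (cnt = num_words_i - 1) else '0';
end architecture rtl;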
That "should" be no problem as the BRAM primitives can operate in 512*36-bit mode - so they are wide enough. The "tlast" identifier is just another "data" bit written to the FIFO.
That would be a nice work-around here, but at the cost of additional hardware and (software) complexity 🙈 |
@stnolting check pages 16 and 17 of https://docs.xilinx.com/v/u/en-US/ug473_7Series_Memory_Resources:
So, it is true that it can hold up to 36 bits per word, but I'm not sure it can deal with them unless you use two separate ports.
Then, on page 31:
|
I would be surprised if it's the boundary checks, mostly as they fall out as compile-time constants. If we believe that is the case, we would need to wrap them in a conditional generate statement that results in either an N-deep memory or a 1-deep memory. What I did do is read up on inferred block RAM (https://docs.xilinx.com/r/en-US/ug901-vivado-synthesis/Inferring-UltraRAM-in-Vivado-Synthesis), which provides examples of how to infer a RAM vs. instantiating one directly. Vivado will try to map the memory into a plurality of block RAMs with a width resolution of 18 bits, e.g. 18, 36, 54, etc. The one thing Vivado can't do is infer a FIFO, which would have been neat for this use case. |
@mikaelsky I would expect BRAMs to be used, rather than UltraRAM. https://docs.xilinx.com/r/en-US/am007-versal-memory/Block-RAM-Resources |
@umarcor True, true :) The UltraRAM is more an example of how you can ensure that Vivado can infer RAMs (block or ultra). The chapter you are referring to just provides examples of direct block RAM usage. In the application note you will find the following limitations on block RAM inference. The same application note contains multiple examples of how to infer a block RAM, e.g. https://docs.xilinx.com/r/en-US/ug901-vivado-synthesis/Single-Port-Block-RAM-with-Resettable-Data-Output-VHDL. What is important here is that there are limits to how you can write your code to ensure that Vivado infers a block RAM. If Vivado can't understand the code, it will instead infer flops. |
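For reference, those UG901 templates boil down to a coding style like the one below; this is a generic re-creation of that style (not the neorv32 FIFO, and not a verbatim copy of the Xilinx example): one array signal, one clocked process, at most one write and one synchronous read per port:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- generic single-port RAM written in an inference-friendly style
entity bram_sp is
  generic (
    ADDR_WIDTH : natural := 10; -- 1024 entries
    DATA_WIDTH : natural := 33  -- e.g. 32 data bits + tlast
  );
  port (
    clk_i  : in  std_logic;
    we_i   : in  std_logic;
    addr_i : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    din_i  : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    dout_o : out std_logic_vector(DATA_WIDTH-1 downto 0)
  );
end entity bram_sp;

architecture rtl of bram_sp is
  type ram_t is array (0 to (2**ADDR_WIDTH)-1) of std_logic_vector(DATA_WIDTH-1 downto 0);
  signal ram : ram_t;
begin
  process(clk_i)
  begin
    if rising_edge(clk_i) then
      if (we_i = '1') then
        ram(to_integer(unsigned(addr_i))) <= din_i;
      end if;
      dout_o <= ram(to_integer(unsigned(addr_i))); -- synchronous, read-first
    end if;
  end process;
end architecture rtl;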
My point is that I believe using a single 32-bit port allows Vivado to infer a BRAM, but using a 33-bit port does not. That's because the block and port description in the chapter I referred to explains that there is a 32-bit port and a 4-bit port (which is meant for parity but can also be used for extra bits). Unfortunately, I don't have Vivado at hand at the moment, so I cannot test it. |
I get where you are going. But if you read the little snippet I copied in, Vivado does support "any size and data width" and "Vivado synthesis maps the memory description into one or several RAM". If we look at the 7-series block RAM data sheet: https://docs.xilinx.com/v/u/en-US/ug473_7Series_Memory_Resources
From the block RAM data sheet we get:

Basically what I'm hinting at is that I think you are running down the wrong path with the 33-bit vs. 32-bit question. Vivado will do whatever it needs to and waste block RAM bits. So, e.g., a 33-bit word will indeed fly in a 36-bit block RAM. The 4 extra bits are there for ECC or parity, but only if you enable it; otherwise they are generally available.

Now, where things do get hairy with "33 bits vs. 32 bits" is if we are using byte write enables, as the byte writes are tied to the parity bits as well, resulting in a 9-bit byte. In this case you are absolutely right that you only get 32 bits per 36 bits, which will result in a 32-bit block RAM plus an additional 1- or 2-bit block RAM to get a combined 33/34-bit wide block RAM.

I will highlight that they always refer to "maximum bit width" for the data ports. This is because you can indeed freely define the port width, with the above-mentioned caveat that if the chosen bit width fits badly into the power-of-2 scheme that Vivado uses, you will waste RAM bits, e.g. a width of 3 will result in a 4-bit wide block RAM, wasting 25% of that RAM's bits in the process. |
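As a concrete example of that waste, assuming the 36 Kb primitives described in UG473: a 1024-deep, 33-bit wide FIFO needs 1024 x 33 = 33,792 data bits, which fits a single 36 Kb block RAM in its 1K x 36 configuration, leaving 3 of the 36 bits per entry (roughly 8%) unused.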
What a great discussion that dives deep into FPGAs! :) I think that the synthesis tool should have no issues with mapping an NxM RAM if …
I just did some experiments.
That is the point! We need to specify two distinct signals here: one NxM-wide array used if the FIFO depth is > 1, and one M-wide signal used if the FIFO depth is = 1. With these changes, synthesis happily infers an Nx33-bit block RAM:
I'll draft a PR for this. |
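A rough sketch of what that split could look like (the label, signal and generic names below are made up, and the pointer/status logic is omitted, so this is only an illustration of the idea, not the actual patch): the depth-1 case and the N-deep case get their own storage signals, selected by conditional generates, so synthesis only ever sees the plain BRAM-style access pattern in the N-deep case.

-- illustration only: two distinct storage signals behind conditional generates
fifo_memory_small:
if (fifo_depth_c = 1) generate
  signal fifo_reg : std_ulogic_vector(FIFO_WIDTH-1 downto 0); -- single entry = plain register
begin
  process(clk_i)
  begin
    if rising_edge(clk_i) then
      if (we = '1') then
        fifo_reg <= wdata_i;
      end if;
    end if;
  end process;
  rdata_o <= fifo_reg;
end generate fifo_memory_small;

fifo_memory_large:
if (fifo_depth_c > 1) generate
  type fifo_mem_t is array (0 to fifo_depth_c-1) of std_ulogic_vector(FIFO_WIDTH-1 downto 0);
  signal fifo_mem : fifo_mem_t; -- N entries = inferable memory array
begin
  process(clk_i)
  begin
    if rising_edge(clk_i) then
      if (we = '1') then
        fifo_mem(to_integer(w_addr)) <= wdata_i; -- w_addr/r_addr assumed unsigned here
      end if;
      rdata_o <= fifo_mem(to_integer(r_addr)); -- synchronous read, BRAM friendly
    end if;
  end process;
end generate fifo_memory_large;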
@mikaelsky you are correct. I could run some tests with Vivado v2022.2 on some CI, and I can confirm that …
For completeness, I ran the test with the dirty fix on top of main, using SLINK depth 4:
|
Describe the bug
With Vivado 2023.1, the implementation of neorv32_fifo is inefficient: it uses CLB registers instead of BRAM.
To Reproduce
Synthesize neorv32_fifo standalone with a depth of 1024 entries (for example).
Expected behavior
The neorv32_fifo should use BRAM and not CLB registers when DEPTH > 1.
Screenshots
NA
Environment:
Hardware:
Additional context
The problem is due to the usage of the record fifo_t. The data member (type fifo_data_t) should be defined outside of the type fifo_t to help Vivado. The processes (sync_read and async_read) should be modified to use the new definition of data.
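A minimal sketch of that suggestion, with names approximating (but not copied from) neorv32_fifo.vhd: the data array becomes a standalone signal and the record keeps only its pointer and status fields, so sync_read and async_read index fifo_data directly instead of fifo.data.

-- sketch only: memory array declared outside the record
type fifo_data_t is array (0 to fifo_depth_c-1) of std_ulogic_vector(FIFO_WIDTH-1 downto 0);
signal fifo_data : fifo_data_t; -- was fifo.data
-- in sync_read / async_read the access then becomes, for example:
--   rdata_o <= fifo_data(to_integer(unsigned(fifo.r_pnt(fifo.r_pnt'left-1 downto 0))));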