Initialize max number of global memory definition for simulator #104
1c9d0e1 to 08c6e94
Looks like git checkout is broken. Maybe GitHub changed their container setup?
… runners
This resolves a recent regression with GitHub-hosted runners where actions/checkout@v2 fails to create a directory under /__w/_temp/
intel#104 (comment)
actions/checkout#47
08c6e94 to 4ad9ada
… runners
This resolves a recent regression with GitHub-hosted runners where actions/checkout@v2 fails to create a directory under /__w/_temp/
#104 (comment)
actions/checkout#47
No regressions were found in external and internal tests. This is ready to merge.
Thanks @sherry-yuan, looks good!
- The commit message contains very useful background. What do you think about moving the commit message to a comment in the source code?
- @bsyrowik committed a similar change in d9df7a9, although it sets 128 memory systems instead of ACL_MAX_GLOBAL_MEM == 32. With your change, that commit is no longer needed, correct? Could you add a
  git revert d9df7a9e
  to your pull request? Is there any downside to setting the default 32 instead of the non-default 128 memory systems?
4ad9ada to 692eb87
The simulator does not have any global memory interface information until the actual aocx is loaded. (Note that this is only a problem for the simulator, not for hardware runs; in a hardware run, we can communicate with the BSP to query memory interface information.)

Prior to loading the aocx, it uses the predefined autodiscovery [1] to initialize its global memory interface, which has only 1 global memory.

In the SYCL runtime flow today, the USM device allocation call happens before the aocx is loaded. The aocx is loaded when clCreateProgram is called, which typically happens on the first kernel launch in the SYCL runtime. A USM device allocation on a multi global memory system will fail because there is only 1 global memory in total, as defined in [1], but the user is requesting more than 1 device global memory.

The user could work around this issue by launching a sacrificial kernel that uses a shared allocation as a kernel argument. This sets up the correct global memory interface in the runtime. This change eliminates the need to run a sacrificial kernel.

However, there are a few downsides:
1. The address range/size may not be exactly the same as the one in the aocx, but this is not too large a problem because the runtime's first-fit allocation algorithm fills the lowest address range first, unless the user requests more than what is available.
2. It potentially occupies more space than required.
3. It will not error out when the user requests a non-existent device global memory, because we are using ACL_MAX_GLOBAL_MEM for num_global_mem_systems.

[1] https://github.com/intel/fpga-runtime-for-opencl/blob/950f21dd079dfd55a473ba4122a4a9dca450e36f/include/acl_shipped_board_cfgs.h#L7
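The effect described in the commit message can be illustrated with a small, self-contained sketch. It is not the runtime's code: `SimDeviceDef`, `make_def`, and `can_allocate_on` are hypothetical names, and `kAclMaxGlobalMem` only mirrors the value ACL_MAX_GLOBAL_MEM == 32 quoted in this conversation. The real runtime's data structures and checks may look quite different.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of ACL_MAX_GLOBAL_MEM (32 according to this thread).
constexpr std::size_t kAclMaxGlobalMem = 32;

// Hypothetical, heavily reduced device definition used by the simulator
// before the aocx has been loaded.
struct SimDeviceDef {
  std::size_t num_global_mem_systems;
};

// Builds a definition that advertises `n` global memory systems.
SimDeviceDef make_def(std::size_t n) {
  assert(n >= 1 && n <= kAclMaxGlobalMem);
  return SimDeviceDef{n};
}

// Assumed validity check: an allocation that targets global memory
// `mem_id` is only accepted if that memory is known to the runtime.
bool can_allocate_on(const SimDeviceDef &def, std::size_t mem_id) {
  return mem_id < def.num_global_mem_systems;
}
```

With `make_def(1)` (the predefined autodiscovery case), `can_allocate_on(def, 1)` is false, which corresponds to the failing USM device allocation. With `make_def(kAclMaxGlobalMem)`, every index from 0 to 31 is accepted, including indices the real aocx may not provide, which is downside 3.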
This reverts commit d9df7a9.
692eb87 to 93696eb
Great idea, I have moved the commit message to a line comment with slight modifications.
Right, it's no longer needed.
I do not think so. As far as I know, all devices so far support up to 32 global memories, so there is no point in the simulation supporting more than 32 global memories.
The simulator does not have any global memory interface information until the actual aocx is loaded.
(Note that this is only a problem for the simulator, not for hardware runs; in a hardware run, we can communicate with the BSP to query memory interface information.)
Prior to loading the aocx, it uses the predefined autodiscovery [1] to initialize its global memory interface, which has only 1 global memory.
In the SYCL runtime flow today, the USM device allocation call happens before the aocx is loaded.
The aocx is loaded when clCreateProgram is called, which typically happens on the first kernel launch in the SYCL runtime.
A USM device allocation on a multi global memory system will fail because there is only 1 global memory in total, as defined in [1], but the user is requesting more than 1 device global memory.
The user could work around this issue by launching a sacrificial kernel that uses a shared allocation as a kernel argument. This sets up the correct global memory interface in the runtime.
This change eliminates the need to run a sacrificial kernel.
However, there are a few downsides:
1. The address range/size may not be exactly the same as the one in the aocx, but this is not too large a problem because the runtime's first-fit allocation algorithm fills the lowest address range first, unless the user requests more than what is available.
2. It potentially occupies more space than required.
3. It will not error out when the user requests a non-existent device global memory, because we are using ACL_MAX_GLOBAL_MEM for num_global_mem_systems.
[1] fpga-runtime-for-opencl/include/acl_shipped_board_cfgs.h, line 7 in 950f21d
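The first downside named in the commit message leans on first-fit allocation placing blocks at the lowest free address. The following is a minimal sketch of that idea, assuming a single contiguous address range. `FirstFitRegion` is a hypothetical class written for illustration and is not taken from the runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>

// Hypothetical first-fit allocator over the range [base, base + size).
class FirstFitRegion {
 public:
  FirstFitRegion(std::size_t base, std::size_t size)
      : base_(base), size_(size) {}

  // Returns the lowest address at which `bytes` fits, or std::nullopt.
  std::optional<std::size_t> allocate(std::size_t bytes) {
    if (bytes == 0 || bytes > size_) return std::nullopt;
    std::size_t candidate = base_;
    // blocks_ is ordered by start address, so gaps are scanned low to high.
    for (const auto &[start, len] : blocks_) {
      if (start - candidate >= bytes) break;  // gap before this block fits
      candidate = start + len;                // otherwise skip past the block
    }
    // Reject if the block would run past the end of the region.
    if (candidate - base_ > size_ - bytes) return std::nullopt;
    blocks_[candidate] = bytes;
    return candidate;
  }

  // Frees the block that starts at `addr`, if any.
  void release(std::size_t addr) { blocks_.erase(addr); }

 private:
  std::size_t base_;
  std::size_t size_;
  std::map<std::size_t, std::size_t> blocks_;  // start address -> length
};
```

If the simulator advertises a larger range than the aocx really has, allocations under this scheme still land at the low end of the range. They only reach addresses beyond the real memory once the total requested size exceeds what is actually available.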