Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Initialize max number of global memory definition for simulator #104

Merged
merged 2 commits into from
Apr 18, 2022

Conversation

sherry-yuan
Copy link
Contributor

Simulator does not have any global memory interface information until the actuall aocx is loaded.
(Note this is only a problem for simulator not hardware run, in hardware run, we can communicate with BSP to query memory interface information)
Prior to loading aocx it uses predefined autodiscovery [1] to initialize its global memory interface, which has only 1 global memory
In the sycl runtime flow today, the USM device allocation call happens before aocx is loaded.
The aocx is loaded when clCreateProgram is called, which typically happen on first kernel launch in sycl runtime.
The USM device allocation on mutli global memory system will fail because there are in total 1 global memory as defined in [1] but the user is requesting more than 1 device global memory.
User could go around this issue by launching a sacrificial kernel that uses shared allocation as kernel argument. This will setup the correct global memory interface in runtime.
This change eliminate the need to run a sacrificial kernel.
However there are a few downside:

  1. The address range/size may not be exactly the same as the one that is in aocx, but this is not too large of a problem because runtime first fit allocation algorithm will fill the lowest address range first. Unless user requested more than what is availble.
  2. it potentially occupied more space than required
  3. will not error out when user requested a non-existing device global memory because we are using ACL_MAX_GLOBAL_MEM for num_global_mem_systems

[1]

"23 16 sample40byterandomhash000000000000000000 a10gx 0 1 9 DDR 2 1 2 0 "

@sherry-yuan sherry-yuan added the bug Something isn't working label Apr 14, 2022
@pcolberg pcolberg added this to the 2022.3 milestone Apr 14, 2022
@pcolberg
Copy link
Contributor

Looks like git checkout is broken. Maybe GitHub changed their container setup?
https://github.com/intel/fpga-runtime-for-opencl/runs/6029451177?check_suite_focus=true

2022-04-14T19:08:35.1638886Z ##[group]Run sudo chown -R build:build .
2022-04-14T19:08:35.1639231Z �[36;1msudo chown -R build:build .�[0m
2022-04-14T19:08:35.1643269Z shell: sh -e {0}
2022-04-14T19:08:35.1643503Z ##[endgroup]
2022-04-14T19:08:35.3237562Z ##[group]Run actions/checkout@v2
2022-04-14T19:08:35.3237813Z with:
2022-04-14T19:08:35.3238075Z   repository: intel/fpga-runtime-for-opencl
2022-04-14T19:08:35.3238576Z   token: ***
2022-04-14T19:08:35.3238765Z   ssh-strict: true
2022-04-14T19:08:35.3238996Z   persist-credentials: true
2022-04-14T19:08:35.3239223Z   clean: true
2022-04-14T19:08:35.3239407Z   fetch-depth: 1
2022-04-14T19:08:35.3239606Z   lfs: false
2022-04-14T19:08:35.3239800Z   submodules: false
2022-04-14T19:08:35.3239993Z ##[endgroup]
2022-04-14T19:08:35.3334732Z ##[command]/usr/bin/docker exec  5b270e3f98b5ba860472430574ecd30e726cd60a8663a6981828793fee29b786 sh -c "cat /etc/*release | grep ^ID"
2022-04-14T19:08:35.8300296Z Syncing repository: intel/fpga-runtime-for-opencl
2022-04-14T19:08:35.8301911Z ##[group]Getting Git version info
2022-04-14T19:08:35.8302459Z Working directory is '/__w/fpga-runtime-for-opencl/fpga-runtime-for-opencl'
2022-04-14T19:08:35.8303000Z [command]/usr/bin/git version
2022-04-14T19:08:35.8303819Z git version 2.27.0
2022-04-14T19:08:35.8304529Z ##[endgroup]
2022-04-14T19:08:35.8357485Z ##[error]EACCES: permission denied, mkdir '/__w/_temp/dfdaa6c0-6817-4c98-9c22-84b0ea9c776a'

pcolberg added a commit to pcolberg/fpga-runtime-for-opencl that referenced this pull request Apr 15, 2022
… runners

This resolves a recent regression with GitHub-hosted runners where
actions/checkout@v2 fails to create a directory under /__w/_temp/

intel#104 (comment)

actions/checkout#47
pcolberg added a commit that referenced this pull request Apr 18, 2022
… runners

This resolves a recent regression with GitHub-hosted runners where
actions/checkout@v2 fails to create a directory under /__w/_temp/

#104 (comment)

actions/checkout#47
@sherry-yuan
Copy link
Contributor Author

No regression was found in external&internal tests. This is ready to merge.

Copy link
Contributor

@pcolberg pcolberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sherry-yuan, looks good!

  1. The commit message contains very useful background. What do you think about moving the commit message to a comment in the source code?

  2. @bsyrowik committed a similar change in d9df7a9, although it sets 128 memory systems instead of ACL_MAX_GLOBAL_MEM == 32. With your change that commit is no longer needed, correct? Could you add a git revert d9df7a9e to your pull request? Is there any downside to setting the default 32 instead of the non-default 128 memory systems?

Simulator does not have any global memory interface information until the actuall aocx is loaded.
(Note this is only a problem for simulator not hardware run, in hardware run, we can communicate with BSP to query memory interface information)

Prior to loading aocx it uses predefined autodiscovery [1] to initialize its global memory interface, which has only 1 global memory
In the sycl runtime flow today, the USM device allocation call happens before aocx is loaded.
The aocx is loaded when clCreateProgram is called, which typically happen on first kernel launch in sycl runtime.
The USM device allocation on mutli global memory system will fail because there are in total 1 global memory as defined in [1] but the user is requesting more than 1 device global memory.
User could go around this issue by launching a sacrificial kernel that uses shared allocation as kernel argument. This will setup the correct global memory interface in runtime.

This change eliminate the need to run a sacrificial kernel.
However there are a few downside:
1. The address range/size may not be exactly the same as the one that is in aocx, but this is not too large of a problem because runtime first fit allocation algorithm will fill the lowest address range first. Unless user requested more than what is availble.
2. it potentially occupied more space than required
3. will not error out when user requested a non-existing device global memory because we are using ACL_MAX_GLOBAL_MEM for num_global_mem_systems

[1] https://github.com/intel/fpga-runtime-for-opencl/blob/950f21dd079dfd55a473ba4122a4a9dca450e36f/include/acl_shipped_board_cfgs.h#L7
@sherry-yuan
Copy link
Contributor Author

  1. What do you think about moving the commit message to a comment in the source code?

Great idea, I have moved the commit message to line comment with slight modification.

With your change that commit is no longer needed, correct?

Right its no longer needed

Is there any downside to setting the default 32 instead of the non-default 128 memory systems?

I do not think so, as far as I know, all the device so far support up to 32 global memory, therefore there is no point for simulation to support more than 32 global memory.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants