LSF provided affinity is not supported #791
Comments
You should just need to use HWLOC to convert the physical IDs to their logical equivalents. You might look at the old ORTE/OPAL code, as that is what we used to do.
This turns out to be trivial:
obj = hwloc_get_pu_obj_by_os_index(topo, physical_id);
logical_id = obj->logical_index;
Checked and that works all the way back to HWLOC 1.11, so it should be okay to use.
I think that'll work fine in a homogeneous configuration. If we detect a heterogeneous configuration, then we might have issues if we do the translation on the node with the HNP. In the short term, that's an OK restriction. In the longer term, we may want to handle this on the backend, but that would require re-introducing physical IDs more broadly, which I don't know if we want to do. I'll see if I can get to the short-term fix next week.
Fair point. I'd still do the translation on the HNP for simplicity, but you could do it in the plm/base where we receive the hetero topology from the remote node. You'd have to do it that way in the case (which I believe is common for LSF) where the HNP is on a login node and the compute node (due to cgroup or whatever) is different, even if the physical architecture is the same.
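To illustrate why the translation has to use the topology of the node the process will actually run on, here is a small self-contained sketch that does not use HWLOC at all. The `node_topo_t` table is a hypothetical stand-in for a per-node topology, and the names and the example CPU numbering are assumptions for illustration only.

```c
#include <stddef.h>

/* Hypothetical stand-in for one node's topology: os_index[i] is the
 * physical (OS) ID of the PU whose logical index is i. */
typedef struct {
    const unsigned *os_index;
    size_t num_pus;
} node_topo_t;

/* Logical index of a physical ID on this node, or -1 if it is absent. */
static int node_physical_to_logical(const node_topo_t *topo, unsigned physical_id)
{
    for (size_t i = 0; i < topo->num_pus; i++) {
        if (topo->os_index[i] == physical_id) {
            return (int) i;
        }
    }
    return -1;
}

/* Translate a list of physical IDs against one node's topology.
 * Returns 0 on success, -1 if any ID is not present on that node. */
static int translate_cpus(const node_topo_t *topo, const unsigned *physical,
                          size_t count, int *logical)
{
    for (size_t i = 0; i < count; i++) {
        logical[i] = node_physical_to_logical(topo, physical[i]);
        if (logical[i] < 0) {
            return -1;
        }
    }
    return 0;
}
```

With a login node that sees physical PUs 0 through 7 and a compute node whose cgroup exposes only physical PUs 4 through 7, the physical list {4, 5} translates to logical {4, 5} against the login-node table but to logical {0, 1} against the compute-node table, so translating with the wrong node's topology would bind to the wrong hardware threads.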
I'm working on this now, and think I have a fix in progress.
LSF allows the user to specify process affinity at bsub time. This results in a non-empty file pointed to by $LSB_AFFINITY_HOSTFILE. This file lists the hardware threads that the process should be bound to, using physical IDs. The use of hardware threads is already addressed by setting the PRTE_JOB_HWT_CPUS attribute. The physical hardware thread IDs are the problem, however, as PRTE no longer supports physical IDs.
In PR #597 we now throw an error when we detect this scenario. We need to work on a solution to restore this functionality.