Support empty relative node specifier (+e) in rankfiles #807

Closed
acolinisi opened this issue Mar 5, 2021 · 1 comment

This is a feature request to add support for +e (empty relative node) to rankfiles. PR #720 (Rankfile per prun) does not cover the use case of multiple concurrent prun invocations, where you want to run several independent jobs in one DVM without sharing any nodes.

I think this use case needs support for the +e relative node specifier. With +n, separate jobs end up allocated onto different slots on the same nodes.

Since the jobs are independent, it doesn't make sense to require a set of rankfiles (one rankfile per job) that are aware of each other, i.e. a separate set of rankfiles for each particular set of jobs.

Desired:

cat arankfile 
rank 0=+e0 slot=0
rank 1=+e1 slot=0
# ^ one rankfile, re-used for each job, by means of relative node specs

prte --daemonize

prun -n 2  --map-by rankfile:FILE=arankfile:NOLOCAL bash hostname-sleep.sh 10 &
b07n13
b07n14

prun -n 2  --map-by rankfile:FILE=arankfile:NOLOCAL bash hostname-sleep.sh 10 &
b07n15
b07n16

pterm

Actual:
The above syntax is rejected (only +nX is supported).
With the following rankfile, both jobs are allocated onto the same two nodes (as expected):

rank 0=+n0 slot=0
rank 1=+n1 slot=0
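
To make the difference concrete, here is a small Python model of the two specifiers. It is only a sketch, not PRRTE code: `resolve` and `map_job` are hypothetical names, and it assumes that +eX means "the X-th node that hosts no processes at the moment the job is mapped", which is the reading the desired output above suggests.

```python
import re

# Illustrative model only; resolve() and map_job() are hypothetical helpers,
# not functions from PRRTE.
_SPEC = re.compile(r"\+([ne])([0-9]+)")


def resolve(spec, nodes, busy):
    """Resolve a relative node spec such as '+n1' or '+e0' to a node name.

    nodes: ordered list of all nodes in the DVM.
    busy:  set of node names that already host processes.
    """
    m = _SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"unsupported relative node spec: {spec!r}")
    kind, index = m.group(1), int(m.group(2))
    # +n indexes into every node; +e (assumed) indexes into empty nodes only.
    pool = nodes if kind == "n" else [n for n in nodes if n not in busy]
    if index >= len(pool):
        raise IndexError(f"{spec}: only {len(pool)} candidate node(s) available")
    return pool[index]


def map_job(rankfile, nodes, busy):
    """Map one job. All ranks are resolved against the node state at launch,
    and only afterwards are the chosen nodes marked busy."""
    placement = {rank: resolve(spec, nodes, busy) for rank, spec in rankfile.items()}
    busy.update(placement.values())
    return placement


nodes = ["b07n13", "b07n14", "b07n15", "b07n16"]

busy = set()
empty_job1 = map_job({0: "+e0", 1: "+e1"}, nodes, busy)
empty_job2 = map_job({0: "+e0", 1: "+e1"}, nodes, busy)

busy = set()
rel_job1 = map_job({0: "+n0", 1: "+n1"}, nodes, busy)
rel_job2 = map_job({0: "+n0", 1: "+n1"}, nodes, busy)

print(empty_job1, empty_job2)
print(rel_job1, rel_job2)
```

Under that assumption the same +e rankfile places the second job on b07n15 and b07n16, while the +n rankfile sends both jobs to b07n13 and b07n14.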

rhc54 commented Mar 5, 2021

@acolinisi It will probably be a while before I can get to this - you are pretty savvy, so perhaps you might want to take a crack at it? The rankfile code is in the src/mca/rmaps/rank_file directory. The file itself gets parsed using flex, and the lexical directives are in rmaps_rank_file_lex.l. Currently, it only picks up +n as a "relative node syntax" directive, so you'd need to add +e to that one.

You then need to extend the code in rmaps_rank_file.c starting at line 271 to account for the +e option. You can see how +n was handled, so it shouldn't be too difficult (I think).
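
As a rough picture of the lexer side of that change, the Python sketch below accepts both +n and +e in a rankfile line. It only models the idea: the real rules live in the flex file and are not reproduced here, and `parse_line` and its regular expression are hypothetical, covering just the simple `rank N=+xM slot=S` form used in this issue.

```python
import re

# Hypothetical model of a rankfile line; not the actual flex rules.
# The character class [ne] is the illustrative change: with only "n"
# in it, a line using +e would be rejected.
_LINE = re.compile(r"rank\s+([0-9]+)\s*=\s*\+([ne])([0-9]+)\s+slot=(\S+)")


def parse_line(line):
    """Return (rank, kind, index, slot) for a rankfile line.

    Blank lines and '#' comments yield None; anything else that does
    not match the simple form raises ValueError.
    """
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    m = _LINE.fullmatch(text)
    if m is None:
        raise ValueError(f"cannot parse rankfile line: {line!r}")
    rank, kind, index, slot = m.groups()
    return int(rank), kind, int(index), slot


for entry in ("rank 0=+e0 slot=0", "rank 1=+n1 slot=0", "# a comment"):
    print(parse_line(entry))
```

The mapper would then branch on the returned kind ("n" or "e") when choosing the node pool, analogous to how +n is handled today.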

@jjhursey jjhursey added this to the Future milestone Mar 25, 2021
@rhc54 rhc54 closed this as completed May 24, 2021