Specify node feature for slurm job (#529)
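This PR adds the method `set_node_feature` to `SrunSettings`. It accepts
a string or a list of strings, so users can now specify node constraints
(features) for Slurm jobs. Other RunSettings types inherit a default
implementation that only logs a warning that the option is not implemented.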

[ reviewed by @al-rigazzi ]
[ committed by @amandarichardsonn ]
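For illustration, the sketch below mimics how the value the method stores under `run_args["C"]` could end up on the `srun` command line. It assumes the convention that single-letter run arguments render as `-k value` and longer ones as `--key=value`; `format_srun_args` is a hypothetical stand-in, not SmartSim's API. `-C` is the short form of `srun --constraint`; how Slurm combines a comma-separated feature list should be checked against the Slurm documentation for the installed version.

```python
import typing as t


def format_srun_args(run_args: t.Dict[str, t.Optional[str]]) -> t.List[str]:
    """Render a run_args mapping as srun tokens (hypothetical helper)."""
    tokens: t.List[str] = []
    for key, value in run_args.items():
        if len(key) == 1:
            # Short options, e.g. "C" -> ["-C", "P100,V100"]
            tokens.append(f"-{key}")
            if value is not None:
                tokens.append(str(value))
        else:
            # Long options, e.g. "bcast" -> ["--bcast=/tmp/exe"]
            tokens.append(f"--{key}" if value is None else f"--{key}={value}")
    return tokens


# run_args as set_node_feature(["P100", "V100"]) leaves them (comma-joined)
run_args = {"C": ",".join(["P100", "V100"])}
print(["srun", *format_srun_args(run_args), "python"])
# ['srun', '-C', 'P100,V100', 'python']
```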
amandarichardsonn authored Mar 22, 2024
1 parent 06d6166 commit 4b35cc9
Showing 6 changed files with 51 additions and 1 deletion.
1 change: 1 addition & 0 deletions doc/api/smartsim_api.rst
@@ -111,6 +111,7 @@ steps to a batch.
.. autosummary::

    SrunSettings.set_nodes
    SrunSettings.set_node_feature
    SrunSettings.set_tasks
    SrunSettings.set_tasks_per_node
    SrunSettings.set_walltime
7 changes: 6 additions & 1 deletion doc/changelog.rst
@@ -18,7 +18,8 @@ To be released at some future point in time

Description

- Colo Orchestrator setup now blocks application start until setup finished.
- Add method to specify node features for a Slurm job
- Colo Orchestrator setup now blocks application start until setup finished
- ExecArgs handling correction
- ReadTheDocs config file added and enabled on PRs
- Enforce changelog updates
@@ -31,6 +32,9 @@ Description

Detailed Notes

- Users can now specify node features for a Slurm job through
  ``SrunSettings.set_node_feature``. The method accepts a string
  or list of strings. (SmartSim-PR529_)
- The request to the colocated entrypoints file within the shell script
  is now a blocking process. Once the Orchestrator is setup, it returns
  which moves the process to the background and allows the application to
@@ -61,6 +65,7 @@ Detailed Notes
  Slurm and Open MPI. (SmartSim-PR520_)


.. _SmartSim-PR529: https://github.com/CrayLabs/SmartSim/pull/529
.. _SmartSim-PR522: https://github.com/CrayLabs/SmartSim/pull/522
.. _SmartSim-PR524: https://github.com/CrayLabs/SmartSim/pull/524
.. _SmartSim-PR520: https://github.com/CrayLabs/SmartSim/pull/520
13 changes: 13 additions & 0 deletions smartsim/settings/base.py
@@ -325,6 +325,19 @@ def set_time(self, hours: int = 0, minutes: int = 0, seconds: int = 0) -> None:
            self._fmt_walltime(int(hours), int(minutes), int(seconds))
        )

    def set_node_feature(self, feature_list: t.Union[str, t.List[str]]) -> None:
        """Specify the node feature for this job

        :param feature_list: node feature to launch on
        :type feature_list: str | list[str]
        """
        logger.warning(
            (
                "Feature specification not implemented for this "
                f"RunSettings type: {type(self)}"
            )
        )

    @staticmethod
    def _fmt_walltime(hours: int, minutes: int, seconds: int) -> str:
        """Convert hours, minutes, and seconds into valid walltime format
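The base-class hunk follows a warn-and-ignore pattern: launchers without a node-feature option log a warning instead of raising, and `SrunSettings` overrides the method. The self-contained sketch below reproduces that pattern with hypothetical class names (`RunSettingsSketch`, `SrunSettingsSketch`) and a standard-library logger in place of SmartSim's own; it leaves out the type validation of the real Slurm method.

```python
import logging
import typing as t

logger = logging.getLogger("settings_sketch")


class RunSettingsSketch:
    """Hypothetical stand-in for a launcher-agnostic settings base class."""

    def __init__(self) -> None:
        self.run_args: t.Dict[str, t.Optional[str]] = {}

    def set_node_feature(self, feature_list: t.Union[str, t.List[str]]) -> None:
        # Launchers without a feature/constraint flag warn and ignore the request
        logger.warning(
            "Feature specification not implemented for this "
            f"RunSettings type: {type(self)}"
        )


class SrunSettingsSketch(RunSettingsSketch):
    """Hypothetical Slurm subclass that stores features under "C" (-C)."""

    def set_node_feature(self, feature_list: t.Union[str, t.List[str]]) -> None:
        if isinstance(feature_list, str):
            feature_list = [feature_list.strip()]
        self.run_args["C"] = ",".join(feature_list)


logging.basicConfig(level=logging.WARNING)
generic = RunSettingsSketch()
generic.set_node_feature("P100")  # logs a warning; run_args stays empty
slurm = SrunSettingsSketch()
slurm.set_node_feature("P100")
print(generic.run_args, slurm.run_args)  # {} {'C': 'P100'}
```

One consequence of this choice is that calling the method on a non-Slurm settings object has no effect beyond the log message, so code that depends on the constraint being applied cannot rely on an exception to catch the mismatch.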
15 changes: 15 additions & 0 deletions smartsim/settings/slurmSettings.py
@@ -243,6 +243,21 @@ def set_broadcast(self, dest_path: t.Optional[str] = None) -> None:
        """
        self.run_args["bcast"] = dest_path

    def set_node_feature(self, feature_list: t.Union[str, t.List[str]]) -> None:
        """Specify the node feature for this job

        This sets ``-C``

        :param feature_list: node feature to launch on
        :type feature_list: str | list[str]
        :raises TypeError: if not str or list of str
        """
        if isinstance(feature_list, str):
            feature_list = [feature_list.strip()]
        elif not all(isinstance(feature, str) for feature in feature_list):
            raise TypeError("node_feature argument must be string or list of strings")
        self.run_args["C"] = ",".join(feature_list)

    @staticmethod
    def _fmt_walltime(hours: int, minutes: int, seconds: int) -> str:
        """Convert hours, minutes, and seconds into valid walltime format
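One detail of this validation is worth noting: for a non-iterable argument such as `5`, the `isinstance` check fails and the `all(...)` generator immediately tries to iterate over the integer, so the `TypeError` comes from Python itself (`'int' object is not iterable`) rather than from the method's own message. The test only checks the exception type, so it passes either way. The sketch below is a hypothetical stricter variant, `normalize_node_features`, that returns the joined string and raises the explicit message for every invalid input; unlike the committed code, it also rejects tuples and other non-list iterables of strings.

```python
import typing as t


def normalize_node_features(feature_list: t.Union[str, t.List[str]]) -> str:
    """Hypothetical stricter variant of the set_node_feature validation.

    Checks the container type explicitly instead of relying on iteration
    to fail for non-iterables, so every rejection carries the same message.
    """
    if isinstance(feature_list, str):
        features = [feature_list.strip()]
    elif isinstance(feature_list, list) and all(
        isinstance(feature, str) for feature in feature_list
    ):
        features = feature_list
    else:
        raise TypeError("node_feature argument must be string or list of strings")
    # Same comma-joined form the committed method stores under run_args["C"]
    return ",".join(features)


print(normalize_node_features(["P100", "V100"]))  # P100,V100
print(normalize_node_features(" P100 "))  # P100
```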
1 change: 1 addition & 0 deletions tests/test_run_settings.py
@@ -339,6 +339,7 @@ def test_set_format_args(set_str, val, key):
        pytest.param("set_task_map", (3,), id="set_task_map"),
        pytest.param("set_cpus_per_task", (4,), id="set_cpus_per_task"),
        pytest.param("set_hostlist", ("hostlist",), id="set_hostlist"),
        pytest.param("set_node_feature", ("P100",), id="set_node_feature"),
        pytest.param(
            "set_hostlist_from_file", ("~/hostfile",), id="set_hostlist_from_file"
        ),
15 changes: 15 additions & 0 deletions tests/test_slurm_settings.py
@@ -338,6 +338,21 @@ def test_set_hostlist():
        rs.set_hostlist([5])


def test_set_node_feature():
    rs = SrunSettings("python")
    rs.set_node_feature(["P100", "V100"])
    assert rs.run_args["C"] == "P100,V100"

    rs.set_node_feature("P100")
    assert rs.run_args["C"] == "P100"

    with pytest.raises(TypeError):
        rs.set_node_feature(5)

    with pytest.raises(TypeError):
        rs.set_node_feature(["P100", 5])


def test_set_hostlist_from_file():
    rs = SrunSettings("python")
    rs.set_hostlist_from_file("./path/to/hostfile")