Implement Memory Limits for User Jobs on Compute Nodes #319

Open
andriy-safe-ai opened this issue Oct 15, 2024 · 0 comments
We need to set memory limits so that user jobs cannot over-allocate RAM on our compute nodes. Recently, compute nodes have become unresponsive when a job consumed all available memory.

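To address this, we can set the RealMemory and MemSpecLimit options on the node definitions in /etc/slurm/slurm.conf. RealMemory declares how much memory (in MB) each node has, and MemSpecLimit reserves part of that memory (in MB) for system use so it is not available to user allocations. With both set, users cannot request more memory than a node can safely allocate, which should improve node stability and prevent system crashes. Note that MemSpecLimit only takes effect when memory is configured as a consumable resource in SelectTypeParameters.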
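
A rough sketch of what the change could look like, to make the proposal concrete. The node names and memory sizes below are placeholders, not our actual values, and the select and task plugin lines are assumptions about how the cluster would need to be set up for the limits to be enforced; they should be checked against our existing configuration before anything is applied.

```
# /etc/slurm/slurm.conf (sketch only; all names and values are hypothetical)

# Track memory as a consumable resource so job memory requests are counted
# against each node's RealMemory.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory

# Assumed: cgroup-based task containment, paired with ConstrainRAMSpace=yes
# in cgroup.conf, so jobs are held to the memory they requested.
TaskPlugin=task/cgroup

# RealMemory:   memory on the node, in MB.
# MemSpecLimit: memory in MB reserved for system use, unavailable to jobs.
NodeName=node[01-04] RealMemory=257000 MemSpecLimit=8192
```

With values like these, a job on one of the nodes could be allocated at most RealMemory minus MemSpecLimit, leaving the reserved portion for the OS and the Slurm daemons.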

@andriy-safe-ai andriy-safe-ai self-assigned this Oct 15, 2024