This module creates a Slurm controller node via the SchedMD/slurm-gcp controller module.

More information about Slurm on GCP can be found at the project's GitHub page and in the Slurm on Google Cloud User Guide. The user guide provides detailed instructions on customizing and enhancing the Slurm on GCP cluster, as well as recommendations on configuring the controller for optimal performance at different scales.
## Example

```yaml
- source: community/modules/scheduler/SchedMD-slurm-on-gcp-controller
  kind: terraform
  id: slurm_controller
  use:
  - network1
  - homefs
  - compute_partition
  settings:
    login_node_count: 1
```
This creates a controller node connected to the primary subnetwork with 1 login node (defined elsewhere). The controller will also have the `homefs` file system mounted via the `use` field and will manage one partition, also declared in the `use` field. For more context, see the hpc-cluster-small example.
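For additional context, below is a sketch of how the modules referenced in the `use` list above might be declared elsewhere in the same blueprint. The sources and settings shown for `network1`, `homefs`, and `compute_partition` are assumptions modeled on the hpc-cluster-small example, not requirements of this module:

```yaml
# Hypothetical companion modules; the ids must match the names listed under `use`.
- source: modules/network/vpc                # assumed network module
  kind: terraform
  id: network1

- source: modules/file-system/filestore      # assumed shared file system
  kind: terraform
  id: homefs
  use:
  - network1
  settings:
    local_mount: /home

- source: community/modules/compute/SchedMD-slurm-on-gcp-partition  # assumed partition module
  kind: terraform
  id: compute_partition
  use:
  - network1
  - homefs
  settings:
    partition_name: compute
```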
## Support

The HPC Toolkit team maintains the wrapper around the slurm-on-gcp Terraform modules. For support with the underlying modules, see the instructions in the slurm-gcp README.
## License

Copyright 2022 Google LLC
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
## Requirements

| Name | Version |
|------|---------|
| terraform | >= 0.14.0 |
| google | >= 3.83 |
## Providers

| Name | Version |
|------|---------|
| google | >= 3.83 |
## Modules

| Name | Source | Version |
|------|--------|---------|
| slurm_cluster_controller | github.com/SchedMD/slurm-gcp//tf/modules/controller/ | v4.1.8 |
## Resources

| Name | Type |
|------|------|
| google_compute_image.compute_image | data source |
## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| boot_disk_size | Size of boot disk to create for the cluster controller node | number | 50 | no |
| boot_disk_type | Type of boot disk to create for the cluster controller node. Choose from: pd-ssd, pd-standard, pd-balanced, pd-extreme. pd-ssd is recommended if the controller is hosting the SlurmDB and NFS share. If SlurmDB and NFS share are not running on the controller, pd-standard is recommended. See "Controller configuration recommendations" in the Slurm on Google Cloud User Guide for more information: https://goo.gle/slurm-gcp-user-guide | string | "pd-ssd" | no |
| cloudsql | Define an existing CloudSQL instance to use instead of instance-local MySQL | object({...}) | null | no |
| cluster_name | Name of the cluster | string | null | no |
| compute_node_scopes | Scopes to apply to compute nodes. | list(string) | [...] | no |
| compute_node_service_account | Service Account for compute nodes. | string | null | no |
| compute_startup_script | Custom startup script to run on the compute nodes | string | null | no |
| controller_instance_template | Instance template to use to create controller instance | string | null | no |
| controller_machine_type | Compute Platform machine type to use in controller node creation. c2-standard-4 is recommended for clusters up to 50 nodes; for larger clusters see "Controller configuration recommendations" in the Slurm on Google Cloud User Guide: https://goo.gle/slurm-gcp-user-guide | string | "c2-standard-4" | no |
| controller_scopes | Scopes to apply to the controller | list(string) | [...] | no |
| controller_secondary_disk | Create secondary disk mounted to controller node | bool | false | no |
| controller_secondary_disk_size | Size of disk for the secondary disk | number | 100 | no |
| controller_secondary_disk_type | Disk type (pd-ssd or pd-standard) for secondary disk | string | "pd-ssd" | no |
| controller_service_account | Service Account for the controller | string | null | no |
| controller_startup_script | Custom startup script to run on the controller | string | null | no |
| deployment_name | Name of the deployment | string | n/a | yes |
| disable_compute_public_ips | If set to true, create Cloud NAT gateway and enable IAP FW rules | bool | true | no |
| disable_controller_public_ips | If set to true, create Cloud NAT gateway and enable IAP FW rules | bool | false | no |
| instance_image | Slurm image to use for the controller instance | object({...}) | {...} | no |
| intel_select_solution | Configure the cluster to meet the performance requirement of the Intel Select Solution | string | null | no |
| jwt_key | Specific libjwt key to use | any | null | no |
| labels | Labels to add to controller instance. Key-value pairs. | any | {} | no |
| login_node_count | Number of login nodes in the cluster | number | 0 | no |
| munge_key | Specific munge key to use | any | null | no |
| network_storage | An array of network attached storage mounts to be configured on all instances. | list(object({...})) | [] | no |
| partition | An array of configurations for specifying multiple machine types residing in their own Slurm partitions. | list(object({...})) | n/a | yes |
| project_id | Compute Platform project that will host the Slurm cluster | string | n/a | yes |
| region | Compute Platform region where the Slurm cluster will be located | string | n/a | yes |
| shared_vpc_host_project | Host project of shared VPC | string | null | no |
| subnetwork_name | The name of the pre-defined VPC subnet you want the nodes to attach to based on Region. | string | null | no |
| suspend_time | Idle time (in sec) to wait before nodes go away | number | 300 | no |
| zone | Compute Platform zone where the servers will be located | string | n/a | yes |
## Outputs

| Name | Description |
|------|-------------|
| controller_name | Name of the controller node |
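Other modules in the same blueprint can reference this output with the `$(module_id.output_name)` expansion. The consumer shown below is purely hypothetical (the `example-monitoring` module and its `controller_hostname` setting do not exist in the Toolkit); it only illustrates the reference syntax:

```yaml
# Hypothetical consumer: passes the controller's name into another module's setting.
- source: ./modules/example-monitoring   # hypothetical module, for illustration only
  kind: terraform
  id: monitoring
  settings:
    controller_hostname: $(slurm_controller.controller_name)
```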