
Metaflow on Nebius AI Cloud: Minimal viable stack

What does it do?

It provisions all necessary Nebius AI Cloud resources:

  • Managed Service for Kubernetes cluster
  • Object Storage bucket
  • Managed Service for PostgreSQL cluster

After that, it deploys Metaflow services on the Managed Kubernetes cluster.

Prerequisites

After installing the prerequisite tools, run source ./.envrc.bash if you use bash, or source ./.envrc.zsh if you prefer zsh.

Usage

The templates are organized into two modules, infra and services.
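At the root, the two are wired together roughly like this (a minimal sketch; the exact inputs each module takes live in the repository's variables.tf files and may differ):

    module "infra" {
      source     = "./infra"
      org_prefix = var.org_prefix   # used to generate unique resource names
    }

    module "services" {
      source     = "./services"
      depends_on = [module.infra]   # services deploys onto the cluster infra creates
    }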

  1. Create terraform.tfvars with Terraform variables:

    org_prefix = "yourorg"
    

    This is used to help generate unique names for the following resources:

    • Object Storage bucket
    • Managed PostgreSQL cluster

    You can also add tenant_id, project_id, and vpc_subnet_id to terraform.tfvars:

    tenant_id = "tenant-***"
    project_id = "project-***"
    vpc_subnet_id = "vpcsubnet-***"
    
    • To get the tenant and project IDs, open the project menu at the top of the web console and click Copy tenant ID next to the tenant name and Copy project ID next to the project name.
    • To get the subnet ID, in the web console, go to Network and click Copy subnet ID next to the subnet name.
  2. Apply the infra module that creates the Nebius AI Cloud resources:

    terraform init  
    terraform apply -target="module.infra" -var-file=terraform.tfvars
  3. Set up authentication and authorization for the service account created with infra:

    1. In the web console, go to Access → Service accounts.

    2. Find the service account whose name starts with stmetaflow and click on it.

    3. Under the service account name, click Add to group.

    4. Add the account to the editors group and click Close.

    5. Click Create access key.

    6. Copy the key ID and the secret key and add them to terraform.tfvars:

      aws_access_key_id = "" # Key ID
      aws_secret_access_key = "" # Secret key
  4. Apply the services module:

    terraform apply -target="module.services" -var-file=terraform.tfvars
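Once both applies have finished, a quick sanity check, assuming your kubeconfig already points at the new Managed Kubernetes cluster:

    terraform output        # any endpoints/IDs the modules export
    kubectl get pods -A     # the Metaflow service pods should reach Running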

Airflow

Note: This template only provides a quick start for testing purposes. We do not recommend it for production deployments.

By default, this Terraform template does not deploy Airflow. To deploy it, set the deploy_airflow variable to true in terraform.tfvars:

deploy_airflow = true

If deploy_airflow is set to true, the services module will deploy Airflow on the Managed Kubernetes cluster deployed by the infra module, using the official Airflow Helm chart.

The Terraform template deploys Airflow configured with a LocalExecutor for simplicity. Metaflow can work with any Airflow executor.
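In the official chart, the executor is a single Helm value. A hedged sketch of what that looks like through Terraform's helm provider (the resource name and repository wiring here are illustrative, not the template's actual code):

    resource "helm_release" "airflow" {
      name       = "airflow"
      repository = "https://airflow.apache.org"
      chart      = "airflow"

      set {
        name  = "executor"        # top-level chart value selecting the executor
        value = "LocalExecutor"   # e.g. KubernetesExecutor also works with Metaflow
      }
    }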

If you changed the value of deploy_airflow for an existing deployment, reapply both the infra and services modules as described in the instructions above.
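That is, rerun the same two commands from the Usage section:

    terraform apply -target="module.infra" -var-file=terraform.tfvars
    terraform apply -target="module.services" -var-file=terraform.tfvars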

Shipping Metaflow-compiled DAGs to Airflow

Airflow expects Python files containing Airflow DAGs to be present in the dags_folder. By default, this Terraform template uses the default path set in the Airflow Helm chart, which is {AIRFLOW_HOME}/dags (/opt/airflow/dags).

The metaflow-tools repository also ships an airflow_dag_upload.py script that can help sync Airflow DAG files generated by Metaflow to the Airflow scheduler deployed by this template. Under the hood, airflow_dag_upload.py uses the kubectl cp command to copy files from your local machine to the Airflow scheduler's container. Example usage:

python airflow_dag_upload.py my-dag.py /opt/airflow/dags/my-dag.py
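The DAG file itself is produced by Metaflow's Airflow compiler. With a flow defined in my_flow.py (a placeholder name), generating the my-dag.py used above looks like:

    python my_flow.py airflow create my-dag.py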

(Advanced) Terraform state management

By default, Terraform manages the state of the Nebius AI Cloud resources in local tfstate files.

If you plan to maintain the minimal stack for any significant period of time, it is highly recommended to store the state files in cloud storage instead, such as Object Storage in Nebius AI Cloud. This is especially useful in the following cases:

  • More than one person needs to manage the stack by using Terraform. Everyone should work off a single copy of the state file.
  • You want to mitigate the risk of data loss on your local disk.

For more details, see the Terraform documentation.
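Nebius Object Storage is S3-compatible (which is also why the access keys in step 3 use AWS-style variable names), so Terraform's s3 backend can hold the state. A hedged sketch; the bucket name and endpoint are assumptions to adapt to your setup, and Terraform versions before 1.6 use a flat endpoint argument instead of the endpoints block:

    terraform {
      backend "s3" {
        bucket = "yourorg-tfstate"            # hypothetical bucket name
        key    = "metaflow/terraform.tfstate"
        region = "eu-north1"                  # assumption: your Nebius region
        endpoints = {
          s3 = "https://storage.eu-north1.nebius.cloud"  # assumption: verify for your region
        }
        # Required when pointing the s3 backend at a non-AWS endpoint:
        skip_credentials_validation = true
        skip_region_validation      = true
        skip_requesting_account_id  = true
        skip_s3_checksum            = true
      }
    }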

Destroying

Destroy services first, since tearing it down requires the Kubernetes cluster that infra created:

terraform destroy -target="module.services" -var-file=terraform.tfvars

Then destroy infra:

terraform destroy -target="module.infra" -var-file=terraform.tfvars