Skip to content

task destroy, need of being able to specify output folder #307

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
Tracked by #326
DavidGOrtega opened this issue Nov 25, 2021 · 9 comments · Fixed by #340
Closed
Tracked by #326

task destroy, need of being able to specify output folder #307

DavidGOrtega opened this issue Nov 25, 2021 · 9 comments · Fixed by #340
Assignees
Labels
enhancement New feature or request p1-important High priority resource-task iterative_task TF resource

Comments

@DavidGOrtega
Copy link
Contributor

DavidGOrtega commented Nov 25, 2021

If I launch a cluster of tasks in the same workdir but different env vars or script I would like to be able to recover all the execution in different folders. The problem right now is that directory is used in the initial sync and in the final output.
We need to be able to specify directoryOutput aligned with directory by default to avoid this outputs to be overwritten

resource "iterative_task" "task1" {
  cloud     = "aws"
  machine   = "t2.micro"
  spot      = 0
  name      = "task1"
  directory = "."
  region    = "us-west"

  script = <<-END
    #!/bin/bash
    echo 'task1' >> report.md
  END
}

resource "iterative_task" "task2" {
  cloud     = "aws"
  machine   = "t2.micro"
  spot      = 0
  name      = "task2"
  directory = "."
  region    = "us-west"

  script = <<-END
    #!/bin/bash
    echo 'task2' >> report.md
  END
}

If I destroy the example above I have just only a mess in current folder we might want

resource "iterative_task" "task1" {
  cloud     = "aws"
  machine   = "t2.micro"
  spot      = 0
  name      = "task1"
  directory = "."
  output = "./task1"
  region    = "us-west"

  script = <<-END
    #!/bin/bash
    echo 'task1' >> report.md
  END
}

resource "iterative_task" "task2" {
  cloud     = "aws"
  machine   = "t2.micro"
  spot      = 0
  name      = "task2"
  directory = "."
  output = "./task2"
  region    = "us-west"

  script = <<-END
    #!/bin/bash
    echo 'task2' >> report.md
  END
}

note the output property

@DavidGOrtega DavidGOrtega changed the title task destroy, need of being able to autogenerate subfolders to sync data task destroy, need of being able to specify output folder Nov 25, 2021
@DavidGOrtega DavidGOrtega added p1-important High priority resource-task iterative_task TF resource labels Nov 25, 2021
@0x2b3bfa0
Copy link
Member

This probably belongs to an epic about parallel training. With the current behavior, users can solve that by writing the result of each machine to a different path:

resource "iterative_task" "task" {
  name        = "example"
  cloud       = "aws"
  parallelism = 4

  script = <<-END
    #!/bin/bash
    date >> result-$(uuidgen)
  END
}

@0x2b3bfa0
Copy link
Member

0x2b3bfa0 commented Nov 25, 2021

Machines have no means of knowing if they're resuming an interrupted tast or starting a new one. Without implementing some sort of leader election and task splitting mechanism, there won't be an 1:1 mapping between parallelism and the number of outputs.

@DavidGOrtega
Copy link
Contributor Author

DavidGOrtega commented Nov 25, 2021

Im not speaking about training in parallel, just launch the same task with different parameters within the same terraform file.
Nothing to be with parallelism

@DavidGOrtega
Copy link
Contributor Author

Im updating with a terraform example

@0x2b3bfa0
Copy link
Member

0x2b3bfa0 commented Nov 26, 2021

Thanks for the clarification! 🙏🏼 Still, it looks like my first reply is relevant: you can avoid overwriting artifacts by naming them differently on each task.

Nevertheless, there is a deeper problem with the current approach: running terraform destroy will overwrite all the files in the given directory and destroying several tasks simultaneously will produce the errors described in #306.

@0x2b3bfa0
Copy link
Member

0x2b3bfa0 commented Nov 26, 2021

We should probably consider having separate directories for input and output:

Schema

Attributes

resource "iterative_task" "task" {
  name        = "example"
  cloud       = "aws"

  input_directory  = "."
  output_directory = "./output"

  script = <<-END
    #!/bin/bash
    date >> result
  END
}

Block

resource "iterative_task" "task" {
  name        = "example"
  cloud       = "aws"

  script = <<-END
    #!/bin/bash
    date >> result
  END

  directories {
    input  = "."
    output = "./output"
  }
}

Behavior

Droste effect prevention

  • If the input directory contains the output directory, the latter should be excluded from the upload.

Input exclusion

  • The output directory should only contain the produced artifacts and, perhaps, the task/machine logs.
  • The output directory should not contain the files on the input directory.

Existence

  • The output directory will be automatically created if it doesn't exist.
  • The input directory should exist; otherwise, an error will be displayed.

Example

  • The output directory should have subdirectories with the Long task name, each of them containing the output for a given task.
  output
    tpi-first-a1b2-c3d4.log
    tpi-first-a1b2-c3d4
      result.model
    tpi-second-1a2b-3c4d.log
    tpi-second-1a2b-3c4d
      result.model
  main.tf
  train.py
./
├── output/
│   ├── tpi-first-a1b2-c3d4.log
│   ├── tpi-first-a1b2-c3d4/
│   │   └── result.model
│   ├── tpi-second-1a2b-3c4d.log
│   └── tpi-second-1a2b-3c4d/
│       └── result.model
├── main.tf
└── train.py

@DavidGOrtega
Copy link
Contributor Author

directories {
    input  = "."
    output = "./output"
  }

Love this idea.
As I said In other issues I can recover a bunch of tasks and change the directory do apply and then the output folder will be the one I gave. However to make it work I have to first remove the tfstate and then the apply and then destroy... not very user friendly

@0x2b3bfa0
Copy link
Member

0x2b3bfa0 commented Nov 29, 2021

Regarding user experience, we may want to set ForceNew: false for the directory attribute in the schema, so it can be updated locally.

"directory": {
Type: schema.TypeString,
ForceNew: true,
Optional: true,
Default: "",
},

@DavidGOrtega
Copy link
Contributor Author

I have picked this @0x2b3bfa0

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request p1-important High priority resource-task iterative_task TF resource
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants