Update Levant deployment to inspect the evaluation results. #40

Merged
merged 1 commit into from
Nov 20, 2017
50 changes: 50 additions & 0 deletions levant/deploy.go
@@ -60,6 +60,15 @@ func (c *nomadClient) Deploy(job *nomad.Job, autoPromote int) (success bool) {
return
}

// Trigger the evaluationInspector to identify any potential errors in the
// Nomad evaluation run. As far as I can tell from testing, a single alloc
// failure in an evaluation means no allocs will be placed, so we exit here.
err = c.evaluationInspector(&eval.EvalID)
if err != nil {
logging.Error("levant/deploy: %v", err)
return
}

switch *job.Type {
case nomadStructs.JobTypeService:
logging.Debug("levant/deploy: beginning deployment watcher for job %s", *job.Name)
@@ -72,6 +81,47 @@ func (c *nomadClient) Deploy(job *nomad.Job, autoPromote int) (success bool) {
return
}

func (c *nomadClient) evaluationInspector(evalID *string) error {

evalInfo, _, err := c.nomad.Evaluations().Info(*evalID, nil)
if err != nil {
return err
}

for {
switch evalInfo.Status {
case nomadStructs.EvalStatusComplete, nomadStructs.EvalStatusFailed, nomadStructs.EvalStatusCancelled:
if len(evalInfo.FailedTGAllocs) == 0 {
logging.Info("levant/deploy: evaluation %s finished successfully", *evalID)
return nil
}

for group, metrics := range evalInfo.FailedTGAllocs {

// Build the failure lists per task group so that entries from one
// group do not carry over into the log line of the next.
var class, dimension []string

for cls := range metrics.ClassExhausted {
class = append(class, cls)
}
for dim := range metrics.DimensionExhausted {
dimension = append(dimension, dim)
}

logging.Error("levant/deploy: task group %s failed to place %v allocs, failed on %v and exhausted %v",
group, metrics.CoalescedFailures+1, class, dimension)
}

return fmt.Errorf("evaluation %v finished with status %s but failed to place allocations",
*evalID, evalInfo.Status)

default:
time.Sleep(1 * time.Second)
Review comment (Contributor):

With the eval being fetched outside the loop, it seems this is the cause of Levant now hanging forever?

Reply (Member Author):

Yeah, that is dumb. Will fix.

continue
}
}
}

func (c *nomadClient) deploymentWatcher(evalID string, autoPromote int) (success bool) {

var canaryChan chan interface{}