Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

panic for nil-pointer during exec DestroyTask #6760

Closed
tgross opened this issue Nov 22, 2019 · 2 comments · Fixed by #6763
Closed

panic for nil-pointer during exec DestroyTask #6760

tgross opened this issue Nov 22, 2019 · 2 comments · Fixed by #6763

Comments

@tgross
Copy link
Member

tgross commented Nov 22, 2019

During pre-RC testing of 0.10.2 I've found a bug in the exec driver's DestroyTask. It looks like a fairly tight race between handle.IsRunning() and handle.exec.Shutdown() at https://github.com/hashicorp/nomad/blob/v0.10.1/drivers/exec/driver.go#L462

$ /opt/gopath/bin/nomad --version
Nomad v0.10.2-dev (d2177b2f0d20051d371e9dba51475f313b815c93+CHANGES)
    2019-11-22T13:59:33.988Z [DEBUG] http: request complete: method=DELETE path=/v1/job/countdash?purge=false duration=4.303763ms
    2019-11-22T13:59:33.988Z [DEBUG] worker: dequeued evaluation: eval_id=d84193ff-ff3c-6c74-9c93-dd215c3170be
    2019-11-22T13:59:33.988Z [DEBUG] worker.service_sched: reconciled current state with desired state: eval_id=d84193ff-ff3c-6c74-9c93-dd215c3170be job_id=countdash namespace=default results="Total changes: (place 0) (destructive 0) (inplace 0) (stop 2)
Desired Changes for "dashboard": (place 0) (inplace 0) (destructive 0) (stop 1) (migrate 0) (ignore 0) (canary 0)
Desired Changes for "api": (place 0) (inplace 0) (destructive 0) (stop 1) (migrate 0) (ignore 0) (canary 0)"
    2019-11-22T13:59:33.989Z [DEBUG] http: request complete: method=GET path=/v1/evaluation/d84193ff-ff3c-6c74-9c93-dd215c3170be duration=103.832µs
    2019-11-22T13:59:33.990Z [DEBUG] http: request complete: method=GET path=/v1/evaluation/d84193ff-ff3c-6c74-9c93-dd215c3170be/allocations duration=113.013µs
    2019-11-22T13:59:33.992Z [DEBUG] worker: submitted plan for evaluation: eval_id=d84193ff-ff3c-6c74-9c93-dd215c3170be
    2019-11-22T13:59:33.992Z [DEBUG] worker.service_sched: setting eval status: eval_id=d84193ff-ff3c-6c74-9c93-dd215c3170be job_id=countdash namespace=default status=complete
    2019-11-22T13:59:33.993Z [DEBUG] worker: updated evaluation: eval="<Eval "d84193ff-ff3c-6c74-9c93-dd215c3170be" JobID: "countdash" Namespace: "default">"
    2019-11-22T13:59:33.993Z [DEBUG] worker: ack evaluation: eval_id=d84193ff-ff3c-6c74-9c93-dd215c3170be
    2019-11-22T13:59:33.993Z [DEBUG] client: updated allocations: index=32 total=2 pulled=2 filtered=0
    2019-11-22T13:59:33.993Z [DEBUG] client: allocation updates: added=0 removed=0 updated=2 ignored=0
    2019-11-22T13:59:33.996Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=2 ignored=0 errors=0
    2019-11-22T13:59:34.136Z [DEBUG] client: updated allocations: index=34 total=2 pulled=0 filtered=2
    2019-11-22T13:59:34.136Z [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=2
    2019-11-22T13:59:34.136Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=2 errors=0
    2019-11-22T13:59:34.158Z [ERROR] client.alloc_runner.task_runner.task_hook.stats_hook: failed to start stats collection for task with unrecoverable error: alloc_id=0316f581-f098-22ed-cf5b-0b93a025fd0d task=connect-proxy-count-dashboard error="container stopped"
    2019-11-22T13:59:34.160Z [ERROR] client.alloc_runner.task_runner.task_hook.stats_hook: failed to start stats collection for task with unrecoverable error: alloc_id=96c3b0da-1355-2c46-a37a-b69b8d45fdf1 task=connect-proxy-count-api error="container stopped"
    2019-11-22T13:59:34.168Z [WARN]  client.driver_mgr.docker: RemoveImage on non-referenced counted image id: driver=docker image_id=sha256:72e91d8680d853b874d9aedda3a4b61048630d2043dd490ff36f5b0929f69874
    2019-11-22T13:59:34.170Z [WARN]  client.driver_mgr.docker: RemoveImage on non-referenced counted image id: driver=docker image_id=sha256:72e91d8680d853b874d9aedda3a4b61048630d2043dd490ff36f5b0929f69874
    2019-11-22T13:59:34.215Z [ERROR] client.driver_mgr.exec: error receiving stream from Stats executor RPC, closing stream: alloc_id=96c3b0da-1355-2c46-a37a-b69b8d45fdf1 driver=exec task_name=web error="rpc error: code = Unavailable desc = transport is closing"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x135125f]
goroutine 489 [running]:
github.com/hashicorp/nomad/drivers/exec.(*Driver).DestroyTask(0xc0007a23c0, 0xc0004d3cc0, 0x31, 0x1, 0x1, 0x0)
	/opt/gopath/src/github.com/hashicorp/nomad/drivers/exec/driver.go:462 +0x17f
github.com/hashicorp/nomad/client/allocrunner/taskrunner.(*TaskRunner).clearDriverHandle(0xc000428a00)
	/opt/gopath/src/github.com/hashicorp/nomad/client/allocrunner/taskrunner/task_runner_getters.go:85 +0xd8
github.com/hashicorp/nomad/client/allocrunner/taskrunner.(*TaskRunner).Run(0xc000428a00)
	/opt/gopath/src/github.com/hashicorp/nomad/client/allocrunner/taskrunner/task_runner.go:533 +0x1035
created by github.com/hashicorp/nomad/client/allocrunner.(*allocRunner).runTasks
	/opt/gopath/src/github.com/hashicorp/nomad/client/allocrunner/alloc_runner.go:312 +0x98

Jobspec below, but I've run this particular job plenty of times during testing and never hit this previously:

job "countdash" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {}
      }
    }

    task "web" {
      driver = "exec"

      env {
        PORT = "9001"
      }

      config {
        command = "/usr/local/bin/counting-service"
      }
    }
  }

  group "dashboard" {
    network {
      mode = "bridge"

      port "http" {
        static = 9002
        to     = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "dashboard" {
      driver = "exec"

      env {
        COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
        PORT                 = "9002"
      }

      config {
        command = "/usr/local/bin/dashboard-service"
      }
    }
  }
}
@tgross
Copy link
Member Author

tgross commented Nov 22, 2019

at https://github.com/hashicorp/nomad/blob/v0.10.1/drivers/exec/driver.go#L462

Turns out this code path was changed after 0.10.1 and I was looking at the old tag and not the current branch.

@github-actions
Copy link

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 16, 2022
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants