I ran a batch of tests and one of the runs produced a cascade of tracebacks. These aren't immediately helpful, and several of them could be caught with some exception handling.
For some reason, the agent on this run was unable to resolve testflinger.canonical.com, and that led to all the tracebacks. Just for debugging purposes, it would be nice if these were caught and handled with friendlier messages.
This was the only run out of 30 that hit the issue, all using the same agent, so I don't know what the underlying problem was. This bug, as noted above, is just about hopefully making those traces a bit friendlier.
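For illustration, one way to do this would be to wrap the `requests.post` call that `tfclient.post` makes (visible in the traceback below) so that network failures become a single log line instead of a raw traceback. This is a hypothetical sketch, not the actual client code: the function name `post_with_friendly_errors` and its return contract are my own invention.

```python
import logging

import requests

logger = logging.getLogger("tfclient")


def post_with_friendly_errors(uri, data, timeout=30):
    """POST to the Testflinger server, converting low-level network
    failures into one friendly log line instead of a cascaded traceback.

    Hypothetical wrapper around the requests.post() call seen in
    tfclient.post(); the real client may structure this differently.
    Returns the response on success, or None if the request failed.
    """
    try:
        response = requests.post(uri, json=data, timeout=timeout)
        response.raise_for_status()
        return response
    except requests.exceptions.ConnectionError as exc:
        # Covers DNS failures (NameResolutionError) and refused connections
        logger.error("Unable to communicate with the server at %s: %s", uri, exc)
    except requests.exceptions.Timeout:
        logger.error("Timeout while trying to communicate with the server.")
    except requests.exceptions.HTTPError as exc:
        logger.error("Received status code %s from server.", exc.response.status_code)
    return None
```

Callers such as `cancel_job` could then check for `None` and log "Unable to cancel job …" once, rather than letting the `ConnectionError` propagate.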
bladernr@weavile:~$ testflinger submit --poll 6md.yaml
Job submitted successfully!
job_id: b933b67f-a71c-4917-bc5c-ee846659be62
This job is waiting on a node to become available.
Jobs ahead in queue: 14
Jobs ahead in queue: 13
Jobs ahead in queue: 12
Jobs ahead in queue: 11
Jobs ahead in queue: 10
Jobs ahead in queue: 9
Jobs ahead in queue: 8
Jobs ahead in queue: 7
ERROR: 2023-09-29 19:24:01 client.py:61 -- Timeout while trying to communicate with the server.
ERROR: 2023-09-29 19:25:16 client.py:61 -- Timeout while trying to communicate with the server.
Jobs ahead in queue: 6
Jobs ahead in queue: 5
Jobs ahead in queue: 4
Jobs ahead in queue: 3
Jobs ahead in queue: 2
ERROR: 2023-09-29 22:10:53 client.py:61 -- Timeout while trying to communicate with the server.
Jobs ahead in queue: 1
Jobs ahead in queue: 0
ERROR: 2023-09-29 22:46:28 client.py:61 -- Timeout while trying to communicate with the server.
***********************************************
* Starting testflinger setup phase on multi-3 *
***********************************************
Setup
***************************************************
* Starting testflinger provision phase on multi-3 *
***************************************************
2023-09-30 02:52:12,569 multi-3 INFO: DEVICE AGENT: BEGIN provision
2023-09-30 02:52:12,569 multi-3 INFO: DEVICE AGENT: Provisioning device
2023-09-30 02:52:12,569 multi-3 INFO: DEVICE AGENT: Creating test jobs
2023-09-30 02:52:16,845 multi-3 INFO: DEVICE AGENT: Created job d0f6945c-6903-4522-b26d-a872bcdd72b5
2023-09-30 02:52:21,316 multi-3 INFO: DEVICE AGENT: Created job 9114248e-1d0d-407a-84b0-7580349ba535
2023-09-30 02:52:26,187 multi-3 INFO: DEVICE AGENT: Created job 6a5f97b5-4952-47c1-8b90-21c77aab9fa0
2023-09-30 02:52:30,651 multi-3 INFO: DEVICE AGENT: Created job 32880554-e1c8-4a8f-87a8-e882c8602b8d
2023-09-30 02:52:35,490 multi-3 INFO: DEVICE AGENT: Created job 39cd04f3-2938-42c5-9db1-a026af67d083
2023-09-30 02:52:42,702 multi-3 INFO: DEVICE AGENT: Created job 7789a4a3-c9e3-4fe2-a685-f33b13943514
2023-09-30 02:54:16,828 multi-3 ERROR: DEVICE AGENT: Unable to communicate with specified server.
2023-09-30 02:54:16,828 multi-3 ERROR: DEVICE AGENT: Unable to get status for job 7789a4a3-c9e3-4fe2-a685-f33b13943514
2023-09-30 02:54:53,280 multi-3 ERROR: DEVICE AGENT: Job 39cd04f3-2938-42c5-9db1-a026af67d083 failed to allocate, cancelling remaining jobs
2023-09-30 02:55:19,313 multi-3 ERROR: DEVICE AGENT: Unable to communicate with specified server.
2023-09-30 02:55:19,313 multi-3 ERROR: DEVICE AGENT: Unable to cancel job 32880554-e1c8-4a8f-87a8-e882c8602b8d
2023-09-30 02:55:19,313 multi-3 ERROR: DEVICE AGENT: Unable to cancel job: 32880554-e1c8-4a8f-87a8-e882c8602b8d
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 203, in _new_conn
sock = connection.create_connection(
File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 60, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/usr/lib/python3.8/socket.py", line 918, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 790, in urlopen
response = self._make_request(
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 491, in _make_request
raise new_e
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 467, in _make_request
self._validate_conn(conn)
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 1092, in _validate_conn
conn.connect()
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 611, in connect
self.sock = sock = self._new_conn()
File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 210, in _new_conn
raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x7f8913fcc1c0>: Failed to resolve 'testflinger.canonical.com' ([Errno -2] Name or service not known)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 486, in send
resp = conn.urlopen(
File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 844, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/dist-packages/urllib3/util/retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='testflinger.canonical.com', port=443): Max retries exceeded with url: /v1/job/32880554-e1c8-4a8f-87a8-e882c8602b8d/action (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f8913fcc1c0>: Failed to resolve 'testflinger.canonical.com' ([Errno -2] Name or service not known)"))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/snappy_device_agents/devices/multi/multi.py", line 182, in cancel_jobs
self.client.cancel_job(job)
File "/usr/local/lib/python3.8/dist-packages/snappy_device_agents/devices/multi/tfclient.py", line 149, in cancel_job
self.post(f"/v1/job/{job_id}/action", {"action": "cancel"})
File "/usr/local/lib/python3.8/dist-packages/snappy_device_agents/devices/multi/tfclient.py", line 79, in post
req = requests.post(uri, json=data, timeout=timeout)
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='testflinger.canonical.com', port=443): Max retries exceeded with url: /v1/job/32880554-e1c8-4a8f-87a8-e882c8602b8d/action (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f8913fcc1c0>: Failed to resolve 'testflinger.canonical.com' ([Errno -2] Name or service not known)"))
2023-09-30 02:55:27,110 multi-3 ERROR: DEVICE AGENT: Received status code 400 from server.
Traceback (most recent call last):
File "/usr/local/bin/snappy-device-agent", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/snappy_device_agents/cmd.py", line 59, in main
raise SystemExit(args.func(args))
File "/usr/local/lib/python3.8/dist-packages/snappy_device_agents/devices/multi/__init__.py", line 55, in provision
self.device.provision()
File "/usr/local/lib/python3.8/dist-packages/snappy_device_agents/devices/multi/multi.py", line 72, in provision
raise ProvisioningError("Unable to allocate all devices")
snappy_device_agents.devices.ProvisioningError: Unable to allocate all devices
*************************************************
* Starting testflinger cleanup phase on multi-3 *
*************************************************
2023-09-30 02:55:32,868 multi-3 ERROR: DEVICE AGENT: Unable to find multi-job data file, job_list.json not found
complete
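The final traceback (the unhandled `ProvisioningError` escaping through `cmd.py`'s `main()`) could likewise be caught at the top level of each agent phase. A minimal sketch of that idea, assuming a stand-in `ProvisioningError` class and a hypothetical `run_phase` guard (neither is the real snappy_device_agents API):

```python
import logging

logger = logging.getLogger("snappy-device-agent")


class ProvisioningError(Exception):
    """Stand-in for snappy_device_agents.devices.ProvisioningError."""


def run_phase(func):
    """Hypothetical top-level guard for an agent phase: turn known
    failure modes into one friendly log line and a nonzero return
    code, rather than letting the raw traceback reach the job output.
    """
    try:
        return func()
    except ProvisioningError as exc:
        logger.error("Provisioning failed: %s", exc)
        return 1
    except ConnectionError as exc:
        logger.error("Network problem while talking to the server: %s", exc)
        return 1
```

With something like this in place, the run above would end with a single "Provisioning failed: Unable to allocate all devices" line instead of the traceback.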