Unstable Connections and constant resets #1813
Comments
The various context deadlines being exceeded suggest that the machine is out of resources (for instance, possibly swapping). Do you have information about CPU/RAM usage during this time? How much data was being transferred over how many concurrent connections, and how often were these being created?
Hi, on which component should I check? In my scenario I use one MinIO instance as the server, and each test scenario transfers a file to it via a Boundary worker. The file is 100 MB and the transfer takes 3-4 seconds, then the S3 client is shut down and restarted, so roughly every 4-5 seconds a new connection is established via Boundary. Sometimes I can run 4-5 of these sessions concurrently, sometimes I can reach 16 before the whole controller collapses.
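For illustration, a minimal sketch of the load pattern described above, assuming an s3cmd-style client (matching the logs later in this issue) and the standard boundary connect helper; the target ID, listen port, and bucket name are placeholders, not the reporter's actual values:

# hypothetical reproduction loop; adjust target ID, port, and bucket,
# and point the s3cmd endpoint at the local proxy (127.0.0.1:9000)
$ boundary connect -target-id ttcp_XXXXXXXXXX -listen-port 9000 &
$ while true; do
    s3cmd put /tmp/1 s3://test/run-$(date +%s)   # ~100 MB upload; each run opens a fresh connection through the Boundary proxy
  done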
Ideally check both the controller and the worker. Also, what is your target definition? Specifically, what are your connection count and expiration settings?
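For reference, those settings live on the target itself; reading it with the CLI (the target ID here is a placeholder) should show them:

$ boundary targets read -id ttcp_XXXXXXXXXX
# look for session_max_seconds (expiration) and session_connection_limit (connection count) in the output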
Do you have any way to find out what environment the controller is running in? I suspect GOMAXPROCS is being set in some way that limits the controller to a single CPU core. If the controller is limited to one core, its ability to process requests could be blocked by (relatively speaking) long calls to the database, whereas normally you'd expect the database connections and requests to be spread out across cores. Along the same lines, have you configured limits on the number of database connections allowed in the Boundary controller's config file? If you're running via something like systemd you could try running with […]. Also, do you have any idea how many sessions get established overall before you start seeing issues?
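A quick way to sanity-check those points in a Kubernetes setup is something like the following; the deployment name is a placeholder, the commands assume the image provides env and nproc, and the config path is taken from the systemd units shown later in this thread:

$ kubectl exec deploy/boundary-controller -- env | grep -i gomaxprocs   # is GOMAXPROCS being set?
$ kubectl exec deploy/boundary-controller -- nproc                      # how many cores the process can actually see
$ grep -A3 'database' /etc/boundary.hcl                                 # max_open_connections, if configured, lives in this block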
In my case I am running the standard containers in a Kubernetes cluster. I do not have any variables set other than the DB connection string. It looks like this:
apiVersion: apps/v1
[…]
The Postgres env mentioned there looks like:
{{/*
[…]
I have tried adding BOUNDARY_STATUS_GRACE_PERIOD and changing max_open_connections to 2, 5, and 10 without observing any improvement. The only thing I noticed is that the Postgres database server is very heavily used: already at 10 concurrent sessions it is using 8 CPU cores and sometimes hitting deadlocks in the function public.wh_update_session_connection().
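To see whether sessions are queuing up behind those deadlocks, a query against the standard pg_stat_activity view can help; the connection string is a placeholder:

$ psql "$BOUNDARY_PG_URL" -c "SELECT pid, wait_event_type, wait_event, state, left(query, 60) AS query FROM pg_stat_activity WHERE wait_event_type = 'Lock';"
# rows here are backends currently blocked on a lock; repeated wh_update_session_connection entries point at the same contention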
Oh, you're the same reporter as #1812. This all makes sense then. I think that's the first thing to tackle, because fixing the database deadlocks will likely have a big impact here too.
We are also seeing these issues. On a controller:
On a worker at the same moment:
The controller service did get killed eventually by the OOM reaper:

$ sudo journalctl -u boundary --no-pager | grep 9/KILL
Mar 14 15:49:40 ip-10-15-0-142 systemd[1]: boundary.service: Main process exited, code=killed, status=9/KILL
$ sudo dmesg | grep boundary
[229196.961106] [ 2891] 113 2891 350263 190888 1581056 0 0 boundary
[229196.961109] [ 2899] 113 2899 181883 814 126976 0 0 boundary-plugin
[229196.961111] [ 2906] 113 2906 178386 466 98304 0 0 boundary-plugin
[229196.961113] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/boundary.service,task=boundary,pid=2891,uid=113
[229196.961143] Out of memory: Killed process 2891 (boundary) total-vm:1401052kB, anon-rss:695128kB, file-rss:68424kB, shmem-rss:0kB, UID:113 pgtables:1544kB oom_score_adj:0
[229196.963277] oom_reaper: reaped process 2891 (boundary), now anon-rss:695224kB, file-rss:69052kB, shmem-rss:0kB

While it is routine for the boundary controllers to get killed by […]

Controllers

$ sudo systemctl status boundary
● boundary.service - Boundary
Loaded: loaded (/etc/systemd/system/boundary.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-03-14 15:49:41 UTC; 52min ago
Main PID: 9302 (boundary)
Tasks: 23 (limit: 1081)
Memory: 583.6M
CGroup: /system.slice/boundary.service
├─9302 /usr/local/bin/boundary server -config /etc/boundary.hcl
├─9310 /tmp/656155661/boundary-plugin-host-aws
└─9318 /tmp/3661262235/boundary-plugin-host-azure

$ sudo systemctl status boundary
● boundary.service - Boundary
Loaded: loaded (/etc/systemd/system/boundary.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2022-03-12 00:15:47 UTC; 2 days ago
Main PID: 3035 (boundary)
Tasks: 23 (limit: 1081)
Memory: 615.4M
CGroup: /system.slice/boundary.service
├─3035 /usr/local/bin/boundary server -config /etc/boundary.hcl
├─3042 /tmp/2645974992/boundary-plugin-host-aws
└─3049 /tmp/1177565111/boundary-plugin-host-azure

Workers

$ sudo systemctl status boundary
● boundary.service - Boundary
Loaded: loaded (/etc/systemd/system/boundary.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2022-03-12 00:19:37 UTC; 2 days ago
Main PID: 1049 (boundary)
Tasks: 8 (limit: 1081)
Memory: 56.0M
CGroup: /system.slice/boundary.service
└─1049 /usr/local/bin/boundary server -config /etc/boundary.hcl

$ sudo systemctl status boundary
● boundary.service - Boundary
Loaded: loaded (/etc/systemd/system/boundary.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2022-03-12 00:18:30 UTC; 2 days ago
Main PID: 1042 (boundary)
Tasks: 8 (limit: 1081)
Memory: 60.2M
CGroup: /system.slice/boundary.service
└─1042 /usr/local/bin/boundary server -config /etc/boundary.hcl

Also, our controllers constantly sit at 50-100% CPU, even under no load. The workers use essentially no CPU.
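To put numbers on that, sampling the controller process with pidstat (from the sysstat package; the interval and sample count here are arbitrary) would capture both the idle-load CPU burn and the memory growth leading up to the OOM kill:

$ sudo pidstat -p "$(pidof boundary)" -u -r 5 12   # CPU (-u) and memory (-r) every 5 seconds, 12 samples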
@giulio1979 @micchickenburger We have performance improvements that will be delivered soon in our 0.8.0 release; please let me know if you still see these issues after an upgrade.
I believe this has been addressed by changes in 0.8.x and 0.9.0. I'm going to close this for the moment; if you still have issues please reopen with details! |
I'm doing some long-running testing on Boundary and I noticed that after a few hours my connections become highly unstable:
For example:
WARNING: Upload failed: /20220105Jan1641368655?partNumber=5&uploadId=09d93ce5-6faf-46c6-82ed-3cf89b86262f ([Errno 104] Connection reset by peer)
WARNING: Waiting 3 sec...
upload: '/tmp/1' -> 's3://test/20220105Jan1641368655' (104857600 bytes in 14.6 seconds, 6.83 MB/s) [1 of 1]
WARNING: Retrying failed request: /?delimiter=%2F (Remote end closed connection without response)
WARNING: Waiting 3 sec...
delete: 's3://test/20220105Jan1641368655'
WARNING: Module python-magic is not available. Guessing MIME types based on file extensions.
ERROR: Cannot retrieve any response status before encountering an EPIPE or ECONNRESET exception
WARNING: Upload failed: /20220105Jan1641368675?partNumber=4&uploadId=1c0ac6bd-5fae-459a-8b9b-966287daa702 ([Errno 104] Connection reset by peer)
WARNING: Waiting 3 sec...
upload: '/tmp/1' -> 's3://test/20220105Jan1641368675' (104857600 bytes in 9.3 seconds, 10.73 MB/s) [1 of 1]
delete: 's3://test/20220105Jan1641368680'
WARNING: Module python-magic is not available. Guessing MIME types based on file extensions.
upload: '/tmp/1' -> 's3://test/20220105Jan1641368687' (104857600 bytes in 3.2 seconds, 30.81 MB/s) [1 of 1]
delete: 's3://test/20220105Jan1641368684'
WARNING: Module python-magic is not available. Guessing MIME types based on file extensions.
ERROR: Cannot retrieve any response status before encountering an EPIPE or ECONNRESET exception
WARNING: Upload failed: /20220105Jan1641368692?partNumber=2&uploadId=07ada446-cc1e-46cb-8941-800368f12797 ([Errno 32] Broken pipe)
WARNING: Waiting 3 sec...
upload: '/tmp/1' -> 's3://test/20220105Jan1641368692' (104857600 bytes in 9.3 seconds, 10.78 MB/s) [1 of 1]
WARNING: Module python-magic is not available. Guessing MIME types based on file extensions.
upload: '/tmp/1' -> 's3://test/20220105Jan1641368703' (104857600 bytes in 3.6 seconds, 27.58 MB/s) [1 of 1]
delete: 's3://test/20220105Jan1641368700'
delete: 's3://test/20220105Jan1641368703'
WARNING: Module python-magic is not available. Guessing MIME types based on file extensions.
ERROR: Cannot retrieve any response status before encountering an EPIPE or ECONNRESET exception
WARNING: Upload failed: /20220105Jan1641368708?partNumber=2&uploadId=f9b807d3-f41f-4941-ad89-94c85193c484 ([Errno 32] Broken pipe)
WARNING: Waiting 3 sec...
upload: '/tmp/1' -> 's3://test/20220105Jan1641368708' (104857600 bytes in 8.6 seconds, 11.57 MB/s) [1 of 1]
WARNING: Retrying failed request: /20220105Jan1641368708 (Remote end closed connection without response)
WARNING: Waiting 3 sec...
The Boundary controller log is filled with:
2022-01-05T07:46:02.752Z [ERROR] event.WriteError: unable to write error: session.(Repository).CloseDeadConnectionsForWorker: db.DoTx: unknown, unknown: error #0: sql: transaction has already been committed or rolled back
2022-01-05T07:46:10.455Z [ERROR] error event: op=session.(Repository).ConnectConnection request_info:id=gtraceid_jjg7wzTAXddMoW460QX5 request_info:method=/controller.servers.services.v1.SessionService/ConnectConnection error="session.(Repository).ConnectConnection: db.Create: create failed: unknown: error #0: timeout: context canceled" error_fields:Op=session.(Repository).ConnectConnection error_fields:Wrapped="map[Code:unknown Msg:create failed Op:db.Create Wrapped:timeout: context canceled]" error_fields:Code=unknown error_fields:Msg="" id=e_Vm5QSmXa67 version=v0.1
2022-01-05T07:46:10.455Z [ERROR] error event: error="session.(Repository).ConnectConnection: db.DoTx: unknown, unknown: error #0: sql: transaction has already been committed or rolled back" error_fields:Code=unknown error_fields:Msg="" error_fields:Op=session.(Repository).ConnectConnection error_fields:Wrapped="map[Code:unknown Msg: Op:db.DoTx Wrapped:sql: transaction has already been committed or rolled back]" id=e_Vm5QSmXa67 version=v0.1 op=session.(Repository).ConnectConnection request_info:id=gtraceid_jjg7wzTAXddMoW460QX5 request_info:method=/controller.servers.services.v1.SessionService/ConnectConnection
2022-01-05T07:46:10.455Z [ERROR] error event: error="db.DoTx: unknown, unknown: error #0: timeout: context canceled; sql: transaction has already been committed or rolled back" error_fields:Code=unknown error_fields:Msg="" error_fields:Op=db.DoTx error_fields:Wrapped="timeout: context canceled; sql: transaction has already been committed or rolled back" id=e_nQITvmnLHb version=v0.1 op=db.DoTx request_info:method=/controller.servers.services.v1.SessionService/ConnectConnection request_info:id=gtraceid_QiHMuDXvyxH2lMdBMzOR
2022-01-05T07:46:10.455Z [ERROR] error event: request_info:id=gtraceid_QiHMuDXvyxH2lMdBMzOR request_info:method=/controller.servers.services.v1.SessionService/ConnectConnection error="session.(Repository).ConnectConnection: db.DoTx: unknown, unknown: error #0: timeout: context canceled; sql: transaction has already been committed or rolled back" error_fields:Code=unknown error_fields:Msg="" error_fields:Op=session.(Repository).ConnectConnection error_fields:Wrapped="map[Code:unknown Msg: Op:db.DoTx Wrapped:timeout: context canceled; sql: transaction has already been committed or rolled back]" id=e_nQITvmnLHb version=v0.1 op=session.(Repository).ConnectConnection
2022-01-05T07:46:10.461Z [ERROR] error event: op=db.DoTx request_info:id=gtraceid_OhuoX7GEqN0sVjyC84Lj request_info:method=/controller.servers.services.v1.SessionService/ConnectConnection error="db.DoTx: unknown, unknown: error #0: timeout: context canceled; sql: transaction has already been committed or rolled back" error_fields:Code=unknown error_fields:Msg="" error_fields:Op=db.DoTx error_fields:Wrapped="timeout: context canceled; sql: transaction has already been committed or rolled back" id=e_ATlZjUOqoP version=v0.1
2022-01-05T07:46:10.461Z [ERROR] error event: error="session.(Repository).ConnectConnection: db.DoTx: unknown, unknown: error #0: timeout: context canceled; sql: transaction has already been committed or rolled back" error_fields:Wrapped="map[Code:unknown Msg: Op:db.DoTx Wrapped:timeout: context canceled; sql: transaction has already been committed or rolled back]" error_fields:Code=unknown error_fields:Msg="" error_fields:Op=session.(Repository).ConnectConnection id=e_ATlZjUOqoP version=v0.1 op=session.(Repository).ConnectConnection request_info:method=/controller.servers.services.v1.SessionService/ConnectConnection request_info:id=gtraceid_OhuoX7GEqN0sVjyC84Lj
2022-01-05T07:46:17.666Z [ERROR] error event: id=e_UL1Tjy2BDb version=v0.1 op=session.fetchConnectionStates request_info:id=gtraceid_6TJkH2PfRZUFcuavJ8aM request_info:method=/controller.servers.services.v1.SessionService/ConnectConnection error="session.fetchConnectionStates: db.SearchWhere: unknown, unknown: error #0: context canceled" error_fields:Code=unknown error_fields:Msg="" error_fields:Op=session.fetchConnectionStates error_fields:Wrapped="map[Code:unknown Msg: Op:db.SearchWhere Wrapped:context canceled]"
2022-01-05T07:46:17.666Z [ERROR] error event: request_info:id=gtraceid_sgltIGrlE95mcS3s7Bu5 request_info:method=/controller.servers.services.v1.SessionService/ConnectConnection error="session.(Repository).ConnectConnection: db.Update: unknown, unknown: error #0: timeout: context canceled" error_fields:Code=unknown error_fields:Msg="" error_fields:Op=session.(Repository).ConnectConnection error_fields:Wrapped="map[Code:unknown Msg: Op:db.Update Wrapped:timeout: context canceled]" id=e_cerA6Lzl4c version=v0.1 op=session.(Repository).ConnectConnection
2022-01-05T07:46:17.666Z [ERROR] error event: error="session.(Repository).ConnectConnection: session.fetchConnectionStates: db.SearchWhere: unknown, unknown: error #0: context canceled" error_fields:Code=unknown error_fields:Msg="" error_fields:Op=session.(Repository).ConnectConnection error_fields:Wrapped="map[Code:unknown Msg: Op:session.fetchConnectionStates Wrapped:map[Code:unknown Msg: Op:db.SearchWhere Wrapped:context canceled]]" id=e_UL1Tjy2BDb version=v0.1 op=session.(Repository).ConnectConnection request_info:method=/controller.servers.services.v1.SessionService/ConnectConnection request_info:id=gtraceid_6TJkH2PfRZUFcuavJ8aM
2022-01-05T07:46:17.666Z [ERROR] error event: error="session.(Repository).ConnectConnection: db.DoTx: unknown, unknown: error #0: sql: transaction has already been committed or rolled back" error_fields:Code=unknown error_fields:Msg="" error_fields:Op=session.(Repository).ConnectConnection error_fields:Wrapped="map[Code:unknown Msg: Op:db.DoTx Wrapped:sql: transaction has already been committed or rolled back]" id=e_UL1Tjy2BDb version=v0.1 op=session.(Repository).ConnectConnection request_info:id=gtraceid_6TJkH2PfRZUFcuavJ8aM request_info:method=/controller.servers.services.v1.SessionService/ConnectConnection
2022-01-05T07:46:17.666Z [ERROR] error event: error_fields:Code=unknown error_fields:Msg="" error_fields:Op=session.(Repository).ConnectConnection error_fields:Wrapped="map[Code:unknown Msg: Op:db.DoTx Wrapped:sql: transaction has already been committed or rolled back]" id=e_cerA6Lzl4c version=v0.1 op=session.(Repository).ConnectConnection request_info:id=gtraceid_sgltIGrlE95mcS3s7Bu5 request_info:method=/controller.servers.services.v1.SessionService/ConnectConnection error="session.(Repository).ConnectConnection: db.DoTx: unknown, unknown: error #0: sql: transaction has already been committed or rolled back"
2022-01-05T07:46:32.368Z [ERROR] encountered an error sending an error event:
error:=
| 5 errors occurred:
| * event.(Eventer).retrySend: event not written to enough sinks
| * event.(Eventer).retrySend: event not written to enough sinks
| * event.(Eventer).retrySend: event not written to enough sinks
| * event.(Eventer).retrySend: event not written to enough sinks
| * event.(Eventer).retrySend: reached max of 3: too many retries
|
2022-01-05T07:46:32.369Z [ERROR] event.WriteError: event.(Eventer).writeError: 5 errors occurred:
* event.(Eventer).retrySend: event not written to enough sinks
* event.(Eventer).retrySend: event not written to enough sinks
* event.(Eventer).retrySend: event not written to enough sinks
* event.(Eventer).retrySend: event not written to enough sinks
* event.(Eventer).retrySend: reached max of 3: too many retries
2022-01-05T07:46:32.369Z [ERROR] event.WriteError: unable to write error: session.(Repository).CloseDeadConnectionsForWorker: db.Exec: unknown, unknown: error #0: timeout: context deadline exceeded
2022-01-05T07:46:32.523Z [ERROR] encountered an error sending an error event:
error:=
| 5 errors occurred:
| * event.(Eventer).retrySend: event not written to enough sinks
| * event.(Eventer).retrySend: event not written to enough sinks
| * event.(Eventer).retrySend: event not written to enough sinks
| * event.(Eventer).retrySend: event not written to enough sinks
| * event.(Eventer).retrySend: reached max of 3: too many retries
|
2022-01-05T07:46:32.523Z [ERROR] event.WriteError: event.(Eventer).writeError: 5 errors occurred:
* event.(Eventer).retrySend: event not written to enough sinks
* event.(Eventer).retrySend: event not written to enough sinks
* event.(Eventer).retrySend: event not written to enough sinks
* event.(Eventer).retrySend: event not written to enough sinks
* event.(Eventer).retrySend: reached max of 3: too many retries
2022-01-05T07:46:32.524Z [ERROR] event.WriteError: unable to write error: session.(Repository).CloseDeadConnectionsForWorker: db.DoTx: unknown, unknown: error #0: sql: transaction has already been committed or rolled back
2022-01-05T07:46:32.543Z [ERROR] error event: error_fields:Code=unknown error_fields:Msg="" error_fields:Op=session.(Repository).ConnectConnection error_fields:Wrapped="map[Code:unknown Msg: Op:db.Update Wrapped:map[Code:unknown Msg: Op:db.lookupAfterWrite Wrapped:map[Code:unknown Msg: Op:db.LookupById Wrapped:timeout: context canceled]]]" id=e_iY5khR2Iub version=v0.1 op=session.(Repository).ConnectConnection request_info:id=gtraceid_8z1hlE6J3SbVc15yUiTN request_info:method=/controller.servers.services.v1.SessionService/ConnectConnection error="session.(Repository).ConnectConnection: db.Update: db.lookupAfterWrite: db.LookupById: unknown, unknown: error #0: timeout: context canceled"
2022-01-05T07:46:32.544Z [ERROR] error event: error="session.(Repository).ConnectConnection: db.DoTx: unknown, unknown: error #0: sql: transaction has already been committed or rolled back" error_fields:Msg="" error_fields:Op=session.(Repository).ConnectConnection error_fields:Wrapped="map[Code:unknown Msg: Op:db.DoTx Wrapped:sql: transaction has already been committed or rolled back]" error_fields:Code=unknown id=e_iY5khR2Iub version=v0.1 op=session.(Repository).ConnectConnection request_info:method=/controller.servers.services.v1.SessionService/ConnectConnection request_info:id=gtraceid_8z1hlE6J3SbVc15yUiTN
2022-01-05T07:46:39.874Z [ERROR] encountered an error sending an error event:
error:=
.....
On the worker side I have similar logs:
2022-01-05T07:57:50.391Z [ERROR] event.WriteError: unable to write error: status error grace period has expired, canceling all sessions on worker
2022-01-05T07:57:50.391Z [INFO] system event: version=v0.1 op=worker.(Worker).cleanupConnections data:session_id=s_cSDF88rUIR data:connection_id=sc_XT6JbMGssl data:msg="terminated connection due to cancellation or expiration"
2022-01-05T07:57:50.391Z [INFO] system event: version=v0.1 op=worker.(Worker).cleanupConnections data:session_id=s_Y8R1f0BbHC data:connection_id=sc_0qtJjeQAEs data:msg="terminated connection due to cancellation or expiration"
2022-01-05T07:57:50.391Z [ERROR] error event: error="error marking connection as connected: rpc error: code = Canceled desc = context canceled" id=e_swQCtfM1fZ version=v0.1 op=worker.(Worker).handleProxy info:session_id=s_Y8R1f0BbHC info:endpoint=tcp://minio:9000 info:msg="error handling proxy"
2022-01-05T07:57:50.391Z [INFO] system event: version=v0.1 op=worker.(Worker).cleanupConnections data:session_id=s_Y8R1f0BbHC data:connection_id=sc_tgtzYwGXWH data:msg="terminated connection due to cancellation or expiration"
2022-01-05T07:57:50.391Z [INFO] system event: version=v0.1 op=worker.(Worker).cleanupConnections data:session_id=s_Y8R1f0BbHC data:connection_id=sc_8hpuyvaBuv data:msg="terminated connection due to cancellation or expiration"
2022-01-05T07:57:50.391Z [INFO] system event: op=worker.(Worker).cleanupConnections data:session_id=s_apiHjEz0MZ data:connection_id=sc_vdX9KjJedu data:msg="terminated connection due to cancellation or expiration" version=v0.1
2022-01-05T07:57:50.391Z [INFO] system event: version=v0.1 op=worker.(Worker).cleanupConnections data:msg="terminated connection due to cancellation or expiration" data:session_id=s_apiHjEz0MZ data:connection_id=sc_ICKWDQcssi
2022-01-05T07:57:50.392Z [ERROR] error event: id=e_SAzzTGQS1t version=v0.1 op=worker.(Worker).handleProxy info:session_id=s_apiHjEz0MZ info:endpoint=tcp://minio:9000 info:msg="error handling proxy" error="error marking connection as connected: rpc error: code = Canceled desc = context canceled"
2022-01-05T07:57:50.541Z [ERROR] error event: error="could not find session ID "s_r30xHLEI78" in local state after closing connections" id=e_iMCCpP1Wnq version=v0.1 op=session.CloseConnections info:msg="error marking connection closed in state"
2022-01-05T07:57:50.605Z [ERROR] error event: version=v0.1 op=session.CloseConnections info:msg="error marking connection closed in state" error="could not find session ID "s_r30xHLEI78" in local state after closing connections" id=e_W9JKYjtQf8
2022-01-05T07:57:50.657Z [ERROR] error event: id=e_Ke2oamVehD version=v0.1 op=session.CloseConnections info:msg="error marking connection closed in state" error="could not find session ID "s_r30xHLEI78" in local state after closing connections"
2022-01-05T07:57:57.753Z [ERROR] error event: id=e_nzIWC8Nplj version=v0.1 op=worker.(Worker).sendWorkerStatus info:msg="error making status request to controller" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
2022-01-05T07:57:57.765Z [ERROR] error event: error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" id=e_nzIWC8Nplj version=v0.1 op=worker.(Worker).sendWorkerStatus info:msg="error making status request to controller"
2022-01-05T07:57:57.789Z [ERROR] error event: id=e_nzIWC8Nplj version=v0.1 op=worker.(Worker).sendWorkerStatus info:msg="error making status request to controller" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
2022-01-05T07:57:57.828Z [ERROR] error event: op=worker.(Worker).sendWorkerStatus info:msg="error making status request to controller" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" id=e_nzIWC8Nplj version=v0.1
2022-01-05T07:57:57.880Z [ERROR] encountered an error sending an error event:
error:=
| 5 errors occurred:
| * event.(Eventer).retrySend: event not written to enough sinks
| * event.(Eventer).retrySend: event not written to enough sinks
| * event.(Eventer).retrySend: event not written to enough sinks
| * event.(Eventer).retrySend: event not written to enough sinks
| * event.(Eventer).retrySend: reached max of 3: too many retries
|
2022-01-05T07:57:57.881Z [ERROR] event.WriteError: event.(Eventer).writeError: 5 errors occurred:
* event.(Eventer).retrySend: event not written to enough sinks
* event.(Eventer).retrySend: event not written to enough sinks
* event.(Eventer).retrySend: event not written to enough sinks
* event.(Eventer).retrySend: event not written to enough sinks
* event.(Eventer).retrySend: reached max of 3: too many retries
2022-01-05T07:57:57.881Z [ERROR] event.WriteError: unable to write error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2022-01-05T07:57:57.881Z [ERROR] error event: error="status error grace period has expired, canceling all sessions on worker" id=e_6tLK728ACa version=v0.1 op=worker.(Worker).sendWorkerStatus info:last_status_time="2022-01-05 07:42:14.841128169 +0000 UTC m=+0.035583208" info:grace_period=15s
2022-01-05T07:57:57.890Z [ERROR] error event: error="status error grace period has expired, canceling all sessions on worker" id=e_6tLK728ACa version=v0.1 op=worker.(Worker).sendWorkerStatus info:last_status_time="2022-01-05 07:42:14.841128169 +0000 UTC m=+0.035583208" info:grace_period=15s
2022-01-05T07:57:57.906Z [ERROR] error event: id=e_6tLK728ACa version=v0.1 op=worker.(Worker).sendWorkerStatus info:grace_period=15s info:last_status_time="2022-01-05 07:42:14.841128169 +0000 UTC m=+0.035583208" error="status error grace period has expired, canceling all sessions on worker"
2022-01-05T07:57:57.951Z [ERROR] error event: info:last_status_time="2022-01-05 07:42:14.841128169 +0000 UTC m=+0.035583208" info:grace_period=15s error="status error grace period has expired, canceling all sessions on worker" id=e_6tLK728ACa version=v0.1 op=worker.(Worker).sendWorkerStatus
2022-01-05T07:57:58.046Z [ERROR] encountered an error sending an error event:
error:=
| 5 errors occurred:
| * event.(Eventer).retrySend: event not written to enough sinks
| * event.(Eventer).retrySend: event not written to enough sinks
| * event.(Eventer).retrySend: event not written to enough sinks
| * event.(Eventer).retrySend: event not written to enough sinks
| * event.(Eventer).retrySend: reached max of 3: too many retries
|
2022-01-05T07:57:58.046Z [ERROR] event.WriteError: event.(Eventer).writeError: 5 errors occurred:
* event.(Eventer).retrySend: event not written to enough sinks
* event.(Eventer).retrySend: event not written to enough sinks
* event.(Eventer).retrySend: event not written to enough sinks
* event.(Eventer).retrySend: event not written to enough sinks
* event.(Eventer).retrySend: reached max of 3: too many retries
2022-01-05T07:57:58.046Z [ERROR] event.WriteError: unable to write error: status error grace period has expired, canceling all sessions on worker
2022-01-05T07:57:58.046Z [INFO] system event: version=v0.1 op=worker.(Worker).cleanupConnections data:session_id=s_cSDF88rUIR data:connection_id=sc_3Huj82F6Fy data:msg="terminated connection due to cancellation or expiration"
2022-01-05T07:57:58.047Z [ERROR] error event: id=e_pF0m45MQUa version=v0.1 op=worker.(Worker).handleProxy info:session_id=s_cSDF88rUIR info:endpoint=tcp://minio:9000 info:msg="error handling proxy" error="error marking connection as connected: rpc error: code = Canceled desc = context canceled"
2022-01-05T07:57:58.047Z [INFO] system event: version=v0.1 op=worker.(Worker).cleanupConnections data:msg="terminated connection due to cancellation or expiration" data:session_id=s_cSDF88rUIR data:connection_id=sc_C4hhrI4Ozv
2022-01-05T07:57:58.047Z [INFO] system event: data:session_id=s_Ma0N94Y7PK data:connection_id=sc_XAXwPPCUqb data:msg="terminated connection due to cancellation or expiration" version=v0.1 op=worker.(Worker).cleanupConnections
2022-01-05T07:57:58.047Z [ERROR] error event: error="error marking connection as connected: rpc error: code = Canceled desc = context canceled" id=e_moRNR6GVCK version=v0.1 op=worker.(Worker).handleProxy info:msg="error handling proxy" info:session_id=s_Ma0N94Y7PK info:endpoint=tcp://minio:9000
2022-01-05T07:57:58.047Z [ERROR] error event: version=v0.1 op=worker.(Worker).handleProxy info:session_id=s_Ma0N94Y7PK info:endpoint=tcp://minio:9000 info:msg="error handling proxy" error="error marking connection as connected: rpc error: code = Canceled desc = context canceled" id=e_qAHZNQG1Jc
2022-01-05T07:57:58.047Z [INFO] system event: op=worker.(Worker).cleanupConnections data:session_id=s_Ma0N94Y7PK data:connection_id=sc_9l0nPWfulY data:msg="terminated connection due to cancellation or expiration" version=v0.1
2022-01-05T07:57:58.047Z [INFO] system event: version=v0.1 op=worker.(Worker).cleanupConnections data:session_id=s_Ma0N94Y7PK data:connection_id=sc_aUude9EOyz data:msg="terminated connection due to cancellation or expiration"
2022-01-05T07:57:58.047Z [INFO] system event: version=v0.1 op=worker.(Worker).cleanupConnections data:session_id=s_r30xHLEI78 data:connection_id=sc_9VceKSkPBf data:msg="terminated connection due to cancellation or expiration"
2022-01-05T07:57:58.214Z [ERROR] error event: id=e_1cduNMyIff version=v0.1 op=session.CloseConnections info:msg="error marking connection closed in state" error="could not find session ID "s_Y8R1f0BbHC" in local state after closing connections"
2022-01-05T07:57:58.215Z [ERROR] error event: error="could not find session ID "s_Y8R1f0BbHC" in local state after closing connections" id=e_l8HRS0JgnX version=v0.1 op=session.CloseConnections info:msg="error marking connection closed in state"
2022-01-05T07:57:58.241Z [ERROR] error event: info:msg="error marking connection closed in state" error="could not find connection ID "sc_8uJi4FBqar" for session ID "s_apiHjEz0MZ" in local state after closing connections" id=e_8XTtbHlsR8 version=v0.1 op=session.CloseConnections
2022-01-05T07:58:05.350Z [ERROR] error event: error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" id=e_YL8putv1hy version=v0.1 op=worker.(Worker).sendWorkerStatus info:msg="error making status request to controller"
2022-01-05T07:58:05.357Z [ERROR] error event: info:msg="error making status request to controller" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" id=e_YL8putv1hy version=v0.1 op=worker.(Worker).sendWorkerStatus
2022-01-05T07:58:05.381Z [ERROR] error event: info:msg="error making status request to controller" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" id=e_YL8putv1hy version=v0.1 op=worker.(Worker).sendWorkerStatus
2022-01-05T07:58:05.421Z [ERROR] error event: id=e_YL8putv1hy version=v0.1 op=worker.(Worker).sendWorkerStatus info:msg="error making status request to controller" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
2022-01-05T07:58:05.521Z [ERROR] encountered an error sending an error event:
Steps to reproduce the behavior:
Expected behavior
Connections can be established and data transfer remains stable over time.
Additional context
Boundary version: 0.7.3
Postgres: 12
Boundary server: 4 CPU cores / 8 GB RAM
Boundary worker: 4 CPU cores / 8 GB RAM
Postgres: 8 CPU cores / 8 GB RAM, high-IOPS disks