Description
Describe the bug
When using aws ec2-instance-connect open-tunnel as a pipe, after a connection is successfully established and the parent process closes the pipe, the aws process lingers and uses a surprising amount of CPU. Attaching a debugger shows the process stuck in a futex syscall.
Regression Issue
- Select this option if this issue appears to be a regression.
Expected Behavior
The aws process should terminate gracefully once the parent process closes the pipe and exits.
Current Behavior
The process lingers after the parent process exits. Note the high CPU usage (84.1% in the ps output below):
matt 93039 84.1 0.2 631504 87456 pts/6 SNl 13:10 0:03 aws --debug --region us-east-1 ec2-instance-connect open-tunnel --instance-id i-xxxx --private-ip-address 172.x.x.x --instance-connect-endpoint-id eice-xxxx --max-tunnel-duration 3600
In debug mode, the process floods stderr with lines like these while in this state:
[DEBUG] [2025-03-05T13:11:22Z] [00007e1d716006c0] [websocket] - id=0x7e1d64017240: Enqueuing outgoing frame with opcode=2(binary) length=0 fin=T
[DEBUG] [2025-03-05T13:11:22Z] [00007e1d716006c0] [websocket] - id=0x7e1d64017240: Enqueuing outgoing frame with opcode=2(binary) length=0 fin=T
[DEBUG] [2025-03-05T13:11:22Z] [00007e1d716006c0] [websocket] - id=0x7e1d64017240: Enqueuing outgoing frame with opcode=2(binary) length=0 fin=T
[DEBUG] [2025-03-05T13:11:22Z] [00007e1d716006c0] [websocket] - id=0x7e1d64017240: Enqueuing outgoing frame with opcode=2(binary) length=0 fin=T
[DEBUG] [2025-03-05T13:11:22Z] [00007e1d716006c0] [websocket] - id=0x7e1d64017240: Enqueuing outgoing frame with opcode=2(binary) length=0 fin=T
[DEBUG] [2025-03-05T13:11:22Z] [00007e1d716006c0] [websocket] - id=0x7e1d64017240: Enqueuing outgoing frame with opcode=2(binary) length=0 fin=T
[DEBUG] [2025-03-05T13:11:22Z] [00007e1d716006c0] [websocket] - id=0x7e1d64017240: Enqueuing outgoing frame with opcode=2(binary) length=0 fin=T
Attaching strace to the process shows:
strace: Process 93039 attached
futex(0x7ad7bf0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY
Also, the output of ss -np shows that the aws process is still holding established TCP connections open for the websockets.
Reproduction Steps
I haven't found a general reproduction case outside of my own use of this command. I was adding proxy_command support to HashiCorp's Terraform communicator (hashicorp/terraform#36643). Essentially, if the aws process doesn't receive a signal telling it to clean up, it keeps running with the websocket active. I got the desired behavior by ensuring the aws process receives a SIGHUP when the connection is cleaned up, and, separately, by adding more cleanup to the websocket code in aws. PR forthcoming.
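For anyone hitting this from another caller, the SIGHUP workaround can be sketched roughly as follows in Python. This is an illustration only, not the Terraform change itself; close_proxy and its timeout parameter are hypothetical names, and it assumes a POSIX system where SIGHUP is available.

```python
import signal
import subprocess


def close_proxy(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Close the pipes to a proxy command, then make sure it exits.

    Hypothetical helper: closing the pipes alone is what leaves the aws
    process lingering, so a SIGHUP is sent explicitly afterwards.
    """
    # Closing our ends of the pipes is what a caller would normally do.
    for stream in (proc.stdin, proc.stdout):
        if stream is not None:
            stream.close()

    # Workaround: signal the child if it is still running.
    if proc.poll() is None:
        proc.send_signal(signal.SIGHUP)

    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Last resort if the child ignores SIGHUP.
        proc.kill()
        return proc.wait()
```

The caller would start the open-tunnel command with subprocess.Popen using stdin=subprocess.PIPE and stdout=subprocess.PIPE, and call close_proxy(proc) when tearing down the connection. A negative return value means the child was terminated by that signal number.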
Possible Solution
Add more cleanup to the websocket code so that the tunnel shuts down when the pipe closes.
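As a rough illustration of the kind of cleanup meant here (this is not the aws CLI or websocket library code, and relay_until_eof, send and on_close are hypothetical names): the repeated length=0 frames in the debug output suggest, though I have not confirmed it, that end-of-input on the pipe is not being treated as a close. A relay loop would ideally stop at the first empty read and run its cleanup exactly once.

```python
from typing import BinaryIO, Callable


def relay_until_eof(
    src: BinaryIO,
    send: Callable[[bytes], None],
    on_close: Callable[[], None],
    chunk_size: int = 4096,
) -> int:
    """Forward data from src via send() until EOF, then clean up.

    Returns the number of bytes forwarded. on_close() stands in for
    whatever closes the websocket and its TCP connection.
    """
    total = 0
    try:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                # An empty read means the parent closed the pipe:
                # stop here instead of forwarding empty frames.
                break
            send(chunk)
            total += len(chunk)
    finally:
        # Runs on EOF and on errors alike.
        on_close()
    return total
```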
Additional Information/Context
No response
CLI version used
2.24.17
Environment details (OS name and version, etc.)
Ubuntu 22.04.2