Skip to content

feat(vdev): do not terminate before child on SIGINT #22906

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

fbs
Copy link

@fbs fbs commented Apr 17, 2025

The SIGINT(control-c) is sent to both the parent and child, as they're in the same group. The vdev process quits immediately, as its just waiting on the child. However the child will go through its shutdown process, which by default takes a minute if it cannot cleanly shutdown. During this minute your controlling terminal gets flooded with shutdown messages.
The way to stop it is to kill vector again, e.g. pkill vector but its annoying and could kill other vector instances.

This lets the vdev ignore the SIGINT and just rely on the parent to terminate after the child did.

Its currently only implemented on unix using the libc crate. Vector uses the nix crate, but that seemed a bit overkill for 2 lines.

Summary

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

Tested in terminal.

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • The CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • cargo fmt --all
      • cargo clippy --workspace --all-targets -- -D warnings
      • cargo nextest run --workspace (alternatively, you can run cargo test --all)
      • ./scripts/check_changelog_fragments.sh
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please
    run cargo vdev build licenses to regenerate the license inventory and commit the changes (if any). More details here.

References

The SIGINT(control-c) is sent to both the parent and child, as they're
in the same group. The `vdev` process quits immediately, as its just
waiting on the child. However the child will go through its shutdown
process, which by default takes a minute if it cannot cleanly shutdown.
During this minute your controlling terminal gets flooded with shutdown
messages.

The way to stop it is to `kill` vector again, e.g. `pkill vector` but
its annoying and could kill other vector instances.

This lets the vdev ignore the SIGINT and just rely on the parent to
terminate after the child did.

Its currently only implemented on unix using the libc crate. Vector uses
the nix crate, but that seemed a bit overkill for 2 lines.
@fbs fbs requested a review from a team as a code owner April 17, 2025 23:36
@github-actions github-actions bot added the domain: vdev Anything related to the vdev tooling label Apr 17, 2025
@pront
Copy link
Member

pront commented Apr 18, 2025

Can you share an example of how you are using vdev and the messages that flood your terminal?

@fbs
Copy link
Author

fbs commented Apr 19, 2025

Let me try to explain in a bit more detail (hope i got the details of processgroups and terminals right)

Starting point is an unreachable sink destination:

sources:
  demo:
    type: demo_logs
    format: json

sinks:
  vector:
    type: vector
    inputs:
    - demo
    address: http://127.0.0.1:4545

If you run this and want to quickly exit you need to send 2 SIGINTS. The first one enters the 'clean shutdown', the 2nd forces it.

$ vector -c broken.yaml
2025-04-19T08:32:34.295774Z ERROR vector::topology::builder: msg="Healthcheck failed." error=Request failed: status: Unavailable, message: "error trying to connect: ...
^C
025-04-19T08:32:35.371397Z  INFO vector::signal: Signal received. signal="SIGINT"
2025-04-19T08:32:35.371693Z  INFO vector: Vector has stopped.
2025-04-19T08:32:35.372855Z  INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="demo, vector" time_remaining="59 seconds left"
2025-04-19T08:32:35.406564Z  WARN sink{component_kind="sink" component_id=vector component_type=vector}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] is being suppressed to avoid flooding.
^C
2025-04-19T08:32:36.483711Z  INFO vector::signal: Signal received. signal="SIGINT"
2025-04-19T08:32:36.483843Z  INFO vector: Vector has quit.

When running with vdev run broken.yaml vdev will span the child target/debug/vector -c broken.yaml and will block on it until it terminates.

if you look at the process hierarchy you'll see:

USER    PID  PPID  PGID   SESS JOBC STAT   TT       TIME COMMAND
bsmit 78347 74486 78347      0    1 S+   s000    0:00.01 vdev run broken.yaml
bsmit 78359 78347 78347      0    1 S+   s000    0:02.63 target/debug/vector --config broken.yaml

as they're in the same PGID/process group the SIGINT that the terminal sends on ^c is sent to both and both are asked to terminate. However vdev will terminate right away, as it has no SIGINT handler, while vector will enter its shutdown loop. With vdev gone vector will reparent, but is still connected to the terminal and running.

this can be seen with ps in the 3 stages:

# before the target is running, notice cargo is a child of vdev
 $ ps -j |grep -E 'PP|vector|vdev'
USER    PID  PPID  PGID   SESS JOBC STAT   TT       TIME COMMAND
bsmit 79012 74486 79012      0    1 S+   s000    0:00.01 vdev run broken.yaml
bsmit 79024 79012 79012      0    1 S+   s000    0:00.91 cargo run --no-default-features --features sinks-vector,sources-demo_logs -- --config broken.yaml
bsmit 79165 79025 79164      0    2 R+   s001    0:00.00 ggrep -E PP|vector|vdev

# here, cargo `exec`d into `vector`, 
github/vectordotdev/vector vdevsignal[$✗?] $ ps -j |grep -E 'PP|vector|vdev'
USER    PID  PPID  PGID   SESS JOBC STAT   TT       TIME COMMAND
bsmit 79012 74486 79012      0    1 S+   s000    0:00.01 vdev run broken.yaml
bsmit 79024 79012 79012      0    1 S+   s000    0:02.17 target/debug/vector --config broken.yaml
bsmit 79188 79025 79187      0    2 S+   s001    0:00.00 ggrep -E PP|vector|vdev

# SIGINT was sent, `vdev` is gone and `vector` has a new parent (PPID)
github/vectordotdev/vector vdevsignal[$✗?] $ ps -j |grep -E 'PP|vector|vdev'
USER    PID  PPID  PGID   SESS JOBC STAT   TT       TIME COMMAND
bsmit 79024     1 79012      0    0 S    s000    0:02.25 target/debug/vector --config broken.yaml
bsmit 79313 79025 79312      0    2 S+   s001    0:00.00 ggrep -E PP|vector|vdev

the effect in the terminal:

Screenshot 2025-04-19 at 10 48 45

An alternative is to exec without fork, but then you loose the option to run code post execution, it also seemed like a bigger structural change.

@fbs
Copy link
Author

fbs commented Apr 19, 2025

actually typing this out made me realized I miss one case:

If you kill vdev by sending a signal instead of through the terminal, the signal isn't propagated from vdev to the child, so vdev will exit but vector will stay running.

(os error 61)", details: [], metadata: MetadataMap { headers: {} } internal_log_rate_limit=true
zsh: terminated  vdev run broken.yaml

❌143 github/vectordotdev/vector vdevsignal[$✗?] $
2025-04-19T10:07:00.417440Z  WARN sink{component_kind="sink" component_id=vector component_type=vector}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=Request failed: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 61)", details: [], metadata: MetadataMap { headers: {} } internal_log_rate_limit=true

github/vectordotdev/vector vdevsignal[$✗?] $ echo test
test

github/vectordotdev/vector vdevsignal[$✗?] $ ps aux | grep vector
bsmit            81680   0.1  0.3 411589888  51168 s000  S    12:05PM   0:01.35 target/debug/vector --config broken.yaml

To fix that vdev needs a signal handler that propagates signals to the child. Not sure thats worth the effort as afaik vdev is meant for interactive use

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
domain: vdev Anything related to the vdev tooling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants