mtr-exporter leaking defunct mtr and mtr-packet processes #24

Open
gberche-orange opened this issue Jan 9, 2025 · 9 comments

@gberche-orange
Contributor

Thanks again for sharing this work with the community. I'm observing that mtr-exporter leaves many defunct zombie processes behind. This reproduces with ghcr.io/mgumz/mtr-exporter:0.4.0 and the command /usr/bin/mtr-exporter -bind :8089 -- iperf-public-listener-r4-z1-1.mydomain.org 4 --tcp -P 443 --timeout 2 --gracetime 2 --no-dns

On one system, I'm observing 120k+ processes:

ps -ef | wc -l
#> 121041

ps -ef | grep mtr | grep defunct | wc -l
#> 120492

ps -ef | head -n 100
#> [...]
#> root      135250  120141  0 Jan06 ?        00:00:00 [mtr-packet] <defunct>
#> root      135258  120116  0 Jan06 ?        00:00:00 [mtr-packet] <defunct>
#> root      135260  120141  0 Jan06 ?        00:00:00 [mtr] <defunct>       
#> root      135261  120348  0 Jan06 ?        00:00:00 [mtr-packet] <defunct>
#> root      135262  120532  0 Jan06 ?        00:00:00 [mtr-packet] <defunct>                                                                                                                                         

ps -f -p 135250  
#> UID          PID    PPID  C STIME TTY          TIME CMD
#> root      135250  120141  0 Jan06 ?        00:00:00 [mtr-packet] <defunct>

ps -f -p 120141
#> UID          PID    PPID  C STIME TTY          TIME CMD
#> root      120141  118627  0 Jan06 ?        00:00:27 /usr/bin/mtr-exporter -bind :8089 -- iperf-public-listener-r4-z1-1.mydomain.org 4 --tcp -P 443 --timeout 2 --gracetime 2 --no-dns

https://en.m.wikipedia.org/wiki/Zombie_process defines zombies/defunct as

On Unix and Unix-like computer operating systems, a zombie process or defunct process is a process that has completed execution (via the exit system call) but still has an entry in the process table: it is a process in the "terminated state". This occurs for the child processes, where the entry is still needed to allow the parent process to read its child's exit status: once the exit status is read via the wait system call, the zombie's entry is removed from the process table and it is said to be "reaped".
[...]
Zombies can hold handles to file descriptors, which prevents the space for those files from being available to the filesystem.

This eventually leads to resource exhaustion on the host, with messages such as: info: "mtr-exporter-cli" failed: fork/exec /usr/sbin/mtr: resource temporarily unavailable.
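For illustration only, here is a minimal Go sketch (not mtr-exporter's code) of how such a zombie arises: the child exits, but until the parent calls Wait, its process-table entry remains.

package main

import (
	"os/exec"
	"time"
)

func main() {
	// Start a short-lived child but do not Wait() on it yet.
	cmd := exec.Command("true")
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	// The child exits almost immediately; until Wait() is called it
	// shows up as [true] <defunct> in ps output.
	time.Sleep(30 * time.Second)

	// Wait() reaps the zombie, removing its process-table entry.
	cmd.Wait()
}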

I suspect the root cause in my case is that the default schedule (60s) is too fast: mtr commands are sometimes created at a higher rate than they terminate, possibly hanging at times.

mtr-exporter/README.md

Lines 67 to 68 in 6e852a7

-schedule <schedule>
schedule at which often mtr is launched (default: "@every 60s")

The options given to mtr (--timeout 2 --gracetime 2 --no-dns) might trigger the mtr process to hang (a race condition between timeout and gracetime).

Usage:
 mtr [options] hostname

 -F, --filename FILE        read hostname(s) from a file
 -4                         use IPv4 only
 -6                         use IPv6 only
 -u, --udp                  use UDP instead of ICMP echo
 -T, --tcp                  use TCP instead of ICMP echo
 -I, --interface NAME       use named network interface
 -a, --address ADDRESS      bind the outgoing socket to ADDRESS
 -f, --first-ttl NUMBER     set what TTL to start
 -m, --max-ttl NUMBER       maximum number of hops
 -U, --max-unknown NUMBER   maximum unknown host
 -P, --port PORT            target port number for TCP, SCTP, or UDP
 -L, --localport LOCALPORT  source port number for UDP
 -s, --psize PACKETSIZE     set the packet size used for probing
 -B, --bitpattern NUMBER    set bit pattern to use in payload
 -i, --interval SECONDS     ICMP echo request interval
 -G, --gracetime SECONDS    number of seconds to wait for responses
 -Q, --tos NUMBER           type of service field in IP header
 -e, --mpls                 display information from ICMP extensions
 -Z, --timeout SECONDS      seconds to keep probe sockets open
 -M, --mark MARK            mark each sent packet
 -r, --report               output using report mode
 -w, --report-wide          output wide report
 -c, --report-cycles COUNT  set the number of pings sent
 -j, --json                 output json
 -x, --xml                  output xml
 -C, --csv                  output comma separated values
 -l, --raw                  output raw format
 -p, --split                split output
 -t, --curses               use curses terminal interface
     --displaymode MODE     select initial display mode
 -n, --no-dns               do not resolve host names
 -b, --show-ips             show IP numbers and host names
 -o, --order FIELDS         select output fields
 -y, --ipinfo NUMBER        select IP information in output
 -z, --aslookup             display AS number
 -h, --help                 display this help and exit
 -v, --version              output version information and exit

Would it make sense to include a fail-safe mechanism that prevents running new mtr processes beyond a given number of uncompleted commands?

@mgumz
Owner

mgumz commented Jan 9, 2025

@gberche-orange the failsafe mechanism is tini - the entrypoint of the container-image, see #17 and https://github.com/krallin/tini?tab=readme-ov-file#why-tini

So, by design, tini should have reaped the zombies. You stated that you used the container: did you specify a different --entrypoint? The process hierarchy should reveal whether tini is the parent of mtr-exporter.

You might want to play with https://github.com/krallin/tini?tab=readme-ov-file#using-tini; it is built into Docker already, maybe worth a try.
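For instance (a hypothetical invocation; example.org is a placeholder target):

docker run --init ghcr.io/mgumz/mtr-exporter:0.4.0 -bind :8089 -- example.org --tcp -P 443

The --init flag makes Docker inject its bundled tini as PID 1 in front of the container's command, so zombies get reaped even if the image entrypoint was overridden.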

@mgumz
Owner

mgumz commented Jan 9, 2025

I can see the potential need to use https://pkg.go.dev/os/exec#CommandContext to kill the "previous" mtr process automatically "before" the new one is about to be launched. Something like: if the schedule is "@every 60s", the previous mtr run should be forced to time out around 59s or so.

Won't help with the zombies, though, I assume.
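A minimal sketch of that idea (illustrative only; the 59s budget, the target host, and the flags are assumptions, not mtr-exporter's actual code):

package main

import (
	"context"
	"fmt"
	"log"
	"os/exec"
	"time"
)

func main() {
	// Budget the run to finish just before the next 60s tick.
	ctx, cancel := context.WithTimeout(context.Background(), 59*time.Second)
	defer cancel()

	// When ctx expires, CommandContext kills the child (SIGKILL by default).
	cmd := exec.CommandContext(ctx, "/usr/sbin/mtr", "--json", "--no-dns", "example.org")

	// WaitDelay (Go 1.20+) bounds how long Wait blocks on the child's I/O
	// pipes after the context is done, so a wedged mtr cannot stall the
	// exporter indefinitely.
	cmd.WaitDelay = 5 * time.Second

	out, err := cmd.Output()
	if err != nil {
		log.Fatalf("mtr run failed: %v", err)
	}
	fmt.Printf("%s", out)
}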

@gberche-orange
Contributor Author

Thanks a lot @mgumz for your prompt and detailed answer, and sorry I had missed #17.

did you specify a different --entrypoint?

Indeed, I had replaced the whole entrypoint instead of just adding the args to it (in k8s, using the args entry and leaving the command undefined; see https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1Container.md):

args: Arguments to the entrypoint. The container image's CMD is used if this is not provided.
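For other k8s users, a sketch of the corrected container fragment (hypothetical values; it assumes the image's ENTRYPOINT wraps mtr-exporter with tini, as in #17):

containers:
  - name: mtr-exporter
    image: ghcr.io/mgumz/mtr-exporter:0.4.0
    # command is deliberately left unset so the image ENTRYPOINT (tini)
    # stays PID 1 and keeps reaping defunct children
    args: ["-bind", ":8089", "--", "example.org", "--tcp", "-P", "443"]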

Thanks for the hint related to tini that I'm currently testing.

Reading tini's documentation, and in particular krallin/tini#8 (comment):

Zombie processes are processes that:
* Have exited.
* Were not waited on by their parent process (wait is the syscall parent processes use to retrieve the exit code of their children).
* Have lost their parent (i.e. their parent exited as well), which means they'll never be waited on by their parent.

When a zombie is created (i.e. which happens when its parent exits, and therefore all chances of it ever being waited by it are gone), it is reparented to init, which is expected to reap it (which means calling wait on it).

In other words, someone has to clean up after "irresponsible" parents that leave their children un-wait'ed, and that's PID 1's job.

That's what Tini does, and is something the JVM (which is what runs when you do exec java ...) does not do, which is why you don't want to run Jenkins as PID 1.

I'm not sure why this would be the case with mtr-exporter, as Cmd.Run() is called (which explicitly calls Cmd.Wait()), and from the traces above mtr-exporter was still alive and the parent of the defunct processes.

if err := cmd.Run(); err != nil {
	return err
}

https://pkg.go.dev/os/exec#Cmd.Run

Run starts the specified command and waits for it to complete.

https://cs.opensource.google/go/go/+/master:src/os/exec/exec.go;l=622-627

func (c *Cmd) Run() error {
	if err := c.Start(); err != nil {
		return err
	}
	return c.Wait()
}

https://pkg.go.dev/os/exec#Cmd.Wait

Wait releases any resources associated with the Cmd.

I'll report back though if the tini workaround isn't sufficient.

I can see the potential need to use https://pkg.go.dev/os/exec#CommandContext to kill the "previous" mtr process automatically "before" the new one is about to be launched. Something like: if the schedule is "@every 60s", the previous mtr run should be forced to time out around 59s or so.

Won't help with the zombies, though, I assume.

Rather than killing the previous mtr process, could mtr-exporter wait for it to complete? This approach expects that the mtr arguments (timeout-related) are properly configured to prevent an infinite hang.

This approach would preserve the parsing of mtr result data and would handle cases where the default 60s schedule is unexpectedly shorter than the mtr response time (e.g. during network outages).

@mgumz
Owner

mgumz commented Jan 10, 2025

@gberche-orange: via #21, @clavinjune is also working on a k8s deployment (which will end up as a helm chart). In that, he will also tackle the issue of using either command (docker "ENTRYPOINT") or args (docker "CMD"). So, tini really should remain as the entrypoint/command in order to reap all the spawned children, whether they exited successfully, timed out, etc.

Rather than killing the previous mtr process, could mtr-exporter wait for it to complete?

Yes, doable. The interesting piece then becomes the design of the <schedule> parameter, because @every 60s then becomes something like @every 60s or as-soon-as-the-previous-job-finally-gets-its-act-together.

@clavinjune
Contributor

Yes, in our environment (production) we're also utilizing tini in order to eliminate zombie processes, and it is working well.

@mgumz
Owner

mgumz commented Jan 20, 2025

@gberche-orange all settled?

@gberche-orange
Contributor Author

Thanks @mgumz for your support on this issue. The use of tini as the command indeed resolved the symptom in my environment.

Rather than killing the previous mtr process, could mtr-exporter wait for it to complete?

Yes, doable. The interesting piece then becomes the design of the <schedule> parameter, because @every 60s then becomes something like @every 60s or as-soon-as-the-previous-job-finally-gets-its-act-together.

What problem would documenting this schedule as at most @every 60s, or as-soon-as-the-previous-mtr-completes cause for users?

Can you detail the current behavior of the exporter in the face of parallel overlapping mtr executions N-1 and N, and its responses to the /metrics endpoint?

@mgumz
Owner

mgumz commented Jan 20, 2025

@gberche-orange

Parallel execution: an external process is started and runs until it finishes. Once that is done, the metrics for that run are fed into an internal data structure and presented as-is in the /metrics output.

So, in theory: run-256 is launched, and a network situation occurs which delays/slows down run-256. Meanwhile, run-257 is triggered and finishes, the collector gets "recent updates", then run-256 finishes and "overwrites" the data of run-257.

Also: if you scrape the Prometheus endpoint every 30 minutes and launch mtr every 60s, then you would see only the last of the 30 runs.

The feature request for @every 60s or whenever the previous job finishes would ideally cause no change for users, if the expression of the schedule is designed "just right". @every 60s really means @every 60s, and some people might want this behaviour. Since the schedule is expressed in between those two "--", something like @every 60s unless still running is doable.
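For what it's worth, a minimal sketch of the unless still running idea (illustrative only, not mtr-exporter's actual scheduler; it uses sync.Mutex.TryLock, available since Go 1.18):

package main

import (
	"log"
	"sync"
	"time"
)

func main() {
	var running sync.Mutex

	for range time.Tick(60 * time.Second) {
		// TryLock fails when the previous run still holds the lock,
		// so an overlapping tick is skipped instead of piling up a
		// second mtr process.
		if !running.TryLock() {
			log.Println("previous mtr run still in progress, skipping this tick")
			continue
		}
		go func() {
			defer running.Unlock()
			// launch one bounded mtr run here (see the CommandContext
			// sketch earlier in this thread)
		}()
	}
}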

gberche-orange added a commit to orange-cloudfoundry/mtr-exporter that referenced this issue Jan 22, 2025
@gberche-orange
Contributor Author

Thanks @mgumz for your feedback!

I submitted #25 to help new users avoid this issue until #21 or a similar helm chart automates this configuration for k8s users.

I believe the feature of a schedule @every 60s unless still running is beneficial to users:

  • it would natively implement backpressure, protecting against resource exhaustion in case the mtr command duration increases
  • it would surface mtr command response time problems to users (currently, AFAIK from your explanation, this is hard to discover)
  • it should likely be coupled with a max time to wait for the mtr command to complete (such as https://pkg.go.dev/os/exec#Cmd.WaitDelay), plus info logs to help debug such issues

This might deserve a distinct issue, though.
