k0s error "an instance of k0s is already running" but it is not running #5399

Closed
4 tasks done
peterhoneder opened this issue Jan 3, 2025 · 1 comment · Fixed by #5435
Labels
bug Something isn't working

Comments

@peterhoneder

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Platform

actually OpenWRT, but overall reproducible on multiple platforms, just more rarely, e.g. here:
Linux 6.8.0-51-generic #52-Ubuntu SMP PREEMPT_DYNAMIC Thu Dec  5 13:32:09 UTC 2024 aarch64 GNU/Linux
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

Version

v1.31.1+k0s.0-448-g95fc4b1de

Sysinfo

`k0s sysinfo`
Total memory: 12.8 GiB (pass)
File system of /var/lib/k0s: ext4 (pass)
Disk space available for /var/lib/k0s: 55.8 GiB (pass)
Relative disk space available for /var/lib/k0s: 47% (pass)
Name resolution: localhost: [127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 6.8.0-51-generic (pass)
  Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
  AppArmor: active (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": unknown (warning: insufficient permissions, try with elevated permissions)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: built-in (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

When running k0s in one of our deployments, we noticed that during startup with an existing runtime config file, k0s thinks it is already running, even though it can't be, since the system has just booted from scratch. The root cause is that the detection in the Linux runtime only checks whether the PID from the runtime config belongs to a running process on the system, not whether that process is actually the same executable image.

So if any other process happens to get the PID stored in the runtime config during startup, k0s thinks it is already running although it isn't.

Steps to reproduce

  1. boot system (with very few processes, as few as possible)
  2. start k0s -> will start e.g. under pid X
  3. reboot
  4. some other system process starts and runs under pid X
  5. start k0s -> will fail because some other process runs under pid X -> k0s shows error "an instance of k0s is already running"

Expected behavior

k0s should just start without erroring that it is already running

Actual behavior

k0s shows error "an instance of k0s is already running"

Screenshots and logs

No response

Additional context

No response

@twz123
Member

twz123 commented Jan 10, 2025

> When running k0s in one of our deployments, we noticed that during startup with an existing runtime config file, k0s thinks it is already running, even though it can't be, since the system has just booted from scratch. The root cause is that the detection in the Linux runtime only checks whether the PID from the runtime config belongs to a running process on the system, not whether that process is actually the same executable image.

I see you're using OpenWRT. The runtime config file is stored in /run, which is usually a tmpfs and won't survive reboots. Maybe that's not the case with OpenWRT? K0s deletes the file when it exits, unless it is, for example, forcibly killed or the machine loses power while k0s is running, which can happen with devices like routers.

> So if any other process happens to get the PID stored in the runtime config during startup, k0s thinks it is already running although it isn't.

The good ol' problem with PID file races. This has already been addressed in other parts of k0s, but it's still not solved in the case you just described.

To mitigate this, you could try making /run a tmpfs, or have an init script that runs on startup, before k0s is started, and deletes the runtime config.

ncopa added a commit to ncopa/k0s that referenced this issue Jan 14, 2025
Use lock file and flock(2) to ensure there is only a single instance of
k0s running. This is more reliable than storing the pid in the runtime
config.

This also solves false positives with k0s runtime config leftovers.

Fixes: k0sproject#5399
Signed-off-by: Natanael Copa <ncopa@mirantis.com>