mount-s3-1.13.0.service Failed with result 'oom-kill' #1206

Open
nikitadom opened this issue Dec 23, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@nikitadom

nikitadom commented Dec 23, 2024

Mountpoint for Amazon S3 version

mount-s3 1.13.0

AWS Region

Describe the running environment

Running on a managed Kubernetes SaaS service on OpenStack, using Helm chart v1.11.0: https://github.com/awslabs/mountpoint-s3-csi-driver/tree/main/charts/aws-mountpoint-s3-csi-driver.
OS: linux amd64
OS Image: Ubuntu 22.04.5 LTS
Kernel version: 5.15.0-124-generic
Container runtime: containerd://1.7.22
Kubelet version: v1.29.9
AWS credentials of IAM User provided via k8s secrets.

Mountpoint options

Mount Options
allow-delete, allow-other, region eu-west-2, prefix static/

What happened?

The pod cannot start because the volume mount setup failed:

MountVolume.SetUp failed for volume "s3-pv" : rpc error: code = Internal desc = Could not mount "k8s-s3-static-files" at "/var/lib/kubelet/pods/d72649b9-a67b-45fd-ab49-9ffe7764ca89/volumes/kubernetes.io~csi/s3-pv/mount": Mount failed: Failed to start systemd unit, context cancelled output:

Relevant log output

Dec 23 21:21:21 systemd[1]: Starting Mountpoint for Amazon S3 CSI driver FUSE daemon...
Dec 23 21:21:28 kernel: mount-s3 invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Dec 23 21:21:28 kernel: CPU: 2 PID: 2110387 Comm: mount-s3 Not tainted 5.15.0-124-generic #134-Ubuntu
Dec 23 21:21:28 kernel: Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Dec 23 21:21:28 kernel: Call Trace:
Dec 23 21:21:28 kernel:  <TASK>
Dec 23 21:21:28 kernel:  show_stack+0x52/0x5c
Dec 23 21:21:28 kernel:  dump_stack_lvl+0x4a/0x63
Dec 23 21:21:28 kernel:  dump_stack+0x10/0x16
Dec 23 21:21:28 kernel:  dump_header+0x53/0x228
Dec 23 21:21:28 kernel:  oom_kill_process.cold+0xb/0x10
Dec 23 21:21:28 kernel:  out_of_memory+0x106/0x2e0
Dec 23 21:21:28 kernel:  ? srso_alias_return_thunk+0x5/0x7f
Dec 23 21:21:28 kernel:  mem_cgroup_out_of_memory+0x13f/0x160
Dec 23 21:21:28 kernel:  try_charge_memcg+0x687/0x740
Dec 23 21:21:28 kernel:  ? srso_alias_return_thunk+0x5/0x7f
Dec 23 21:21:28 kernel:  ? kernel_init_free_pages.part.0+0x4a/0x70
Dec 23 21:21:28 kernel:  ? srso_alias_return_thunk+0x5/0x7f
Dec 23 21:21:28 kernel:  ? get_page_from_freelist+0x353/0x540
Dec 23 21:21:28 kernel:  charge_memcg+0x45/0xb0
Dec 23 21:21:28 kernel:  __mem_cgroup_charge+0x2d/0x90
Dec 23 21:21:28 kernel:  __add_to_page_cache_locked+0x2d8/0x350
Dec 23 21:21:28 kernel:  ? scan_shadow_nodes+0x40/0x40
Dec 23 21:21:28 kernel:  add_to_page_cache_lru+0x4d/0xd0
Dec 23 21:21:28 kernel:  pagecache_get_page+0x192/0x590
Dec 23 21:21:28 kernel:  ? srso_alias_return_thunk+0x5/0x7f
Dec 23 21:21:28 kernel:  ? page_cache_ra_unbounded+0x163/0x210
Dec 23 21:21:28 kernel:  filemap_fault+0x488/0xab0
Dec 23 21:21:28 kernel:  ? srso_alias_return_thunk+0x5/0x7f
Dec 23 21:21:28 kernel:  ? filemap_map_pages+0x309/0x400
Dec 23 21:21:28 kernel:  __do_fault+0x3c/0x120
Dec 23 21:21:28 kernel:  do_read_fault+0xeb/0x160
Dec 23 21:21:28 kernel:  do_fault+0xa0/0x2e0
Dec 23 21:21:28 kernel:  handle_pte_fault+0x1cd/0x240
Dec 23 21:21:28 kernel:  __handle_mm_fault+0x405/0x6f0
Dec 23 21:21:28 kernel:  handle_mm_fault+0xd8/0x2c0
Dec 23 21:21:28 kernel:  do_user_addr_fault+0x1c9/0x640
Dec 23 21:21:28 kernel:  exc_page_fault+0x77/0x170
Dec 23 21:21:28 kernel:  asm_exc_page_fault+0x27/0x30
Dec 23 21:21:28 kernel: RIP: 0033:0x55f2fc0ea4ed
Dec 23 21:21:28 kernel: Code: Unable to access opcode bytes at RIP 0x55f2fc0ea4c3.
Dec 23 21:21:28 kernel: RSP: 002b:00007f299e9d5790 EFLAGS: 00010246
Dec 23 21:21:28 kernel: RAX: 000055f2fc2cbf34 RBX: 00007f299e9d57f0 RCX: 00007f2998000030
Dec 23 21:21:28 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000594
Dec 23 21:21:28 kernel: RBP: 00007f299e9d5790 R08: 0000000000000000 R09: 00007f2998000d50
Dec 23 21:21:28 kernel: R10: 0000000000000077 R11: 00007f2998000090 R12: 0000000000000001
Dec 23 21:21:28 kernel: R13: 00007f299e9d5ad8 R14: 00000000000040fb R15: 00007f299e9d57e8
Dec 23 21:21:28 kernel:  </TASK>
Dec 23 21:21:28 kernel: memory: usage 393216kB, limit 393216kB, failcnt 65040923
Dec 23 21:21:28 kernel: swap: usage 0kB, limit 9007199254740988kB, failcnt 0
Dec 23 21:21:28 kernel: Memory cgroup stats for /system.slice:
Dec 23 21:21:28 kernel: anon 375468032
                                                   file 5414912
                                                   kernel_stack 1343488
                                                   pagetables 3022848
                                                   percpu 1196608
                                                   sock 0
                                                   shmem 1507328
                                                   file_mapped 3399680
                                                   file_dirty 0
                                                   file_writeback 0
                                                   swapcached 0
                                                   anon_thp 0
                                                   file_thp 0
                                                   shmem_thp 0
                                                   inactive_anon 356020224
                                                   active_anon 1437696
                                                   inactive_file 372736
                                                   active_file 139264
                                                   unevictable 22913024
                                                   slab_reclaimable 5072048
                                                   slab_unreclaimable 9662144
                                                   slab 14734192
                                                   workingset_refault_anon 0
                                                   workingset_refault_file 68014397
                                                   workingset_activate_anon 0
                                                   workingset_activate_file 1696685
                                                   workingset_restore_anon 0
                                                   workingset_restore_file 366767
                                                   workingset_nodereclaim 105839
                                                   pgfault 78387338
                                                   pgmajfault 2230983
                                                   pgrefill 389283562
                                                   pgscan 3739026161
                                                   pgsteal 76549324
                                                   pgactivate 388064783
                                                   pgdeactivate 388780914
                                                   pglazyfree 0
                                                   pglazyfreed 0
                                                   thp_fault_alloc 0
                                                   thp_collapse_alloc 0
Dec 23 21:21:28 kernel: Tasks state (memory values in pages):
Dec 23 21:21:28 kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Dec 23 21:21:28 kernel: [    864]     0   864     1555      221    49152        0             0 agetty
Dec 23 21:21:28 kernel: [    449]     0   449    19290     3956   176128        0          -250 systemd-journal
Dec 23 21:21:28 kernel: [    480]     0   480    72338     6905   114688        0         -1000 multipathd
Dec 23 21:21:28 kernel: [    487]     0   487     6405     1127    69632        0         -1000 systemd-udevd
Dec 23 21:21:28 kernel: [    673]   113   673     2026      857    53248        0             0 rpcbind
Dec 23 21:21:28 kernel: [    701]   101   701     6450     1925    86016        0             0 systemd-resolve
Dec 23 21:21:28 kernel: [    792]     0   792     1822      585    53248        0             0 cron
Dec 23 21:21:28 kernel: [    797]   102   797     2276      945    57344        0          -900 dbus-daemon
Dec 23 21:21:28 kernel: [    803]     0   803    20713      779    61440        0             0 irqbalance
Dec 23 21:21:28 kernel: [    806]     0   806     8273     3073   102400        0             0 networkd-dispat
Dec 23 21:21:28 kernel: [    807]   104   807    55601      651    77824        0             0 rsyslogd
Dec 23 21:21:28 kernel: [    838]     0   838     3859     1079    73728        0         -1000 sshd
Dec 23 21:21:28 kernel: [    824]     0   824     7855      804    69632        0             0 systemd-logind
Dec 23 21:21:28 kernel: [    872]     0   872     1544      217    45056        0             0 agetty
Dec 23 21:21:28 kernel: [    906]     0   906    27527     2975   114688        0             0 unattended-upgr
Dec 23 21:21:28 kernel: [   1105]   103  1105    22341     1131    77824        0             0 systemd-timesyn
Dec 23 21:21:28 kernel: [   1237]   100  1237     4225     1414    73728        0         -1000 systemd-network
Dec 23 21:21:28 kernel: [   1353]     0  1353    58865      371    90112        0             0 polkitd
Dec 23 21:21:28 kernel: [   1895]     0  1895      621       10    45056        0            10 sh
Dec 23 21:21:28 kernel: [   1921]     0  1921   306742      117    90112        0            10 go-runner
Dec 23 21:21:28 kernel: [   1926]     0  1926   315588     2281   180224        0            10 cinder-csi-plug
Dec 23 21:21:28 kernel: [   2627]     0  2627   311473     1303   143360        0            10 csi-node-driver
Dec 23 21:21:28 kernel: [ 158439]     0 158439    74068     1472   159744        0             0 packagekitd
Dec 23 21:21:28 kernel: [1352765]     0 1352765   441273     3188   319488        0          -900 snapd
Dec 23 21:21:28 kernel: [2110384]     0 2110384    22195      809   102400        0             0 mount-s3
Dec 23 21:21:28 kernel: [2110385]     0 2110385   325873    69389   827392        0             0 mount-s3
Dec 23 21:21:28 kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=mount-s3-1.13.0-257205cc-195f-42e2-85bb-f325cdefc6e0.service,mems_allowed=0,oom_memcg=/system.slice,task_mem>
Dec 23 21:21:28 kernel: Memory cgroup out of memory: Killed process 2110385 (mount-s3) total-vm:1303492kB, anon-rss:277556kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:808kB oom_score_adj:0
Dec 23 21:21:28 systemd[1]: mount-s3-1.13.0-257205cc-195f-42e2-85bb-f325cdefc6e0.service: A process of this unit has been killed by the OOM killer.
Dec 23 21:21:28 systemd[1]: mount-s3-1.13.0-257205cc-195f-42e2-85bb-f325cdefc6e0.service: Failed with result 'oom-kill'.
Dec 23 21:21:28 systemd[1]: Failed to start Mountpoint for Amazon S3 CSI driver FUSE daemon.
Dec 23 21:21:28 systemd[1]: mount-s3-1.13.0-257205cc-195f-42e2-85bb-f325cdefc6e0.service: Consumed 15.479s CPU time.
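The task table above reports memory in pages, while the kill line reports kB. As a quick sanity check, here is a minimal sketch that converts the `rss` column for the two mount-s3 tasks. It assumes 4 KiB pages (the usual size on amd64, but not stated in the log), and the helper name `pages_to_kb` is made up for illustration.

```python
# Assumption: 4 KiB pages, the usual page size on amd64 (not stated in the log).
PAGE_SIZE_KB = 4


def pages_to_kb(pages: int) -> int:
    """Convert a page count from the OOM task table into kB."""
    return pages * PAGE_SIZE_KB


# rss column (in pages) for the two mount-s3 tasks, copied from the table above.
rss_pages = {2110384: 809, 2110385: 69389}

for pid, pages in rss_pages.items():
    kb = pages_to_kb(pages)
    print(f"pid {pid}: {pages} pages = {kb} kB ({kb / 1024:.1f} MiB)")
```

Under that assumption, the 69389 pages of PID 2110385 come to 277556 kB, which matches the `anon-rss:277556kB` reported for the killed process.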
nikitadom added the bug label Dec 23, 2024
@unexge
Contributor

unexge commented Dec 27, 2024

Hey @nikitadom, Mountpoint currently tries to use 512 MiB of memory by default. Judging by the logs, it has less than that minimum available, which might be causing the OOM kill. Would you be able to raise Mountpoint's memory limit?

You might also be running into awslabs/mountpoint-s3-csi-driver#82: the CSI Driver currently spawns Mountpoint in the systemd context, so it consumes systemd resources rather than Kubernetes/container resources.
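To make the comparison with the logs concrete, here is a rough sketch that pulls the cgroup limit and the cgroup name out of two kernel lines from the report and compares the limit with the 512 MiB figure. The function names are hypothetical and the regexes only cover the line shapes seen in this issue.

```python
import re

# Two kernel lines copied from the log output in this issue.
LOG = """\
kernel: memory: usage 393216kB, limit 393216kB, failcnt 65040923
kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=mount-s3-1.13.0-257205cc-195f-42e2-85bb-f325cdefc6e0.service,mems_allowed=0,oom_memcg=/system.slice,task_mem>
"""

DEFAULT_TARGET_MIB = 512  # figure quoted in the comment above


def parse_memcg_limit_kb(log: str) -> int:
    """Return the memory cgroup limit in kB from the 'memory: usage' line."""
    match = re.search(r"memory: usage (\d+)kB, limit (\d+)kB", log)
    if match is None:
        raise ValueError("no memcg usage/limit line found")
    return int(match.group(2))


def parse_oom_memcg(log: str) -> str:
    """Return the cgroup whose limit triggered the OOM kill."""
    match = re.search(r"oom_memcg=([^,\s]+)", log)
    if match is None:
        raise ValueError("no oom_memcg field found")
    return match.group(1)


limit_mib = parse_memcg_limit_kb(LOG) / 1024
cgroup = parse_oom_memcg(LOG)
print(f"{cgroup}: limit {limit_mib:.0f} MiB, default target {DEFAULT_TARGET_MIB} MiB")
print("limit below default target:", limit_mib < DEFAULT_TARGET_MIB)
```

The limit works out to 384 MiB. Note that the log names `/system.slice` as `oom_memcg`, so that limit appears to cover the whole slice, not just the Mountpoint unit.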

@nikitadom
Author

> Hey @nikitadom, Mountpoint currently tries to use 512 MiB of memory by default. Judging by the logs, it has less than that minimum available, which might be causing the OOM kill. Would you be able to raise Mountpoint's memory limit?
>
> You might also be running into awslabs/mountpoint-s3-csi-driver#82: the CSI Driver currently spawns Mountpoint in the systemd context, so it consumes systemd resources rather than Kubernetes/container resources.

Where should I increase the memory?

@unexge
Contributor

unexge commented Dec 30, 2024

The CSI Driver spawns systemd units named in the mount-s3-<mp-version>-<uuid>.service format. I think you can use drop-in files for the Mountpoint units to tweak their configuration.

For example, you can create /etc/systemd/system/mount-s3-.service.d/50-memory.conf with the content:

$ cat /etc/systemd/system/mount-s3-.service.d/50-memory.conf
[Service]
MemoryHigh=2G

Then reload the systemd daemon to apply the changes:

$ systemctl daemon-reload

After that, existing or newly created systemd units for Mountpoint will have MemoryHigh=2G:

$ systemctl status mount-s3-1.13.0-a3bb5010-9341-49c6-9806-dfbb84aba93b.service | grep Memory
     Memory: 16.4M (high: 2.0G available: 1.9G)

See the systemd documentation and this Stack Overflow answer on configuring memory limits for systemd units.
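In case it is unclear why a directory named `mount-s3-.service.d` applies to units with a version and UUID in their names: as far as I understand the systemd.unit documentation, for unit names containing dashes systemd also looks for drop-in directories whose name is truncated after each dash. The sketch below models only that rule; `dropin_dirs` is a made-up helper, not a systemd API, and it ignores templates, aliases and search paths.

```python
from __future__ import annotations


def dropin_dirs(unit: str) -> list[str]:
    """List drop-in directory names that would match `unit`, most specific first.

    Simplified model of the dash-prefix rule described in systemd.unit(5).
    """
    name, _, suffix = unit.rpartition(".")
    dirs = [f"{unit}.d"]
    parts = name.split("-")
    # Truncate the name after each dash, longest prefix first.
    for i in range(len(parts) - 1, 0, -1):
        prefix = "-".join(parts[:i]) + "-"
        dirs.append(f"{prefix}.{suffix}.d")
    return dirs


# Unit name taken from the systemctl status example above.
unit = "mount-s3-1.13.0-a3bb5010-9341-49c6-9806-dfbb84aba93b.service"
for d in dropin_dirs(unit):
    print(d)
```

The printed list includes `mount-s3-.service.d`, which is why one drop-in file there can cover every Mountpoint unit the CSI Driver creates, whatever its version or UUID.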
