
KRR not detecting memory peak usage and suggesting too low limit #379

Open
d47zm3 opened this issue Dec 19, 2024 · 3 comments

Comments


d47zm3 commented Dec 19, 2024

Describe the bug
When running KRR v1.18.0 with

krr simple -n keycloak-prod --use-oomkill-data --history_duration 48 --mem-min 50 --allow-hpa

it produces these recommendations:

┏━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃        ┃              ┃             ┃      ┃          ┃             ┃              ┃          ┃ CPU         ┃              ┃             ┃ Memory      ┃ Memory       ┃
┃ Number ┃ Namespace    ┃ Name        ┃ Pods ┃ Old Pods ┃ Type        ┃ Container    ┃ CPU Diff ┃ Requests    ┃ CPU Limits   ┃ Memory Diff ┃ Requests    ┃ Limits       ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│     1. │ keycloak-pr… │ keycloak-p… │ 2    │ 0        │ StatefulSet │ keycloak     │ -944m    │ (-472m)     │ 750m ->      │ +750Mi      │ (+375Mi)    │ 768Mi ->     │
│        │              │             │      │          │             │              │ (2 pods) │ 500m -> 28m │ unset        │ (2 pods)    │ 512Mi ->    │ 887Mi        │
│        │              │             │      │          │             │              │          │             │              │             │ 887Mi       │              │
│     2. │              │             │      │          │             │ cloud-sql-p… │ +20m     │ (+10m)      │ unset        │ +100Mi      │ (+50Mi)     │ unset ->     │
│        │              │             │      │          │             │              │ (2 pods) │ unset ->    │              │ (2 pods)    │ unset ->    │ 50Mi         │
│        │              │             │      │          │             │              │          │ 10m         │              │             │ 50Mi        │              │
└────────┴──────────────┴─────────────┴──────┴──────────┴─────────────┴──────────────┴──────────┴─────────────┴──────────────┴─────────────┴─────────────┴──────────────┘

However, in Grafana we see peaks from 2 days ago with 2GB of RAM usage, so with the suggested limit Keycloak would get OOM-killed. Your docs say:

For memory, we take the maximum value over the past week and add a 15% buffer.

It doesn't seem like it actually takes the maximum value; please check the attached screenshot from Grafana. How can I work around this?

[Grafana screenshot: memory usage of the keycloak pods, peaking around 2GB]
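
Doing the math on that formula myself (assuming it really is just the maximum observed usage plus a 15% buffer), the suggested 887Mi limit would only make sense if KRR saw a peak of roughly 771Mi, nowhere near the 2GB visible in Grafana:

```python
# Rough sanity check of the documented formula
# (assumed here to be: recommended limit = max observed usage * 1.15).
def krr_mem_limit(max_usage_mib: float, buffer: float = 0.15) -> float:
    return max_usage_mib * (1 + buffer)

print(krr_mem_limit(2048))  # ~2355 MiB: what I'd expect given the 2GB peak in Grafana
print(887 / 1.15)           # ~771 MiB: the peak KRR must have seen to suggest an 887Mi limit
```

So it looks like KRR never saw the peak at all, rather than applying a smaller buffer.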

aantn (Contributor) commented Dec 22, 2024

Are there any warnings in the KRR output?

d47zm3 (Author) commented Dec 22, 2024

Nope, nothing. Maybe it's related to the step used (1.25 minutes)? How can I customise that to sample e.g. every 30 seconds or so?

[20:33:59] INFO     Using clusters: ['REDACTED']                                                                                                                runner.py:280
           INFO     No Prometheus URL is specified, trying to auto-detect a metrics service                                                                    loader.py:58
           INFO     Trying to connect to Victoria Metrics for REDACTED cluster                                                               prometheus_metrics_service.py:68
[20:34:00] INFO     Victoria Metrics not found: Victoria Metrics instance could not be found while scanning in REDACTED cluster.                                 loader.py:67
           INFO     Trying to connect to Thanos for REDACTED cluster                                                                         prometheus_metrics_service.py:68
[20:34:01] INFO     Thanos not found: Thanos instance could not be found while scanning in REDACTED cluster.                                                     loader.py:67
           INFO     Trying to connect to Mimir for REDACTED cluster                                                                          prometheus_metrics_service.py:68
           INFO     Mimir not found: Mimir instance could not be found while scanning in REDACTED cluster.                                                       loader.py:67
           INFO     Trying to connect to Prometheus for REDACTED cluster                                                                     prometheus_metrics_service.py:68
           INFO     Using Prometheus at                                                                                                    prometheus_metrics_service.py:97
                    https://REDACTED/api/v1/namespaces/monitoring/services/kube-prometheus-stack-prometheus:9090/proxy for cluster
                    REDACTED
           INFO     Prometheus found                                                                                                                           loader.py:74
           INFO     Prometheus connected successfully for REDACTED cluster                                                                                       loader.py:47
           INFO     Listing scannable objects in REDACTED                                                                                                      __init__.py:96
[20:34:03] INFO     Calculated recommendations for StatefulSet keycloak-prod/keycloak-prod/cloud-sql-proxy (using 5 metrics)                                  runner.py:210
           INFO     Calculated recommendations for StatefulSet keycloak-prod/keycloak-prod/keycloak (using 5 metrics)                                         runner.py:210
           INFO     Result collected, displaying...                                                                                                           runner.py:343

Simple Strategy

CPU request: 95.0% percentile, limit: unset
Memory request: max + 15.0%, limit: max + 15.0%
History: 160.0 hours
Step: 1.25 minutes

d47zm3 (Author) commented Dec 24, 2024

I also tried

krr simple -n keycloak-prod --use-oomkill-data --history_duration 300 --mem-min 50 --timeframe_duration 0.5 --points_required 1500 --allow-hpa

and got the same results.
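
In case it helps with debugging, I can also cross-check what the Prometheus instance KRR auto-detected actually recorded, with something along these lines (the metric and label names are assumptions based on a standard kube-prometheus-stack / cAdvisor setup, and the URL assumes a local port-forward of the kube-prometheus-stack-prometheus service):

```python
import requests

# Ask Prometheus directly for the peak working-set memory of the keycloak
# container over the last 7 days, to compare against what KRR reports.
PROM_URL = "http://localhost:9090"  # assumed port-forward to kube-prometheus-stack-prometheus
QUERY = (
    "max_over_time(container_memory_working_set_bytes"
    '{namespace="keycloak-prod", container="keycloak"}[7d])'
)

resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": QUERY}, timeout=30)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    pod = series["metric"].get("pod", "<unknown pod>")
    peak_mib = float(series["value"][1]) / 2**20
    print(f"{pod}: 7d peak working set = {peak_mib:.0f} MiB")
```

If that query also comes back with ~770Mi instead of ~2GB, the missing peak would be on the Prometheus/metrics side rather than in KRR's calculation.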
