Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

tests: add more tests #1002

Merged

Conversation

lorenzofelletti
Copy link
Contributor

In light of the recent issue with the API server availability rules, which was hard to catch by simply looking at the git diff, I think that adding more thorough testing is due to make repeating such a mistake harder in the future.

In this PR I'm adding more tests around the API server rules, and I'm also moving the tests to a dedicated folder, allowing to break the giant tests.yam file in multiple, more readable ones. The Makefile is updated accordingly, so to run all test files in the tests folder on make test.

@skl
Copy link
Collaborator

skl commented Dec 17, 2024

Thank you for doing this, @lorenzofelletti! I was thinking the same.

One thing to watch out for - there's a few dependabot PRs open which seem to depend on go.mod being bumped to 1.23, as well as the prometheus version being bumped from 0.53 to 0.300, for example:

Note that the prometheus version bump appears to change the behaviour of the tests (understandable given how out of date the deps were).

I can fix these on #994 but you'll need to make sure to merge/rebase latest master branch, just wanted to give you a heads up in case your test behaviour is affected.

@lorenzofelletti
Copy link
Contributor Author

Hi @skl, thanks for letting me know. I'll rebase/merge master to keep this in sync.

@skl
Copy link
Collaborator

skl commented Dec 19, 2024

@lorenzofelletti deps updated including prometheus, fixed tests to match latest behaviour in a2e2d9a. Please pull latest master and apologies for the conflicts!

Copy link

This PR has been automatically marked as stale because it has not
had any activity in the past 30 days.

The next time this stale check runs, the stale label will be
removed if there is new activity. The issue will be closed in 7
days if there is no new activity.

Thank you for your contributions!

@github-actions github-actions bot added the stale label Jan 19, 2025
@github-actions github-actions bot closed this Jan 27, 2025
@lorenzofelletti
Copy link
Contributor Author

Hey @skl, I just didn't have bandwidth to progress on this for the last month or so!
Now I'm willing to continue with this one, could you please reopen this PR or should I create a new one?

@skl skl reopened this Jan 27, 2025
@skl
Copy link
Collaborator

skl commented Jan 27, 2025

Reopened! Make sure to merge or rebase latest main, thanks!

@github-actions github-actions bot removed the stale label Jan 28, 2025
@lorenzofelletti lorenzofelletti marked this pull request as ready for review February 7, 2025 16:52
@lorenzofelletti
Copy link
Contributor Author

@skl I have a question:

Why is it that if I teak the named tests to fail, I get the failed test name reported in the pre-existing tests, but not in the ones I added?

E.g.:

Name shown:

tests:
# ...
- name: KubeletTooManyPods alert (single-node with instance as IP address)
  FAILED:
    name: KubeletTooManyPods alert (single-node with instance as IP address),
    alertname: KubeletTooManyPods, time: 1m, 
        exp:[
            0:
              Labels:{alertname="KubeletTooManyPods", cluster="kubernetes", instance="10.0.0.0", node="node0", severity="info"}
              Annotations:{description="Kubelet 'node0' is running at 100% of its Pod capacity.", runbook_url="https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubelettoomanypods", summary="Kubelet is running at capacity."}
            ], 
        got:[]

Name not shown:

tests:
# ...
- name: calculate apiserver request total increase 30d rate
  FAILED:
    expr: "code_verb:apiserver_request_total:increase30d{verb=\"GET\"}", time: 1m,
        exp: {__name__="code_verb:apiserver_request_total:increase30d", code="200", verb="GET"} 3.96E+04, {__name__="code_verb:apiserver_request_total:increase30d", code="500", verb="GET"} 3.24E+03
        got: {__name__="code_verb:apiserver_request_total:increase30d", code="200", verb="GET"} 1.08E+04, {__name__="code_verb:apiserver_request_total:increase30d", code="500", verb="GET"} 3.6E+02

I can't understand why this behaviour...

@skl
Copy link
Collaborator

skl commented Feb 7, 2025

@lorenzofelletti maybe it's a bug in promtool - there seems to be an inconsistency between the failure messages shown for promql_expr_test definitions, vs those shown for alert_rule_test definitions (the test name seems to only appear on the latter).

@skl skl merged commit 96d585c into kubernetes-monitoring:master Feb 7, 2025
9 checks passed
rexagod pushed a commit to rexagod/kubernetes-mixin that referenced this pull request Feb 10, 2025
* test: move tests to dedicated directory

* test: (wip) add apiserver tests

* feat: add test group name

* test: update apiserver availability tests to include new metrics and calculations

* refactor(tests): update labels format in apiserver availability tests
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants