feat: compute alerts on systematic failure #4

Merged
merged 7 commits into from
Apr 24, 2023
67 changes: 67 additions & 0 deletions .github/workflows/overflow-test.yaml
@@ -0,0 +1,67 @@
on: [push]

defaults:
run:
# default failure handling for shell scripts in 'run' steps
shell: 'bash -Eeuo pipefail -x {0}'

jobs:
overflow_test:
runs-on: ubuntu-latest
name: Test CIclops with summary overflow
steps:
- uses: actions/checkout@v3

- name: Generate Test Summary
uses: ./
with:
artifact_directory: example-artifacts/
output_file: test-summary.md
short_file: short.md
alerts_file: alerts.txt
# NOTE: it does not seem possible to pass $GITHUB_STEP_SUMMARY as a
# regular file to the underlying script. Hence, we create a file and
# in a later step write with >> $GITHUB_STEP_SUMMARY

- name: Create local file that is bigger than GH limit
run: |
dd if=/dev/zero of=big-test-summary.md bs=1M count=2

- name: Check full summary fits within GH limit
# GitHub caps each step's $GITHUB_STEP_SUMMARY at 1 MiB (1048576 bytes).
# If the cap is exceeded, the summary upload fails, yet the step still
# counts as success(). This step makes the check explicit, so that later
# steps can react when the limit would be exceeded.
id: check-overflow
run: |
size=$(stat -c '%s' big-test-summary.md)
if [ "$size" -gt 1048576 ]; then
echo "overflow=true" >> $GITHUB_OUTPUT
fi
# Here we force the "big test summary" to overflow GH limits, and we
# create an output that further steps can leverage: steps.check-overflow.outputs.overflow

- name: If the full summary fits within GH limits, use it
# We expect this step to be skipped
if: ${{!steps.check-overflow.outputs.overflow}}
run: |
cat big-test-summary.md >> $GITHUB_STEP_SUMMARY

- name: If the full summary is too big, use short version
if: ${{steps.check-overflow.outputs.overflow}}
run: |
cat short.md >> $GITHUB_STEP_SUMMARY

- name: If full summary is too big, archive it
if: ${{steps.check-overflow.outputs.overflow}}
uses: actions/upload-artifact@v3
with:
name: test-summary.md
path: test-summary.md
retention-days: 7

- name: Create slack body with alerts
run: |
echo 'slack-message<<EOF' >> $GITHUB_OUTPUT
cat alerts.txt >> $GITHUB_OUTPUT
echo 'EOF' >> $GITHUB_OUTPUT
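
The overflow handling in this workflow can also be sketched outside of YAML. The following is a minimal illustration in Python, not code from this PR: `pick_summary` and `publish` are hypothetical helper names, and the size limit is passed in as a parameter rather than hard-coded.

```python
import os
from typing import Tuple


def pick_summary(full_path: str, short_path: str, limit_bytes: int) -> Tuple[str, bool]:
    """Return (path to publish, overflowed) for a given size limit in bytes."""
    overflowed = os.path.getsize(full_path) > limit_bytes
    return (short_path if overflowed else full_path), overflowed


def publish(full_path: str, short_path: str, step_summary_path: str, limit_bytes: int) -> bool:
    """Append the chosen summary to the step summary file; return True on overflow."""
    chosen, overflowed = pick_summary(full_path, short_path, limit_bytes)
    with open(chosen, "r", encoding="utf-8") as src, \
            open(step_summary_path, "a", encoding="utf-8") as dst:
        dst.write(src.read())
    return overflowed
```

The boolean returned by `publish` plays the role of `steps.check-overflow.outputs.overflow`: a caller could use it to decide whether the full summary should be archived.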
16 changes: 13 additions & 3 deletions .github/workflows/test.yaml
@@ -1,9 +1,9 @@
on: [push]

jobs:
custom_test:
plain_test:
runs-on: ubuntu-latest
name: Test action with act
name: Test CIclops with normal summary
steps:
- uses: actions/checkout@v3

@@ -12,8 +12,18 @@ jobs:
with:
artifact_directory: example-artifacts/
output_file: test-summary.md
short_file: short.md
alerts_file: alerts.txt
# NOTE: it does not seem possible to pass $GITHUB_STEP_SUMMARY as a
# regular file to the underlying script. Hence, we create a file and
# in a later step write with >> $GITHUB_STEP_SUMMARY

- name: Create GitHub Job Summary from report
run: cat test-summary.md >> $GITHUB_STEP_SUMMARY
run: |
cat test-summary.md >> $GITHUB_STEP_SUMMARY

- name: Create slack body with alerts
run: |
echo 'slack-message<<EOF' >> $GITHUB_OUTPUT
cat alerts.txt >> $GITHUB_OUTPUT
echo 'EOF' >> $GITHUB_OUTPUT
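
The "Create slack body with alerts" step relies on the multiline form of `$GITHUB_OUTPUT`: a `name<<DELIMITER` line, the value, then the delimiter on a line of its own. With a fixed `EOF` delimiter, an alerts file that lacks a trailing newline, or that contains a line reading `EOF`, could corrupt the output. A hedged Python sketch of the same idea with a random delimiter follows; `format_multiline_output` and `append_output` are hypothetical names, not part of CIclops.

```python
import uuid


def format_multiline_output(name: str, value: str) -> str:
    """Render one multiline entry: name<<DELIM, the value, then DELIM."""
    # A random delimiter makes a collision with the value's content unlikely.
    delimiter = "ghadelimiter_" + uuid.uuid4().hex
    body = value
    # The closing delimiter must start on its own line.
    if body and not body.endswith("\n"):
        body += "\n"
    return f"{name}<<{delimiter}\n{body}{delimiter}\n"


def append_output(output_path: str, name: str, value: str) -> None:
    """Append the entry to the file that $GITHUB_OUTPUT points at."""
    with open(output_path, "a", encoding="utf-8") as fh:
        fh.write(format_multiline_output(name, value))
```

In a workflow step, `output_path` would be read from the `GITHUB_OUTPUT` environment variable.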
19 changes: 19 additions & 0 deletions .github/workflows/unit-test.yaml
@@ -0,0 +1,19 @@
on: [push]

jobs:
plain_test:
runs-on: ubuntu-latest
name: Unit test
steps:
- uses: actions/checkout@v3

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.x'

- name: Install dependencies
run: python -m pip install --upgrade pip prettytable

- name: Run suite
run: python test_summary.py -v
1 change: 1 addition & 0 deletions .gitignore
@@ -1,2 +1,3 @@
# editor and IDE paraphernalia
.idea
__pycache__/
12 changes: 10 additions & 2 deletions DEVELOPERS_DEVELOPERS_DEVELOPERS.md
@@ -1,4 +1,4 @@
# building and testing locally
# Building and testing locally

The `ciclops` GitHub Action runs using a Docker container that encapsulates the
Python script that does the CI test analysis.
@@ -39,6 +39,14 @@ act -b --env GITHUB_STEP_SUMMARY='github-summary.md'

Running this should create a file `github-summary.md` with the test summary.

## Unit tests

CIclops has the beginning of a unit test suite. You can run it with:

``` sh
python3 -m unittest
```

## How it works

The files in this repository are needed for the Dockerfile to build and run, of
@@ -57,4 +65,4 @@ See [GitHub support for Dockerfile](https://docs.github.com/en/actions/creating-

**NOTE**: the behavior of the `COPY` command in Dockerfiles seems quite
finicky on whether it's done recursively or not. The invocation used,
`COPY . .`, ensured that the copy was recursive.
`COPY . .`, ensures the copy is done recursively.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -7,4 +7,4 @@ RUN pip install --no-cache-dir -r requirements.txt
COPY . .

ENTRYPOINT [ "python", "/summarize_test_results.py"]
CMD ["--dir", "./test-artifacts", "--out", ""]
CMD ["--dir", "./test-artifacts", "--out", "", "--short", "", "--alerts", ""]
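
The new `CMD` passes empty strings for `--out`, `--short` and `--alerts` by default. How `summarize_test_results.py` parses them is not shown in this diff, so the following is only a sketch of one plausible approach with `argparse`, assuming an empty string means "this output was not requested". `build_parser` and `requested_outputs` are hypothetical names.

```python
import argparse
from typing import Dict


def build_parser() -> argparse.ArgumentParser:
    """Declare the four flags that appear in the Dockerfile CMD."""
    parser = argparse.ArgumentParser(description="Illustrative CLI sketch")
    parser.add_argument("--dir", default="./test-artifacts", help="directory with JSON artifacts")
    parser.add_argument("--out", default="", help="file for the full markdown report")
    parser.add_argument("--short", default="", help="file for the abridged report")
    parser.add_argument("--alerts", default="", help="file for alerts")
    return parser


def requested_outputs(args: argparse.Namespace) -> Dict[str, str]:
    """Map output kind to path, skipping flags left as empty strings (assumption)."""
    candidates = {"full": args.out, "short": args.short, "alerts": args.alerts}
    return {kind: path for kind, path in candidates.items() if path}
```

For example, `requested_outputs(build_parser().parse_args(["--short", "short.md"]))` yields only the `short` entry.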
75 changes: 75 additions & 0 deletions README.md
@@ -77,6 +77,81 @@ For example:
run: cat test-summary.md >> $GITHUB_STEP_SUMMARY
```

## Advanced Usage

There are two advanced cases we want to call attention to:

1. Summary overflow. \
GitHub caps the content written to `GITHUB_STEP_SUMMARY` at 1 MiB per step,
so a CIclops summary bigger than that will overflow. To work around this,
CIclops can be directed to create a short summary in addition to the "normal"
summary. The calling workflow can check whether the full summary is too big
and, if so, display the short summary and perhaps archive the full summary.

2. Slackops \
CIclops will create a series of alerts when systematic failures are detected.
By "systematic", we mean cases such as:

- all test combinations have failed
- all combinations fail for a given test
- all tests fail for a given version of Postgres

The alerts are included in the summary generated by CIclops, but it is also
possible to direct CIclops to put the alerts only in an output file.
This file can then be sent via Slack message to alert DevOps teams.

The following snippet shows how to use these features:

``` yaml
- name: Generate Test Summary
uses: cloudnative-pg/ciclops@main
with:
artifact_directory: test-artifacts/data
output_file: test-summary.md
short_file: short.md
alerts_file: alerts.txt

- name: Check full summary fits within GH limit
id: check-overflow
run: |
size=$(stat -c '%s' test-summary.md)
if [ "$size" -gt 1048576 ]; then
echo "overflow=true" >> $GITHUB_OUTPUT
fi

- name: If the full summary would not overflow, use it
if: ${{!steps.check-overflow.outputs.overflow}}
run: |
cat test-summary.md >> $GITHUB_STEP_SUMMARY

- name: If the full summary is too big, use short version
if: ${{steps.check-overflow.outputs.overflow}}
run: |
cat short.md >> $GITHUB_STEP_SUMMARY

- name: If full summary is too big, archive it
if: ${{steps.check-overflow.outputs.overflow}}
uses: actions/upload-artifact@v3
with:
name: test-summary.md
path: test-summary.md
retention-days: 7

- name: Create Slack body with alerts
id: alerts
run: |
echo 'slack-message<<EOF' >> $GITHUB_OUTPUT
cat alerts.txt >> $GITHUB_OUTPUT
echo 'EOF' >> $GITHUB_OUTPUT

- name: Send Slack Notification
uses: rtCamp/action-slack-notify@v2
env:
SLACK_USERNAME: cnpg-bot
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
SLACK_MESSAGE: ${{ steps.alerts.outputs.slack-message }}
```
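
To make the three "systematic" cases listed above concrete, here is a minimal sketch of how such alerts could be computed from the test results. It is not the CIclops implementation: `find_systematic_failures` is a hypothetical name, the record fields (`name`, `state`, `postgres_version`) are taken from the example artifacts, and any `state` other than `"failed"` is assumed to mean the test did not fail.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_systematic_failures(results: Iterable[dict]) -> List[str]:
    """Return alert strings for the three systematic-failure cases."""
    records = list(results)
    if not records:
        return []
    # Case 1: everything failed; the narrower alerts would add nothing.
    if all(r["state"] == "failed" for r in records):
        return ["all test combinations have failed"]

    by_test: Dict[str, List[str]] = defaultdict(list)
    by_postgres: Dict[str, List[str]] = defaultdict(list)
    for r in records:
        by_test[r["name"]].append(r["state"])
        by_postgres[r["postgres_version"]].append(r["state"])

    alerts = []
    # Case 2: a given test fails in every combination it ran in.
    for name, states in sorted(by_test.items()):
        if all(s == "failed" for s in states):
            alerts.append(f"all combinations fail for test: {name}")
    # Case 3: every test fails on a given Postgres version.
    for version, states in sorted(by_postgres.items()):
        if all(s == "failed" for s in states):
            alerts.append(f"all tests fail for Postgres version: {version}")
    return alerts
```

Note that in this sketch a test that ran in a single combination and failed also triggers the second alert; a real implementation may want a minimum number of combinations before alerting.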

## Origin

At EDB, working on a series of Kubernetes operators for PostgreSQL, we have an
10 changes: 10 additions & 0 deletions action.yaml
@@ -14,6 +14,12 @@ inputs:
output_file:
description: "file where the markdown report should be written"
required: false
short_file:
description: "file where the abridged markdown report should be written"
required: false
alerts_file:
description: "file where any alerts found should be written"
required: false
runs:
using: "docker"
image: "Dockerfile"
@@ -22,3 +28,7 @@ runs:
- "./${{ inputs.artifact_directory }}"
- "--out"
- "${{ inputs.output_file }}"
- "--short"
- "${{ inputs.short_file }}"
- "--alerts"
- "${{ inputs.alerts_file }}"
@@ -14,4 +14,4 @@
"workflow_id": 12,
"repo": "my-repo",
"branch": "my-branch"
}
}
@@ -14,4 +14,4 @@
"workflow_id": 12,
"repo": "my-repo",
"branch": "my-branch"
}
}
@@ -14,4 +14,4 @@
"workflow_id": 12,
"repo": "my-repo",
"branch": "my-branch"
}
}
@@ -1 +1,18 @@
{"name": "Update operand -- upgrades operand version", "state": "failed", "start_time": "2023-01-25T10:26:03.771385171Z", "end_time": "2023-01-25T10:38:44.769651724Z", "error": "Timed out after 600.001s.\nExpected success, but got an error:\n <*errors.errorString | 0xc0003c4c20>: {\n s: \"Assertion in callback at /home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go:266 failed:\\nExpected\\n <int32>: 0\\nto equal\\n <int32>: 1\",\n }\n Assertion in callback at /home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go:266 failed:\n Expected\n <int32>: 0\n to equal\n <int32>: 1", "error_file": "/home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go", "error_line": 268, "platform": "local", "postgres_kind": "PostgreSQL", "redwood": false, "matrix_id": "local-v1.21.14-PGD-PostgreSQL-12.13-5.0.0-0.0git256.g2a1b38b.1.dev-1", "postgres_version": "12.13-5.0.0-0.0git256.g2a1b38b.1.dev-1", "k8s_version": "v1.21.14", "workflow_id": 4004501040, "repo": "EnterpriseDB/pg4k-pgd", "branch": "dev/cnp-3285-2"}
{
"name": "Update operand -- upgrades operand version",
"state": "failed",
"start_time": "2023-01-25T10:26:03.771385171Z",
"end_time": "2023-01-25T10:38:44.769651724Z",
"error": "Timed out after 600.001s.\nExpected success, but got an error:\n <*errors.errorString | 0xc0003c4c20>: {\n s: \"Assertion in callback at /home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go:266 failed:\\nExpected\\n <int32>: 0\\nto equal\\n <int32>: 1\",\n }\n Assertion in callback at /home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go:266 failed:\n Expected\n <int32>: 0\n to equal\n <int32>: 1",
"error_file": "/home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go",
"error_line": 268,
"platform": "local",
"postgres_kind": "PostgreSQL",
"redwood": false,
"matrix_id": "local-v1.21.14-PGD-PostgreSQL-12.13-5.0.0-0.0git256.g2a1b38b.1.dev-1",
"postgres_version": "12.13-5.0.0-0.0git256.g2a1b38b.1.dev-1",
"k8s_version": "v1.21.14",
"workflow_id": 4004501040,
"repo": "EnterpriseDB/pg4k-pgd",
"branch": "dev/cnp-3285-2"
}
@@ -1 +1,18 @@
{"name": "Update operand -- upgrades operand version", "state": "failed", "start_time": "2023-01-25T11:08:20.179946291Z", "end_time": "2023-01-25T11:20:54.376679592Z", "error": "Timed out after 600.001s.\nExpected success, but got an error:\n <*errors.errorString | 0xc000a41e40>: {\n s: \"Assertion in callback at /home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go:266 failed:\\nExpected\\n <int32>: 0\\nto equal\\n <int32>: 1\",\n }\n Assertion in callback at /home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go:266 failed:\n Expected\n <int32>: 0\n to equal\n <int32>: 1", "error_file": "/home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go", "error_line": 268, "platform": "local", "postgres_kind": "PostgreSQL", "redwood": false, "matrix_id": "local-v1.21.14-PGD-PostgreSQL-13.9-5.0.0-0.0git256.g2a1b38b.1.dev-1", "postgres_version": "13.9-5.0.0-0.0git256.g2a1b38b.1.dev-1", "k8s_version": "v1.21.14", "workflow_id": 4004501040, "repo": "EnterpriseDB/pg4k-pgd", "branch": "dev/cnp-3285-2"}
{
"name": "Update operand -- upgrades operand version",
"state": "failed",
"start_time": "2023-01-25T11:08:20.179946291Z",
"end_time": "2023-01-25T11:20:54.376679592Z",
"error": "Timed out after 600.001s.\nExpected success, but got an error:\n <*errors.errorString | 0xc000a41e40>: {\n s: \"Assertion in callback at /home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go:266 failed:\\nExpected\\n <int32>: 0\\nto equal\\n <int32>: 1\",\n }\n Assertion in callback at /home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go:266 failed:\n Expected\n <int32>: 0\n to equal\n <int32>: 1",
"error_file": "/home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go",
"error_line": 268,
"platform": "local",
"postgres_kind": "PostgreSQL",
"redwood": false,
"matrix_id": "local-v1.21.14-PGD-PostgreSQL-13.9-5.0.0-0.0git256.g2a1b38b.1.dev-1",
"postgres_version": "13.9-5.0.0-0.0git256.g2a1b38b.1.dev-1",
"k8s_version": "v1.21.14",
"workflow_id": 4004501040,
"repo": "EnterpriseDB/pg4k-pgd",
"branch": "dev/cnp-3285-2"
}
@@ -1 +1,18 @@
{"name": "Smoke test -- sets up a PGD Group according to flexible three regions architecture", "state": "failed", "start_time": "2023-01-25T10:38:50.421338678Z", "end_time": "2023-01-25T10:52:23.675119439Z", "error": "Timed out after 602.794s.\nError: Assertion in callback at /home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go:463 failed:\nExpected\n <*fmt.wrapError | 0xc00007d8e0>: {\n msg: \"context deadline exceeded - \",\n err: <context.deadlineExceededError>{},\n }\nto be nil\n <*errors.errorString | 0xc0003b2650>: {\n s: \"Assertion in callback at /home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go:463 failed:\\nExpected\\n <*fmt.wrapError | 0xc00007d8e0>: {\\n msg: \\\"context deadline exceeded - \\\",\\n err: <context.deadlineExceededError>{},\\n }\\nto be nil\",\n }", "error_file": "/home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go", "error_line": 466, "platform": "local", "postgres_kind": "PostgreSQL", "redwood": false, "matrix_id": "local-v1.21.14-PGD-PostgreSQL-13.9-5.0.0-0.0git256.g2a1b38b.1.dev-1", "postgres_version": "13.9-5.0.0-0.0git256.g2a1b38b.1.dev-1", "k8s_version": "v1.21.14", "workflow_id": 4004501040, "repo": "EnterpriseDB/pg4k-pgd", "branch": "dev/cnp-3285-2"}
{
"name": "Smoke test -- sets up a PGD Group according to flexible three regions architecture",
"state": "failed",
"start_time": "2023-01-25T10:38:50.421338678Z",
"end_time": "2023-01-25T10:52:23.675119439Z",
"error": "Timed out after 602.794s.\nError: Assertion in callback at /home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go:463 failed:\nExpected\n <*fmt.wrapError | 0xc00007d8e0>: {\n msg: \"context deadline exceeded - \",\n err: <context.deadlineExceededError>{},\n }\nto be nil\n <*errors.errorString | 0xc0003b2650>: {\n s: \"Assertion in callback at /home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go:463 failed:\\nExpected\\n <*fmt.wrapError | 0xc00007d8e0>: {\\n msg: \\\"context deadline exceeded - \\\",\\n err: <context.deadlineExceededError>{},\\n }\\nto be nil\",\n }",
"error_file": "/home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go",
"error_line": 466,
"platform": "local",
"postgres_kind": "PostgreSQL",
"redwood": false,
"matrix_id": "local-v1.21.14-PGD-PostgreSQL-13.9-5.0.0-0.0git256.g2a1b38b.1.dev-1",
"postgres_version": "13.9-5.0.0-0.0git256.g2a1b38b.1.dev-1",
"k8s_version": "v1.21.14",
"workflow_id": 4004501040,
"repo": "EnterpriseDB/pg4k-pgd",
"branch": "dev/cnp-3285-2"
}
@@ -1 +1,18 @@
{"name": "Update operand -- upgrades operand version", "state": "failed", "start_time": "2023-01-25T10:42:32.45376546Z", "end_time": "2023-01-25T10:54:59.12104098Z", "error": "Timed out after 600.001s.\nExpected success, but got an error:\n <*errors.errorString | 0xc0000b3ce0>: {\n s: \"Assertion in callback at /home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go:266 failed:\\nExpected\\n <int32>: 0\\nto equal\\n <int32>: 1\",\n }\n Assertion in callback at /home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go:266 failed:\n Expected\n <int32>: 0\n to equal\n <int32>: 1", "error_file": "/home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go", "error_line": 268, "platform": "local", "postgres_kind": "PostgreSQL", "redwood": false, "matrix_id": "local-v1.21.14-PGD-PostgreSQL-14.6-5.0.0-0.0git256.g2a1b38b.1.dev-1", "postgres_version": "14.6-5.0.0-0.0git256.g2a1b38b.1.dev-1", "k8s_version": "v1.21.14", "workflow_id": 4004501040, "repo": "EnterpriseDB/pg4k-pgd", "branch": "dev/cnp-3285-2"}
{
"name": "Update operand -- upgrades operand version",
"state": "failed",
"start_time": "2023-01-25T10:42:32.45376546Z",
"end_time": "2023-01-25T10:54:59.12104098Z",
"error": "Timed out after 600.001s.\nExpected success, but got an error:\n <*errors.errorString | 0xc0000b3ce0>: {\n s: \"Assertion in callback at /home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go:266 failed:\\nExpected\\n <int32>: 0\\nto equal\\n <int32>: 1\",\n }\n Assertion in callback at /home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go:266 failed:\n Expected\n <int32>: 0\n to equal\n <int32>: 1",
"error_file": "/home/runner/work/pg4k-pgd/pg4k-pgd/pgd-operator/tests/e2e/asserts_test.go",
"error_line": 268,
"platform": "local",
"postgres_kind": "PostgreSQL",
"redwood": false,
"matrix_id": "local-v1.21.14-PGD-PostgreSQL-14.6-5.0.0-0.0git256.g2a1b38b.1.dev-1",
"postgres_version": "14.6-5.0.0-0.0git256.g2a1b38b.1.dev-1",
"k8s_version": "v1.21.14",
"workflow_id": 4004501040,
"repo": "EnterpriseDB/pg4k-pgd",
"branch": "dev/cnp-3285-2"
}
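
The example artifacts above were only re-indented: a single-line JSON object and its pretty-printed form parse to the same data. As a rough sketch of how such artifacts could be consumed, assuming one JSON object per `*.json` file in the artifact directory (the file names are not shown here), the hypothetical helpers `load_results` and `failures_by_postgres_version` below load every record and tally failures per Postgres version.

```python
import json
from collections import Counter
from pathlib import Path
from typing import List


def load_results(artifact_dir: str) -> List[dict]:
    """Load one JSON object per *.json file, in sorted file-name order."""
    results = []
    for path in sorted(Path(artifact_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            results.append(json.load(fh))
    return results


def failures_by_postgres_version(results: List[dict]) -> Counter:
    """Count records whose state is "failed", keyed by postgres_version."""
    return Counter(r["postgres_version"] for r in results if r["state"] == "failed")
```

Applied to the four records above, the tally would show two failures for the `13.9` build and one each for the `12.13` and `14.6` builds.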