An alerting tool for Kubernetes clusters issues of all types, in real time, with intelligent redundancy, and easily extendable api.
Pod default/test-2-broken-image-7cbf974df9-gbnk9 is un-healthy:
Container test-2-broken-image still waiting due to ImagePullBackOff: Back-off pulling image "nginx:l4t3st"
Event by kubelet: Failed x4 since 27 Oct 21 14:20 UTC (last seen 2 minutes ago):
Failed to pull image "nginx:l4t3st": rpc error: code = Unknown desc = Error response from daemon: manifest for nginx:l4t3st not found: manifest unknown: manifest unknown
Event by kubelet: Failed x4 since 27 Oct 21 14:20 UTC (last seen 2 minutes ago):
Error: ErrImagePull
Event by kubelet: Failed x6 since 27 Oct 21 14:20 UTC (last seen 2 minutes ago):
Error: ImagePullBackOff
----------------
Pod default/test-3-excessive-resources-699d58f55f-9gfft is un-healthy:
Unschedulable: 0/1 nodes are available: 1 Insufficient memory. (last transition: 4 minutes ago)
Event by default-scheduler: FailedScheduling since 27 Oct 21 14:20 UTC (last seen 4 minutes ago):
0/1 nodes are available: 1 Insufficient memory.
----------------
Pod default/test-4-crashlooping-dbdd84589-jvplc is un-healthy:
Container test-4-crashlooping is in CrashLoopBackOff: restarted 5 times, last exit due to Error (exit code 1)
Event by kubelet: BackOff x7 since 27 Oct 21 14:20 UTC (last seen 2 minutes ago):
Back-off restarting failed container
Logs of container test-4-crashlooping:
--------
1
2
3
4
5
--------
----------------
Pod default/test-5-completed-757685986-r4tg2 is un-healthy:
Container test-5-completed is in CrashLoopBackOff: restarted 5 times, last exit due to Completed (exit code 0)
Event by kubelet: BackOff x8 since 27 Oct 21 14:20 UTC (last seen 2 minutes ago):
Back-off restarting failed container
Logs of container test-5-completed:
--------
1
2
3
4
5
--------
----------------
Pod default/test-6-crashlooping-init-644545f5b7-bsvrn is un-healthy:
Container test-6-crashlooping-init-container (init) is in CrashLoopBackOff: restarted 5 times, last exit due to Error (exit code 1)
test-6-crashlooping-init-container (init) terminated due to Error (exit code 1)
Container test-6-crashlooping-init-container (init) restarted 5 times
Event by kubelet: BackOff x8 since 27 Oct 21 14:20 UTC (last seen 2 minutes ago):
Back-off restarting failed container
Logs of container test-6-crashlooping-init-container:
--------
1
2
3
4
5
--------
----------------
Or in json format:
{
"alerts_by_cluster_name": {
"minikube": [
{
"cluster_name": "minikube",
"namespace": "default",
"name": "test-2-broken-image-7cbf974df9-gbnk9",
"kind": "Pod",
"messages": [
"Container test-2-broken-image still waiting due to ErrImagePull: rpc error: code = Unknown desc = Error response from daemon: manifest for nginx:l4t3st not found: manifest unknown: manifest unknown",
"Container test-2-broken-image still waiting due to ImagePullBackOff: Back-off pulling image \"nginx:l4t3st\""
],
"events": [
"Event by kubelet: Failed x4 since 27 Oct 21 14:20 UTC (last seen 1 minute ago):\n\tFailed to pull image \"nginx:l4t3st\": rpc error: code = Unknown desc = Error response from daemon: manifest for nginx:l4t3st not found: manifest unknown: manifest unknown",
"Event by kubelet: Failed x4 since 27 Oct 21 14:20 UTC (last seen 1 minute ago):\n\tError: ErrImagePull",
"Event by kubelet: Failed x6 since 27 Oct 21 14:20 UTC (last seen 1 minute ago):\n\tError: ImagePullBackOff"
],
"timestamp": "2021-10-27T14:24:21.181725Z"
},
{
"cluster_name": "minikube",
"namespace": "default",
"name": "test-3-excessive-resources-699d58f55f-9gfft",
"kind": "Pod",
"messages": [
"Unschedulable: 0/1 nodes are available: 1 Insufficient memory. (last transition: 3 minutes ago)"
],
"events": [
"Event by default-scheduler: FailedScheduling since 27 Oct 21 14:20 UTC (last seen 3 minutes ago):\n\t0/1 nodes are available: 1 Insufficient memory."
],
"timestamp": "2021-10-27T14:24:21.181725Z"
},
{
"cluster_name": "minikube",
"namespace": "default",
"name": "test-4-crashlooping-dbdd84589-jvplc",
"kind": "Pod",
"messages": [
"Container test-4-crashlooping is in CrashLoopBackOff: restarted 5 times, last exit due to Error (exit code 1)"
],
"events": [
"Event by kubelet: BackOff x7 since 27 Oct 21 14:20 UTC (last seen 2 minutes ago):\n\tBack-off restarting failed container"
],
"logs_by_container_name": {
"test-4-crashlooping": "1\n2\n3\n4\n5"
},
"timestamp": "2021-10-27T14:24:21.181725Z"
},
{
"cluster_name": "minikube",
"namespace": "default",
"name": "test-5-completed-757685986-r4tg2",
"kind": "Pod",
"messages": [
"Container test-5-completed is in CrashLoopBackOff: restarted 5 times, last exit due to Completed (exit code 0)"
],
"events": [
"Event by kubelet: BackOff x8 since 27 Oct 21 14:20 UTC (last seen 2 minutes ago):\n\tBack-off restarting failed container"
],
"logs_by_container_name": {
"test-5-completed": "1\n2\n3\n4\n5"
},
"timestamp": "2021-10-27T14:24:21.181725Z"
},
{
"cluster_name": "minikube",
"namespace": "default",
"name": "test-6-crashlooping-init-644545f5b7-bsvrn",
"kind": "Pod",
"messages": [
"Container test-6-crashlooping-init-container (init) is in CrashLoopBackOff: restarted 5 times, last exit due to Error (exit code 1)"
],
"events": [
"Event by kubelet: BackOff x8 since 27 Oct 21 14:20 UTC (last seen 2 minutes ago):\n\tBack-off restarting failed container"
],
"logs_by_container_name": {
"test-6-crashlooping-init-container": "1\n2\n3\n4\n5"
},
"timestamp": "2021-10-27T14:24:21.181725Z"
}
]
}
}
- ✓ Pod evictions
- _ Clean up
- ✓ Pod stuck terminating
- _ Graceful clean up
- ✓ Pod pending/unschedulable/pull-backoff
- ✓ Pod stuck initializing
- ✓ Pod excessively restating or crashloops
- ✓ Logs of relevant containers when applicable
- ✓ Node taints/unready
- ✓ Warning events on any entity
- _ Node excessive disk usage per fs partition
- _ Node excessive process allocation
- _ Node excessive inode allocation
- Warning/errors in native logs
- _ Docker
- _ Containerd
- _ Kubelet
- _ System
- _ Helm failures
NAME:
kubescout - 0.1.15 - Scout for alarming issues in your Kubernetes cluster
USAGE:
kubescout [optional flags]
OPTIONS:
--verbose, --vv Verbose logging (default: false) [$VERBOSE]
--logs-tail value Specifies the logs tail length when reporting logs from a problematic pod, use 0 to disable log extraction (default: 250) [$LOGS_TAIL]
--events-limit value Maximum number of namespace events to fetch (default: 150) [$EVENTS_LIMIT]
--kubeconfig value, -k value kubeconfig file path, defaults to env var KUBECONFIG or ~/.kube/config, can be omitted when running in cluster [$KUBECONFIG]
--time-format value, -f value timestamp print format (default: "02 Jan 06 15:04 MST") [$TIME_FORMAT]
--locale value, -l value timestamp print localization (default: "UTC") [$LOCALE]
--pod-creation-grace-sec value grace period in seconds since pod creation before checking its statuses (default: 5) [$POD_CREATION_GRACE_SEC]
--pod-starting-grace-sec value grace period in seconds since pod creation before alarming on non running states (default: 600) [$POD_STARTING_GRACE_SEC]
--pod-termination-grace-sec value grace period in seconds since pod termination (default: 60) [$POD_TERMINATION_GRACE_SEC]
--pod-restart-grace-count value grace count for pod restarts (default: 3) [$POD_RESTART_GRACE_COUNT]
--node-resource-usage-threshold value node resources usage threshold (default: 0.85)
--exclude-ns value, -e value namespaces to skip [$EXCLUDE_NS]
--include-ns value, -n value namespaces to include (will skip any not listed if this option is used) [$INCLUDE_NS]
--dedup-minutes value, -d value time in minutes to silence duplicate or already observed alerts, or 0 to disable deduplication (default: 60) [$DEDUP_MINUTES]
--store-filepath value, -s value path to store file where state will be persisted or empty string to disable persistency (default: "kube-scout.store.json") [$STORE_FILEPATH]
--output value, -o value output mode, one of pretty/json/yaml/discard (default: "pretty") [$OUTPUT_MODE]
--context value, -c value context name to use from kubeconfig, defaults to current context
--not-in-cluster hint to scan out of cluster even if technically kubescout is running in a pod (default: false) [$NOT_IN_CLUSTER]
--all-contexts, -a iterate all kubeconfig contexts, 'context' flag will be ignored if this flag is set (default: false)
--exclude-contexts value a comma separated list of kubeconfig context names to skip, only relevant if 'all-contexts' flag is set
--help, -h show help (default: false)
--version, -v print the version (default: false)
For example:
kubescout --kubeconfig /root/.kube/config --name staging-cluster
kubescout --exclude-ns kube-system
kubescout --include-ns default,test,prod
kubescout -n default -c aws-cluster
curl -s https://raw.githubusercontent.com/reallyliri/kubescout/main/install.sh | sudo bash
# or for a specific version:
curl -s https://raw.githubusercontent.com/reallyliri/kubescout/main/install.sh | sudo bash -s 0.1.0
If that doesn't work, try:
curl -s https://raw.githubusercontent.com/reallyliri/kubescout/main/install.sh -o install.sh
sudo bash install.sh
then run: kubescout -h
Scout your cluster(s) on a schedule or on demand with the following setup options:
helm upgrade -i -n default kubescout ./chart
Values configuration examples:
helm upgrade -i -f custom-values.yaml kubescout ./chart
helm upgrade -i \
--set image.tag="$(go run . --version | cut -d" " -f 3)" \
--set run.mode="CronJob" \
--set run.cronJob.schedule="0 0 * * *" \
--set config.excludeNamespaces={kube-node-lease\,kube-public\,kube-system} \
kubescout ./chart
To get the plain job manifests, you can render the Helm chart:
NAMESPACE=default
helm template \
-n $NAMESPACE \
--set image.tag="$(go run . --version | cut -d" " -f 3)" \
--set run.mode="Job" \
--set persistency.enable=false \
kubescout ./chart > kubescout-job.yaml
kubectl apply -n $NAMESPACE -f kubescout-job.yaml
To get the plain cron-job manifests, you can render the Helm chart:
NAMESPACE=default
helm template \
-n $NAMESPACE \
--set image.tag="$(go run . --version | cut -d" " -f 3)" \
--set run.mode="CronJob" \
--set run.cronjob.schedule="* * * * *" \
kubescout ./chart > kubescout-cronjob.yaml
kubectl apply -n $NAMESPACE -f kubescout-cronjob.yaml
Alpine based slim image that wraps the CLI and used by the Kubernetes solutions.
docker run --rm -it reallyliri/kubescout:latest --help
# or with actual kubeconfig
docker run --rm -it \
-v ${HOME}/.kube/config:/root/.kube/config \
reallyliri/kubescout:latest -a -o json
Build yourself:
docker build -t kubescout .
# then run
docker run --rm -it kubescout --help
Simply fetch/compile the binary and use the CLI with your required config options.
Quick setup of the native binary in a cronjob should be simple enough.
For example, to scout every 3 minutes:
if ! crontab -l | grep kubescout >/dev/null 2>&1 ; then
(crontab -l ; echo "*/3 * * * * kubescout --all-contexts --kubeconfig /root/.kube/config --dedup-minutes 180") | crontab -
fi
You can also use the tool as a package from your own code setup.
go get github.com/reallyliri/kubescout
package example
import (
"crypto/tls"
"fmt"
kubescout "github.com/reallyliri/kubescout/pkg"
kubescoutconfig "github.com/reallyliri/kubescout/config"
kubescoutsink "github.com/reallyliri/kubescout/sink"
"net/http"
)
func main() {
// simple default execution:
_ = kubescout.Scout(nil, nil)
// example using Slack webhook as sink:
cfg, _ := kubescoutconfig.DefaultConfig()
cfg.KubeconfigFilePath = "/root/configs/staging-kubeconfig"
sink, _ := kubescoutsink.CreateWebSink(
"https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX",
func() (http.RoundTripper, error) {
skipVerifyTransport := http.DefaultTransport.(*http.Transport).Clone()
skipVerifyTransport.TLSClientConfig = &tls.Config{InsecureSkipVerify: true}
return skipVerifyTransport, nil
},
func(request *http.Request) error {
request.Header.Add("Content-Type", "application/json")
return nil
},
func(response *http.Response, responseBody string) error {
if responseBody != "ok" {
return fmt.Errorf("non-ok response from Slack: '%v'", responseBody)
}
return nil
},
false,
)
_ = kubescout.Scout(cfg, sink)
}
# vet and lint
go vet
docker run --rm -v $(pwd):/app -w /app golangci/golangci-lint:latest-alpine golangci-lint run --deadline=180s
# tests
go test -v ./...
# integration tests (requires minikube)
go test -v --tags=integration ./integration_test.go
# build
CGO_ENABLED=0 go build -o bin/kubescout-$(go run . --version | cut -d" " -f 3) .