-
Category: Compatibility, Installability and Upgradability Tests
[Increase decrease capacity] | [Helm chart published] | [Helm chart valid] | [Helm deploy] | [Rollback] | [Rolling version change] | [Rolling update] | [Rolling downgrade] | [CNI compatible]
-
[Reasonable Image Size] | [Reasonable Startup Time] | [Single Process Type in One Container] | [Service Discovery] | [Shared Database] | [Specialized Init Systems] | [Sigterm Handled] | [Zombie Handled]
-
[Node drain] | [No local volume configuration] | [Elastic volumes] | [Database persistence]
-
Category: Reliability, Resilience and Availability Tests
[CNF under network latency] | [CNF with host disk fill] | [Pod delete] | [Memory hog] | [IO Stress] | [Network corruption] | [Network duplication] | [Pod DNS errors] | [Helm chart liveness entry] | [Helm chart readiness entry]
-
Category: Observability and Diagnostic Tests
[Use stdout/stderr for logs] | [Prometheus installed] | [Routed logs] | [OpenMetrics compatible] | [Jaeger tracing]
-
[Container socket mounts] | [Privileged Containers] | [External IPs] | [SELinux Options] | [Sysctls] | [Privilege escalation] | [Symlink file system] | [Application credentials] | [Host network] | [Service account mapping] | [Ingress and Egress blocked] | [Insecure capabilities] | [Non-root containers] | [Host PID/IPC privileges] | [Linux hardening] | [CPU limits] | [Memory limits] | [Immutable File Systems] | [HostPath Mounts]
-
[Default namespaces] | [Latest tag] | [Require labels] | [Versioned tag] | [NodePort not used] | [HostPort not used] | [Hardcoded IP addresses in K8s runtime configuration] | [Secrets used] | [Immutable configmap] | [Kubernetes Alpha APIs PoC]
-
[K8s Conformance] | [ClusterAPI enabled] | [OCI Compliant] | [(POC) Worker reboot recovery] | [Cluster admin] | [Control plane hardening] | [Tiller images]
CNFs should work with any Certified Kubernetes product and any CNI-compatible network that meets their functionality requirements. The CNTI Test Catalog will check for usage of standard, in-band deployment tools such as Helm (version 3) charts. The CNTI Test Catalog checks to see if CNFs support horizontal scaling (across multiple machines) and vertical scaling (between sizes of machines) by using the native K8s kubectl.
Service providers have historically had issues with the installability of vendor network functions. This category tests the installability and lifecycle management (the create, update, and delete of network applications) against widely used K8s installation solutions such as Helm.
All compatibility: ./cnf-testsuite compatibility
The HPA (Horizontal Pod Autoscaler) adds replicas when CPU, memory, or other configured metrics increase, which prevents disruption by spreading requests across more pods and balancing out utilisation. Decreasing works the same way in reverse: when traffic drops, the number of replicas is scaled down to the number of pods needed to handle the requests. You can read more about horizontal pod autoscaling to create replicas here and in the K8s scaling cheatsheet. Expectation: The number of replicas for a Pod increases and then decreases.
A CNF should be able to increase and decrease its capacity without running into errors.
Check out the kubectl docs for how to manually scale your CNF. Also, here is some info about things that could cause failures.
./cnf-testsuite increase_decrease_capacity
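To make the expected scale-up and scale-down concrete, the sketch below applies the replica formula given in the Kubernetes Horizontal Pod Autoscaler documentation: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). It is an illustration only, not part of the test suite. The function name is hypothetical, and the real autoscaler also applies a tolerance band and minimum/maximum replica bounds that are left out here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Replica count suggested by the HPA formula (tolerance and bounds omitted)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    return math.ceil(current_replicas * (current_metric / target_metric))


# Load at twice the target: the replica count should grow.
print(desired_replicas(2, 200.0, 100.0))
# Load at half the target: the replica count should shrink.
print(desired_replicas(4, 50.0, 100.0))
```

A CNF that passes this test should tolerate both directions of change without errors.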
Checks if the Helm chart is found in a remote repository when running helm search. Expectation: The Helm chart is published in a Helm Repository.
If a Helm chart is published, it is significantly easier for the end user to install. The management and versioning of the chart are handled by the Helm registry and client tools rather than manually, by directly referencing the chart source.
Make sure your CNF helm charts are published in a Helm Repository.
./cnf-testsuite helm_chart_published
Checks the syntax & validity of the chart using helm lint. Expectation: No syntax or validation problems are found in the chart.
A chart should pass the lint specification.
Make sure your helm charts pass lint tests.
./cnf-testsuite helm_chart_valid
Checks if the CNF is installed by using a Helm Chart. Expectation: The CNF was installed using Helm.
A helm chart should be deployable to a cluster
Make sure your helm charts are valid and can be deployed to clusters.
./cnf-testsuite helm_deploy
Checks if the Pod can be upgraded to a new software version, then restored back to the original software version by using the Kubectl Set Image & Kubectl Rollout Undo commands. Expectation: The CNF Software version can be successfully incremented, then rolled back.
K8s best practice is to allow K8s to manage the rolling back of an application resource instead of having operators manually roll back the resource by using something like blue/green deploys.
Ensure that you can upgrade your CNF using the Kubectl Set Image command, then roll back the upgrade using the Kubectl Rollout Undo command.
./cnf-testsuite rollback
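The upgrade-then-undo sequence described above can be sketched as the kubectl commands an operator might run by hand. The snippet only builds and prints the command lines; nothing is executed. The deployment name, container name, and image are hypothetical placeholders, and the sketch assumes the workload is a Deployment.

```python
import shlex
from typing import List


def rollback_commands(deployment: str, container: str, new_image: str) -> List[List[str]]:
    """Return the upgrade, wait, and undo commands as argv lists (not executed)."""
    target = f"deployment/{deployment}"
    return [
        # Upgrade the container to the new image.
        ["kubectl", "set", "image", target, f"{container}={new_image}"],
        # Wait for the rollout to finish before undoing it.
        ["kubectl", "rollout", "status", target],
        # Return to the previous revision tracked by Kubernetes.
        ["kubectl", "rollout", "undo", target],
    ]


for argv in rollback_commands("my-cnf", "app", "example.org/my-cnf:2.0.0"):
    print(shlex.join(argv))
```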
Checks if the Pod can be rolled back to the original software version by using the Kubectl Set Image to perform a rollback. Expectation: The CNF Software version is successfully rolled back to its original version.
(update, version change, downgrade): K8s best practice for version/installation management (lifecycle management) of applications is to have K8s track the version of the manifest information for the resource (deployment, pod, etc) internally. Whenever a rollback is needed the resource will have the exact manifest information that was tied to the application when it was deployed. This adheres to the principles driving immutable infrastructure and declarative specifications.
Ensure that you can successfully roll back the software version of your CNF by using the Kubectl Set Image command.
./cnf-testsuite rolling_version_change
Checks if the Pod can be upgraded to a new software version by using the Kubectl Set Image command. Expectation: The CNF Software version can be successfully incremented.
See rolling version change.
Ensure that you can successfully perform a rolling upgrade of your CNF using the Kubectl Set Image command.
./cnf-testsuite rolling_update
Checks if the Pod can be rolled back to an older software version (older than the original software version) by using the Kubectl Set Image command to perform a downgrade. Expectation: The CNF Software version is successfully downgraded to a software version older than the original installation version.
See rolling version change.
Ensure that you can successfully change the software version of your CNF back to an older version by using the Kubectl Set Image command.
./cnf-testsuite rolling_downgrade
This installs temporary kind clusters and will test the CNF against both Calico and Cilium CNIs. Expectation: CNF should be compatible with multiple and different CNIs
A CNF should be runnable by any CNI that adheres to the CNI specification
Ensure that your CNF is compatible with Calico, Cilium and other available CNIs.
./cnf-testsuite cni_compatible
The CNF should be developed and delivered as a microservice. The CNTI Test Catalog tests to determine the organizational structure and rate of change of the CNF being tested. Once these are known we can determine whether or not the CNF is a microservice. See: Microservice-Principles
Good microservice practices promote agility, which means less time passes between deployments. One benefit of more agility is that it allows different organizations and teams to deploy at the rate of change at which they build out features, instead of deploying in lock step with other teams. This is very important when it comes to changes that are time sensitive, like security patches.
All microservice: ./cnf-testsuite microservice
Checks the size of the image used. Expectation: CNF image size is under 5 GB.
A CNF with a large image size of 5 gigabytes or more tends to indicate a monolithic application.
Ensure your CNF's image size is under 5GB.
./cnf-testsuite reasonable_image_size
Checks how long it takes for the CNF to pass a Readiness Probe and reach a ready/running state. Expectation: CNF starts up under one minute
A CNF that starts up with a time (adjusted for server resources) that is approaching a minute is indicative of a monolithic application. The liveness probe's initialDelaySeconds and failureThreshold determine the startup time and retry amount of the CNF. Specifically, if the initialDelaySeconds is too long, it is indicative of a monolithic application. If the failureThreshold is too high, it is indicative of a CNF or a component of the CNF that has too many intermittent failures.
Ensure that your CNF gets into a running state within 30 seconds.
./cnf-testsuite reasonable_startup_time
This verifies that there is only one process type within one container. This does not count against child processes. For example, nginx or httpd could have a parent process and then 10 child processes, but if both nginx and httpd were running, this test would fail. Expectation: CNF container has one process type
A microservice should have only one process (or set of parent/child processes) that is managed by a non-homegrown supervisor or orchestrator. The microservice should not spawn other process types (e.g., executables) as a way to contribute to the workload but rather should interact with other processes through a microservice API.
Ensure that there is only one process type within a container. This does not count against child processes, e.g., nginx or httpd could be a parent process with 10 child processes and pass this test, but if both nginx and httpd were running, this test would fail.
./cnf-testsuite single_process_type
This tests and checks if the containers within a CNF have services exposed via a Kubernetes Service resource. Application access for microservices within a cluster should be exposed via a Service. Read more about K8s Service here. Expectation: CNFs accessible to other applications should be exposed via a Service.
A K8s microservice should expose its API through a K8s service resource. K8s services handle service discovery and load balancing for the cluster, ensuring that microservices can efficiently communicate and distribute traffic among themselves.
Make sure the CNF exposes any of its containers as a Kubernetes Service. This is crucial for enabling service discovery and load balancing within the cluster, facilitating smoother operation and communication between microservices. You can learn more about Kubernetes Service here.
./cnf-testsuite service_discovery
This tests if multiple CNFs are using the same database. Expectation: Multiple microservices should not share the same database.
A K8s microservice should not share a database with another K8s microservice because it forces the two services to upgrade in lock step.
Make sure that your CNF's containers are not sharing the same database.
./cnf-testsuite shared_database
This tests if containers in pods have dumb-init, tini or s6-overlay as init processes. Expectation: Container images should use specialized init systems for containers.
There are proper init systems and sophisticated supervisors that can be run inside of a container. Both of these systems properly reap and pass signals. Sophisticated supervisors are considered overkill because they take up too many resources and are sometimes too complicated. Some examples of sophisticated supervisors are: supervisord, monit, and runit. Proper init systems are smaller than sophisticated supervisors and therefore suitable for containers. Some of the proper container init systems are tini, dumb-init, and s6-overlay.
Use init systems that are purpose-built for containers like tini, dumb-init, s6-overlay.
./cnf-testsuite specialized_init_system
This tests if the PID 1 process of containers handles SIGTERM. Expectation: Sigterm is handled by PID 1 process of containers.
The Linux kernel handles signals differently for the process that has PID 1 than it does for other processes. Signal handlers aren't automatically registered for this process, meaning that signals such as SIGTERM or SIGINT will have no effect by default. By default, one must kill processes by using SIGKILL, preventing any graceful shutdown. Depending on the application, using SIGKILL can result in user-facing errors, interrupted writes (for data stores), or unwanted alerts in a monitoring system.
Make the PID 1 container process handle SIGTERM, enable process namespace sharing in Kubernetes, or use a specialized init system.
./cnf-testsuite sig_term_handled
This tests if the PID 1 process of containers handles/reaps zombie processes. Expectation: Zombie processes are handled/reaped by PID 1 process of containers.
Classic init systems such as systemd are also used to remove (reap) orphaned, zombie processes. Orphaned processes (processes whose parents have died) are reattached to the process that has PID 1, which should reap them when they die. A normal init system does that. But in a container, this responsibility falls on whatever process has PID 1. If that process doesn't properly handle the reaping, you risk running out of memory or some other resources.
Make the PID 1 container process handle/reap zombie processes, enable process namespace sharing in Kubernetes, or use a specialized init system.
./cnf-testsuite zombie_handled
The CNTI Test Catalog checks if state is stored in a custom resource definition or a separate database (e.g. etcd) rather than requiring local storage. It also checks to see if state is resilient to node failure.
If infrastructure is immutable, it is easily reproduced, consistent, disposable, will have a repeatable deployment process, and will not have configuration or artifacts that are modifiable in place. This ensures that all configuration is stateless. Any data that is persistent should be managed by K8s statefulsets.
All state: ./cnf-testsuite state
A node is drained and workload resources rescheduled to another node, passing with a liveness and readiness check. This will skip when the cluster only has a single node. Expectation: All workload resources are successfully rescheduled onto other available node(s).
No CNF should fail because of stateful configuration. A CNF should function properly if it is rescheduled on other nodes. This test will remove resources which are running on a target node and reschedule them on another node.
Ensure that your CNF can be successfully rescheduled when a node fails or is drained.
./cnf-testsuite node_drain
This tests if local volumes are being used for the CNF. Expectation: Local storage should not be used or configured.
A CNF should refrain from using the local storage class.
Ensure that your CNF isn't using any persistent volumes that use a "local" mount point.
./cnf-testsuite no_local_volume_configuration
This checks for elastic persistent volumes in use by the CNF. Expectation: Elastic persistent volumes should be configured for statefulness.
A CNF that uses elastic volumes can be rescheduled to other nodes by the orchestrator easily.
Set up and use elastic persistent volumes instead of local storage.
./cnf-testsuite elastic_volume
This checks if elastic volumes and stateful sets are used for MySQL databases. If no MySQL database is found, the test is skipped. Expectation: Elastic volumes and/or statefulsets should be used for databases to maintain a minimum resilience level in K8s clusters.
When a traditional database such as MySQL is configured to use statefulsets, it allows the database to use a persistent identifier that it maintains across any rescheduling. Persistent Pod identifiers make it easier to match existing volumes to the new Pods that have been rescheduled. https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
Select a database configuration that uses statefulsets and elastic storage volumes.
./cnf-testsuite database_persistence
The Cloud Native Definition requires systems to be resilient to the failures that are inevitable in cloud environments. CNF resilience should be tested to ensure CNFs are designed to deal with non-carrier-grade shared cloud HW/SW platforms.
Cloud native systems promote resilience by putting a high priority on testing individual components (chaos testing) as they are running (possibly in production). Reliability in traditional telecommunications is handled differently than in Cloud Native systems. Cloud native systems try to address reliability (MTBF) by having the subcomponents have higher availability through higher serviceability (MTTR) and redundancy. For example, having ten redundant subcomponents where seven components are available and three have failed will produce a top level component that is more reliable (MTBF) than a single component that "never fails" in the cloud native world.
All resilience: ./cnf-testsuite resilience
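The redundancy argument above can be illustrated with a small availability calculation. The sketch assumes that component failures are independent and that each component is up with the same probability p. Neither assumption comes from this document, and the 0.9 figure is an arbitrary example. It computes the chance that at least k of n components are up, which is a simplified stand-in for the MTBF comparison rather than a full reliability model.

```python
from math import comb


def at_least_k_available(n: int, k: int, p: float) -> float:
    """Probability that at least k of n independent components are up,
    when each one is up with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# One component that is up 90% of the time versus ten such components
# where any seven being up is enough to provide the service.
single = at_least_k_available(1, 1, 0.9)
redundant = at_least_k_available(10, 7, 0.9)
print(f"single: {single:.4f}  7-of-10: {redundant:.4f}")
```

Under these assumptions the redundant group is available more often than any one of its members.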
This experiment causes network degradation without the pod being marked unhealthy/unworthy of traffic by kube-proxy (unless you have a liveness probe of sorts that measures latency and restarts/crashes the container). The idea of this experiment is to simulate issues within your pod network OR microservice communication across services in different availability zones/regions etc. The applications may stall or get corrupted while they wait endlessly for a packet. The experiment limits the impact (blast radius) to only the traffic you want to test by specifying IP addresses or application information. This experiment will help to improve the resilience of your services over time. Expectation: The CNF should continue to function when network latency occurs
Network latency can have a significant impact on the overall performance of the application. Network outages that result from high latency can cause a range of failures for applications and can severely impact users/customers with downtime. This chaos experiment allows you to see the impact of latent traffic on the CNF.
Ensure that your CNF doesn't stall or get into a corrupted state when network degradation occurs. A mitigation strategy (in this case, keeping the timeout, i.e., access latency, low) could be implemented via some middleware that can switch traffic based on some SLO parameters.
./cnf-testsuite pod_network_latency
This experiment stresses the disk with continuous and heavy IO to cause degradation in the shared disk. It also reduces the amount of scratch space available on a node, which can leave too little space for newer containers to be scheduled. Kubernetes may then give up and apply an "eviction" taint like "disk-pressure", causing a wholesale movement of all pods to other nodes. Expectation: The CNF should continue to function when disk fill occurs and pods should not be evicted to another node.
Disk Pressure is a scenario we find in Kubernetes applications that can result in the eviction of the application replica and impact its delivery. Such scenarios can still occur despite whatever availability aids K8s provides. These problems are generally referred to as "Noisy Neighbour" problems.
Ensure that your CNF is resilient and doesn't stall when heavy IO causes a degradation in storage resource availability.
./cnf-testsuite disk_fill
This experiment simulates forced/graceful pod failure on specific or random replicas of an application resource and checks the deployment sanity (replica availability & uninterrupted service) and recovery workflow of the application. Expectation: The CNF should continue to function when pod delete occurs.
In a distributed system like Kubernetes, application replicas may not be sufficient to manage the traffic (indicated by SLIs) when some replicas are unavailable due to any failure (can be system or application). The application needs to meet the SLO (service level objectives) for this. It's imperative that the application has defenses against this sort of failure to ensure that the application always has a minimum number of available replicas.
Ensure that your CNF is resilient and doesn't fail on a forced/graceful pod failure on specific or random replicas of an application.
./cnf-testsuite pod_delete
The pod-memory hog experiment launches a stress process within the target container - which can cause either the primary process in the container to be resource constrained in cases where the limits are enforced OR eat up available system memory on the node in cases where the limits are not specified. Expectation: The CNF should continue to function when pod memory hog occurs
If the memory policies for a CNF are not set and granular, containers on the node can be killed based on their oom_score and the QoS class a given pod belongs to (best-effort ones are first to be targeted). This eval is extended to all pods running on the node, thereby causing a bigger blast radius.
Ensure that your CNF is resilient to heavy memory usage and can maintain some level of availability.
./cnf-testsuite pod_memory_hog
The pod-io stress experiment stresses the disk with continuous and heavy IO to cause degradation in reads/writes by other microservices that use this shared disk. Expectation: The CNF should continue to function when pod io stress occurs.
Stressing the disk with continuous and heavy IO can cause degradation in reads/writes by other microservices that use this shared disk. Scratch space can be used up on a node, which leads to a lack of space for newer containers to get scheduled and causes a movement of all pods to other nodes. This test determines the limits of how a CNF uses its storage device.
Ensure that your CNF is resilient to continuous and heavy disk IO load and can maintain some level of availability
./cnf-testsuite pod_io_stress
The pod-network corruption experiment injects packet corruption on the CNF by starting a traffic control (tc) process with netem rules to add egress packet corruption. Expectation: The CNF should be resilient to a lossy/flaky network and should continue to provide some level of availability.
A higher quality CNF should be resilient to a lossy/flaky network. This test injects packet corruption on the specified CNF's container by starting a traffic control (tc) process with netem rules to add egress packet corruption.
Ensure that your CNF is resilient to a lossy/flaky network and can maintain a level of availability.
./cnf-testsuite pod_network_corruption
The pod-network duplication experiment injects network duplication into the CNF by starting a traffic control (tc) process with netem rules to add egress delays. Expectation: The CNF should continue to function and be resilient to duplicated network packets.
A higher quality CNF should be resilient to erroneously duplicated packets. This test injects network duplication on the specified container by starting a traffic control (tc) process with netem rules to add egress delays.
Ensure that your CNF is resilient to erroneously duplicated packets and can maintain a level of availability.
./cnf-testsuite pod_network_duplication
The pod-dns error experiment injects chaos to disrupt DNS resolution in Kubernetes pods and causes loss of access to services by blocking DNS resolution of hostnames/domains. Expectation: The CNF doesn't crash and is resilient to DNS resolution failures.
A CNF should be resilient to name resolution (DNS) disruptions within the Kubernetes pod. This ensures that at least some application availability will be maintained if DNS resolution fails.
Ensure that your CNF is resilient to DNS resolution failures and can maintain a level of availability.
./cnf-testsuite pod_dns_error
This test scans all of the CNF's workload resources and checks if a Liveness Probe has been configured for each container. Expectation: The Helm chart should have a liveness probe configured.
A cloud native principle is that application developers understand their own resilience requirements better than operators:
"No one knows more about what an application needs to run in a healthy state than the developer. For a long time, infrastructure administrators have tried to figure out what “healthy” means for applications they are responsible for running. Without knowledge of what actually makes an application healthy, their attempts to monitor and alert when applications are unhealthy are often fragile and incomplete. To increase the operability of cloud native applications, applications should expose a health check." -- Garrison, Justin; Nova, Kris. Cloud Native Infrastructure: Patterns for Scalable Infrastructure and Applications in a Dynamic Environment. O'Reilly Media. Kindle Edition.
This is exemplified in the Kubernetes best practice of pods declaring how they should be managed through the liveness and readiness entries in the pod's configuration.
Ensure that your CNF has a Liveness Probe configured.
./cnf-testsuite liveness
This test scans all of the CNF's workload resources and checks if a Readiness Probe has been configured for each container. Expectation: The Helm chart should have a readiness probe configured.
A CNF should tell Kubernetes when it is ready to serve traffic.
Ensure that your CNF has a Readiness Probe configured.
./cnf-testsuite readiness
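As a rough self-check for the liveness and readiness tests above, the sketch below lists the containers in a Pod spec that do not declare a given probe. The Pod spec is written as a plain Python dictionary. The function name and the sample spec are hypothetical, and this is not how the test suite itself is implemented.

```python
from typing import Any, Dict, List


def containers_missing_probe(pod_spec: Dict[str, Any], probe: str) -> List[str]:
    """Names of containers in a Pod spec that do not define the given probe key."""
    return [
        c.get("name", "<unnamed>")
        for c in pod_spec.get("containers", [])
        if not c.get(probe)
    ]


pod_spec = {
    "containers": [
        {
            "name": "app",
            "livenessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
            "readinessProbe": {"httpGet": {"path": "/ready", "port": 8080}},
        },
        {"name": "sidecar"},  # no probes declared
    ]
}
print(containers_missing_probe(pod_spec, "livenessProbe"))
print(containers_missing_probe(pod_spec, "readinessProbe"))
```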
In order to maintain, debug, and have insight into a protected environment, infrastructure elements must have the property of being observable. This means these elements must externalize their internal states in some way that lends itself to metrics, tracing, and logging.
In order to maintain, debug, and have insight into a production environment that is protected (versioned, kept in source control, and changed only by using a deployment pipeline), its infrastructure elements must have the property of being observable. This means these elements must externalize their internal states in some way that lends itself to metrics, tracing, and logging.
All observability: ./cnf-testsuite observability
This checks and verifies that STDOUT/STDERR logging is configured for the CNF. Expectation: Resource output logs should be sent to STDOUT/STDERR
By sending logs to standard out/standard error, logs will be treated like event streams, as recommended by 12 factor app principles.
Make sure applications and CNFs are sending log output to STDOUT and/or STDERR.
./cnf-testsuite log_output
Tests for the presence of Prometheus and if the CNF is configured to send metrics to the Prometheus server. Expectation: The CNF is configured and sending metrics to a Prometheus server.
Recording metrics within a cloud native deployment is important because it gives the maintainer of a cluster of hundreds or thousands of services the ability to pinpoint small anomalies, such as those that will eventually cause a failure.
Install and configure Prometheus for your CNF.
./cnf-testsuite prometheus_traffic
Checks for presence of a Unified Logging Layer and if the CNF's logs are being captured by the Unified Logging Layer. fluentd and fluentbit are currently supported. Expectation: Fluentd or FluentBit is installed and capturing logs for the CNF.
A CNF should have logs managed by a unified logging layer. It's considered a best practice for CNFs to route logs and data through programs like fluentd to analyze and better understand data.
Install and configure fluentd or fluentbit to collect data and logs. See more at fluentd.org for fluentd or fluentbit.io for fluentbit.
./cnf-testsuite routed_logs
Checks if the CNF's metrics are OpenMetrics compliant. Expectation: CNF should emit OpenMetrics compatible traffic.
OpenMetrics is the de facto standard for transmitting cloud native metrics at scale. It supports both a text representation and Protocol Buffers, and it brings that format into an Internet Engineering Task Force (IETF) standard. A CNF should expose metrics that are OpenMetrics compatible.
Ensure that your CNF is publishing OpenMetrics compatible metrics.
./cnf-testsuite open_metrics
Checks if Jaeger is installed and the CNF is configured to send traces to the Jaeger Server. Expectation: The CNF is sending traces to Jaeger.
A CNF should provide tracing that conforms to the OpenTelemetry tracing specification.
Ensure that your CNF is both using & publishing traces to Jaeger.
./cnf-testsuite tracing
CNF containers should be isolated from one another and the host. The CNTI Test Catalog uses tools like OPA Gatekeeper and Armosec Kubescape
"Cloud native security is a [...] multifaceted topic [...] with multiple, diverse components that need to be secured. The cloud platform, the underlying host operating system, the container runtime, the container orchestrator, and then the applications themselves each require specialist security attention" -- Chris Binnie, Rory McCune. Cloud Native Security. (Wiley, 2021)(pp. xix)
All security: ./cnf-testsuite security
This test checks all of the CNF's containers and looks to see if any of them have access to a container runtime socket from the host. Expectation: Container runtime sockets should not be mounted as volumes.
Container daemon socket bind mounts allow access to the container engine on the node. This access can be used for privilege escalation and to manage containers outside of Kubernetes, and hence should not be allowed.
Make sure your CNF doesn't mount /var/run/docker.sock, /var/run/containerd.sock or /var/run/crio.sock on any containers.
./cnf-testsuite container_sock_mounts
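A minimal sketch of the kind of check described above: it scans the hostPath volumes of a Pod spec, given as a plain dictionary, for the three socket paths named in the remediation. The function name and sample spec are hypothetical. Only exact path matches are detected, so a real scanner may need to cover more cases, such as alternative paths to the same socket.

```python
from typing import Any, Dict, List

# Socket paths taken from the remediation text above.
RUNTIME_SOCKETS = {
    "/var/run/docker.sock",
    "/var/run/containerd.sock",
    "/var/run/crio.sock",
}


def mounted_runtime_sockets(pod_spec: Dict[str, Any]) -> List[str]:
    """hostPath volume paths in the Pod spec that point at a container runtime socket."""
    found = []
    for volume in pod_spec.get("volumes", []):
        path = (volume.get("hostPath") or {}).get("path")
        if path in RUNTIME_SOCKETS:
            found.append(path)
    return found


pod_spec = {
    "volumes": [
        {"name": "docker", "hostPath": {"path": "/var/run/docker.sock"}},
        {"name": "cache", "emptyDir": {}},
    ]
}
print(mounted_runtime_sockets(pod_spec))
```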
Checks if any containers are running in privileged mode. Expectation: Containers should not run in privileged mode
"... docs describe Privileged mode as essentially enabling “…access to all devices on the host as well as [having the ability to] set some configuration in AppArmor or SElinux to allow the container nearly all the same access to the host as processes running outside containers on the host.” In other words, you should rarely, if ever, use this switch on your container command line." -- Binnie, Chris; McCune, Rory (2021-06-17T23:58:59). Cloud Native Security . Wiley. Kindle Edition.
Remove privileged capabilities by setting securityContext.privileged to false. If you must deploy a Pod as privileged, add other restrictions to it, such as a network policy, Seccomp, etc., and still remove all unnecessary capabilities.
./cnf-testsuite privileged_containers
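For a quick manual review, the sketch below lists the containers in a Pod spec that explicitly set securityContext.privileged to true. The Pod spec is a plain dictionary. The function name and sample data are hypothetical, and including init containers is an assumption of this sketch rather than something stated above.

```python
from typing import Any, Dict, List


def privileged_containers(pod_spec: Dict[str, Any]) -> List[str]:
    """Names of containers (including init containers) that set privileged: true."""
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    return [
        c.get("name", "<unnamed>")
        for c in containers
        if (c.get("securityContext") or {}).get("privileged") is True
    ]


pod_spec = {
    "containers": [
        {"name": "app", "securityContext": {"privileged": False}},
        {"name": "agent", "securityContext": {"privileged": True}},
    ]
}
print(privileged_containers(pod_spec))
```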
Checks if the CNF has services with external IPs configured. Expectation: A CNF should not run services with external IPs.
Service external IPs can be used for a MITM attack (CVE-2020-8554). Restrict external IPs or limit to a known set of addresses. See: kyverno/kyverno#1367
Make sure not to define external IPs in your Kubernetes service configuration.
./cnf-testsuite external_ips
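In a Service manifest, external IPs are declared in the spec.externalIPs field. As a small illustration, the sketch below returns whatever that field holds for a Service given as a plain dictionary. The function name and the sample address are hypothetical.

```python
from typing import Any, Dict, List


def external_ips(service: Dict[str, Any]) -> List[str]:
    """External IPs declared on a Service manifest (empty list if none)."""
    return list((service.get("spec") or {}).get("externalIPs") or [])


service = {
    "kind": "Service",
    "metadata": {"name": "my-cnf"},
    "spec": {"ports": [{"port": 80}], "externalIPs": ["198.51.100.7"]},
}
print(external_ips(service))
```

An empty result is what this test expects for every Service in the CNF.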
Checks if the CNF has escalatory SELinuxOptions configured. Expectation: A CNF should not have any 'seLinuxOptions' configured that allow privilege escalation.
If SELinux options are configured improperly, they can be used to escalate privileges and should not be allowed.
Ensure the following guidelines are followed for any cluster resource that allows SELinux options:
- If the SELinux option type is set, it should only be one of the allowed values: container_t, container_init_t, or container_kvm_t.
- SELinux options user or role should not be set.
./cnf-testsuite selinux_options
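The two guidelines above can be sketched as a small validator over a Pod spec written as a plain dictionary. It inspects seLinuxOptions at both the Pod level and the container level. The function name, the message wording, and the sample spec are hypothetical, and the sketch is not the policy engine the test suite uses.

```python
from typing import Any, Dict, List

ALLOWED_TYPES = {"container_t", "container_init_t", "container_kvm_t"}


def selinux_violations(pod_spec: Dict[str, Any]) -> List[str]:
    """Describe every seLinuxOptions setting that breaks the guidelines above."""
    violations = []
    scopes = [("pod", pod_spec)] + [
        (c.get("name", "<unnamed>"), c) for c in pod_spec.get("containers", [])
    ]
    for scope, obj in scopes:
        opts = (obj.get("securityContext") or {}).get("seLinuxOptions") or {}
        # Guideline 1: a set type must be one of the allowed values.
        if "type" in opts and opts["type"] not in ALLOWED_TYPES:
            violations.append(f"{scope}: type {opts['type']!r} is not allowed")
        # Guideline 2: user and role must not be set.
        for field in ("user", "role"):
            if opts.get(field):
                violations.append(f"{scope}: {field} must not be set")
    return violations


pod_spec = {
    "securityContext": {"seLinuxOptions": {"type": "container_t"}},
    "containers": [
        {"name": "app", "securityContext": {"seLinuxOptions": {"type": "spc_t", "user": "system_u"}}},
    ],
}
for problem in selinux_violations(pod_spec):
    print(problem)
```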
Checks the CNF for usage of non-namespaced sysctls mechanisms that can affect the entire host. Expectation: The CNF should only have "safe" sysctls mechanisms configured, that are isolated from other Pods.
Sysctls can disable security mechanisms or affect all containers on a host, and should be disallowed except for an allowed "safe" subset. A sysctl is considered safe if it is namespaced in the container or the Pod, and it is isolated from other Pods or processes on the same Node. This test ensures that only those "safe" subsets are specified in a Pod.
The spec.securityContext.sysctls field must be unset or must use only the allowed "safe" sysctls.
./cnf-testsuite sysctls
Check that the allowPrivilegeEscalation field in the securityContext of each container is set to false. Expectation: Containers should not allow privilege escalation
When privilege escalation is enabled for a container, it will allow setuid binaries to change the effective user ID, allowing processes to turn on extra capabilities. In order to prevent illegitimate escalation by processes and restrict processes to a NonRoot user mode, escalation must be disabled.
If your application does not need it, make sure the allowPrivilegeEscalation field of the securityContext is set to false. See more at ARMO-C0016
./cnf-testsuite privilege_escalation
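As an illustration of what this check looks for, the sketch below lists containers in a Pod spec whose securityContext does not explicitly set allowPrivilegeEscalation to false. It is an assumption-laden sketch rather than the testsuite's code: the function name is hypothetical, the Pod spec is assumed to be a parsed dictionary, and an unset field is treated as a failure.

```python
def containers_allowing_escalation(pod_spec):
    """Return names of containers that do not set allowPrivilegeEscalation: false."""
    names = []
    for key in ("initContainers", "containers"):
        for container in pod_spec.get(key, []):
            security_context = container.get("securityContext") or {}
            # Only an explicit False passes; a missing field counts as a finding.
            if security_context.get("allowPrivilegeEscalation") is not False:
                names.append(container.get("name", "<unnamed>"))
    return names
```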
This test checks for vulnerable K8s versions and the actual usage of the subPath feature for all Pods in the CNF. Expectation: No vulnerable K8s version being used in conjunction with the subPath feature.
Due to CVE-2021-25741, subPath or subPathExpr volume mounts can be used to gain unauthorised access to files and directories anywhere on the host filesystem. In order to follow a best-practice security standard and prevent unauthorised data access, there should be no active CVEs affecting either the container or underlying platform.
To mitigate this vulnerability without upgrading kubelet, you can disable the VolumeSubpath feature gate on kubelet and kube-apiserver, or remove any existing Pods that use the subPath or subPathExpr feature.
./cnf-testsuite symlink_file_system
Checks the CNF for sensitive information in environment variables, by using a list of known sensitive key names. Also checks for configmaps with sensitive information. Expectation: Application credentials should not be found in the CNF's configuration files.
Developers store secrets in the Kubernetes configuration files, such as environment variables in the pod configuration. Such behavior is commonly seen in clusters that are monitored by Azure Security Center. Attackers who have access to those configurations, by querying the API server or by accessing those files on the developer’s endpoint, can steal the stored secrets and use them.
Use Kubernetes secrets or Key Management Systems to store credentials.
./cnf-testsuite application_credentials
Checks if there is a host network attached to any of the Pods in the CNF. Expectation: The CNF should not have access to the host system's network.
When a container has the hostNetwork feature turned on, the container has direct access to the underlying hostNetwork. Hackers frequently exploit this feature to facilitate a container breakout and gain access to the underlying host network, data and other integral resources.
Only connect PODs to the hostNetwork when it is necessary. If not, set the hostNetwork field of the pod spec to false, or completely remove it (false is the default). Allow only those PODs that must have access to host network by design.
./cnf-testsuite host_network
Checks if the CNF is using service accounts that are automatically mapped. Expectation: The automatic mapping of service account tokens should be disabled.
When a pod gets created and a service account wasn't specified, then the default service account will be used. Service accounts assigned in this way can unintentionally give third-party applications root access to the K8s APIs and other application services. In order to follow a zero-trust / fine-grained security methodology, this functionality will need to be explicitly disabled by using the automountServiceAccountToken: false flag. In addition, if RBAC is not enabled, the SA has unlimited permissions in the cluster.
Disable automatic mounting of service account tokens to PODs either at the service account level or at the individual POD level, by specifying the automountServiceAccountToken: false. Note that POD level takes precedence.
./cnf-testsuite service_account_mapping
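The precedence rule in the remediation (the POD-level setting wins over the service account setting) can be sketched as follows. The function name `token_is_automounted` is hypothetical, and both arguments are assumed to be parsed manifest dictionaries; this is an illustration, not the testsuite's logic.

```python
def token_is_automounted(pod_spec, service_account=None):
    """Resolve whether a service account token would be mounted into the Pod."""
    # The Pod-level field takes precedence when it is set.
    pod_value = pod_spec.get("automountServiceAccountToken")
    if pod_value is not None:
        return pod_value

    # Otherwise fall back to the ServiceAccount-level field.
    sa_value = (service_account or {}).get("automountServiceAccountToken")
    if sa_value is not None:
        return sa_value

    # Neither is set: Kubernetes mounts the token by default.
    return True
```

A Pod passes the intent of this test when the function returns False for it.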
Checks each Pod in the CNF for a defined ingress and egress policy. Expectation: Ingress and Egress traffic should be blocked on Pods.
By default, no network policies are applied to Pods or namespaces, resulting in unrestricted ingress and egress traffic within the Pod network. In order to prevent lateral movement or escalation on a compromised cluster, administrators should implement a default policy to deny all ingress and egress traffic. This will ensure that all Pods are isolated by default and further policies could then be used to specifically relax these restrictions on a case-by-case basis.
By default, you should disable or restrict Ingress and Egress traffic on all pods.
./cnf-testsuite ingress_egress_blocked
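A default-deny policy of the kind described above can be sketched as a NetworkPolicy that selects every Pod in a namespace and lists both policy types without any allow rules. The helper names and the policy name `default-deny-all` are hypothetical; the manifest is built as a dictionary and printed as JSON, which kubectl accepts alongside YAML.

```python
import json


def default_deny_policy(namespace):
    """Build a NetworkPolicy that denies all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        # An empty podSelector selects every Pod in the namespace; listing both
        # policy types with no ingress/egress rules blocks both directions.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def is_default_deny(policy):
    """Check whether a policy dictionary denies all traffic for all Pods."""
    spec = policy.get("spec", {})
    selects_all_pods = spec.get("podSelector") == {}
    no_allow_rules = not spec.get("ingress") and not spec.get("egress")
    both_directions = set(spec.get("policyTypes", [])) == {"Ingress", "Egress"}
    return selects_all_pods and no_allow_rules and both_directions


if __name__ == "__main__":
    # "my-cnf" is a placeholder namespace.
    print(json.dumps(default_deny_policy("my-cnf"), indent=2))
```

Further policies can then relax this baseline for the specific Pods and ports that need traffic.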
Checks the CNF for any usage of insecure capabilities using a deny list. Expectation: Containers should not have insecure capabilities enabled.
Giving insecure and unnecessary capabilities to a container can increase the impact of a container compromise.
Remove all insecure capabilities which aren’t necessary for the container.
./cnf-testsuite insecure_capabilities
Checks if the CNF has runAsUser and runAsGroup set to a user id greater than 999. Also checks that the allowPrivilegeEscalation field is set to false for the CNF. Read more at ARMO-C0013. Expectation: Containers should run with non-root user and allowPrivilegeEscalation should be set to false.
Container engines allow containers to run applications as a non-root user with non-root group membership. Typically, this non-default setting is configured when the container image is built. Alternatively, Kubernetes can load containers into a Pod with SecurityContext:runAsUser specifying a non-zero user. While the runAsUser directive effectively forces non-root execution at deployment, NSA and CISA encourage developers to build container applications to execute as a non-root user. Having non-root execution integrated at build time provides better assurance that applications will function correctly without root privileges.
If your application does not need root privileges, define runAsUser and runAsGroup under the PodSecurityContext with an ID of 1000 or higher, set runAsNonRoot to true, and do not enable allowPrivilegeEscalation.
./cnf-testsuite non_root_containers
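The conditions described for this test can be sketched as a single predicate. This is an illustrative sketch only: the function name is hypothetical, inputs are assumed to be parsed manifest dictionaries, and it relies on the Kubernetes behaviour that a container-level securityContext value overrides the Pod-level one.

```python
def runs_as_non_root(pod_spec, container, min_id=1000):
    """Return True if the container runs with UID/GID >= min_id and no escalation."""
    pod_sc = pod_spec.get("securityContext") or {}
    container_sc = container.get("securityContext") or {}

    def effective(field):
        # Container-level settings override Pod-level settings.
        return container_sc.get(field, pod_sc.get(field))

    ids = (effective("runAsUser"), effective("runAsGroup"))
    ids_ok = all(isinstance(value, int) and value >= min_id for value in ids)

    # allowPrivilegeEscalation exists only on the container securityContext.
    no_escalation = container_sc.get("allowPrivilegeEscalation") is False
    return ids_ok and no_escalation
```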
Checks if containers are running with hostPID or hostIPC privileges. Read more at ARMO-C0038. Expectation: Containers should not have hostPID and hostIPC privileges.
Containers should be isolated from the host machine as much as possible. The hostPID and hostIPC fields in deployment yaml may allow cross-container influence and may expose the host itself to potentially malicious or destructive actions. This control identifies all PODs using hostPID or hostIPC privileges.
Apply the least privilege principle and remove hostPID and hostIPC from the yaml configuration unless they are absolutely necessary.
./cnf-testsuite host_pid_ipc_privileges
Check if there are AppArmor, Seccomp, SELinux or Capabilities defined in the securityContext of the CNF's containers and pods. Read more at ARMO-C0055. Expectation: Security services are being used to harden application.
In order to reduce the attack surface, it is recommended, when possible, to harden your application using security services such as SELinux®, AppArmor®, and seccomp. Starting from Kubernetes version 1.22, SELinux is enabled by default.
Use AppArmor, Seccomp, SELinux and Linux Capabilities mechanisms to restrict containers' ability to use unwanted privileges.
./cnf-testsuite linux_hardening
Check if there is a ‘containers[].resources.limits.cpu’ field defined for all pods in the CNF. Expectation: Containers should have CPU limits defined.
A CPU limit should be set for every container, or for its namespace, to prevent resource exhaustion. This test identifies all the Pods without CPU limit definitions by checking their yaml definition file as well as their namespace LimitRange objects. It is also recommended to use a ResourceQuota object to restrict overall namespace resources, but this is not verified by this test.
Define LimitRange and ResourceQuota policies to limit CPU usage for namespaces, or set limits in the deployment/POD yamls.
./cnf-testsuite cpu_limits
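The lookup order described above (the container's own limits first, then a namespace LimitRange default) can be sketched as below. The function name and the simplified `limit_range_defaults` mapping, for example `{"cpu": "500m"}`, are assumptions made for illustration; the same helper works for memory by passing `resource="memory"`.

```python
def containers_missing_limit(pod_spec, resource="cpu", limit_range_defaults=None):
    """Return names of containers that have no limit for the given resource."""
    defaults = limit_range_defaults or {}
    # A namespace-level default limit covers every container in the Pod.
    if resource in defaults:
        return []

    missing = []
    for container in pod_spec.get("containers", []):
        limits = (container.get("resources") or {}).get("limits") or {}
        if resource not in limits:
            missing.append(container.get("name", "<unnamed>"))
    return missing
```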
Check if there is a ‘containers[].resources.limits.memory’ field defined for all pods in the CNF. Expectation: Containers should have memory limits defined.
A memory limit should be set for every container, or for its namespace, to prevent resource exhaustion. This test identifies all the Pods without memory limit definitions by checking their yaml definition file as well as their namespace LimitRange objects. It is also recommended to use a ResourceQuota object to restrict overall namespace resources, but this is not verified by this test.
Define LimitRange and ResourceQuota policies to limit memory usage for namespaces, or set limits in the deployment/POD yamls.
./cnf-testsuite memory_limits
Checks whether the readOnlyRootFilesystem field in the SecurityContext is set to true. Read more at ARMO-C0017. Expectation: Containers should use an immutable file system when possible.
A mutable container filesystem can be abused to inject malicious code and data into containers. By default, containers are permitted unrestricted execution within their own context. An attacker who has access to a container can create files and download scripts as they wish, and modify the underlying application running on the container.
Set the filesystem of the container to read-only when possible. If the container's application needs to write into the filesystem, it is possible to mount secondary filesystems for the specific directories where the application requires write access.
./cnf-testsuite immutable_file_systems
Checks the CNF's POD spec for any hostPath volumes. If any are found, it checks whether the volume's mount.readOnly field is false or missing. Read more at ARMO-C0045. Expectation: Containers should not have hostPath mounts.
A hostPath mount can be used by attackers to get access to the underlying host and thus break from the container to the host. (See “3: Writable hostPath mount” for details).
Refrain from using a hostPath mount.
./cnf-testsuite hostpath_mounts
Configuration should be managed in a declarative manner, using ConfigMaps, Operators, or other declarative interfaces.
Declarative APIs for an immutable infrastructure are anything that configures the infrastructure element. This declaration can come in the form of a YAML file or a script, as long as the configuration designates the desired outcome, not how to achieve said outcome.
"Because it describes the state of the world, declarative configuration does not have to be executed to be understood. Its impact is concretely declared. Since the effects of declarative configuration can be understood before they are executed, declarative configuration is far less error-prone." -- Hightower, Kelsey; Burns, Brendan; Beda, Joe. Kubernetes: Up and Running: Dive into the Future of Infrastructure (Kindle Locations 183-186). Kindle Edition
All configuration: ./cnf-testsuite configuration_lifecycle
Checks if any of the CNF's resources are deployed in the default namespace. Expectation: Resources should not be deployed in the default namespace.
Namespaces provide a way to segment and isolate cluster resources across multiple applications and users. As a best practice, workloads should be isolated with Namespaces and not use the default namespace.
Ensure that your CNF is configured to use a Namespace and is not using the default namespace.
./cnf-testsuite default_namespace
Checks if the CNF is using a 'latest' tag instead of a semantic version. Expectation: The CNF should use an immutable tag that maps to a semantic version of the application.
You should avoid using the :latest tag when deploying containers in production as it is harder to track which version of the image is running and more difficult to roll back properly.
When specifying container images, always specify a tag and make sure to use an immutable tag that maps to a specific version of an application Pod. Remove any usage of the latest tag, as it is not guaranteed to always point to the same version of the image.
./cnf-testsuite latest_tag
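To illustrate what counts as a mutable reference, the sketch below extracts the tag from an image string and flags references that use latest or omit the tag (an omitted tag resolves to latest). The function names are hypothetical, the registry host in the examples is a placeholder, and treating digest-pinned images as acceptable is an assumption of this sketch rather than documented testsuite behaviour.

```python
def image_tag(image):
    """Return the tag of an image reference, or None if no tag is given."""
    name = image.split("@", 1)[0]  # drop any digest suffix
    # Only the last path component can carry a tag; this avoids mistaking a
    # registry port (registry:5000/app) for a tag.
    last_component = name.rsplit("/", 1)[-1]
    if ":" in last_component:
        return last_component.rsplit(":", 1)[1]
    return None


def uses_mutable_tag(image):
    """Return True if the image uses 'latest' or has no tag and no digest."""
    if "@" in image:
        return False  # pinned by digest
    tag = image_tag(image)
    return tag is None or tag == "latest"
```

For example, `uses_mutable_tag("nginx")` and `uses_mutable_tag("nginx:latest")` are both flagged, while `uses_mutable_tag("nginx:1.25.3")` is not.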
Checks that the label app.kubernetes.io/name is specified with some value. Expectation: Pods should use the 'app.kubernetes.io/name' label.
Defining and using labels helps identify semantic attributes of your application or Deployment. A common set of labels allows tools to work collaboratively, while describing objects in a common manner that all tools can understand. You should use recommended labels to describe applications in a way that can be queried.
Make sure to define the app.kubernetes.io/name label under metadata for your CNF.
./cnf-testsuite require_labels
Checks if the CNF is using a 'latest' tag instead of a semantic version using OPA Gatekeeper. Expectation: The CNF should use an immutable tag that maps to a semantic version of the application.
You should avoid using the :latest tag when deploying containers in production as it is harder to track which version of the image is running and more difficult to roll back properly.
When specifying container images, always specify a tag and make sure to use an immutable tag that maps to a specific version of an application Pod. Remove any usage of the latest tag, as it is not guaranteed to always point to the same version of the image.
./cnf-testsuite versioned_tag
Checks the CNF for any associated K8s Services that are configured to expose the CNF by using a nodePort. Expectation: The nodePort configuration field is not found in any of the CNF's services.
Using node ports ties the CNF to a specific node and therefore makes the CNF less portable and scalable.
Review all Helm Charts & Kubernetes Manifest files for the CNF and remove all occurrences of the nodePort field in your configuration. Alternatively, configure a service or use another mechanism for exposing your container.
./cnf-testsuite nodeport_not_used
Checks the CNF's workload resources for any containers using the hostPort configuration field to expose the application. Expectation: The hostPort configuration field is not found in any of the defined containers.
Using host ports ties the CNF to a specific node and therefore makes the CNF less portable and scalable.
Review all Helm Charts & Kubernetes Manifest files for the CNF and remove all occurrences of the hostPort field in your configuration. Alternatively, configure a service or use another mechanism for exposing your container.
./cnf-testsuite hostport_not_used
The hardcoded IP address test will scan all of the CNF's workload resources and check for any static, hardcoded IP addresses being used in the configuration. Expectation: No hardcoded IP addresses or subnet masks are found in the Kubernetes workload resources for the CNF.
Using a hard coded IP in a CNF's configuration designates how (imperative) a CNF should achieve a goal, not what (declarative) goal the CNF should achieve.
Review all Helm Charts & Kubernetes Manifest files of the CNF and look for any hardcoded usage of IP addresses. If any are found, you will need to use an operator or some other method to abstract the IP management out of your configuration in order to pass this test.
./cnf-testsuite hardcoded_ip_addresses_in_k8s_runtime_configuration
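When reviewing manifests by hand, a simple scan can help locate candidate addresses. The sketch below is not the testsuite's scanner: the function name is hypothetical, it only handles IPv4, and it applies no exemptions (addresses such as 0.0.0.0 or 127.0.0.1 are reported as well).

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits that are not part of a longer
# dotted-number sequence (so version strings like 1.25.3 are ignored).
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def find_hardcoded_ipv4(text):
    """Return every valid IPv4 address literal found in the given text."""
    found = []
    for match in IPV4_CANDIDATE.finditer(text):
        try:
            ipaddress.IPv4Address(match.group())
        except ValueError:
            continue  # e.g. 999.1.1.1 is not a valid address
        found.append(match.group())
    return found
```

Running it over the rendered output of a Helm chart gives a list of literals to replace with service names or operator-managed values.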
The secrets used test will scan all the Kubernetes workload resources to see if K8s secrets are being used. Expectation: The CNF is using K8s secrets for the management of sensitive data.
If a CNF uses K8s Secrets instead of unencrypted environment variables or configmaps, there is less risk of the Secret (and its data) being exposed during the workflow of creating, viewing, and editing Pods.
Remove any sensitive data stored in configmaps or environment variables and instead use K8s Secrets for storing such data. Alternatively, you can use an operator or some other method to abstract hardcoded sensitive data out of your configuration. The whole test passes if any workload resource in the CNF uses a (non-exempt) secret. If no workload resources use a (non-exempt) secret, the test is skipped.
./cnf-testsuite secrets_used
The immutable configmap test will scan the CNF's workload resources and see if immutable configmaps are being used. Expectation: Immutable configmaps are being used for non-mutable data.
For clusters that extensively use ConfigMaps (at least tens of thousands of unique ConfigMap to Pod mounts), preventing changes to their data has the following advantages:
- protects you from accidental (or unwanted) updates that could cause application outages
- improves performance of your cluster by significantly reducing load on kube-apiserver, by closing watches for ConfigMaps marked as immutable.
Use immutable configmaps for any non-mutable configuration data.
./cnf-testsuite immutable_configmap
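A ConfigMap is marked immutable through its top-level immutable field. As a rough illustration (the function name is hypothetical and the input is assumed to be a list of parsed ConfigMap manifests), the following sketch lists the ConfigMaps that are still mutable:

```python
def mutable_configmaps(configmaps):
    """Return the names of ConfigMaps that do not set immutable: true."""
    return [
        cm.get("metadata", {}).get("name", "<unnamed>")
        for cm in configmaps
        if cm.get("immutable") is not True
    ]
```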
This checks if a CNF uses alpha or unstable versions of Kubernetes APIs. Expectation: The CNF should not use Kubernetes alpha APIs.
If a CNF uses alpha or undocumented APIs, the CNF is tightly coupled to an unstable platform.
Make sure your CNFs are not utilizing any Kubernetes alpha APIs. You can learn more about Kubernetes API versioning in the Kubernetes documentation.
./cnf-testsuite alpha_k8s_apis
A 5G core is an important part of the service provider's telecommunications offering. A cloud native 5G architecture uses immutable infrastructure, declarative configuration, and microservices when creating and hosting 5G cloud native network functions.
All 5G: ./cnf-testsuite 5g
Checks the PFCP heartbeat between the SMF and UPF to make sure it remains close to baseline. Expectation: The 5G core should continue to function during various CNF tests.
A 5G core's SMF and UPF CNFs have a heartbeat, implemented using the PFCP protocol standard, which measures whether the connection between the two CNFs is active. After a baseline of the heartbeat is measured, comparing that baseline with the heartbeat's performance while test functions run exposes the resilience of the cloud native 5G core.
./cnf-testsuite smf_upf_core_validator
Checks to see if the 5G core supports SUCI concealment. Expectation: The 5G core should use SUCI concealment.
In order to protect identifying information from being sent over the network as clear text, 5G cloud native cores should implement SUPI and SUCI concealment.
./cnf-testsuite suci_enabled
All RAN: ./cnf-testsuite ran
A cloud native radio access network's (RAN) cloud native functions should use immutable infrastructure, declarative configuration, and microservices. ORAN cloud native functions should adhere to cloud native principles while also complying with the ORAN alliance's standards.
Checks if a RIC uses an ORAN-compatible E2 connection. Expectation: An ORAN RIC should use an E2 connection.
A near real-time RAN intelligent controller (RIC) uses the E2 standard as an open, interoperable interface to connect to RAN-optimized applications, onboarded as xApps. The xApps use platform services available in the near-RT RIC to communicate with the downstream network functions through the E2 interface.
./cnf-testsuite oran_e2_connection
All platform: ./cnf-testsuite platform
All platform hardware and scheduling: ./cnf-testsuite platform:hardware_and_scheduling
All platform resilience: ./cnf-testsuite platform:resilience poc
All platform security: ./cnf-testsuite platform:security
Check if your platform passes the K8s conformance test. See https://github.com/cncf/k8s-conformance for details on what is tested. Expectation: The K8s cluster passes the K8s conformance tests
A Vendor's Kubernetes Platform should pass Kubernetes Conformance. This ensures that the platform offering meets the same required APIs, features & interoperability expectations as in open source community versions of K8s. Applications that can operate on a Certified Kubernetes should be cross-compatible with any other Certified Kubernetes platform.
Check that Sonobuoy can be successfully run and passes without failure on your platform. Any failures found by Sonobuoy will provide debug and remediation steps required to get your K8s cluster into a conformant state.
./cnf-testsuite k8s_conformance
Checks the platform's Kubernetes Nodes to see if they were instantiated by ClusterAPI. Expectation: The cluster has Cluster API enabled which manages at least one Node.
A Kubernetes Platform should leverage Cluster API to ensure that best practices are followed for both bootstrapping & cluster lifecycle management. Kubernetes is a complex system that relies on several components being configured correctly. Maintaining an in-house lifecycle management system for Kubernetes is unlikely to meet best practice guidelines unless significant resources are dedicated to it.
Enable ClusterAPI and start using it to manage the provisioning and lifecycle of your Kubernetes clusters.
./cnf-testsuite clusterapi_enabled
Inspects all worker nodes and checks if the run-time being used for scheduling is OCI compliant. Expectation: All worker nodes are using an OCI compliant run-time.
The Open Container Initiative (OCI) was created to ensure that runtimes conform to both the runtime-spec and image-spec. These two specifications outline how a “filesystem bundle” is unpacked on disk and that the image itself contains sufficient information to launch the application on the target platform. As a best practice, your platform must use an OCI compliant runtime. This ensures that the runtime used is cross-compatible and supports interoperability with other runtimes, which means that workloads can be freely moved to other runtimes and vendor lock-in is prevented.
Check if your Kubernetes Platform is using an OCI Compliant Runtime. If your platform is not using an OCI Compliant Runtime, you'll need to switch to a new runtime that is OCI Compliant in order to pass this test.
./cnf-testsuite platform:oci_compliant
WARNING: this is a destructive test and will reboot your host node! Do not run this unless you have completely separate cluster, e.g. development or test cluster.
Run node failure test which forces a reboot of the Node ("host system"). The Pods on that node should be rescheduled to a new Node. Expectation: Pods should reschedule after a node failure.
Cloud native systems should be self-healing. To follow cloud-native best practices your platform should be resilient and reschedule all workloads when such node failures occur.
Reboot a worker node in your Kubernetes cluster and verify that the node can recover and re-join the cluster in a schedulable state. Workloads should also be rescheduled to the node once it's back online.
./cnf-testsuite platform:worker_reboot_recovery poc destructive
Check which subjects have cluster-admin RBAC permissions – either by being bound to the cluster-admin clusterrole, or by having equivalent high privileges. Expectation: The cluster admin role should not be bound to a Pod
Role-based access control (RBAC) is a key security feature in Kubernetes. RBAC can restrict the allowed actions of the various identities in the cluster. Cluster-admin is a built-in high privileged role in Kubernetes. Attackers who have permissions to create bindings and cluster-bindings in the cluster can create a binding to the cluster-admin ClusterRole or to other high privileges roles. As a best practice, a principle of least privilege should be followed and cluster-admin privilege should only be used on an as-needed basis.
You should apply least privilege principle. Make sure cluster admin permissions are granted only when it is absolutely necessary. Don't use subjects with high privileged permissions for daily operations.
./cnf-testsuite platform:cluster_admin
Checks if the insecure-port flag is set for the K8s API Server. Expectation: The K8s control plane is secure and not hosted on an insecure port.
The control plane is the core of Kubernetes and gives users the ability to view containers, schedule new Pods, read Secrets, and execute commands in the cluster. Therefore, it should be protected. It is recommended to avoid control plane exposure to the Internet or to an untrusted network and require TLS encryption.
Set the insecure-port flag of the API server to zero. See more at ARMO-C0005
./cnf-testsuite platform:control_plane_hardening
Checks if a Helm v2 / Tiller image is deployed and used on the platform. Expectation: The platform should be using Helm v3+ without Tiller.
Tiller, found in Helm v2, has known security challenges. It requires administrative privileges and acts as a shared resource accessible to any authenticated user. Tiller can lead to privilege escalation as restricted users can impact other users. It is recommended to use Helm v3+, which does not contain Tiller, for these reasons.
Switch to using Helm v3+ and make sure not to pull any images with the name tiller in them.
./cnf-testsuite platform:helm_tiller