- Code in the `bootstrap` dir is used to bootstrap the cluster.
- Code in the `apps` dir is synced to the cluster after the cluster has been bootstrapped.
The steps below are run after the cluster is created with Talos to start the Flux-focused GitOps workflow. Once the steps below are run, all of the K8s cluster components and apps should install onto the cluster.
In this directory, there are two secrets that must be applied to the cluster for Flux to function properly:

- `age.secret.sops.yaml`: The age secret that Flux will use to decrypt secrets checked into the codebase.
- `github.secret.sops.yaml`: The GitHub SSH keys and access token necessary for Flux to access this repository on github.com.
These secrets can be decrypted by either an age key (defined in the top-level `.sops.yaml` file) or a KMS key (the ARN is also defined in the top-level `.sops.yaml` file). Age is the primary key Flux uses to decrypt secrets at deploy time. The KMS key can be used as a backup to decrypt and recover the bootstrap secrets if needed.
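The key configuration lives in that top-level `.sops.yaml`. As a rough sketch of its shape (the path regex, age recipient, and KMS ARN below are placeholders, not this repo's actual values):

```yaml
creation_rules:
  - path_regex: kubernetes/.*\.sops\.ya?ml$
    encrypted_regex: ^(data|stringData)$
    # Primary key: age recipient used by Flux at deploy time (placeholder)
    age: age1qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
    # Backup key: AWS KMS ARN for disaster recovery (placeholder)
    kms: arn:aws:kms:us-east-1:111111111111:key/00000000-0000-0000-0000-000000000000
```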
To deploy these secrets during initial bootstrapping:
sops --decrypt kubernetes/homelab/bootstrap/age.bootstrap.sops.yaml | kubectl apply --server-side --filename -
sops --decrypt kubernetes/homelab/bootstrap/github.bootstrap.sops.yaml | kubectl apply --server-side --filename -
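Assuming the decrypted manifests create their secrets in the `flux-system` namespace (the Kustomization example later in this doc references a `sops-age` secret there), a quick sanity check looks like:

```sh
# Confirm the bootstrap secrets exist before installing Flux
kubectl --namespace flux-system get secrets
```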
Most of the Kubernetes components are added via Flux and are defined in the `kubernetes` directory. For the remaining components that are installed during cluster instantiation, the instructions are defined below.
There are a few components that need to be installed manually before the cluster can start updating itself.
After the initial Talos cluster creation (with the CNI set to none), the cluster will be waiting for a CNI to be installed (docs).
To start, install Flux itself. Flux is responsible for installing the rest of the cluster's apps and services. To do this, use `helmfile`:
helmfile --file kubernetes/homelab/bootstrap/helmfile.yaml apply --skip-diff-on-install --suppress-diff
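The exact contents of the bootstrap helmfile aren't reproduced here, but conceptually it installs the CNI first and then Flux. A rough sketch of the shape, assuming Cilium as the CNI and the community Flux chart (repo URLs, chart names, and values files are illustrative, not this repo's actual pins):

```yaml
repositories:
  - name: cilium
    url: https://helm.cilium.io
  - name: fluxcd-community
    url: https://fluxcd-community.github.io/helm-charts

releases:
  # CNI first, since the Talos cluster was created with the CNI set to none
  - name: cilium
    namespace: kube-system
    chart: cilium/cilium
    values:
      - ./cilium-values.yaml   # placeholder values file
  # Flux itself, which then reconciles the rest of the repo
  - name: flux2
    namespace: flux-system
    chart: fluxcd-community/flux2
    needs:
      - kube-system/cilium
```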
I'm going to have 3 kinds of storage for my k8s clusters:
- Storage that I'd like to persist across pod restarts, but it's really not a big deal if I lose this data. Ex: Prometheus data. This data is usually specific to k8s and doesn't have a particular need to persist outside of Kubernetes. Local node data is fine here; replication isn't needed.
- Storage that I'd like to be able to create on the fly (i.e. not pre-existing folders). This is important data and I'd like to maintain good backups of it, but the total size is relatively small. This data won't be 100% mission critical, so I'm comfortable delegating it to the k8s control plane. For this type of storage, there is an excellent guide here, which I'll attempt to use. This 2nd type of storage is one where the value of TrueNAS is up in the air. What if instead... I just used Rook/Ceph for these use cases?
- Media storage that is pre-created (i.e. my existing media) and is both HUGE and CRITICAL. This data is 100% critical to the homelab and CANNOT be lost. As such, the k8s control plane can't be trusted with this data; instead it will be managed by TrueNAS (i.e. software and configuration that I don't maintain) and mounted to pods via NFS PVs. For this type of storage, I'll try to use the `node-manual` CSI driver (example here).
I might be able to use `democratic-csi` for all 3 of these, using these 3 drivers, respectively: `democratic-csi/local-hostpath`, `democratic-csi/freenas-api-nfs` & `democratic-csi/freenas-api-iscsi`, and `democratic-csi/node-manual`.
Now that I'm further along, I think I have an idea for how I want to do storage in the future:
- Run Ceph/Rook directly in k8s to replace options 1 & 2 from above. It can expose iSCSI or NFS mounts when needed. Then run volsync to back up these drives to TrueNAS.
- When using the `/media` share from the NAS, just do a simple NFS mount PVC in k8s (nothing fancy), as sketched below.
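For that last point, a minimal sketch of what the "nothing fancy" NFS mount could look like (the server address, export path, and size are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: media-nfs
spec:
  capacity:
    storage: 1Ti               # placeholder size
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 192.168.1.10       # placeholder TrueNAS address
    path: /mnt/tank/media      # placeholder NFS export
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media-nfs
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""         # empty class so the claim binds to the static PV above
  volumeName: media-nfs
  resources:
    requests:
      storage: 1Ti
```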
I use `sops` to manage secrets in a GitOps way. There's a good overview of sops here.
To properly ensure secrets are GitOps-ified and still kept secret across the wide array of apps in this repo, there are several methods by which an app can be supplied secrets. Here's a breakdown of some common methods using the tools in this repo: Flux and SOPS.
This guide will not cover how to integrate SOPS into Flux initially (i.e. bootstrapping SOPS with Flux during initial setup). For that, be sure to check out the Flux documentation on integrating SOPS.
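As a quick refresher, a plaintext Secret manifest gets encrypted in place before it is committed, using the keys configured in the top-level `.sops.yaml` (the file path below is just an example):

```sh
# Encrypt the Secret manifest in place before committing it
sops --encrypt --in-place kubernetes/homelab/apps/default/application-secret.sops.yaml
```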
For the first three examples, the following secret will be used:
apiVersion: v1
kind: Secret
metadata:
  name: application-secret
  namespace: default
stringData:
  SUPER_SECRET_KEY: "SUPER SECRET VALUE"
Use `envFrom` in a deployment or a Helm chart that supports the setting; this will pass all secret items from the secret into the container's environment.
envFrom:
  - secretRef:
      name: application-secret
View example Helm Release and corresponding Secret.
Similar to the above, but with `env` it's possible to pick a single item from a secret.
env:
  - name: WAY_COOLER_ENV_VARIABLE
    valueFrom:
      secretKeyRef:
        name: application-secret
        key: SUPER_SECRET_KEY
View example Helm Release and corresponding Secret.
The Flux HelmRelease option `valuesFrom` can inject a secret item into the Helm values of a HelmRelease:

- Does not work with merging array values
- Care is needed with keys that contain dot notation in the name
valuesFrom:
  - targetPath: config."admin\.password"
    kind: Secret
    name: application-secret
    valuesKey: SUPER_SECRET_KEY
View example Helm Release and corresponding Secret.
Flux variable substitution can inject secrets into any YAML manifest. This requires the Flux Kustomization to be configured to enable variable substitution. Correctly configured, this allows you to use `${GLOBAL_SUPER_SECRET_KEY}` in any YAML manifest.
apiVersion: v1
kind: Secret
metadata:
  name: cluster-secrets
  namespace: flux-system
stringData:
  GLOBAL_SUPER_SECRET_KEY: "GLOBAL SUPER SECRET VALUE"
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
# ...
spec:
  # ...
  decryption:
    provider: sops
    secretRef:
      name: sops-age
  postBuild:
    substituteFrom:
      - kind: Secret
        name: cluster-secrets
View example Fluxtomization, Helm Release, and corresponding Secret.
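For reference, the variable can then be used in any manifest reconciled by that Kustomization, for example inside the values of a hypothetical HelmRelease:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
# ...
spec:
  # ...
  values:
    env:
      WAY_COOLER_ENV_VARIABLE: "${GLOBAL_SUPER_SECRET_KEY}"   # replaced by Flux postBuild substitution at apply time
```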
- TODO: For the first three methods, consider using a tool like stakater/reloader to restart the pod when the secret changes. Note that using Reloader on a pod whose secret is provided by Flux variable substitution will lead to pods being restarted on any change to the secret, whether related to the pod or not. A sketch of the Reloader annotation is shown after this list.
- The last method should be used when all other methods are not an option, or when you have a "global" secret used by numerous HelmReleases across the cluster.
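As a sketch of the Reloader approach mentioned in the TODO above, the annotation below (on a hypothetical Deployment) tells Reloader to roll the pods whenever a Secret or ConfigMap referenced by the workload changes:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: application   # hypothetical workload consuming application-secret
  namespace: default
  annotations:
    reloader.stakater.com/auto: "true"   # restart pods when referenced secrets/configmaps change
spec:
  # ...
```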
When managing dependencies between HelmReleases and Flux Kustomizations (i.e. KS), there are some important configuration flags that could have a large impact on developer experience: `wait` and `dependsOn`. As a quick overview, there are two bits of configuration that are relevant here:

- `wait: true` only marks the Kustomization as successful if all the resources it creates are healthy
- `wait: false` just does a `kubectl apply -k` and then says 'all good, chief'
- `dependsOn` tells either the KS or the HelmRelease to confirm the health of another KS or HelmRelease before trying to apply. The health of the KS/HelmRelease could depend on health checks or `wait`
There are two camps here, mostly: you can either handle the dependencies via `dependsOn` at the KS level, or at the HelmRelease level. There are pros and cons to each:

- If you do it at the KS level, you'll run into situations where a KS fails to apply and you then have to wait for it to time out before it notices you pushed a change and applies that instead, so it's a bit more clunky.
- Doing it at the HR level is a bit nicer in terms of developer experience, but it has limitations. For example, if your KS applies manifests that are not Helm releases, then you can't really do dependsOn at the HR level, so you'll have to mix and match.

As a rule of thumb, if your KS only applies a HelmRelease (and associated ConfigMaps, Secrets, etc.), then you can set `wait` to false in the KS and implement your dependsOn at the HR level. If you need to apply other things that depend on an HR (think applying your cert-manager cluster issuers as raw manifests, which depend on the cert-manager HR), then you must do it at the KS level.
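To make the rule of thumb concrete, here's a rough sketch of the first case (names, paths, and chart references are placeholders): the KS sets `wait: false` and the HelmRelease it applies declares its own `dependsOn`:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: some-app
  namespace: flux-system
spec:
  interval: 30m
  path: ./kubernetes/homelab/apps/default/some-app
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  wait: false            # the KS only wraps a HelmRelease; health is enforced at the HR level
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: some-app
  namespace: default
spec:
  interval: 30m
  chart:
    spec:
      chart: some-app
      sourceRef:
        kind: HelmRepository
        name: some-repo
        namespace: flux-system
  dependsOn:             # don't install until this other HelmRelease is healthy
    - name: rook-ceph-cluster
      namespace: rook-ceph
```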
Thanks to mirceanton for the overview in the Home Operations Discord server.
In the future, I might choose to go down a more "hyperconverged" route and manage storage directly from k8s (instead of having TrueNAS handle most of this). In that case, I'd need to migrate the `StorageClass` of most of my pods, which would be a big lift. To do that, there is a great article here.
For this hyperconverged route, I might consider using Harvester, which is a more cloud-native hypervisor and VM-management solution.