Ollama, get up and running with large language models, locally.
This Community Chart is for deploying Ollama.
- Kubernetes: >= 1.16.0-0 for CPU only
- Kubernetes: >= 1.26.0-0 for GPU stable support (NVIDIA and AMD)
Not all GPUs are currently supported with ollama (especially with AMD)
To install the ollama chart in the ollama namespace:
helm repo add ollama-helm https://otwld.github.io/ollama-helm/
helm repo update
helm install ollama ollama-helm/ollama --namespace ollama
First, read the Ollama release notes to make sure there are no backwards-incompatible changes.
Make adjustments to your values as needed, then run helm upgrade:
# -- This pulls the latest version of the ollama chart from the repo.
helm repo update
helm upgrade ollama ollama-helm/ollama --namespace ollama --values values.yaml
To uninstall/delete the ollama deployment in the ollama namespace:
helm delete ollama --namespace ollama
Substitute your values if they differ from the examples. See helm delete --help for a full reference on delete parameters and flags.
- Ollama documentation can be found HERE
- Interact with RESTful API: Ollama API
- Interact with official client libraries: ollama-js and ollama-python
- Interact with langchain: langchain-js and langchain-python
- It's highly recommended to run an updated version of Kubernetes for deploying ollama with GPU
ollama:
  gpu:
    # -- Enable GPU integration
    enabled: true
    # -- GPU type: 'nvidia' or 'amd'
    type: 'nvidia'
    # -- Specify the number of GPU to 1
    number: 1
  # -- List of models to pull at container startup
  models:
    pull:
      - mistral
      - llama2
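For AMD cards, the same structure should apply with only the GPU type changed. The fragment below is a minimal sketch that uses only keys listed in the values table; according to that table, setting `ollama.gpu.type` to `amd` adds the `rocm` suffix to the image tag as long as `image.tag` is not overridden. The choice of model is just an illustration.

```yaml
ollama:
  gpu:
    # -- Enable GPU integration
    enabled: true
    # -- 'amd' selects the ROCm image variant (per the values table)
    type: 'amd'
    number: 1
  models:
    pull:
      - llama2   # illustrative model choice
```

Since not all AMD GPUs are supported by Ollama, it is worth checking the Ollama documentation for your card before relying on this.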
ollama:
  models:
    pull:
      - llama2
ingress:
  enabled: true
  hosts:
    - host: ollama.domain.lan
      paths:
        - path: /
          pathType: Prefix
- API is now reachable at ollama.domain.lan
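If the Ingress should also terminate TLS, the `ingress.className` and `ingress.tls` keys from the values table can be added. The following is a sketch under assumptions: the class name `nginx` and the secret name `ollama-tls` are hypothetical, and the shape of the `tls` entries assumes the chart passes them through in the standard Kubernetes Ingress form (`secretName` plus `hosts`).

```yaml
ingress:
  enabled: true
  className: nginx              # assumption: an ingress controller with this class exists
  hosts:
    - host: ollama.domain.lan
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: ollama-tls    # hypothetical, pre-existing TLS secret
      hosts:
        - ollama.domain.lan
```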
Version 1.X.X introduces the ability to load models in memory at startup, and the values have changed as a result.
Before upgrading, change ollama.models to ollama.models.pull to avoid errors. From:
ollama:
  models:
    - mistral
    - llama2
To:
ollama:
  models:
    pull:
      - mistral
      - llama2
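The new layout also makes room for the `ollama.models.run` list, which the values table describes as the models to load in memory at container startup. A minimal sketch combining both lists might look like this; which model to preload is an arbitrary choice for illustration.

```yaml
ollama:
  models:
    # -- Models to pull at container startup
    pull:
      - mistral
      - llama2
    # -- Models to load in memory at container startup
    run:
      - mistral   # illustrative: preload only one of the pulled models
```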
- See values.yaml for the chart's default values.
Key | Type | Default | Description |
---|---|---|---|
affinity | object | {} | Affinity for pod assignment |
autoscaling.enabled | bool | false | Enable autoscaling |
autoscaling.maxReplicas | int | 100 | Number of maximum replicas |
autoscaling.minReplicas | int | 1 | Number of minimum replicas |
autoscaling.targetCPUUtilizationPercentage | int | 80 | CPU usage to target replica |
extraArgs | list | [] | Additional arguments on the output Deployment definition. |
extraEnv | list | [] | Additional environment variables on the output Deployment definition. For extra OLLAMA env, please refer to https://github.com/ollama/ollama/blob/main/envconfig/config.go |
extraEnvFrom | list | [] | Additional environment variables from external sources (like ConfigMap) |
fullnameOverride | string | "" | String to fully override template |
hostIPC | bool | false | Use the host's ipc namespace. |
hostNetwork | bool | false | Use the host's network namespace. |
hostPID | bool | false | Use the host's pid namespace |
image.pullPolicy | string | "IfNotPresent" | Docker pull policy |
image.repository | string | "ollama/ollama" | Docker image registry |
image.tag | string | "" | Docker image tag, overrides the image tag whose default is the chart appVersion. |
imagePullSecrets | list | [] | Docker registry secret names as an array |
ingress.annotations | object | {} | Additional annotations for the Ingress resource. |
ingress.className | string | "" | IngressClass that will be used to implement the Ingress (Kubernetes 1.18+) |
ingress.enabled | bool | false | Enable ingress controller resource |
ingress.hosts[0].host | string | "ollama.local" | |
ingress.hosts[0].paths[0].path | string | "/" | |
ingress.hosts[0].paths[0].pathType | string | "Prefix" | |
ingress.tls | list | [] | The tls configuration for hostnames to be covered with this ingress record. |
initContainers | list | [] | Init containers to add to the pod |
knative.containerConcurrency | int | 0 | Knative service container concurrency |
knative.enabled | bool | false | Enable Knative integration |
knative.idleTimeoutSeconds | int | 300 | Knative service idle timeout seconds |
knative.responseStartTimeoutSeconds | int | 300 | Knative service response start timeout seconds |
knative.timeoutSeconds | int | 300 | Knative service timeout seconds |
lifecycle | object | {} | Lifecycle for pod assignment (override ollama.models startup pulling) |
livenessProbe.enabled | bool | true | Enable livenessProbe |
livenessProbe.failureThreshold | int | 6 | Failure threshold for livenessProbe |
livenessProbe.initialDelaySeconds | int | 60 | Initial delay seconds for livenessProbe |
livenessProbe.path | string | "/" | Request path for livenessProbe |
livenessProbe.periodSeconds | int | 10 | Period seconds for livenessProbe |
livenessProbe.successThreshold | int | 1 | Success threshold for livenessProbe |
livenessProbe.timeoutSeconds | int | 5 | Timeout seconds for livenessProbe |
nameOverride | string | "" | String to partially override template (will maintain the release name) |
namespaceOverride | string | "" | String to override the namespace |
nodeSelector | object | {} | Node labels for pod assignment. |
ollama.gpu.enabled | bool | false | Enable GPU integration |
ollama.gpu.mig.devices | object | {} | Specify the mig devices and the corresponding number |
ollama.gpu.mig.enabled | bool | false | Enable multiple mig devices. If enabled, you will have to specify the mig devices. If enabled is set to false, this section is ignored. |
ollama.gpu.number | int | 1 | Specify the number of GPUs. If you use the MIG section, this parameter is ignored. |
ollama.gpu.nvidiaResource | string | "nvidia.com/gpu" | Only for nvidia cards; change to (example) 'nvidia.com/mig-1g.10gb' to use MIG slice |
ollama.gpu.type | string | "nvidia" | GPU type: 'nvidia' or 'amd'. If 'ollama.gpu.enabled', default value is nvidia. If set to 'amd', this will add the 'rocm' suffix to the image tag if 'image.tag' is not overridden, because AMD and CPU/CUDA are different images. |
ollama.insecure | bool | false | Add insecure flag for pulling at container startup |
ollama.models.pull | list | [] | List of models to pull at container startup. The more you add, the longer the container will take to start if models are not present. pull: - llama2 - mistral |
ollama.models.run | list | [] | List of models to load in memory at container startup. run: - llama2 - mistral |
ollama.mountPath | string | "" | Override ollama-data volume mount path, default: "/root/.ollama" |
persistentVolume.accessModes | list | ["ReadWriteOnce"] | Ollama server data Persistent Volume access modes. Must match those of existing PV or dynamic provisioner. Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/ |
persistentVolume.annotations | object | {} | Ollama server data Persistent Volume annotations |
persistentVolume.enabled | bool | false | Enable persistence using PVC |
persistentVolume.existingClaim | string | "" | If you'd like to bring your own PVC for persisting Ollama state, pass the name of the created + ready PVC here. If set, this Chart will not create the default PVC. Requires persistentVolume.enabled: true |
persistentVolume.size | string | "30Gi" | Ollama server data Persistent Volume size |
persistentVolume.storageClass | string | "" | Ollama server data Persistent Volume Storage Class. If defined, it is used as storageClassName. If set to "-", storageClassName: "", which disables dynamic provisioning. If undefined (the default) or set to null, no storageClassName spec is set, choosing the default provisioner. (gp2 on AWS, standard on GKE, AWS & OpenStack) |
persistentVolume.subPath | string | "" | Subdirectory of Ollama server data Persistent Volume to mount. Useful if the volume's root directory is not empty |
persistentVolume.volumeMode | string | "" | Ollama server data Persistent Volume Binding Mode. If defined, it is used as volumeMode. If empty (the default) or set to null, no volumeBindingMode spec is set, choosing the default mode. |
persistentVolume.volumeName | string | "" | Pre-existing PV to attach this claim to. Useful if a CSI auto-provisions a PV for you and you want to always reference the PV moving forward |
podAnnotations | object | {} | Map of annotations to add to the pods |
podLabels | object | {} | Map of labels to add to the pods |
podSecurityContext | object | {} | Pod Security Context |
readinessProbe.enabled | bool | true | Enable readinessProbe |
readinessProbe.failureThreshold | int | 6 | Failure threshold for readinessProbe |
readinessProbe.initialDelaySeconds | int | 30 | Initial delay seconds for readinessProbe |
readinessProbe.path | string | "/" | Request path for readinessProbe |
readinessProbe.periodSeconds | int | 5 | Period seconds for readinessProbe |
readinessProbe.successThreshold | int | 1 | Success threshold for readinessProbe |
readinessProbe.timeoutSeconds | int | 3 | Timeout seconds for readinessProbe |
replicaCount | int | 1 | Number of replicas |
resources.limits | object | {} | Pod limit |
resources.requests | object | {} | Pod requests |
runtimeClassName | string | "" | Specify runtime class |
securityContext | object | {} | Container Security Context |
service.annotations | object | {} | Annotations to add to the service |
service.loadBalancerIP | string | nil | Load Balancer IP address |
service.nodePort | int | 31434 | Service node port when service type is 'NodePort' |
service.port | int | 11434 | Service port |
service.type | string | "ClusterIP" | Service type |
serviceAccount.annotations | object | {} | Annotations to add to the service account |
serviceAccount.automount | bool | true | Automatically mount a ServiceAccount's API credentials? |
serviceAccount.create | bool | true | Specifies whether a service account should be created |
serviceAccount.name | string | "" | The name of the service account to use. If not set and create is true, a name is generated using the fullname template |
tolerations | list | [] | Tolerations for pod assignment |
topologySpreadConstraints | object | {} | Topology Spread Constraints for pod assignment |
updateStrategy.type | string | "Recreate" | Deployment strategy can be "Recreate" or "RollingUpdate". Default is Recreate |
volumeMounts | list | [] | Additional volumeMounts on the output Deployment definition. |
volumes | list | [] | Additional volumes on the output Deployment definition. |
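To illustrate how several of the keys above combine, here is a hedged sketch of a values.yaml that enables persistence and exposes the service through a NodePort. It uses only keys and defaults from the table; whether persistence and a NodePort suit your cluster is an assumption, and the storage class is left unset so the cluster's default provisioner would be used.

```yaml
persistentVolume:
  # -- Enable persistence using PVC (disabled by default)
  enabled: true
  # -- Same as the chart default; adjust to the size of your models
  size: 30Gi
  accessModes:
    - ReadWriteOnce

service:
  # -- Default is ClusterIP; NodePort is an illustrative alternative
  type: NodePort
  port: 11434
  nodePort: 31434
```

A file like this would be passed with the `--values values.yaml` flag already shown in the upgrade command.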
Jean Baptiste Detroyes
Nathan Tréhout
- For questions, suggestions, and discussion about Ollama please refer to the Ollama issue page
- For questions, suggestions, and discussion about this chart please visit Ollama-Helm issue page or join our OTWLD Discord