Pachyderm is a language-agnostic and cloud infrastructure-agnostic large-scale data processing framework based on software containers. This chart can be used to deploy Pachyderm backed by object stores from different cloud providers.
Prerequisites:
- Dynamic provisioning of PVs (for non-local deployments)
The following table lists the configurable parameters of pachd and their default values. A leading * in a parameter name stands for the section prefix, here pachd (for example, *.image.tag is set as pachd.image.tag):
Parameter | Description | Default |
rbac.create | Enable RBAC | true |
pachd.exposeObjApi | Expose S3 API | false |
pachd.image.repository | Container image name | pachyderm/pachd |
pachd.pfsCache | File system cache size | 0G |
*.image.tag | Container image tag | <latest version> |
*.image.pullPolicy | Image pull policy | Always |
*.worker.repository | Worker image name | pachyderm/worker |
*.worker.tag | Worker image tag | <latest version> |
*.replicaCount | Number of pachds | 1 |
*.resources.requests | Memory and CPU request | {512M,250m} |
*.resources.limits | Memory and CPU limit | nil |
*.service.grpc.annotations | GRPC service additional annotations | {} |
* | GRPC service port | 30650 |
*.service.grpc.type | GRPC service type | NodePort |
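As a rough illustration, the dotted parameter names above map onto nested keys in a values file. The sketch below assumes that * expands to pachd (as in the --set pachd.image.tag=... usage later in this README) and that the resource request uses the usual Kubernetes memory and cpu keys; the chart's own values.yaml remains the authority on the exact layout.

```yaml
# Hypothetical override file; the values simply mirror the defaults listed above.
pachd:
  replicaCount: 1
  image:
    repository: pachyderm/pachd
    pullPolicy: Always
  resources:
    requests:
      memory: 512M   # assumed key name (standard Kubernetes resource key)
      cpu: 250m      # assumed key name (standard Kubernetes resource key)
  service:
    grpc:
      type: NodePort
```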
The following table lists the configurable parameters of etcd and their default values (here * stands for etcd):
Parameter | Description | Default |
etcd.image.repository | Container image name | |
*.image.tag | Container image tag | <latest version> |
*.image.pullPolicy | Image pull policy | IfNotPresent |
*.resources.requests | Memory and CPU request | {250M,250m} |
*.resources.limits | Memory and CPU limit | nil |
*.persistence.enabled | Enable persistence | false |
*.persistence.size | Storage request | 20G |
*.persistence.accessMode | Access mode for PV | ReadWriteOnce |
*.persistence.storageClass | PVC storage class | nil |
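To make the persistence settings concrete, here is a minimal values-file sketch that turns on etcd persistence. The key nesting is inferred from the dotted parameter names in the table, and the storage class name is a hypothetical placeholder, not something the chart defines.

```yaml
etcd:
  persistence:
    enabled: true               # default is false
    size: 20G                   # default storage request from the table
    accessMode: ReadWriteOnce   # default access mode from the table
    storageClass: standard      # hypothetical name; use a class that exists in your cluster
```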
Optional Kubernetes pod scheduling parameters:
Parameter | Description | Default |
nodeSelector | Specify node selector | {} |
tolerations | Specify a toleration to podSpec | {} |
Example of values:
nodeSelector:
  group: pachyderm
tolerations:
  - key: "group"
    operator: "Equal"
    value: "pachyderm"
To select which object store credentials to use, set the credentials parameter to one of the following values: local | s3 | google | amazon | microsoft
Parameter | Description | Default |
credentials | Backend credentials | "" |
Based on the storage credentials used, fill in the corresponding parameters for your object store. Note that the local installation deploys Pachyderm on your local Kubernetes cluster (e.g., minikube), backed by your local storage.
- On Amazon Web Services, please set the following values:
Parameter | Description | Default |
amazon.bucketName | Amazon bucket name | "" |
amazon.distribution | Amazon distribution | "" |
 | Amazon id | "" |
amazon.region | Amazon region | "" |
amazon.roleArn | Amazon role arn | "" |
amazon.secret | Amazon secret | "" |
amazon.token | Amazon token | "" |
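A values-file sketch for the Amazon backend might look like the following. It uses only parameter names that appear in full in the table above, every value is a placeholder, and which combination of fields a given setup actually requires is not stated here, so treat it as a starting point rather than a complete configuration.

```yaml
credentials: amazon
amazon:
  bucketName: my-pachyderm-bucket   # placeholder bucket name
  region: us-east-1                 # placeholder region
  secret: "<your-secret>"           # placeholder
  token: "<your-token>"             # placeholder
```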
We strongly recommend installing Pachyderm in its own namespace. Note that RBAC must be enabled in your cluster for the installation to work with the default settings. The default installation deploys Pachyderm on your local Kubernetes cluster:
$ helm install --namespace pachyderm --name my-release stable/pachyderm
You can specify each parameter using the --set key=value[,key=value] argument to helm install. Please consult the values.yaml file for more information regarding the parameters. For example:
$ helm install --namespace pachyderm --name my-release \
--set credentials=s3,s3.accessKey=myaccesskey,s3.secretKey=mysecretkey,s3.bucketName=default_bucket,s3.endpoint=domain.subdomain:8080,etcd.persistence.enabled=true,etcd.persistence.accessMode=ReadWriteMany \
stable/pachyderm
Alternatively, a YAML file that specifies the values for the parameters can be provided while installing the chart:
$ helm install --namespace pachyderm --name my-release -f values.yaml stable/pachyderm
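For reference, the --set example earlier in this section could be expressed as a values file along these lines. This is a sketch: the nesting is inferred from the dotted parameter names, and the credentials, bucket, and endpoint are the same placeholder values used in that example.

```yaml
credentials: s3
s3:
  accessKey: myaccesskey
  secretKey: mysecretkey
  bucketName: default_bucket
  endpoint: "domain.subdomain:8080"
etcd:
  persistence:
    enabled: true
    accessMode: ReadWriteMany
```

Saved as values.yaml, this file would be passed with -f as in the command above.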
To specify a Pachyderm version, run the following command:
$ helm install --namespace pachyderm --name my-release \
--set pachd.image.tag=1.8.6,pachd.worker.tag=1.8.6 \
stable/pachyderm
In order to use Pachyderm, log in via SSH to the master node and install the Pachyderm client:
$ curl -o /tmp/pachctl.deb -L && sudo dpkg -i /tmp/pachctl.deb
Please note that the client version should match the pachd service version. For more information, please consult the official documentation. Alternatively, if your Kubernetes client is properly configured to talk to your remote cluster, you can simply install pachctl on your local machine and execute: pachctl --namespace <namespace> port-forward &
In order to remove the Pachyderm release, you can execute the following commands:
$ helm list
$ helm delete --purge <release-name>