A very common use of mops
is to run Python functions on Kubernetes. mops
can’t provide the cluster for you,
but if you have access to one that’s already up and running, it should be easy to get started.
You must add mops to your project with the k8s
extras, or your imports of the k8s
module will fail.
pip install thds.mops[k8s]
This approach is preferred for cases where the code to be run on Kubernetes can be considered a pure function and is implemented in, or wrapped by, Python.
If you’re trying to wrap a Docker image that you can’t or don’t want to install
mops
on, you’ll need to use the low-level imperative API instead.
You will need to construct a k8s_shell
with configuration appropriate to the Python function that will
ultimately be invoked remotely. This includes specifying your Docker image as well as providing resource requests or limits. For example, you might create a module called k8s_shim.py:
from thds import mops

df_k8s_spot_11g_shim = mops.k8s.shim(
    # gotta specify which container this runs on
    mops.k8s.autocr('ds/demand-forecast:latest'),
    node_narrowing=dict(
        resource_requests={"cpu": "3.6", "memory": "11G"},
        tolerations=[mops.k8s.tolerates_spot()],
    ),
)
mops does not distribute your code for you, and neither does the provided k8s shim. You must handle code distribution yourself by building and pushing a Docker image to a container registry that Kubernetes can reach, and by configuring the K8s runtime shim so that it knows which pre-published image to use. Additionally, that image must be given read and write access to your blob store, since all data will be transferred back and forth from your local environment via the blob store.
Once you have constructed an appropriate runtime shim, you need only pass it to the pure.magic() decorator factory, then apply the resulting decorator to any pure function to run that function in the remote K8s environment and inject its result right back into the local/orchestrator context.
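As a rough sketch, assuming pure.magic is reachable as mops.pure.magic and accepts the shim as its argument (the function name and signature below are placeholders; adjust the import paths to your version of mops):

from thds import mops

from k8s_shim import df_k8s_spot_11g_shim  # the module defined above

@mops.pure.magic(df_k8s_spot_11g_shim)
def forecast_demand(region: str) -> float:
    # This body executes inside the remote K8s pod; arguments and the return
    # value travel through your blob store.
    ...

Calling forecast_demand(...) from your orchestrator then launches the work on Kubernetes and hands the result back to the local caller as if the function had run in-process.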
Declarative usage is recommended because it confers additional capabilities such as memoization (and therefore a form of fault tolerance), but if you have a non-Python image you want to launch, see the docs on the low-level imperative interface here.
The basic container specification will be the same regardless of which approach is used.
For Kubernetes only, mops
will automatically try to scrape and print the logs from each launched pod.
ANSI terminal colors will be chosen automatically for each separate pod being scraped.
If you are running many pods in parallel, it may be desirable to turn off log scraping: a separate OS thread is used for each pod being scraped, and this has been observed to leak memory over the course of long runs.
Either set the MOPS_NO_K8S_LOGS
environment variable, or programmatically pass suppress_logs=True
to
either k8s_shell
(declarative API) or launch
(imperative API).
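For example, to disable scraping for an entire run via the environment variable (a minimal sketch; any non-empty value is assumed sufficient here, so check your version of mops for the exact semantics):

import os

# Set this before mops launches any pods to disable per-pod log scraping.
os.environ["MOPS_NO_K8S_LOGS"] = "1"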
Kubernetes will often kill pods that use more memory than they requested, but what makes this tricky to diagnose is that exceeding your request will not, by itself, get a pod killed. Pod death only occurs when other pods scheduled on the same node also use more memory than they requested, the node itself runs out of RAM, and K8s has to choose a pod to kill.
If you are CPU bound, you may as well request roughly 4 GB of RAM for each CPU you request, since that is the ratio CPU-optimized nodes generally provide. If you are strictly memory bound, you will simply need to request enough memory for the worst-case scenario for each of your Jobs.
If you have a hybrid workload (different types of Jobs running concurrently) and are willing to occasionally let partially-complete pods die from OOM, you can set your memory requests lower than your true worst-case usage – may the odds be ever in your favor ;)
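For example, reusing the shim constructor from earlier, a CPU-bound Job might stick to roughly that 4 GB-per-CPU ratio (the image name and the exact numbers here are illustrative):

from thds import mops

cpu_bound_shim = mops.k8s.shim(
    mops.k8s.autocr('ds/demand-forecast:latest'),
    node_narrowing=dict(
        # roughly 4 GB of RAM per requested CPU, matching typical CPU-optimized nodes
        resource_requests={"cpu": "3.6", "memory": "14G"},
    ),
)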
We’re working to make this easier to identify when using mops
, but on a basic level, the Kubernetes API
does not ever reject a request to launch a Job or Pod when provided with a Docker image tag that it can’t
find. It will happily accept your request, and then there will just be endless ImagePullBackOff errors that can be found by querying the API (or using a console like k9s). This is fundamentally because K8s wants to allow for the possibility that your container registry might go down for periods of time, which shouldn’t prevent you from submitting a request that will work once the registry comes back up.
In any case, if you’re running a new process, or have recently changed something about your Docker image,
keep an eye on the API or console for ImagePullBackOff errors, which may be an indication that you’ve asked mops to create a Job with an image that simply does not exist.
Alternatively, you can take a look at the machinery we have made available that may help you detect this.
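If you prefer a quick do-it-yourself check, here is a sketch using the official kubernetes Python client (not part of mops; the namespace and label selector are placeholders):

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside a pod
v1 = client.CoreV1Api()

# "default" and the label selector are placeholders; point them at your mops Jobs.
for pod in v1.list_namespaced_pod("default", label_selector="job-name").items:
    for cs in pod.status.container_statuses or []:
        waiting = cs.state.waiting
        if waiting and waiting.reason in ("ImagePullBackOff", "ErrImagePull"):
            print(f"{pod.metadata.name}: {waiting.reason} - {waiting.message}")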
If you’re running a big or long-running process with remote workers in K8s, common network limitations may make it preferable to run your orchestrator from a pod that is also hosted in Kubernetes.
See here.
See the source README for more specifics on things you can do with
Kubernetes via mops
.