MXNet operator v1 API #36
If this is reasonable, I can help do this.
@wackxu Thanks for your summary. It makes sense to follow the changes in tf-operator/pytorch-operator. Just one question: we have already merged the third item, "Use pod group instead of PDB for gang scheduling", into v1beta1. Does it introduce a compatibility issue?
It may have some effect, but I don't think it blocks the development of v1beta2. Actually, we are working on v1 for TFJob now. Maybe we could implement v1 directly in mxnet-operator.
@suleisl2000 It should have no effect. For an old MXJob that uses a PDB, the new controller will also create a PodGroup for the MXJob and delete the PodGroup when the MXJob is deleted. For a PDB that the controller created earlier, the Kubernetes garbage collector will delete it when the MXJob is deleted, so eventually everything related to the MXJob is cleaned up.
Agree with @gaocegege. The changes in the list have been in tf-operator for a while and have been tested enough, so we can implement v1 directly in mxnet-operator. @suleisl2000 WDYT?
@wackxu Working on v1 directly is OK with me.
A couple of minor API changes have been suggested. We can incorporate all of them in the next API version.
Related: kubeflow/trainer#935
@suleisl2000 @gaocegege