
[SDK] Fix env per Trial parameter in tune API #2304

Merged (1 commit) on Apr 11, 2024

andreyvelich (Member)

I fixed the env_per_trial parameter assignment.
Currently, if the user sets a dict in env_per_trial, we override it with None in this condition.
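To illustrate the kind of bug described above, here is a minimal, hypothetical Python sketch. It is not the Katib SDK code. It assumes env_per_trial may be either a dict of name/value pairs or a list of env entries, and it uses plain dicts in place of Kubernetes V1EnvVar objects. The buggy variant follows the dict branch with a separate if/else, so a dict falls into the else and is reset to None. The fixed variant uses a single if chain, so each input type is handled exactly once.

```python
from typing import Any, Dict, List, Optional, Union

# Hypothetical stand-in for a Kubernetes V1EnvVar: {"name": ..., "value": ...}.
EnvVar = Dict[str, str]
EnvPerTrial = Optional[Union[Dict[str, Any], List[EnvVar]]]


def build_env_buggy(env_per_trial: EnvPerTrial) -> Optional[List[EnvVar]]:
    env = None
    if isinstance(env_per_trial, dict):
        env = [{"name": str(k), "value": str(v)} for k, v in env_per_trial.items()]
    # Bug pattern: a second, independent if/else instead of an elif,
    # so a dict reaches the else branch and the list built above is lost.
    if isinstance(env_per_trial, list):
        env = env_per_trial
    else:
        env = None
    return env


def build_env_fixed(env_per_trial: EnvPerTrial) -> Optional[List[EnvVar]]:
    if env_per_trial is None:
        return None
    if isinstance(env_per_trial, dict):
        # Convert each key/value pair into a name/value env entry.
        return [{"name": str(k), "value": str(v)} for k, v in env_per_trial.items()]
    if isinstance(env_per_trial, list):
        return list(env_per_trial)
    raise ValueError(
        f"env_per_trial must be a dict or a list, got {type(env_per_trial).__name__}"
    )


# Usage: the dict survives only in the fixed version.
print(build_env_buggy({"LR": 0.01}))  # None
print(build_env_fixed({"LR": 0.01}))  # [{'name': 'LR', 'value': '0.01'}]
```

Keeping the type dispatch in one chain means a later branch can never silently undo an earlier one, which is the failure mode this PR addresses.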

/assign @kubeflow/wg-training-leads @shipengcheng1230 @droctothorpe

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
shipengcheng1230 (Contributor)

Nice catch, TY!

tenzen-y (Member) left a comment

Thank you!
/lgtm
/approve

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [andreyvelich,tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

google-oss-prow bot merged commit 086093f into kubeflow:master on Apr 11, 2024
61 checks passed
andreyvelich deleted the sdk-fix-env-per-trial branch on April 11, 2024, 13:14

quloos commented on May 17, 2024

Wait. 0.17rc0 is not working for my notebook in deploykf.

(base) jovyan@newnew-0:~/katib$ python3 ~/katib/katib_example.py 
Traceback (most recent call last):
  File "/home/jovyan/katib/katib-c4c3eb52437a7e5bf119e77c6fcf6455cec2c927/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py", line 111, in create_experiment
    outputs = self.custom_api.create_namespaced_custom_object(
  File "/opt/conda/lib/python3.8/site-packages/kubernetes/client/api/custom_objects_api.py", line 231, in create_namespaced_custom_object
    return self.create_namespaced_custom_object_with_http_info(group, version, namespace, plural, body, **kwargs)  # noqa: E501
  File "/opt/conda/lib/python3.8/site-packages/kubernetes/client/api/custom_objects_api.py", line 354, in create_namespaced_custom_object_with_http_info
    return self.api_client.call_api(
  File "/opt/conda/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/opt/conda/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/opt/conda/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 391, in request
    return self.rest_client.POST(url,
  File "/opt/conda/lib/python3.8/site-packages/kubernetes/client/rest.py", line 279, in POST
    return self.request("POST", url,
  File "/opt/conda/lib/python3.8/site-packages/kubernetes/client/rest.py", line 238, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'df2548f1-84ab-47ac-8b73-411a23f80b2f', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'bd326f83-4321-4a89-b499-cf6c80bfe034', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'a9cf1252-5f5b-4044-ac03-5887b4cb5296', 'Date': 'Fri, 17 May 2024 06:15:12 GMT', 'Content-Length': '1417'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"admission webhook \"validator.experiment.katib.kubeflow.org\" denied the request: invalid spec.trialTemplate: unable to convert: /spec/template/spec/containers/0/env - [] to Job, converted template: {\"kind\":\"Job\",\"apiVersion\":\"batch/v1\",\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"template\":{\"metadata\":{\"creationTimestamp\":null,\"annotations\":{\"sidecar.istio.io/inject\":\"false\"}},\"spec\":{\"containers\":[{\"name\":\"training-container\",\"image\":\"lib.boe.com.cn/docker/tensorflow/tensorflow:2.13.0\",\"command\":[\"bash\",\"-c\"],\"args\":[\"\\nprogram_path=$(mktemp -d)\\nread -r -d '' SCRIPT \\u003c\\u003c EOM\\n\\ndef objective(parameters):\\n    # Import required packages.\\n    import time\\n    time.sleep(5)\\n    # Calculate objective function.\\n    result = 4 * int(parameters[\\\"a\\\"]) - float(parameters[\\\"b\\\"]) ** 2\\n    # Katib parses metrics in this format: \\u003cmetric-name\\u003e=\\u003cmetric-value\\u003e.\\n    print(f\\\"result={result}\\\")\\n\\nobjective({'a': 'test-value', 'b': 'test-value'})\\n\\nEOM\\nprintf \\\"%s\\\" \\\"$SCRIPT\\\" \\u003e $program_path/ephemeral_objective.py\\npython3 -u $program_path/ephemeral_objective.py\"],\"resources\":{\"limits\":{\"cpu\":\"2\"},\"requests\":{\"cpu\":\"2\"}}}],\"restartPolicy\":\"Never\"}}},\"status\":{}}","code":400}



During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jovyan/katib/katib_example.py", line 21, in <module>
    katib.KatibClient().tune(
  File "/home/jovyan/katib/katib-c4c3eb52437a7e5bf119e77c6fcf6455cec2c927/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py", line 414, in tune
    self.create_experiment(experiment, namespace)
  File "/home/jovyan/katib/katib-c4c3eb52437a7e5bf119e77c6fcf6455cec2c927/sdk/python/v1beta1/kubeflow/katib/api/katib_client.py", line 130, in create_experiment
    raise RuntimeError(
RuntimeError: Failed to create Katib Experiment: team-1/tune-experiment

But 0.16 works perfectly for me.
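The key line in the traceback is the webhook message "unable to convert: /spec/template/spec/containers/0/env - [] to Job". One plausible reading, which this thread does not confirm, is that the validator converts the trial template into a batch/v1 Job, serializes it back, and reports every field that did not survive the round trip. An empty env list would be dropped by omitempty-style serialization, so it would be reported as lost. The Python sketch below approximates that round trip. The helpers drop_empty, missing_paths, make_template and container are hypothetical names for illustration, not Katib or Kubernetes APIs. The sketch shows that omitting env when it is empty avoids the reported path.

```python
from typing import Any, List, Optional


def drop_empty(obj: Any) -> Any:
    """Rough stand-in for Go's `omitempty` (an assumption, not Katib's code):
    drops None values and empty lists/dicts from mappings, recursively."""
    if isinstance(obj, dict):
        out = {}
        for key, value in obj.items():
            value = drop_empty(value)
            if value is None or value == [] or value == {}:
                continue
            out[key] = value
        return out
    if isinstance(obj, list):
        return [drop_empty(item) for item in obj]
    return obj


def missing_paths(original: Any, converted: Any, path: str = "") -> List[str]:
    """Report 'path - value' for every field in `original` that is gone in `converted`."""
    lost: List[str] = []
    if isinstance(original, dict) and isinstance(converted, dict):
        for key, value in original.items():
            child = f"{path}/{key}"
            if key not in converted:
                lost.append(f"{child} - {value!r}")
            else:
                lost.extend(missing_paths(value, converted[key], child))
    elif isinstance(original, list) and isinstance(converted, list):
        for index, (a, b) in enumerate(zip(original, converted)):
            lost.extend(missing_paths(a, b, f"{path}/{index}"))
    return lost


def make_template(container: dict) -> dict:
    """Hypothetical minimal Job-like trial template with a single container."""
    return {"spec": {"template": {"spec": {"containers": [container]}}}}


def container(name: str, env: Optional[list] = None) -> dict:
    """Build a container dict, omitting `env` entirely when it is empty."""
    spec: dict = {"name": name}
    if env:
        spec["env"] = env
    return spec


bad = make_template({"name": "training-container", "env": []})
print(missing_paths(bad, drop_empty(bad)))
# ['/spec/template/spec/containers/0/env - []']

good = make_template(container("training-container", []))
print(missing_paths(good, drop_empty(good)))
# []
```

Under this assumption, the client-side remedy is to send no env field at all, rather than an empty list, when no per-trial environment variables are set.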

tenzen-y (Member)

@andreyvelich Could you take a look at this?

andreyvelich (Member, Author)

> Wait. 0.17rc0 is not working for my notebook in deploykf.
> [...]
> But 0.16 works perfectly for me.

Hi @quloos, could you please open a GitHub issue for this problem?
