Recommendation Service is crashing after initial deployment #563

Closed
evekhm opened this issue Jun 21, 2021 · 9 comments
Labels: priority: p2, type: bug, type: docs

Comments

@evekhm

evekhm commented Jun 21, 2021

Screenshot from 2021-06-20 21-46-46

Started as follows:

  • minikube start --cpus=4 --memory 4096 --disk-size 32g
  • kubectl apply -f release/kubernetes-manifests.yaml
  • minikube service frontend-external

Running on: Linux localhost.localdomain 4.18.0-305.3.1.el8.x86_64 #1 SMP Tue Jun 1 16:14:33 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

@evekhm
Author

evekhm commented Jun 21, 2021

pod logs:

{"timestamp": 1624255338.1007357, "severity": "INFO", "name": "recommendationservice-server", "message": "initializing recommendationservice"}
{"timestamp": 1624255338.1008198, "severity": "INFO", "name": "recommendationservice-server", "message": "Profiler enabled."}
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable onattempt 1 of 3. Reason: timed out
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable onattempt 2 of 3. Reason: timed out
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable onattempt 3 of 3. Reason: timed out
WARNING:google.auth._default:Authentication failed using Compute Engine authentication due to unavailable metadata server.
{"timestamp": 1624255347.1106029, "severity": "INFO", "name": "recommendationservice-server", "message": "Unable to start Stackdriver Profiler Python agent. Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started"}
{"timestamp": 1624255347.110662, "severity": "INFO", "name": "recommendationservice-server", "message": "Sleeping 10 seconds to retry Stackdriver Profiler agent initialization"}
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable onattempt 1 of 3. Reason: timed out
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable onattempt 2 of 3. Reason: timed out

Please suggest how to fix this issue

@evekhm
Author

evekhm commented Jun 21, 2021

I believe the instructions for local deployment really need improvement.
I have created a service account, stored its key, and set GOOGLE_APPLICATION_CREDENTIALS, but I am still having issues.

Screenshot from 2021-06-20 23-53-52

@sadikekin

Do you need to use Google keys? You can disable the Google Cloud usage by uncommenting the following lines for each deployment in the release/kubernetes-manifests.yaml file.

          # - name: DISABLE_STATS
          #   value: "1"
          # - name: DISABLE_TRACING
          #   value: "1"
          # - name: DISABLE_PROFILER
          #   value: "1"
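
One way to apply this without editing the file by hand is sketched below. It assumes the commented entries look exactly like the lines above (a `# - name: DISABLE_...` line directly followed by its `#   value: "1"` line). The function name `uncomment_disable_flags` and the output path in the usage comment are made up for illustration.

```shell
# Sketch: print a copy of a manifest with the DISABLE_* env entries uncommented.
# Assumes each entry is a "# - name: DISABLE_X" line directly followed by
# its "#   value: ..." line, as in the snippet above.
uncomment_disable_flags() {
  # $1: path to the manifest; the edited manifest is written to stdout
  sed -E '/^[[:space:]]*# - name: DISABLE_(STATS|TRACING|PROFILER)[[:space:]]*$/{
s/# //
n
s/# //
}' "$1"
}

# Example usage (hypothetical output path):
#   uncomment_disable_flags release/kubernetes-manifests.yaml > /tmp/manifests-local.yaml
#   kubectl apply -f /tmp/manifests-local.yaml
```

Writing to a separate file keeps the original manifest untouched, so the change is easy to diff or discard.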

@evekhm
Author

evekhm commented Jun 21, 2021

Thank you for the response! I was wondering whether it is possible to connect to a gcloud project from a local deployment, and how that would work. Does anybody happen to know?
Thank you

@evekhm
Author

evekhm commented Jun 21, 2021

By disabling the livenessProbe of the recommendation service I could get the site working, but I am still not sure how to fix it properly.

@Shabirmean added the type: docs, priority: p2, and type: bug labels on Jun 22, 2021
@Shabirmean
Member

Shabirmean commented Jun 22, 2021

Thank you @evekhm for raising this. Yes, we are working on improving the documentation around local setup. There is a pending PR here which focuses on explaining this better. It has been kept pending until we finalize some details around the use of kustomize.

You can have a look at the docs from the PR branch to get an idea of the changes.

Linking related issues/PRs for posterity:

@sabotenwork

Hi, @Shabirmean
I have the same problem with my local microk8s. I checked the issue-359 Development Guide
and followed "Option 2 - Local Cluster", steps 2.1 to 2.6.

However, the recommendationservice pod is still crashing:

{"timestamp": 1627435972.6740053, "severity": "INFO", "name": "recommendationservice-server", "message": "initializing recommendationservice"}
{"timestamp": 1627435972.6742656, "severity": "INFO", "name": "recommendationservice-server", "message": "Profiler enabled."}
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable onattempt 1 of 3. Reason: timed out
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable onattempt 2 of 3. Reason: timed out
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable onattempt 3 of 3. Reason: timed out
WARNING:google.auth._default:Authentication failed using Compute Engine authentication due to unavailable metadata server.
{"timestamp": 1627435981.6837926, "severity": "INFO", "name": "recommendationservice-server", "message": "Unable to start Stackdriver Profiler Python agent. Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started"}
{"timestamp": 1627435981.6839278, "severity": "INFO", "name": "recommendationservice-server", "message": "Sleeping 10 seconds to retry Stackdriver Profiler agent initialization"}
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable onattempt 1 of 3. Reason: timed out
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable onattempt 2 of 3. Reason: timed out

I also applied the change below, but the situation remained the same.

          # - name: DISABLE_STATS
          #   value: "1"
          # - name: DISABLE_TRACING
          #   value: "1"
          # - name: DISABLE_PROFILER
          #   value: "1"

If there is something else I should try, could you let me know?
I am waiting for the use of kustomize.

@askmeegs askmeegs self-assigned this Aug 4, 2021
@sabotenwork

Hi, @askmeegs, thank you for your attention.
I had misunderstood which directory skaffold reads the configuration file from.
I tried injecting the environment variables directly with the following commands, and the error was resolved.

kubectl patch deployment recommendationservice -p '{"spec":{"template":{"spec":{"containers":[{"env":[{"name":"DISABLE_PROFILER","value":"1"}],"name":"server"}]}}}}'
kubectl patch deployment recommendationservice -p '{"spec":{"template":{"spec":{"containers":[{"env":[{"name":"DISABLE_DEBUGGER","value":"1"}],"name":"server"}]}}}}'
kubectl patch deployment recommendationservice -p '{"spec":{"template":{"spec":{"containers":[{"env":[{"name":"DISABLE_TRACING","value":"1"}],"name":"server"}]}}}}'
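
The three patches can probably be combined into one, since a strategic merge patch merges `containers` and `env` entries by `name`. A minimal sketch, assuming the container is named `server` as in the commands above; `build_env_patch` is a hypothetical helper, not part of kubectl:

```shell
# Sketch: build one strategic-merge patch that sets several env vars to "1".
# $1 is the container name; the remaining arguments are env var names.
build_env_patch() {
  container="$1"
  shift
  entries=""
  for name in "$@"; do
    # Append a {"name":...,"value":"1"} entry, comma-separated after the first
    entries="${entries:+$entries,}{\"name\":\"$name\",\"value\":\"1\"}"
  done
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","env":[%s]}]}}}}\n' \
    "$container" "$entries"
}

# Example usage:
#   kubectl patch deployment recommendationservice \
#     -p "$(build_env_patch server DISABLE_PROFILER DISABLE_DEBUGGER DISABLE_TRACING)"
```

A single patch changes the pod template once, so the deployment rolls out one new revision instead of three.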

I had been editing release/kubernetes-manifests.yaml for a long time,
but skaffold was reading ./kubernetes-manifests/recommendationservice.yaml.
After uncommenting the DISABLE_* entries in that file, the error was resolved and all pods started successfully.
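
A quick way to check which flags are actually active in a given manifest is to list its uncommented DISABLE_* entries. A small sketch; `active_disable_flags` is a hypothetical helper name, and it assumes entries are written as `- name: DISABLE_...` on their own line:

```shell
# Sketch: print the names of DISABLE_* env vars that are NOT commented out.
# $1 is the path to a manifest file.
active_disable_flags() {
  sed -n -E 's/^[[:space:]]*- name: (DISABLE_[A-Z_]+)[[:space:]]*$/\1/p' "$1"
}

# Example usage:
#   active_disable_flags ./kubernetes-manifests/recommendationservice.yaml
#   active_disable_flags release/kubernetes-manifests.yaml
```

Running it against both files makes it obvious when edits went into a manifest that the deployment tool never reads.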

@askmeegs
Contributor

Thank you for this update! I will close this issue for now. Note that we still plan to add a local development Kustomize profile in the future to make this easier.

5 participants