
OpenCensus Zipkin backend issues #703

Closed
mdemirhan opened this issue Apr 20, 2018 · 0 comments · Fixed by #718
Assignees
mdemirhan
Labels
area/monitoring · kind/bug (Categorizes issue or PR as related to a bug.) · tracking (This label is applied to Knative issues that track issues external to Knative, e.g. Istio, K8s)

Comments

@mdemirhan
Contributor

This is a tracking issue for an OpenCensus bug we opened: census-instrumentation/opencensus-go#719

@mdemirhan mdemirhan added the area/monitoring, kind/bug and tracking labels on Apr 20, 2018
@mdemirhan mdemirhan self-assigned this Apr 20, 2018
mdemirhan added a commit that referenced this issue Apr 24, 2018
Fixes #703

When we migrated from the OpenZipkin to the OpenCensus libraries, the distributed traces started to look different than they did with OpenZipkin. Investigating this with the OpenCensus team, it turns out that OpenZipkin closes spans at the end of the RoundTrip method, whereas OpenCensus closes them when res.Body.Close() is called. OpenCensus' behavior is the correct one here: in our code, instead of deferring res.Body.Close() to the end of the surrounding function, we need to call res.Body.Close() as soon as the body has been consumed in order to correctly calculate the span timing information.
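A minimal sketch of the pattern this implies, assuming the standard ochttp.Transport from opencensus-go; the fetch helper and its error handling are illustrative, not the actual Knative code:

```go
package sketch

import (
	"io"
	"net/http"

	"go.opencensus.io/plugin/ochttp"
)

// fetch is an illustrative helper: with ochttp.Transport, the client span is
// only finished once the response body is closed, so the body is closed
// right after it is consumed instead of being deferred.
func fetch(url string) ([]byte, error) {
	client := &http.Client{Transport: &ochttp.Transport{}}

	res, err := client.Get(url)
	if err != nil {
		return nil, err
	}

	body, err := io.ReadAll(res.Body)
	// Close eagerly so the span covers only the request/response exchange,
	// not any later processing done by the caller.
	res.Body.Close()
	if err != nil {
		return nil, err
	}
	return body, nil
}
```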
markusthoemmes added a commit to markusthoemmes/knative-serving that referenced this issue Apr 7, 2021
…ency (knative#703)

* Only use exec probe at startup time (knative#10741)

* Only use exec probe at startup time

Now that StartupProbe is available, we can avoid spawning the exec
probe except at startup time. For requests after startup, this
directly uses the same endpoint in the QP that the exec probe checked, as the
target of an HTTP readiness probe.

Following on from this I think we may want to rework quite a bit of how
our readiness probing works (e.g. it'd be nice to keep the probes on
the user container so failures are reported on the right object, and we currently
ignore probes ~entirely after startup if periodSeconds > 0), but this is a
minimal change that should be entirely backwards-compatible and saves
quite a few CPU cycles.
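
A minimal sketch of the resulting probe split, expressed with plain Kubernetes API types rather than the actual Knative deployment-building code; the exec command, port and path below are illustrative placeholders:

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// withProbes illustrates the split: an exec probe only while the container
// is starting up, and a cheap HTTP readiness probe afterwards against the
// same queue-proxy endpoint the exec probe checked.
func withProbes(c *corev1.Container) {
	c.StartupProbe = &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			// Hypothetical helper binary standing in for the real exec probe.
			Exec: &corev1.ExecAction{Command: []string{"/probe-helper"}},
		},
		PeriodSeconds: 1,
	}

	c.ReadinessProbe = &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{
				Port: intstr.FromInt(8012), // illustrative queue-proxy port
				Path: "/",
			},
		},
		PeriodSeconds: 1,
	}
}
```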

* Use ProgressDeadline as failure timeout for startup probe

- Also just drop the exec probe entirely for periodSeconds > 1, since these
  can just use the readiness probe now. (Easier than figuring out how to
  square ProgressDeadline with a custom period.)
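
A minimal sketch of using ProgressDeadline as the failure timeout mentioned above; startupFailureThreshold is a hypothetical helper and the real mapping may differ:

```go
package sketch

import (
	"math"
	"time"
)

// startupFailureThreshold converts the revision's ProgressDeadline into the
// number of startup-probe failures allowed for a given probe period, so the
// probe keeps retrying until the deadline is exhausted.
func startupFailureThreshold(progressDeadline time.Duration, periodSeconds int32) int32 {
	if periodSeconds <= 0 {
		periodSeconds = 1 // matches the default of 1s when unset
	}
	return int32(math.Ceil(progressDeadline.Seconds() / float64(periodSeconds)))
}
```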

* See if flag is what's making upgrades unhappy

* reorganize comments

* Default PeriodSeconds of the readiness probe to 1 if unset (knative#10992)

* Avoid deleting a stat if a request raced the reporter (knative#10729) (knative#10748)

* Avoid deleting a stat if a request raced the reporter

This prevents a race between the report routine and requests flowing in and out. Since we're trying to minimize contention on the request path, the locking routines grab as few write locks as possible, to allow things to progress in parallel.

That breaks, though, if a report sees AverageConcurrency == 0 and hence marks the stat for deletion. If a new request comes in between that decision and the entry actually being deleted (two separate locks, since we only grab a read lock to determine the deletion), it grabs the stat that is about to be deleted, which is then never seen by the next report routine.

The In event is lost and the stat's concurrency becomes negative, unrecoverably.
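
A minimal sketch of the locking pattern this describes (requestStat, concurrencyReporter and the inflight field are illustrative names, not the actual Knative types); the key point is the re-check under the write lock, so an In that raced the reporter keeps its stat alive:

```go
package sketch

import (
	"sync"
	"sync/atomic"
)

type requestStat struct {
	inflight int64 // requests currently in flight for this key
}

type concurrencyReporter struct {
	mu    sync.RWMutex
	stats map[string]*requestStat
}

func newConcurrencyReporter() *concurrencyReporter {
	return &concurrencyReporter{stats: map[string]*requestStat{}}
}

// HandleIn records an incoming request. The inflight counter is bumped while
// a lock is still held, which is what makes the later re-check sound.
func (r *concurrencyReporter) HandleIn(key string) {
	r.mu.RLock()
	s, ok := r.stats[key]
	if ok {
		atomic.AddInt64(&s.inflight, 1)
	}
	r.mu.RUnlock()
	if ok {
		return
	}
	r.mu.Lock()
	s, ok = r.stats[key]
	if !ok {
		s = &requestStat{}
		r.stats[key] = s
	}
	atomic.AddInt64(&s.inflight, 1)
	r.mu.Unlock()
}

// HandleOut records a finished request (always paired with a prior HandleIn,
// so the stat is still present).
func (r *concurrencyReporter) HandleOut(key string) {
	r.mu.RLock()
	s := r.stats[key]
	r.mu.RUnlock()
	atomic.AddInt64(&s.inflight, -1)
}

// report deletes idle stats. Candidates are gathered under the read lock to
// keep the request path cheap, then re-verified under the write lock so a
// stat touched by a racing request is not deleted.
func (r *concurrencyReporter) report() {
	var idle []string
	r.mu.RLock()
	for key, s := range r.stats {
		if atomic.LoadInt64(&s.inflight) == 0 {
			idle = append(idle, key)
		}
	}
	r.mu.RUnlock()

	r.mu.Lock()
	for _, key := range idle {
		if s, ok := r.stats[key]; ok && atomic.LoadInt64(&s.inflight) == 0 {
			delete(r.stats, key)
		}
	}
	r.mu.Unlock()
}
```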

* Avoid pointer

Co-authored-by: Julian Friedman <julz.friedman@uk.ibm.com>
markusthoemmes added a commit to markusthoemmes/knative-serving that referenced this issue Apr 7, 2021
…ency (knative#703)

mgencur referenced this issue in openshift-knative/serving Oct 12, 2022
…ency (#703)
