[RELEASE-0.20] Avoid deleting a stat if a request raced the reporter #10748

markusthoemmes · 2021-02-11T07:35:15Z

Backport of #10729

* Avoid deleting a stat if a request raced the reporter

  This prevents a race between the report routine and requests flowing in and out. Since we're trying to minimize contention on the request path, the locking routines try to grab as few write locks as possible, to allow things to progress in parallel. That breaks, though, if a report sees AverageConcurrency == 0 and hence marks the stat for deletion. If a new request comes in between the stat being marked and the entry actually being deleted (these happen under two separate locks, as we only grab a read lock to determine the deletion), it'll grab the stat that is about to be deleted and hence won't be seen by the next report routine. The In event is lost and the stat's concurrency becomes negative, unrecoverably.

* Avoid pointer
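The race described above can be illustrated with a minimal sketch. This is not the code from this PR: the `reporter` and `stat` types and the `in`, `out`, `scan`, `sweep`, and `concurrency` methods are hypothetical names chosen for illustration. It assumes the report is split into a read-locked scan that collects idle keys and a write-locked sweep that deletes them, and it shows one possible guard: re-checking the counter under the write lock before deleting.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// stat holds the in-flight request count for one key (hypothetical type).
type stat struct {
	inFlight int64 // accessed only via sync/atomic
}

// reporter is an illustrative stand-in, not the Knative type.
type reporter struct {
	mu    sync.RWMutex
	stats map[string]*stat
}

func newReporter() *reporter {
	return &reporter{stats: make(map[string]*stat)}
}

// in records an arriving request. The counter is bumped while a lock is
// held, so a sweep (which needs the write lock) cannot slip in between the
// map lookup and the increment.
func (r *reporter) in(key string) {
	r.mu.RLock()
	if s, ok := r.stats[key]; ok {
		atomic.AddInt64(&s.inFlight, 1)
		r.mu.RUnlock()
		return
	}
	r.mu.RUnlock()

	// Slow path: the entry is missing, so take the write lock to create it.
	r.mu.Lock()
	defer r.mu.Unlock()
	s, ok := r.stats[key]
	if !ok {
		s = &stat{}
		r.stats[key] = s
	}
	atomic.AddInt64(&s.inFlight, 1)
}

// out records a finished request.
func (r *reporter) out(key string) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if s, ok := r.stats[key]; ok {
		atomic.AddInt64(&s.inFlight, -1)
	}
}

// scan is the read-locked half of a report: it returns the idle keys.
func (r *reporter) scan() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	var idle []string
	for k, s := range r.stats {
		if atomic.LoadInt64(&s.inFlight) == 0 {
			idle = append(idle, k)
		}
	}
	return idle
}

// sweep is the write-locked half. It re-checks each counter, so an entry
// that picked up a request after the scan is kept instead of deleted.
func (r *reporter) sweep(idle []string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, k := range idle {
		if s, ok := r.stats[k]; ok && atomic.LoadInt64(&s.inFlight) == 0 {
			delete(r.stats, k)
		}
	}
}

// concurrency returns the in-flight count and whether the key is tracked.
func (r *reporter) concurrency(key string) (int64, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	s, ok := r.stats[key]
	if !ok {
		return 0, false
	}
	return atomic.LoadInt64(&s.inFlight), true
}

func main() {
	r := newReporter()
	r.in("rev")
	r.out("rev")
	idle := r.scan() // "rev" looks idle here
	r.in("rev")      // a request races the reporter
	r.sweep(idle)    // the re-check keeps the entry
	fmt.Println(r.concurrency("rev")) // prints "1 true"
}
```

Without the re-check in `sweep`, the entry would be deleted while a request still holds it, which is the lost In event described in the commit message. Splitting the report into `scan` and `sweep` is only done here so the interleaving can be reproduced deterministically.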

codecov · 2021-02-11T07:40:18Z

Codecov Report

Merging #10748 (5829993) into release-0.20 (560a4f6) will decrease coverage by 0.00%.
The diff coverage is 90.00%.

@@               Coverage Diff                @@
##           release-0.20   #10748      +/-   ##
================================================
- Coverage         88.08%   88.08%   -0.01%     
================================================
  Files               187      187              
  Lines              8865     8869       +4     
================================================
+ Hits               7809     7812       +3     
- Misses              816      817       +1     
  Partials            240      240

Impacted Files	Coverage Δ
pkg/activator/handler/concurrency_reporter.go	`90.00% <90.00%> (-0.70%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 560a4f6...26fa83a.

julz

/lgtm

knative-prow-robot · 2021-02-11T08:59:36Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: julz, markusthoemmes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/activator/OWNERS~~ [julz,markusthoemmes]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…ency (#703)

* Only use exec probe at startup time (knative#10741)

* Only use exec probe at startup time

  Now that StartupProbe is available, we can avoid spawning the exec probe other than at startup time. For requests after startup, this directly uses the same endpoint as the exec probe in the QP as the target of an HTTP readiness probe.

  Following on from this, I think we may want to rework quite a bit of how our readiness probe stuff works (e.g. it'd be nice to keep the probes on the user container so failures are on the right object, and we currently ignore probes ~entirely after startup if periodSeconds>0), but this is a minimal change that should be entirely backwards-compatible and saves quite a few CPU cycles.

* Use ProgressDeadline as failure timeout for startup probe
  - Also just drop exec probe entirely for periodSeconds > 1 since these can just use the readiness probe now. (Easier than figuring out how to square ProgressDeadline with a custom period.)

* See if flag is what's making upgrades unhappy

* reorganize comments

* Default PeriodSeconds of the readiness probe to 1 if unset (knative#10992)

* Avoid deleting a stat if a request raced the reporter (knative#10729) (knative#10748)

* Avoid deleting a stat if a request raced the reporter

  This prevents a race between the report routine and requests flowing in and out. Since we're trying to minimize contention on the request path, the locking routines try to grab as few write locks as possible, to allow things to progress in parallel. That breaks, though, if a report sees AverageConcurrency == 0 and hence marks the stat for deletion. If a new request comes in between the stat being marked and the entry actually being deleted (these happen under two separate locks, as we only grab a read lock to determine the deletion), it'll grab the stat that is about to be deleted and hence won't be seen by the next report routine. The In event is lost and the stat's concurrency becomes negative, unrecoverably.

* Avoid pointer

Co-authored-by: Julian Friedman <julz.friedman@uk.ibm.com>

knative-prow-robot assigned julz and vagababov Feb 11, 2021

knative-prow-robot added the area/autoscale label Feb 11, 2021

knative-prow-robot requested review from andrew-su and yanweiguo February 11, 2021 07:35

knative-prow-robot added the area/networking, size/M (denotes a PR that changes 30-99 lines, ignoring generated files), and approved (indicates a PR has been approved by an approver from all required OWNERS files) labels Feb 11, 2021

google-cla bot added the cla: yes label (indicates the PR's author has signed the CLA) Feb 11, 2021

markusthoemmes changed the title ~~Avoid deleting a stat if a request raced the reporter (#10729)~~ [RELEASE-0.20] Avoid deleting a stat if a request raced the reporter Feb 11, 2021

julz approved these changes Feb 11, 2021

knative-prow-robot added the lgtm label (indicates that a PR is ready to be merged) Feb 11, 2021

knative-prow-robot merged commit f0c5561 into knative:release-0.20 Feb 11, 2021
