
Fix race condition in ClientEdmModel.GetOrCreateEdmType #2533

Merged
habbes merged 5 commits into OData:master on Oct 24, 2022

Conversation

habbes
Contributor

@habbes habbes commented Oct 19, 2022

Issues

This pull request fixes #2532

This removes a race condition from the ClientEdmModel.GetOrCreateEdmTypeInternal method that allows one thread to store an instance of an EDM type in the ClientEdmModel.typeNameToClientTypeAnnotationCache dictionary while another thread stores a different instance representing the same EDM type in ClientEdmModel.clrToEdmTypeCache. Having two distinct instances representing the same EDM type causes the ODataWriter to throw a validation exception, because it treats them as two different types that don't match.

For more details about the cause of the issue, read this article
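To make the interleaving concrete, here is a deliberately simplified sketch in Python rather than the actual C# code. `BuggyModel`, its members, and the phase split are hypothetical stand-ins for the two caches and their separate lock scopes:

```python
import threading

class BuggyModel:
    """Simplified model of the pre-fix flow (hypothetical names, not the
    actual ClientEdmModel code). Each cache has its own lock, and the two
    cache updates happen in separate critical sections."""

    def __init__(self):
        self._name_lock = threading.Lock()  # guards the name-keyed cache
        self._clr_lock = threading.Lock()   # guards the CLR-type-keyed cache
        self.name_cache = {}                # ~ typeNameToClientTypeAnnotationCache
        self.clr_cache = {}                 # ~ clrToEdmTypeCache

    def _store_annotation(self, key, candidate):
        # First critical section: whichever thread gets here first wins,
        # and its instance becomes the one stored under this name.
        with self._name_lock:
            return self.name_cache.setdefault(key, candidate)

    def _store_clr_mapping(self, key, candidate):
        # Second, separate critical section. BUG: it stores this thread's
        # own candidate rather than whatever won the first cache.
        with self._clr_lock:
            self.clr_cache.setdefault(key, candidate)

    def get_or_create(self, key):
        candidate = object()  # each thread creates its own EDM type instance
        winner = self._store_annotation(key, candidate)
        # <-- window: between the two locks, another thread can run both
        #     steps and insert its own instance into clr_cache first
        self._store_clr_mapping(key, candidate)
        return winner
```

If thread A wins the first cache but thread B reaches the second cache first, `name_cache` and `clr_cache` end up holding different instances for the same key, which is exactly the mismatch the writer later rejects.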

Description

The fix puts the GetOrCreateClientTypeAnnotation call (which updates the typeNameToClientTypeAnnotationCache) and the code that followed it inside the same lock scope as the code that updates the clrToEdmTypeCache. This ensures that the thread that inserts the EDM type into the first cache also inserts it into the second cache before releasing the lock, which previously gave other threads a chance to "overtake" it and reach the second cache first.
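Continuing the simplified Python sketch from above, the fixed shape performs both cache updates in one critical section (again, `FixedModel` and its members are hypothetical stand-ins, not the real code):

```python
import threading

class FixedModel:
    """Same shape as the buggy sketch, but both cache updates now share one
    critical section, mirroring the fix (names are hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.name_cache = {}   # ~ typeNameToClientTypeAnnotationCache
        self.clr_cache = {}    # ~ clrToEdmTypeCache

    def get_or_create(self, key):
        with self._lock:
            # Whichever thread inserts into the first cache also inserts the
            # *same* instance into the second cache before releasing the
            # lock, so no other thread can interleave between the updates.
            winner = self.name_cache.setdefault(key, object())
            self.clr_cache.setdefault(key, winner)
            return winner
```

With this shape the two caches can never disagree: the pair of inserts is atomic with respect to other callers.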

I found this to be the simplest and least disruptive fix. We could probably find better and more efficient solutions, but the code here is sufficiently complex that it may be better to rethink the design at another time. Since we're adding more code to the same lock's critical section, we might increase lock contention. But at the same time, there's one less lock to fight over. More importantly, we've eliminated the bug.

The GetOrCreateClientTypeAnnotation method still acquires the lock on typeNameToClientTypeAnnotationCache. I considered removing this lock, since the method is only called by GetOrCreateEdmTypeInternal and the cache is not updated or locked by any other method. However, doing so would make the code's correctness harder to reason about and would make mistakes easier in the future. Also, since only one thread at a time will call this method, there will be no contention for this lock, and acquiring an uncontended lock is relatively cheap.
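The resulting lock shape can be sketched like this (a rough Python analogue, not the actual C# code; all names are hypothetical). The helper keeps its own inner lock for safety, but since only the holder of the outer lock ever calls it, the inner lock is always free:

```python
import threading

# Hypothetical sketch of the post-fix lock shape: an outer lock held for the
# whole operation, and an inner lock the helper still takes defensively.
outer_lock = threading.Lock()   # ~ the clrToEdmTypeCache lock
inner_lock = threading.Lock()   # ~ the typeNameToClientTypeAnnotationCache lock

name_cache = {}

def get_or_create_annotation(key, value):
    # Keeps its own lock so its correctness doesn't depend on callers;
    # with a single caller holding outer_lock, this never blocks.
    with inner_lock:
        return name_cache.setdefault(key, value)

def get_or_create_edm_type(key):
    with outer_lock:
        # Nested acquisition of a *different*, uncontended lock. This is
        # deadlock-free as long as every code path takes the two locks in
        # the same order.
        return get_or_create_annotation(key, object())
```

The design choice here trades a tiny amount of redundant locking for code whose thread safety is locally obvious.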

Before implementing the fix, I created a failing unit test that reproduces the race condition. It spawns two threads that each call GetOrCreateEdmType() on the same CLR type, then verifies that the EDM type returned is the same as the one stored in the annotation and in the model. It repeats this 5000 times. I tried different numbers of iterations:

  • With 20 iterations the test passed sometimes and failed sometimes.
  • With 100 iterations it failed consistently 20 times in a row on my machine, but it passed on CI.
  • With 5000 iterations it failed on CI in two consecutive runs of the pipeline. It failed on at least one of the framework versions but passed on others in at least one of the runs. One downside is that it takes ~4s to run on my machine when the test passes.

The test may pass even when the bug is present (a false negative), but if it fails we'll know there's a bug.
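A stress test along those lines can be sketched as follows (in Python rather than the actual C# xUnit test; `Model` and `run_stress` are hypothetical stand-ins, with the single-lock fix applied so the check passes deterministically):

```python
import threading

class Model:
    """Minimal stand-in for ClientEdmModel with the single-lock fix."""
    def __init__(self):
        self._lock = threading.Lock()
        self.by_name = {}
        self.by_clr = {}

    def get_or_create(self, key):
        with self._lock:
            edm_type = self.by_name.setdefault(key, object())
            self.by_clr.setdefault(key, edm_type)
            return edm_type

def run_stress(iterations=5000):
    """Each iteration races two fresh threads on a fresh model, then checks
    the core assertion of the PR's test: both threads, and both caches,
    must agree on a single instance."""
    for _ in range(iterations):
        model = Model()
        results = [None, None]

        def worker(i):
            results[i] = model.get_or_create("Customer")

        threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        if not (results[0] is results[1]
                and model.by_name["Customer"] is model.by_clr["Customer"]):
            return False
    return True
```

As the PR notes, a probabilistic test like this can only ever demonstrate the bug's presence, never prove its absence; the iteration count is a trade-off between detection odds and runtime.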


After implementing the fix, the test passes consistently.

Checklist (Uncheck if it is not completed)

  • Test cases added
  • Build and test with one-click build and test script passed

Additional work necessary

If documentation update is needed, please add "Docs Needed" label to the issue and provide details about the required document change in the issue.

mikepizzo
mikepizzo previously approved these changes Oct 19, 2022
@mikepizzo
Member

LGTM --

Do you want to create an issue for 8.0 to revisit this locking logic?

gathogojr
gathogojr previously approved these changes Oct 21, 2022
Contributor

@gathogojr gathogojr left a comment


:shipit:

g2mula
g2mula previously approved these changes Oct 21, 2022
Member

@g2mula g2mula left a comment


🚢

@habbes
Contributor Author

habbes commented Oct 21, 2022

> LGTM --
>
> Do you want to create an issue for 8.0 to revisit this locking logic?

Done: #2536

@habbes habbes dismissed stale reviews from g2mula, gathogojr, and mikepizzo via d8006b8 October 24, 2022 03:38
@pull-request-quantifier-deprecated

This PR has 40 quantified lines of changes. In general, a change size of up to 200 lines is ideal for the best PR experience!


Quantification details

Label      : Extra Small
Size       : +33 -7
Percentile : 16%

Total files changed: 2

Change summary by file extension:
.cs : +33 -7

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.

Why proper sizing of changes matters

Optimal pull request sizes drive a better, more predictable PR flow as they strike a
balance between PR complexity and PR review overhead. PRs within the
optimal size (typically small or medium-sized PRs) mean:

  • Fast and predictable releases to production:
    • Optimal size changes are more likely to be reviewed faster with fewer
      iterations.
    • Similarity in low PR complexity drives similar review times.
  • Review quality is likely higher as complexity is lower:
    • Bugs are more likely to be detected.
    • Code inconsistencies are more likely to be detected.
  • Knowledge sharing is improved within the participants:
    • Small portions can be assimilated better.
  • Better engineering practices are exercised:
    • Solving big problems by dividing them into well-contained, smaller problems.
    • Exercising separation of concerns within the code changes.

What can I do to optimize my changes

  • Use the PullRequestQuantifier to quantify your PR accurately
    • Create a context profile for your repo using the context generator
    • Exclude files that are not necessary to be reviewed or do not increase the review complexity. Example: Autogenerated code, docs, project IDE setting files, binaries, etc. Check out the Excluded section from your prquantifier.yaml context profile.
    • Understand your typical change complexity, drive towards the desired complexity by adjusting the label mapping in your prquantifier.yaml context profile.
    • Only use the labels that matter to you, see context specification to customize your prquantifier.yaml context profile.
  • Change your engineering behaviors
    • For PRs that fall outside of the desired spectrum, review the details and check if:
      • Your PR could be split in smaller, self-contained PRs instead
      • Your PR only solves one particular issue. (For example, don't refactor and code new features in the same PR).

How to interpret the change counts in git diff output

  • One line was added: +1 -0
  • One line was deleted: +0 -1
  • One line was modified: +1 -1 (git diff doesn't know about modifications; it
    interprets that line as one addition plus one deletion)
  • Change percentiles: Change characteristics (addition, deletion, modification)
    of this PR in relation to all other PRs within the repository.



@habbes habbes merged commit ed9810a into OData:master Oct 24, 2022
@habbes habbes deleted the 2532-duplicate-edm-types-race-condition branch October 24, 2022 04:37
Development

Successfully merging this pull request may close these issues.

Duplicate EDM types in ClientEdmModel due to race conditions cause sporadic ODataWriter validation errors
4 participants