Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Enforcer hangs during multi-threaded modifications #336

Closed
v-loboda opened this issue Nov 10, 2023 · 12 comments · Fixed by #363, #362 or #361
Closed

Enforcer hangs during multi-threaded modifications #336

v-loboda opened this issue Nov 10, 2023 · 12 comments · Fixed by #363, #362 or #361
Assignees
Labels
bug Something isn't working released

Comments

@v-loboda
Copy link

Hi guys

We encountered a problem in an environment where intensive policy modification occurs. It's very difficult to reproduce, and I can't do it reliably.
We have rbac model, several nodes with synchronization via IWatcher and a lot of policy changes including removing them.

Sometimes, very rarely, we catch an error
Error Write lock may not be acquired with read lock held. This pattern is prone to deadlocks. Please ensure that read locks are released before taking a write lock. If an upgrade is necessary, use an upgrade lock in place of the read lock.

From my point of view, there are 3 problem pleces:

  1. DefaultPolicyStore.Node class, where PolicyTextSet collection can be changed from different threads in several places
  2. Casbin.Rbac.Role class where _roles collection can be changed from different threads
  3. InternalEnforce class and the InternalEnforce method, where Scanner.Interrupt is not executed on every execution path. This can lead to a deadlock in the internal Iterator->node.Lock, where after _node.Lock.EnterReadLock ExitReadLock will not be executed.
@casbin-bot casbin-bot added the question Further information is requested label Nov 10, 2023
@casbin-bot
Copy link
Member

@sagilio sagilio self-assigned this Nov 10, 2023
@sagilio sagilio added bug Something isn't working and removed question Further information is requested labels Nov 10, 2023
@sagilio
Copy link
Member

sagilio commented Nov 10, 2023

Hi! Thank you for your feedback. I will review your points later and it would be helpful if you could provide more details like the error throw line.

@v-loboda
Copy link
Author

This happen in the Default PolicyStore.Node class in TryAddPolicy and TryRemovePolicy, lines 74 and 133 respectively.

@dmolochnikov
Copy link

Hello! Any updates on this issue?

@hsluoyz
Copy link
Member

hsluoyz commented Apr 4, 2024

@v-loboda @dmolochnikov can you provide stable reproduce steps? and also:

  1. Casbin.NET version?
  2. Related code (like a test case)
  3. Casbin model
  4. Casbin policy

@v-loboda
Copy link
Author

v-loboda commented Apr 5, 2024

Hi, as I said before, it is hard to reprocude becase of multithreaded nature. You just need to modify your policies in several threads and read them simultaneously. In the first message I have pointed out several places that do not meet the multithreading requirements.

  1. It was 2.1.1 version
  2. There is no specific code, just multithreaded operations with AddPoliciesAsync, AddRoleForUserAsync, RemovePolicyAsync, EnforceAsync and GetImplicitPermissionsForUser
[request_definition]
r = sub, resource_type, resource, act

[policy_definition]
p = sub, resource_type, resource, act

[role_definition]
g = _, _

[policy_effect]
e = some(where (p.eft == allow))

[matchers]
m = g(r.sub, p.sub) && (r.resource_type == p.resource_type || p.resource_type == "*") && (r.resource == p.resource || p.resource == "*") && (r.act == p.act || p.act == "*")
  1. Policy pattern: ["user1", "test1", "test1", "test1"]

@hsluoyz
Copy link
Member

hsluoyz commented Apr 14, 2024

@v-loboda how to reproduce it stably?

@v-loboda
Copy link
Author

I would make a test case if I could reproduce the bug reliably.

I was able to reproduce it several times by running the modification and receiving permissions in multiple threads and waiting for errors for a long time. Sometimes the error appeared within a few seconds.

@seanamorosoamtote
Copy link
Contributor

I definitely see similar issues. I didn't reproduce the deadlocking in a test yet, but I am seeing it as well. However, The PR I submitted here handles some issues as well as a test case that would fail prior in the same vein of concurrent access issues. #361

I would urge the team to investigate all usages of HashSet reading and writing as they are not thread safe.

@hsluoyz
Copy link
Member

hsluoyz commented Jun 12, 2024

If you cannot reproduce it, it cannot be fixed.

@hsluoyz hsluoyz closed this as completed Jun 12, 2024
@hsluoyz hsluoyz added invalid This doesn't seem right and removed bug Something isn't working labels Jun 12, 2024
@sagilio sagilio added bug Something isn't working invalid This doesn't seem right and removed invalid This doesn't seem right labels Jun 12, 2024
@sagilio
Copy link
Member

sagilio commented Jun 12, 2024

I believe this issue is indeed likely to occur, and it's very possible that the problem arises from making changes to the same Node while the Scanner is not interrupted. I will continue to try to find a way to consistently reproduce it, and once I find a method or have a new idea, I will reopen this issue.

During the discussion, there are other potential issues, I will create a new issue to link them.

Copy link

github-actions bot commented Sep 3, 2024

🎉 This issue has been resolved in version 2.10.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

# for free to join this conversation on GitHub. Already have an account? # to comment