(WIP) Distributed in-memory cache for Singularity data #1965
Conversation
Additions to this: when trying to get atomix into our current unit testing setup, the tests were so slow and expensive that I couldn't run them on my local laptop. Even on our m5.12x it was taking 10+ minutes and sometimes failing. We've known about the slow tests for a while, and this was a good excuse to fix them. So the PR also includes an update to JUnit 5, which lets us use its BeforeAll/AfterAll methods with dependency injection. This means we only create hk2/the test zk server/atomix once per test class instead of once per test method. Average build times on our infra are down from 6-7 minutes to just under 2 minutes for the SingularityService module.
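As a rough illustration of that pattern (not the actual Singularity test code), the sketch below shows class-scoped fixtures via JUnit 5 `@BeforeAll`/`@AfterAll` with parameter injection. The `GuiceInjectionExtension` name is a hypothetical stand-in for whatever `ParameterResolver` extension wires in the injected objects.

```java
import org.apache.curator.test.TestingServer;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;

// Hypothetical DI extension that resolves @BeforeAll/@AfterAll parameters
@ExtendWith(GuiceInjectionExtension.class)
public class ExampleCacheTest {

  private static TestingServer zkServer;

  @BeforeAll
  public static void setup(TestingServer injectedZk) throws Exception {
    // Runs once per test class: expensive fixtures (test zk server, hk2,
    // atomix, etc.) are created here instead of once per test method.
    zkServer = injectedZk;
  }

  @AfterAll
  public static void teardown() throws Exception {
    // Torn down once per class as well
    zkServer.close();
  }

  @Test
  public void individualTestsReuseSharedFixtures() {
    // each test method reuses the class-scoped zk server / injector
  }
}
```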
remove the rest of junit4
fix deps for guice extension
deps for integration test module
While a cool technology, this ended up overcomplicating the startup/leader procedure to the point where I'd be too worried about data consistency/integrity to move forward with it. I'm going to pick apart some of the more usable pieces of this PR into smaller PRs and find a different approach for speeding up some endpoints on non-leading instances.
Even with recent updates to zk usage, non-leader instances are still very hard on zookeeper and much slower than the leader, which has all data in memory. This PR aims to remedy that situation by keeping an eventually consistent view of the leader's data in memory on all non-leader instances as well. It will also unify the caching across different manager classes, since we currently have a leader cache, a web cache, and multiple ZkCache classes.
The distributed view of data is currently accomplished using atomix, with the goal of keeping all data in memory on the leader and replicating writes to other cluster members (primary-backup in atomix). Atomix member discovery is done via the zk leader latch since we already have zk present (though DNS could be another option we provide). Atomix provides leader election/raft protocols as well, but we are using it mostly as a read-only cache, so we're leaving those alone for now.
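For context, here is a minimal sketch (not Singularity's actual code) of how a primary-backup replicated map might be set up with the Atomix 3.x builder API. In practice the member addresses would be discovered from the zk leader latch participants rather than hard-coded, and the member IDs, addresses, and backup count below are illustrative assumptions that may need adjusting for a given Atomix version.

```java
import io.atomix.cluster.Node;
import io.atomix.cluster.discovery.BootstrapDiscoveryProvider;
import io.atomix.core.Atomix;
import io.atomix.core.map.DistributedMap;
import io.atomix.protocols.backup.MultiPrimaryProtocol;

public class AtomixCacheSketch {
  public static void main(String[] args) {
    // Cluster membership; in Singularity these nodes would come from the
    // zk leader latch participants instead of being listed statically.
    Atomix atomix = Atomix.builder()
        .withMemberId("instance-1")
        .withAddress("10.0.0.1:5679")
        .withMembershipProvider(BootstrapDiscoveryProvider.builder()
            .withNodes(
                Node.builder().withId("instance-1").withAddress("10.0.0.1:5679").build(),
                Node.builder().withId("instance-2").withAddress("10.0.0.2:5679").build())
            .build())
        .build();

    atomix.start().join();

    // Primary-backup (multi-primary) protocol: writes on the primary are
    // replicated to backup copies on other members, giving non-leader
    // instances an eventually consistent in-memory view of the data.
    DistributedMap<String, String> cache =
        atomix.<String, String>mapBuilder("singularity-cache")
            .withProtocol(MultiPrimaryProtocol.builder()
                .withBackups(2)
                .build())
            .build();

    cache.put("requestId", "serialized-request-data");
  }
}
```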
TODOs: