
(WIP) Distributed in-memory cache for Singularity data #1965

Closed
wants to merge 31 commits

Conversation


@ssalinas ssalinas commented Jun 23, 2019

Even with recent updates to ZK usage, non-leader instances are still very hard on ZooKeeper and much slower than the leader, which has all data in memory. This PR aims to remedy that by keeping an eventually consistent view of the leader's data in memory on all non-leader instances as well. It will also unify caching across the different manager classes, since we currently have a leader cache, a web cache, and multiple ZkCache classes.

The distributed view of the data is currently accomplished using Atomix, with the goal of keeping all data in memory on the leader and replicating writes to the other cluster members (primary-backup mode in Atomix). Atomix member discovery is done via the ZK leader latch, since we already have ZK present (though DNS could be another option we provide). Atomix also provides leader election/Raft protocols, but we are using it mostly as a read-only cache, so we are leaving those alone for now.
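To make the primary-backup semantics above concrete, here is a minimal stdlib-only sketch of the replication idea (all names are hypothetical; this is an illustration of the semantics, not Atomix's actual API): the primary applies each write locally and pushes it to every backup, so reads on any member are served from that member's local copy.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy illustration of primary-backup replication semantics (hypothetical
// names; not Atomix's API). Writes go to the primary, which replicates them
// to backups; every member serves reads from its own in-memory copy.
class PrimaryBackupMap<K, V> {
  private final Map<K, V> local = new ConcurrentHashMap<>();
  private final List<PrimaryBackupMap<K, V>> backups;

  PrimaryBackupMap(List<PrimaryBackupMap<K, V>> backups) {
    this.backups = backups;
  }

  // Called on the primary: apply locally, then replicate to each backup.
  void put(K key, V value) {
    local.put(key, value);
    for (PrimaryBackupMap<K, V> backup : backups) {
      backup.local.put(key, value);
    }
  }

  // Reads never leave the local member's copy.
  V get(K key) {
    return local.get(key);
  }

  public static void main(String[] args) {
    PrimaryBackupMap<String, String> backup = new PrimaryBackupMap<>(List.of());
    PrimaryBackupMap<String, String> primary =
        new PrimaryBackupMap<>(List.of(backup));
    primary.put("request-1", "ACTIVE");
    System.out.println(backup.get("request-1")); // prints ACTIVE
  }
}
```

In real Atomix the replication is asynchronous over the network, which is why the view on non-leaders is only eventually consistent.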

TODOs:

  • Finish editing all manager classes to use the single cache class
    • Replace older zk cache and web cache usages with either distributed maps for data on the leader, or guava caches otherwise (e.g. task/deploy data)
  • Determine how we can tell if our current distributed view is in sync or not, so we can determine when/if non-leading instances should fall back to zk. This is mostly relevant for getAll type methods, since we can easily fall back to zk if checking a map and the key doesn't exist
  • Wire up a test atomix framework to run with unit tests
  • Test this in our staging cluster to make sure that startup/shutdown ordering is correct, as well as all available Atomix configuration params
    • Test that a single node works fine
    • Test that multiple nodes work on a rolling deploy
    • Test that multiple nodes work when starting from scratch
    • Test that a node can take leadership and correctly rewrite cache data from ZK
  • Update endpoints, UI calls, and the client to have a skipCache rather than useCache param, since the cache will be the new default
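The per-key fallback described in the TODOs can be sketched with stdlib types only (the `zkLoader` function and all names here are hypothetical stand-ins, not Singularity's actual classes): a read consults the distributed view first and falls back to ZK only when the key is absent, which is exactly why single-key reads are easy and getAll-style reads are the hard case.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of the per-key read path: check the replicated
// in-memory view, and fall back to a ZooKeeper read only on a miss.
class CachedReader<K, V> {
  private final Map<K, V> distributedView = new ConcurrentHashMap<>();
  private final Function<K, Optional<V>> zkLoader; // stands in for a real ZK read

  CachedReader(Function<K, Optional<V>> zkLoader) {
    this.zkLoader = zkLoader;
  }

  // Invoked when a write from the leader is replicated to this member.
  void onReplicatedWrite(K key, V value) {
    distributedView.put(key, value);
  }

  // A miss on a single key is detectable, so falling back to ZK is cheap.
  Optional<V> get(K key) {
    V cached = distributedView.get(key);
    if (cached != null) {
      return Optional.of(cached);
    }
    return zkLoader.apply(key); // fall back to ZK on a miss
  }
}
```

For getAll-style methods this trick does not work: an incomplete map looks the same as a complete one, which is why the PR needs a separate way to tell whether the distributed view is in sync.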

@ssalinas
Member Author

Additions to this. When trying to get Atomix into our current unit testing setup, the tests were so slow and expensive that I couldn't run them on a local laptop. Even on our m5.12x it was taking 10+ minutes and sometimes failing. We've known about the slow tests for a while, and this was a good excuse to fix them. So this PR also includes an update to JUnit 5, which lets us use its BeforeAll/AfterAll methods with dependency injection. This means we only create the hk2/test ZK server/Atomix setup once per test class instead of once per test method. Average build times on our infra are down from 6-7 minutes to just under 2 minutes for the SingularityService module.
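The JUnit 5 lifecycle pattern referenced above looks roughly like this (class and fixture names are hypothetical, not Singularity's actual test classes; `TestingZkServer` stands in for an embedded ZK test server): static `@BeforeAll`/`@AfterAll` methods run once per test class, so expensive fixtures are paid for once rather than per test method.

```java
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

// Hedged sketch of the once-per-class fixture pattern in JUnit 5.
// TestingZkServer is a hypothetical stand-in for an embedded ZK server.
class SingularitySchedulerIT {
  static TestingZkServer zkServer;

  @BeforeAll
  static void startExpensiveFixtures() throws Exception {
    zkServer = new TestingZkServer();
    zkServer.start(); // runs once for the whole class, not per @Test method
  }

  @AfterAll
  static void stopExpensiveFixtures() throws Exception {
    zkServer.close();
  }

  @Test
  void eachTestReusesTheSharedServer() {
    // test body uses zkServer without paying the startup cost again
  }
}
```

In JUnit 4 the equivalent was `@BeforeClass`/`@AfterClass`; the JUnit 5 versions compose better with extension-based dependency injection, which is what makes sharing the hk2/ZK/Atomix setup across a class practical.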

@ssalinas ssalinas mentioned this pull request Jul 16, 2019
@ssalinas
Member Author

While a cool technology, this ended up overcomplicating the startup/leader procedure to the point where I'd be too worried about data consistency/integrity to move forward with it. I'm going to pick apart some of the more usable pieces of this PR into smaller PRs and find a different approach for speeding up some endpoints on non-leading instances.

@ssalinas ssalinas closed this Jul 16, 2019
@ssalinas ssalinas deleted the caching_update branch September 5, 2019 12:53