Proposal: Mesos-DNS should stay running even when disconnected from Master/ZK #284
Comments
SGTM
Ugh, this actually bit me hard today when trying to migrate a ZK cluster on a running Mesos. I will try to see if I can fix this quickly ^^ Any suggestions? What I'm thinking right now is to do a reload regardless of the master/ZK connection and ensure that the data structures holding the records are kept around. Making the first reload asynchronous with respect to the connection to the masters may also be an option; I will look into it.
OK, quick update. It seems to be relatively simple, as most of the logic is async already (yay Go!). I think I can get a clean PR ready for review and feedback pretty quickly. Basically, what can be done is: don't log fatal; instead, just log very verbosely. E.g., for static records, after the error is checked, those could still be added to the rrs maps, but outside the InsertState method to avoid recreating the maps.
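To make the "log instead of fatal, keep the records around" idea concrete, here is a minimal sketch. It is not the mesos-dns code: `RecordStore`, `Reload`, `Lookup`, `errMasterUnreachable` and the `fetch` callback are hypothetical names made up for illustration, and the real record maps have a different shape.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"sync"
)

// errMasterUnreachable is a hypothetical error used only in this sketch.
var errMasterUnreachable = errors.New("master unreachable")

// RecordStore (hypothetical) holds the last successfully generated records.
type RecordStore struct {
	mu      sync.RWMutex
	records map[string][]string
}

// Reload replaces the records only when fetch succeeds. On failure it logs
// the error and keeps serving the previous records instead of exiting.
func (s *RecordStore) Reload(fetch func() (map[string][]string, error)) bool {
	fresh, err := fetch()
	if err != nil {
		log.Printf("reload failed, keeping previous records: %v", err)
		return false
	}
	s.mu.Lock()
	s.records = fresh
	s.mu.Unlock()
	return true
}

// Lookup returns the answers for name from the last good reload.
func (s *RecordStore) Lookup(name string) []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.records[name]
}

func main() {
	store := &RecordStore{}

	// First reload succeeds and populates the store.
	store.Reload(func() (map[string][]string, error) {
		return map[string][]string{"web.marathon.mesos.": {"10.0.0.1"}}, nil
	})

	// Second reload fails (e.g. during a ZK migration); old data survives.
	store.Reload(func() (map[string][]string, error) {
		return nil, errMasterUnreachable
	})

	fmt.Println(store.Lookup("web.marathon.mesos."))
}
```

The point of the sketch is only that a failed reload returns early and never touches the existing map, so the resolver keeps answering from stale data until the next cycle succeeds.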
Please keep the static records proposal apart from this one.
I am, no worries. Just making sure that the logic is sound and compatible with any non-master-driven records, if there are or will be any.
Currently 2 issues exist:
1.) If the initial ZK connection doesn't succeed, mesos-dns will not start.
2.) If master detection doesn't find a master within 30 seconds, mesos-dns crashes.
Issue 2 is probably the more critical one: it can be fatal during a network partition, and mesos-dns will not recover automatically after the network is restored.
Suggestion:
If the ZK connection doesn't succeed, keep retrying indefinitely (possibly with a capped backoff to reduce chatter). Continue serving requests (e.g. to allow Resolvers to work).
If Master detection fails, don't panic :) just log it and wait for the next reload cycle to try again.
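The retry suggestion could look roughly like the sketch below. It is only an illustration under stated assumptions: `retryForever`, `nextBackoff`, `noSleep` and `errNotConnected` are hypothetical names, the delays are arbitrary, and `connect` stands in for whatever actually dials ZK or detects the master. The `sleep` function is injected so the loop can be exercised without real waiting.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"time"
)

// errNotConnected is a hypothetical error used only in this sketch.
var errNotConnected = errors.New("zk: not connected")

// nextBackoff doubles the current delay and caps it at limit.
func nextBackoff(cur, limit time.Duration) time.Duration {
	next := cur * 2
	if next > limit {
		return limit
	}
	return next
}

// noSleep is a stand-in for time.Sleep that returns immediately.
func noSleep(time.Duration) {}

// retryForever calls connect until it succeeds, logging each failure and
// waiting with a capped, doubling delay in between. It never gives up and
// never exits the process. It returns the number of attempts made.
func retryForever(connect func() error, initial, limit time.Duration,
	sleep func(time.Duration)) int {
	delay := initial
	for attempt := 1; ; attempt++ {
		err := connect()
		if err == nil {
			return attempt
		}
		log.Printf("attempt %d failed: %v; retrying in %v", attempt, err, delay)
		sleep(delay)
		delay = nextBackoff(delay, limit)
	}
}

func main() {
	// Simulated connection that fails four times before succeeding.
	calls := 0
	connect := func() error {
		calls++
		if calls < 5 {
			return errNotConnected
		}
		return nil
	}

	// Print the delays instead of sleeping; pass time.Sleep in real use.
	printSleep := func(d time.Duration) { fmt.Println("would wait", d) }

	attempts := retryForever(connect, time.Second, 4*time.Second, printSleep)
	fmt.Println("connected after", attempts, "attempts")
}
```

In a real service this loop would presumably run in its own goroutine so that the DNS listener starts and keeps serving regardless of whether the connection has been established yet.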