flexible reply for sending entries #787
Comments
I think @Licenser's requirement is to stop sending anything if a node is known to be unreachable. With the current framework, openraft cannot deal with a partially successful result, and it will always resend the full bulk of entries again. For now, it may be feasible to let the network implementation return a timeout hint back to openraft, so that the next time openraft will use a bigger timeout. I think you were talking about a stream-like send-entries API, with which a big bulk of entries is split into small chunks.
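The "timeout hint" idea above can be sketched as follows. This is a minimal illustration, not openraft's actual API: the type and function names (`SendOutcome`, `next_timeout`) are hypothetical, and the policy (take the hint, but at least double the previous timeout) is just one plausible choice.

```rust
use std::time::Duration;

// Hypothetical outcome type: the network layer either succeeded or
// timed out, and on timeout it suggests a larger timeout to try next.
enum SendOutcome {
    Success,
    TimeoutHint(Duration),
}

// Caller-side policy sketch: grow the timeout when the network hints
// that the previous one was too small; keep it otherwise.
fn next_timeout(current: Duration, outcome: &SendOutcome) -> Duration {
    match outcome {
        SendOutcome::Success => current,
        SendOutcome::TimeoutHint(hint) => (*hint).max(current * 2),
    }
}

fn main() {
    let t = Duration::from_millis(100);
    // Hint of 150ms, but doubling gives 200ms, which wins.
    let t2 = next_timeout(t, &SendOutcome::TimeoutHint(Duration::from_millis(150)));
    assert_eq!(t2, Duration::from_millis(200));
    // Success leaves the timeout unchanged.
    assert_eq!(next_timeout(t2, &SendOutcome::Success), Duration::from_millis(200));
    println!("ok");
}
```

The key point is that the decision stays with the caller, while the network implementation only supplies the hint.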
I think this would be a really neat feature that improves network reliability. One thing we've been doing in tremor is using dynamic backpressure from the downstream system to throttle messages (either dropping, which isn't reasonable here, or slowing down sending, which might work well). Perhaps a similar logic could be added here? I think with a timeout hint it could even be done inside the Network implementation without forcing openraft to make a decision: a downstream system could send back a "please slow down" message that the network protocol implements, and then the sender goes and increases timeouts or adds delays as required.
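The "please slow down" feedback loop described above can be sketched as a small pacer held by the sender. Everything here is illustrative (the `Pacer` type is not part of openraft or tremor): on a slow-down reply the inter-batch delay doubles, and on a normal reply it decays back toward zero.

```rust
use std::time::Duration;

// Backpressure pacer sketch: the sender inserts `delay` between
// batches, adjusting it based on receiver feedback.
struct Pacer {
    delay: Duration,
    max: Duration,
}

impl Pacer {
    fn new() -> Self {
        Pacer { delay: Duration::ZERO, max: Duration::from_secs(1) }
    }

    // Called with the receiver's feedback after each batch.
    fn on_reply(&mut self, slow_down: bool) {
        if slow_down {
            // Start at 10ms, then double, capped at `max`.
            let next = if self.delay.is_zero() {
                Duration::from_millis(10)
            } else {
                self.delay * 2
            };
            self.delay = next.min(self.max);
        } else {
            // Halve the delay, eventually returning to full speed.
            self.delay = self.delay / 2;
        }
    }
}

fn main() {
    let mut p = Pacer::new();
    p.on_reply(true);
    assert_eq!(p.delay, Duration::from_millis(10));
    p.on_reply(true);
    assert_eq!(p.delay, Duration::from_millis(20));
    p.on_reply(false);
    assert_eq!(p.delay, Duration::from_millis(10));
    println!("ok");
}
```

Because the pacer only consumes replies, it can live entirely inside the network implementation, exactly as suggested above.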
Exactly, this was the intention :-).
So this is the partial success reply?
Many messaging systems use always-increasing message IDs and sparse acks to allow for batching (in this fashion) without needing full batches to be confirmed. I think we can "steal" the idea here. The basic idea is that not every message has to be acknowledged, but any acknowledgement given means that this message and everything before it was received and processed. I think that would satisfy @schreter's requirement and be quite scalable, as we still have 1 ack per batch in most cases, even during partial acks.
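The cumulative-ack scheme above can be sketched with a few lines on the sender side. This is an illustration only (the `Window` type is hypothetical, not openraft code): IDs increase monotonically, and a single ack for ID `n` confirms `n` and the whole prefix before it.

```rust
use std::collections::BTreeSet;

// Sender-side window sketch for cumulative acknowledgements.
struct Window {
    // IDs sent but not yet confirmed, kept in order.
    inflight: BTreeSet<u64>,
}

impl Window {
    fn new() -> Self {
        Window { inflight: BTreeSet::new() }
    }

    fn send(&mut self, id: u64) {
        self.inflight.insert(id);
    }

    // One ack confirms the whole prefix up to and including `id`.
    fn ack(&mut self, id: u64) {
        // Keep only IDs strictly greater than the acked one.
        self.inflight = self.inflight.split_off(&(id + 1));
    }
}

fn main() {
    let mut w = Window::new();
    for id in 1..=100 {
        w.send(id);
    }
    // A single ack for id 50 confirms messages 1..=50.
    w.ack(50);
    assert_eq!(w.inflight.len(), 50);
    assert_eq!(w.inflight.iter().next(), Some(&51));
    println!("ok");
}
```

This is why one ack per batch suffices in the common case: a partial ack simply confirms a shorter prefix, and only the remainder stays in flight.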
Certainly! At the moment, openraft follows a request/response design. Implementing a streaming pattern as you proposed would bring about increased flexibility and efficiency. I have been focusing on refactoring the storage API to a stream-like interface, after which the network API will be revised accordingly.
Yeah, no hurry, I'm just thinking out loud :)
There are two parts to this:

For the former, it's sufficient to pass a TTL into the method instead of timing out on the outside. The latter is the possibility to advance the commit index before the entire batch is processed. Alternatively, it could be possible to completely decouple it (a streaming API).
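The second part, advancing the commit index before the whole batch is through, amounts to a partial-success reply. A minimal sketch, with entirely hypothetical types (`AppendReply`, `next_batch` are not openraft's API): the follower reports how far it got, and the leader resends only the remainder instead of the full batch.

```rust
// Hypothetical reply carrying partial progress.
struct AppendReply {
    // Index of the last entry the follower persisted before the
    // timeout hit; `None` means nothing was accepted.
    last_persisted: Option<u64>,
}

// Leader-side sketch: compute the entries still to send.
fn next_batch(batch: &[u64], reply: &AppendReply) -> Vec<u64> {
    match reply.last_persisted {
        Some(last) => batch.iter().copied().filter(|&i| i > last).collect(),
        None => batch.to_vec(),
    }
}

fn main() {
    let batch: Vec<u64> = (1..=100).collect();
    // 50 entries got through before the timeout.
    let reply = AppendReply { last_persisted: Some(50) };
    let rest = next_batch(&batch, &reply);
    assert_eq!(rest.first(), Some(&51));
    assert_eq!(rest.len(), 50);
    println!("ok");
}
```

With such a reply, the scenario described below (timing out after 50 of 100 entries on every attempt) makes progress instead of looping, because each retry starts from the last persisted index.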
BTW, I'm wondering if the backoff is the right way to go. It depends on the network implementation. If you have a relatively low volume of updates and use, for instance, UDP to send the messages, then yes, this will likely work very well.

What I am worried about is when you have a lot of data to send. Then, the server will retry sending the entire missing log at once. Maybe it would be better to allow the connection to report a partial success. Say I want to send 100 entries, 50 get through, and then the timeout hits. The next time, we'll try to re-send all 100 entries (and some) again, so the timeout hits after 50 entries again, and we'll never make progress. Of course, this can likely be handled in the `Network` implementation by noting down the last successful entry index before the send was killed by the timeout and next time just sending the rest.

But it might be more desirable to pass the `ttl` to `send_append_entries()` instead of limiting the time to send from the outside. As long as there is progress sending individual entries, the call would not time out. I.e., instead of timing out the entire call of sending 100 entries, you'd time out sending individual entries in the network implementation. This could even be made backward compatible by invoking a default-implemented wrapper method `send_append_entries_with_timeout()`, which would by default just call `timeout(ttl, send_append_entries(...)).await`.

Any take on this?
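The "timeout per entry instead of per call" idea can be sketched synchronously, leaving the async and trait plumbing aside. Everything here is illustrative (the function name and error shape are hypothetical, and `send_one` stands in for the real network send): the deadline is refreshed after each entry that makes progress, so a long batch never times out as long as individual sends keep succeeding.

```rust
use std::time::{Duration, Instant};

// Sketch: send `entries`, timing out only when no single entry makes
// progress within `ttl`. On timeout, return the first unsent entry so
// the caller can resume from there instead of failing wholesale.
fn send_with_progress_timeout(
    entries: &[u64],
    ttl: Duration,
    mut send_one: impl FnMut(u64) -> bool,
) -> Result<(), u64> {
    let mut deadline = Instant::now() + ttl;
    for &e in entries {
        if Instant::now() > deadline {
            return Err(e);
        }
        if send_one(e) {
            // Progress was made: extend the deadline for the next entry.
            deadline = Instant::now() + ttl;
        }
    }
    Ok(())
}

fn main() {
    let entries: Vec<u64> = (1..=100).collect();
    // Every entry succeeds, so the whole batch goes through regardless
    // of how long 100 entries take in total.
    let r = send_with_progress_timeout(&entries, Duration::from_secs(1), |_| true);
    assert_eq!(r, Ok(()));
    println!("ok");
}
```

The backward-compatible wrapper proposed above would then keep today's behavior as the default (wrapping the whole call in `timeout(ttl, ...)`), while implementations that can track per-entry progress override it with logic like this sketch.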
Originally posted by @schreter in #462 (comment)