Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Eventual consistency issue + proposed solution #19

Open
hackergrrl opened this issue Dec 22, 2023 · 0 comments
Open

Eventual consistency issue + proposed solution #19

hackergrrl opened this issue Dec 22, 2023 · 0 comments

Comments

@hackergrrl
Copy link
Member

hackergrrl commented Dec 22, 2023

Hey y'all. Posting this here to hopefully get some smart eyeballs on it for feedback before I start making the changes.


Premises

  1. In Cable there are implied Conflict-free Replicated Data Types:

    1. each user's latest post/info
    2. each user's, per channel, membership status (regulated by post/{join,leave})
    3. each channel's topic (regulated by post/topic)
  2. For any given post/{topic,join,leave,info}, P, a node could miss it during normal sync, by means of missing it in their usual sync window. Imagining a sync window of 1 week, a node might sync right before P was created, then go offline for just over a week, and then not receive it using the normal time range request.

  3. For each of these, there is a single "latest value".

  4. Latest value is determined using a combination of post timestamps and causal chains.

  5. Causal chains are treated as authoritative over timestamps, when available.

  6. A < C is true if A.timestamp < C.timestamp or the node knows of a causal chain such that A <- ... <- C.

  7. If no causal chain between A and C is known, timestamp alone is used to determine sort order.

  8. Given any posts A and C, a node is always able to order the posts, using timestamps. This is not always true with causal chains, where some intermediary posts may be missing from the local node.

  9. Assume the following:

    1. The A <- B <- C causal chain exists.
    2. N1 is a node that knows about A and C but not B.
    3. N2 is a node that knows about A, B, and C.
  10. If we assume A.timestamp < C.timestamp, N1 would report C as the latest value. N2 would report the same. No conflicts here.

  11. However, if we assume A.timestamp > C.timestamp, then N1 would think A is the latest value, while N2 would still think C is the latest, thanks to N2's knowledge of the causal chain.

  12. (11) is a problem that inhibits eventual consistency. N1 will report A as latest even after syncing with N2. The only way N1 would ever stop considering A as latest is if N1 acquired all posts in a causal chain such that A <- ... <- C. However, due to (2), this may never happen.

  13. This problem also arises for deletions. Thanks to (2), if a deletion is missed during periodic time range sync, there is no way for a node to learn about it after.

  14. The result is that some nodes may see and report old "latest" values for user info, membership status, and channel topics, including deleted ones, even after syncing with clients who "know" what the newest values are.

  15. Cable is not eventually consistent for these data types.

Summary

To summarize, any time someone posts a post/{topic,info,join,leave}, due to clock skew or malice, with a timestamp newer than those that follow it, this situation can happen, where some nodes continue to display and return old data, due to an inaccurate timestamp and lack of causal chain information.

What nefarious about this is that no amount of syncing with nodes who know about the causal chains will fix it. The nodes "stuck" on old data will only use a newer value once a new post, D, is made where D.timestamp > A.timestamp. In a malicious case where the timestamp is set to the VERY distant future, a manufactured value could get locked in essentially forever.

This has, imo, big consequences:

  • deleted posts can continue to circulate
  • old user info can continue to circulate
  • old channel membership info can continue to circulate
  • old channel topics can continue to circulate

Solution 1

Premise: If a node N2 knows the causal chain A <- ... <- C, they must also know whether A.timestamp > C.timestamp.

Proposed Edit 1: First, when N2 responds to a Channel State request, they would detect this case outlined in the premises section above, and choose to send ALL of the hashes that make up the causal chain between A and C. This would let a node with a broken causal chain to "fill in the blanks" and realize that C is actually sorted after A, and is thus the latest value.

Proposed Edit 2: This solution also proposes making a change to how deletions are made. In order to be able to send the complete causal chain when needed, holes -- like are caused by deletions -- can't exist. However, if deletions were to copy the links from the post it is deleting and make those its links, clients will still be able to traverse the causal chain even in the presence of deletions. e.g. given A <- B <- C, a deletion of B, D, would have D.links = [A]. So, the causal chain sent would include D instead of B, and send {A,D,C}.

Pros:

  1. Solves the problem without needing any major changes, and no wire format changes.
  2. If a new post E is made s.t. E.timestamp > A.timestamp, the causal chain won't ned to be sent any longer, avoiding the problem of this creating grow-only sets.

Cons:

  1. The node would need to send ALL of these hashes on EVERY Channel State request where it's relevant. For post/info this effort may be duplicated across each channel the requester and requestee have in common.
  2. In the case of a maliciously futuristic timestamp, like A.timestamp = Jan 1 2035111, the chain effectively becomes a grow-only set, since the full causal chain will be needed for the foreseeable future.
  3. Another piece of implementation complexity.

Mitigations

For Con#2, we could add a rule in the spec that says that posts with a timestamp newer than local_time + N SHOULD be ignored and not ingested. This would put more of an upper limit on how long a must-send causal chain could get.

Proposed Edit 3: Add a rule to the wire spec that says to discard any posts with timestamp > local_time + 6 hours.

Other

Proposed Edit 4: I also propose we get rid of this clause under Channel State Request:

"If future = 1 and a post that is part of the latest state for a channel is deleted, the responder MUST immediately send the hash of the next-latest piece of state of that same type as a Hash Response. For example, if the latest post/topic setting the channel's topic string is deleted by its author with a post/delete post, the hash of the second-latest post/topic for that channel SHOULD be sent."

I think it's redundant and adds extra complexity. It's already written that whenever state changes, the relevant update hashes are sent.

Oh! Also:

Proposed Edit 5: Add a new section to the wire spec that explains how these implied CRDTs work. Right now we just imply their existance. I think it'd be useful to make it explicit. It would also make it simpler to add future similar data types in the future!

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant