From 25fa1c4684edbf616b3541e78e627be9aaad6fd8 Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Thu, 1 Sep 2022 17:14:51 -0500
Subject: [PATCH 1/4] Clarify `(room_id, event_id)` uniqueness

Summarized from @richvdh's reply at https://github.com/matrix-org/synapse/pull/13589#discussion_r961116999
---
 docs/development/database_schema.md | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/docs/development/database_schema.md b/docs/development/database_schema.md
index d996a7caa2c6..bfa12db6a1aa 100644
--- a/docs/development/database_schema.md
+++ b/docs/development/database_schema.md
@@ -191,3 +191,27 @@ There are three separate aspects to this:
    flavour will be accepted by SQLite 3.22, but will give a column whose
    default value is the **string** `"FALSE"` - which, when cast back to a boolean
    in Python, evaluates to `True`.
+
+
+## `event_id` uniqueness
+
+In room versions `1` and `2` it's possible to end up with two events with the
+same `event_id` (in the same or different rooms). After room version `3`, that
+can only happen with a hash collision, which we basically hope will never
+happen.
+
+There are several places in Synapse and even Matrix API's like [`GET
+/_matrix/federation/v1/event/{eventId}`](https://spec.matrix.org/v1.1/server-server-api/#get_matrixfederationv1eventeventid)
+where we assume that event IDs are globally unique.
+
+But hash collisions are still possible, and by treating event IDs as room
+scoped, we can reduce the possibility of a hash collision. When scoping
+`event_id` in the database schema, it should be also accompanied by `room_id`
+(`PRIMARY KEY (room_id, event_id)`) and lookups should be done through the pair
+`(room_id, event_id)`.
+
+There has been a lot of debate on this in places like
+https://github.com/matrix-org/matrix-spec-proposals/issues/2779 and
+[MSC2848](https://github.com/matrix-org/matrix-spec-proposals/pull/2848) which
+has no resolution yet (as of 2022-09-01).
+

From 14c1dd56fb71a93f871f00d5a64918bac0ba6a56 Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Thu, 1 Sep 2022 17:17:08 -0500
Subject: [PATCH 2/4] Add changelog

---
 changelog.d/13701.doc | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 changelog.d/13701.doc

diff --git a/changelog.d/13701.doc b/changelog.d/13701.doc
new file mode 100644
index 000000000000..b438e066d809
--- /dev/null
+++ b/changelog.d/13701.doc
@@ -0,0 +1 @@
+Clarify `(room_id, event_id)` global uniqueness and how we should scope our database schemas.

From 5dd24f34bdb4c557ab0433678b9540e4183206b6 Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Fri, 2 Sep 2022 11:58:19 -0500
Subject: [PATCH 3/4] No apostrophe needed

Co-authored-by: reivilibre
---
 docs/development/database_schema.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/development/database_schema.md b/docs/development/database_schema.md
index bfa12db6a1aa..23f0653b928e 100644
--- a/docs/development/database_schema.md
+++ b/docs/development/database_schema.md
@@ -200,7 +200,7 @@ same `event_id` (in the same or different rooms). After room version `3`, that
 can only happen with a hash collision, which we basically hope will never
 happen.
 
-There are several places in Synapse and even Matrix API's like [`GET
+There are several places in Synapse and even Matrix APIs like [`GET
 /_matrix/federation/v1/event/{eventId}`](https://spec.matrix.org/v1.1/server-server-api/#get_matrixfederationv1eventeventid)
 where we assume that event IDs are globally unique.
 

From 893d4c542c848a852ca8b46b781e912a1c070f31 Mon Sep 17 00:00:00 2001
From: Eric Eastwood
Date: Fri, 2 Sep 2022 11:59:15 -0500
Subject: [PATCH 4/4] Put the uniqueness in the right context

---
 docs/development/database_schema.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/development/database_schema.md b/docs/development/database_schema.md
index 23f0653b928e..e9b925ddd835 100644
--- a/docs/development/database_schema.md
+++ b/docs/development/database_schema.md
@@ -193,7 +193,7 @@ There are three separate aspects to this:
    in Python, evaluates to `True`.
 
 
-## `event_id` uniqueness
+## `event_id` global uniqueness
 
 In room versions `1` and `2` it's possible to end up with two events with the
 same `event_id` (in the same or different rooms). After room version `3`, that
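
Note on the schema guidance in this series: the advice to key on `PRIMARY KEY (room_id, event_id)` and to look rows up through the pair can be illustrated with a small sketch. This is not Synapse code. The table name `example_event_notes`, the helper `get_note`, and the room and event IDs are all hypothetical, and SQLite in memory is used only so the example is self-contained.

```python
import sqlite3

# Hypothetical table for illustration only; it is not part of Synapse's schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE example_event_notes (
        room_id TEXT NOT NULL,
        event_id TEXT NOT NULL,
        note TEXT NOT NULL,
        PRIMARY KEY (room_id, event_id)
    )
    """
)


def get_note(conn, room_id, event_id):
    """Look a row up through the (room_id, event_id) pair, never event_id alone."""
    row = conn.execute(
        "SELECT note FROM example_event_notes WHERE room_id = ? AND event_id = ?",
        (room_id, event_id),
    ).fetchone()
    return row[0] if row else None


# The same event_id appearing in two different rooms (possible in room
# versions 1 and 2) is stored as two distinct rows under the composite key.
insert_sql = "INSERT INTO example_event_notes (room_id, event_id, note) VALUES (?, ?, ?)"
conn.execute(insert_sql, ("!a:example.org", "$dup:example.org", "first"))
conn.execute(insert_sql, ("!b:example.org", "$dup:example.org", "second"))

# Re-inserting an existing (room_id, event_id) pair violates the primary key.
try:
    conn.execute(insert_sql, ("!a:example.org", "$dup:example.org", "again"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

With a key on `event_id` alone, the second insert would have failed instead; with the composite key, only a repeat of the same pair is rejected.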