-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
feat: allow spread operators in to-many relationships #3640
base: main
Are you sure you want to change the base?
Conversation
1a02416
to
dd89c12
Compare
My approach right now is to generate this query for a to-many request: curl 'localhost:3000/clients?select=name,...projects(name,id)' SELECT "test"."clients"."name",
"clients_projects_1"."name",
"clients_projects_1"."id"
FROM "test"."clients"
LEFT JOIN LATERAL (
SELECT json_agg("projects_1"."name") AS "name",
json_agg("projects_1"."id") AS "id"
FROM "test"."projects" AS "projects_1"
WHERE "projects_1"."client_id" = "test"."clients"."id"
) AS "clients_projects_1" ON TRUE Right now this gives the expected result. But aggregates are not working correctly, because they are designed to be selected in the top query with a SELECT "test"."clients"."name",
json_agg("clients_projects_1"."name") AS "name",
json_agg("clients_projects_1"."id") AS "id"
FROM "test"."clients"
LEFT JOIN LATERAL (
SELECT "projects_1"."name",
"projects_1"."id"
FROM "test"."projects" AS "projects_1"
WHERE "projects_1"."client_id" = "test"."clients"."id"
) AS "clients_projects_1" ON TRUE
GROUP BY "test"."clients"."name" Not sure which one is better/easier right now... I'm thinking the latter. |
Having the json_agg in the outer query would make the query cleaner, imho. |
dd89c12
to
dce8597
Compare
Some caveats I encountered: Repeated values and orderDo we want to keep repeated values in the results? For example (not the best use case, just to illustrate): curl 'localhost:3000/project?select=name,...tasks(tasks:name,due_dates:due_date)' [
{
"name": "project 1",
"tasks": ["task 1", "task 2", "task 3", "task 4"],
"due_dates": [null, "2024-08-08", "2024-08-08", null]
}
] Here we're repeating Nested To-Many SpreadsI have a doubt on what to expect with nested to-many spreads. For example, on a non-nested to-many spread like this one: curl 'localhost:3000/entities?select=name,...child_entities(children:name)' We would expect: [
{"name": "entity 1", "children": ["child entity 1", "child entity 2"]},
{"name": "entity 2", "children": ["child entity 3"]},
"..."
] But what if we nest another to-many spread embedding with a new column to aggregate: curl 'localhost:3000/entities?select=name,...child_entities(children:name,...grandchild_entities(grandchildren:name))' I understand that we're hoisting all the aggregates to the top level, and not grouping by the intermediate columns ( [
{"name": "entity 1", "children": ["child entity 1", "child entity 2"], "grandchildren": ["grandchild entity 1", "grandchild entity 2", "..."]},
{"name": "entity 2", "children": ["child entity 3"], "grandchildren": []},
"..."
] This cannot be achieved by a simple SELECT "api"."entities"."name",
json_agg(DISTINCT "entities_child_entities_1"."children") AS "children",
json_agg(DISTINCT "entities_child_entities_1"."grandchildren") AS "grandchildren"
FROM "api"."entities"
LEFT JOIN LATERAL (
SELECT "child_entities_1"."name" AS "children",
"child_entities_grandchild_entities_2"."grandchildren" AS "grandchildren"
FROM "api"."child_entities" AS "child_entities_1"
LEFT JOIN LATERAL (
SELECT "grandchild_entities_2"."name" AS "grandchildren"
FROM "api"."grandchild_entities" AS "grandchild_entities_2"
WHERE "grandchild_entities_2"."parent_id" = "child_entities_1"."id"
) AS "child_entities_grandchild_entities_2" ON TRUE
WHERE "child_entities_1"."parent_id" = "api"."entities"."id"
) AS "entities_child_entities_1" ON TRUE
GROUP BY "api"."entities"."name"; If there is no sensible interpretation of the query, another option is to prohibit these intermediate columns altogether (aggregates like sum, avg, etc. should still be possible). |
6507878
to
bd93514
Compare
38abc0e
to
6e64707
Compare
OK, this is what I got implemented so far. For example, using the tables in our spec test:
curl 'localhost:3000/factories?select=name,...processes(processes:name,...supervisors(supervisors:name))' [
{
"name": "Factory C",
"processes": ["Process C1", "Process C2", "Process XX"],
"supervisors": ["Peter", "Peter", null]
},
{
"name": "Factory B",
"process": ["Process B1", "Process B1", "Process B2", "Process B2"],
"supervisors": ["Peter", "Sarah", "Mary", "John"]
},
{
"name": "Factory A",
"process": ["Process A1", "Process A2"],
"supervisors": ["Mary", "John"]
},
{
"name": "Factory D",
"process": [null],
"supervisors": [null]
}
]⏎ [
{
"name":"Factory C",
"processes":["Process C1", "Process C2", "Process XX"],
"supervisors":[{"name": "Peter"}, {"name": "Peter"}, null]},
{
"name":"Factory B",
"processes":["Process B1", "Process B1", "Process B2", "Process B2"],
"supervisors":[{"name": "Peter"}, {"name": "Sarah"}, {"name": "Mary"}, {"name": "John"}]},
{
"name":"Factory A",
"processes":["Process A1", "Process A2"],
"supervisors":[{"name": "Mary"}, {"name": "John"}]},
{
"name":"Factory D",
"processes":[null],
"supervisors":[null]
}
] As I mentioned in previous comments, some values will repeat, since we're grouping by the factory There's a problem when the embeddings have no values, as seen in the |
9002110
to
b3e5483
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature should be ready for review now.
I'm leaving the Edit: Nvm. I figured that it should be OK to include that feature here too, although in different commits...
for DISTINCT
and NOT NULL
for another PR to keep it cleaner.
Here are some comments on the changes done:
67e6419
to
87a13ef
Compare
87a13ef
to
19466a8
Compare
I'm not deep enough into the Planner and Query Builder to be able to properly review the code side of things. @steve-chavez could you look at it again? There have been quite some changes since your last review, I think. |
Unlike the to-one spreads, the columns inside a :ref:`to-many spread relationship <spread_to_many_embed>` are not treated as if they were part of the top-level resource. | ||
The aggregates will be done :ref:`in the context of the to-many spread resource <aggregate_functions_embed_context>`. | ||
For example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These special cases make the feature really confusing. And this problem is a consequence of supporting multiple columns (#3640 (comment)), if we supported only a single column the feature would be simple.
And if users wanted multiple columns, they could still do:
...orders(amount),...orders(order_date)
Which already "conveys" why an order is necessary. If the user wants the columns coming from both orders
if they want them to be correlated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe some of the work it has been done here can be reused as a form of query optimization though. To not repeat work when two ...orders()
are done.
But I would strongly suggest trying to eliminate these special cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's the way the documentation is presented though. I'll try to play with this and get back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a documentation issue. From looking at the test-cases, this felt very natural now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a documentation issue. From looking at the test-cases, this felt very natural now.
Yes I think the documentation needs to be improved. Maybe we could start with the "only aggregates" example, as it works exactly the same as to-one spreads. After that, we'd mention this case, and reword it to something like: "If you need to group by the columns inside the spread embed, then...`, then give the example.
...orders(amount.sum()),...orders(order_date)
(I added .sum()
to the example) This can still be an alternative to not group by the columns. Just noticed that this doesn't work right now, because we don't allow aliases to the spread embed names. I'll fix that in another commit PR (to not complicate this one any more).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs are looking much better now. I think one problem tangential to this PR is that https://docs.postgrest.org/en/v12/references/api/aggregate_functions.html are too verbose. I made a pass at reducing verbosity on 33b69b5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one problem tangential to this PR is that https://docs.postgrest.org/en/v12/references/api/aggregate_functions.html are too verbose.
I gave it a go here: #3913
898cad3
to
a51008f
Compare
Hey guys, what is preventing this from being approved? |
I'd say this mostly needs a code review. Unfortunately, I am not deep enough into the haskell code in general right now, so I can't provide one. |
The code looks good overall. The implementation is complex, but that's inevitable because it's a "big feature" (kinda like aggregate functions in #2925, or data representations #2523). I think we just need to refine the docs. I've added some more feedback. |
a51008f
to
0fbace2
Compare
} | ||
] | ||
|
||
Note that the aggregate is now returned inside an array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two lines that follow don't seem to add much info or are confusing. I suggest to remove them or make them clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside, just to make sure I understand this.
If you have:
...orders(amount.sum())
That sum will be returned as a single value.
But if you add one column
....orders(amount.sum(),order_date)
Then sum will be returned as array right?
(That could be surprising and a a bit complicated to type in Typescript)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, I wonder if it would be better to always return an array. Even on cases of a single column spread.. an array of 1 might be more consistent. What would be the drawback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the above is done, we could always say that a to-many spread returns an array. While to-one returns an object. That would make things more consistent IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sum will be returned as a single value.
But if you add one column
Then sum will be returned as array right?
Yes. The idea is that embedding an aggregation without grouping is essentially to-one, not to-many. We know it returns only a single row, so we shouldn't return an array.
(That could be surprising and a a bit complicated to type in Typescript)
I'm not sure how much relevance Typescript should have for our API output.
If the above is done, we could always say that a to-many spread returns an array. While to-one returns an object. That would make things more consistent IMO.
Yes, see above. That's essentially what we're doing. Adding an aggregation function to a "relation" changes that relation. An aggregated table is not the same as a non-aggregated table. Thus, when embedding an aggregated relation, it would also potentially have different to-many/to-one characteristics. I find this quite consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking more thoroughly, the special case is not so much the above to-many-aggregate->to-one-response, but the hoisting of the to-one-aggregates.
I think we need to discuss a very basic question here: Do we want to error out on some cases with "not supported" or do we want to try to give a sane response in as many cases as possible?
We have at least the following cases:
- to-one embed, no aggregate, no spread: returns an object
- to-one embed, no aggregate, spread: spreads values into the parent object
- to-one embed, with aggregate, no spread: returns an object (useless)
- to-one embed, with aggregate, spread: hoisted into the parent object (currently discussed)
- to-many embed, no aggregate, no spread: returns an array of objects
- to-many embed, no aggregate, spread: spreads arrays of values into the parent object
- to-many embed, with aggregate, no group by, no spread: returns single item array of object (inconsistent)
- to-many embed, with aggregate, no group by, spread: spreads values into the parent object
- to-many embed, with aggregate, group by, no spread: returns array of objects
- to-many embed, with aggregate, group by, spread: spreads arrays of values into the parent object
"at least", because we don't make a difference with the number of columns in the embedding, but we could in theory :D.
There's multiple different ways to look at the list above:
- "It's a mess": Everything looks different, we need to cut down on the cases, to be able to explain them in the docs.
- "It's beautiful": We try to only return sensible stuff and try to capture the intent of the user to the best degree possible. We should remove inconsistencies and useless case and make them better.
I think we shouldn't discuss the individual cases back and forth, but instead decide on a general approach to take. I am in the "beautiful API" camp.
I think we should do the following:
- Fix the "inconsistent" case (to-many, with aggregate, no group by, no spread) and make it return an object instead of an array of a single object. Return an array with a known-to-be-single-value is always bad - just think about how annoying that would be to type in Haskell, having a list, where you always need to catch cases that the list could be empty or have more items.. even though you know it can only contain exactly a single item.
- Keep the "currently discussed" case about hoisting.
- Actually take it a step further and change the "useless" case (to-one, with aggregate, no spread) and apply spreading automatically, because.. that's the only sensible thing to interpret this request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't discuss the individual cases back and forth, but instead decide on a general approach to take. I am in the "beautiful API" camp.
Doing it the way of the "beautiful API" scares me a lot because we already been through something similar with "duck typing" (#424, which then was removed in #1430, very painfully so as it was a major breaking change). It may capture "intent" for certain users, but some others might be confused (as me) because of the change of shape in the output.
Just think about how annoying that would be to type in Haskell, having a list, where you always need to catch cases that the list could be empty or have more items..
This was my point when referring to typescript above #3640 (comment), and yes a surprising change of output will be difficult to type in any client library in any language. That's why I think we need to be consistent, rather than beautiful.
Going back a bit, the whole "hoisting" and "automatic group by" confuses me a lot, and I assume users familiar with SQL will be too. I think we should make these explicit operations.
So a group by would be /tbl?select=id,cost.sum()&groupby=id
and a "hoisted" groupby would be analogous to top-level ordering: /tbl?select=id,embed(cost.sum())&groupby=embed(id)
. This is much more consistent with our functionality, our whole top-level
features could be a section in the docs now.
(An aggregate without a groupby wouldn't have to fail, I've floated the idea on how to do it before on #2066 (comment))
So yes, sometimes this might imply useless output, but if the user can quickly make sense of the syntax then it's easy to correct.
Overall I'd like to normalize the output of an aggregate, and not make it a "transformation" but a type of filtering. Only our spreads should be "transformation". This will help a lot with typing on clients too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just think about how annoying that would be to type in Haskell, having a list, where you always need to catch cases that the list could be empty or have more items..
This was my point when referring to typescript above #3640 (comment), and yes a surprising change of output will be difficult to type in any client library in any language. That's why I think we need to be consistent, rather than beautiful.
Uh, that misses the point. This consistency leads to those "always-a-single-item lists". And those are hard to deal with in strongly typed languages. They might be easier to "type", but then harder to work with afterwards, imho. But that's just because... the typing is weaker. Weaker typing is always easier to "type".
and I assume users familiar with SQL will be too. I think we should make these explicit operations.
I think that both the TypeScript and the SQL arguments don't fly. We are not trying to create a TypeScript client for a PostgreSQL database. And we are not trying to expose SQL via http either.
We provide a REST API. This doesn't need to be "SQL-like" at all. Familiarity with SQL should give you exactly zero advantage when using this API. If it does.. then we designed the API in a bad way, because it suddenly requires SQL knowledge to use it well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing it the way of the "beautiful API" scares me a lot because we already been through something similar with "duck typing"
So IIUC correctly the problem back then was that you couldn't tell from the syntax of your request what the response would look like, correct?
This is certainly a problem with embeddings in general - and it still is today.
But that's not the case for the aggregation things we're discussing here, right? Those are predictable from syntax.
If you want consistency, then you'd need to return "to-one" embeddings as single-item array all the time. Because that's the part that is not predictable and hard to type.
But I don't think we want to go that way, right? (I certainly don't)
re TypeScript: My vision for first class types would be a custom root endpoint mimetype handler: Once we have split OpenApi off, I hope that we can work towards allowing multiple handlers for the root endpoint. Then, we can implement a handler that returns type definitions straight from the database. And those should be able to treat all those special cases much better, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I'm not yet convinced for or against the group_by
proposal. I think adding explicit operations helps in understanding what is "expected" or not. But I still don't want to lose the simplicity of not needing a separate group_by
when aggregating. Also when trying different queries I keep seeing/expecting the to-many
aggregates as single responses (what was implemented before this convo), so I'm not against that either.
I'd like to complete the implementation of to-many spreads without aggregates in this PR (which got too big) and continue this debate in a different PR.
0fbace2
to
d0173e7
Compare
Just tried this:
Is the error expected? |
Oh, nice catch. It's not expected at all. This error is for |
d0173e7
to
74829f4
Compare
* Note: Aggregates are not implemented
74829f4
to
6328e59
Compare
Closes #3041