Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Could :in support NULL? #315

Closed
brjann opened this issue Mar 19, 2021 · 10 comments
Closed

Could :in support NULL? #315

brjann opened this issue Mar 19, 2021 · 10 comments
Labels

Comments

@brjann
Copy link

brjann commented Mar 19, 2021

SQL handles NULL in a special way, so you cannot write WHERE col = NULL. HoneySQL helps you out by parsing [:= col NULL] as col IS NULL.

IN has the same problem. col IN (1,2,NULL) does not work as expected, it does not return true if col is NULL

Would it be too much magic if HoneySQL did something similar with IN as it does with =?

I.e. instead of [:in col [1 2 nil]] => [col IN (?, ?, NULL)], it would parse as (([col IN (?, ?)]) OR col IS NULL). And if the vector only includes nil, it would simply become col IS NULL

@seancorfield
Copy link
Owner

That feels like "too much magic" to me but I'll give it some thought.

@seancorfield
Copy link
Owner

Other cases to consider:

  • [:in :col []]
  • [:in :col [nil]]

@seancorfield seancorfield added this to the 2.0.0 milestone Apr 10, 2021
@seancorfield
Copy link
Owner

@brjann I'd love to get your feedback on the changes I've made to support this (on the v2 branch, it was added after Beta 1).

@brjann
Copy link
Author

brjann commented Apr 12, 2021

Thanks @seancorfield , I really appreciate this!

I have looked at the tests for the cases that include one or more nil values and they look exactly like I suggested. Two thoughts:

  1. In MySQL an empty IN() statement is an error. This is a pain sometimes when I want exactly the effect that you propose, for the clause to match nothing (i.e., WHERE FALSE). So I always have to remember to wrap my queries with a (when (seq in-coll) ...). However, sometimes an empty in-coll also has a meaning, which is to not limit the query by the in clause (for example if a user has permission to view all records in a table, not only the ones matching some group id or similar). So then I instead exclude the IN clause from the query. OTOH, I guess that an empty in-coll in that case should be considered as "no permissions". But I have assumed that this why MySQL does not accept empty IN() clauses, because it it is ambiguous if it should match all or no records. So although it sometimes bites me, I find that it has been good that I am forced to make a conscious decision on how to handle an empty in-coll.
  2. I realize that this is maybe arguing against my own proposal... But is there a risk that one would assume that this special handling of nil will also work with tuples? For WHERE (col1, col2) IN ((nil, 1), (2, nil), (1, 1)) to work, it would have to be expanded to WHERE (col1, col2) IN ((1, 1)) OR (Col1 IS NULL AND Col2=1) OR (Col1=1 AND Col2 IS NULL)

@brjann
Copy link
Author

brjann commented Apr 12, 2021

Regarding my first thought. Maybe there is actually no valid case where IN() should not be interpreted as matching no rows.

@seancorfield
Copy link
Owner

Yeah, I'm on the fence about #1 -- but it sort of falls out of WHERE col IN (NULL) because if you lift the NULL check out, you're left with (col IS NULL OR col IN ()) so you have to eliminate the OR clause in that case. But there are definitely arguments in favor of leaving WHERE col IN () alone and making it an error so users must explicitly choose how to handle it. Once you take that step though, where a user has to explicitly choose an action based on (seq in-coll), then you could also argue that the user should also take care of any nil values in in-coll!

The problem with #2 is that it would become a combinatoric explosion of cases to handle so we definitely do not want to expand the NULL-handling to cover the tuple case. And, you are right: that does sort of argue against even handling the base case here in this ticket to some extent.

@brjann
Copy link
Author

brjann commented Apr 12, 2021

#1. Sure. But I do have a specific use case where NULL / nil is a valid value of a column that has a specific meaning. But it may be too much of a corner case. I have actually written my own in* handler now to cover this use case.

#2. I agree, supporting tuples would be very magical. But wouldn't become combinatorial though? One would just move every nil containing tuple out and check it separately. Or am I missing something?

@seancorfield
Copy link
Owner

You're right that #2 wouldn't become combinatoric but in the case where you have a lot of tuples in the collection and, say, half of them have nil in one slot or the other, then the naive expansion is going to be one IN for half of them and a separate OR for each of the other ones that contain nil -- which could be a pretty large expansion of SQL code -- or a "simplification engine" that tries to separate out a NULL check for each column that could be nil and an accompanying IN for all the other columns, which gets substantially more complex as the number of columns in the tuple grows: consider the possible expansions for 3 columns, 4, 5, etc. I definitely don't want to go down either of those paths.

As for possibly handling nil in a single collection and/or an empty collection, I'm beginning to side with "this is also too much magic" and your in* approach sounds like the right approach when you know that you're dealing with a potentially NULL column that has a specific meaning. I might even consider adding :in* as a built-in clause for that check (WHERE col IS NULL OR col IN (...)) at some point in the future.

So far, the only feedback on Slack is against this change on the basis that in general having an empty collection or nil in the collection is more likely to be a programmer error.

@brjann
Copy link
Author

brjann commented Apr 12, 2021

So far, the only feedback on Slack is against this change on the basis that in general having an empty collection or nil in the collection is more likely to be a programmer error.

Yes. That is probably the case. So silently passing that nil through may be a bad idea. And it has it advantages that I can identify this specific case by :in*. Anyway, I really appreciate that you entertained the idea and followed through with tests and all.

seancorfield added a commit that referenced this issue Apr 12, 2021
@seancorfield
Copy link
Owner

I've reverted the changes and I'm going to close this out. I'm probably going to add a :checking option (default none/off) that would allow for a mode where HoneySQL does various additional checks on data and throws exceptions. The cases here could be: IN () always being wrong and IN (NULL,?) likely being wrong.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants