Skip to content

Annotations for set.__sub__ and related methods are inconsistent #9004

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
LeeeeT opened this issue Oct 27, 2022 · 17 comments · May be fixed by #11403
Open

Annotations for set.__sub__ and related methods are inconsistent #9004

LeeeeT opened this issue Oct 27, 2022 · 17 comments · May be fixed by #11403

Comments

@LeeeeT
Copy link
Contributor

LeeeeT commented Oct 27, 2022

According to the documentation s1 - s2 returns the set of all elements that are in s1 but not in s2. So what's the point of constraining s2 elements type? I believe the correct annotation for the argument of set.__sub__ would be AbstractSet[object] rather than AbstractSet[T | None].

def __sub__(self, __s: AbstractSet[_T | None]) -> set[_T]: ...

@Akuli Akuli added the reason: inexpressable Closed, because this can't be expressed within the current type system label Oct 27, 2022
@Akuli
Copy link
Collaborator

Akuli commented Oct 27, 2022

You want an error for set[int] - set[str]. Unfortunately there's no way to say "any type that overlaps with _T", and _T | None is an approximation of that to avoid errors in the common case set[int] - set[int | None].

@Akuli Akuli closed this as completed Oct 27, 2022
@LeeeeT
Copy link
Contributor Author

LeeeeT commented Oct 28, 2022

@Akuli why do you think there should be an error for set[int] - set[str]? The resulting set can only contain ints, regardless of the elements type of the second set.

@Akuli
Copy link
Collaborator

Akuli commented Oct 28, 2022

A user shipped code that did set[bytes] - set[str] to production, and of course, it didn't work as expected (#1840). There is no use case for set[int] - set[str]: if your code does it, that's very likely not intentional.

@LeeeeT
Copy link
Contributor Author

LeeeeT commented Oct 28, 2022

@Akuli that makes sense. But what about __isub__, __and__, __iand__, intersection, difference, intersection_update and difference_update?

def __isub__(self: Self, __s: AbstractSet[object]) -> Self: ...

Shouldn't they take AbstractSet[_T | None] (for dunder methods) and Iterable[_T | None] (for others) too? I don't see any logic.

I'm also wondering where this _T | None came from, and why isn't it just _T.

@Akuli
Copy link
Collaborator

Akuli commented Oct 28, 2022

__isub__

Good point. In fact, the code from #1840 no longer errors as it should. Apparently this was broken in #7161. @AlexWaygood, what are your thoughts on this?

_T | None

You often want to do set[int] - set[int | None]. The result is a set of integers. There's nothing wrong about it: because it is possible for a set[int | None] to contain integers, it isn't a no-op. Ideally we would say

set[AnythingThatCanBe[_T]]

but that isn't possible, and _T | None is the best approximation of that we have.

I think we should use the _T | None trick for .discard() too. I created #7121 to discuss it but never got around to fixing it.

@Akuli Akuli reopened this Oct 28, 2022
@Akuli Akuli changed the title Incorrect annotation for set.__sub__ Annotations for set.__sub__ and related methods are inconsistent Oct 28, 2022
@Akuli Akuli removed the reason: inexpressable Closed, because this can't be expressed within the current type system label Oct 28, 2022
@LeeeeT
Copy link
Contributor Author

LeeeeT commented Oct 28, 2022

@Akuli you mentioned that when we do set[TypeA] - set[TypeB], we want TypeB to be AnythingThatCanBe[TypeA]. But isn't it just any type? We can always create a subtype of TypeB that is also a subtype of TypeA using multiple inheritance.

Consider the following example:

class TypeA: pass

class TypeB: pass

class TypeAB(TypeA, TypeB): pass

ab = TypeAB()

a: set[TypeA] = {TypeA(), ab}
b: set[TypeB] = {TypeB(), ab}

print(a - b)  # {<TypeA object>}
# (isn't no-op)

Even if TypeB is not a subtype of TypeA, set[TypeB] can still contain instances of TypeA.
That's not the case for int and str since we can't create a subclass of both of them, but classes with __slots__ are just an exception. In general case, variable of arbitrary type TypeA can be set to an instance of any other arbitrary type TypeB (to an instance of intersection of these two types, to be more precise).

Currently, type checkers complain about the last line, and I believe they shouldn't. What do you think?

@Akuli
Copy link
Collaborator

Akuli commented Oct 28, 2022

An intersection instance of str and int cannot exist, for example, because of the memory layout of int and str objects in CPython. They store different kinds of data in the beginning of the structure, and because objects are represented as pointers to the beginning, there's no way to have something contain the datas needed for both.

Here's what the interpreter says if you try:

>>> class Foo(int, str): pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: multiple bases have instance lay-out conflict

Edit: Nevermind, you mentioned this already. I get your point, it would be practically almost any type. Maybe we should just stick to _T | None.

@LeeeeT
Copy link
Contributor Author

LeeeeT commented Oct 28, 2022

My proposal is to make __sub__ and related methods accept AbstractSet[object]. Passing a set of arbitrary type to these methods would never cause an error at runtime. So I think, it's safe, and it wouldn't affect the existing code because it would make type checkers more lenient.
Preventing a user from doing set[TypeA] - set[TypeB], where TypeA and TypeB do not overlap, is also incorrect, because it's not always no-op as I've shown in the above example.

@Akuli
Copy link
Collaborator

Akuli commented Oct 28, 2022

I would prefer "practicality beats purity". Even though we usually want to prevent false positives, we also don't want people to ship broken code to production, which IMO is more important than rarely getting extra warnings during development.

@LeeeeT
Copy link
Contributor Author

LeeeeT commented Oct 28, 2022

What broken code are you talking about? The code in #1840 may be considered broken because it's no-op. But it's no-op only because we can't create a subclass of both str and bytes. For slotless classes this is not necessarily no-op, and there's no point in disallowing this.

@Akuli
Copy link
Collaborator

Akuli commented Oct 28, 2022

By broken code, I mean code that doesn't work as intended, in this case set[Foo] - set[Bar] where the author doesn't intend to deal with objects that are simultaneously Foo and Bar. This is a real problem that users are facing. If you have some production / non-toy-example code that fails to type check because of this, I would be interested to see it.

@LeeeeT
Copy link
Contributor Author

LeeeeT commented Oct 29, 2022

When you write set[Foo] or set[Bar] you have to keep in mind that these sets can contain objects that are simultaneously Foo and Bar. The fact that you can't annotate an object that is Foo but not Bar is a different problem.
I'd also be glad to see real examples where set[Foo] - set[Bar] could be dangerous :)

@LeeeeT
Copy link
Contributor Author

LeeeeT commented Oct 30, 2022

Another example:

my_ints = {1, 2, 3}
my_objs = {1, None, 'a', 2}

print(my_ints - my_objs)  # {3}

I'm getting an error for the last line. This is pretty common scenario though. How am I supposed to do it differently?

I can write my_ints.difference(my_objs) instead, but this is due to inconsistency between __sub__ and difference (which do exactly the same), and when the inconsistency is fixed, difference won't be an option either.

@Akuli
Copy link
Collaborator

Akuli commented Nov 3, 2022

@AlexWaygood What are your thoughts on this issue? Your PR #7161 made the methods inconsistent, and while you argued that Any is better (as in this issue), you apparently didn't consider the "no-op in production" argument from #1840.

@AlexWaygood
Copy link
Member

I'll try to take a look at this today, sorry for the radio silence!

@AlexWaygood AlexWaygood linked a pull request Feb 11, 2024 that will close this issue
@randolf-scholz
Copy link
Contributor

randolf-scholz commented Mar 8, 2024

Here's an annoying consequence of this: pyright raises an error if we consider set[LiteralString] - set[str] (pyright-playground):

COLS = ("TIME", "VALUE")
"""Needed columns."""

csv_data: list[list[str]] = [[]]
header = csv_data[0]

if missing_cols := set(COLS) - set(header):  # reportOperatorIssue
    raise ValueError(f"Missing columns: {missing_cols}")

Operator "-" not supported for types "set[Literal['TIME', 'VALUE']]" and "set[str]"

I would agree that in most cases set[T] - set[S] is probably an error. Given the example above, maybe a good compromise is to actually allow the case when S is a supertype of T? This would also cover the requested set[T] - set[T | None] scenario, but not the case when T and S are incomparable, such as set[int] - set[str].

@randolf-scholz
Copy link
Contributor

randolf-scholz commented Mar 8, 2024

Not sure if that can be implemented at the moment, I think it may require generic bounds to type-vars (python/typing#548, python/mypy#8278)

class MySet[T](MutableSet[T]):
    @overload  # case A≥B
    def __sub__(self, other: AbstractSet[T]) -> Self: ...
    @overload  # case A≤B
    def __sub__[B, A: B](self: MySet[A], other: AbstractSet[B]) -> MySet[A]: ...

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants