Annotations for `set.sub` and related methods are inconsistent #9004

LeeeeT · 2022-10-27T16:44:43Z

According to the documentation s1 - s2 returns the set of all elements that are in s1 but not in s2. So what's the point of constraining s2 elements type? I believe the correct annotation for the argument of set.__sub__ would be AbstractSet[object] rather than AbstractSet[T | None].

typeshed/stdlib/builtins.pyi

Line 1100 in cf9bdc2

def __sub__(self, __s: AbstractSet[_T | None]) -> set[_T]: ...

The text was updated successfully, but these errors were encountered:

Akuli · 2022-10-27T20:08:23Z

You want an error for set[int] - set[str]. Unfortunately there's no way to say "any type that overlaps with _T", and _T | None is an approximation of that to avoid errors in the common case set[int] - set[int | None].

LeeeeT · 2022-10-28T07:09:04Z

@Akuli why do you think there should be an error for set[int] - set[str]? The resulting set can only contain ints, regardless of the elements type of the second set.

Akuli · 2022-10-28T12:50:30Z

A user shipped code that did set[bytes] - set[str] to production, and of course, it didn't work as expected (#1840). There is no use case for set[int] - set[str]: if your code does it, that's very likely not intentional.

LeeeeT · 2022-10-28T14:52:11Z

@Akuli that makes sense. But what about __isub__, __and__, __iand__, intersection, difference, intersection_update and difference_update?

typeshed/stdlib/builtins.pyi

Line 1101 in cf9bdc2

def __isub__(self: Self, __s: AbstractSet[object]) -> Self: ...

Shouldn't they take AbstractSet[_T | None] (for dunder methods) and Iterable[_T | None] (for others) too? I don't see any logic.

I'm also wondering where this _T | None came from, and why isn't it just _T.

Akuli · 2022-10-28T18:15:57Z

__isub__

Good point. In fact, the code from #1840 no longer errors as it should. Apparently this was broken in #7161. @AlexWaygood, what are your thoughts on this?

_T | None

You often want to do set[int] - set[int | None]. The result is a set of integers. There's nothing wrong about it: because it is possible for a set[int | None] to contain integers, it isn't a no-op. Ideally we would say

set[AnythingThatCanBe[_T]]

but that isn't possible, and _T | None is the best approximation of that we have.

I think we should use the _T | None trick for .discard() too. I created #7121 to discuss it but never got around to fixing it.

LeeeeT · 2022-10-28T19:58:57Z

@Akuli you mentioned that when we do set[TypeA] - set[TypeB], we want TypeB to be AnythingThatCanBe[TypeA]. But isn't it just any type? We can always create a subtype of TypeB that is also a subtype of TypeA using multiple inheritance.

Consider the following example:

class TypeA: pass

class TypeB: pass

class TypeAB(TypeA, TypeB): pass

ab = TypeAB()

a: set[TypeA] = {TypeA(), ab}
b: set[TypeB] = {TypeB(), ab}

print(a - b)  # {<TypeA object>}
# (isn't no-op)

Even if TypeB is not a subtype of TypeA, set[TypeB] can still contain instances of TypeA.
That's not the case for int and str since we can't create a subclass of both of them, but classes with __slots__ are just an exception. In general case, variable of arbitrary type TypeA can be set to an instance of any other arbitrary type TypeB (to an instance of intersection of these two types, to be more precise).

Currently, type checkers complain about the last line, and I believe they shouldn't. What do you think?

Akuli · 2022-10-28T20:19:35Z

An intersection instance of str and int cannot exist, for example, because of the memory layout of int and str objects in CPython. They store different kinds of data in the beginning of the structure, and because objects are represented as pointers to the beginning, there's no way to have something contain the datas needed for both.

Here's what the interpreter says if you try:

>>> class Foo(int, str): pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: multiple bases have instance lay-out conflict

Edit: Nevermind, you mentioned this already. I get your point, it would be practically almost any type. Maybe we should just stick to _T | None.

LeeeeT · 2022-10-28T20:41:54Z

My proposal is to make __sub__ and related methods accept AbstractSet[object]. Passing a set of arbitrary type to these methods would never cause an error at runtime. So I think, it's safe, and it wouldn't affect the existing code because it would make type checkers more lenient.
Preventing a user from doing set[TypeA] - set[TypeB], where TypeA and TypeB do not overlap, is also incorrect, because it's not always no-op as I've shown in the above example.

Akuli · 2022-10-28T20:44:07Z

I would prefer "practicality beats purity". Even though we usually want to prevent false positives, we also don't want people to ship broken code to production, which IMO is more important than rarely getting extra warnings during development.

LeeeeT · 2022-10-28T21:27:12Z

What broken code are you talking about? The code in #1840 may be considered broken because it's no-op. But it's no-op only because we can't create a subclass of both str and bytes. For slotless classes this is not necessarily no-op, and there's no point in disallowing this.

Akuli · 2022-10-28T21:39:32Z

By broken code, I mean code that doesn't work as intended, in this case set[Foo] - set[Bar] where the author doesn't intend to deal with objects that are simultaneously Foo and Bar. This is a real problem that users are facing. If you have some production / non-toy-example code that fails to type check because of this, I would be interested to see it.

LeeeeT · 2022-10-29T11:56:08Z

When you write set[Foo] or set[Bar] you have to keep in mind that these sets can contain objects that are simultaneously Foo and Bar. The fact that you can't annotate an object that is Foo but not Bar is a different problem.
I'd also be glad to see real examples where set[Foo] - set[Bar] could be dangerous :)

LeeeeT · 2022-10-30T11:38:05Z

Another example:

my_ints = {1, 2, 3}
my_objs = {1, None, 'a', 2}

print(my_ints - my_objs)  # {3}

I'm getting an error for the last line. This is pretty common scenario though. How am I supposed to do it differently?

I can write my_ints.difference(my_objs) instead, but this is due to inconsistency between __sub__ and difference (which do exactly the same), and when the inconsistency is fixed, difference won't be an option either.

Akuli · 2022-11-03T15:54:00Z

@AlexWaygood What are your thoughts on this issue? Your PR #7161 made the methods inconsistent, and while you argued that Any is better (as in this issue), you apparently didn't consider the "no-op in production" argument from #1840.

AlexWaygood · 2022-11-04T10:08:32Z

I'll try to take a look at this today, sorry for the radio silence!

randolf-scholz · 2024-03-08T14:10:33Z

Here's an annoying consequence of this: pyright raises an error if we consider set[LiteralString] - set[str] (pyright-playground):

COLS = ("TIME", "VALUE")
"""Needed columns."""

csv_data: list[list[str]] = [[]]
header = csv_data[0]

if missing_cols := set(COLS) - set(header):  # reportOperatorIssue
    raise ValueError(f"Missing columns: {missing_cols}")

Operator "-" not supported for types "set[Literal['TIME', 'VALUE']]" and "set[str]"

I would agree that in most cases set[T] - set[S] is probably an error. Given the example above, maybe a good compromise is to actually allow the case when S is a supertype of T? This would also cover the requested set[T] - set[T | None] scenario, but not the case when T and S are incomparable, such as set[int] - set[str].

randolf-scholz · 2024-03-08T14:33:15Z

Not sure if that can be implemented at the moment, I think it may require generic bounds to type-vars (python/typing#548, python/mypy#8278)

class MySet[T](MutableSet[T]):
    @overload  # case A≥B
    def __sub__(self, other: AbstractSet[T]) -> Self: ...
    @overload  # case A≤B
    def __sub__[B, A: B](self: MySet[A], other: AbstractSet[B]) -> MySet[A]: ...

Akuli added the reason: inexpressable Closed, because this can't be expressed within the current type system label Oct 27, 2022

Akuli closed this as completed Oct 27, 2022

Akuli reopened this Oct 28, 2022

Akuli changed the title ~~Incorrect annotation for set.__sub__~~ Annotations for set.__sub__ and related methods are inconsistent Oct 28, 2022

Akuli removed the reason: inexpressable Closed, because this can't be expressed within the current type system label Oct 28, 2022

AlexWaygood linked a pull request Feb 11, 2024 that will close this issue

Improve annotations for various set methods #11403

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Annotations for `set.sub` and related methods are inconsistent #9004

Annotations for `set.sub` and related methods are inconsistent #9004

LeeeeT commented Oct 27, 2022

Akuli commented Oct 27, 2022 •

edited

Loading

LeeeeT commented Oct 28, 2022

Akuli commented Oct 28, 2022

LeeeeT commented Oct 28, 2022

Akuli commented Oct 28, 2022

LeeeeT commented Oct 28, 2022 •

edited

Loading

Akuli commented Oct 28, 2022 •

edited

Loading

LeeeeT commented Oct 28, 2022 •

edited

Loading

Akuli commented Oct 28, 2022

LeeeeT commented Oct 28, 2022

Akuli commented Oct 28, 2022

LeeeeT commented Oct 29, 2022

LeeeeT commented Oct 30, 2022

Akuli commented Nov 3, 2022

AlexWaygood commented Nov 4, 2022

randolf-scholz commented Mar 8, 2024 •

edited

Loading

randolf-scholz commented Mar 8, 2024 •

edited

Loading

Annotations for set.__sub__ and related methods are inconsistent #9004

Annotations for set.__sub__ and related methods are inconsistent #9004

Comments

LeeeeT commented Oct 27, 2022

Akuli commented Oct 27, 2022 • edited Loading

LeeeeT commented Oct 28, 2022

Akuli commented Oct 28, 2022

LeeeeT commented Oct 28, 2022

Akuli commented Oct 28, 2022

LeeeeT commented Oct 28, 2022 • edited Loading

Akuli commented Oct 28, 2022 • edited Loading

LeeeeT commented Oct 28, 2022 • edited Loading

Akuli commented Oct 28, 2022

LeeeeT commented Oct 28, 2022

Akuli commented Oct 28, 2022

LeeeeT commented Oct 29, 2022

LeeeeT commented Oct 30, 2022

Akuli commented Nov 3, 2022

AlexWaygood commented Nov 4, 2022

randolf-scholz commented Mar 8, 2024 • edited Loading

randolf-scholz commented Mar 8, 2024 • edited Loading

Annotations for `set.sub` and related methods are inconsistent #9004

Annotations for `set.sub` and related methods are inconsistent #9004

Akuli commented Oct 27, 2022 •

edited

Loading

LeeeeT commented Oct 28, 2022 •

edited

Loading

Akuli commented Oct 28, 2022 •

edited

Loading

LeeeeT commented Oct 28, 2022 •

edited

Loading

randolf-scholz commented Mar 8, 2024 •

edited

Loading

randolf-scholz commented Mar 8, 2024 •

edited

Loading