Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Support for python 3.10 match statement #10191

Merged
merged 84 commits into from
Jan 18, 2022

Conversation

freundTech
Copy link
Contributor

@freundTech freundTech commented Mar 9, 2021

Description

This PR will add support for the python 3.10 match statement to mypy.

Current progress:

  • parser and ast nodes
  • semantic analysis
  • type-checking
    • as-pattern
    • or-pattern
    • literal-pattern
    • value-pattern
    • capture-pattern
    • wildcard-pattern
    • sequence-pattern
    • mapping-pattern
    • class-pattern
  • Lots of tests

Test Plan

Some tests have been added, but they don't pass yet. There will be a lot more tests in the future.

I'm currently developing and testing on cpython 3.10.0b1 and will make sure to update to newer versions every now and then.

mypy-requirements.txt Outdated Show resolved Hide resolved
@brandtbucher
Copy link
Member

brandtbucher commented Mar 10, 2021

Just curious: does our decision to reuse expression nodes as pattern nodes in the Python AST cause any trouble for you?

Our intuition is that creating duplicates of all of the expression nodes used by patterns would be less convenient than the current design, but we're open to feedback if not. As I understand it, you transform Python's AST into your own very early on.

@brandtbucher
Copy link
Member

(Also, let me know if you have any questions or want reviews on this. I'm very much looking forward to mypy's support for this feature!)

@freundTech
Copy link
Contributor Author

freundTech commented Mar 10, 2021

@brandtbucher The reuse of expression nodes does create some problems, but it shouldn't be too bad.

I currently have two solutions in mind, but haven't decided on which would be better:

  1. Duplicate the nodes in mypy: currently the mypy ast nodes very closely follow the python ast nodes. It would be possible to just duplicate the needed nodes in mypy and choose the correct one while converting from python ast to mypy ast.
  2. Duplicate the visitors: It would also be possible to just create new visitors wherever needed. Then for example in SemanticAnalyzer visit_match_stmt wouldn't call pattern.accept(self), but pattern.accept(SemanticPatternAnalyzer(self)). This would allow us to keep the mypy ast closer to the python ast and wouldn't produce any extra work for cases where we don't have to differentiate between patterns and expressions (like the already implemented StrConv)

Depending on what kind of data will have to be stored in the nodes option 1 may or may not be the only practical one.

Reviews and other feedback is always welcome, no matter how early.
If I push it it's ready for review 😉

@freundTech
Copy link
Contributor Author

I decided to go with the first option and create separate ast nodes for patterns. This will probably be a bit more work at the beginning, but will lead to better code and prevent problems down the line.

So far I created nodes for as-patterns, or-patterns and literal patterns.

Sometimes converting from python ast to my pattern ast isn't trivial.
For example 1+2j is a valid literal pattern while 2j+1 is a syntax error (when used as a pattern). Python would represent them both as BinOp(x, Add, y), just with the x and y swapped.

JukkaL pushed a commit that referenced this pull request Mar 10, 2021
)

This fixes strconv not displaying False and possibly other falsy values. 
I ran into this problem while writing tests for #10191.
@freundTech freundTech mentioned this pull request Mar 11, 2021
21 tasks
@freundTech
Copy link
Contributor Author

@brandtbucher With the ast conversion phase of this now being finished here's some more feedback on the reuse of expression nodes:

I ended up adding nodes for patterns to mypy and, during ast conversion, converting from expression nodes to pattern nodes.

This was a bit harder than ast conversion would have been if python had dedicated pattern nodes.
The biggest problem in my opinion was static types for expression nodes often being much broader than for the corresponding pattern nodes.

Take a mapping pattern for example:

case {'k': v}:
    pass

This is represented as an ast.Dict, which, according to static types, can have arbitrary expressions as keys. When used as a pattern it can only have literal patterns and value patterns (a.k.a. constant expressions and member expressions) as keys though. This leads to me having to check or cast the types in order to make type checking work, when at runtime the invalid values are already caught by the python parser and don't even reach my code.

Another problem were literal patterns. A literal pattern, expressed as expression nodes, can be a Constant, UnaryOp or BinOp. At the same time not all Constant, UnaryOp and BinOp expressions are valid literal patterns. This makes conversion quite complicated.

I can understand why you went with not duplicating the nodes, but, at least for this usecase, it would have been a lot more comfortable if python would offer separate pattern nodes.

@brandtbucher
Copy link
Member

Thanks for your thoughts!

When I find the time, I might draw up a POC branch with dedicated pattern nodes to see how much that complicates or simplifies CPython's internals. If its a clear win both there and here, then we might as well just do it.

I'll take a closer look at what you've done here first. As you note, handling for things like mappings and numeric literals might be simplified dramatically.

@freundTech
Copy link
Contributor Author

I'm currently working on the type-checking portion of this and would like to hear some opinions on how "assigning" to capture variables might work.

PEP 634 allows capturing to existing variables. In this case we would have to make sure that the captured value fits the type.

PEP 634 also says that captured variables can be used after the match statement, which means that we need to produce a type that works for all match cases.

If we use the normal rules then something like this would not be allowed:

match m:
    case int(x):
        pass
    case str(x):
        pass

Instead two different variable names would have to be used.

Alternatively we could type x as the union or join of all the types it captures and narrow it within the case block. This would be more flexible for the user.

@brandtbucher
Copy link
Member

brandtbucher commented Apr 2, 2021

Not a typing expert, but I would expect the type of x to be inferred as int in the first case, str in the second, and int | str after.

...although I can see the possible benefit of requiring an explicit x: int | str before the match block to prevent errors. If we go that route, though, the type should still be narrowed in each case.

(We're sort of in uncharted territory here, since the names are not scoped to the case block like in many other languages with similar constructs.)

@JelleZijlstra
Copy link
Member

@freundTech The equivalent nonmatch code would be something like

if isinstance(m, int):
    x = m
elif isinstance(m, str):
    x = m

And that currently produces Incompatible types in assignment (expression has type "str", variable has type "int"). So the consistent thing would be to also produce an error for the match code you posted.

I feel like @brandtbucher's proposed semantics are more useful though, so I'd prefer to implement that.

@gvanrossum
Copy link
Member

Pyright already supports pattern matching. What does that do?

@freundTech
Copy link
Contributor Author

freundTech commented Apr 3, 2021

I just checked the behaviour in pyright and found the following:

  1. Pyright doesn't yet support type inference for class patterns, therefore in the above example x is inferred as Unknown
  2. Taking another example
a: List[int] = ...
match a:
    case [b, *x]:
        reveal_type(b)
    case b:
        reveal_type(b)

reveal_type(b)

I get int, List[int] and int | List[int]. So basically the union behaviour discussed above.

  1. In pyright this is consistent with what it infers for @JelleZijlstra's example. There it infers int | str. In mypy this behaviour wouldn't be consistent with the if/elif behaviour.

This leaves us with three options:

  1. Being inconsistent between if/elif and match is ok. Use Union for match
  2. Being inconsistent is not ok. Keep the existing semantics
  3. Being inconststent is not ok. Make if/elif work like in pyright

I would prefer 1, because 2 is probably to restrictive and you are much more likely to use variables assigned in if/elif blocks after those blocks than you are to use capture variables after the match, so pyright's behaviour could hide potential errors.

In this case variables assigned inside the case blocks should still follow the if/elif behaviour though and the union behaviour should only apply to capture patterns.

@freundTech
Copy link
Contributor Author

freundTech commented Apr 3, 2021

A second point of discussion would be the behaviour of or-patterns. Here we again have the option of either allowing different types and typing the variable as the union of them or enforcing that capture patterns in or-patterns have the same type.

Python already enforces that all subpatterns of or-patterns capture the same names, therefore it would be natural to also enforce that they have the same types. Pyright currently does not enforce this.

@gvanrossum
Copy link
Member

Okay, I think it's fine to be inconsistent, and inferring a union when the paths merge seems fine.

There are some refinements possible for control flow which will certainly be requested by some users, but we can put those off till later. E.g.

def f(a: List[int]):
  match a:
    case [x, 0]: ...
    case x: return
  reveal_type(x)  # Ideally, int

@sobolevn
Copy link
Member

Congrats! Long awaited feature 🎉

@JukkaL
Copy link
Collaborator

JukkaL commented Jan 18, 2022

I filed #12010 about exhaustiveness checking.

@henryiii
Copy link
Contributor

henryiii commented Jan 18, 2022

That issue seems to be about noticing that something has been exhaustively checked. But the other direction is very important too - it's important to exhaustively check in many/most cases when there's a limited number of choices (literals, enums, anything that can be narrowed down but has not been narrowed down all the way). I think the two possibilities were to add a "strict" mode check to require literals & enums to error if all possibilities are not exhausted, and the other was to use a case _: assert_never(...) clause. At runtime, match statements are not checked for exhaustivity, but that would be hard and expensive, possibly impossible due to adding things after-the-fact - while it's easier for MyPy, and kind of the point of MyPy is to be stricter than runtime Python. :) It's also really easy and much more explicit to opt out of strictness checking with a case _: pass.

If the explicit opt-in is chosen, though, I think assert_never or something like it should be added to typing - the reason it has not been added with nicer spelling to typing yet has been because case statements were supposed to be the way to check exhaustively! (Personally, I'd be happy with both a strictness check and a typing "assert_never" addition, so that classic if's can be opt in, and case statements would be opt-out).

@gvanrossum
Copy link
Member

Why not use case _: assert False, which is already in the language? Sure, with -O it won't check, but the assert indicates that the code is not supposed to get there in the first place, so it's the user's responsibility to feel comfortable that the case _ is indeed unreachable. And the type checker can help by letting you know if it sees a way that it could be reached (e.g. an enum that has no case).

If the user wants the case _ reachable (say, in a validating parser) they can explicitly raise an exception: case _: raise ValueError. (Or a user-defined error or in some cases TypeError.)

@JelleZijlstra
Copy link
Member

Let's move further discussion of exhaustiveness to #12010.

@henryiii
Copy link
Contributor

assert False doesn't type check. If I'm operating on an enum or a literal, I want the type checker to tell me that I've handled all cases. For example, this is real code from cibuildwheel that currently is type-checked:

PlatformName = Literal["linux", "macos", "windows"]
PLATFORMS: Final[Set[PlatformName]] = {"linux", "macos", "windows"}

...

if platform == "linux":
    cibuildwheel.linux.build(options, tmp_path)
elif platform == "windows":
    cibuildwheel.windows.build(options, tmp_path)
elif platform == "macos":
    cibuildwheel.macos.build(options, tmp_path)
else:
    assert_never(platform)

If a new platform is added, all the places this used (like the one above) will be flagged by MyPy as incomplete.

If this was written as a match statement, I'd ideally like this to be enforced by the type checker to not contain fallthrough (via a strictness flag in mypy):

match platform:
    case "linux":
        cibuildwheel.linux.build(options, tmp_path)
    case "windows":
        cibuildwheel.windows.build(options, tmp_path)
    case "macos":
        cibuildwheel.macos.build(options, tmp_path)

Completeness checking for pattern matching is not done at runtime, since that would require checking against the Literal, would slow the program, etc, so it was not included. But it seems quite valid for a strict flag in mypy. If I explicitly didn't want to have completeness here, I could add case _: pass to make mypy happy with the strictness flag enabled; it would be explicit to the reader that I intended to fall through if unmatched.

But if such a flag was not added, and I have to add case invalid: assert_never(invalid) (which would also, for better or worse, catch fallthrough at runtime), then assert_never or something like it should be added to typing, as I believe part of the original reason for not adding it was because pattern matching was supposed to be the correct way to do completeness checking. (Or at least some way to avoid misusing NoReturn when implementing it - also the spelling assert_never might not be ideal, maybe assert_unreachable?)

Quoting https://hakibenita.com/python-mypy-exhaustive-checking:

In 2018 Guido though assert_never is a pretty clever trick, but it never made it into mypy. Instead, exhaustiveness checking will become officially available as part of mypy if/when PEP 622 - Structural Pattern Matching is implemented. Until then, you can use assert_never instead.

@gvanrossum
Copy link
Member

assert False doesn't type check.

But we could easily make it so. To people who aren't deep into typing, assert False is immediately understandable; assert_never() they would have to look up.

If this was written as a match statement, I'd ideally like this to be enforced by the type checker to not contain fallthrough (via a strictness flag in mypy):

I don't like the idea of such a strictness flag. It doesn't work for large applications (there's always some part of the code that doesn't follow the convention). It's much more scalable to require explicitly marking a case where you want this using case _: assert False.

[...] part of the original reason for not adding it was because pattern matching was supposed to be the correct way to do completeness checking.

By some folks, perhaps. But in the end pattern matching was developed separate from type checking concerns (because it was already controversial enough, and tying it to type checking would probably have sunk it). So let's live with the status quo, which is that you must add an explicit case _ to capture incompleteness, and assert False seems to be the most logical thing to put in code that you expect to be unreachable.

@jstasiak
Copy link
Contributor

jstasiak commented Jan 18, 2022

For what it's worth I believe in most (if not all) codebases I worked with strict exhaustiveness checking (enabled with a linter flag, of course, with a case _: pass escape hatch) would be very much appreciated and it would solve much more problems than it would create.

@henryiii
Copy link
Contributor

henryiii commented Jan 18, 2022

Also, I expect there are very few codebases using pattern matching yet, and even if there are, they are not statically typing those yet, since it was just merged. And no one would force them to enable the --exhaustive-match or whatever the flag would be. That's the idea of a strictness check, it forces a codebase to be more strict than runtime Python allows. In this case, you would only be able to leave off a case _: if all matches were accounted for, and by leaving that off, you are promising that you statically catch all cases. That doesn't mean assert False couldn't also be useful (such as for runtime checking), and it would not interfere with that flag at all, but I do think such a flag would make match statements more useful. Just my 2 cents, though.

PS: I do like assert False better than assert_never() as long as there are no other side effects to making it a mypy error! :)

@gvanrossum
Copy link
Member

Well, okay, as long as it's a per-module flag.

Also note that pyright (the type checker in VS Code's Python extension) has supported match/case for some time now (IIRC since 3.10 beta). We should ask on typing-sig what their experiences are.

Re: assert False -- maybe we could make it a mypy error only if a case _: assert False can be reached (I'm sure people use it to mark properties of their code that mypy cannot prove).

@henryiii
Copy link
Contributor

Not fond of special handling of assert False just inside a catch-all case statement. It's not discoverable - unless I know about this discussion, I wouldn't know it's doing something special for MyPy, while some custom types.assert_unreachable would be much easier to read as doing something special here. If it universally was caught, that would not be be an issue, but the special casing would be confusing, I think.

I'd also like to test for an empty union of types elsewhere - like we are already doing with assert_never via NoReturn. Besides the next 4 years while we are waiting for 3.9's EOL, there might1 be cases where you need to narrow down a union to nothing with an if statement and can't use case/match. I think if I saw that assert False was handled by MyPy here, again without reading this discussion here, I'd try it there, and it wouldn't work - in the worst possible way: it would not trigger a mypy error/warning.

Finally, assert False can be disabled at runtime, and if you can't reach it, there's no point in disabling it, it doesn't speed anything up. But if you do reach it, you don't want it disabled, that's still an error. So it doesn't really seem like the right runtime structure here - I'd use raise AssertionError instead, I think, if I were programming for runtime. assert_never does work at runtime even with assertions disabled, by the way. It also could directly print out the types that didn't get narrowed out, like reveal_type - but maybe that could be done in the special cased assert False too, since it's tied to a match structure.

PS: Though I don't absolutely hate it - catch all case statements are already slightly special in that you can only have one, it has to be at the end, etc.

I think the next step might be to summarize for typing-sig and see if anyone else has opinions, experience, or suggestions? Then the results of that can be added to @JukkaL's exhaustiveness issue?

Footnotes

  1. I have no idea if this is true or not - all cases I could think of could be written, probably best, as a match statement.

@gvanrossum
Copy link
Member

Please write fewer words.

Mypy's original philosophy was to piggyback as much as possible on existing constructs. E.g. if isinstance(x, <type>): ..., assert x is not None, etc.

Writing case _: assert False would fall squarely in that tradition.

@henryiii
Copy link
Contributor

I just didn't like the special casing inside the final match-all case statement. But I do like the simplicity of how it reads, as long as people don't expect it to be treated specially by mypy elsewhere if they read it here.

A little code searching also shows this is often how assert False is already used, at the end of an if chain. 👍

Please write fewer words.

Sorry! I know I do tend to write too much.

@henryiii
Copy link
Contributor

I've emailed typing-sig with a (hopefully short) summary, and I'll try to add any relevant information from responses to #12010.

tushar-deepsource pushed a commit to DeepSourceCorp/mypy that referenced this pull request Jan 20, 2022
@tayler6000
Copy link

tayler6000 commented Jan 27, 2022

Edit Issue appears to be addressed in #12010

This may be desired functionality, however, returns from within a match statement are not recognized in my project.

from enum import Enum

class CustomEnum(Enum):
  a = 0
  b = 1

class Foo:

  def bar(self, e: CustomEnum, b: bool) -> int:
    match (e, b):
      case (CustomEnum.a, True):
        return 0
      case (CustomEnum.b, True):
        return 1
      case (CustomEnum.a, False):
        return 2
      case (CustomEnum.b, False):
        return 3

Returns: error: Missing return statement Which could be valid, because mypy may not realize that all possibilities were covered and there is no default case in this example.

class Foo:

  def bar(self, e: CustomEnum, b: bool) -> int:
    match (e, b):
      case (CustomEnum.a, True):
        return 0
      case (CustomEnum.b, True):
        return 1
      case (CustomEnum.a, False):
        return 2
      case (CustomEnum.b, False):
        return 3
      case _:
        return 4

However, this example will always return 4, even if someone improperly uses the function, however mypy returns the same error error: Missing return statement. It is not satisfied until there is a return outside of the match.

class Foo:

  def bar(self, e: CustomEnum, b: bool) -> int:
    match (e, b):
      case (CustomEnum.a, True):
        return 0
      case (CustomEnum.b, True):
        return 1
      case (CustomEnum.a, False):
        return 2
      case (CustomEnum.b, False):
        return 3
      case _:
        return 4
    return 5

Apologies if this is intentional. Additional apologies for my lack of PEP8.

@erictraut
Copy link

@tayler6000, this is caused by the known issue #12010.

@Tinche
Copy link
Contributor

Tinche commented Feb 1, 2022

@freundTech This PR adds support for dataclasses but not attrs? Any plans on that front?

@fredrikaverpil
Copy link

I noticed this project does not use GitHub releases. What's a good way to track match statement support in mypy?

@richin13
Copy link

richin13 commented Feb 3, 2022

I noticed this project does not use GitHub releases. What's a good way to track match statement support in mypy?

I recommend subscribing to this issue: #12021 (or to that effect, any future release planning issue where this gets included, if not 0.940)

@bergkvist
Copy link

I'm facing issues with match statement support

Consider the following:

from typing import Literal, TypeAlias

Expr: TypeAlias = (
    tuple[Literal['lit'], int] |
    tuple[Literal['var'], str]
)

def square_lits(expr: Expr) -> Expr:
    match expr:
        case ('lit', x):
            return ('lit', square(x))
        case _:
            return expr

def square(x: int) -> int:
    return x * x

Here I get an error:

test.py:11: error: Argument 1 to "square" has incompatible type "int | str"; expected "int"  [arg-type]
Found 1 error in 1 file (checked 1 source file)

While the following works fine:

from typing import Literal, TypeAlias

Expr: TypeAlias = (
    tuple[Literal['lit'], int] |
    tuple[Literal['var'], str]
)

def square_lits(expr: Expr) -> Expr:
    if expr[0] == 'lit':
        return ('lit', square(expr[1]))
    return expr

def square(x: int) -> int:
    return x * x

@intgr
Copy link
Contributor

intgr commented Jul 20, 2024

@bergkvist Please report a new issue.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.