Introduce UnknownSeries and UnknownIndex, type core.strings.pyi using them #1146

Merged
merged 40 commits into from
Mar 11, 2025

MarcoGorelli
Member

@MarcoGorelli MarcoGorelli commented Mar 6, 2025

One step towards #1133

I think one way to address this issue is to do it incrementally: when you type a module strictly, add it to pyproject.toml so that it stays strictly typed. Then the partially unknown types will gradually go away.

  • Tests added: Please use assert_type() to assert the type of any return value

🤔 this isn't quite working, trying to fix it up

@MarcoGorelli MarcoGorelli changed the title from "make typing in pandas_stubs.core.strings.pyi strict, add UnknownSeries and UnknownIndex" to "Introduce UnknownSeries and UnknownIndex, type core.strings.pyi using them" Mar 6, 2025
@MarcoGorelli MarcoGorelli marked this pull request as ready for review March 7, 2025 10:59
Collaborator

@Dr-Irv Dr-Irv left a comment

I couldn't comment on the code changes suggested below, but I'd like to suggest the following:

  1. Change the references to StringMethods in core/series.pyi and core/indexes/base.pyi so that Series[str] and Index[str] are passed as the first type argument. Then, in core/strings.pyi, the first type parameter of the Generic, T, will be bound to that type.
  2. Update the tests for the string methods in test_series.py and test_indexes.py to test for the return type of Series[str] and Index[str] as appropriate.
  3. For test_indexes.py, we could use a set of tests on the string methods similar to the ones in test_series.py

If you think we should do this in a separate PR, I'm OK with that as well.
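A minimal sketch of suggestions 1 and 2, assuming a hypothetical, simplified StringMethods with a single type parameter T and a stand-in Series class (neither is the actual pandas-stubs code). Because the accessor passes Series[str] as the first type argument, T is bound to Series[str], so upper() is typed as returning Series[str]; the assert_type call is the kind of return-type test point 2 asks for.

```python
from __future__ import annotations

import builtins
import sys
from typing import Any, Generic, TypeVar

if sys.version_info >= (3, 11):
    from typing import assert_type
else:
    # Runtime-only fallback for older Pythons: like typing.assert_type, it just returns val.
    def assert_type(val: Any, typ: Any, /) -> Any:
        return val

S = TypeVar("S")  # element type of the hypothetical Series
T = TypeVar("T")  # owner type the accessor is parameterized with, e.g. Series[str]


class StringMethods(Generic[T]):
    """Hypothetical, simplified accessor (the real stubs use more type parameters)."""

    def __init__(self, owner: Any) -> None:
        self._owner = owner

    def upper(self) -> T:
        # Body exists only so the sketch runs; a .pyi stub would contain just "...".
        return type(self._owner)([v.upper() for v in self._owner.values])


class Series(Generic[S]):
    """Hypothetical stand-in for pandas.Series, not the real class."""

    def __init__(self, values: list[S]) -> None:
        self.values = values

    @property
    def str(self) -> StringMethods[Series[builtins.str]]:
        # Suggestion 1: pass Series[str] as the first type argument, so T binds to it.
        return StringMethods(self)


s = Series(["a", "b"])
# Suggestion 2: a test pinning the return type of a string method.
result = assert_type(s.str.upper(), Series[str])
print(result.values)  # ['A', 'B']
```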

@MarcoGorelli MarcoGorelli marked this pull request as draft March 7, 2025 22:23
@MarcoGorelli MarcoGorelli force-pushed the strict-strings-typing branch from c8e6d8f to 92dc75d Compare March 7, 2025 22:34
@MarcoGorelli MarcoGorelli marked this pull request as ready for review March 7, 2025 22:56
@MarcoGorelli MarcoGorelli force-pushed the strict-strings-typing branch from c7e8187 to 17e280f Compare March 8, 2025 12:06
Collaborator

@Dr-Irv Dr-Irv left a comment

Thanks for doing all of this work. It is a nice improvement to the stubs.

I think that all the methods that have -> T should be -> _TSTR, because we know these are string methods: even if the type of the Series (or Index) is unknown, we know we will be returning Series[str] or Index[str].

@MarcoGorelli
Member Author

thanks, have updated

I think that all the methods that have ->T should be -> _TSTR

I think the only exception is str.slice, which preserves the type, but I've gone ahead and done this for the others 👍
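A minimal sketch of the resulting split, again using hypothetical, simplified stand-ins for Series and StringMethods rather than the real stubs (which have more type parameters): methods that always produce strings, such as upper, return _TSTR (Series[str] here) even when the owner is Series[Any], while str.slice returns T, the owner type. The method bodies exist only so the sketch runs.

```python
from __future__ import annotations

import builtins
from typing import Any, Generic, TypeVar

S = TypeVar("S")          # element type of the hypothetical Series
T = TypeVar("T")          # owner type, e.g. Series[Any]
_TSTR = TypeVar("_TSTR")  # result type of methods that always produce strings, e.g. Series[str]


class StringMethods(Generic[T, _TSTR]):
    """Hypothetical two-parameter accessor; the real stubs use more type parameters."""

    def __init__(self, owner: Any) -> None:
        self._owner = owner

    def upper(self) -> _TSTR:
        # Always yields strings, so the return type does not depend on the owner's element type.
        return type(self._owner)([v.upper() for v in self._owner.values])

    def slice(self, start: int | None = None, stop: int | None = None) -> T:
        # The exception mentioned above: slice is typed as preserving the owner type.
        return type(self._owner)([v[start:stop] for v in self._owner.values])


class Series(Generic[S]):
    """Hypothetical stand-in for pandas.Series, not the real class."""

    def __init__(self, values: list[S]) -> None:
        self.values = values

    @property
    def str(self) -> StringMethods[Series[S], Series[builtins.str]]:
        return StringMethods(self)


def func(a: Series[Any]) -> None:
    # Statically, upper() is Series[str] and slice() is Series[Any], even though a is Series[Any].
    print(a.str.upper().values, a.str.slice(0, 1).values)


func(Series(["abc", "de"]))  # prints ['ABC', 'DE'] ['a', 'd']
```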

@MarcoGorelli
Member Author

Regarding #1146 (comment), is it OK if we leave that to a separate PR please?

Partially because I feel like the scope here keeps increasing, and partially because I'm not sure it's correct - for example, if I have

import pandas as pd
from typing import Any

def func(a: pd.Series[Any]) -> None:
    reveal_type(a.str.upper())

then, with that commit, mypy reports Revealed type is "Any", whereas without it, it reports Revealed type is "pandas.core.series.Series[builtins.str]"

@Dr-Irv
Collaborator

Dr-Irv commented Mar 11, 2025

OK. It's a mypy bug. Ugh.

python/mypy#15921

Collaborator

@Dr-Irv Dr-Irv left a comment

just a couple of tests to change, otherwise OK

Collaborator

@Dr-Irv Dr-Irv left a comment

Thanks @MarcoGorelli. Long journey, but a really nice improvement to the stubs!

@Dr-Irv Dr-Irv merged commit 2b0279e into pandas-dev:main Mar 11, 2025
13 checks passed
@MarcoGorelli
Member Author

thanks for your careful review, much appreciated!
