gh-129569: The function unicodedata.normalize() always returns built-in str #129570
Hizuru3 commented Feb 2, 2025 (edited)
- Issue: The function unicodedata.normalize() should always return an instance of the built-in str type. #129569
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
It will change the behavior, so I am not sure it should be fixed. But from a consistency standpoint, the suggestion looks good to me. And AFAIK, this behavior is not documented: https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize
Please add a blurb entry, a What's New entry, and update the docs to indicate that a fresh copy is returned.
Optional: I would also expect a fast path for exact strings where we return a reference to the object rather than calling FromObject. Otherwise we'll create new strings even if the string is already normalized (unless they are interned, but that's not always the case).
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
I have made the requested changes; please review again.
I added a news entry via blurb_it. Note that, as mentioned in the discussion forum, PyUnicode_FromObject() returns a new reference without cloning the whole string for an instance of the exact str type. It copies only instances of str subclasses, so the impact of this adjustment should be small. If a visible performance regression is observed, I'll consider inlining PyUnicode_FromObject() by hand.
As the previous behavior is considered to be a bug, does the documentation still need an update? If so, I think it would be good to open a new issue requesting an adjustment to the wording in the documentation.
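The reference-versus-copy distinction described above can be mimicked at the Python level. This is only a rough sketch, not the C implementation: `as_exact_str` and `Tagged` are hypothetical names introduced for illustration, and `str.__str__` is used as a stand-in for the copying step because it bypasses any `__str__` override on a subclass.

```python
class Tagged(str):
    """Hypothetical str subclass, used only for this illustration."""


def as_exact_str(obj: str) -> str:
    """Python-level analogy of the behaviour described for PyUnicode_FromObject()."""
    if type(obj) is str:
        # Exact str: hand back the very same object, nothing is copied.
        return obj
    # str subclass: build a plain str with the same contents.
    # str.__str__ ignores any __str__ override defined on the subclass.
    return str.__str__(obj)


plain = "abc"
tagged = Tagged("abc")

same = as_exact_str(plain)
copied = as_exact_str(tagged)

# Identity is preserved for the exact str, while the subclass instance
# is replaced by an equal, plain str object.
print(same is plain)
print(type(copied).__name__, copied == tagged, copied is tagged)
```

Only the subclass case pays for a copy, which is why the author expects the cost of the change to be limited.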
Thanks for making the requested changes! @picnixz: please review the changes made to this pull request.
Could we maybe have a test for this, actually? Namely, that the return value is an exact instance of str?
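A check along these lines could look like the sketch below. It is not the test that was added to the PR; `MyStr`, `FORMS`, and `forms_returning_subclass` are names assumed for illustration. On an interpreter that includes this change the reported list should be empty, while interpreters without it may list forms for which the input object was handed back unchanged.

```python
import unicodedata

# The four normalization forms accepted by unicodedata.normalize().
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


class MyStr(str):
    """Hypothetical str subclass used as test input."""


def forms_returning_subclass(text: str) -> list:
    """Return the normalization forms whose result is not an exact str."""
    return [
        form
        for form in FORMS
        if type(unicodedata.normalize(form, text)) is not str
    ]


# Already-normalized ASCII text is the interesting case: nothing has to be
# rewritten, so the function has the option of returning its input as-is.
offending = forms_returning_subclass(MyStr("abc"))
print("forms returning a str subclass:", offending or "none")
```

In a real test suite the same comparison would typically be an assertion that `type(result) is str` for each form.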
Looks good to me!
I have made the requested changes; please review again; I have synced the fork.
Thanks for making the requested changes! @picnixz: please review the changes made to this pull request.
I think I've already approved this PR. Unless conflicts are to be solved or a CI job is to be merged/updated/created, there's no need to sync the fork.
…built-in str (pythonGH-129570) (cherry picked from commit c359fcd) Co-authored-by: Hizuru <106918920+Hizuru3@users.noreply.github.com> Co-authored-by: Victor Stinner <vstinner@python.org>
GH-130403 is a backport of this pull request to the 3.13 branch.
…built-in str (pythonGH-129570) (cherry picked from commit c359fcd) Co-authored-by: Hizuru <106918920+Hizuru3@users.noreply.github.com> Co-authored-by: Victor Stinner <vstinner@python.org>
GH-130404 is a backport of this pull request to the 3.12 branch.
Merged. Thanks for your contribution @Hizuru3.
I disagree. I don't think that this change should be backported to stable branches. It would be surprising if the function returned a str subclass in Python 3.13.x but returned str in Python 3.13.(x+1). It can break applications relying on the current behavior. This change should only land in the Python 3.14 branch (main branch).
Yeah, it depends on whether we consider it a bug or a behavior change.
@serhiy-storchaka @corona10: Do you consider that this change must be backported? IMO it's a corner case and it's safer not to backport it. It can wait for Python 3.14.