Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

gh-129569: The function unicodedata.normalize() always returns built-in str #129570

Merged

Conversation

Hizuru3
Copy link
Contributor

@Hizuru3 Hizuru3 commented Feb 2, 2025

Copy link

cpython-cla-bot bot commented Feb 2, 2025

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-app
Copy link

bedevere-app bot commented Feb 2, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@corona10
Copy link
Member

corona10 commented Feb 2, 2025

It will change the behavior, I am not sure it should be fixed. But from a consistent view, I think that the suggestion looks good to me. And AKAIK, no documentation is written about this behavior: https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize

@corona10 corona10 removed needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes labels Feb 2, 2025
picnixz
picnixz previously requested changes Feb 2, 2025
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a blurb entry, a What's New entry, and update the docs to indicate that a fresh copy is returned.

Optional: I would also expect that we have a fast path for exact strings where we return a reference on the object rather than calling FromObject. Otherwise we'll create new strings if the string is already normalized (unless they are interned but that's not always the case).

@bedevere-app
Copy link

bedevere-app bot commented Feb 2, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@corona10 corona10 added needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes labels Feb 2, 2025
@Hizuru3 Hizuru3 requested a review from picnixz February 3, 2025 03:19
@Hizuru3
Copy link
Contributor Author

Hizuru3 commented Feb 3, 2025

I have made the requested changes; please review again.

I added a news entry via blurb_it.

Note that as mentioned in the discussion forum, PyUnicode_FromObject() returns a new reference without cloning the whole string for an instance of the exact str type. It performs copying only for instances of str subclasses. So the impact of this adjustment will not be that big. If a visible performance regression is observed, I'll consider inlining PyUnicode_FromObject() by hand.

As the previous behavior is considered to be a bug, does the documentation still need an update? If so, I think it would be good to open a new issue that requests to adjust the wording in the documentation.

@bedevere-app
Copy link

bedevere-app bot commented Feb 3, 2025

Thanks for making the requested changes!

@picnixz: please review the changes made to this pull request.

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we maybe have a test for this actually? namely that the return value is an exact instance of str?

@Hizuru3 Hizuru3 requested a review from vstinner February 7, 2025 16:13
@Hizuru3 Hizuru3 requested a review from picnixz February 8, 2025 05:10
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@Hizuru3
Copy link
Contributor Author

Hizuru3 commented Feb 19, 2025

I have made the requested changes; please review again; I have synced the fork.

@bedevere-app
Copy link

bedevere-app bot commented Feb 19, 2025

Thanks for making the requested changes!

@picnixz: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from picnixz February 19, 2025 22:22
@picnixz
Copy link
Member

picnixz commented Feb 21, 2025

I think I've already approved this PR. Unless conflicts are to be solved or a CI job is to be merged/updated/created, there's no need to sync the fork.

@vstinner vstinner merged commit c359fcd into python:main Feb 21, 2025
48 of 50 checks passed
@miss-islington-app
Copy link

Thanks @Hizuru3 for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 21, 2025
…built-in str (pythonGH-129570)

(cherry picked from commit c359fcd)

Co-authored-by: Hizuru <106918920+Hizuru3@users.noreply.github.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
@bedevere-app
Copy link

bedevere-app bot commented Feb 21, 2025

GH-130403 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Feb 21, 2025
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 21, 2025
…built-in str (pythonGH-129570)

(cherry picked from commit c359fcd)

Co-authored-by: Hizuru <106918920+Hizuru3@users.noreply.github.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
@vstinner vstinner removed the needs backport to 3.12 bug and security fixes label Feb 21, 2025
@bedevere-app
Copy link

bedevere-app bot commented Feb 21, 2025

GH-130404 is a backport of this pull request to the 3.12 branch.

@vstinner
Copy link
Member

Merged. Thanks for your contribution @Hizuru3.

@vstinner
Copy link
Member

@corona10 added needs backport to 3.12 needs backport to 3.13 labels Feb 2, 2025

I disagree. I don't think that this change should not be backported to stable branches. It would be surprising that the function returns a str subclass in Python 3.13.x but return str in Python 3.13.(x+1). It can break applications relying on the current behavior. This change should only land in Python 3.14 branch (main branch).

@corona10
Copy link
Member

Yeah, it's up to consider it as a bug or behavior change.
#129569 (comment)

@vstinner
Copy link
Member

@serhiy-storchaka @corona10: Do you consider that this change must be backported? IMO it's a corner case and it's safer to not backport it. It can wait for Python 3.14.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants