drop support for bytes #2602
Comments
I've been working on this a bit, and after removing enough of the bytes handling there is a significant speedup. It's not trivial though. So many different places support str and bytes in so many different ways, with indirection through helper functions, that it's hard to know if I've caught all of them.

It would also require adding deprecation warnings in a huge number of places. That is a huge amount of effort, I'm not sure I'll catch everything, and catching everything might actually require more checks in the short term, making things even slower.

I'm wondering if it might be better to announce a blanket policy: remove bytes support as we find more of it, and expect users to respond to errors instead of warnings. I have a feeling that using bytes anywhere is pretty rare at this point for any app that is staying reasonably up to date.

The one issue I'm not sure how to support is that being able to pass bytes to some URL related functions means you can technically encode
My strategy right now is to add deprecation warnings if the type annotations or docs stated that bytes were supported, or if tests failed when support was removed. For places that weren't documented or tested, I'm just removing it. This seems to be a good metric for cutting down on the number of extra checks added to the code, although there are still quite a few.
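As a rough illustration of that strategy, here is a minimal sketch of what a deprecation check could look like for a function that documented bytes support. The names `_str_or_warn` and `url_fragment` are hypothetical and are not taken from the code base; the real checks may be structured differently.

```python
import warnings


def _str_or_warn(value, name):
    """Hypothetical helper: accept bytes for now, but warn about it."""
    if isinstance(value, bytes):
        warnings.warn(
            f"Passing bytes for {name!r} is deprecated; pass a str instead.",
            DeprecationWarning,
            stacklevel=3,  # point at the caller of the public function
        )
        return value.decode("utf-8")
    return value


def url_fragment(fragment):
    """Hypothetical public function whose docs once promised bytes support."""
    fragment = _str_or_warn(fragment, "fragment")
    return "#" + fragment
```

Every documented entry point needs one such check during the deprecation period, which is the extra short-term cost described above.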
I've also been looking at the They default to UTF-8, and while it's lightly documented that it's possible to customize that, I can't find examples on GitHub doing that (which is admittedly difficult to search for) or 3rd-party tutorials mentioning it. We explicitly do not trust the encoding in the response's The WHATWG HTML and URL standards (and others like A quick look at some WSGI and ASGI servers shows that they all are using When decoding percent-encoded URL paths and query strings, we can leave invalid bytes percent-encoded instead of replacing them as we default to now. We already have a special internal The cookie spec is really bad about encoding, but Python's
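The idea of leaving invalid bytes percent-encoded when decoding URL paths and query strings could be sketched roughly as follows. This uses only the standard library; the error handler name `sketch_percent` and the function `unquote_keep_invalid` are made up for the example and are not the library's internal implementation.

```python
import codecs
from urllib.parse import unquote_to_bytes


def _percent_encode_invalid(exc):
    """Error handler: re-emit undecodable bytes as %XX instead of U+FFFD."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(f"%{b:02X}" for b in bad), exc.end


# Hypothetical handler name, registered only for this sketch.
codecs.register_error("sketch_percent", _percent_encode_invalid)


def unquote_keep_invalid(value: str) -> str:
    """Percent-decode as UTF-8, keeping invalid byte sequences encoded."""
    return unquote_to_bytes(value).decode("utf-8", errors="sketch_percent")


print(unquote_keep_invalid("caf%C3%A9"))  # valid UTF-8 is decoded
print(unquote_keep_invalid("a%FFb"))      # the invalid byte stays as %FF
```

One limitation of this sketch: a literal `%25` in the input is also decoded to `%`, so the output can be ambiguous if it is percent-decoded a second time.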
Wikipedia on UTF-8 cites https://w3techs.com/technologies/overview/character_encoding:
Just like other deprecations, this deprecation would give the small percentage of sites an opportunity to see the warning and update, or pin to the current version. The fix would be either sending their HTML as UTF-8, or otherwise updating their client code to send UTF-8 instead of something else. In the rare case that UTF-8 is still not correct, |
Using the |
Most of the library should only work with strings. Being able to pass bytes to most functions is an artifact of only supporting Python 2, then also supporting 3, then dropping support for 2. WSGI on Python 3 only deals with ISO-8859-1 characters. Modern WHATWG HTML and URL standards require UTF-8. HTTP headers must be ISO-8859-1 (new headers should be ASCII only), so UTF-8 data first has to be converted to valid characters through quoting or another encoding scheme before it is encoded to bytes.
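For readers unfamiliar with the WSGI point: PEP 3333 specifies that environ strings on Python 3 are `str` objects containing only ISO-8859-1 code points, so UTF-8 data has to be round-tripped through Latin-1. A minimal sketch of that conversion, with hypothetical helper names:

```python
def wsgi_to_text(value: str) -> str:
    """Recover UTF-8 text from a WSGI 'native string' (Latin-1 code points)."""
    return value.encode("latin-1").decode("utf-8", errors="replace")


def text_to_wsgi(value: str) -> str:
    """Represent UTF-8 text as a WSGI 'native string'."""
    return value.encode("utf-8").decode("latin-1")


wsgi_path = text_to_wsgi("/café")
print(repr(wsgi_path))                # the two UTF-8 bytes of "é" show up as two characters
print(repr(wsgi_to_text(wsgi_path)))  # back to the original text
```

Both directions take and return `str`; bytes only appear as an intermediate step.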
As #2406 points out, we spend too much time doing instance checks and encoding. This is often redundant because we currently support bytes or strings being passed to any function, and one function calls others that do the same. Very few places should allow both bytes and strings. We should be dealing with strings for most data, and with bytes only where binary data makes sense.
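To make the redundancy concrete, here is a small sketch of the pattern, not code from the library. All function names are hypothetical; only `urllib.parse.quote` is a real API.

```python
from urllib.parse import quote


def _make_str(value, charset="utf-8"):
    """Hypothetical helper: accept str or bytes, always return str."""
    if isinstance(value, bytes):
        return value.decode(charset)
    return value


def quote_segment_lenient(segment):
    # Second isinstance check for a value the caller has already normalized.
    return quote(_make_str(segment), safe="")


def build_path_lenient(segments):
    # First isinstance check happens here, then again inside the callee.
    return "/".join(quote_segment_lenient(_make_str(s)) for s in segments)


def build_path_strict(segments: list[str]) -> str:
    # str-only version: no per-call type checks or decoding.
    return "/".join(quote(s, safe="") for s in segments)


print(build_path_lenient([b"a b", "é"]))
print(build_path_strict(["a b", "é"]))
```

Both calls produce the same path, but the lenient version runs two type checks per segment even when every caller already passes `str`.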