Lossy UTF8 conversion of owned types (`Vec::<u8>::into_utf8_lossy`).
#116
Comments
(After a bit more thought, |
I think that's the wrong place. String already has |
Oh, I hadn't thought of that. Yeah, that's a really good place to put it. |
We discussed this in today's @rust-lang/libs-api meeting. We ended up settling on a method on We'd also be happy to see a conversion method on |
This is marked as completed, but there's no linked PR, and |
The ACP was accepted so there's nothing more to track here. Anyone can implement the function. |
Implement feature `string_from_utf8_lossy_owned` for lossy conversion from `Vec<u8>` to `String` methods
Accepted ACP: rust-lang/libs-team#116
Tracking issue: rust-lang#129436
Implement feature for lossily converting from `Vec<u8>` to `String`
- Add `String::from_utf8_lossy_owned`
- Add `FromUtf8Error::into_utf8_lossy`
---
Related to rust-lang#64727, but unsure whether to mark it "fixed" by this PR. That issue partly asks for in-place replacement of the original allocation. We fulfill the other half of that request with these functions.
closes rust-lang#64727
Rollup merge of rust-lang#129439 - okaneco:vec_string_lossy, r=Noratrieb
Proposal
Problem statement
We should have a function that performs the same lossy conversion as `String::from_utf8_lossy`, but which goes from `Vec<u8>` => `String`, instead of `&'a [u8]` to `Cow<'a, str>`.
Motivation, use-cases
Our current function `String::from_utf8_lossy` optimizes for the case where you need to borrow the input (a `&[u8]`) and can work with a borrowed output. It's very nice for this purpose, as it avoids the potentially costly copy in the common case[1] that the input is already valid UTF-8.
Sadly, if you need an owned output (e.g. a `String`), there is no function in the stdlib that avoids copying for already-valid bytes, even if you're happy giving up your owned input `Vec<u8>`. In practice, you generally do this with an expression like `String::from_utf8_lossy(&vec).to_string()`, which has the downside of always performing an extra copy if the input was valid UTF-8. In other words, it pessimizes the already-valid-UTF-8 case (it also has the downside of being slightly strange looking, although rewording it to avoid this is likely possible).
It seems desirable to solve this by adding an analogous function that transforms an owned `Vec<u8>` (of potentially invalid UTF-8 bytes) into an owned `String`.
Solution sketches
I think the following API would be a good solution. A possible implementation is provided as well.
I explored several other options in the past in the IRLO thread linked below.
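As a rough illustration of the idea on stable Rust, here is a minimal sketch. It is not necessarily the API or implementation proposed in this issue; the free function name `vec_into_utf8_lossy` is hypothetical and chosen only for the example. It relies on `String::from_utf8` reusing the input buffer when the bytes are valid, and on `FromUtf8Error::as_bytes` to get the bytes back when they are not.

```rust
/// Hypothetical helper (the name is illustrative, not a std API):
/// lossily converts an owned byte vector into an owned `String`.
fn vec_into_utf8_lossy(bytes: Vec<u8>) -> String {
    match String::from_utf8(bytes) {
        // Already valid UTF-8: `String::from_utf8` takes over the Vec's
        // buffer, so no bytes are copied.
        Ok(s) => s,
        // Invalid UTF-8: borrow the original bytes back from the error and
        // fall back to the borrowed lossy conversion, which builds a new
        // String with U+FFFD in place of each invalid sequence.
        Err(err) => String::from_utf8_lossy(err.as_bytes()).into_owned(),
    }
}
```

One trade-off of this sketch is that invalid input is validated twice (once by `String::from_utf8`, once by `String::from_utf8_lossy`). That only costs anything on the invalid path, which is the path that has to allocate anyway.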
Links and related work
An IRLO thread and writeup I made for this around two years ago: https://internals.rust-lang.org/t/too-many-words-on-a-from-utf8-lossy-variant-that-takes-a-vec-u8/13005. It contains a number of alternative API designs of... varying quality.
`bstr` (CC @BurntSushi) has a similar API for this, but uses the name `bstr::ByteVec::into_string_lossy`. I don't have strong opinions.
What happens now?
This issue is part of the libs-api team API change proposal process. Once this issue is filed, the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.
Footnotes
[1] I'm speculating when I suggest that already valid input is the common case, but given that the usefulness of the result of this function is tied to the UTF-8 validity of the input (e.g. if it's mostly invalid UTF-8, then the output is likely to be less readable), it seems generally reasonable.