Impl TryFrom<char> for u8 #2854

ebkalderon · 2020-01-22T02:19:31Z

I ran into a case using the csv crate in combination with structopt where I needed to parse a single char from an argument to use as a CSV delimiter. However, it appears that csv::StringBuilder only accepts u8 delimiters. I was surprised to see that there was no type-safe way to convert between char to u8 in the standard library. It would be nice if u8 implemented TryFrom<char>, where the conversion fails if the char value is outside of the U+0000 to U+00FF codepoint range (0-255).

An example use would look like this:

fn main() {
    let success = u8::try_from('A').expect("Conversion should have succeeded");
    let failure = u8::try_from('我').expect_err("Conversion should have failed");
}

The text was updated successfully, but these errors were encountered:

kennytm · 2020-01-22T06:10:47Z

ASCII range

This differs from impl From<u8> for char which supports U+0000 to U+00FF.

ebkalderon · 2020-01-22T06:43:57Z

@kennytm Good point! Thanks for pointing out the semantic difference in From<u8>. In that case, I think it's necessary to revise my proposal to specifically U+0000 to U+00FF (value range of 0-255).

Lokathor · 2020-01-22T06:47:13Z

if all of 0..=255 is allowed I believe what you want can be done as follows:

if c as u32 <= u8::max_value() as u32 {
  Some(c as u8)
} else {
  None
}

ebkalderon · 2020-01-22T06:58:09Z

@Lokathor Thanks for the code sample! That's actually very similar to what I've incorporated into my own project, so it's nice to see external confirmation that this general pattern exists. I was hoping to at least raise some discussion on whether a similar conversion could or should be included as part of the standard library.

SimonSapin · 2020-01-22T09:19:46Z

This also works: u8::try_from(u32::from(some_char))

fstirlitz · 2020-01-24T17:10:51Z

If this is added, I'd prefer if it's a method on char called something like try_into_latin1, to clarify that using it constitutes a semantic conversion (and not just type-checking bureaucracy), and to discourage using it as the least-effort/least-thought-out way to convert a stream of chars into a stream of u8s (which should explicitly refer to some character encoding).

SimonSapin · 2020-01-24T23:28:05Z

“Latin-1” may refer to slightly different things though.

Originally, it is Part 1: Latin alphabet No. 1 of ISO/IEC 8859. It is a set of 191 characters, each assigned an 8-bit value. The ranges 0x00 to 0x1F and 0x7F to 0x9F are not assigned.
IANA calls ISO-8859-1 an extension of the above that assigns the C0 and C1 control codes to these two ranges. This makes it compatible with ASCII, and all 256 bytes are assigned a character or control code. Unicode chose to make its first 256 code points U+0000 to U+00FF compatible with this, so u8::try_from(u32::from(some_char)) effectively encodes to IANA’s ISO-8859-1. To avoid confusion, https://infra.spec.whatwg.org/#isomorphic-encode calls this operation isomorphic encode instead.
Windows code page 1252 is a different extension of ISO/IEC 8859-1 that assigns various punctuation or letters to some of the 0x80 to 0x9F range. (In 0x00 to 0x7F it is compatible with ASCII and IANA’s ISO-8859-1 though) Windows-1252 is sometimes incorrectly called ISO-8859-1 or Latin-1.
In web browsers, at this point for compatibility reasons, many labels (used for example in charset in Content-Type) including ascii, latin1 and iso-8859-1 map to an extension of Windows-1252 where the 5 unassigned byte values are mapped to corresponding C1 control codes like in IANA’s ISO-8859-1.

Back to Rust API naming, we already have the precedent of impl From<u8> for char that does the opposite conversion. And my pre-1.0 proposal to rename str::as_bytes to str::as_utf8 was rejected.

scottmcm · 2020-11-10T21:52:55Z

FWIW, I think this makes sense. In general, any time there's A: From<B>, I think it makes sense to have B: TryFrom<A> for going the other way. (Assuming, of course, that it's possible to implement, but if it isn't then I'm also skeptical of the From...)

EDIT: Said otherwise, we have u8: TryFrom<u32>, and char: From<u8> is precedent for "u8 is a USV in 0..256" (as opposed to a code unit, or something), so we might as well have u8: TryFrom<char>.

burdges · 2020-11-11T02:31:14Z

Oh? Is there a convention to not provide A: From<B> if B: TryFrom<A> makes no sense, like say if A is a hash of B or something? Or did you mean only in the context of std?

I think wrapper types improve code quality of course, but sometimes people then use From/Into more once they have specific types, which harms readability over named methods.

I suppose one guideline would be impl From<B> for A if code would be improved by using the bound B: Into<A> but avoid doing so otherwise.

scottmcm · 2020-11-11T04:00:36Z

Is there a convention to [...]

Nothing official, as far as I know. It's just a heuristic I use, since if you can't recover the original, then it generally seems to me like it's doing something non-obvious enough that it would be better as a method with a more illustrative name. (Of course, this is also only for Froms with the same ownership type -- I'm not expecting to be able to recover the original from something like String: From<&str>.)

Scrolling through std's From impls they do mostly match this. I probably wouldn't bother ever writing out the the TryFrom opposite of RecvTimeoutError: From<RecvError>, since it's just matches!(foo, RecvTimeoutError::Disconnected), but it would be completely plausible if someone wanted to have it. Similarly for String: From<char> -- I'm not sure that char: TryFrom<String> would be a good API (would one really ever want to .try_into() such that it fails if there's not exactly one char?) but it's pretty clear what the behaviour would be.

The ones that are a bit weirder are things like BinaryHeap<T>: From<Vec<T>> -- there's a From back the other way, but the first way was actually a heapify operation, so the original value can't be recovered exactly. I think that's a place where I'd have actually preferred it just be BinaryHeap::new without the from existing -- it gives a more accessible place to document complexity and such.

All that said, #2484 includes some text proposing rules more along these lines.

Previously suggested in rust-lang/rfcs#2854. It makes sense to have this since `char` implements `From<u8>`. Likewise `u32`, `u64`, and `u128` (since rust-lang#79502) implement `From<char>`.

Implement `TryFrom<char>` for `u8` Previously suggested in rust-lang/rfcs#2854. It makes sense to have this since `char` implements `From<u8>`. Likewise `u32`, `u64`, and `u128` (since rust-lang#79502) implement `From<char>`.

ebkalderon · 2023-07-26T19:50:20Z

Done in rust-lang/rust#84640, which landed in 1.59.0! 🎉

scottmcm added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Nov 10, 2020

ids1024 mentioned this issue Apr 26, 2021

glib: Ergonomics of 'Char' seem questionalble gtk-rs/gtk3-rs#491

Closed

ids1024 mentioned this issue Apr 28, 2021

Implement TryFrom<char> for u8 rust-lang/rust#84640

Merged

ebkalderon closed this as completed Jul 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Impl TryFrom<char> for u8 #2854

Impl TryFrom<char> for u8 #2854

ebkalderon commented Jan 22, 2020 •

edited

Loading

kennytm commented Jan 22, 2020

ebkalderon commented Jan 22, 2020

Lokathor commented Jan 22, 2020

ebkalderon commented Jan 22, 2020

SimonSapin commented Jan 22, 2020

fstirlitz commented Jan 24, 2020

SimonSapin commented Jan 24, 2020

scottmcm commented Nov 10, 2020 •

edited

Loading

burdges commented Nov 11, 2020

scottmcm commented Nov 11, 2020

ebkalderon commented Jul 26, 2023

Impl TryFrom<char> for u8 #2854

Impl TryFrom<char> for u8 #2854

Comments

ebkalderon commented Jan 22, 2020 • edited Loading

kennytm commented Jan 22, 2020

ebkalderon commented Jan 22, 2020

Lokathor commented Jan 22, 2020

ebkalderon commented Jan 22, 2020

SimonSapin commented Jan 22, 2020

fstirlitz commented Jan 24, 2020

SimonSapin commented Jan 24, 2020

scottmcm commented Nov 10, 2020 • edited Loading

burdges commented Nov 11, 2020

scottmcm commented Nov 11, 2020

ebkalderon commented Jul 26, 2023

ebkalderon commented Jan 22, 2020 •

edited

Loading

scottmcm commented Nov 10, 2020 •

edited

Loading