Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Combination of Deserialize and from_str_unchecked() is unsound #43

Closed
niklasf opened this issue Jan 17, 2021 · 12 comments
Closed

Combination of Deserialize and from_str_unchecked() is unsound #43

niklasf opened this issue Jan 17, 2021 · 12 comments

Comments

@niklasf
Copy link
Contributor

niklasf commented Jan 17, 2021

While writing the safety comments for #40, I noticed that

            // Safety: The field is specified to be ASCII, and the only safe
            // way to construct VendorInfo is from real CPUID data or the
            // Default implementation.
            str::from_utf8_unchecked(slice)

is not actually true. Using Deserialize, types like VendorInfo can be constructed with arbitrary content in safe code, but calling str::from_utf8_unchecked() with invalid UTF-8 is undefined behavior.

Possible fixes are removing Deserialize, or validating at construction or right when constructing the str.

@gz
Copy link
Owner

gz commented Jan 20, 2021

Yes, thanks for spotting this!

By validating at construction, you mean to do a custom implementation of Deserialize trait which throws an error in case it's not valid utf-8? That seems like a sensible solution to me.

@niklasf
Copy link
Contributor Author

niklasf commented Jan 21, 2021

Yes, that should work. I wonder if it's even worth it, though. It appears that at least none of the published reverse dependencies are using serialize.

Edit: Not a soundness issue, but I'd also expect successfully deserialized data not to lead to panics. There's one assertion outside and some assertions in get_bits().

@niklasf
Copy link
Contributor Author

niklasf commented Jan 24, 2021

Thanks for the quick turnaround on v9.0.0. I'll file a separate advisory for this issue (rustsec/advisory-db#671), that can be updated with the solution when available.

@Shnatsel
Copy link

I suggest using from_utf8_lossy instead of from_utf8_unchecked. This way only the chunks that are valid UTF-8 will be preserved.

This means the original input may be altered, and hence may not be entirely correct, but at least it will not lead to memory safety issues, and does not require changing the API.

@gz
Copy link
Owner

gz commented Jan 26, 2021

Unfortunately, from_utf8_lossy uses std::borrow::Cow which originates from the alloc crate. Ideally we can remain using cpuid with just core and not require dynamic heap allocation support. Otherwise, this would be a good way to go about it.

@Shnatsel
Copy link

Shnatsel commented Jan 26, 2021

bstr crate provides a string abstraction that's not required to be UTF-8, but that would add an extra dependency.

It's probably best to expose it as a &[u8] and let the API user deal with it, either via from_utf8_lossy or via bstr.

@niklasf
Copy link
Contributor Author

niklasf commented Jan 27, 2021

For a non-breaking change (i.e. keeping -> &str) it would also be possible to cut off the slice at the first non-ASCII byte, similar to the nul byte here:

rust-cpuid/src/lib.rs

Lines 4149 to 4150 in 6993bb5

// Brand terminated at nul byte or end, whichever comes first.
let slice = slice.split(|&x| x == 0).next().unwrap();

The less critical issue with the panics still remains.

If we're considering breaking changes, just removing Serialize/Deserialize is my top candidate. None of the published reverse dependencies are using it, and I find it hard to imagine a use case. When using CPUID locally, there's no need to persist it across program runs. When sending information to a server, a higher level representation seems more appropriate (e.g. sending the vendor, not the raw CPUID data that encodes it).

Then, if someone actually needs it and requests it, consider reimplementing it for those structs as needed, with careful validation.

@gz
Copy link
Owner

gz commented Jan 27, 2021

This is the pull request that originally added the feature with some explanation on why it was requested: #18

I agree with @niklasf, a higher level representation would've been more sensible. I can think of a use-case for myself where I would want to persist all CPUID output in order to save the current processor state e.g. before running an experiment. But I do think a better way than having Serialize/Deserialize on invididual data-structures might be to just save cpuid raw registers using a Vec/Map of CpuIdResult bytes and then persisting that...

I'll try to look into how hard it is to just return an error on Serialize/Deserialize if data isn't valid utf8. Otherwise removing Serialize/Deserialize is something I would consider too.

@niklasf
Copy link
Contributor Author

niklasf commented Jan 27, 2021

Ah, I should have tried git blame :)

For the validation, a decent approach seems to be to use derive on a private helper struct, rather than implementing everything manually: serde-rs/serde#642 (comment).

@gz gz closed this as completed in 936ab41 Jul 4, 2021
@gz
Copy link
Owner

gz commented Jul 4, 2021

Alright I worked on cpuid today and pushed a more pragmatic fix for this: use from_utf8 combined with unwrap_or("Invalid string") instead of from_utf8_unchecked.
I figured this is fine because if the CPU isn't totally broken this will never happen for regular cpuid usage, and if someone conjures up some bad struct via Deserialize then they will hopefully realize it when they see this error string.

@Voxelot
Copy link

Voxelot commented Aug 23, 2021

Would it be possible to get this backported to older versions of raw-cpuid? I'm stuck on version 8.1.2 since this is a transitive dependency in my case and don't have the option to make a major upgrade to 9.x.

@gz
Copy link
Owner

gz commented Aug 24, 2021

Hi @Voxelot yes, I can backport these changes.
Just out of curiosity, porting to 9.x or 10.x should be very minimal effort as likely nothing or only little has changed -- so maybe just replacing the version in Cargo.toml works already. If not, you can point me to the project in question and I can probably submit a pull request for the update.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants