Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Better registry support #148

Closed
justanotheranonymoususer opened this issue Jan 7, 2025 · 21 comments
Closed

Better registry support #148

justanotheranonymoususer opened this issue Jan 7, 2025 · 21 comments
Assignees
Labels
bindings Something with the low-level WinAPI bidings enhancement New feature or request

Comments

@justanotheranonymoususer

I skimmed over the registry functions and I have several suggestions:

  • Several useful options are missing. For example RegGetValue receives no options at all, but:
    • RRF_SUBKEY_WOW6432KEY/RRF_SUBKEY_WOW6464KEY are necessary sometimes (with this function and RegOpen*/RegCreate* too)
    • RRF_NOEXPAND is always used, why?
    • Restricting by type can be useful, e.g. if I only expect DWORD, let me pass RRF_RT_REG_DWORD for performance and ease of use
  • Buffer size handling can be improved
  • Data validation is very simplistic (validate_retrieved_reg_val)
    • REG_SZ is checked for null terminator, but what about REG_EXPAND_SZ? REG_MULTI_SZ?
    • As far as I understand, parse_multi_z_str is unsafe because it's not bounded by the size but the buffer can be without the double null terminator
@rodrigocfd rodrigocfd self-assigned this Jan 28, 2025
@rodrigocfd rodrigocfd added enhancement New feature or request bindings Something with the low-level WinAPI bidings labels Jan 28, 2025
@rodrigocfd
Copy link
Owner

@Hawk777 you're welcome to join the discussion.

@rodrigocfd
Copy link
Owner

Several useful options are missing.

My first implementation had a lot of simplifications, I'm gonna upgrade that.

Buffer size handling can be improved

Yeah, seems the same case we had at #145.

Data validation is very simplistic

Do you have any suggestion, or maybe a correct C++ implementation?

rodrigocfd added a commit that referenced this issue Jan 28, 2025
@justanotheranonymoususer
Copy link
Author

Do you have any suggestion, or maybe a correct C++ implementation?

Generally the registry has no notion of types, a type is just a dword, and the data is just a collection of bytes. You need to verify everything. So for example, you verify stuff for SZ type:

if data_type1 == co::REG::SZ {

but not for other types such as EXPAND_SZ and MULTI_SZ.

Correct (I think) C++ implementation example:
https://github.com/microsoft/wil/blob/master/include/wil/registry.h

@rodrigocfd
Copy link
Owner

Correct (I think) C++ implementation example: https://github.com/microsoft/wil/blob/master/include/wil/registry.h

Could you be more specific about which funcions I should look at in this header?

@justanotheranonymoususer
Copy link
Author

It's a large file indeed, and it also makes use of registry_helpers.h, but I'll try to focus more on what I meant.

  • Data validation is very simplistic (validate_retrieved_reg_val)
    • REG_SZ is checked for null terminator, but what about REG_EXPAND_SZ? REG_MULTI_SZ?

Your REG_SZ handling looks fine, but you make no validation for REG_EXPAND_SZ at all. I'd expect both to have the same validation. The only difference between the two is that for a REG_EXPAND_SZ value, it's expected to also expand env vars.

I think they just depend on GetRegValue to do the proper null-termination for SZ and EXPAND_SZ. Relevant core implementation:
https://github.com/microsoft/wil/blob/master/include/wil/registry_helpers.h#L1187

  • As far as I understand, parse_multi_z_str is unsafe because it's not bounded by the size but the buffer can be without the double null terminator

The impl:
https://github.com/microsoft/wil/blob/master/include/wil/registry_helpers.h#L200

They properly pass the start and the end of the buffer, and don't rely on two null terminators to be present because it's not guaranteed.

@Hawk777
Copy link

Hawk777 commented Jan 28, 2025

I don’t have any detailed comments; it’s been quite a while since I’ve worked on my Windows project, and I didn’t dig into things quite to this depth even then.

Overall, it probably makes sense for Winsafe to take one of two philosophies. Either (1) validate everything fully and return something immediately useful to the user (but with the risk that some registry values that could exist might actually be completely unreadable by Winsafe due to failing the validation), or (2) validate nothing and hand the user a bag of bytes to do with as they please (which would be less convenient but mean that every possible registry value, even the sketchy ones, are readable, and Winsafe would still be taking care of some amount of abstraction by eliminating the handling of raw pointers and sizing of buffers).

I’m not sure which philosophy is the right one though!

@justanotheranonymoususer
Copy link
Author

@rodrigocfd
Copy link
Owner

  • As far as I understand, parse_multi_z_str is unsafe because it's not bounded by the size but the buffer can be without the double null terminator

The impl: https://github.com/microsoft/wil/blob/master/include/wil/registry_helpers.h#L200

They properly pass the start and the end of the buffer, and don't rely on two null terminators to be present because it's not guaranteed.

That's fine when the boundaries are known, but there are cases like GetEnvironmentStrings and SHFILEOPSTRUCT where we don't. There is no way other than trust the existence of the double null.

@justanotheranonymoususer
Copy link
Author

You're right, in this cases you have no choice but to trust the buffer. But in the case of the registry, it's incorrect to trust the buffer because it can be anything. That's what happens when you deal with API designed decades ago with various historical accidental design decisions and little awareness to safety. Just like the These inconsistencies are insane part...

@rodrigocfd
Copy link
Owner

Yeah, that's nightmare fuel.

Well, we do what we can do. I believe adding an Option<usize> for then length, then performing the additional check when it's present, is the best solution here.

rodrigocfd added a commit that referenced this issue Feb 3, 2025
rodrigocfd added a commit that referenced this issue Feb 5, 2025
rodrigocfd added a commit that referenced this issue Feb 5, 2025
@rodrigocfd
Copy link
Owner

@justanotheranonymoususer could you please take a look at this new RegGetValue implementation?

@justanotheranonymoususer
Copy link
Author

rodrigocfd added a commit that referenced this issue Feb 6, 2025
@rodrigocfd
Copy link
Owner

I think it's better to do it outside of the loop.

Correct.

Why do you care? It's fine as long at the buffer is large enough, no? Why even query the type in the first call?

Indeed, no need to check if the data type has changed. As long as the buffer is enough, it's fine. That also means I need to retrieve the type just once, in the 2nd call.

What happens if co::REG::from_raw is not one of the enum values? Just making sure it won't panic, afaik the type can be basically anything.

It's a value returned from the OS, I suppose it will always return a valid value. Did you ever seen the OS returning an invalid type?

data_len2 doesn't seem to be used. I'd expect it to be used to resize buf.

I changed the logic a bit. If the buffer is not enough the function will return ERROR::MORE_DATA, so there is no need to check the size. In such case, we loop again retrieving the new size.

Please take a look again.

@justanotheranonymoususer
Copy link
Author

It's a value returned from the OS, I suppose it will always return a valid value. Did you ever seen the OS returning an invalid type?

Regedit is unstructured data. It's not supposed to happen "normally", but you probably don't want to panic if it does.

Image

data_len2 doesn't seem to be used. I'd expect it to be used to resize buf.

I mean, let's say first call says you have 10 bytes. You alloc 10 bytes. Second query says you actually have 4 bytes. Now you have a vector:
[data] [data] [data] [data] [0] [0] [0] [0] [0] [0]

But since you don't use data_len2 for anything, you can't tell between [data] [data] [data] [data] and the case of actual 10 bytes from the registry which are [data] [data] [data] [data] [0] [0] [0] [0] [0] [0]. Resizing the vector after the second call will make sure the size matches the second query.

But also see my comments on the commit below, you just assume that the size is correct, using unsafe buf.as_ptr() and assuming it has the correct size, which doesn't have to be the case:
a240b00#comments

What's basically missing is validate_retrieved_reg_val or a similar check that the data size matches the reported type.

rodrigocfd added a commit that referenced this issue Feb 6, 2025
@rodrigocfd
Copy link
Owner

Oh I just saw you commented on another file:

What if buf's size isn't 4 bytes? I didn't see that you verify it in RegGetValue.

I assume the OS won't return a wrong size for numeric values, that would mean a bug in the OS, right?

Same as co::REG::SZ? Can do co::REG::SZ | co::REG::EXPAND_SZ => {?

Yup.

@rodrigocfd
Copy link
Owner

Resizing the vector after the second call will make sure the size matches the second query.

So, if the buffer is smaller than the first value, I should shrink it and make the call again, until the value is exact?

@justanotheranonymoususer
Copy link
Author

I assume the OS won't return a wrong size for numeric values, that would mean a bug in the OS, right?

No, just try it. The registry is silly, prehistoric storage of data. When you write data, it basically has the following:

  • A type, which is any DWORD between 0x00000000 and 0xFFFFFFFF.
  • Binary data of any size with any content.

Just like when you enumerate files in a folder, you can't trust a x.html to contain html, or 1.bmp to not be an executable file. And you can encounter a garbage.garbage file name, even though .garbage is not a valid file extension.

The fact that the type has a meaning of a type is just a convention. The only thing Microsoft did, probably because there were too many bugs with this confusing API, is to create GetRegValueW, a newer API which helps a tiny bit by appending null terminators to string types if they're missing. That's it. The rest is still the same - a value is a combination of a 32-bit number and an arbitrary blob of bytes. The rest can be anything.

So, if the buffer is smaller than the first value, I should shrink it and make the call again, until the value is exact?

You don't need to make the call again, you can just resize the vector to make it smaller. Even better, you can just make a slice of the x first bytes.

rodrigocfd added a commit that referenced this issue Feb 11, 2025
@rodrigocfd
Copy link
Owner

@justanotheranonymoususer, as for the DWORD (and QWORD, by analogy) situation, I see 3 possibilities for the returned data length:

  • exactly 4 bytes – happy path;
  • more than 4 bytes – just consider the first 4 bytes;
  • less than 4 bytes – maybe fill with 0x00?

What do you think?

@justanotheranonymoususer
Copy link
Author

@rodrigocfd
Copy link
Owner

I'd return an error if the size doesn't match, it shouldn't happen, and if
it does it means that something is wrong, similarly to trying to read a
string and finding DWORD.

I'm okay with that.

BTW looks like Microsoft is also working on a safe registry wrapper in
rust, you might want to take a look:
https://github.com/microsoft/windows-rs/blob/master/crates/libs/registry/readme.md

I've seen other high-level community crates for specific Windows areas, like registry for the Registry itself. If MS is now creating a high-level Registry wrapper, I'd say it's more likely that they'll look at what we are doing than the other way around.

I will also take a look at RegQueryMultipleValues implementation. And it seems to me that RegQueryValueEx is just redundant when we have RegGetValue, the docs even recommend using it, what do you think?

@justanotheranonymoususer
Copy link
Author

RegQueryMultipleValues

Interesting, I wasn't familiar with this API, and frankly never needed something like this.

RegQueryValueEx is just redundant when we have RegGetValue

I tend to agree, although I don't think it matters all that much, RegGetValue doesn't provide much benefits, it provides only very specific guarantees regarding null-termination of string types, it doesn't save you from most of the rest of the validation. Microsoft uses RegQueryValueEx in their wrapper.

rodrigocfd added a commit that referenced this issue Feb 14, 2025
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bindings Something with the low-level WinAPI bidings enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants