One-way address exposure for FFI #11

Closed
dvdhrm opened this issue Jun 8, 2022 · 5 comments
Comments

@dvdhrm
dvdhrm commented Jun 8, 2022

Hi!

I maintain Rust code that provides direct access to the Linux syscalls. In a lot of scenarios the Linux kernel uses u64 for pointers to make sure mixed 32bit/64bit systems have a single ABI (the kernel calls its workarounds for this 'compat' mode). In other scenarios, unsigned long or u64 basically contain unions of lots of different argument options. If I understood correctly, there is no inherent conflict with strict provenance, though I struggle to implement it properly.

I understand that the recommendation is to make FFI calls take a pointer type as argument, even if the other side takes an unsigned long (or similar). However, what if the full set of possible arguments is either too big or unknown? In my case, I have syscall0() up to syscall6(), which correspond to the libc syscall(...) function. They are implemented in Rust with inline assembly and simply take 0 to 6 usize arguments. No external linkage required at all. On top, I have typed per-syscall wrappers. But under strict provenance, I cannot implement the typed wrappers without resorting to expose_addr(), even though I never use from_exposed_addr(). I want to avoid implementing the syscall stubs in C or via other FFI. This is pure Rust (minus inline assembly).

My ideal solution would be a way to take a pointer in Rust and lay it out as an integer of my choice. I am fine with it still being a pointer, but I want it to be laid out like a u64 (or another integer of my choice bigger than usize, depending on platform). I can then pass it to the syscall and be done with it. Alternatively, there must be some state change that miri/etc. apply to a pointer passed through FFI, and I would be fine calling the same function on that pointer. Something like publish_addr() that tells miri/etc. to assume the address was exposed via FFI.
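For reference, the expose-based workaround described above can be sketched as follows. This is a minimal illustration, not code from the crate in question: `ptr_to_u64` and `HypotheticalIoctlArg` are made-up names, and the `as usize` cast is used because the thread later notes that `expose_addr()` is equivalent to an `as` cast.

```rust
/// Hypothetical kernel-style argument block: the kernel side declares the
/// buffer pointer as a u64 so 32bit and 64bit callers share one layout.
#[repr(C)]
pub struct HypotheticalIoctlArg {
    pub data_ptr: u64,
    pub data_len: u64,
}

/// Widen a raw pointer to a u64.
/// The `as usize` cast exposes the address (the same effect as
/// `expose_addr()`); the `as u64` cast zero-extends it on 32bit targets.
pub fn ptr_to_u64<T>(ptr: *const T) -> u64 {
    ptr as usize as u64
}

/// Build the argument block for a byte buffer.
pub fn make_arg(buf: &[u8]) -> HypotheticalIoctlArg {
    HypotheticalIoctlArg {
        data_ptr: ptr_to_u64(buf.as_ptr()),
        data_len: buf.len() as u64,
    }
}
```

The value produced this way is a plain integer, which is exactly why it falls outside strict provenance: the pointer's provenance is not carried along with it.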

Note that laying out the value as a union does not work here. For instance, assume a big-endian 32bit architecture with a 64bit syscall argument. The union would look like this:

#[repr(C)]
union Arg {
    ptr: *mut core::ffi::c_void,
    int: u64,
}

Assigning a pointer to Arg.ptr would differ from assigning its address to Arg.int: on big-endian, the 32bit Arg.ptr is aligned to the start of the union, while in Arg.int the lower 32 bits sit at the end of the union. (I could use ptrs: [*mut core::ffi::c_void; 2] and store into the right slot, but I am not sure that is the right solution.)
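The layout mismatch can be reproduced on any host by simulating the byte order explicitly. The sketch below is only an illustration with made-up function names: it writes a 32bit address at the start of an 8-byte slot, as the union would, and then reads the slot back as a 64bit integer in the given byte order.

```rust
/// Write a 32bit address at offset 0 of an 8-byte slot (where the 32bit
/// `Arg.ptr` field lives) and read the slot back as a big-endian u64.
pub fn overlay_at_start_be(addr: u32) -> u64 {
    let mut bytes = [0u8; 8];
    bytes[..4].copy_from_slice(&addr.to_be_bytes());
    u64::from_be_bytes(bytes)
}

/// Same overlay, but with little-endian byte order throughout.
pub fn overlay_at_start_le(addr: u32) -> u64 {
    let mut bytes = [0u8; 8];
    bytes[..4].copy_from_slice(&addr.to_le_bytes());
    u64::from_le_bytes(bytes)
}
```

With the address 0x1234_5678, the little-endian overlay reads back as 0x1234_5678, while the big-endian overlay reads back as 0x1234_5678_0000_0000: the address ends up in the upper half of the integer.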

So I am a bit out of ideas about what to do and am looking for help!
Thanks
David

@RalfJung
Collaborator

RalfJung commented Jun 8, 2022

> resort to expose_addr(), even though I never use from_exposed_addr()

Conceptually, the kernel uses from_exposed_addr, so this still makes sense.

> They are implemented in rust with inline assembly and simply take 0 to 6 usize arguments.

If the arguments are usize, then a union of a raw ptr and usize should work, right?
The trouble starts when they are u64. I am not quite sure what you mean by "formatting", I assume you are not referring to format!?

@dvdhrm
Author

dvdhrm commented Jun 9, 2022

> > resort to expose_addr(), even though I never use from_exposed_addr()
>
> Conceptually, the kernel uses from_exposed_addr, so this still makes sense.

True, but there were 2 reasons that made me hesitate:

  1. The documentation of sptr says using expose_addr() means you are not compatible with strict provenance. Does that mean any application using FFI or syscalls that (for whatever reason) cannot use typed prototypes is not compatible with strict provenance? Additionally, it seems off that typed prototypes are considered compatible with strict provenance, but untyped prototypes are not, even though the information passed to the outside is in most scenarios just the bare address. But maybe I am reading too much into this?

  2. If I never intend to use from_exposed_addr(), shouldn't I be able to tell code-analysers about it? There is no reason for them to track the address then (or is there?).

> > They are implemented in rust with inline assembly and simply take 0 to 6 usize arguments.
>
> If the arguments are usize, then a union of a raw ptr and usize should work, right? The trouble starts when they are u64. I am not quite sure what you mean by "formatting", I assume you are not referring to format!?

With formatting I meant "laying out in memory".

Yeah, usize is fine. I think (I haven't checked all architecture restrictions). The problem really is when u64 is used. This also affects syscalls, which can take u64 split among 2 usize registers (on 32bit), with additional restrictions like aligned to an even argument register. But I can deal with all that, and maybe I was just overly afraid of using expose_addr(), because it works fine with it. I just wondered whether there is a way around it.
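The split of a u64 across two 32bit argument registers, including the even-register alignment, might be modeled like this. It is a rough sketch: `push_u64_arg` is a made-up helper, the registers are modeled as a `Vec<u32>`, and the choice of which half goes first (selected by the `big_endian` flag) is an assumption that has to be checked against each architecture's calling convention.

```rust
/// Append a 64bit syscall argument to a list of 32bit argument registers.
///
/// Assumptions (not taken from any specific ABI document):
/// - the pair must start at an even register index, so a zero pad is
///   inserted when the next free index is odd;
/// - the half that comes first follows the target's byte order.
pub fn push_u64_arg(regs: &mut Vec<u32>, value: u64, big_endian: bool) {
    if regs.len() % 2 != 0 {
        regs.push(0); // padding register, ignored by the callee
    }
    let hi = (value >> 32) as u32;
    let lo = value as u32; // truncating cast keeps the low 32 bits
    if big_endian {
        regs.push(hi);
        regs.push(lo);
    } else {
        regs.push(lo);
        regs.push(hi);
    }
}
```

For example, starting from one occupied register and pushing 0x1122_3344_5566_7788 in little-endian order yields four registers: the original one, a zero pad, 0x5566_7788, and 0x1122_3344.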

@RalfJung
Collaborator

RalfJung commented Jun 9, 2022

> The documentation of sptr says using expose_addr() means you are not compatible with strict provenance. Does that mean any application using FFI or syscalls that (for whatever reason) cannot use typed prototypes is not compatible with strict provenance?

Basically, yes. If you are smuggling pointers through integer types, you are not following strict provenance. There is no way around that; an alternative kind of exposure operation would also be outside of strict provenance.

Now, the strict provenance experiment also led to a better understanding of "permissive provenance", using expose_addr and from_exposed_addr. So even though these operations are equivalent to as casts, by using them explicitly you still explicitly make use of their proposed new semantics. It is also good to collect places where smuggling pointers through integers cannot be (easily) avoided. The rustc issue tracker collects some of these problems. I think your problem falls under rust-lang/rust#95496.

> If I never intend to use from_exposed_addr(), shouldn't I be able to tell code-analysers about it? There is no reason for them to track the address then (or is there?).

You intend the kernel to use from_exposed_addr. If the code analysis would assume that from_exposed_addr was never called, that would be unsound, since then it could assume that the kernel never reads or writes this memory!
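The point can be made concrete with a toy model in which the "kernel" is an ordinary Rust function. This is only a sketch: `mock_kernel_write_byte` and `typed_wrapper` are hypothetical names, and plain `as` casts stand in for `expose_addr()` and `from_exposed_addr()`, which the thread describes as equivalent.

```rust
/// Hypothetical stand-in for the kernel side of a syscall: it receives a
/// bare integer and turns it back into a pointer before writing.
/// The int-to-pointer cast plays the role of `from_exposed_addr()`.
pub fn mock_kernel_write_byte(addr: usize, value: u8) {
    let ptr = addr as *mut u8;
    // Sound only because the caller exposed a valid, writable address.
    unsafe { ptr.write(value) };
}

/// Typed wrapper on the caller's side.
/// The pointer-to-int cast plays the role of `expose_addr()`.
pub fn typed_wrapper(target: &mut u8, value: u8) {
    let addr = target as *mut u8 as usize;
    mock_kernel_write_byte(addr, value);
}
```

If an analysis were allowed to assume that nothing ever reconstructs a pointer from `addr`, it could conclude that `*target` is unchanged after `typed_wrapper` returns, which is wrong here for the same reason it would be wrong with a real kernel.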

> Yeah, usize is fine. I think (I haven't checked all architecture restrictions). The problem really is when u64 is used. This also affects syscalls, which can take u64 split among 2 usize registers (on 32bit), with additional restrictions like aligned to an even argument register. But I can deal with all that, and maybe I was just overly afraid of using expose_addr(), because it works fine with it. I just wondered whether there is a way around it.

Don't get me wrong, it would be nice to do some union trickery here. But I don't know enough about calling conventions to tell you what is and is not possible there. :)

@dvdhrm
Author

dvdhrm commented Jun 10, 2022

[...]

> Now, the strict provenance experiment also led to a better understanding of "permissive provenance", using expose_addr and from_exposed_addr. So even though these operations are equivalent to as casts, by using them explicitly you still explicitly make use of their proposed new semantics. It is also good to collect places where smuggling pointers through integers cannot be (easily) avoided. The rustc issue tracker collects some of these problems. I think your problem falls under rust-lang/rust#95496.

I skimmed through these, but I have to admit I did not read them fully. Thanks for the hints! I think #95496 does indeed apply, even though I think it does not fully describe the huge scope of kernel APIs that do this. Once you start looking at driver APIs (graphics, input, sound, ...) everything is pointers stashed into u64.

> > If I never intend to use from_exposed_addr(), shouldn't I be able to tell code-analysers about it? There is no reason for them to track the address then (or is there?).
>
> You intend the kernel to use from_exposed_addr. If the code analysis would assume that from_exposed_addr was never called, that would be unsound, since then it could assume that the kernel never reads or writes this memory!

Na, I meant it can assume it is never called from my context. So if it merely tracks my address space, it can assume it is exposed, but avoid putting it in lookup-trees, because no from_exposed_addr() should ever match it. Obviously, this is only true if there is no other API that returns that address in some way.

> > Yeah, usize is fine. I think (I haven't checked all architecture restrictions). The problem really is when u64 is used. This also affects syscalls, which can take u64 split among 2 usize registers (on 32bit), with additional restrictions like aligned to an even argument register. But I can deal with all that, and maybe I was just overly afraid of using expose_addr(), because it works fine with it. I just wondered whether there is a way around it.
>
> Don't get me wrong, it would be nice to do some union trickery here. But I don't know enough about calling conventions to tell you what is and is not possible there. :)

I will have a look into a u64+ptr union that works even on big-endian+32bit-compat. I am still unsure how to stash this into 32bit syscall wrappers, which split u64 across arguments, but maybe I can push this into the inline asm and be fine with it.
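One possible shape for such a union, following the earlier idea of an array of pointer slots, is sketched below. All names are hypothetical, it assumes pointers are at most 64 bits wide, and it only addresses the memory layout: how to hand the value to the registers, and how provenance is treated along the way, remain open as discussed above.

```rust
use core::ffi::c_void;
use core::mem::size_of;

/// Number of pointer-sized slots that fit into a u64 (1 on 64bit, 2 on 32bit).
const SLOTS: usize = size_of::<u64>() / size_of::<usize>();
/// Slot that overlaps the low-order bytes of the u64: the last slot on
/// big-endian targets, the first slot on little-endian targets.
const LOW_SLOT: usize = if cfg!(target_endian = "big") { SLOTS - 1 } else { 0 };

#[repr(C)]
#[derive(Clone, Copy)]
pub union Arg64 {
    int: u64,
    ptrs: [*mut c_void; SLOTS],
}

impl Arg64 {
    /// Store an integer argument.
    pub fn from_int(value: u64) -> Self {
        Arg64 { int: value }
    }

    /// Store a pointer so that its address lands in the low-order bytes.
    pub fn from_ptr(ptr: *mut c_void) -> Self {
        let mut arg = Arg64 { int: 0 }; // zero the high-order bytes first
        unsafe { arg.ptrs[LOW_SLOT] = ptr };
        arg
    }

    /// Read the slot back as an integer, for inspection only. This read
    /// behaves like a transmute and does not expose the pointer.
    pub fn as_u64(self) -> u64 {
        unsafe { self.int }
    }
}
```

On a 64bit target the single slot covers the whole integer, so `from_ptr(p).as_u64()` equals the address of `p`; on a 32bit big-endian target the pointer is stored in the second slot to get the same result.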

I will close this issue. Thanks a lot for the detailed responses! I might just comment on #95496 eventually with a summary, so you guys can better track crates that require expose_addr().

@dvdhrm dvdhrm closed this as completed Jun 10, 2022
@RalfJung
Collaborator

> Na, I meant it can assume it is never called from my context. So if it merely tracks my address space, it can assume it is exposed, but avoid putting it in lookup-trees, because no from_exposed_addr() should ever match it. Obviously, this is only true if there is no other API that returns that address in some way.

Not sure which lookup-trees you mean -- the "set of exposed pointers" that the semantics tracks is entirely conceptual, it does not exist at runtime. It does affect compiler analyses, but it is crucial that those analyses do track that your pointer is exposed since otherwise, again, the compiler could assume that the kernel does not read or write that memory.

The fact that the from_exposed_addr is "in a different translation unit" or "on the other side of an FFI boundary" or however you want to think about this, is entirely irrelevant.


2 participants