
fn_cast! macro #140803


Open
Darksonn opened this issue May 8, 2025 · 30 comments
Labels
A-control-flow-integrity Area: Control Flow Integrity (CFI) security mitigation A-rust-for-linux Relevant for the Rust-for-Linux project A-sanitizers Area: Sanitizers for correctness and code quality C-discussion Category: Discussion or questions that doesn't represent real issues. I-lang-nominated Nominated for discussion during a lang team meeting. P-lang-drag-2 Lang team prioritization drag level 2. https://rust-lang.zulipchat.com/#narrow/channel/410516-t-lang. PG-exploit-mitigations Project group: Exploit mitigations T-lang Relevant to the language team

Comments

@Darksonn
Contributor

Darksonn commented May 8, 2025

Since Rust 1.76 we document that it's valid to transmute function pointers from one signature to another as long as their signatures are ABI-compatible. However, we have since learned that these rules may be too broad and allow some transmutes that it is undesirable to permit. Specifically, transmutes that change the pointee type or constness of a pointer argument are considered ABI-compatible, but they are rejected by the CFI sanitizer as incompatible. See rust-lang/unsafe-code-guidelines#489 for additional details and #128728 for a concrete issue.
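To make the situation concrete, here is a minimal sketch of the kind of transmute described above. The names `add_one` and `call_erased` are made up for illustration and do not come from the linked issues. The sketch relies on the documented rule that thin pointers to different pointee types are ABI-compatible; it runs on an ordinary build, while a CFI-instrumented build is expected to reject the indirect call because the signatures differ.

```rust
use std::mem::transmute;

// Hypothetical callee used only for illustration.
fn add_one(x: &u32) -> u32 {
    *x + 1
}

// Calls `add_one` through a function pointer whose argument type has been
// changed from `&u32` to `*const ()`.
fn call_erased(value: u32) -> u32 {
    let original: fn(&u32) -> u32 = add_one;
    // SAFETY: `&u32` and `*const ()` are both thin pointers, which the
    // documented rules treat as ABI-compatible, and the pointer passed
    // below really does point to a live `u32`.
    let erased: fn(*const ()) -> u32 = unsafe { transmute(original) };
    erased(&value as *const u32 as *const ())
}
```

With these definitions, `call_erased(41)` evaluates to `42` on a build without CFI.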

This issue tracks a proposed solution to the above: Introduce a new macro called fn_cast! that allows you to change the signature of a function pointer. Under most circumstances, this is equivalent to simply transmuting the function pointer, but in some cases it will generate a new "trampoline" function that transmutes all arguments and calls the original function. This allows you to perform such function casts safely without paying the cost of a trampoline when it's not needed.

The argument to fn_cast!() must be an expression that evaluates to a function item or a non-capturing closure. This ensures that the compiler knows which function is being called at monomorphization time.

As a sketch, you can implement a simple version of the macro like this:

macro_rules! fn_cast {
    ($f:expr) => {
        #[cfg(not(any(sanitize = "cfi", sanitize = "kcfi")))]
        {
            // we need $f coerced to a function pointer
            core::mem::transmute::<fn(_) -> _, _>($f)
        }
        
        #[cfg(any(sanitize = "cfi", sanitize = "kcfi"))]
        {
            |arg| {
                let arg = core::mem::transmute(arg);
                let ret = $f(arg);
                core::mem::transmute(ret)
            }
        }
    };
}

This implementation should get the point across, but it is incomplete for a few reasons:

  • It assumes that the function takes one argument, but a real fn_cast! should be improved to work with functions of any arity.
  • With CFI, it always generates a trampoline using a closure. However, if this were a compiler built-in, then it could modify the list of signatures allowed by the target function so that CFI does not reject the call. The trampoline would only be needed if the function is in a different compilation unit.
  • With KCFI, we can't add signatures to the target function, but we still don't always need a trampoline. For example, changing fn(&T) to fn(*const T) is allowed because &T and *const T are treated the same by KCFI. The compiler could detect such cases and emit a transmute instead of a trampoline.
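For comparison, the trampoline that the CFI branch of the sketch stands for can be written by hand in today's Rust. The following is only an illustration under assumed names (`add_one`, `add_one_trampoline`, `call_via_trampoline`); it is not what a built-in fn_cast! would necessarily emit. The point is that the trampoline is a real function whose declared signature matches the call exactly, so no function pointer is ever transmuted.

```rust
// Hypothetical callee used only for illustration.
fn add_one(x: &u32) -> u32 {
    *x + 1
}

// Hand-written trampoline with the erased signature. It converts the
// argument back to the type `add_one` expects and forwards the call.
unsafe fn add_one_trampoline(p: *const ()) -> u32 {
    // SAFETY: the caller promises that `p` points to a live `u32`.
    let x: &u32 = unsafe { &*(p as *const u32) };
    add_one(x)
}

// Calls `add_one` through the trampoline's function pointer.
fn call_via_trampoline(value: u32) -> u32 {
    let f: unsafe fn(*const ()) -> u32 = add_one_trampoline;
    // SAFETY: the pointer is derived from a reference to a live `u32`.
    unsafe { f(&value as *const u32 as *const ()) }
}
```

The cost is one extra call per invocation, which is what the proposal hopes to avoid whenever a plain transmute would be accepted.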

By adding this macro, it becomes feasible to make the following breaking change to the spec:

When you make a function call, then the caller and callee must agree on what the function signature is exactly. Otherwise:

  • If the signatures are ABI-compatible, then it is EB (erroneous behavior). That is, similarly to integer overflow, sanitizers such as cfi, kcfi, or miri could trigger an error when it happens. But otherwise the call is allowed through by transmuting each argument.
  • Otherwise, it is UB (undefined behavior).

Here, the change is that ABI-compatible calls are considered EB. However, even without the spec change the macro is useful because it would allow for a more efficient implementation of #139632 than what is possible today.

This proposal was originally made as a comment. I'm filing a new issue because T-lang requested that I do so during the RfL meeting 2025-05-07.

@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. A-control-flow-integrity Area: Control Flow Integrity (CFI) security mitigation A-rust-for-linux Relevant for the Rust-for-Linux project A-sanitizers Area: Sanitizers for correctness and code quality C-discussion Category: Discussion or questions that doesn't represent real issues. I-lang-nominated Nominated for discussion during a lang team meeting. T-lang Relevant to the language team labels May 8, 2025
@RalfJung
Member

RalfJung commented May 8, 2025

When you make a function call, then the caller and callee must agree on what the function signature is exactly.

So in such a world, the docs for the macro would say that this generates a new function? Because otherwise it seems like this list here has to account for the macro as well.

The macro needs to be unsafe of course, since function arguments are still being transmuted. We could have the macro ensure that the signatures are ABI-compatible -- but this can only be fully checked during monomorphization.

@Darksonn
Contributor Author

Darksonn commented May 8, 2025

Well, yes, it semantically creates a new function even if it has the same address. How exactly we word that is up for debate. I guess we might not want provenance for function pointers (?), so if fn_cast! returns a fn pointer with the same address, then we probably have to say that this function is valid to call with those two signatures.

@RalfJung
Member

RalfJung commented May 8, 2025

I guess we might not want provenance for function pointers (?)

I mean, we could.^^ But yeah it's probably better if we avoid using provenance wherever possible.

@rcvalle rcvalle added the PG-exploit-mitigations Project group: Exploit mitigations label May 9, 2025
@jieyouxu jieyouxu removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label May 14, 2025
@traviscross traviscross added the P-lang-drag-2 Lang team prioritization drag level 2. https://rust-lang.zulipchat.com/#narrow/channel/410516-t-lang. label May 16, 2025
@traviscross
Contributor

The macro needs to be unsafe of course, since function arguments are still being transmuted.

We have unsafe function pointers. So I wonder, should the macro call be unsafe, or should we be returning an unsafe fn(..) -> _?

We could have the macro ensure that the signatures are ABI-compatible -- but this can only be fully checked during monomorphization.

If we were to do these checks, I wonder whether we might want to support this as a coercion or as cast. E.g., we of course allow:

let _: *const () = &();

Does it make any sense to allow?:

let _: unsafe fn(*const ()) = |&()| ();

@RalfJung
Member

I don't think I'd like to make as do even more things...

@Darksonn
Contributor Author

The macro itself needs to be unsafe. Otherwise, how do people get a non-unsafe fn pointer? By transmuting the output of fn_cast!? The point of the macro is to get rid of the transmute.

@hanna-kruppe
Contributor

hanna-kruppe commented May 17, 2025

The macro needs to be unsafe of course, since function arguments are still being transmuted.

We have unsafe function pointers. So I wonder, should the macro call be unsafe, or should we be returning an unsafe fn(..) -> _?

While the function signature change itself can’t cause UB without the function being called, asking the call sites to justify the safety of the implied arg/result transmutes leads to somewhat silly consequences:

  1. The reason why the type punning is sound is usually the same at every call site, and competes for attention with other safety preconditions the underlying function may have, so you’ll basically always want to introduce a safe(r) wrapper if possible.
  2. Call sites don’t automatically know the underlying function, so they don’t even know what the type punning they have to justify is.
  3. In contrast, the code using fn_cast generally knows the original signature so it could document this (but then unsafe code elsewhere has to rely on this documentation!) or it could immediately create the safe wrapper itself (which is basically the same as if the fn_cast was unsafe to begin with).
  4. Having to define a wrapper function at all is undesirable - it produces more code and indirections that fn_cast was specifically designed to avoid when compiling without CFI.

It’s tempting to say: fn_cast is most useful for function pointers and you can just transmute from unsafe fn(..) -> _ to fn(..) -> _ if it’s still safe to call. But transmuting function pointers is precisely what fn_cast is supposed to replace! Of course it’s unlikely that some CFI scheme wants to consider safe/unsafe variants of the same signature to be incompatible. But it still sends a less consistent message to users and is easier to get wrong (accidentally change more about the signature than just the safety).

At the same time, there are cases where a safe function is type-punned into something that creates significant extra safety conditions for callers (e.g., type erasing fmt methods into fn(*const(), &mut Formatter) -> fmt::Result). For these cases, producing unsafe functions even from safe source functions is useful, and if it’s not done by fn_cast then the code using that macro again has to add wrappers.

So I think it’s probably most useful to consider “safe fn <-> unsafe fn” to be part of the type punning that fn_cast can do, and require unsafe for any invocation of fn_cast regardless of whether it produces a safe or unsafe function.
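The fmt-style erasure mentioned above can be sketched without any function pointer transmute by using a generic trampoline, which is roughly the shape fn_cast! would have to fall back to under CFI. Everything here (`ErasedDisplay`, `display_trampoline`) is a hypothetical illustration, not the actual `core::fmt` implementation. Note that the erased function is `unsafe fn` even though `Display::fmt` is safe, matching the point that erasure adds safety conditions for callers.

```rust
use std::fmt;
use std::marker::PhantomData;

// Hypothetical type-erased value: a data pointer stored next to the
// function that knows how to format it.
struct ErasedDisplay<'a> {
    value: *const (),
    fmt_fn: unsafe fn(*const (), &mut fmt::Formatter<'_>) -> fmt::Result,
    _borrow: PhantomData<&'a ()>,
}

// Generic trampoline: one instance per `T`, each with the erased signature.
unsafe fn display_trampoline<T: fmt::Display>(
    value: *const (),
    f: &mut fmt::Formatter<'_>,
) -> fmt::Result {
    // SAFETY: the caller guarantees that `value` points to a live `T`.
    let value: &T = unsafe { &*(value as *const T) };
    fmt::Display::fmt(value, f)
}

impl<'a> ErasedDisplay<'a> {
    fn new<T: fmt::Display>(value: &'a T) -> Self {
        ErasedDisplay {
            value: value as *const T as *const (),
            fmt_fn: display_trampoline::<T>,
            _borrow: PhantomData,
        }
    }
}

impl fmt::Display for ErasedDisplay<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // SAFETY: `new` paired `value` with the trampoline for its own
        // type, and the lifetime `'a` keeps the borrowed value alive.
        unsafe { (self.fmt_fn)(self.value, f) }
    }
}
```

For example, `ErasedDisplay::new(&42u32).to_string()` yields `"42"`. The safety argument lives in one place (`new`), which is the kind of wrapper the numbered points above argue most users will want anyway.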

@traviscross
Contributor

The point of the macro is to get rid of the transmute.

Interesting. That's not how I think about it. I think of the point of the built-in as being to do something a lot smarter than what's otherwise possible so as to support CFI, in terms of modifying the list of signatures when it can, generating and using a trampoline only when needed, etc. If it were just about getting rid of a transmute, I don't think we'd do this.

...and you can just transmute from unsafe fn(..) -> _ to fn(..) -> _ if it’s still safe to call.

Perhaps you could describe the use case you have in mind for when the cast function pointer will be safe to call. What's coming to my mind, in terms of practical use cases, are all ones where it would not be.

@hanna-kruppe
Contributor

hanna-kruppe commented May 17, 2025

Let me adjust my phrasing: yes, CFI compatibility is ultimately "the point" but I don't think this can be usefully separated from removing function pointer transmutes. To make CFI work, you need an intentional marker for "this specific function can also be called with this specific signature different from what its definition said" (which then enables e.g. generating the right trampoline if one is needed) rather than transmutes that leave you guessing whether the signature mismatch may be unintentional. Carving out a subset of such transmutes that are "still okay" after the introduction of fn_cast sounds like a bad idea: you'll still have calls not matching the callee signature, you're just hoping that this subset won't cause any problems.

Perhaps you could describe the use case you have in mind for when the cast function pointer will be safe to call. What's coming to my mind, in terms of practical use cases, are all ones where it would not be.

This example is a bit speculative for several reasons, but it's inspired by real code I'm working on. Consider a library that defines a trait for "fieldless #[repr(u8)] enum with consecutively numbered discriminants" as well as arrays/bitsets/maps generic over such enum types as array index, set element, or map key (there are several libraries like this on crates.io, mine isn't (yet)). Virtually all of the code in such a library could type-erase the enum and often also the number of variants (cf. core::array::from_fn and friends internally erasing the length), and this leads to a bunch of type-punning — including some type punning of function signatures that's safe without further consideration about the range of integers that will be passed through it. For example, if Array<I, T> is a glorified wrapper around [T; I::VARIANT_COUNT], then we might have something like the following:

impl<I: /* ... */, T> Array<I, T> {
    fn for_each_with_index_erased<F: FnMut(I, &T)>(&self, f: F) {
        // SAFETY: `I` is a `repr(u8)` enum, so it's sound to transmute into u8
        for_each_with_index_raw(&self.0, unsafe { fn_cast!(f) })
    }
}

fn for_each_with_index_raw<T>(elems: &[T], f: fn(u8, &T)) {
    for (i, elem) in elems.iter().enumerate() {
        f(i as u8, elem);
    }
}

One reason this is speculative is I don't think there's a bound I could put on the type parameter F to make sure fn_cast can handle it (e.g., no captures and it's not a function pointer already). But this might be possible in the future, and even if not, it can be worked around by making the API much less ergonomic while keeping the safety relevant aspects intact.
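Since fn_cast! does not exist yet, a version of this idea that compiles today has to swap in a different technique. The sketch below passes a closure as `&mut dyn FnMut(u8, &T)` and does the `u8` to enum conversion explicitly inside it, which plays the role of the trampoline. The names `Color`, `ColorArray`, and `collect_pairs` are hypothetical, and the fixed length of 3 stands in for `I::VARIANT_COUNT`.

```rust
// Hypothetical fieldless `repr(u8)` enum with consecutive discriminants.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Color {
    Red = 0,
    Green = 1,
    Blue = 2,
}

// Worker that is not generic over the enum type: it only sees raw indices.
fn for_each_with_index_raw<T>(elems: &[T], f: &mut dyn FnMut(u8, &T)) {
    for (i, elem) in elems.iter().enumerate() {
        f(i as u8, elem);
    }
}

// Array indexed by `Color`; 3 is the number of `Color` variants.
struct ColorArray<T>([T; 3]);

impl<T> ColorArray<T> {
    fn for_each_with_index(&self, mut f: impl FnMut(Color, &T)) {
        for_each_with_index_raw(&self.0, &mut |i: u8, elem: &T| {
            // SAFETY: the array has exactly 3 elements, so `i` is 0, 1 or 2,
            // and each of those is a valid `Color` discriminant.
            let color = unsafe { std::mem::transmute::<u8, Color>(i) };
            f(color, elem)
        });
    }
}

// Small usage helper: pairs every element with its enum index.
fn collect_pairs() -> Vec<(Color, char)> {
    let arr = ColorArray(['r', 'g', 'b']);
    let mut out = Vec::new();
    arr.for_each_with_index(|c, ch| out.push((c, *ch)));
    out
}
```

Unlike the fn_cast! version, this pays for a dynamic call and an explicit adapter closure on every build, not only when CFI is enabled. In exchange it accepts capturing closures.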

@hanna-kruppe
Contributor

hanna-kruppe commented May 17, 2025

Another reason why the above example is speculative: the proposal says that the function being cast can’t have any captures if it’s a closure. It’s not clear to me why that restriction would be needed. Couldn’t the compiler generate another closure type that has the same captures, is ABI-compatible with the original closure type, and implement the appropriate Fn* traits for it by fn_cast-ing away the difference in receiver type and other parameters? The resulting closure type won’t be convertible to a function pointer either, but it could be used as trait object in the same way a fn_cast’d function pointer can be used for a manually constructed vtable.

Of course this doesn’t have to be part of the initial feature but I’d like to know if it’s possible in principle or if there’s a fundamental problem I’m missing.

@RalfJung
Member

RalfJung commented May 18, 2025

In discussion with @Darksonn I toyed with the idea that repr(transparent) could still be allowed to differ across caller and callee even without using the macro (i.e. Wrapper<T> on one side and T on the other) -- that is apparently trivial for CFI to handle, and it'd reduce the amount of churn needed in the ecosystem to adjust to this change.

@traviscross
Contributor

Interesting. As context for when we take up this nomination, @RalfJung, it'd probably be helpful if you could perhaps elaborate on that a bit.

@RalfJung
Member

RalfJung commented May 26, 2025

Not sure what to elaborate on? The question is which transmutes are allowed on the fn ptr you get out of the fn_cast!. What exactly is required by the time the function is called -- an exact match of the signature of caller and callee (where "callee" here is the shim generated by the fn_cast! macro recording the types used for that macro invocation), or some sort of slightly fuzzy match?

We could say that repr(transparent) mismatches are still allowed at that point.

@hanna-kruppe
Contributor

hanna-kruppe commented May 26, 2025

Would this carve-out for repr(transparent) mismatches still be symmetric and transitive as in the current ABI compatibility rules? For example, suppose one library defines a transparent wrapper A(NonZeroU32) and another unrelated library defines a transparent wrapper B(u32). Is it feasible and useful to allow some transmutes between some of fn(A), fn(B), fn(u32), and fn(NonZeroU32) without allowing transmutes between all pairs of them?

@RalfJung
Member

RalfJung commented May 27, 2025

@hanna-kruppe It would be symmetric and transitive, yes. CFI would simply "skip" repr(transparent) wrappers and use the name of their non-1-ZST field as the name of the type for this argument, thus making all repr(transparent) wrappers around the same type mutually compatible (including nested cases). The one potential problem I can imagine is that this forces us to give all 1-ZST the same ABI -- but the C ABI doesn't really say anything about 1-ZST since they do not exist in C, so we get to decide that ABI for ourselves and can ensure this property is maintained.

And of course it reduces the effectiveness of CFI in case these mismatches are unintended.
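The "skip the wrapper" rule can be modeled in a few lines. This is a toy model only: `Ty`, `cfi_name`, and `compatible` are invented here and say nothing about how rustc actually encodes types for CFI. Because compatibility is defined as equality of the innermost name, it is automatically symmetric and transitive.

```rust
// Toy model of a type as a CFI scheme might see it.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    // A primitive with a fixed name, e.g. "u32".
    Prim(&'static str),
    // A `repr(transparent)` wrapper (by name) around its one non-1-ZST field.
    Transparent(&'static str, Box<Ty>),
}

// Skip every transparent wrapper and return the innermost type name.
fn cfi_name(ty: &Ty) -> &'static str {
    match ty {
        Ty::Prim(name) => *name,
        Ty::Transparent(_, inner) => cfi_name(inner),
    }
}

// Two argument types are compatible when their innermost names agree.
fn compatible(a: &Ty, b: &Ty) -> bool {
    cfi_name(a) == cfi_name(b)
}
```

If NonZeroU32 is itself modeled as a transparent wrapper around u32 (a modeling assumption, not something the thread settles), then A(NonZeroU32), B(u32), NonZeroU32, and u32 all normalize to "u32" and every pair is compatible, whereas none of them is compatible with i32.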


@traviscross to elaborate a bit more, basically the goal is to keep code like this fully supported:

use std::mem::transmute;
use std::num::NonZeroI32;

fn foo(x: i32) -> i32 { x }

fn main() {
  let ptr: fn(i32) -> i32 = foo;
  let ptr: fn(NonZeroI32) -> NonZeroI32 = unsafe { transmute(ptr) };
  ptr(NonZeroI32::new(15).unwrap());
}

repr(transparent) is the ABI guarantee we have promised for the longest time, so it might be worth keeping it around.

The alternative is to say that even for this case, the program is considered to have erroneous behavior and one must use fn_cast!.

@traviscross
Contributor

traviscross commented May 27, 2025

It's an interesting tradeoff. The more things that we want to work in that way, the more we weaken the CFI protections. Given our story about safety and security and whatnot, it would be kind of embarrassing if some exploit chain ended up leaning on NonZero<u8> and u8 being CFI-equivalent in order to work. (By that same argument, I suppose, we could also wonder about whether CFI-unifying fn(&T) and fn(*const T) is desirable, though there are clearly other FFI tradeoffs in that case.)

Probably my sense is that people turning on CFI are going to prefer the most precise checking possible and probably will be willing to accept the churn and other work necessary for that.

On the other hand, as a language matter, the alternate model of making everything CFI equivalent that we consider ABI equivalent does have some appeal.

@hanna-kruppe
Contributor

The other question is how much ABI compatibility for transparent wrappers is actually used in the ecosystem. As far as I know the biggest user by far of the current ABI compatibility rules is core::fmt and similar type erasure schemes, which type-pun pointee types rather than transparent wrappers. @RalfJung do you have some examples of crates that would need to adopt fn_cast! if transparent wrappers weren’t ABI compatible but wouldn’t need any changes if transparent wrappers were ABI compatible?

@RalfJung
Member

No, I don't know any examples. I assume we'll start seeing some once Miri enforces this.

@joshtriplett
Member

Not a blocker for adding fn_cast!, but I'd love to see a lint for these transmutes, when we can catch them. The lint could mention that this is incompatible with CFI, and suggest fn_cast!.

@scottmcm
Member

Thought I had from the lang meeting: why isn't it ok to just normalize at least all the things in https://doc.rust-lang.org/std/primitive.fn.html#abi-compatibility from a CFI perspective?

@RalfJung
Member

RalfJung commented May 28, 2025

See rust-lang/unsafe-code-guidelines#489 for additional details and #128728 for a concrete issue. The short summary is that security people say that kind of normalization would leave too many doors open for an attacker to elevate UB elsewhere in the code into a full exploit.

@Darksonn
Contributor Author

Darksonn commented May 28, 2025

@scottmcm Normalizing all those things will normalize so many things that it leaves CFI as almost a no-op in practice.

@workingjubilee
Member

workingjubilee commented May 28, 2025

I believe we already do some such "normalization" in cases where Rust already considers it important to simply encode a T the same way we encode U, but I believe finding all such cases would require a thorough audit of the code. I believe it would be beneficial regarding T-lang making a decision here if we had the hard list of what the new "ABI compatibility rules for CFI" would be, and if those implementing Rust's CFI had any specific desires regarding that changing in the future or if they were happy committing to preserving that compatibility (or more?).

I believe one of the consequences of that was the repr(transparent) case that @Darksonn (well, @RalfJung) mentioned.

I could dig up the list of effective rules and originally intended to, but I can't provide the answer to what would be desired in the future in terms of mutations on that list.

@scottmcm
Member

so many things that it leaves CFI as almost a no-op in practice.

I do wonder how much this is inherently tied to C++'s whole idea of typed memory and such. Is it possible that in a language without strict aliasing and such, CFI is just inherently the wrong solution?

@Darksonn
Contributor Author

if we had the hard list of what the new "ABI compatibility rules for CFI" would be, and if those implementing Rust's CFI had any specific desires regarding that changing in the future or if they were happy committing to preserving that compatibility (or more?).

Yes, but I want to make one point here:

It's important to distinguish between the rules for CFI, and what transmutes we consider ABI compatible in the language. The point of #128728 is to make it so that if CFI rejects something, then the language must consider that case erroneous behavior (or UB). But it's okay for the language to disallow something that CFI accepts.

CFI accepts things such as fn(&T) vs fn(*const T). It also accepts fn(u64) vs fn(usize) on 64-bit platforms. That doesn't mean we want to permanently allow those in the language. Maybe there's a future variant of CFI that wants to catch more cases, and our rules are preferably strict enough that "future CFI" also only disallows what the language disallows. That was the motivation for suggesting "only allow exact matches".

I do wonder how much this is inherently tied to C++'s whole idea of typed memory and such. Is it possible that in a language without strict aliasing and such, CFI is just inherently the wrong solution?

I disagree. It isn't really about strict aliasing. The kernel quite explicitly passes -fno-strict-aliasing and considers strict aliasing completely invalid, but they still developed an entirely new variant of CFI called kCFI just for the kernel. Checking the signature is a heuristic that works very well in practice.

@Darksonn
Contributor Author

Darksonn commented May 28, 2025

As for #[repr(transparent)], I think that we treat it like its inner type in CFI, but I'm not sure I ever tested it. One wrinkle is that

#[cfi_encoding = "l"]
#[repr(transparent)]
pub struct c_long(pub core::ffi::c_long);

is no longer treated like the inner type by CFI despite the #[repr(transparent)] annotation, so we have to somehow allow for that.

@RalfJung
Member

RalfJung commented May 28, 2025

It's important to distinguish between the rules for CFI, and what transmutes we consider ABI compatible in the language.

Right, I was about to say the same thing. :) What we are defining here is the ceiling of what any future CFI mechanism can do while being compatible with Rust's semantics. It's entirely fine to declare things to be EB due to ABI mismatch that no current CFI tool (except for Miri) can catch because the distinction is lost before reaching codegen.

@scottmcm I don't think this is very tied to a typed idea of memory, though it may come more naturally in a language with typed memory. It's about mismatches between caller and callee during argument passing, which is not an in-memory operation, it's its own magic thing.

If I were to define argument passing in MiniRust without any regard for reality, I would say it works like a transmute, and thus allow arbitrary signature mismatches as long as the size matches. This is a terrible model since many real-world ABIs do not actually correctly implement these semantics. In that sense, our ABIs already integrate a notion of "type" more deeply than Rust's memory model does -- it's more like we are passing values to functions, not in-memory representations. Now we basically have two options:

  • We say "screw this" and require an exact match of caller and callee type. This is the most principled approach. It's also incompatible with what the formatting machinery does, which is why I did not implement it in Miri. But @Darksonn's proposal gets us as close to this as we can, by making the deviation from this model explicit.
  • We try to find some reasonable middle-ground of things that are possible in every "reasonable" ABI. This requires fieldwork to ensure we don't allow too much. (Though TBF the first option also requires this fieldwork when defining what is or is not permitted with the explicit fn_cast!.)

Maybe if there was a practical "memory type integrity" sanitizer enforcing C++'s ideas of typed memory, people would ask for Rust to be compatible with it somehow. But that's not a thing, and control flow integrity is much more narrow in scope -- I think most Rust code will be compatible with the EB rules described here without any adjustment. Function pointer transmutes are quite rare.

@RalfJung
Member

RalfJung commented May 28, 2025

Speaking of naive idealized semantics, we could use this to completely hide what ABI can and cannot do from the spec. We could say that by using fn_cast!, we do actually allow arbitrary mismatches between caller and callee types as long as the size is the same (which we could even enforce in a post-mono check), and the arguments are passed as-if via a transmute. It would be the compiler's responsibility to recognize when the signatures are ABI-incompatible and generate a shim that does an explicit transmute.

From a spec perspective I would quite like this actually. :)
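As a rough illustration of that idealized model, the shim the compiler would generate for an ABI-incompatible pair can be written by hand. The helpers `SameSize`, `transmute_arg`, `double`, and `double_as_u64` are hypothetical, and the associated-const assertion is just one way to get a size check that fires after monomorphization. f64 and u64 are used because they have the same size but are not ABI-compatible, so a plain function pointer transmute would not be allowed here.

```rust
use std::marker::PhantomData;
use std::mem::{size_of, transmute_copy, ManuallyDrop};

// Hypothetical helper: evaluating `CHECK` fails compilation, after
// monomorphization, when `A` and `B` differ in size.
struct SameSize<A, B>(PhantomData<(A, B)>);

impl<A, B> SameSize<A, B> {
    const CHECK: () = assert!(size_of::<A>() == size_of::<B>());
}

// Hypothetical helper: a by-value transmute between generic types.
unsafe fn transmute_arg<A, B>(a: A) -> B {
    // Force evaluation of the size check for this `A`/`B` pair.
    let () = SameSize::<A, B>::CHECK;
    // Prevent a double drop: ownership moves into the returned `B`.
    let a = ManuallyDrop::new(a);
    // SAFETY: the sizes match (checked above) and the caller guarantees
    // that the bits of `a` form a valid `B`.
    unsafe { transmute_copy::<A, B>(&*a) }
}

// Hypothetical callee used only for illustration.
fn double(x: f64) -> f64 {
    x * 2.0
}

// Hand-written shim that gives `double` the signature `fn(u64) -> u64`
// by transmuting the argument and the return value explicitly.
fn double_as_u64(x: u64) -> u64 {
    // SAFETY: every 64-bit pattern is a valid `f64` and a valid `u64`.
    unsafe { transmute_arg::<f64, u64>(double(transmute_arg::<u64, f64>(x))) }
}
```

For instance, passing `1.5f64.to_bits()` to `double_as_u64` returns the bit pattern of `3.0`. Requesting `transmute_arg::<u32, u64>` would instead be rejected at compile time by the size check.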

@Darksonn
Contributor Author

Darksonn commented May 28, 2025

I love that. Then the CFI case becomes just another ABI with its own weird rules. Is treating u64 and f64 differently that unlike treating &Foo and &Bar differently?

After all, CFI is already considered a target modifier, i.e. an ABI-modifying flag.

@traviscross
Contributor

Agreed. That is an appealing framing.
