Skip to content

core intrinsics that should take void but take *u8 #199

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
gnzlbg opened this issue Nov 20, 2017 · 8 comments
Closed

core intrinsics that should take void but take *u8 #199

gnzlbg opened this issue Nov 20, 2017 · 8 comments

Comments

@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 20, 2017

Using c_void requires the std::os::raw module.

This should be fixed in Rust upstream, the tracking issue is: rust-lang/rust#36193

The following intrinsics should use c_void on its API, but use *u8 instead so that they can be exposed in coresimd:

  • void _mm_clflush (void const* p)
  • xrstor
  • xrstor64
  • xrstors
  • xrstors64
  • xsave
  • xave64
  • xsaveopt
  • xasveopt64
  • xsaves
  • xsaves64
  • xsavec
  • xsavec64
@parched
Copy link

parched commented Nov 21, 2017

What about defining a 512 byte long, 16 byte aligned type to be used for all x*? It would be much safer than c_void or *u8. Or is that deviating too much from the exact c prototype for the intrinsic?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 21, 2017

@parched

If you meant to define that for fxrs-family of intrinsics (FXSAVE and FXRSTOR) that is probably a good idea although I am not 100% sure. Technically one needs a 512-byte buffer, but the docs of FXSAVE say:

Bytes 464:511 are available to software use. The processor does not write to bytes 464:511 of an FXSAVE area.

And the docs of FXRSTOR say:

FXRSTOR ignores the content of bytes 464:511 in an FXSAVE state image.

So technically, while one need a 512-byte image to call FXRSTOR (I interpret the docs as "those bytes are ignored but read), one can still store images using FXSAVE into a smaller buffer.

So... if you convince me that it always works for fxrs I think it would be a good idea to do it there.


However, this doesn't help you a lot with xsave and friends because on CPUs with AVX (or AVX-512) the size of the xsave area is "customizable", and can vary from 512 up to 2560 depending on the mask and optimizations used (xsaves/xsavec/xsaveopt...). I don't think we could even do it for xsave since whether the CPU has AVX or AVX-512 might not be known till run-time.

@parched
Copy link

parched commented Nov 21, 2017

Ah ok right, I was looking at https://software.intel.com/en-us/cpp-compiler-18.0-developer-guide-and-reference-xsave-/-xsavec-/-xsaves but that appears to be wrong, more than 512 bytes is needed. Probably just best to leave it as a byte pointer then. It would be nice to encode the alignment requirement some how though, or does that vary with size too?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 21, 2017

It would be nice to encode the alignment requirement some how though, or does that vary with size too?

That does not vary with size, its 16-byte alignment for FXSAVE/FXRSTOR and 64-byte alignment for all other ones.

Is there an easy way to encode the alignment requirement?

Ah ok right, I was looking at https://software.intel.com/en-us/cpp-compiler-18.0-developer-guide-and-reference-xsave-/-xsavec-/-xsaves but that appears to be wrong, more than 512 bytes is needed.

Yes that is not right in general, e.g., xsave cannot return the AVX2 registers in 512 bytes, it needs 800 bytes for that. It is only right if you have a CPU that doesn't have AVX2, then it should be equivalent to FXSAVE, but maybe faster since you can use the mask to decide which registers you want to save and you might not want to save all of them.

@alexcrichton
Copy link
Member

I believe the current status of this issue is that we've since added automatic verification of all intrinsics and their signatures. We have an explicit mapping that allows *mut u8 to map to void* in C (or what Intel specifies).

So at least in that sense we're consistently inconsistent with Intel's specification! My guess is that we're likely to stick with this, but that's at least where we're at!

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jan 29, 2018

I think it would be better to just map void* to c_void and be consistent with the spec.

Providing c_void in core is a problem worth solving anyways. Maybe we could use simd-verify to generate an exhaustive list of intrinsics running into this, and just not stabilize those at first?

@alexcrichton
Copy link
Member

It's true yeah if we had c_void in libcore I think there'd be an obvious choice of what to do. I'm also ok skipping stabilizing this if we feel like we really want to wait for c_void, although we'd want to audit the set of intrinsics to make sure they're not too desirable.

@alexcrichton
Copy link
Member

I'm going to close this in favor of rust-lang/rfcs#2325 where I think we'll decide that either "u8 is fine" or we'll vendor c_void in libcore.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants