-
-
Notifications
You must be signed in to change notification settings - Fork 450
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Remove zerocopy from rand #1579
base: master
Are you sure you want to change the base?
Conversation
Results show that some Simd types are 2-4 times as expensive as expected
Results in few minor regressions and two large improvements in benchmarks: -72% time for u64x2, -50% for u32x4.
Code gen is identical and benchmarks unaffected.
…_parts_mut Mostly code gen appears equivalent, though it affects inlining of u32x4 gen with SmallRng. Benchmarks are not significantly affected.
For those cases where you just call I have been proposing this for ahash: tkaitchuck/aHash#253 I'm not sure if this is a great idea, but it's I think a compromise that has some value. |
I should clarify that Project Safe Transmute will likely never replace zerocopy/bytemuck, but just replace their derives (zerocopy-derive and bytemuck-derive). Some very limited functionality may exist directly in the standard library, but we think of Safe Transmute as mostly being a building block that makes it easier to write sound unsafe code, not a building block that permits you to avoid writing unsafe code entirely. I suspect this doesn't change the calculus here, but I figured it was worth mentioning. |
// x86 is little endian so no need for conversion | ||
|
||
// SAFETY: both source and result types are valid for all values. | ||
unsafe { core::mem::transmute(buf) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if here (and elsewhere) we should be using the various unaligned-load intrinsics:
instead of a raw transmute
. However, they seem to produce identical assembly (AVX2 256-bit Godbolt example), so I guess it's just a matter of taste.
// This macro is unsafe to call: target types must support transmute from | ||
// random bits (i.e. all bit representations are valid). | ||
macro_rules! unsafe_impl_fill { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible for the various integer types to implement a "Safe to mem::transmute from raw bytes" sealed marker trait, and then implement this via a generic impl instead of a macro?
rng.fill_bytes(&mut buf); | ||
// x86 is little endian so no need for conversion | ||
|
||
// SAFETY: both source and result types are valid for all values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things to note:
- This should say that any bit-valid
[u8; N]
is also a bit-valid__m128i
("all values" technically doesn't preclude uninitialized bytes, which are unsound to transmute to a__m128i
); it should also say that they're the same size (whilemem::transmute
wouldn't permit this to compile, transmuting a value of a smaller type to one of a larger type would implicitly leave the trailing bytes uninitialized) - Being pedantic, this should provide a citation for the claim that any byte values are valid for
__m128i
; feel free to rip off our safety comment for this: https://github.com/google/zerocopy/blob/dccbbcd8714eebeeac794ca5dd1ba1cbecc09ea2/src/impls.rs#L815-L879
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to be super pedantic and use MiniRust terminology, "value" refers to high-level concepts like mathematical integers, not to raw byte sequences stored in memory. Being "valid" is a property of byte sequences; invalid byte sequences do not even correspond to a value, and therefore there are no "invalid values". The key for transmute safety is that all representations (i.e., byte sequences representing a value) of the source are also representations of the target.
That's how I would phrase it, anyway. We haven't officially adopted this terminology I think, and we probably talk about "invalid values" in many places.
// This macro is unsafe to call: target types must support transmute from | ||
// random bits (i.e. all bit representations are valid). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be precise: In order to call this for a type, T
, it must be the case that mem::transmute::<[u8; size_of::<T>()], T>(...)
is sound for all values. Note that this precludes uninitialized bytes.
Also, to be pedantic, this should be a doc comment with a # Safety
section.
slice::from_raw_parts_mut(self.as_mut_ptr() | ||
as *mut u8, | ||
mem::size_of_val(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll let @RalfJung chime in to confirm, but I believe this is unsound. In particular, you aren't allowed to interleave uses of raw pointers and references to the same object. You can fix this by moving the mem::size_of_val(self)
above the call to self.as_mut_ptr()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, size_of_val(self)
does a reborrow so doing that between the creation of the self.as_mut_ptr()
pointer and its use is questionable. SB accepts this due to a terrible hack that I am trying to phase out as it causes other problems. TB actually properly accepts this.
fn fill<R: Rng + ?Sized>(&mut self, rng: &mut R) { | ||
if self.len() > 0 { | ||
rng.fill_bytes(self.as_mut_bytes()); | ||
rng.fill_bytes(unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Safety comment? You'll need to justify that you've upheld this safety contract. You may find the following to be a helpful rephrasing of similar requirements: https://doc.rust-lang.org/1.82.0/std/ptr/index.html#pointer-to-reference-conversion
I'd recommend doing #![deny(clippy::undocumented_unsafe_blocks)]
for the whole crate.
unsafe_impl_fill!(u16, u32, u64, u128,); | ||
unsafe_impl_fill!(i8, i16, i32, i64, i128,); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Safety comments?
You may want to copy from what we've done in zerocopy (specifically for FromBytes
).
CHANGELOG.md
entrySummary
Replace
zerocopy
dependency withunsafe
code (up from 12 to 17 instances).Add benchmarks for some SIMD / wide types.
Remove two
#[inline(never)]
attributes which were apparently motivated by benchmark results, but caused more harm than help with the new benches.Motivation
zerocopy
I'm not a big fan of this, but together with #1575 it removes the dependency on
zerocopy
v0.8, so is probably an improvement.Project Safe Transmute
If this project lands safe transmute support into the standard library, we would of course want to use that.
Details
Replacing
zerocopy::transmute!
withcore::mem::transmute
is easy and results in identical code generation (tested withStdRng
andSmallRng
); this reverts a change in #1349.Replacing the
fill
impls is more complex but I believe acceptable; this reverts a change in #1502.In both cases, this would have resulted in a usage of
unsafe
in a macro where safety depends on a type passed by the macro caller. In the first case I decided to inline the three macro usages while in the second I prefixed the macro name withunsafe_
.Benchmark results
Unfinished business?
The
Simd
andm128i
etc. type generation should be equivalent, but they're not in terms of code; theSimd
impls currently usefill
to avoid moreunsafe
code here.Notice from the above that
u32x4
,u16x8
andu8x16
are the same size asu128
andm128i
but cost about twice as much to generate here. This indicates thefill
code may be sub-optimal.Additionally, the
m128i
impl performed even worse when transmuting au128
value (~4.3ns or +%130) which, as far as I can tell, is purely because theu128
value is returned viarax, rdx
while the__m128i
value is returned viardx, r10
(withrax
equal to the struct address). I don't understand this.