-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Direct copy specialization #65
Conversation
`StorageBuffer`, and make `GpuArrayBuffer` use it. `EncasedBufferVec` is like `BufferVec`, but it doesn't require that the type be `Pod`. Alternately, it's like `StorageBuffer<Vec<T>>`, except it doesn't allow CPU access to the data after it's been pushed. `GpuArrayBuffer` already doesn't allow CPU access to the data, so switching it to use `EncasedBufferVec` doesn't regress any functionality and offers higher performance. Shutting off CPU access eliminates the need to copy to a scratch buffer, which results in significantly higher performance. *Note that this needs teoxoy/encase#65 from @james7132 to achieve end-to-end performance benefits*, because `encase` is rather slow at encoding data without that patch, swamping the benefits of avoiding the copy. With that patch applied, and `#[inline]` added to `encase`'s `derive` implementation of `write_into` on structs, this results in a *16% overall speedup on `many_cubes --no-frustum-culling`*. I've verified that the generated code is now close to optimal. The only reasonable potential improvement that I see is to eliminate the zeroing in `push`. This requires unsafe code, however, so I'd prefer to leave that to a followup.
2901ec1
to
4ee8377
Compare
I think this is about as clean/low-unsafe as I can get this without negatively impacting the performance again. @teoxoy this should be ready for review now. |
@james7132 I pushed a commit to the PR could you give it a look? |
I actually tried to deduplicate code like this, but apparently the optimizer doesn't like implicit if-returns as replacements for if/elses. I'll check the assembly output of this later. |
I think the Pod changes are fine, but the cleanup in how the branching is handled the compiler really doesn't like: james7132/encase_asm_tests@f936005#diff-ce2d3c1fb388fe83071754963a5343392f61ff8b7d7465aa6d2ba68dfe31993b. This shows a regression in codegen for Vec-like writes. Though I think this comes down to the fact that it cannot inline the reservation, and thus it's assertion that there's enough space does not carry over, unlike the slice writes. |
Ah I think I see why, it views the non-memcpy writes/reads as a potential side effect, and thus cannot treat the return as a replacement for an else to the if. |
@james7132 I pushed another commit to address the issue. Let me know what you think! |
Doesn't seem to make a tangible difference: james7132/encase_asm_tests@4399891. Might be due to one of the |
Mostly resolves the issues found in #53. This PR specializes writes, reads, and creates of types by directly using
slice::copy_from_slice
on them if compiling for a little endian target, and only if the type has no internal padding.Checking the generated assembly, this eliminates a huge number of branches and even starts using vectorized copies and calls to
memcpy
directly for larger types.