Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Direct copy specialization #65

Merged
merged 16 commits into from
Apr 24, 2024
Merged

Conversation

james7132
Copy link
Contributor

@james7132 james7132 commented Mar 2, 2024

Mostly resolves the issues found in #53. This PR specializes writes, reads, and creates of types by directly using slice::copy_from_slice on them if compiling for a little endian target, and only if the type has no internal padding.

Checking the generated assembly, this eliminates a huge number of branches and even starts using vectorized copies and calls to memcpy directly for larger types.

@james7132 james7132 marked this pull request as ready for review March 23, 2024 05:35
pcwalton added a commit to pcwalton/bevy that referenced this pull request Mar 23, 2024
`StorageBuffer`, and make `GpuArrayBuffer` use it.

`EncasedBufferVec` is like `BufferVec`, but it doesn't require that the
type be `Pod`. Alternately, it's like `StorageBuffer<Vec<T>>`, except it
doesn't allow CPU access to the data after it's been pushed.
`GpuArrayBuffer` already doesn't allow CPU access to the data, so
switching it to use `EncasedBufferVec` doesn't regress any functionality
and offers higher performance.

Shutting off CPU access eliminates the need to copy to a scratch buffer,
which results in significantly higher performance. *Note that this needs
teoxoy/encase#65 from @james7132 to achieve
end-to-end performance benefits*, because `encase` is rather slow at
encoding data without that patch, swamping the benefits of avoiding the
copy. With that patch applied, and `#[inline]` added to `encase`'s
`derive` implementation of `write_into` on structs, this results in a
*16% overall speedup on `many_cubes --no-frustum-culling`*.

I've verified that the generated code is now close to optimal. The only
reasonable potential improvement that I see is to eliminate the zeroing
in `push`. This requires unsafe code, however, so I'd prefer to leave
that to a followup.
@james7132 james7132 force-pushed the direct-copy-specialization branch from 2901ec1 to 4ee8377 Compare March 28, 2024 04:12
@james7132
Copy link
Contributor Author

I think this is about as clean/low-unsafe as I can get this without negatively impacting the performance again. @teoxoy this should be ready for review now.

@teoxoy
Copy link
Owner

teoxoy commented Apr 2, 2024

@james7132 I pushed a commit to the PR could you give it a look?

@james7132
Copy link
Contributor Author

I actually tried to deduplicate code like this, but apparently the optimizer doesn't like implicit if-returns as replacements for if/elses. I'll check the assembly output of this later.

@james7132
Copy link
Contributor Author

I think the Pod changes are fine, but the cleanup in how the branching is handled the compiler really doesn't like: james7132/encase_asm_tests@f936005#diff-ce2d3c1fb388fe83071754963a5343392f61ff8b7d7465aa6d2ba68dfe31993b.

This shows a regression in codegen for Vec-like writes. Though I think this comes down to the fact that it cannot inline the reservation, and thus it's assertion that there's enough space does not carry over, unlike the slice writes.

@james7132
Copy link
Contributor Author

but apparently the optimizer doesn't like implicit if-returns as replacements for if/elses. I'll check the assembly output of this later.

Ah I think I see why, it views the non-memcpy writes/reads as a potential side effect, and thus cannot treat the return as a replacement for an else to the if.

@teoxoy
Copy link
Owner

teoxoy commented Apr 14, 2024

@james7132 I pushed another commit to address the issue. Let me know what you think!

@james7132
Copy link
Contributor Author

Doesn't seem to make a tangible difference: james7132/encase_asm_tests@4399891. Might be due to one of the is_pod changes. It might be better to just merge this as is and close the gap in a later PR.

@teoxoy teoxoy merged commit 83e744a into teoxoy:main Apr 24, 2024
7 checks passed
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants