Fix TLB bug with zeroed upper bits #496
This has a very significant effect on emulator performance with virtual memory enabled (e.g. when booting Linux). Having a more realistic TLB can also help expose bugs, like the recent bug in riscv#496. For example, with a larger TLB you can add an assertion in `translate()`.

It also allows you to detect missing `sfence.vma`s in software by asserting that the TLB translation is the same as the uncached translation. This is not implemented in this change, but we can do it in the future (or you can just hack it in when needed). The detection will never be 100% reliable, but the bigger the TLB, the more missing fences you will detect.

A size of 64 was chosen based on benchmarks I did last year. Performance when booting Linux keeps going up until you get to 64.
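The missing-fence check described above could be sketched roughly as follows. This is a toy Python model, not the emulator's code: `ToyMMU`, `MissingFenceError`, the dictionary used as a page table, and the FIFO eviction policy are all hypothetical names and assumptions made for illustration. Only the idea (compare the cached translation against an uncached walk) and the size of 64 come from the description.

```python
TLB_SIZE = 64      # size quoted in the description above
PAGE_SHIFT = 12    # 4 KiB pages (assumption for this sketch)


class MissingFenceError(AssertionError):
    """Hypothetical error: a cached translation disagrees with a fresh walk."""


class ToyMMU:
    def __init__(self):
        self.page_table = {}  # vpn -> ppn; stands in for a real page-table walk
        self.tlb = {}         # vpn -> ppn; holds at most TLB_SIZE entries

    def walk(self, vpn):
        """Uncached translation; returns None if the page is unmapped."""
        return self.page_table.get(vpn)

    def translate(self, vaddr, check=True):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.tlb:
            ppn = self.tlb[vpn]
            # The check: a stale entry means software changed the mapping
            # without issuing sfence.vma for it.
            if check and ppn != self.walk(vpn):
                raise MissingFenceError(f"stale TLB entry for vpn {vpn:#x}")
        else:
            ppn = self.walk(vpn)
            if ppn is None:
                raise LookupError(f"page fault at {vaddr:#x}")
            if len(self.tlb) >= TLB_SIZE:
                # Evict the oldest inserted entry (FIFO is an assumption here).
                self.tlb.pop(next(iter(self.tlb)))
            self.tlb[vpn] = ppn
        return (ppn << PAGE_SHIFT) | offset

    def sfence_vma(self, vaddr=None):
        """Flush everything, or only the page containing vaddr."""
        if vaddr is None:
            self.tlb.clear()
        else:
            self.tlb.pop(vaddr >> PAGE_SHIFT, None)
```

In this model, changing `page_table` for a cached page and then calling `translate()` without `sfence_vma()` raises `MissingFenceError`. A stale entry that has already been evicted goes unnoticed, which is why a larger TLB catches more missing fences.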
We should get this fixed, but I think the question we need to answer is how the upper bits of the
also clear that same VA?
From the priv spec:
Of course, one is free to mask off the high bits too, since it's indistinguishable from an implementation that decided to evict that entry from its TLB of its own accord at that exact moment.
That should be the case because only valid virtual addresses make it into the TLB. Unless... you switch virtual memory modes (e.g. Sv57->Sv39) then do In any case I think it's ok because this note contradicts the quote above:
Not a very clearly written part of the spec, but I think as long as we invalidate at least all the matching entries then it is compliant.
I think it should really say:
These lines of code zeroed the top 25 bits of an Sv39 address before translating it. That meant an address like `0xFFFFFFFFFFFFFFFF` would be stored in the TLB as `0x0000007fffffffff`. This caused a bug with `sfence.vma` when the `rs1` (virtual address) argument was not zero. If you did `sfence.vma 0xFFFFFFFFFFFFFFFF, x0` it should clear the TLB entry, but it didn't because it naively checks `0xFFFFFFFFFFFFF000 == 0x0000007ffffff000`.

Currently there is only one TLB entry, so the bug is only visible if the `sfence.vma` targets the page that is currently executing; otherwise the next fetch would replace the entry anyway.

An alternative fix would be to clear the upper 25 bits of `vMatchMask`, but this is simpler.
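To make the failing comparison concrete, here is a small Python sketch of the arithmetic. It is not the model's actual code: `zero_upper_bits`, `page_match` and `V_MATCH_MASK` are hypothetical stand-ins, and the mask is assumed to ignore only the 12-bit page offset of a 4 KiB page.

```python
MASK64 = (1 << 64) - 1
SV39_VA_BITS = 39
PAGE_SHIFT = 12


def zero_upper_bits(vaddr: int) -> int:
    """Pre-fix behaviour: keep only the low 39 bits (zero the top 25)."""
    return vaddr & ((1 << SV39_VA_BITS) - 1)


def page_match(entry_vaddr: int, flush_vaddr: int, v_match_mask: int) -> bool:
    """Naive comparison of two addresses under a match mask."""
    return (entry_vaddr & v_match_mask) == (flush_vaddr & v_match_mask)


# Mask that ignores only the page offset (assumed 4 KiB page).
V_MATCH_MASK = MASK64 & ~((1 << PAGE_SHIFT) - 1)

vaddr = 0xFFFFFFFFFFFFFFFF
stored_buggy = zero_upper_bits(vaddr)  # what the TLB held before the fix
stored_fixed = vaddr                   # what it holds with the full address kept

# Before the fix: 0x0000007ffffff000 vs 0xFFFFFFFFFFFFF000, so no match
# and the entry survives the sfence.vma.
buggy_hit = page_match(stored_buggy, vaddr, V_MATCH_MASK)

# After the fix: both sides keep their upper bits, so the entry is flushed.
fixed_hit = page_match(stored_fixed, vaddr, V_MATCH_MASK)

# The alternative mentioned above: also clear the upper 25 bits of the mask.
alt_mask = V_MATCH_MASK & ((1 << SV39_VA_BITS) - 1)
alt_hit = page_match(stored_buggy, vaddr, alt_mask)

print(hex(stored_buggy), buggy_hit, fixed_hit, alt_hit)
```

Both approaches make the comparison succeed. The sketch only shows that they agree for this one address, not that they are equivalent in the real model.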
PR to update the spec wording: riscv/riscv-isa-manual#1510
debug: Only run pylint if debug files changed.