
bug: very high CPU usage from events server (Rust 1.78.0 regression) #341


Open
insertish opened this issue Sep 9, 2024 · 9 comments
Assignees
Labels
help wanted Extra attention is needed

Comments

@insertish
Member

What happened?

Something changed between 20240805-1 and 20240829-3 that is causing very high CPU usage.

@insertish insertish added the bug label Sep 9, 2024
@github-project-automation github-project-automation bot moved this to 🆕 Untriaged in Revolt Project Sep 9, 2024
@insertish
Member Author

insertish commented Sep 9, 2024

nominal:
[screenshot: CPU usage graph]

problematic:
[screenshot: CPU usage graph]

@insertish
Member Author

insertish commented Sep 9, 2024

Configurations tested:

| Image | Rust Version | Base Image | Notes |
| --- | --- | --- | --- |
| 20240805-1 | 1.70.0 | debian:bullseye-slim | |
| 20240829-3, 20240830-1 | 1.80.1 | gcr.io/distroless/cc-debian12:nonroot | |
| 20240909-1-debug | 1.80.1 | debian:bookworm-slim | |
| 20240909-4-debug | 1.70.0 | gcr.io/distroless/cc-debian12:nonroot | No build |
| 20240909-5-debug | 1.76.0 | gcr.io/distroless/cc-debian12:nonroot | |
| 20240909-6-debug † | 1.79.0 | gcr.io/distroless/cc-debian12:nonroot | |
| 20240909-7-debug | 1.77.0 | gcr.io/distroless/cc-debian12:nonroot | |
| 20240909-8-debug | 1.78.0 | gcr.io/distroless/cc-debian12:nonroot | |
| 20240909-9-debug | 1.77.2 | gcr.io/distroless/cc-debian12:nonroot | |

† Image overwritten by 7-debug by accident

@insertish
Member Author

All signs point to a regression in Rust currently.

@insertish
Member Author

Something changed in Rust 1.78.0 that is causing very high CPU usage: https://releases.rs/docs/1.78.0/

@insertish insertish changed the title bug: very high CPU usage from events server bug: very high CPU usage from events server (Rust 1.78.0 regression) Sep 9, 2024
@insertish insertish moved this from 🆕 Untriaged to 💡 Open in Revolt Project Sep 29, 2024
@insertish insertish added the help wanted Extra attention is needed label Sep 29, 2024
@insertish insertish pinned this issue Oct 28, 2024
@insertish insertish removed the bug label Nov 28, 2024
@phazeschift
Contributor

I narrowed down part of the problem to this commit in Rust. If I revert the changes in that commit on top of Rust 1.85, the high CPU usage goes away. The next step is to find which dependency is affected by that change.
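For anyone trying to reproduce the narrowing-down step, this kind of search can usually be automated with cargo-bisect-rustc. A sketch only: `check-cpu.sh` is a hypothetical script you would write yourself that exits nonzero when the regression reproduces (e.g. by sampling CPU usage of the events server for a short period).

```
# Install the bisection tool.
cargo install cargo-bisect-rustc

# Bisect between the last known-good and first known-bad releases;
# the tool builds with each candidate toolchain, runs the script,
# and treats a nonzero exit status as "regression present".
cargo bisect-rustc --start=1.77.0 --end=1.78.0 --script=./check-cpu.sh
```

This narrows the regression to a nightly and then to a PR range, which is how one ends up at a single commit like the one above.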

@phazeschift
Contributor

Unfortunately this regression is not stable: various build options can make it disappear or reappear. For example, `--emit=mir` (or one of the other intermediate formats) masks the regression. That may be because those uses of `--emit` force the compiler to use `codegen-units=1`. From this comment I gather that with multiple codegen units there are separate vtables for each unit, making the new version of `will_wake` less likely to guess correctly.
So one workaround is to set `codegen-units=1`. Another workaround seems to be `lto=true` (or equivalently, `"fat"`), but that may have caused issues in the past.
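For reference, both workarounds above are Cargo profile settings. A sketch of what they would look like in `Cargo.toml` (pick one; having both is redundant since fat LTO already merges codegen units):

```toml
[profile.release]
# Workaround 1: a single codegen unit, so only one vtable instance exists.
codegen-units = 1

# Workaround 2 (alternative): fat LTO, which also merges code across units.
# Note the comment above about this possibly having caused issues before.
# lto = true   # equivalent to lto = "fat"
```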

@IAmTomahawkx IAmTomahawkx moved this from 🕒 Backlog to 🏗 In Progress in Revolt Project Apr 13, 2025
@IAmTomahawkx IAmTomahawkx self-assigned this Apr 13, 2025
@IAmTomahawkx
Member

IAmTomahawkx commented Apr 13, 2025

Due to upgrades of dependencies we're unable to keep using an older Rust version, so this has become a high priority to solve.
I'm not particularly advanced in Rust, so I'd love any support I can get in solving this.

> Another workaround seems to be `lto=true` (or equivalently, `"fat"`), but that may have caused issues in the past.

Yeah, this was causing our GitHub workers to run out of memory and subsequently crash. The logs have expired now, but you can see the termination reason [here](https://github.com/revoltchat/backend/actions/runs/12522200721/job/34930244359) (under annotations).
I'll start looking into the other suggestions such as codegen-units and --emit. Thanks for taking the time to look into this!

@jyn514

jyn514 commented Apr 24, 2025

I suspect this is "just" a bug and the `Waker` commit needs to be reverted. The fact that it's related to codegen units is very convincing to me; that should not affect observable behavior.

cc @tmiasko

@jyn514

jyn514 commented Apr 24, 2025

If you want to try to keep the second optimization, you could try `left.data == right.data && (ptr::eq(left.vtable, right.vtable) || left.vtable == right.vtable)`. But given that this is triggering so often for Revolt, I am not sure how often the `ptr::eq` fast path will save time.
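To make the vtable-duplication mechanism concrete, here is a small self-contained sketch. It builds two `RawWakerVTable` statics with identical contents at different addresses, which mimics (in a simplified way, within one codegen unit) the duplicate vtables that separate codegen units can produce. Pointer identity on the vtables fails while field-wise comparison succeeds; in a real multi-CGU build even the function pointers inside may differ, which is why `will_wake` can misjudge.

```rust
use std::ptr;
use std::task::{RawWaker, RawWakerVTable, Waker};

// No-op waker callbacks, for illustration only.
unsafe fn clone_waker(data: *const ()) -> RawWaker {
    RawWaker::new(data, &VTABLE_A)
}
unsafe fn noop(_: *const ()) {}

// Two statics with identical contents but distinct addresses,
// mimicking the duplicated vtables separate codegen units can emit.
static VTABLE_A: RawWakerVTable = RawWakerVTable::new(clone_waker, noop, noop, noop);
static VTABLE_B: RawWakerVTable = RawWakerVTable::new(clone_waker, noop, noop, noop);

fn main() {
    // Pointer identity fails: the two statics live at different addresses.
    assert!(!ptr::eq(&VTABLE_A, &VTABLE_B));

    // Field-wise comparison succeeds here, because both statics hold the
    // exact same function pointers. (Across real codegen units even the
    // function pointers can differ, so `==` may fail there too.)
    assert!(VTABLE_A == VTABLE_B);

    // Two wakers that share a data pointer but carry the duplicated vtables.
    let w1 = unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &VTABLE_A)) };
    let w2 = unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &VTABLE_B)) };

    // Whether this prints true or false depends on the comparison strategy
    // used by the Rust version in play, which is what this issue is about.
    println!("will_wake: {}", w1.will_wake(&w2));
}
```

When `will_wake` returns false spuriously, async runtimes re-register the waker on every poll instead of reusing it, which is consistent with the CPU-usage pattern reported above.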

Projects
Status: 🏗 In Progress

4 participants