Skip to content

improve cold_path() #133852

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 1 commit into from
Feb 18, 2025
Merged

improve cold_path() #133852

merged 1 commit into from
Feb 18, 2025

Conversation

x17jiri
Copy link
Contributor

@x17jiri x17jiri commented Dec 4, 2024

#120370 added a new instrinsic cold_path() and used it to fix likely and unlikely

However, in order to limit scope, the information about cold code paths is only used in 2-target switch instructions. This is sufficient for likely and unlikely, but limits usefulness of cold_path for idiomatic rust. For example, code like this:

if let Some(x) = y { ... }

may generate 3-target switch:

switch y.discriminator:
0 => true branch
1 = > false branch
_ => unreachable

and therefore marking a branch as cold will have no effect.

This PR improves cold_path() to work with arbitrary switch instructions.

Note that for 2-target switches, we can use llvm.expect, but for multiple targets we need to manually emit branch weights. I checked Clang and it also emits weights in this situation. The Clang's weight calculation is more complex that this PR, which I believe is mainly because switch in C/C++ can have multiple cases going to the same target.

@rustbot
Copy link
Collaborator

rustbot commented Dec 4, 2024

r? @estebank

rustbot has assigned @estebank.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 4, 2024
@x17jiri
Copy link
Contributor Author

x17jiri commented Dec 20, 2024

Just found that this also works:

	let new_buf = Global.allocate(layout).map_err(|_| {
		cold_path();
		Error::new_alloc_failed("Cannot allocate memory.")
	})?;

As long as the closure is inlined, branch weights will be emited.

@estebank
Copy link
Contributor

I'm not the best person to review this, sorry

r? compiler

@rustbot rustbot assigned petrochenkov and unassigned estebank Dec 31, 2024
@bors
Copy link
Collaborator

bors commented Jan 2, 2025

☔ The latest upstream changes (presumably #130060) made this pull request unmergeable. Please resolve the merge conflicts.

@petrochenkov
Copy link
Contributor

r? compiler

@rustbot rustbot assigned lcnr and unassigned petrochenkov Jan 2, 2025
@lcnr
Copy link
Contributor

lcnr commented Jan 6, 2025

maybe r? @nikic 😅

@saethlin
Copy link
Member

Anything like this should be re-rolled to the codegen group (which I am in), not the compiler overall. The compiler group is very big and will often just toss around review on things a lot. And @x17jiri if you feel lost in the review process don't hesitate to reach out on the Zulip https://rust-lang.zulipchat.com/

r? saethlin

I can get to this in a few days max. If someone else wants to approve it before then, feel free.

Comment on lines +247 to +358
let cold_weight = unsafe { llvm::LLVMValueAsMetadata(self.cx.const_u32(1)) };
let hot_weight = unsafe { llvm::LLVMValueAsMetadata(self.cx.const_u32(2000)) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are 1 and 2000 derived from anything in particular? If these are the magic values clang uses or something like that, a comment would be great.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values are used by llvm.expect for branches with 2 targets. I added a comment.

@saethlin
Copy link
Member

Just a few nits, and you have a merge conflict.

Then I'd like to see this go through a perf run before we merge; this isn't just making cold_path better, it also makes #[cold] more powerful, which ought to be an improvement to codegen but you better know. So we'll know at least before this lands by doing a perf run.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are good tests. Thank you.

@saethlin saethlin added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 13, 2025
@saethlin
Copy link
Member

@rustbot label: +perf-regression-triaged
Handful of low-significance opt regressions. Expected because this PR adds more LLVM attributes. The perf suite is a poor representation of the effectiveness of this PR because the compiler is built with PGO, which is likely to obscure improvements from branch weight information.

@bors r+

@bors
Copy link
Collaborator

bors commented Feb 13, 2025

📌 Commit 7968501 has been approved by saethlin

It is now in the queue for this repository.

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Feb 13, 2025
@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 13, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 13, 2025
improve cold_path()

rust-lang#120370 added a new instrinsic `cold_path()` and used it to fix `likely` and `unlikely`

However, in order to limit scope, the information about cold code paths is only used in 2-target switch instructions. This is sufficient for `likely` and `unlikely`, but limits usefulness of `cold_path` for idiomatic rust. For example, code like this:

```
if let Some(x) = y { ... }
```

may generate 3-target switch:

```
switch y.discriminator:
0 => true branch
1 = > false branch
_ => unreachable
```

and therefore marking a branch as cold will have no effect.

This PR improves `cold_path()` to work with arbitrary switch instructions.

Note that for 2-target switches, we can use `llvm.expect`, but for multiple targets we need to manually emit branch weights. I checked Clang and it also emits weights in this situation. The Clang's weight calculation is more complex that this PR, which I believe is mainly because `switch` in `C/C++` can have multiple cases going to the same target.
@bors
Copy link
Collaborator

bors commented Feb 13, 2025

⌛ Testing commit 7968501 with merge 7c77e1a...

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Collaborator

bors commented Feb 14, 2025

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 14, 2025
@x17jiri
Copy link
Contributor Author

x17jiri commented Feb 14, 2025

It seems the test is failing because the metadata have different numbers on the apple-darwin arch. I think I can fix this using regex. Is there a way for me to run the test on another arch? (Linux specifically)

@saethlin
Copy link
Member

saethlin commented Feb 14, 2025

Nope. But if you push a fix that uses a regex and works on Linux, I'll adjust the PR description to run try-job on apple and a few other platforms: https://rustc-dev-guide.rust-lang.org/tests/ci.html#try-builds

@x17jiri
Copy link
Contributor Author

x17jiri commented Feb 17, 2025

@saethlin It should be fixed

@saethlin
Copy link
Member

@bors try

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 17, 2025
improve cold_path()

rust-lang#120370 added a new instrinsic `cold_path()` and used it to fix `likely` and `unlikely`

However, in order to limit scope, the information about cold code paths is only used in 2-target switch instructions. This is sufficient for `likely` and `unlikely`, but limits usefulness of `cold_path` for idiomatic rust. For example, code like this:

```
if let Some(x) = y { ... }
```

may generate 3-target switch:

```
switch y.discriminator:
0 => true branch
1 = > false branch
_ => unreachable
```

and therefore marking a branch as cold will have no effect.

This PR improves `cold_path()` to work with arbitrary switch instructions.

Note that for 2-target switches, we can use `llvm.expect`, but for multiple targets we need to manually emit branch weights. I checked Clang and it also emits weights in this situation. The Clang's weight calculation is more complex that this PR, which I believe is mainly because `switch` in `C/C++` can have multiple cases going to the same target.

try-job: aarch64-apple
try-job: test-various
@bors
Copy link
Collaborator

bors commented Feb 17, 2025

⌛ Trying commit 7bb5f4d with merge e1f68a1...

@bors
Copy link
Collaborator

bors commented Feb 17, 2025

☀️ Try build successful - checks-actions
Build commit: e1f68a1 (e1f68a1662ea16c66f5e4b710c83b29008886fce)

@saethlin
Copy link
Member

@bors r+

@bors
Copy link
Collaborator

bors commented Feb 17, 2025

📌 Commit 7bb5f4d has been approved by saethlin

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 17, 2025
@bors
Copy link
Collaborator

bors commented Feb 18, 2025

⌛ Testing commit 7bb5f4d with merge 3b022d8...

@bors
Copy link
Collaborator

bors commented Feb 18, 2025

☀️ Test successful - checks-actions
Approved by: saethlin
Pushing 3b022d8 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 18, 2025
@bors bors merged commit 3b022d8 into rust-lang:master Feb 18, 2025
7 checks passed
@rustbot rustbot added this to the 1.87.0 milestone Feb 18, 2025
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (3b022d8): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

  • If the regression was expected or you think it can be justified,
    please write a comment with sufficient written justification, and add
    @rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
  • If you think that you know of a way to resolve the regression, try to create
    a new PR with a fix for the regression.
  • If you do not understand the regression or you think that it is just noise,
    you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
    were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.4% [0.1%, 0.9%] 7
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.7% [-0.7%, -0.7%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.2% [-0.7%, 0.9%] 8

Max RSS (memory usage)

Results (primary -1.9%, secondary 0.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.9% [3.9%, 3.9%] 1
Improvements ✅
(primary)
-1.9% [-2.7%, -0.9%] 7
Improvements ✅
(secondary)
-3.6% [-3.6%, -3.6%] 1
All ❌✅ (primary) -1.9% [-2.7%, -0.9%] 7

Cycles

Results (primary 1.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.0% [1.0%, 1.0%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 1.0% [1.0%, 1.0%] 1

Binary size

Results (primary 0.0%, secondary 0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.1% [0.0%, 0.1%] 8
Regressions ❌
(secondary)
0.1% [0.1%, 0.1%] 38
Improvements ✅
(primary)
-0.1% [-0.1%, -0.1%] 3
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.0% [-0.1%, 0.1%] 11

Bootstrap: 774.633s -> 775.584s (0.12%)
Artifact size: 362.36 MiB -> 360.27 MiB (-0.58%)

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants