Remove fewer Storage calls in CopyProp and GVN #142531

Open. ohadravid wants to merge 4 commits into master.

Conversation

ohadravid
Contributor

@ohadravid ohadravid commented Jun 15, 2025

Modify the CopyProp and GVN MIR optimization passes to remove fewer Storage{Live,Dead} calls, allowing for better optimizations by LLVM - see #141649.

Details

The idea is to use a new MaybeUninitializedLocals analysis and remove only the storage calls of locals that are maybe-uninit when accessed in a new location.
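
A minimal sketch of the idea, assuming a toy statement type and a single straight-line block rather than rustc's MIR and dataflow framework. The names `Stmt` and `maybe_uninit_at_use` are hypothetical; the real analysis runs as a forward dataflow over the whole control-flow graph.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for MIR statements (hypothetical, not the rustc types).
#[derive(Clone, Copy, Debug)]
enum Stmt {
    StorageLive(usize),
    StorageDead(usize),
    Assign(usize), // the local is written and becomes initialized
    Use(usize),    // the local is read
}

/// Returns the locals that are maybe-uninitialized at some `Use`.
/// In this toy model those are the locals whose storage statements
/// would have to be removed; all other locals keep theirs.
fn maybe_uninit_at_use(num_locals: usize, block: &[Stmt]) -> BTreeSet<usize> {
    // Every local starts out uninitialized.
    let mut uninit = vec![true; num_locals];
    let mut flagged = BTreeSet::new();
    for stmt in block {
        match *stmt {
            // Fresh storage and ended storage never hold a valid value.
            Stmt::StorageLive(l) | Stmt::StorageDead(l) => uninit[l] = true,
            Stmt::Assign(l) => uninit[l] = false,
            Stmt::Use(l) => {
                if uninit[l] {
                    flagged.insert(l);
                }
            }
        }
    }
    flagged
}
```

Feeding it the block as it looks after the replacement (where a head is read at a location that used to read a copy) yields the set of heads whose storage markers cannot be kept.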

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 15, 2025
@rustbot
Collaborator

rustbot commented Jun 15, 2025

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@matthiaskrgr
Member

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 15, 2025
bors added a commit that referenced this pull request Jun 15, 2025
…try>

Remove fewer Storage calls in `copy_prop`

Modify the `copy_prop` MIR optimization pass to remove fewer `Storage{Live,Dead}` calls, allowing for better optimizations by LLVM - see #141649.

### Details

This is my attempt to fix the mentioned issue (this is the first part, I also implemented a similar solution for GVN in [this branch](https://github.com/rust-lang/rust/compare/master...ohadravid:rust:better-storage-calls-gvn-v2?expand=1)).

The idea is to use the `MaybeStorageDead` analysis and remove only the storage calls of `head`s that are maybe-storage-dead when the associated `local` is accessed (or, conversely, keep the storage of `head`s that are for-sure alive in _every_ relevant access).

When combined with the GVN change, the final example in the issue (#141649 (comment)) is optimized as expected by LLVM. I also measured the effect on a few functions in `rav1d` (where I originally saw the issue) and observed reduced stack usage in several of them.

This is my first attempt at working with MIR optimizations, so it's possible this isn't the right approach — but all tests pass, and the resulting diffs appear correct.

r? tmiasko

since he commented on the issue and pointed to these passes.
@bors
Collaborator

bors commented Jun 15, 2025

⌛ Trying commit d24d035 with merge ef7d206...

@bors
Collaborator

bors commented Jun 15, 2025

☀️ Try build successful - checks-actions
Build commit: ef7d206 (ef7d20666974f0dac45b03e051f2e283f9d9f090)

@rust-timer
Collaborator

Finished benchmarking commit (ef7d206): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.2%, 0.4%] | 8 |
| Regressions ❌ (secondary) | 0.3% | [0.2%, 0.4%] | 7 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.3% | [0.2%, 0.4%] | 8 |

Max RSS (memory usage)

Results (primary 0.7%, secondary 3.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.5% | [1.8%, 5.0%] | 5 |
| Regressions ❌ (secondary) | 3.4% | [3.4%, 3.4%] | 1 |
| Improvements ✅ (primary) | -3.9% | [-6.5%, -2.0%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.7% | [-6.5%, 5.0%] | 8 |

Cycles

Results (primary -0.6%, secondary -0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.8% | [3.8%, 3.8%] | 1 |
| Improvements ✅ (primary) | -0.6% | [-0.6%, -0.6%] | 1 |
| Improvements ✅ (secondary) | -4.1% | [-4.1%, -4.1%] | 1 |
| All ❌✅ (primary) | -0.6% | [-0.6%, -0.6%] | 1 |

Binary size

Results (primary 0.0%, secondary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | [0.0%, 0.8%] | 10 |
| Regressions ❌ (secondary) | 0.1% | [0.0%, 0.1%] | 5 |
| Improvements ✅ (primary) | -0.2% | [-0.8%, -0.0%] | 8 |
| Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.2%] | 1 |
| All ❌✅ (primary) | 0.0% | [-0.8%, 0.8%] | 18 |

Bootstrap: 757.399s -> 756.065s (-0.18%)
Artifact size: 372.20 MiB -> 372.12 MiB (-0.02%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jun 15, 2025
@ohadravid
Contributor Author

@matthiaskrgr - I updated the impl to stop re-checking once a head is found to be maybe-dead, which should be a bit better

@matthiaskrgr
Member

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 15, 2025
@bors
Collaborator

bors commented Jun 15, 2025

⌛ Trying commit 905e968 with merge c0a2949...

@cjgillot
Contributor

Should this check happen in Replacer::visit_local, with the replacement of storage statements moved to a dedicated cleanup visitor?

@bors
Collaborator

bors commented Jun 15, 2025

☀️ Try build successful - checks-actions
Build commit: c0a2949 (c0a294957df10fc3880e1677c72c0cf122485509)

@ohadravid
Contributor Author

> Should this check happen in Replacer::visit_local

I'm not sure how to make this work: using ResultsCursor requires a &body, but it's not possible to have that while running a MutVisitor since it requires a &mut body.

Is there a different way to do this?
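
One general Rust pattern for this kind of borrow conflict is to split the work into a read-only phase and a mutating phase. The sketch below only illustrates that pattern with hypothetical types (`Body`, `Stmt`, `locals_to_strip`, `strip_storage`); it does not use rustc's `ResultsCursor` or `MutVisitor`, and it is not necessarily what this PR ended up doing.

```rust
/// Toy stand-in for MIR statements (hypothetical, not the rustc types).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Stmt {
    StorageLive(usize),
    StorageDead(usize),
    Nop,
    Other,
}

/// Toy stand-in for a MIR body.
struct Body {
    stmts: Vec<Stmt>,
}

/// Phase 1: needs only `&Body`, like a cursor-based analysis would.
/// `should_strip` is a placeholder for the real per-local decision.
fn locals_to_strip(body: &Body, should_strip: impl Fn(usize) -> bool) -> Vec<usize> {
    let mut out = Vec::new();
    for stmt in &body.stmts {
        if let Stmt::StorageLive(l) | Stmt::StorageDead(l) = *stmt {
            if should_strip(l) && !out.contains(&l) {
                out.push(l);
            }
        }
    }
    out
}

/// Phase 2: takes `&mut Body` once the shared borrow from phase 1 has ended,
/// and replaces the storage statements of the chosen locals with `Nop`.
fn strip_storage(body: &mut Body, locals: &[usize]) {
    for stmt in &mut body.stmts {
        if let Stmt::StorageLive(l) | Stmt::StorageDead(l) = *stmt {
            if locals.contains(&l) {
                *stmt = Stmt::Nop;
            }
        }
    }
}
```

The result of phase 1 is plain owned data, so nothing borrowed from the body is still alive when phase 2 asks for `&mut Body`.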

@rust-timer
Collaborator

Finished benchmarking commit (c0a2949): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.2%, 0.4%] | 9 |
| Regressions ❌ (secondary) | 0.3% | [0.2%, 0.4%] | 7 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.2%] | 1 |
| All ❌✅ (primary) | 0.3% | [0.2%, 0.4%] | 9 |

Max RSS (memory usage)

Results (primary -0.1%, secondary -1.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 4.2% | [3.4%, 5.8%] | 4 |
| Regressions ❌ (secondary) | 3.1% | [3.1%, 3.1%] | 1 |
| Improvements ✅ (primary) | -4.4% | [-6.6%, -1.8%] | 4 |
| Improvements ✅ (secondary) | -5.8% | [-5.8%, -5.8%] | 1 |
| All ❌✅ (primary) | -0.1% | [-6.6%, 5.8%] | 8 |

Cycles

Results (secondary -1.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.3% | [2.3%, 2.3%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.6% | [-2.6%, -2.5%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results (primary -0.0%, secondary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | [0.0%, 0.8%] | 10 |
| Regressions ❌ (secondary) | 0.1% | [0.0%, 0.1%] | 5 |
| Improvements ✅ (primary) | -0.2% | [-0.8%, -0.0%] | 8 |
| Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.2%] | 1 |
| All ❌✅ (primary) | -0.0% | [-0.8%, 0.8%] | 18 |

Bootstrap: 756.494s -> 757.685s (0.16%)
Artifact size: 372.15 MiB -> 372.11 MiB (-0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 15, 2025
@ohadravid ohadravid force-pushed the better-storage-calls-copy-prop branch from 2c919c0 to 48b0529 Compare June 16, 2025 17:09
@ohadravid ohadravid force-pushed the better-storage-calls-copy-prop branch from f282ae6 to dcb58d1 Compare June 16, 2025 20:20
@ohadravid ohadravid force-pushed the better-storage-calls-copy-prop branch from dcb58d1 to ad0ab67 Compare June 18, 2025 06:57
@ohadravid ohadravid force-pushed the better-storage-calls-copy-prop branch from ad0ab67 to aa11a50 Compare June 18, 2025 07:35
@ohadravid ohadravid force-pushed the better-storage-calls-copy-prop branch 2 times, most recently from 365edc7 to 9beb9b6 Compare June 21, 2025 09:05
@ohadravid ohadravid requested a review from tmiasko June 21, 2025 09:08
rust-bors bot added a commit that referenced this pull request Jun 21, 2025
Remove fewer Storage calls in GVN

Followup to #142531 (Remove fewer Storage calls in `copy_prop`)

Modify the GVN MIR optimization pass to remove fewer Storage{Live,Dead} calls, allowing for better optimizations by LLVM - see #141649.

After replacing locals with values, use the `MaybeStorageDead` analysis to check that the replaced locals are storage-live.

**A slight problem**: In #142531, `@tmiasko` noted in #142531 (comment) that `MaybeStorageDead` isn't enough, since there can be a `Live(_1); Dead(_1); Live(_1);` block, which forces the optimization to check that each value is initialised (and not only storage-live).
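
To make the distinction concrete, here is a small sketch that tracks both properties for a single local over a straight-line block. The `Stmt` type and `state_after` function are hypothetical toy names, not rustc APIs, and the example adds an assignment between the markers for illustration.

```rust
/// Toy statements acting on one local (hypothetical, not the rustc types).
#[derive(Clone, Copy)]
enum Stmt {
    StorageLive,
    StorageDead,
    Assign,
}

/// Returns `(storage_live, initialized)` for the local after running `block`.
fn state_after(block: &[Stmt]) -> (bool, bool) {
    let (mut live, mut init) = (false, false);
    for stmt in block {
        match stmt {
            // New storage is live but holds no value yet.
            Stmt::StorageLive => {
                live = true;
                init = false;
            }
            // Ending the storage also discards the value.
            Stmt::StorageDead => {
                live = false;
                init = false;
            }
            Stmt::Assign => init = true,
        }
    }
    (live, init)
}
```

For the sequence `StorageLive, Assign, StorageDead, StorageLive` this returns `(true, false)`: a storage-liveness check alone would accept a read of the local at that point, while an initialization check rejects it.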

This is easy enough in `copy_prop` (because we are checking _before_ the replacement), but in GVN it is actually hard to tell for each statement if the local must be initialized or not after the fact (and modifying `VnState` seems even harder).

I opted for something else which might be wrong (implemented in the last two commits):
If we consider `Dead->Live` to be the same as `Deinit`, then such a local shouldn't be considered SSA - so I updated `SsaVisitor` to mark such cases as non-SSA.
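
The rule can be sketched on a toy model as follows. This is only an illustration under simplifying assumptions (one straight-line block, a hypothetical `Stmt` type and `ssa_locals` function); it does not reproduce how the real `SsaVisitor` works.

```rust
/// Toy stand-in for MIR statements (hypothetical, not the rustc types).
#[derive(Clone, Copy)]
enum Stmt {
    StorageLive(usize),
    StorageDead(usize),
    Assign(usize),
}

/// Returns, per local, whether this toy model still treats it as SSA:
/// assigned at most once, and never made storage-live again after a
/// `StorageDead` (that sequence is treated like a deinit).
fn ssa_locals(num_locals: usize, block: &[Stmt]) -> Vec<bool> {
    let mut assignments = vec![0u32; num_locals];
    let mut seen_dead = vec![false; num_locals];
    let mut is_ssa = vec![true; num_locals];
    for stmt in block {
        match *stmt {
            Stmt::Assign(l) => {
                assignments[l] += 1;
                if assignments[l] > 1 {
                    is_ssa[l] = false;
                }
            }
            Stmt::StorageDead(l) => seen_dead[l] = true,
            Stmt::StorageLive(l) => {
                // Dead followed by Live: the old value is gone.
                if seen_dead[l] {
                    is_ssa[l] = false;
                }
            }
        }
    }
    is_ssa
}
```

A local with a single `Live ... Dead` range stays SSA here, while a local whose storage is re-created is excluded, so a pass that only rewrites SSA locals never has to reason about the re-created storage.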

r? tmiasko
@ohadravid ohadravid force-pushed the better-storage-calls-copy-prop branch from 9beb9b6 to 6078a69 Compare June 21, 2025 11:23
@cjgillot cjgillot self-assigned this Jun 21, 2025
Comment on lines 59 to 62
// To keep the storage of a head, we require that none of the locals in its copy class are borrowed,
// since otherwise we cannot easily identify when it is used.
let mut storage_to_remove = ssa.borrowed_locals().clone();
storage_to_remove.intersect(&head_storage_to_check);
Contributor

SsaLocals::borrowed_locals()[local] describes whether local is borrowed, not whether any local in its copy class is borrowed. An example that doesn't work as it is supposed to:

#![feature(custom_mir, core_intrinsics, freeze)]
extern crate core;
use core::intrinsics::mir::*;
use core::marker::Freeze;

#[custom_mir(dialect = "runtime")]
pub fn f<T: Copy + Freeze>(_1: (T, T)) -> T {
    mir! {
        let _2: T;
        let _3: T;
        let _4: &T;
        {
            StorageLive(_2);
            _2 = _1.0;
            _3 = _2;
            _4 = &_3;
            StorageDead(_2);
            RET = *_4;
            Return()
        }
    }
}
$ rustc +stage1 b.rs --crate-type=lib -Zmir-opt-level=0 -Zmir-enable-passes=+CopyProp -Zunpretty=mir -Copt-level=1
fn f(_1: (T, T)) -> T {
    let mut _0: T;
    let mut _2: T;
    let mut _3: T;
    let mut _4: &T;

    bb0: {
        StorageLive(_2);
        _2 = copy (_1.0: T);
        _4 = &_2;
        StorageDead(_2);
        _0 = copy (*_4);
        return;
    }
}

I think it should be fine to allow the head itself to be borrowed (only the other locals in the copy class must not be borrowed).
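
The difference between "this local is borrowed" and "some other local in its copy class is borrowed" can be sketched with plain slices. The function name and the slice-based representation are assumptions made for illustration; rustc uses its own index and bitset types.

```rust
/// `copy_class[l]` is the head of local `l`'s copy class, and
/// `borrowed[l]` says whether local `l` itself has its address taken.
/// Returns, per local, whether it is a head with a borrowed non-head
/// member, i.e. a head whose storage statements should be removed.
fn heads_with_borrowed_members(copy_class: &[usize], borrowed: &[bool]) -> Vec<bool> {
    let mut flagged = vec![false; copy_class.len()];
    for (local, &head) in copy_class.iter().enumerate() {
        // A borrow of the head itself is fine; only other members count.
        if local != head && borrowed[local] {
            flagged[head] = true;
        }
    }
    flagged
}
```

In the example above, `_3` is in the copy class of `_2` and `_3` is borrowed, so with `copy_class = [0, 1, 2, 2, 4]` and `borrowed = [false, false, false, true, false]` the head `_2` is flagged, even though `borrowed[2]` is false.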

Contributor Author

Oh I see, I read the compute_copy_classes comment but assumed it referred to ssa.borrowed_locals(), not the one it computes internally.

Contributor Author

Added this test and pushed the correct fix.
I do get reordered output for some tests (like tests/mir-opt/pre-codegen/derived_ord.rs) and I'm not sure why, but the fixed impl now produces the correct output for this test.

@cjgillot
Contributor

@ohadravid do you mind merging this PR and #142819? Both should use the same code to decide whether to keep or remove storage statements. And I fear that having 2 PRs means that @tmiasko and I won't see each other's ideas and will give you diverging advice.

@ohadravid ohadravid force-pushed the better-storage-calls-copy-prop branch from fdcc8a6 to 26fc160 Compare June 22, 2025 16:30
@ohadravid
Contributor Author

ohadravid commented Jun 22, 2025

@cjgillot, @tmiasko - merged both PRs here.

The current impls are based on the new MaybeUninitializedLocals analysis in both passes, with all the new test cases passing.

Does GVN require an additional check against borrowed locals like mentioned in #142531 (comment)?

Both passes only run the more complex analysis when tcx.sess.emit_lifetime_markers() is true, so they shouldn't negatively affect check/debug builds, but the last perf run did show some changes to them as well.

And thank you both for reviewing these and explaining everything! 🙏

@ohadravid ohadravid changed the title from "Remove fewer Storage calls in copy_prop" to "Remove fewer Storage calls in CopyProp and GVN" Jun 22, 2025
Labels
perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.