
Change HirVec<P<T>> to HirVec<T> in hir::Expr. #37642


Merged 1 commit on Nov 22, 2016

Conversation

nnethercote
Contributor

This PR changes data structures like this:

```
[ ExprArray | 8 | P ]
                  |
                  v
                  [ P | P | P | P | P | P | P | P ]
                    |
                    v
                    [ ExprTup | 2 | P ]
                                    |
                                    v
                                    [ P | P ]
                                      |
                                      v
                                      [ Expr ]
```

to this:

```
[ ExprArray | 8 | P ]
                  |
                  v
                  [ [ ExprTup | 2 | P ] | ... ]
                                    |
                                    v
                                    [ Expr | Expr ]
```

I thought this would be a win for #36799, and on a cut-down version of that workload this reduces the peak heap size (as measured by Massif) from 885 MiB to 875 MiB. However, the peak RSS as measured by `-Ztime-passes` and by `/usr/bin/time` increases by about 30 MiB.

I'm not sure why. Just look at the picture above -- the second data structure clearly takes up less space than the first. My best idea relates to unused elements in the slices. `HirVec<Expr>` is a typedef for `P<[Expr]>`. If there were any unused excess elements then I could see that memory usage would increase, because those excess elements are larger in `HirVec<Expr>` than in `HirVec<P<Expr>>`. But AIUI there are no such excess elements, and Massif's measurements corroborate that.

However, the two main creation points for these data structures are these lines from `lower_expr`:

```rust
ExprKind::Vec(ref exprs) => {
    hir::ExprArray(exprs.iter().map(|x| self.lower_expr(x)).collect())
}
ExprKind::Tup(ref elts) => {
    hir::ExprTup(elts.iter().map(|x| self.lower_expr(x)).collect())
}
```

I suspect what is happening is that temporary excess elements are created within the `collect` calls. The workload from #36799 has many 2-tuples and 3-tuples and when `Vec` gets doubled it goes from a capacity of 1 to 4, which would lead to excess elements. Though, having said that, `Vec::with_capacity` doesn't create excess AFAICT. So I'm not sure. What goes on inside `collect` is complex.

Anyway, in its current form this PR is a memory consumption regression and so not worth landing but I figured I'd post it in case anyone has additional insight.

@rust-highfive
Contributor

r? @pnkfelix

(rust_highfive has picked a reviewer for you, use r? to override)

@michaelwoerister
Member

Interesting. Thanks for posting, @nnethercote.

@bluss
Member

bluss commented Nov 8, 2016

Vec's FromIterator

  • Uses only a single allocation up front, if the iterator reports a lower bound in Iterator::size_hint that fits at least all the elements in the iterator
  • Uses a more efficient loop with no capacity check in the body, if the iterator is a TrustedLen iterator.

Since both of those lines use the slice iterator and then just .map() they should already be "perfect" (which never should be taken for granted, I guess).

@bors
Collaborator

bors commented Nov 10, 2016

☔ The latest upstream changes (presumably #37678) made this pull request unmergeable. Please resolve the merge conflicts.

@eddyb
Member

eddyb commented Nov 10, 2016

IMO there isn't a lot we can do that's not a compromise (due to heap pressure dynamics) without moving to arena/SoA models for storing the HIR in the first place (replacing P, at least at that level).

@michaelwoerister
Member

What's "SoA"?

@eddyb
Member

eddyb commented Nov 10, 2016

"Struct of Arrays", where each type has its own vector for all instances of that type, in a central struct, although it's usually used for fields, not nodes.
We do something similar with our side-tables, this would just be flatter. I'm still a bit torn between this and huge arenas, but the latter are somewhat more ergonomic.

@alexcrichton alexcrichton added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Nov 10, 2016
@brson brson added the relnotes Marks issues that should be documented in the release notes of the next release. label Nov 11, 2016
@nnethercote
Contributor Author

I just rebased and remeasured and this change is now a small win for a cut-down version of the workload from #36799, reducing peak RSS from 866 MiB to 848 MiB. I'm not sure what changed, other than #37764 landing, though there shouldn't really be any interaction between that PR and this one.

r? @michaelwoerister

@nnethercote
Contributor Author

Some heap allocation numbers from DHAT.

old:
```
tot_alloc:    5,980,761,512 in 27,087,211 blocks
max_live:     838,850,202 in 3,978,848 blocks
```

new:
```
tot_alloc:    5,971,161,300 in 25,887,185 blocks
max_live:     829,249,992 in 2,778,822 blocks
```

This PR also speeds up compilation of that workload by about 4%. It also speeds up a couple of the rustc-benchmarks by 1-2%.

@bors
Collaborator

bors commented Nov 21, 2016

☔ The latest upstream changes (presumably #37824) made this pull request unmergeable. Please resolve the merge conflicts.

@michaelwoerister
Member

Looks good to me. Happy to approve it after a rebase.

@eddyb
Member

eddyb commented Nov 21, 2016

If it's measurably better I have no real concerns - I imagine it'd eventually be `&[&Expr]` vs `&[Expr]`, unless we end up with some sort of ID linked list to help with incremental recompilation or something.

@nnethercote
Contributor Author

I rebased. r? @michaelwoerister

@michaelwoerister
Member

@bors r+

@bors
Collaborator

bors commented Nov 21, 2016

📌 Commit 382d3b0 has been approved by michaelwoerister

@bors
Collaborator

bors commented Nov 21, 2016

⌛ Testing commit 382d3b0 with merge 82d8833...

bors added a commit that referenced this pull request Nov 21, 2016
Change HirVec<P<T>> to HirVec<T> in hir::Expr.
@nnethercote nnethercote deleted the no-HirVec-of-P branch November 22, 2016 04:16
Labels: relnotes, T-compiler

9 participants