
Change HirVec<P<T>> to HirVec<T> in hir::Expr. #37642


Merged 1 commit on Nov 22, 2016

Conversation

nnethercote
Contributor

This PR changes data structures like this:

```
[ ExprArray | 8 | P ]
                  |
                  v
                  [ P | P | P | P | P | P | P | P ]
                    |
                    v
                    [ ExprTup | 2 | P ]
                                    |
                                    v
                                    [ P | P ]
                                      |
                                      v
                                      [ Expr ]
```

to this:

```
[ ExprArray | 8 | P ]
                  |
                  v
                  [ [ ExprTup | 2 | P ] | ... ]
                                    |
                                    v
                                    [ Expr | Expr ]
```

I thought this would be a win for #36799, and on a cut-down version of that workload this reduces the peak heap size (as measured by Massif) from 885 MiB to 875 MiB. However, the peak RSS as measured by `-Ztime-passes` and by `/usr/bin/time` increases by about 30 MiB.

I'm not sure why. Just look at the picture above -- the second data structure clearly takes up less space than the first. My best idea relates to unused elements in the slices. `HirVec<Expr>` is a typedef for `P<[Expr]>`. If there were any unused excess elements then I could see that memory usage would increase, because those excess elements are larger in `HirVec<Expr>` than in `HirVec<P<Expr>>`. But AIUI there are no such excess elements, and Massif's measurements corroborate that.

However, the two main creation points for these data structures are these lines from `lower_expr`:

```rust
ExprKind::Vec(ref exprs) => {
    hir::ExprArray(exprs.iter().map(|x| self.lower_expr(x)).collect())
}
ExprKind::Tup(ref elts) => {
    hir::ExprTup(elts.iter().map(|x| self.lower_expr(x)).collect())
}
```

I suspect what is happening is that temporary excess elements are created within the `collect` calls. The workload from #36799 has many 2-tuples and 3-tuples and when `Vec` gets doubled it goes from a capacity of 1 to 4, which would lead to excess elements. Though, having said that, `Vec::with_capacity` doesn't create excess AFAICT. So I'm not sure. What goes on inside `collect` is complex.

Anyway, in its current form this PR is a memory consumption regression and so not worth landing but I figured I'd post it in case anyone has additional insight.

@rust-highfive
Contributor

r? @pnkfelix

(rust_highfive has picked a reviewer for you, use r? to override)

@michaelwoerister
Member

Interesting. Thanks for posting, @nnethercote.

@bluss
Member

bluss commented Nov 8, 2016

Vec's FromIterator

  • Uses only a single allocation up front, if the iterator reports a lower bound in Iterator::size_hint that fits at least all the elements in the iterator
  • Uses a more efficient loop with no capacity check in the body, if the iterator is a TrustedLen iterator.

Since both of those lines use the slice iterator and then just .map() they should already be "perfect" (which never should be taken for granted, I guess).

@bors
Collaborator

bors commented Nov 10, 2016

☔ The latest upstream changes (presumably #37678) made this pull request unmergeable. Please resolve the merge conflicts.

@eddyb
Member

eddyb commented Nov 10, 2016

IMO there isn't a lot we can do that's not a compromise (due to heap pressure dynamics) without moving to arena/SoA models for storing the HIR in the first place (replacing P, at least at that level).

@michaelwoerister
Member

What's "SoA"?

@eddyb
Member

eddyb commented Nov 10, 2016

"Struct of Arrays", where each type has its own vector for all instances of that type, in a central struct, although it's usually used for fields, not nodes.
We do something similar with our side-tables, this would just be flatter. I'm still a bit torn between this and huge arenas, but the latter are somewhat more ergonomic.

@alexcrichton alexcrichton added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Nov 10, 2016
@brson brson added the relnotes Marks issues that should be documented in the release notes of the next release. label Nov 11, 2016
@nnethercote
Contributor Author

I just rebased and remeasured and this change is now a small win for a cut-down version of the workload from #36799, reducing peak RSS from 866 MiB to 848 MiB. I'm not sure what changed, other than #37764 landing, though there shouldn't really be any interaction between that PR and this one.

r? @michaelwoerister

@nnethercote
Contributor Author

Some heap allocation numbers from DHAT.

old:
```
tot_alloc:    5,980,761,512 in 27,087,211 blocks
max_live:     838,850,202 in 3,978,848 blocks
```

new:
```
tot_alloc:    5,971,161,300 in 25,887,185 blocks
max_live:     829,249,992 in 2,778,822 blocks
```

This PR also speeds up compilation of that workload by about 4%. It also speeds up a couple of the rustc-benchmarks by 1-2%.

@bors
Collaborator

bors commented Nov 21, 2016

☔ The latest upstream changes (presumably #37824) made this pull request unmergeable. Please resolve the merge conflicts.

@michaelwoerister
Member

Looks good to me. Happy to approve it after a rebase.

@eddyb
Member

eddyb commented Nov 21, 2016

If it's measurably better I have no real concerns - I imagine it'd eventually be `&[&Expr]` vs `&[Expr]`, unless we end up with some sort of ID linked list to help with incremental recompilation or something.

@nnethercote
Contributor Author

I rebased. r? @michaelwoerister

@michaelwoerister
Member

@bors r+

@bors
Collaborator

bors commented Nov 21, 2016

📌 Commit 382d3b0 has been approved by michaelwoerister

@bors
Collaborator

bors commented Nov 21, 2016

⌛ Testing commit 382d3b0 with merge 82d8833...

bors added a commit that referenced this pull request Nov 21, 2016
Change HirVec<P<T>> to HirVec<T> in hir::Expr.
@nnethercote nnethercote deleted the no-HirVec-of-P branch November 22, 2016 04:16
Labels: relnotes, T-compiler

9 participants