[nll] fewer allocations in liveness, use dirty list #51819

Closed
nikomatsakis opened this issue Jun 26, 2018 · 11 comments
Labels
NLL-performant: Working towards the "performance is good" goal
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Contributor

@nnethercote reports that the liveness computations do a lot of allocations. Indeed, the liveness results are stored in a Vec<IdxSetBuf>:

pub ins: IndexVec<BasicBlock, LocalSet>,

pub type LocalSet = IdxSetBuf<Local>;

This means that we allocate a separate bitset for the ins (and outs!) of each basic block, which is rather inefficient. The other dataflow implementations use a different setup: they have just one big index set that holds the bits for every basic block in a single allocation. They also avoid allocating both ins and outs sets -- since liveness is a reverse analysis, they would basically keep only the single outs vector (what is live on exit).
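For concreteness, here is a minimal sketch of that single-allocation layout, using a flat Vec<u64> and hypothetical names rather than rustc's actual IdxSetBuf/IndexVec types; it only illustrates the idea of one contiguous bit matrix indexed by (block, local).

// Hypothetical sketch, not rustc's API: one flat allocation holds a row of
// bits per basic block, instead of one heap-allocated bitset per block.
struct FlatLivenessBits {
    words_per_block: usize,
    words: Vec<u64>,
}

impl FlatLivenessBits {
    fn new(num_blocks: usize, num_locals: usize) -> Self {
        let words_per_block = (num_locals + 63) / 64;
        FlatLivenessBits {
            words_per_block,
            words: vec![0; num_blocks * words_per_block],
        }
    }

    // Is `local` set in the row belonging to `block`?
    fn contains(&self, block: usize, local: usize) -> bool {
        let word = self.words[block * self.words_per_block + local / 64];
        word & (1 << (local % 64)) != 0
    }

    // Mark `local` as live in the row belonging to `block`.
    fn insert(&mut self, block: usize, local: usize) {
        self.words[block * self.words_per_block + local / 64] |= 1 << (local % 64);
    }
}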

The ins vector is only used during generation of the liveness results here (as well as some assertions and debug printouts later on):

// outs[b] = ∪ {ins of successors}
bits.clear();
for &successor in mir.basic_blocks()[b].terminator().successors() {
    bits.union(&ins[successor]);
}
outs[b].clone_from(&bits);

In fact, you don't really need it there -- instead, when you process a block X, you would compute take outs[X], subtract the kills and add the defs, and then take that resulting bit set and propagate it to each predecessor of X.
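A hedged sketch of that scheme, using plain Vec<bool> sets and hypothetical names instead of rustc's bitset types: compute the live-on-entry bits for a block from its own out set and union them directly into each predecessor's out set, with no ins vector at all.

// Hypothetical illustration of processing one block without an `ins` vector.
fn propagate_without_ins(
    x: usize,
    outs: &mut [Vec<bool>],      // outs[b][l]: local l is live on exit from b
    defs: &[Vec<bool>],          // defs[b][l]: local l is defined in b
    uses: &[Vec<bool>],          // uses[b][l]: local l is used in b
    predecessors: &[Vec<usize>], // predecessors[b]: blocks that jump to b
) {
    let num_locals = outs[x].len();

    // bits = use ∪ (outs[x] - def), i.e. what is live on entry to x.
    let mut bits = vec![false; num_locals];
    for l in 0..num_locals {
        bits[l] = uses[x][l] || (outs[x][l] && !defs[x][l]);
    }

    // Whatever is live on entry to x is live on exit from every predecessor of x.
    for &pred in &predecessors[x] {
        for l in 0..num_locals {
            outs[pred][l] |= bits[l];
        }
    }
}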

@nikomatsakis added the T-compiler, WG-compiler-nll, and NLL-performant labels on Jun 26, 2018
@nikomatsakis changed the title from "[nll] fewer allocations in liveness" to "[nll] fewer allocations in liveness, use dirty list" on Jun 26, 2018
@nikomatsakis
Contributor Author

In addition, while we are modifying liveness, we can probably adjust the main computation to be less naive:

while changed {
    changed = false;
    for b in mir.basic_blocks().indices().rev() {
        // outs[b] = ∪ {ins of successors}
        bits.clear();
        for &successor in mir.basic_blocks()[b].terminator().successors() {
            bits.union(&ins[successor]);
        }
        outs[b].clone_from(&bits);
        // bits = use ∪ (bits - def)
        def_use[b].apply(&mut bits);
        // update bits on entry and flag if they have changed
        if ins[b] != bits {
            ins[b].clone_from(&bits);
            changed = true;
        }
    }
}

Here you can see that we repeatedly iterate over all the basic blocks and propagate their bits. We should avoid re-propagating bits from basic blocks whose "out set" did not change in the previous iteration.
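A hedged sketch of the dirty-list idea, again with hypothetical names and plain Vec<bool> sets rather than rustc's bitsets: a block is re-queued only when the live-on-entry set of one of its successors actually changes.

use std::collections::VecDeque;

// Hypothetical illustration, not rustc's liveness code.
fn liveness_fixpoint(
    num_blocks: usize,
    num_locals: usize,
    successors: &[Vec<usize>], // successors[b]: blocks that b can jump to
    defs: &[Vec<bool>],        // defs[b][l]: local l is defined in b
    uses: &[Vec<bool>],        // uses[b][l]: local l is used in b
) -> (Vec<Vec<bool>>, Vec<Vec<bool>>) {
    // Invert the successor edges so we know which blocks to re-queue when a
    // block's live-on-entry set grows.
    let mut predecessors = vec![Vec::new(); num_blocks];
    for b in 0..num_blocks {
        for &succ in &successors[b] {
            predecessors[succ].push(b);
        }
    }

    let mut ins = vec![vec![false; num_locals]; num_blocks];
    let mut outs = vec![vec![false; num_locals]; num_blocks];

    // Seed the dirty list with every block, in reverse order since this is a
    // backward analysis.
    let mut dirty: VecDeque<usize> = (0..num_blocks).rev().collect();
    let mut queued = vec![true; num_blocks];

    while let Some(b) = dirty.pop_front() {
        queued[b] = false;

        // outs[b] = ∪ {ins of successors}
        let mut bits = vec![false; num_locals];
        for &succ in &successors[b] {
            for l in 0..num_locals {
                bits[l] |= ins[succ][l];
            }
        }
        outs[b].clone_from(&bits);

        // bits = use ∪ (bits - def)
        for l in 0..num_locals {
            bits[l] = uses[b][l] || (bits[l] && !defs[b][l]);
        }

        // Only if the entry set changed do the predecessors need another look.
        if ins[b] != bits {
            ins[b] = bits;
            for &pred in &predecessors[b] {
                if !queued[pred] {
                    queued[pred] = true;
                    dirty.push_back(pred);
                }
            }
        }
    }

    (ins, outs)
}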

@nnethercote
Contributor

nnethercote commented Jun 27, 2018

@nikomatsakis: are you able to make these changes?

@nikomatsakis
Copy link
Contributor Author

@nnethercote I haven't been able to get to it yet; if you want to take a stab at it, please feel free!

@nnethercote
Contributor

Ok, I've started looking at this. I see various possibilities, most of which don't overlap with the things you've identified above :)

The other dataflow implementations use a different setup.

I think I see why this code does it this way -- it has various uses of solo bitsets, and it makes the code simpler if the per-BB bitsets can have the same form.

The clone_from calls are silly -- each one is replacing a heap-allocated bitset of length N with a clone of another heap-allocated bitset of length N. It should be possible to do an in-place overwrite.
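For illustration, an in-place overwrite on a word-backed bitset might look like the following; the type and method names here are hypothetical, not rustc's IdxSetBuf API.

// Hypothetical sketch: copy the source's words into the existing backing
// storage rather than building a fresh clone and dropping the old buffer.
struct BitSetBuf {
    words: Vec<u64>,
}

impl BitSetBuf {
    // Overwrite `self` with the contents of `other`; both sets are assumed
    // to cover the same domain and therefore have the same word count.
    fn overwrite(&mut self, other: &BitSetBuf) {
        debug_assert_eq!(self.words.len(), other.words.len());
        self.words.copy_from_slice(&other.words);
    }
}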

you would compute take outs[X]

I don't understand this sentence... is there a typo, or does "take" have a meaning in this context that I'm not familiar with?

I can also see how to avoid some more allocations in defs_uses().

Also, in the common case liveness_of_locals() is called twice in a row, with a different mode each time (regular and drop). I wonder if the two calls can be combined somehow.

@nnethercote
Contributor

#51869 and #51870 implement some of these ideas, getting rid of a chunk of the liveness allocations. The speed-ups were smaller than I'd hoped for, though. I'll do some more profiling tomorrow.

@nikomatsakis
Contributor Author

OK, @nnethercote I left a few comments on those PRs. Good catch in both cases. (Indeed I may have been wrong about which allocations are most important! =)

I still think it'd be good to make the other changes I described, in particular perhaps the dirty list (which isn't necessarily about allocations -- I hope we're not allocating per block in that inner loop!), so let's not close the bug until we're fully satisfied.

@nikomatsakis
Contributor Author

@nnethercote I'm going to take a stab at introducing a dirty list into the propagation, unless you are doing so already. It seems orthogonal enough from your existing PRs. Would that step on your toes at all?

@nikomatsakis
Contributor Author

@nikomatsakis
Contributor Author

The branch now contains (a) a dirty list and (b) removal of the ins sets. I am going to measure the performance effects of those two changes separately.

@nnethercote
Contributor

Looks good; #51869 and #51870 are all I have done on this stuff.

@nikomatsakis
Contributor Author

Now that #51896 etc. has landed, I'm going to close this, as we don't have immediately actionable stuff in here.
