
[nll] why is inflate so dang slow? #51377


Closed
nikomatsakis opened this issue Jun 5, 2018 · 10 comments
Labels
A-NLL Area: Non-lexical lifetimes (NLL) C-enhancement Category: An issue proposing an enhancement or a PR with one. NLL-performant Working towards the "performance is good" goal T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Contributor

Looking at http://perf.rust-lang.org/nll-dashboard.html, the inflate test is way out of line with the others — what is making it so slow? We need to do some profiling.

I wrote up a guide that describes how I do profiling.

@nikomatsakis nikomatsakis added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-nll NLL-performant Working towards the "performance is good" goal labels Jun 5, 2018
@Mark-Simulacrum
Member

Note that when profiling you should use the in-tree inflate, not the crate from crates.io, as their code is quite different.

@spastorino spastorino self-assigned this Jun 5, 2018
@spastorino
Member

[santiago@archlinux inflate (master)]$ perf focus '{do_mir_borrowck}' --tree-callees --tree-min-percent 3
Matcher    : {do_mir_borrowck}
Matches    : 2290
Not Matches: 331
Percentage : 87%

Tree
| matched `{do_mir_borrowck}` (87% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (72% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check_internal (64% total, 0% self)
: : : | rustc_mir::borrow_check::nll::type_check::type_check::_$u7b$$u7b$closure$u7d$$u7d$::h5e644cf9693979bb (63% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (63% total, 59% self)
: : | rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::solve (6% total, 0% self)
: : : | rustc::util::common::time (6% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::solve_inner (6% total, 0% self)
: : : : : | rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::compute_region_values (6% total, 0% self)
: : : : : : | <rustc_data_structures::bitvec::SparseBitMatrix<R, C>>::merge (6% total, 0% self)
: : : : : : : | <alloc::btree::map::BTreeMap<K, V>>::entry (5% total, 1% self)
: : : : : : : : | alloc::btree::search::search_tree (3% total, 0% self)
: : : : : : : : : | alloc::btree::search::search_tree (3% total, 3% self)
: | rustc::mir::visit::Visitor::visit_mir (5% total, 3% self)
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'gcx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (4% total, 0% self)
: | rustc_mir::dataflow::do_dataflow (3% total, 0% self)
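The hotspot at the bottom of that tree is `SparseBitMatrix::merge` spending its time in `BTreeMap::entry` and `search_tree`. A minimal sketch of that storage pattern (hypothetical types, not rustc's actual `SparseBitMatrix`) shows why each merge during the fixed-point loop pays for a B-tree search just to reach a row's bits:

```rust
use std::collections::BTreeMap;

// Hypothetical simplification: rows are stored sparsely in a BTreeMap,
// each row being a flat vector of 64-bit words of column bits.
struct SparseBitMatrix {
    rows: BTreeMap<usize, Vec<u64>>,
    columns: usize,
}

impl SparseBitMatrix {
    fn new(columns: usize) -> Self {
        SparseBitMatrix { rows: BTreeMap::new(), columns }
    }

    fn words(&self) -> usize {
        (self.columns + 63) / 64
    }

    // Set bit (row, col); returns true if the bit was newly set.
    fn insert(&mut self, row: usize, col: usize) -> bool {
        let words = self.words();
        // `entry` does a B-tree search on every call -- this is the
        // `BTreeMap::entry` / `search_tree` time visible in the profile.
        let bits = self.rows.entry(row).or_insert_with(|| vec![0; words]);
        let (word, mask) = (col / 64, 1u64 << (col % 64));
        let changed = bits[word] & mask == 0;
        bits[word] |= mask;
        changed
    }

    // OR all bits of `from`'s row into `into`'s row; returns whether
    // anything changed (a fixed-point loop keys off this).
    fn merge(&mut self, from: usize, into: usize) -> bool {
        let src = match self.rows.get(&from) {
            Some(bits) => bits.clone(),
            None => return false,
        };
        let words = self.words();
        let dst = self.rows.entry(into).or_insert_with(|| vec![0; words]);
        let mut changed = false;
        for (d, s) in dst.iter_mut().zip(src) {
            let new = *d | s;
            changed |= new != *d;
            *d = new;
        }
        changed
    }
}
```

With rows keyed through a B-tree, every insert and merge in a hot fixed-point loop repeats a logarithmic search; indexing rows directly (e.g. a dense `Vec` of rows) is the obvious direction for speeding this up.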

@spastorino spastorino removed their assignment Jun 6, 2018
Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue Jun 6, 2018
NLL performance boost

This makes the [inflate](https://github.com/rust-lang-nursery/rustc-perf/tree/master/collector/benchmarks/inflate) benchmark compile 2 times faster on my computer when NLL is enabled.
This does not fix rust-lang#51377 completely — it's still 4-5 times slower than without NLL — but it's a start.
Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue Jun 7, 2018
NLL performance boost

bors added a commit that referenced this issue Jun 7, 2018
NLL performance boost

kennytm added a commit to kennytm/rust that referenced this issue Jun 7, 2018
NLL performance boost

@nnethercote
Contributor

Speaking about the non-NLL case: inflate is really weird. I don't know why, but the profiles for it look entirely different to those for any of the other rustc-perf benchmarks. It spends an unusual amount of time in obligation_forest code. In particular, it makes a bazillion calls to shallow_resolve, which in turn makes a bazillion calls to get_root_key within ena. From Cachegrind, here is a key hotspot:

          .      fn process_obligation(&mut self,
          .                            pending_obligation: &mut Self::Obligation)
          .                            -> ProcessResult<Self::Obligation, Self::Error>
          .      {
          .          // if we were stalled on some unresolved variables, first check
          .          // whether any of them have been resolved; if not, don't bother
          .          // doing more work yet
 55,898,106          if !pending_obligation.stalled_on.is_empty() {
 59,971,481              if pending_obligation.stalled_on.iter().all(|&ty| {
179,914,443                  let resolved_ty = self.selcx.infcx().shallow_resolve(&ty);
          .                  resolved_ty == ty // nothing changed here
          .              }) {
          .                  debug!("process_predicate: pending obligation {:?} still stalled on {:?}",
          .                         self.selcx.infcx()
          .                             .resolve_type_vars_if_possible(&pending_obligation.obligation),
          .                         pending_obligation.stalled_on);
167,524,737                  return ProcessResult::Unchanged;
          .              }
     94,744              pending_obligation.stalled_on = vec![];
          .          }

#51411 improves some of that.

clap-rs is the benchmark most similar to inflate, but it still spends much less time in those parts of the code.

I don't know if those functions are used with NLL, but hopefully that's useful info.
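For context on the `get_root_key` cost: ena's unification table is essentially union-find, so each `shallow_resolve` of an inference variable walks parent links to a root key. A minimal union-find sketch with path compression (hypothetical code, not ena's actual implementation) illustrates the per-call work:

```rust
// Minimal union-find table. Each key's parent link eventually leads to a
// root; `find` is the analogue of ena's `get_root_key`.
struct UnificationTable {
    parent: Vec<usize>,
}

impl UnificationTable {
    fn new(n: usize) -> Self {
        // Every key starts as its own root.
        UnificationTable { parent: (0..n).collect() }
    }

    // Follow parent links to the root, halving the path as we go.
    // Even amortized near-O(1), doing this "a bazillion" times per
    // obligation-forest pass adds up, as the Cachegrind counts show.
    fn find(&mut self, mut key: usize) -> usize {
        while self.parent[key] != key {
            let grandparent = self.parent[self.parent[key]];
            self.parent[key] = grandparent; // path halving
            key = grandparent;
        }
        key
    }

    // Unify two keys by linking one root under the other.
    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        self.parent[ra] = rb;
    }
}
```

The stalled-on check in `process_obligation` above calls `shallow_resolve` once per stalled type on every pass, which is why the lookup cost multiplies with the number of pending obligations.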

@jkordish jkordish added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-NLL Area: Non-lexical lifetimes (NLL) labels Jun 13, 2018
@nikomatsakis
Contributor Author

@nnethercote pretty interesting. As of now, NLL time is 300% of normal time, which is still high, though not as high as it was at the start. I guess it is worth profiling this separately, particularly once #51411 lands... it seems like #51538 does not particularly help.

@eddyb
Member

eddyb commented Jun 26, 2018

As usual, you really need to keep in mind that the inflate version that's in rustc-perf is ridiculously bloated, as it predates image-rs/inflate#35 (the PR looks small but try expanding the macros).

Could we please rename it to inflate-old-and-bloated (or a similar shorthand)?
Ideally we could apply such a rename retroactively to past data as well.

Also, I'd rather tell people to update their inflate than block NLL. And we should ask @bvssvni to release a new 0.x.y version for every 0.x of inflate, with the de-bloating applied.

clap-rs is the benchmark most similar to inflate

Funny you should mention that (it's also a pre-debloat) rust-lang/rustc-perf#183 (comment)

As an aside, this sort of thing (one function with a lot of code in it) is why I started hacking on rustc many years ago. I was auto-generating a huge machine code decoder function and rustc was OOM-ing on it. IIRC, one of the culprits was borrowck, failing to allocate roughly 700TB for one Vec.

@bvssvni

bvssvni commented Jun 26, 2018

@eddyb can you find somebody more familiar with inflate to make these releases?

@eddyb
Member

eddyb commented Jun 26, 2018

@bvssvni Do you mean making the git commits, or also publishing? I don't know anyone other than you who can publish piston crates (I don't think I can, anyway, but I'm not sure how to check).
cc @RazrFalcon for the former, but I might do it myself if I find time for it.

@bvssvni

bvssvni commented Jun 26, 2018

@eddyb all PistonCollaborators can publish. You're one of them.

@eddyb
Member

eddyb commented Jun 28, 2018

@bvssvni Makes sense — is that automatic via GitHub? I'm not sure what the process is, and I can't find you on IRC. In the meantime, I realized that the past version bumps don't appear to be necessary, and left a comment suggesting we just publish the latest 0.4 patch version under all previous minor versions: image-rs/inflate#37 (comment).

@nikomatsakis
Copy link
Contributor Author

I'm going to close this issue, as inflate is now at 278.32%. That's still slow, but it's a tiny edge-case test anyway, and it's no longer 10x.


7 participants