unicode_normalization benchmark from rustc-perf is slow #57718

Closed
nnethercote opened this issue Jan 18, 2019 · 3 comments
Labels
I-compiletime Issue: Problems and improvements with respect to compile times.

Comments

@nnethercote
Contributor

The unicode_normalization benchmark was just added to rustc-perf: rust-lang/rustc-perf#328

Here is the high-level data from Cachegrind for a Clean-Check run.

--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
45,053,275,971 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir                      file:function
--------------------------------------------------------------------------------
3,743,278,359 ( 8.31%)  /home/njn/moz/rust0/src/librustc/infer/lexical_region_resolve/mod.rs:rustc::infer::lexical_region_resolve::LexicalResolver::expand_node
3,027,925,390 ( 6.72%)  /home/njn/moz/rust0/src/librustc/middle/region.rs:<rustc::ty::sty::RegionKind as core::cmp::PartialEq>::eq
2,217,916,340 ( 4.92%)  /home/njn/moz/rust0/src/librustc/ty/sty.rs:<rustc::ty::sty::RegionKind as core::cmp::PartialEq>::eq
2,165,108,405 ( 4.81%)  /home/njn/moz/rust0/src/librustc/ty/query/plumbing.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::try_get_with
1,872,071,197 ( 4.16%)  /home/njn/moz/rust0/src/librustc/infer/lexical_region_resolve/mod.rs:rustc::infer::lexical_region_resolve::LexicalResolver::infer_variable_values
1,782,556,352 ( 3.96%)  /home/njn/.cargo/registry/src/github.com-1ecc6299db9ec823/smallvec-0.6.7/lib.rs:rustc::infer::lexical_region_resolve::LexicalResolver::infer_variable_values
1,678,166,351 ( 3.72%)  /home/njn/moz/rust0/src/libstd/collections/hash/map.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::try_get_with
1,652,330,349 ( 3.67%)  /home/njn/moz/rust0/src/libstd/collections/hash/table.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::try_get_with
1,491,568,194 ( 3.31%)  /home/njn/moz/rust0/src/librustc_mir/hair/pattern/_match.rs:rustc_mir::hair::pattern::_match::IntRange::from_ctor
1,142,511,763 ( 2.54%)  /home/njn/moz/rust0/src/librustc/ty/layout.rs:<rustc::ty::layout::LayoutCx<'tcx, rustc::ty::context::TyCtxt<'a, 'tcx, 'tcx>> as rustc_target::abi::LayoutOf>::layout_of
  983,408,925 ( 2.18%)  /home/njn/moz/rust0/src/libcore/num/mod.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::try_get_with
  925,902,672 ( 2.06%)  /home/njn/moz/rust0/src/libcore/option.rs:rustc::ty::sty::Const::assert_bits
  836,542,801 ( 1.86%)  /home/njn/moz/rust0/src/libcore/ptr.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::try_get_with
  738,053,639 ( 1.64%)  /home/njn/moz/rust0/src/librustc_mir/hair/pattern/_match.rs:rustc_mir::hair::pattern::_match::constructor_intersects_pattern
  698,283,902 ( 1.55%)  /home/njn/moz/rust0/src/librustc_mir/hair/pattern/_match.rs:rustc_mir::hair::pattern::_match::specialize
  623,416,500 ( 1.38%)  /home/njn/moz/rust0/src/libcore/cmp.rs:rustc::infer::lexical_region_resolve::LexicalResolver::expand_node
  605,680,300 ( 1.34%)  /home/njn/moz/rust0/<::rustc_data_structures::indexed_vec::newtype_index macros>:<rustc::ty::sty::RegionKind as core::cmp::PartialEq>::eq
  590,667,449 ( 1.31%)  /home/njn/moz/rust0/src/librustc/ty/context.rs:<&'a rustc::ty::TyS<'a> as rustc::ty::context::Lift<'tcx>>::lift_to_tcx
  588,790,903 ( 1.31%)  /home/njn/moz/rust0/src/librustc_mir/hair/pattern/_match.rs:rustc_mir::hair::pattern::_match::IntRange::from_pat
  566,270,290 ( 1.26%)  /home/njn/moz/rust0/src/librustc/ty/sty.rs:rustc::ty::sty::Const::assert_bits
  535,797,872 ( 1.19%)  /home/njn/moz/rust0/src/libcore/slice/mod.rs:rustc::infer::lexical_region_resolve::LexicalResolver::infer_variable_values
  531,048,306 ( 1.18%)  /home/njn/moz/rust0/src/librustc/ty/mod.rs:rustc::ty::ParamEnv::and
  504,837,733 ( 1.12%)  /home/njn/moz/rust0/src/librustc/ty/layout.rs:<rustc::ty::layout::LayoutCx<'tcx, rustc::ty::context::TyCtxt<'a, 'tcx, 'tcx>>>::record_layout_for_printing
  491,555,270 ( 1.09%)  /home/njn/moz/rust0/src/librustc/mir/interpret/value.rs:<rustc::mir::interpret::value::Scalar<Tag>>::to_bits
  478,260,738 ( 1.06%)  /home/njn/moz/rust0/src/librustc/ty/query/plumbing.rs:<rustc::ty::layout::LayoutCx<'tcx, rustc::ty::context::TyCtxt<'a, 'tcx, 'tcx>> as rustc_target::abi::LayoutOf>::layout_of

The first three entries show that expand_node is super-hot:

fn expand_node(
    &self,
    a_region: Region<'tcx>,
    b_vid: RegionVid,
    b_data: &mut VarValue<'tcx>,
) -> bool {
    debug!("expand_node({:?}, {:?} == {:?})", a_region, b_vid, b_data);
    match *a_region {
        // Check if this relationship is implied by a given.
        ty::ReEarlyBound(_) | ty::ReFree(_) => if self.data.givens.contains(&(a_region, b_vid))
        {
            debug!("given");
            return false;
        },
        _ => {}
    }
    match *b_data {
        VarValue::Value(cur_region) => {
            let mut lub = self.lub_concrete_regions(a_region, cur_region);
            if lub == cur_region {
                return false;
            }
            // Watch out for `'b: !1` relationships, where the
            // universe of `'b` can't name the placeholder `!1`. In
            // that case, we have to grow `'b` to be `'static` for the
            // relationship to hold. This is obviously a kind of sub-optimal
            // choice -- in the future, when we incorporate a knowledge
            // of the parameter environment, we might be able to find a
            // tighter bound than `'static`.
            //
            // (This might e.g. arise from being asked to prove `for<'a> { 'b: 'a }`.)
            let b_universe = self.var_infos[b_vid].universe;
            if let ty::RePlaceholder(p) = lub {
                if b_universe.cannot_name(p.universe) {
                    lub = self.tcx().types.re_static;
                }
            }
            debug!(
                "Expanding value of {:?} from {:?} to {:?}",
                b_vid, cur_region, lub
            );
            *b_data = VarValue::Value(lub);
            return true;
        }
        VarValue::ErrorValue => {
            return false;
        }
    }
}

Most of the calls take the early return on line 241 of the source file (the return false inside the if), i.e. the lub == cur_region comparison succeeds. That explains why <rustc::ty::sty::RegionKind as core::cmp::PartialEq>::eq shows up as hot.

expand_node is called from a closure within expansion:

fn expansion(&self, var_values: &mut LexicalRegionResolutions<'tcx>) {
    self.iterate_until_fixed_point("Expansion", |constraint, origin| {
        debug!("expansion: constraint={:?} origin={:?}", constraint, origin);
        match *constraint {
            Constraint::RegSubVar(a_region, b_vid) => {
                let b_data = var_values.value_mut(b_vid);
                (self.expand_node(a_region, b_vid, b_data), false)
            }
            Constraint::VarSubVar(a_vid, b_vid) => match *var_values.value(a_vid) {
                VarValue::ErrorValue => (false, false),
                VarValue::Value(a_region) => {
                    let b_node = var_values.value_mut(b_vid);
                    let changed = self.expand_node(a_region, b_vid, b_node);
                    let retain = match *b_node {
                        VarValue::Value(ReStatic) | VarValue::ErrorValue => false,
                        _ => true
                    };
                    (changed, retain)
                }
            },
            Constraint::RegSubReg(..) | Constraint::VarSubReg(..) => {
                // These constraints are checked after expansion
                // is done, in `collect_errors`.
                (false, false)
            }
        }
    })
}

That closure is passed to iterate_until_fixed_point:

fn iterate_until_fixed_point<F>(&self, tag: &str, mut body: F)
where
    F: FnMut(&Constraint<'tcx>, &SubregionOrigin<'tcx>) -> (bool, bool),
{
    let mut constraints: SmallVec<[_; 16]> = self.data.constraints.iter().collect();
    let mut iteration = 0;
    let mut changed = true;
    while changed {
        changed = false;
        iteration += 1;
        debug!("---- {} Iteration {}{}", "#", tag, iteration);
        constraints.retain(|(constraint, origin)| {
            let (edge_changed, retain) = body(constraint, origin);
            if edge_changed {
                debug!("Updated due to constraint {:?}", constraint);
                changed = true;
            }
            retain
        });
    }
    debug!("---- {} Complete after {} iteration(s)", tag, iteration);
}

Most of the calls to iterate_until_fixed_point have a tiny number of constraints and iterations. But for unicode_normalization there are two exceptional calls.

The first exceptional call has 30,902 constraints (20,598 VarSubVar ones followed by 10,304 RegSubVar ones), coming from this match:
https://github.com/nnethercote/rustc-perf/blob/47adadaf3541db9382d5fb3cba5ca7c75064219f/collector/benchmarks/unicode_normalization/src/tables.rs#L1788-L3853

It takes 2,064 iterations to process; by the end the constraints list is empty.

The second exceptional call has 55,172 constraints (36,778 VarSubVar ones followed by 18,394 RegSubVar ones), coming from this match:

https://github.com/nnethercote/rustc-perf/blob/47adadaf3541db9382d5fb3cba5ca7c75064219f/collector/benchmarks/unicode_normalization/src/tables.rs#L3855-L7538

It takes 3,682 iterations to process; again, by the end the constraints list is empty.
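
To illustrate how a repeated full-pass loop like iterate_until_fixed_point can need thousands of iterations, here is a minimal toy sketch. It is not rustc code: u32 values with max stand in for regions and lub, a pair (a, b) stands in for a VarSubVar constraint, the chain helper is hypothetical, and the retain-based pruning is left out. Whether the constraints in unicode_normalization really form long chains visited against the direction of flow is an assumption, not something the data above establishes.

```rust
/// Toy stand-in for the fixed-point loop: the constraint `(a, b)` means
/// "values[b] must be at least values[a]". Returns the number of full
/// passes and the total number of constraint visits.
fn iterate_until_fixed_point(
    values: &mut [u32],
    constraints: &[(usize, usize)],
) -> (usize, usize) {
    let mut iterations = 0;
    let mut visits = 0;
    let mut changed = true;
    while changed {
        changed = false;
        iterations += 1;
        for &(a, b) in constraints {
            visits += 1;
            // `max` plays the role of the least upper bound here.
            let lub = values[a].max(values[b]);
            if lub != values[b] {
                values[b] = lub;
                changed = true;
            }
        }
    }
    (iterations, visits)
}

/// Hypothetical helper: builds the chain 0 -> 1 -> ... -> n-1 where only
/// variable 0 starts with a non-zero value. With `reversed` set, the
/// constraints are listed against the direction in which values flow.
fn chain(n: usize, reversed: bool) -> (Vec<u32>, Vec<(usize, usize)>) {
    let mut values = vec![0; n];
    values[0] = 1;
    let mut constraints: Vec<(usize, usize)> = (0..n - 1).map(|i| (i, i + 1)).collect();
    if reversed {
        constraints.reverse();
    }
    (values, constraints)
}
```

In this toy model, chain(1000, false) converges in 2 passes (1,998 constraint visits), while chain(1000, true) needs 1,000 passes (999,000 visits), because each pass moves the value only one step along the chain. The work grows roughly quadratically with the chain length in the bad ordering.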

cc @rust-lang/wg-compiler-performance

@nnethercote
Contributor Author

#57719 improves things by up to 4% by inlining expand_node. But that's just tweaking around the edges. An algorithmic improvement will be necessary to really improve things.
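
For illustration only, one common alternative to repeated full passes in this kind of propagation is a worklist that revisits a constraint only when its source variable has changed. The sketch below uses a toy model and is not a proposed rustc patch: u32 values with max stand in for regions and lub, a pair (a, b) stands in for a VarSubVar constraint, and the function name propagate_with_worklist is hypothetical.

```rust
use std::collections::VecDeque;

/// Toy worklist propagation: the constraint `(a, b)` means
/// "values[b] must be at least values[a]". Returns the number of
/// constraint visits needed to reach the fixed point.
fn propagate_with_worklist(values: &mut [u32], constraints: &[(usize, usize)]) -> usize {
    // successors[a] lists every b that has a constraint (a, b).
    let mut successors: Vec<Vec<usize>> = vec![Vec::new(); values.len()];
    for &(a, b) in constraints {
        successors[a].push(b);
    }

    // Start with every variable queued once; `queued` avoids duplicates.
    let mut queue: VecDeque<usize> = (0..values.len()).collect();
    let mut queued = vec![true; values.len()];
    let mut visits = 0;

    while let Some(a) = queue.pop_front() {
        queued[a] = false;
        for &b in &successors[a] {
            visits += 1;
            // `max` plays the role of the least upper bound here.
            let lub = values[a].max(values[b]);
            if lub != values[b] {
                values[b] = lub;
                // Only variables whose value changed need another look.
                if !queued[b] {
                    queued[b] = true;
                    queue.push_back(b);
                }
            }
        }
    }
    visits
}
```

On a 1,000-variable chain 0 -> 1 -> ... -> 999 where only variable 0 starts non-zero, this visits each of the 999 constraints once, regardless of the order in which the constraints are listed.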

@Centril Centril added the I-compiletime Issue: Problems and improvements with respect to compile times. label Jan 18, 2019
@jens1o
Contributor

jens1o commented Jan 26, 2019

ref #55528

@nnethercote
Contributor Author

No point having this open as well as #55528.
