-
Notifications
You must be signed in to change notification settings - Fork 403
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Fix performance issue with nanstd and nanvar #2308
base: master
Are you sure you want to change the base?
Fix performance issue with nanstd and nanvar #2308
Conversation
I think the failures are due to missing overloads of I have on going work on xsimd to push these functions, I'll try to push them quickly so that we can merge this PR. |
Thanks a lot @JohanMabille ! Actually, the unittest without SIMD failed too. I will investigate it in the meanwhile. |
bc16cdd
to
622647d
Compare
622647d
to
d01be62
Compare
return nanmean<result_type>(square(cast<result_type>(sc) - std::move(mrv)), std::forward<X>(axes), es); | ||
// note: otherwise the result is wrong with 'immediate' evaluation strategy | ||
auto sc_shifted = eval(cast<result_type>(sc) - std::move(mrv)); | ||
return nanmean<result_type>(square(sc_shifted), std::forward<X>(axes), es); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am completely lost and I summarize my findings here:
Test code:
xt::xarray<double> aN = {{ nanv, nanv, 123, 3 }, { 1, 2, nanv, 3 }, { 1, 1, nanv, 3 }};
xt::xarray<double> ret2 = xt::nanvar(aN, {1});
std::cout << "Lazy: " << ret2 << std::endl;
xt::xarray<double> ret = xt::nanvar(aN, {1}, xt::evaluation_strategy::immediate);
std::cout << "Immediate: " << ret << std::endl;
- First case:
auto sc_shifted = eval(cast<result_type>(sc) - std::move(mrv));
return nanmean<result_type>(square(sc_shifted), std::forward<X>(axes), es);
Result with gcc8
:
Lazy: { 2400. , 0.666667, 0.888889}
Immediate: { 3600. , 0.666667, 0.888889}
Result with Clang7
: segfault
je 0x4198a0 ; <xt::xreducer_stepper<xt::xreducer_functors<xt::detail::nan_plus, xt::const_value<double>, xt::detail::nan_plus>, xt::xshared_expression<xt::xfunction<xt::detail::lambda_adapt<xt::square_fct>, xt::xarray_container<xt::uvector<double, std::allocator<double> >, (xt::layout_type)1, xt::svector<unsigned long, 4ul, std::allocator<unsigned long>, true>, xt::xtensor_expression_tag> const&> >, std::array<unsigned long, 1ul>, xt::reducer_options<double, std::tuple<xt::evaluation_strategy::lazy_type> > >::aggregate_impl(unsigned long, std::integral_constant<bool, false>) const+784>
shl $0x3,%rcx
neg %rax
mov %rdi,%rsi
nopw 0x0(%rax,%rax,1)
vmovsd (%rsi),%xmm0
- Second case:
return nanmean<result_type>(square(cast<result_type>(sc) - std::move(mrv)), std::forward<X>(axes), es);
Result with gcc8
:
Lazy: { 3600. , 0.666667, 0.888889}
Immediate: { 7365.388889, 4.666667, 12. }
Result with clang7
:
Lazy: { 3600. , 0.666667, 0.888889}
Immediate: { 7365.388889, 12. , 363. }
@JohanMabille @tdegeus Can you give some hint? Thank you!
Checklist
Description