ILAENV crash #283

Open
sethrj opened this issue Nov 25, 2020 · 0 comments

sethrj (Collaborator) commented Nov 25, 2020

Not sure what's happening here, but after upgrading to Catalina and GCC 10 I'm getting crashes inside the Anasazi Davidson solver due to a LAPACK call:

```
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x7ffdefbf9eef)
  * frame #0: 0x0000000107273c82 libopenblas.0.dylib`ilaenv_ + 1106
    frame #1: 0x0000000103c11f06 libteuchosnumerics.13.dylib`Teuchos::LAPACK<int, double>::ILAENV(int const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int const&, int const&, int const&, int const&) const + 294
    frame #2: 0x000000010031663e libfortrilinos_hl.dylib`Anasazi::SVQBOrthoManager<double, Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >, Tpetra::Operator<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > >::findBasis(this=0x000000010c305bf0, X=0x000000010c308b30, MX=RCP<Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > > @ 0x00007ffeefbfbf58, C=Array<Teuchos::RCP<Teuchos::SerialDenseMatrix<int, double> > > @ 0x00007ffeefbfbf40, B=RCP<Teuchos::SerialDenseMatrix<int, double> > @ 0x00007ffeefbfbf18, Q=Array<Teuchos::RCP<const Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > > > @ 0x00007ffeefbfbf00, normalize_in=true) const at AnasaziSVQBOrthoManager.hpp:548:26
```

where the offending LAPACK call is:

```cpp
int lwork = lapack.ILAENV(1,"hetrd","VU",xc,-1,-1,-1);
```

and `xc` = 1.

I am unable to reproduce this in a standalone C++ test that calls the same LAPACK function with the same Trilinos install :(

Environment

macOS Catalina with apple-clang@12.0.0, using gcc@10.2.0 for gfortran support:

```
-- darwin-catalina-x86_64 / apple-clang@12.0.0 ------------------
rbtgxam openblas@0.3.12~consistent_fpcsr~ilp64+pic+shared threads=none
```

Possible

I've hit a nearly identical bug on an Intel Windows ICC/VS2015 system calling the Intel LAPACK implementation.

This may be a result of the LAPACK/BLAS interfaces being defined by legacy Fortran ABIs, which are unspecified and not natively compatible with C++. Trilinos attempts to detect the Fortran name-mangling scheme and ABI from the configured Fortran compiler, which may not match the ABI used to build the library. This, for example, is why Teuchos's SDOT crashes by default with Apple's vecLib: the vecLib-defined ABI uses a different Fortran calling convention.

The best solution may be to use the CBLAS and LAPACKE interfaces instead. The vecLib framework, the OpenBLAS library, and others install the correct C interface headers for their routines, so those routines could be used without relying on potentially fragile ABI detection.
