Implicit linear solver fails with Intel compiler #239

Open
aprokop opened this issue Aug 4, 2018 · 7 comments

@aprokop
Collaborator

aprokop commented Aug 4, 2018

@sethrj @tjfulle

Nate from LANL discovered this. I can reproduce it on condo with Intel 18; GCC is fine.

Backtrace:

(gdb) bt
Program received signal SIGSEGV, Segmentation fault.
fortpetra::c_f_pointer_fortpetraoperator (clswrap=..., fptr=0x2ae360058b4810) at /home/xap/code/trilinos-fortrilinos/packages/ForTrilinos/src/tpetra/src/fortpetra.f90:6443
6443      fptr => handle%data
#0  fortpetra::c_f_pointer_fortpetraoperator (clswrap=..., fptr=0x2ae360058b4810) at /home/xap/code/trilinos-fortrilinos/packages/ForTrilinos/src/tpetra/src/fortpetra.f90:6443
#1  0x00002aaaaef4ccfc in fortpetra::swigd_fortpetraoperator_getdomainmap (fresult=..., fself=...) at /home/xap/code/trilinos-fortrilinos/packages/ForTrilinos/src/tpetra/src/fortpetra.f90:6497
#2  0x00002aaaaf031714 in ForTpetraOperator::getDomainMap (this=0x2aaaaf2ea8a8 <fortpetra_mp_c_f_pointer_fortpetraoperator_$HANDLE.0.137>) at /home/xap/code/trilinos-fortrilinos/packages/ForTrilinos/src/tpetra/src/fortpetraFORTRAN_wrap.cxx:746
#3  0x00002aaaaaed2a52 in ForTrilinos::TrilinosSolver::setup_solver (this=0x2aaaaf2ea8a0 <fortpetra_mp_c_f_pointer_fortpetraoperator_$FSELF_PTR.0.137>, paramList=...) at /home/xap/code/trilinos-fortrilinos/packages/ForTrilinos/src/simple/src/solver_handle.cpp:62
#4  0x00002aaaaaec697b in _wrap_TrilinosSolver_setup_solver (farg1=0x2aaaaf2ea8a0 <fortpetra_mp_c_f_pointer_fortpetraoperator_$FSELF_PTR.0.137>, farg2=0x2aaaaf2ea8a8 <fortpetra_mp_c_f_pointer_fortpetraoperator_$HANDLE.0.137>) at /home/xap/code/trilinos-fortrilinos/packages/ForTrilinos/src/simple/src/fortrilinosFORTRAN_wrap.cxx:737
#5  0x00002aaaaaec5dc2 in fortrilinos::swigf_trilinossolver_setup_solver (self=0x2ae360058b4810, paramlist=0x2ae360058b4810) at /home/xap/code/trilinos-fortrilinos/packages/ForTrilinos/src/simple/src/fortrilinos.f90:331
#6  0x0000000000412a24 in main () at /home/xap/code/trilinos-fortrilinos/packages/ForTrilinos/src/simple/test/test_simple_solver_handle.f90:317
#7  0x00000000004108de in main ()
@aprokop
Collaborator Author

aprokop commented Aug 4, 2018

@sethrj Do you have a simple IoC (inversion of control) example to test with Intel (outside of ForTrilinos)?

@sethrj
Collaborator

sethrj commented Aug 6, 2018

Yes, if you look at Examples/fortran/director inside the "callback" branch, that should be what you need.

@mattbement

mattbement commented Aug 30, 2018

OK, in the director example, I try the following with gcc/6.4.0:

swig -fortran -c++ director.i
g++ -c director.cxx director_wrap.cxx
ar rvs director.a director.o director_wrap.o
gfortran -c director.f90
gfortran runme.f90 director.o director.a -lstdc++

This compiles fine, and when run produces the following output:

[sn-fey2] director - ./a.out
 test_subclass
 Transformed: 'whee'
 Transformed: [whee]
 test_transform
 Transformed: "whiskey", and "tango", and "foxtrot", and "sierra", and "juliet"
 Joined with commas: "whiskey", "tango", "foxtrot", "sierra", "juliet"
 test_actual
 Transformed: 'whiskey', and 'tango', and 'foxtrot', and 'sierra', and 'juliet'
 Joined with commas: 'whiskey', 'tango', 'foxtrot', 'sierra', 'juliet'
 Joined with default: 'whiskey', 'tango', 'foxtrot', 'sierra', 'juliet'
 Joined with commas: [whiskey], [tango], [foxtrot], [sierra], [juliet]
 Joined with default: [whiskey]><[tango]><[foxtrot]><[sierra]><[juliet]
 Transformed: "whiskey", and "tango", and "foxtrot", and "sierra", and "juliet"
 Transformed: !whiskey!, and !tango!, and !foxtrot!, and !sierra!, and !juliet!
 Joined with commas: !whiskey!, !tango!, !foxtrot!, !sierra!, !juliet!

I then blow away the .o, .mod, and .a files and try the following with intel/18.0.2

icpc -c director.cxx director_wrap.cxx
ar rvs director.a director.o director_wrap.o
ifort -c director.f90
ifort runme.f90 director.o director.a -lstdc++

I get the following error:

runme.f90(75): error #8212: Omitted field is not initialized. Field initialization missing:   [SWIGDATA]
  allocate(join, source=SingleJoiner())
^
compilation aborted for runme.f90 (code 1)

@mattbement

mattbement commented Aug 30, 2018

So, putting in something like the following lets me get past the compile errors.

  type(SingleJoiner) :: sj
  type(BracketJoiner) :: bj
  ! NOTE: because we're not calling any C functions here, we don't actually
  ! have to call init_FortranJoiner
  write(*,*) "test_subclass"
  allocate(join, source=sj)

However, when I run the resulting executable, I get a segfault:

[sn-fey2] director - ./a.out
 test_subclass
 Transformed: 'whee'
 Transformed: [whee]
 test_transform
 Transformed: "whiskey", and "tango", and "foxtrot", and "sierra", and "juliet"
 Joined with commas: "whiskey", "tango", "foxtrot", "sierra", "juliet"
 test_actual
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
a.out              000000000041CF4D  Unknown               Unknown  Unknown
libpthread-2.17.s  00002AB3F1D645E0  Unknown               Unknown  Unknown
a.out              0000000000409A46  Unknown               Unknown  Unknown
a.out              00000000004100E8  Unknown               Unknown  Unknown
a.out              00000000004104BC  Unknown               Unknown  Unknown
a.out              000000000040B7AB  Unknown               Unknown  Unknown
a.out              0000000000407F7A  Unknown               Unknown  Unknown
a.out              0000000000404F25  Unknown               Unknown  Unknown
a.out              00000000004046B2  Unknown               Unknown  Unknown
a.out              0000000000403AEE  Unknown               Unknown  Unknown
libc-2.17.so       00002AB3F1F92C05  __libc_start_main     Unknown  Unknown
a.out              00000000004039E9  Unknown               Unknown  Unknown

Then, for completeness, I go back and build it all again with GCC to make sure I didn't biff something as I was editing runme.f90, and it runs just fine.

@mattbement

This comment has been minimized.

@mattbement

mattbement commented Aug 31, 2018

I minimized the previous comment, as I think it has been overtaken by newer information. In a nutshell, I think there's an Intel compiler bug, though I could benefit from another pair of eyes to confirm. Look at what goes into the swigd_Joiner_transform call in FortranJoiner::transform (in director_wrap.cxx, see below); the arguments are (&self, &arg1):
[screenshot: callstack]
Compare that with what actually arrives in swigd_Joiner_transform (in director.f90, where the arguments are farg1 and farg2):
[screenshot: intel2]

Note that both receiving arguments point at the second calling argument: the two pointers refer to the same memory, and in the case of farg1, the value of farg1%mem has taken the value &arg1->size.

I just tried this in Intel 2019.beta and the problem is still there.
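
One way to cross-check an observation like this without depending on where a debugger breakpoint lands is to log the addresses on both sides of the call and compare them in code. The following is only a minimal sketch of that idea, not the generated SWIG code: `ClassWrapperSketch`, `ArrayWrapperSketch`, `callee_sketch`, and `arguments_arrive_intact` are hypothetical names, the struct layouts are assumptions (only the `mem` and `size` fields are mentioned above), and the callee is a C++ stand-in for the Fortran `bind(C)` procedure.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical stand-ins for the wrapper structs. Only the field names `mem`
// and `size` appear in the discussion; the rest of the layout is assumed.
struct ClassWrapperSketch {
    void* cptr;
    int mem;
};

struct ArrayWrapperSketch {
    void* data;
    std::size_t size;
};

// Addresses observed inside the callee, kept for comparison with the caller.
struct Received {
    void* farg1;
    void* farg2;
};

static Received g_received = {nullptr, nullptr};

extern "C" {
// Callback type with C linkage, mirroring a two-pointer-argument interface.
typedef void (*Callback)(ClassWrapperSketch*, ArrayWrapperSketch*);

// Stand-in for the Fortran side: it records the addresses it was handed.
// In a real check, the Fortran procedure would print or store these itself.
void callee_sketch(ClassWrapperSketch* farg1, ArrayWrapperSketch* farg2) {
    g_received.farg1 = farg1;
    g_received.farg2 = farg2;
}
}

// Calls `cb` with (&self, &arg1) and reports whether the callee saw the same
// two addresses, in the same order, that the caller passed.
bool arguments_arrive_intact(Callback cb) {
    ClassWrapperSketch self = {nullptr, 1};
    char text[] = "whiskey";
    ArrayWrapperSketch arg1 = {text, sizeof(text) - 1};

    std::printf("caller: &self=%p &arg1=%p\n",
                static_cast<void*>(&self), static_cast<void*>(&arg1));
    cb(&self, &arg1);
    std::printf("callee: farg1=%p farg2=%p\n",
                g_received.farg1, g_received.farg2);

    return g_received.farg1 == &self && g_received.farg2 == &arg1;
}
```

Calling `arguments_arrive_intact(callee_sketch)` from `main` returns true when both pointers arrive unchanged. If the Fortran procedure were substituted for the stand-in and the logged addresses differed, or both matched `&arg1`, that would support the mismatch described above independently of the debugger view.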

@sethrj
Collaborator

sethrj commented Sep 1, 2018

Ugh. As a general rule of thumb in my experience, "seems like a compiler bug" usually means "I'm depending on undefined behavior being consistent"...

...but given that the gfortran compiler actually had an acknowledged bug there that we found and fixed, you could be right.

But looking again, are you sure that at the breakpoint you're using, the variables have been initialized? It looks like they both might be filled with bogus values to me.

I'll be back in the office on Tuesday; perhaps we could discuss then?
