[student@gpu6 A_FORWAD_simple_cpml]$ ./run_this_example.sh
running example: Mon Dec 2 11:05:14 CST 2024

setting up example...

decomposing mesh...

**********************
Serial mesh decomposer
**********************

total number of nodes:
 nnodes = 319487
total number of spectral elements:
 nspec = 300980
materials:
 num_mat = 1
 num_mat = 4
 num_mat = 2
 num_mat = 5
 num_mat = 3
 num_mat = 6
 defined = 6 undefined = 0
no poroelastic material file found
absorbing boundaries:
 nspec2D_xmin = 2816
 nspec2D_xmax = 2816
 nspec2D_ymin = 3816
 nspec2D_ymax = 3816
 nspec2D_bottom = 8655
 nspec2D_top = 8655
 nspec_cpml = 83910
no moho_surface_file file found
Par_file_faults not found: assuming that there are no faults
node valence:
 min = 1 max = 10
 nsize = 10 sup_neighbor = 54
mesh2dual:
 max_neighbor = 32
partitions:
 num = 36

Databases files in directory: ./OUTPUT_FILES/DATABASES_MPI
finished successfully

running database generation on 36 processors...

running solver on 36 processors...
[gpu6:31681:0:31681] Caught signal 8 (Floating point exception: floating-point invalid operation)
==== backtrace (tid: 31681) ====
 0 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(ucs_handle_error+0x1ac) [0x7febe83c654c]
 1 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(+0x22894) [0x7febe83c6894]
 2 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(+0x22c13) [0x7febe83c6c13]
 3 /lib64/libpthread.so.0(+0xf630) [0x7febf3032630]
 4 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(+0xac2df) [0x7febf3d612df]
 5 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(ompi_coll_base_reduce_generic+0x8e4) [0x7febf3d50b94]
 6 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(ompi_coll_base_reduce_intra_binomial+0xe9) [0x7febf3d51299]
 7 /public/software/mpi/openmpi/gnu/4.0.3/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_reduce_intra_dec_fixed+0x18e) [0x7febe263e86e]
 8 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(MPI_Reduce+0x62) [0x7febf3d322a2]
 9 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi_mpifh.so.40(pmpi_reduce__+0x89) [0x7febf40262b9]
10 ./bin/xspecfem3D() [0x58ad01]
11 ./bin/xspecfem3D() [0x405ee0]
12 ./bin/xspecfem3D() [0x48df47]
13 ./bin/xspecfem3D() [0x4042ea]
14 /lib64/libc.so.6(__libc_start_main+0xf5) [0x7febf2c77555]
15 ./bin/xspecfem3D() [0x40433b]
=================================

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0 0x7FEBF39AC6D7
#1 0x7FEBF39ACD1E
#2 0x7FEBF303262F
#3 0x7FEBF3D612DF
#4 0x7FEBF3D50B93
#5 0x7FEBF3D51298
#6 0x7FEBE263E86D
#7 0x7FEBF3D322A1
#8 0x7FEBF40262B8
#9 0x58AD00 in max_all_cr_ at parallel.f90:722
#10 0x405EDF in check_stability_ at check_stability.f90:107 (discriminator 1)
#11 0x48DF46 in iterate_time_ at iterate_time.F90:194
#12 0x4042E9 in xspecfem3d at specfem3D.F90:387
#13 0x7FEBF2C77554
[gpu6:31703:0:31703] Caught signal 8 (Floating point exception: floating-point invalid operation)
==== backtrace (tid: 31703) ====
 0 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(ucs_handle_error+0x1ac) [0x7fb4303bc54c]
 1 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(+0x22894) [0x7fb4303bc894]
 2 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(+0x22c13) [0x7fb4303bcc13]
 3 /lib64/libpthread.so.0(+0xf630) [0x7fb43b028630]
 4 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(+0xac2df) [0x7fb43bd572df]
 5 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(ompi_coll_base_reduce_generic+0x589) [0x7fb43bd46839]
 6 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(ompi_coll_base_reduce_intra_binomial+0xe9) [0x7fb43bd47299]
 7 /public/software/mpi/openmpi/gnu/4.0.3/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_reduce_intra_dec_fixed+0x18e) [0x7fb42a63e86e]
 8 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(MPI_Reduce+0x62) [0x7fb43bd282a2]
 9 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi_mpifh.so.40(pmpi_reduce__+0x89) [0x7fb43c01c2b9]
10 ./bin/xspecfem3D() [0x58ad01]
11 ./bin/xspecfem3D() [0x405ee0]
12 ./bin/xspecfem3D() [0x48df47]
13 ./bin/xspecfem3D() [0x4042ea]
14 /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fb43ac6d555]
15 ./bin/xspecfem3D() [0x40433b]
=================================

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0 0x7FB43B9A26D7
#1 0x7FB43B9A2D1E
#2 0x7FB43B02862F
#3 0x7FB43BD572DF
#4 0x7FB43BD46838
#5 0x7FB43BD47298
#6 0x7FB42A63E86D
#7 0x7FB43BD282A1
#8 0x7FB43C01C2B8
#9 0x58AD00 in max_all_cr_ at parallel.f90:722
#10 0x405EDF in check_stability_ at check_stability.f90:107 (discriminator 1)
#11 0x48DF46 in iterate_time_ at iterate_time.F90:194
#12 0x4042E9 in xspecfem3d at specfem3D.F90:387
#13 0x7FB43AC6D554
[gpu6:31680:0:31680] Caught signal 8 (Floating point exception: floating-point invalid operation)
==== backtrace (tid: 31680) ====
 0 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(ucs_handle_error+0x1ac) [0x7f1c2d8ee54c]
 1 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(+0x22894) [0x7f1c2d8ee894]
 2 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(+0x22c13) [0x7f1c2d8eec13]
 3 /lib64/libpthread.so.0(+0xf630) [0x7f1c3c60c630]
 4 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(+0xac2df) [0x7f1c3d33b2df]
 5 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(ompi_coll_base_reduce_generic+0x589) [0x7f1c3d32a839]
 6 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(ompi_coll_base_reduce_intra_binomial+0xe9) [0x7f1c3d32b299]
 7 /public/software/mpi/openmpi/gnu/4.0.3/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_reduce_intra_dec_fixed+0x18e) [0x7f1c27bee86e]
 8 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(MPI_Reduce+0x62) [0x7f1c3d30c2a2]
 9 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi_mpifh.so.40(pmpi_reduce__+0x89) [0x7f1c3d6002b9]
10 ./bin/xspecfem3D() [0x58ad01]
11 ./bin/xspecfem3D() [0x405ee0]
12 ./bin/xspecfem3D() [0x48df47]
13 ./bin/xspecfem3D() [0x4042ea]
14 /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f1c3c251555]
15 ./bin/xspecfem3D() [0x40433b]
=================================

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0 0x7F1C3CF866D7
#1 0x7F1C3CF86D1E
#2 0x7F1C3C60C62F
#3 0x7F1C3D33B2DF
#4 0x7F1C3D32A838
#5 0x7F1C3D32B298
#6 0x7F1C27BEE86D
#7 0x7F1C3D30C2A1
#8 0x7F1C3D6002B8
#9 0x58AD00 in max_all_cr_ at parallel.f90:722
#10 0x405EDF in check_stability_ at check_stability.f90:107 (discriminator 1)
#11 0x48DF46 in iterate_time_ at iterate_time.F90:194
#12 0x4042E9 in xspecfem3d at specfem3D.F90:387
#13 0x7F1C3C251554
[gpu6:31677:0:31677] Caught signal 8 (Floating point exception: floating-point invalid operation)
==== backtrace (tid: 31677) ====
 0 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(ucs_handle_error+0x1ac) [0x7f398af1954c]
 1 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(+0x22894) [0x7f398af19894]
 2 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(+0x22c13) [0x7f398af19c13]
 3 /lib64/libpthread.so.0(+0xf630) [0x7f3999bef630]
 4 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(+0xac2df) [0x7f399a91e2df]
 5 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(ompi_coll_base_reduce_generic+0x8e4) [0x7f399a90db94]
 6 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(ompi_coll_base_reduce_intra_binomial+0xe9) [0x7f399a90e299]
 7 /public/software/mpi/openmpi/gnu/4.0.3/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_reduce_intra_dec_fixed+0x18e) [0x7f398938786e]
 8 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(MPI_Reduce+0x62) [0x7f399a8ef2a2]
 9 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi_mpifh.so.40(pmpi_reduce__+0x89) [0x7f399abe32b9]
10 ./bin/xspecfem3D() [0x58ad01]
11 ./bin/xspecfem3D() [0x405ee0]
12 ./bin/xspecfem3D() [0x48df47]
13 ./bin/xspecfem3D() [0x4042ea]
14 /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f3999834555]
15 ./bin/xspecfem3D() [0x40433b]
=================================

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0 0x7F399A5696D7
#1 0x7F399A569D1E
#2 0x7F3999BEF62F
#3 0x7F399A91E2DF
#4 0x7F399A90DB93
#5 0x7F399A90E298
#6 0x7F398938786D
#7 0x7F399A8EF2A1
#8 0x7F399ABE32B8
#9 0x58AD00 in max_all_cr_ at parallel.f90:722
#10 0x405EDF in check_stability_ at check_stability.f90:107 (discriminator 1)
#11 0x48DF46 in iterate_time_ at iterate_time.F90:194
#12 0x4042E9 in xspecfem3d at specfem3D.F90:387
#13 0x7F3999834554
[gpu6:31683:0:31683] Caught signal 8 (Floating point exception: floating-point invalid operation)
==== backtrace (tid: 31683) ====
 0 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(ucs_handle_error+0x1ac) [0x7f7f05cee54c]
 1 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(+0x22894) [0x7f7f05cee894]
 2 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(+0x22c13) [0x7f7f05ceec13]
 3 /lib64/libpthread.so.0(+0xf630) [0x7f7f14a10630]
 4 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(+0xac2df) [0x7f7f1573f2df]
 5 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(ompi_coll_base_reduce_generic+0x8e4) [0x7f7f1572eb94]
 6 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(ompi_coll_base_reduce_intra_binomial+0xe9) [0x7f7f1572f299]
 7 /public/software/mpi/openmpi/gnu/4.0.3/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_reduce_intra_dec_fixed+0x18e) [0x7f7f040fe86e]
 8 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(MPI_Reduce+0x62) [0x7f7f157102a2]
 9 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi_mpifh.so.40(pmpi_reduce__+0x89) [0x7f7f15a042b9]
10 ./bin/xspecfem3D() [0x58ad01]
11 ./bin/xspecfem3D() [0x405ee0]
12 ./bin/xspecfem3D() [0x48df47]
13 ./bin/xspecfem3D() [0x4042ea]
14 /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f7f14655555]
15 ./bin/xspecfem3D() [0x40433b]
=================================

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0 0x7F7F1538A6D7
#1 0x7F7F1538AD1E
#2 0x7F7F14A1062F
#3 0x7F7F1573F2DF
#4 0x7F7F1572EB93
#5 0x7F7F1572F298
#6 0x7F7F040FE86D
#7 0x7F7F157102A1
#8 0x7F7F15A042B8
#9 0x58AD00 in max_all_cr_ at parallel.f90:722
#10 0x405EDF in check_stability_ at check_stability.f90:107 (discriminator 1)
#11 0x48DF46 in iterate_time_ at iterate_time.F90:194
#12 0x4042E9 in xspecfem3d at specfem3D.F90:387
#13 0x7F7F14655554
[gpu6:31697:0:31697] Caught signal 8 (Floating point exception: floating-point invalid operation)
==== backtrace (tid: 31697) ====
 0 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(ucs_handle_error+0x1ac) [0x7f4727bb754c]
 1 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(+0x22894) [0x7f4727bb7894]
 2 /public/software/mpi/openmpi/gnu/4.0.3/external_libs/ucx/1.8.0/lib/libucs.so.0(+0x22c13) [0x7f4727bb7c13]
 3 /lib64/libpthread.so.0(+0xf630) [0x7f47369b7630]
 4 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(+0xac2df) [0x7f47376e62df]
 5 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(ompi_coll_base_reduce_generic+0x589) [0x7f47376d5839]
 6 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(ompi_coll_base_reduce_intra_binomial+0xe9) [0x7f47376d6299]
 7 /public/software/mpi/openmpi/gnu/4.0.3/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_reduce_intra_dec_fixed+0x18e) [0x7f4725fc786e]
 8 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi.so.40(MPI_Reduce+0x62) [0x7f47376b72a2]
 9 /public/software/mpi/openmpi/intel/4.0.3/lib/libmpi_mpifh.so.40(pmpi_reduce__+0x89) [0x7f47379ab2b9]
10 ./bin/xspecfem3D() [0x58ad01]
11 ./bin/xspecfem3D() [0x405ee0]
12 ./bin/xspecfem3D() [0x48df47]
13 ./bin/xspecfem3D() [0x4042ea]
14 /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f47365fc555]
15 ./bin/xspecfem3D() [0x40433b]
=================================

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0 0x7F47373316D7
#1 0x7F4737331D1E
#2 0x7F47369B762F
#3 0x7F47376E62DF
#4 0x7F47376D5838
#5 0x7F47376D6298
#6 0x7F4725FC786D
#7 0x7F47376B72A1
#8 0x7F47379AB2B8
#9 0x58AD00 in max_all_cr_ at parallel.f90:722
#10 0x405EDF in check_stability_ at check_stability.f90:107 (discriminator 1)
#11 0x48DF46 in iterate_time_ at iterate_time.F90:194
#12 0x4042E9 in xspecfem3d at specfem3D.F90:387
#13 0x7F47365FC554
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 24 with PID 0 on node gpu6 exited on signal 8 (Floating point exception).
--------------------------------------------------------------------------
[student@gpu6 A_FORWAD_simple_cpml]$