
Failed to solve the set of equations - Potential Rank Deficient Matrix and ReturnCode.Stalled = 17 #496


Closed · abcdvvvv opened this issue Nov 8, 2024 · 10 comments

@abcdvvvv commented Nov 8, 2024

No description provided.

abcdvvvv added the "bug" (Something isn't working) label on Nov 8, 2024
@ChrisRackauckas (Member) commented:

NewtonRaphson should not be used if you need robustness. Did you try the default algorithm? Or TrustRegion?
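
A minimal sketch of those calls, assuming prob is the NonlinearProblem built from the MWE (not reproduced here):

using NonlinearSolve

sol = solve(prob)                       # default polyalgorithm
sol = solve(prob, TrustRegion())        # trust-region Newton
sol = solve(prob, RobustMultiNewton())  # polyalgorithm of robust solvers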

@ErikQQY (Member) commented Nov 8, 2024

I tested the MWE with TrustRegion, GaussNewton, and polyalgorithms like RobustMultiNewton; the returned solution is still unstable.

@abcdvvvv (Author) commented Nov 8, 2024

Thank you for your reply. I chose the combination of Newton's method + forward differences because my function cannot be auto-differentiated. When TrustRegion is chosen (sol = solve(prob, TrustRegion())), an error is reported:

ERROR: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{DifferentiationInterface.FixTail{…}, Float64}, Float64, 12})   

In my original version of the function, a large number of arrays type-asserted as Vector{Float64} were used as buffers to reduce memory allocation and improve performance, and I can't easily change their type.
For example, the error above is thrown at sum!(V, v) because V is of type Vector{Float64}.
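
A minimal, self-contained sketch of this failure mode (a toy, not the MWE): a buffer hard-typed to Float64 cannot store the Dual numbers ForwardDiff passes in, while a buffer allocated from the input adopts the right element type.

using ForwardDiff

buf = zeros(2)                               # Vector{Float64} buffer, like V above
f_bad(x) = (buf .= x .^ 2; sum(buf))
# ForwardDiff.gradient(f_bad, [1.0, 2.0])    # MethodError: no method matching Float64(::Dual...)

function f_good(x)
    tmp = similar(x)                         # eltype follows x: Float64 or Dual
    tmp .= x .^ 2
    sum(tmp)
end
ForwardDiff.gradient(f_good, [1.0, 2.0])     # [2.0, 4.0]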

As for the default method, I tried sol = solve(prob), and the following error was reported:

ERROR: DomainError with -285.9475498224578:
log was called with a negative real argument but will only return a complex result if called with a complex argument. Try log(Complex(x)).

I think the situation might improve if upper and lower bounds were provided for the individual variables, since the function is a fully physical model.
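
One standard workaround for the log DomainError, sketched here on a toy problem (an assumption on my part, not code from this thread): solve in y = log(x), so every trial point the solver proposes maps back to a positive x.

using NonlinearSolve

f(x, p) = log.(x) .- p       # toy residual; fails if a trial x goes negative
g(y, p) = f(exp.(y), p)      # substitute x = exp(y), which is positive for any y

prob = NonlinearProblem(g, zeros(2), [1.0, 2.0])
sol = solve(prob, TrustRegion())
x = exp.(sol.u)              # ≈ [exp(1), exp(2)]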

Note that you can reproduce the problems with the MWE code I provided.

@avik-pal (Member) commented Nov 8, 2024

The way you wrote your function won't work with ForwardDiff. Also, preallocating r that way doesn't provide any performance benefit; it just adds an unnecessary copy that slows your code down.

function nonlinearfunc(X, (WL, WV, P, Q, F, hF, f, r))
    ns = 50
    X = reshape(X, 9, ns)'      # ns × 9 matrix: one row of unknowns per stage
    v = @view X[:, 1:4]
    T = @view X[:, 5]
    l = @view X[:, 6:9]

    # Allocate fresh arrays instead of writing into Float64 buffers,
    # so the element type can be ForwardDiff.Dual
    V = sum(v, dims=2)
    L = sum(l, dims=2)
    SL = WL ./ L
    SV = WV ./ V

    hg = func3(298.15, T)
    HV = sum(v .* (fom .+ hg), dims=2)
    hv = func1(T)
    HL = sum(l .* (fom .+ hg .- hv), dims=2)
    k = func0(T) ./ P

    eq1 = @. Q + F * hF + [0; HL[1:ns-1]] + [HV[2:ns]; 0] - (1 + SL) * HL - (1 + SV) * HV
    eq2 = @. f + [zeros(1, 4); l[1:ns-1, :]] + [v[2:ns, :]; zeros(1, 4)] - (1 + SL) * l - (1 + SV) * v
    eq3 = @. k * l * V / L - v

    # Overwrite the first and last energy equations with specification equations
    eq1[1] = L[1] - 1.0 * V[1]
    eq1[ns] = 20.0 - L[end]

    # Return a fresh vector rather than mutating the preallocated r
    return reshape(hcat(eq1, eq2, eq3)', ns * 9)
end
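
For context, roughly how this out-of-place form plugs into the solver. This is a sketch: X0 is a placeholder initial guess, and the parameter values and helper functions (func0, func1, func3, fom) come from the MWE, which is not reproduced here.

using NonlinearSolve

ns = 50
X0 = ones(9 * ns)                                  # placeholder initial guess
p = (WL, WV, P, Q, F, hF, f, r)                    # values from the MWE
prob = NonlinearProblem(nonlinearfunc, X0, p)
sol = solve(prob, NewtonRaphson(); abstol = 1e-6)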

This is AD-compatible, and with NewtonRaphson I get:

julia> @show sol.retcode
sol.retcode = SciMLBase.ReturnCode.Success
ReturnCode.Success = 1

julia> @show SciMLBase.successful_retcode(sol)
SciMLBase.successful_retcode(sol) = true
true

julia> display(sol.stats)
SciMLBase.NLStats
Number of function evaluations:                    13
Number of Jacobians created:                       12
Number of factorizations:                          0
Number of linear solves:                           11
Number of nonlinear solver iterations:             11

julia> display(sol.resid)
450-element reshape(adjoint(::Matrix{Float64}), 450) with eltype Float64:
 -3.552713678800501e-15
  1.7763568394002505e-15
  0.0
  0.0
  9.183549615799121e-41
 -9.947598300641403e-14
  5.329070518200751e-15
 -2.481541837659083e-24
  1.285696946211877e-39
 -8.881784197001252e-16
  0.0
 -3.552713678800501e-15
 -3.308722450212111e-24
 -9.550891600431086e-39
 -5.3290705182007514e-14
 -1.1723955140041653e-13
 -5.293955920339377e-23
  4.555040609436364e-38
  3.552713678800501e-15
  0.0
  3.552713678800501e-15
 -3.308722450212111e-24
  5.877471754111438e-38
 -1.4566126083082054e-13
  1.7763568394002505e-14
  2.845501307182415e-22
 -3.5264830524668625e-38
 -2.6645352591003757e-15
  3.552713678800501e-15
  0.0
  2.6469779601696886e-23
 -5.64237288394698e-37
 -2.948752353404416e-13
 -4.014566457044566e-13
 -1.9852334701272664e-22
  ⋮
  0.0
 -1.4210854715202004e-14
 -1.3322676295501878e-15
  6.074607623436297e-25
  1.0587911840678754e-22
  1.7053025658242404e-13
  4.440892098500626e-15
 -7.105427357601002e-15
  6.462348535570529e-27
  2.6469779601696886e-23
  0.0
  1.3322676295501878e-15
 -6.785465962349055e-26
 -1.7205356741102976e-22
 -2.1316282072803006e-14
  2.0872192862952943e-14
  2.042810365310288e-14
  0.0
 -6.617444900424222e-24
 -3.552713678800501e-15
  0.0
 -2.50416005753358e-26
 -2.514629062161204e-22
 -1.3500311979441904e-13
  3.8191672047105385e-14
  0.0
  0.0
  0.0
  7.105427357601002e-15
  0.0
  3.8370194429950014e-26
  3.970466940254533e-23
  2.2382096176443156e-13
  1.2612133559741778e-13

With v4 I am getting convergence even with FiniteDiff.
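
For completeness, a sketch of requesting finite differences explicitly (with prob as in the MWE):

using NonlinearSolve
using ADTypes: AutoFiniteDiff

sol = solve(prob, NewtonRaphson(; autodiff = AutoFiniteDiff()); abstol = 1e-6)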

@abcdvvvv (Author) commented Nov 8, 2024

> The way you wrote your function won't work with ForwardDiff. Also, preallocating r that way doesn't provide any performance benefit; it just adds an unnecessary copy that slows your code down.

Thank you for your prompt reply. With the code you provided, ForwardDiff now works. With an absolute tolerance of 1e-6, my finite-difference setup requires 128 function evaluations, much higher than the AD result. So I think it would be better to modify my source code to enable AD.
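
One option worth noting here (my suggestion, not something proposed in this thread): PreallocationTools.jl's DiffCache keeps a preallocated buffer usable under both Float64 and Dual evaluations, so the buffers don't have to be given up to enable AD.

using PreallocationTools, ForwardDiff

cache = DiffCache(zeros(4))          # dual-capable preallocated buffer

function f(x)
    buf = get_tmp(cache, x)          # returns a buffer matching eltype(x)
    buf .= x .^ 2
    sum(buf)
end

f(ones(4))                           # plain Float64 call
ForwardDiff.gradient(f, ones(4))     # Dual call also works: [2.0, 2.0, 2.0, 2.0]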

@avik-pal (Member) commented Nov 8, 2024

You can also try autodiff = AutoEnzyme() with an import Enzyme; that might not require a code change. And check what is holding NonlinearSolve back at v3.

@abcdvvvv (Author) commented Nov 9, 2024

Please help me; I don't know how to deal with the following error when using solve(prob, NewtonRaphson(; autodiff=AutoEnzyme()), abstol=1e-6):

ERROR: Constant memory is stored (or returned) to a differentiable variable.
As a result, Enzyme cannot provably ensure correctness and throws this error.
This might be due to the use of a constant variable as temporary storage for active memory (https://enzyme.mit.edu/julia/stable/faq/#Runtime-Activity).
If Enzyme should be able to prove this use non-differentable, open an issue!
To work around this issue, either:
 a) rewrite this variable to not be conditionally active (fastest, but requires a code change), or
 b) set the Enzyme mode to turn on runtime activity (e.g. autodiff(set_runtime_activity(Reverse), ...) ). This will maintain correctness, but may slightly reduce performance.
Mismatched activity for:   store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 1767394484496 to {}*) to {} addrspace(10)*), {} addrspace(10)** %.fca.0.0.gep, align 8, !dbg !147, !noalias !151 const val: {} addrspace(10)* addrspacecast ({}* inttoptr (i64 1767394484496 to {}*) to {} addrspace(10)*)
 value=[96.91707454 54.83007454 67.22807454 93.13707454] of type Matrix{Float64}
 llvalue={} addrspace(10)* addrspacecast ({}* inttoptr (i64 1767394484496 to {}*) to {} addrspace(10)*)
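
A sketch of option (b) from the error message, assuming the mode keyword of ADTypes' AutoEnzyme (prob as before):

import Enzyme
using ADTypes: AutoEnzyme

ad = AutoEnzyme(; mode = Enzyme.set_runtime_activity(Enzyme.Forward))  # or Enzyme.Reverse
# sol = solve(prob, NewtonRaphson(; autodiff = ad), abstol = 1e-6)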

@avik-pal (Member) commented:

Did you rewrite the problem to use the in-place function correctly, as I mentioned above? The form you posted in the original code won't work correctly (that is essentially what the error is about).

@abcdvvvv (Author) commented:

Yes, I changed the last two lines to return reshape(hcat(eq1, eq2, eq3)', ns * 9), and it still reports this error. This is because I used const cps1 = [96.91707454 54.83007454 67.22807454 93.13707454], a const variable, in my code, but I don't know how to fix it.
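
Not from this thread, but the generic fix for this class of error is to pass the constant table in explicitly (for example through the parameter tuple p, which the solver hands to the AD backend as a constant context) instead of capturing a const global. A self-contained toy sketch:

import Enzyme

h(x, table) = sum(table .* x)        # the table enters as an explicit argument

x = [3.0, 4.0]
dx = zeros(2)
table = [10.0, 20.0]
Enzyme.autodiff(Enzyme.Reverse, h, Enzyme.Active,
                Enzyme.Duplicated(x, dx), Enzyme.Const(table))
dx                                   # == [10.0, 20.0], the gradient w.r.t. x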
I also tried solve(prob, NewtonRaphson(; autodiff=AutoEnzyme(mode=Enzyme.EnzymeCore.Reverse)), abstol=1e-6), and then I get:

ERROR: StackOverflowError:
Stacktrace:
  [1] LLVMRunPassManager
    @ C:\Users\karei\.julia\packages\LLVM\wMjUU\lib\15\libLLVM.jl:6353 [inlined]
  [2] run!
    @ C:\Users\karei\.julia\packages\LLVM\wMjUU\src\passmanager.jl:39 [inlined]
  [3] #18910
    @ C:\Users\karei\.julia\packages\Enzyme\RvNgp\src\compiler\optimize.jl:2514 [inlined]
  [4] LLVM.ModulePassManager(::Enzyme.Compiler.var"#18910#18917"{LLVM.Module}; kwargs::@Kwargs{})
    @ LLVM C:\Users\karei\.julia\packages\LLVM\wMjUU\src\passmanager.jl:33
  [5] ModulePassManager
    @ C:\Users\karei\.julia\packages\LLVM\wMjUU\src\passmanager.jl:30 [inlined]
  [6] removeDeadArgs!(mod::LLVM.Module, tm::LLVM.TargetMachine)
    @ Enzyme.Compiler C:\Users\karei\.julia\packages\Enzyme\RvNgp\src\compiler\optimize.jl:2512
  [7] post_optimze!(mod::LLVM.Module, tm::LLVM.TargetMachine, machine::Bool)
    @ Enzyme.Compiler C:\Users\karei\.julia\packages\Enzyme\RvNgp\src\compiler\optimize.jl:2828
  [8] post_optimze!(mod::LLVM.Module, tm::LLVM.TargetMachine)
    @ Enzyme.Compiler C:\Users\karei\.julia\packages\Enzyme\RvNgp\src\compiler\optimize.jl:2827
  [9] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
    @ Enzyme.Compiler C:\Users\karei\.julia\packages\Enzyme\RvNgp\src\compiler.jl:8394
 [10] _thunk
    @ C:\Users\karei\.julia\packages\Enzyme\RvNgp\src\compiler.jl:8375 [inlined]
 [11] cached_compilation
    @ C:\Users\karei\.julia\packages\Enzyme\RvNgp\src\compiler.jl:8416 [inlined]
 [12] thunkbase(ctx::LLVM.Context, mi::Core.MethodInstance, ::Val{…}, ::Type{…}, ::Type{…}, tt::Type{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Type{…}, ::Val{…}, ::Val{…})
    @ Enzyme.Compiler C:\Users\karei\.julia\packages\Enzyme\RvNgp\src\compiler.jl:8548
 [13] #s2102#19072
    @ C:\Users\karei\.julia\packages\Enzyme\RvNgp\src\compiler.jl:8685 [inlined]
 [14] 
    @ Enzyme.Compiler .\none:0
 [15] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
    @ Core .\boot.jl:602
 [16] autodiff_thunk
    @ C:\Users\karei\.julia\packages\Enzyme\RvNgp\src\Enzyme.jl:969 [inlined]
 [17] batch_seeded_autodiff_thunk(::EnzymeCore.ReverseModeSplit{…}, ::NTuple{…}, ::Const{…}, ::Type{…}, ::BatchDuplicated{…}, ::Const{…})
    @ DifferentiationInterfaceEnzymeExt C:\Users\karei\.julia\packages\DifferentiationInterface\bulUW\ext\DifferentiationInterfaceEnzymeExt\reverse_onearg.jl:32
 [18] value_and_pullback(f::NonlinearFunction{…}, ::DifferentiationInterface.NoPullbackPrep, backend::AutoEnzyme{…}, x::Vector{…}, ty::NTuple{…}, contexts::DifferentiationInterface.Constant{…})
    @ DifferentiationInterfaceEnzymeExt C:\Users\karei\.julia\packages\DifferentiationInterface\bulUW\ext\DifferentiationInterfaceEnzymeExt\reverse_onearg.jl:101
 [19] pullback
    @ C:\Users\karei\.julia\packages\DifferentiationInterface\bulUW\ext\DifferentiationInterfaceEnzymeExt\reverse_onearg.jl:120 [inlined]
 [20] (::DifferentiationInterface.var"#42#43"{})(a::Int64)
    @ DifferentiationInterface C:\Users\karei\.julia\packages\DifferentiationInterface\bulUW\src\first_order\jacobian.jl:345
 [21] mapreduce_impl(f::DifferentiationInterface.var"#42#43"{}, op::typeof(vcat), A::Base.OneTo{…}, ifirst::Int64, ilast::Int64, blksize::Int64)
    @ Base .\reduce.jl:262
 [22] mapreduce_impl
    @ .\reduce.jl:277 [inlined]
 [23] _mapreduce(f::DifferentiationInterface.var"#42#43"{}, op::typeof(vcat), ::IndexLinear, A::Base.OneTo{…})
    @ Base .\reduce.jl:447
 [24] _mapreduce_dim
    @ .\reducedim.jl:367 [inlined]
 [25] mapreduce
    @ .\reducedim.jl:359 [inlined]
 [26] _jacobian_aux
    @ C:\Users\karei\.julia\packages\DifferentiationInterface\bulUW\src\first_order\jacobian.jl:344 [inlined]
 [27] jacobian(f::NonlinearFunction{…}, prep::DifferentiationInterface.PullbackJacobianPrep{…}, backend::AutoEnzyme{…}, x::Vector{…}, contexts::DifferentiationInterface.Constant{…})
    @ DifferentiationInterface C:\Users\karei\.julia\packages\DifferentiationInterface\bulUW\src\first_order\jacobian.jl:182
 [28] construct_jacobian_cache(prob::NonlinearProblem{…}, alg::GeneralizedFirstOrderAlgorithm{…}, f::NonlinearFunction{…}, fu::Vector{…}, u::Vector{…}, p::Tuple{…}; stats::SciMLBase.NLStats, autodiff::AutoEnzyme{…}, vjp_autodiff::AutoEnzyme{…}, jvp_autodiff::AutoForwardDiff{…}, linsolve::Nothing)
    @ NonlinearSolveBase C:\Users\karei\.julia\packages\NonlinearSolveBase\54f3T\src\jacobian.jl:78       
 [29] construct_jacobian_cache
    @ C:\Users\karei\.julia\packages\NonlinearSolveBase\54f3T\src\jacobian.jl:34 [inlined]
 [30] __init(::NonlinearProblem{…}, ::GeneralizedFirstOrderAlgorithm{…}; stats::SciMLBase.NLStats, alias_u0::Bool, maxiters::Int64, abstol::Float64, reltol::Nothing, maxtime::Nothing, termination_condition::Nothing, internalnorm::Function, linsolve_kwargs::@NamedTuple{}, kwargs::@Kwargs{})
    @ NonlinearSolveFirstOrder C:\Users\karei\.julia\packages\NonlinearSolveFirstOrder\YnPZr\src\solve.jl:159
 [31] __init
    @ C:\Users\karei\.julia\packages\NonlinearSolveFirstOrder\YnPZr\src\solve.jl:119 [inlined]
 [32] __solve(::NonlinearProblem{…}, ::GeneralizedFirstOrderAlgorithm{…}; kwargs::@Kwargs{})
    @ NonlinearSolveBase C:\Users\karei\.julia\packages\NonlinearSolveBase\54f3T\src\solve.jl:5
 [33] __solve
    @ C:\Users\karei\.julia\packages\NonlinearSolveBase\54f3T\src\solve.jl:1 [inlined]
 [34] #solve_call#44
    @ C:\Users\karei\.julia\packages\DiffEqBase\frOsk\src\solve.jl:612 [inlined]
 [35] solve_call
    @ C:\Users\karei\.julia\packages\DiffEqBase\frOsk\src\solve.jl:569 [inlined]
 [36] #solve_up#53
    @ C:\Users\karei\.julia\packages\DiffEqBase\frOsk\src\solve.jl:1084 [inlined]
 [37] solve_up
    @ C:\Users\karei\.julia\packages\DiffEqBase\frOsk\src\solve.jl:1078 [inlined]
 [38] #solve#52
    @ C:\Users\karei\.julia\packages\DiffEqBase\frOsk\src\solve.jl:1072 [inlined]
 [39] eval
    @ .\boot.jl:385 [inlined]
 [40] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
    @ Base .\loading.jl:2076

@avik-pal (Member) commented:

Interesting, I will have to take a look at this. Seems strange that it is using autodiff_thunk.

avik-pal removed the "bug" (Something isn't working) label on Nov 11, 2024