Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Fix warning, fix CI and run tests with Julia 1.3 only on Ubuntu #197

Closed
wants to merge 6 commits into from

Conversation

devmotion
Copy link
Member

No description provided.

@devmotion
Copy link
Member Author

👀 🤣 I don't know what's going on with the Zygote tests. Maybe we should just upgrade to ChainRulesCore 1 and see if they are fixed in more recent versions of Zygote.

@DhairyaLGandhi
Copy link

Zygote moves to CR 1. This is a result of that. Perhaps we should not be too aggressive in adopting ChainRules.

@devmotion
Copy link
Member Author

Perhaps we should not be too aggressive in adopting ChainRules.

In Zygote? Or in DistributionsAD? In DistributionsAD it's only used for a handful of functions which still pass all tests. The errors seem to originate from the integration in Zygote, somehow now we have to define constructors for structs that worked automatically before?

@DhairyaLGandhi
Copy link

I meant in Zygote. @oxinabox shouldn't these constructors be output as regular named tuples if CR doesn't have a compatible rrule and not wrap it in a Tangent?

@DhairyaLGandhi
Copy link

Could we bump CR to 1, it's blocking FluxML/FluxBench.jl#4 tests with the SciML stack

@devmotion
Copy link
Member Author

BTW locally I got the same errors with the latest release of Zygote and ChainRulesCore 1. Initially I only fixed the ChainRules definitions (which are not involved in the errors) but now I started to remove Zygote definitions that might not be needed anymore and ported some others. I still get the same errors regarding missing adjoints for constructors for e.g. product distributions of Bernoulli distributions.

@devmotion
Copy link
Member Author

Could we bump CR to 1, it's blocking FluxML/FluxBench.jl#4 tests with the SciML stack

Zygote tests are completely broken also with CR 1 and the latest Zygote version, I would like to understand and fix the problem before making a new release. I don't mind if it uses CR1 and/or the latest Zygote version or not, I just want to fix the test failures right now.

@oxinabox
Copy link

@oxinabox shouldn't these constructors be output as regular named tuples if CR doesn't have a compatible rrule and not wrap it in a Tangent?

I am going to need more context.
What constructors?

@devmotion
Copy link
Member Author

Of a large number of distributions: https://github.com/TuringLang/DistributionsAD.jl/pull/197/checks?check_run_id=3410352390

The univariate tests of these distributions seem to be fine though, only the multi- and matrixvariate product distributions seem to throw the missing adjoint error. So I assume it might be a problem with the logpdf/loglikelihood definitions for these cases, something that Zygote does not like about the definitions of the primals or tangents and somehow messes up the ChainRules integration.

@devmotion
Copy link
Member Author

I opened #198 with updates for CR 1 but locally tests fail in a similar way.

@oxinabox
Copy link

oxinabox commented Aug 24, 2021

  Test threw exception
  Expression: ≈((Zygote.gradient(f, x))[1], finitediff, rtol = rtol, atol = atol)
  Need an adjoint for constructor TuringDirichlet{Float64, Vector{Float64}, Float64}. Gradient is of type Tangent{TuringDirichlet{Float64, Vector{Float64}, Float64}, NamedTuple{(:alpha, :alpha0, :lmnB), Tuple{Vector{Float64}, NoTangent, Float64}}}

Yeah that is weird, idk how they are getting there,
Those are not supposed to escape, being returned from a rrule into Zygote.
After the rrule is called it should convert them into NamedTuples.
Working out how they are getting through seems needful

Adding JuliaDiff/ChainRules.jl#510 might fix.
As might be just change https://github.com/FluxML/Zygote.jl/blob/975923915c6533abd278902aec6dfe07f1929f46/src/lib/lib.jl#L301
to actually accept them -- the data is correct it is just not expected to have a path that could get it out to here.

It is unfortunately difficult to trace where values come from and the path they used to get there, when going through AD.

@devmotion
Copy link
Member Author

I tried to reproduce the error with some MWE (played around with different sum/map/view combinations that could cause the problem it feels). In the following example we end up with NamedTuple in an intermediate step which is different but errors as well:

julia> using Zygote

julia> struct A{T}
         x::T
       end

julia> f(a::A, y::Real) = a.x * y
f (generic function with 1 method)

julia> gradient(x -> sum(map(f, fill(A(x), 2, 1), ones(2, 1))), 2.0)
ERROR: MethodError: no method matching +(::NamedTuple{(:x,), Tuple{Float64}}, ::NamedTuple{(:x,), Tuple{Float64}})
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:560
  +(::ChainRulesCore.AbstractThunk, ::Any) at /home/david/.julia/packages/ChainRulesCore/Voykb/src/differential_arithmetic.jl:138
  +(::ChainRulesCore.Tangent{P, T} where T, ::P) where P at /home/david/.julia/packages/ChainRulesCore/Voykb/src/differential_arithmetic.jl:162
  ...
Stacktrace:
  [1] add_sum(x::NamedTuple{(:x,), Tuple{Float64}}, y::NamedTuple{(:x,), Tuple{Float64}})
    @ Base ./reduce.jl:24
  [2] _mapreduce
    @ ./reduce.jl:408 [inlined]
  [3] _mapreduce_dim
    @ ./reducedim.jl:318 [inlined]
  [4] #mapreduce#672
    @ ./reducedim.jl:310 [inlined]
  [5] mapreduce
    @ ./reducedim.jl:310 [inlined]
  [6] #_sum#682
    @ ./reducedim.jl:878 [inlined]
  [7] _sum
    @ ./reducedim.jl:878 [inlined]
  [8] #_sum#681
    @ ./reducedim.jl:877 [inlined]
  [9] _sum
    @ ./reducedim.jl:877 [inlined]
 [10] #sum#679
    @ ./reducedim.jl:873 [inlined]
 [11] sum
    @ ./reducedim.jl:873 [inlined]
 [12] fill_pullback
    @ ~/.julia/packages/ChainRules/qd40H/src/rulesets/Base/array.jl:344 [inlined]
 [13] ZBack
    @ ~/.julia/packages/Zygote/nsu1Y/src/compiler/chainrules.jl:140 [inlined]
 [14] Pullback
    @ ./REPL[25]:1 [inlined]
 [15] (::typeof((#63)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/nsu1Y/src/compiler/interface2.jl:0
 [16] (::Zygote.var"#50#51"{typeof((#63))})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/nsu1Y/src/compiler/interface.jl:41
 [17] gradient(f::Function, args::Float64)
    @ Zygote ~/.julia/packages/Zygote/nsu1Y/src/compiler/interface.jl:76
 [18] top-level scope
    @ REPL[25]:1

@devmotion
Copy link
Member Author

I had another look at the issue and reduced it to a MWE: #198 (comment)

@devmotion
Copy link
Member Author

Closed in favour of #198.

@mcabbott
Copy link

mcabbott commented Oct 17, 2021

The above example is fixed on latest Zygote, by FluxML/Zygote.jl#1103 or the one before.

julia> gradient(x -> sum(map(f, fill(A(x), 2, 1), ones(2, 1))), 2.0)
(2.0,)

(@v1.7) pkg> st Zygote
      Status `~/.julia/environments/v1.7/Project.toml`
  [e88e6eb3] Zygote v0.6.29

The expressions in #198 (comment) work too, although I'm not sure if those were fixed here already.

@devmotion
Copy link
Member Author

The expressions in #198 (comment) work too, although I'm not sure if those were fixed here already.

8bb15b5 was added as a workaround for these examples (and a similar one for matrixvariate distributions), so tests passed again. Of course, it would be better if such hacks could be removed and Zygote would just work and do the correct thing.

@yebai yebai deleted the dw/warning branch October 3, 2024 20:51
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants