AD fix for PDBijector #280

torfjelde · 2023-08-04T09:48:53Z

Addresses the issues described in TuringLang/Turing.jl#2018 (comment) for ReverseDiff.jl by adding a few new functions and corresponding rrules:

cholesky_lower/cholesky_upper: does what cholesky_factor does, but always returning a raw Matrix, thus making it compatible with the likes of ReverseDiff.
Defers ReverseDiff-differentiation of permutedims to ChainRules.
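
To illustrate the first item, here is a minimal sketch of what "always returning a raw Matrix" means, using only LinearAlgebra. The names `lower_raw` and `upper_raw` are hypothetical stand-ins for illustration and are not the definitions added to Bijectors.jl in this PR:

```julia
using LinearAlgebra

# Hypothetical helpers: compute the Cholesky factor and strip the
# LowerTriangular/UpperTriangular wrapper so callers get a plain Matrix.
lower_raw(X::AbstractMatrix) = Matrix(cholesky(Hermitian(X)).L)
upper_raw(X::AbstractMatrix) = Matrix(cholesky(Hermitian(X)).U)

X = [4.0 2.0; 2.0 3.0]   # small symmetric positive-definite example
L = lower_raw(X)
U = upper_raw(X)

# Both factors reconstruct X, and both are plain matrices.
L * L' ≈ X
U' * U ≈ X
L isa Matrix{Float64}
```

A plain `Matrix` return type is what tracked-array backends tend to handle more gracefully than wrapper types, which is presumably the motivation stated above.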

In addition, fixes a missing import in ReverseDiffExt (IMO we should never import but instead always qualify the methods we're overloading in extensions to avoid these sorts of issues, but I'll make a separate PR for this after this has gone through).
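
A small sketch of the "qualify instead of import" idea mentioned in the parenthetical. The modules `Shapes` and `ShapesExt` are hypothetical and only stand in for a package and its extension:

```julia
module Shapes
    # Stand-in for the parent package: declares a generic function.
    area(x) = error("area not implemented for $(typeof(x))")
end

module ShapesExt
    # Stand-in for an extension. Only the module is imported, no function names.
    import ..Shapes

    struct Square
        side::Float64
    end

    # Qualified definition: this unambiguously adds a method to Shapes.area.
    # An unqualified `area(s::Square) = ...` without the matching import would
    # instead define a new, unrelated function ShapesExt.area.
    Shapes.area(s::Square) = s.side^2
end

result = Shapes.area(ShapesExt.Square(3.0))
```

With the qualified form, forgetting an import cannot silently create a separate function, which is the class of bug described above.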

Note that this does not address the issue for Tracker.jl.

torfjelde · 2023-08-04T09:50:27Z

src/utils.jl

@@ -19,6 +19,65 @@ cholesky_factor(X::Cholesky) = X.U
 cholesky_factor(X::UpperTriangular) = X
 cholesky_factor(X::LowerTriangular) = X

+# TODO: Add `check` as an argument?


I think this is the last remaining question @devmotion . I'm thinking "let's not, until we start using it"?
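
For context on the TODO, this is roughly what a `check` argument would forward to. It is a sketch only: `cholesky_lower_sketch` is a hypothetical name, and the PR as discussed does not add this keyword. `check` and `issuccess` are the existing LinearAlgebra API:

```julia
using LinearAlgebra

# Hypothetical variant that forwards `check` to LinearAlgebra.cholesky.
function cholesky_lower_sketch(X::AbstractMatrix; check::Bool=true)
    C = cholesky(Hermitian(X); check=check)
    return Matrix(C.L)
end

# Symmetric with eigenvalues 3 and -1, so not positive definite.
X_bad = [1.0 2.0; 2.0 1.0]

# With check=true (the default) the factorization throws a PosDefException.
threw = try
    cholesky_lower_sketch(X_bad)
    false
catch err
    err isa PosDefException
end

# With check=false nothing is thrown; the caller has to inspect `issuccess`.
C = cholesky(Hermitian(X_bad); check=false)
ok = issuccess(C)
```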

torfjelde · 2023-08-04T09:52:15Z

src/utils.jl

+    This is a thin wrapper around `cholesky(Hermitian(X)).L`
+    but with a custom `ChainRulesCore.rrule` implementation.
+"""
+cholesky_lower(X::AbstractMatrix) = lower_triangular(parent(cholesky(Hermitian(X)).L))


I wrap in Hermitian to effectively do the same as the current implementation of cholesky_factor, but I believe cholesky(::Hermitian) is only valid starting from Julia 1.8 (going by a comment in BijectorsReverseDiffExt), so we need to fix this.

This is actually not a problem anymore, since we're now defining the adjoint so that cholesky is never called on tracked arrays at all.

devmotion · 2023-08-05T19:48:13Z

Is there a particular reason for defining a ChainRule-rule and applying it to ReverseDiff even though the only (?) broken backends are ReverseDiff and Tracker? It would seem a bit more natural to define a rule for ReverseDiff directly (ignoring Tracker as discussed in Turing).

torfjelde · 2023-08-06T06:35:01Z

> Is there a particular reason for defining a ChainRule-rule and applying it to ReverseDiff even though the only (?) broken backends are ReverseDiff and Tracker? It would seem a bit more natural to define a rule for ReverseDiff directly (ignoring Tracker as discussed in Turing).

Nah. I had the same thought, but a) I'm fairly familiar with defining rrules using ChainRules, and less so with ReverseDiff, b) it's easy to make use of the rrule in defining a Tracker.@grad (but since we dropped official support for Tracker, we're no longer testing this), and c) I figured this wouldn't hurt Zygote perf vs. not defining this 🤷

devmotion · 2023-08-06T11:59:07Z

ext/BijectorsReverseDiffExt.jl

+@grad_from_chainrules Bijectors.cholesky_lower(X::TrackedMatrix)
+@grad_from_chainrules Bijectors.cholesky_upper(X::TrackedMatrix)
+
+# TODO: Type-piracy; probably shouldn't do this.


No, this should really not be defined in Bijectors. I'm sure this will lead to surprising debugging and issues when Bijectors is (not) loaded.

devmotion · 2023-08-06T12:03:36Z

One can just define the pullback in a function and reuse it without defining an rrule? From the code it seems the rrule does not provide any performance improvements over the already existing rrules, so there's no benefit for ChainRules-compatible AD backends. Defining a ReverseDiff-rule is (sometimes) very similar to defining a Tracker rule (see, e.g., https://github.com/TuringLang/DistributionsAD.jl/blob/5847e86f7783ea8f745a7465c2f4f9020c729051/ext/DistributionsADReverseDiffExt.jl#L38-L43).
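
A sketch of what "define the pullback in a function and reuse it" could look like. It assumes the standard reverse-mode formula for the lower Cholesky factor, Ā = sym(L⁻ᵀ Φ(Lᵀ L̄) L⁻¹), where Φ keeps the lower triangle and halves the diagonal. The names `phi` and `cholesky_lower_pullback` are hypothetical, and this is not necessarily the code that ended up in the PR:

```julia
using LinearAlgebra

# Φ: keep the lower triangle of M and halve its diagonal.
function phi(M::AbstractMatrix)
    P = Matrix(LowerTriangular(M))
    for i in axes(P, 1)
        P[i, i] /= 2
    end
    return P
end

# Backend-agnostic pullback: given the factor L (with A = L * L') and the
# cotangent ΔL of L, return the symmetrized cotangent of A.
function cholesky_lower_pullback(L::AbstractMatrix, ΔL::AbstractMatrix)
    Lt = LowerTriangular(L)
    P = phi(Lt' * ΔL)
    G = Lt' \ (P / Lt)        # L⁻ᵀ * P * L⁻¹ via triangular solves
    return (G + G') / 2
end

A = [4.0 2.0; 2.0 3.0]
L = Matrix(cholesky(Hermitian(A)).L)
ΔL = ones(2, 2)               # cotangent of sum(L); the upper part is ignored
A_bar = cholesky_lower_pullback(L, ΔL)
```

A plain function like this could then be called from a ReverseDiff rule, a Tracker rule, or an rrule alike, so the backend-specific definitions stay thin.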

torfjelde · 2023-08-07T04:57:44Z

> One can just define the pullback in a function and reuse it without defining an rrule? From the code it seems the rrule does not provide any performance improvements over the already existing rrules, so there's no benefit for ChainRules-compatible AD backends.

But isn't it fair to assume that an rrule will generally lead to improvements in type-stability, and that such an rrule would therefore also benefit the likes of Zygote?

But I'm happy to not use an rrule here, if that is preferred. I always just do it by default because we generally have good tools to ensure that this works as intended, and given the number of type-instabilities I've encountered with Zygote, I've "arrived" at the conclusion that writing an rrule to avoid tracing through the full call stack is generally beneficial 🤷

EDIT: I have the change to not use an rrule ready to go, but as I'm making the changes, my motivation for making the change is dwindling 🙃 Given how rarely I write rules for ReverseDiff and Tracker these days, there is genuinely an increased maintenance burden. And if this is the case for me, I imagine this is doubly so for new developers trying to contribute 😕 It's also "annoying" to remove these good ChainRules practices from the rrule, e.g. usage of ProjectTo, just because the particular AD framework we're fixing doesn't support these (while when just "deferring" to ChainRules for these AD frameworks, these would just be no-ops anyway).

With that being said, I will still make the change because I ran into a super-weird error with ReverseDiff (I'll raise an issue in a sec) 🤦

devmotion · 2023-08-07T07:26:52Z

Regarding JuliaDiff/ReverseDiff.jl#236, the macro has always had issues and limitations (one bug was just fixed recently), so in my experience it's not really the case (and possibly a too strong expectation) that it brings CR-compatibility to ReverseDiff.

torfjelde · 2023-08-07T09:07:59Z

> Regarding JuliaDiff/ReverseDiff.jl#236, the macro has always had issues and limitations (one bug was just fixed recently), so in my experience it's not really the case (and possibly a too strong expectation) that it brings CR-compatibility to ReverseDiff.

Yeaaah but it's so darn convenient 😞

torfjelde · 2023-08-07T09:34:45Z

Btw, @devmotion the fact that I'm facing issues when using @grad_from_chainrules is not the macro's fault, but it's because we're just calling value on the inputs, while @grad_from_chainrules does not (which is, arguably, the correct way of doing things).

EDIT: Naaah, I'm stupid. The value call is there 🤦

torfjelde · 2023-08-07T15:05:35Z

So this is quite confusing. I'm trying to transition VecCorrBijector to also use the cholesky_lower, etc. and I'm now running into the following issue, despite CI passing for the current implementation (which doesn't have a custom adjoint for Zygote):

julia> d = 4
4

julia> dist = LKJ(d, 2.0)
LKJ{Float64, Int64}(
d: 4
η: 2.0
)


julia> b = bijector(dist)
Bijectors.VecCorrBijector()

julia> x = rand(dist)
4×4 Matrix{Float64}:
  1.0        -0.19264   -0.63806    0.0930006
 -0.19264     1.0        0.259633  -0.168056
 -0.63806     0.259633   1.0        0.170947
  0.0930006  -0.168056   0.170947   1.0

julia> # (✓) Works!
       Zygote.gradient(x) do x
           sum(cholesky(Hermitian(x)).U)
       end

([1.141585190493663 1.7498257982625538 1.7225797690347182 1.6454362799052624; 0.0 0.5205946318942621 1.0163692667364 1.0672427905346635; 0.0 0.0 0.4745109346480554 0.8467387954612068; 0.0 0.0 0.0 0.5399333180126097],)

julia> # (×) Fails!
       Zygote.gradient(x) do x
           sum(parent(cholesky(Hermitian(x)).U))
       end
ERROR: MethodError: no method matching UpperTriangular(::NamedTuple{(:data,), Tuple{FillArrays.Fill{Float64, 2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}})

Closest candidates are:
  UpperTriangular(::UpperTriangular)
   @ LinearAlgebra ~/.julia/juliaup/julia-1.9.2+0.x64.linux.gnu/share/julia/stdlib/v1.9/LinearAlgebra/src/triangular.jl:21
  UpperTriangular(::ChainRulesCore.AbstractThunk)
   @ ChainRulesCore ~/.julia/packages/ChainRulesCore/0t04l/src/tangent_types/thunks.jl:68
  UpperTriangular(::TrackedMatrix)
   @ DistributionsADTrackerExt ~/.julia/packages/DistributionsAD/Ufc05/ext/DistributionsADTrackerExt.jl:131
  ...

Stacktrace:
 [1] (::Zygote.var"#1010#1013"{Cholesky{Float64, Matrix{Float64}}})(Δ::NamedTuple{(:data,), Tuple{FillArrays.Fill{Float64, 2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}})
   @ Zygote ~/.julia/packages/Zygote/JeHtr/src/lib/array.jl:630
 [2] (::Zygote.var"#3461#back#1014"{Zygote.var"#1010#1013"{Cholesky{Float64, Matrix{Float64}}}})(Δ::NamedTuple{(:data,), Tuple{FillArrays.Fill{Float64, 2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}})
   @ Zygote ~/.julia/packages/ZygoteRules/OgCVT/src/adjoint.jl:71
 [3] Pullback
   @ ./REPL[29]:3 [inlined]
 [4] (::Zygote.Pullback{Tuple{var"#51#52", Matrix{Float64}}, Tuple{Zygote.var"#3461#back#1014"{Zygote.var"#1010#1013"{Cholesky{Float64, Matrix{Float64}}}}, Zygote.Pullback{Tuple{typeof(cholesky), Hermitian{Float64, Matrix{Float64}}}, Tuple{Zygote.ZBack{ChainRules.var"#cholesky_HermOrSym_pullback#2122"{Hermitian{Float64, Matrix{Float64}}, Cholesky{Float64, Matrix{Float64}}}}, Zygote.Pullback{Tuple{Type{NoPivot}}, Tuple{}}}}, Zygote.var"#3027#back#778"{Zygote.var"#772#776"{Matrix{Float64}}}, Zygote.Pullback{Tuple{typeof(parent), UpperTriangular{Float64, Matrix{Float64}}}, Tuple{Zygote.var"#2184#back#299"{Zygote.var"#back#298"{:data, Zygote.Context{false}, UpperTriangular{Float64, Matrix{Float64}}, Matrix{Float64}}}}}, Zygote.var"#3299#back#918"{Zygote.var"#back#917"{Hermitian{Float64, Matrix{Float64}}}}}})(Δ::Float64)
   @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
 [5] (::Zygote.var"#75#76"{Zygote.Pullback{Tuple{var"#51#52", Matrix{Float64}}, Tuple{Zygote.var"#3461#back#1014"{Zygote.var"#1010#1013"{Cholesky{Float64, Matrix{Float64}}}}, Zygote.Pullback{Tuple{typeof(cholesky), Hermitian{Float64, Matrix{Float64}}}, Tuple{Zygote.ZBack{ChainRules.var"#cholesky_HermOrSym_pullback#2122"{Hermitian{Float64, Matrix{Float64}}, Cholesky{Float64, Matrix{Float64}}}}, Zygote.Pullback{Tuple{Type{NoPivot}}, Tuple{}}}}, Zygote.var"#3027#back#778"{Zygote.var"#772#776"{Matrix{Float64}}}, Zygote.Pullback{Tuple{typeof(parent), UpperTriangular{Float64, Matrix{Float64}}}, Tuple{Zygote.var"#2184#back#299"{Zygote.var"#back#298"{:data, Zygote.Context{false}, UpperTriangular{Float64, Matrix{Float64}}, Matrix{Float64}}}}}, Zygote.var"#3299#back#918"{Zygote.var"#back#917"{Hermitian{Float64, Matrix{Float64}}}}}}})(Δ::Float64)
   @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface.jl:45
 [6] gradient(f::Function, args::Matrix{Float64})
   @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface.jl:97
 [7] top-level scope
   @ REPL[29]:2

which then also breaks cholesky_lower 😕

torfjelde · 2023-08-07T15:26:01Z

Seems like something similar to https://github.com/FluxML/Zygote.jl/blob/29fa32a688fb4454d2ee81b9ba2a6484e4468bda/src/lib/array.jl#L356-L357 is needed.

Adding those, I indeed manage to get stuff running, but the resulting adjoint type is UpperTriangular no matter what 😕

julia> Zygote.@adjoint LinearAlgebra.parent(x::UpperTriangular) = parent(x), Δ -> (UpperTriangular(Δ),)

julia> Zygote.@adjoint LinearAlgebra.parent(x::LowerTriangular) = parent(x), Δ -> (LowerTriangular(Δ),)

julia> x
2×2 Matrix{Float64}:
  1.0       -0.612002
 -0.612002   1.0

julia> last(Zygote.gradient(x) do x
           sum(parent(cholesky(Hermitian(x)).U))
       end)
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 1.0428  1.77385
  ⋅      0.632226

julia> last(Zygote.gradient(x) do x
           sum(parent(cholesky(Hermitian(x)).L))
       end)
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 1.0428  1.77385
  ⋅      0.632226

EDIT: I believe the "issue" of returning UpperTriangular no matter what is because of Zygote's literal_property? It seems, for example, that the jacobian is correct:

julia> last(Zygote.jacobian(x) do x
           parent(cholesky(Hermitian(x)).U)
       end)
4×4 Matrix{Float64}:
 0.5       0.0  0.0       0.0
 0.0       0.0  0.0       0.0
 0.306001  0.0  1.0       0.0
 0.236798  0.0  0.773848  0.632226

julia> last(Zygote.jacobian(x) do x
           parent(cholesky(Hermitian(x)).L)
       end)
4×4 Matrix{Float64}:
 0.5       0.0  0.0       0.0
 0.306001  0.0  1.0       0.0
 0.0       0.0  0.0       0.0
 0.236798  0.0  0.773848  0.632226

* removed redundant imports to BijectorsZygoteExt
* use cholesky_upper and cholesky_lower instead of cholesky_factor, etc.
* added tests for CorrVecBijector
* name testset correctly
* use cholesky_lower and cholesky_upper instead of cholesky_factor
* removed now-redundant cholesky_factor
* Fix obsolete function references in tests. (#282)
* Update chainrules.jl
* Update corr.jl
* Revert changes to transform.
* removed type-piracy that has been addressed upstream and bumped Zygote version in test
* use :L for Hermitian in `cholesky_lower`
* fixed ForwardDiff tests for LKJCholesky
* fixed tests for matrix dists and added tests for both values of uplo in LKJCholesky tests
* another attempt at fixing Julia 1.6 tests

Co-authored-by: Hong Ge

torfjelde added 6 commits August 4, 2023 08:15

added cholesky_lower and cholesky_triangular

3967e39

updated PD to use new cholesky_lower and cholesky_upper

394debc

simplified imports in BijectorsReverseDiffExtx

64d87bf

added ChainRules as a dep since we need the chain rules for cholesky, etc.

d175513

forgot to update Project.toml in previous commit

94f6a0e

added explicit implementation of with_logabsdet_jacobian for PDBijector

83fee94

torfjelde added 2 commits August 4, 2023 10:53

Update src/utils.jl

4b390c6

added ProjectTo in rrules for cholesky_lower and cholesky_upper to be proper

1185cad

torfjelde mentioned this pull request Aug 4, 2023

CompatHelper: bump compat for Bijectors to 0.13, (keep existing compat) TuringLang/Turing.jl#2018

Merged

added ProjectTo for cholesky_upper too

6be9534

devmotion reviewed Aug 6, 2023

torfjelde added 7 commits August 7, 2023 06:36

added transpose_eager as a alias for permutedims to allow definition of AD rules without type piracy

7675ea2

allow usage of ForwardDiff gradient as ground-truth

15c47eb

added AD tests for PDVecBijector

9322fda

added AD tests for PDVecBijector to runtests and commented out all other tests for the sake of reproducing ReverseDiff bug

0bf8487

forgot to remove type-piracy def of ReverseDiff rule for permutedims

5d0cd2d

use ReverseDiff.@grad instead of ReverseDiff.@grad_from_chainrules

29790dc

only define cholesky_lower and cholesky_upper rules for ReverseDiff, remove rules ChainRules defs

3241936

github-actions bot reviewed Aug 7, 2023

formatting

951028e

devmotion reviewed Aug 7, 2023

torfjelde added 2 commits August 7, 2023 10:03

parameterise gradient test for PD bijector properly instead of using ForwardDiff as per suggestion of @devmotion

4fe6085

reversed chagne to test_ad

1102266

torfjelde added 3 commits August 7, 2023 10:42

reactivate tests

4e66a8d

updated doocstrings

4f1ecc8

improved PDVecBijector AD tests a bit

e87a2aa

This was referenced Aug 7, 2023

Adjoint for parent for LowerTriangular and UpperTriangular FluxML/Zygote.jl#1444

Merged

AD fix for CorrBijector #281

Merged

yebai approved these changes Aug 12, 2023

torfjelde merged commit df21aef into master Aug 12, 2023

delete-merged-branch bot deleted the torfjelde/pd-fix branch August 12, 2023 11:14

torfjelde mentioned this pull request Oct 6, 2023

inv function error in gradient computation TuringLang/JuliaBUGS.jl#115

Closed
