Faster Matrix{BlasFloat} * or \ VecOrMatrix{Dual} #589
Conversation
src/dual.jl
Outdated
```julia
Base.:*(m::Union{LowerTriangular{<:LinearAlgebra.BlasFloat},
                 UpperTriangular{<:LinearAlgebra.BlasFloat},
                 StridedMatrix{<:LinearAlgebra.BlasFloat}},
        x::StridedVecOrMat{<:Dual}) =
    _map_dual_components(Base.Fix1(lmul!, m), (x, _) -> lmul!(m, x), x)
```
I think it's going to be very hard to avoid ambiguities here.
```julia
julia> rand(3,3) * Dual.(rand(3),0)
ERROR: MethodError: *(::Matrix{Float64}, ::Vector{Dual{Nothing, Float64, 1}}) is ambiguous.

Candidates:
  *(A::StridedMatrix{T}, x::StridedVector{S}) where {T<:Union{Float32, Float64, ComplexF32, ComplexF64}, S<:Real}
    @ LinearAlgebra ~/.julia/dev/julia/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/matmul.jl:49
  *(m::Union{LowerTriangular{var"#s87", S} where {var"#s87"<:Union{Float32, Float64, ComplexF32, ComplexF64}, S<:AbstractMatrix{var"#s87"}}, UpperTriangular{var"#s86", S} where {var"#s86"<:Union{Float32, Float64, ComplexF32, ComplexF64}, S<:AbstractMatrix{var"#s86"}}, StridedMatrix{<:Union{Float32, Float64, ComplexF32, ComplexF64}}}, x::StridedVecOrMat{<:Dual})
    @ Main REPL[51]:4
```
Attaching the rule to `mul!(Matrix{<:Dual}, ...)` seems less likely to trigger them. Testing with `Test.detect_ambiguities(ForwardDiff)` might be a good idea too.
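For reference, a minimal way to run that check (a sketch; `Test.detect_ambiguities` returns the list of mutually ambiguous method pairs):

```julia
using Test, ForwardDiff

# An empty result means the new * / \ methods introduced no ambiguities.
ambs = Test.detect_ambiguities(ForwardDiff)
@test isempty(ambs)
```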
> Attaching the rule to `mul!(Matrix{<:Dual}, ...)` seems less likely to trigger them.

The problem is that promotion to `Dual` already happens at `\` (probably for `*` too).
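Concretely, the behavior being described (a sketch on stock LinearAlgebra, without this PR's methods loaded):

```julia
using LinearAlgebra, ForwardDiff

A = LowerTriangular(rand(100, 100))     # constant triangular factor
b = ForwardDiff.Dual.(rand(100), 1.0)   # dual vector, one partial each

# The fallback promotes the matrix to a Dual eltype (copying it on every
# solve) before running the generic triangular left-division.
x = A \ b    # Vector{Dual{Nothing, Float64, 1}}
```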
BTW, I think LinearAlgebra does something like this trick for some mixed real/complex cases (where the strides work out correctly). Maybe `mul!(ones(4,4).+im, rand(ComplexF64, 4,4), rand(4,4), true, false)` is one? Staying close to the signature used there is probably a way to avoid ambiguities.
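For what it's worth, that call does run; whether it hits a specialized mixed-eltype path or a generic one depends on the LinearAlgebra version, so treat this as a sketch:

```julia
using LinearAlgebra

C = ones(4, 4) .+ im        # complex destination
A = rand(ComplexF64, 4, 4)
B = rand(4, 4)              # real factor: no copy to ComplexF64 needed

mul!(C, A, B, true, false)  # C = 1*A*B + 0*C
C ≈ A * B                   # true
```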
`\` might be easier than `*`, as it doesn't have so many methods. For `*`, promotion to make `C` should happen correctly without this package doing anything, I think.
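Indeed, on stock LinearAlgebra (without this PR's `*` overloads) the output eltype is already promoted to `Dual`; only the speed of the generic kernel is at issue:

```julia
using ForwardDiff: Dual

x = Dual.(rand(3), 1.0)
y = rand(3, 3) * x   # generic matmul; eltype of y is already Dual
typeof(y)            # Vector{Dual{Nothing, Float64, 1}}
```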
> For `*`, promotion to make `C` should happen correctly without this package doing anything, I think.

This is `*(m::Triangular, x::Vector)`.
Codecov Report

```
@@            Coverage Diff             @@
##           master     #589      +/-   ##
==========================================
- Coverage   89.65%   87.11%    -2.55%
==========================================
  Files          11        9       -2
  Lines         967      947      -20
==========================================
- Hits          867      825      -42
- Misses        100      122      +22
```

View full report in Codecov by Sentry.
Now these functions should be fixed and properly tested.
Force-pushed 792ee25 to 27cc509.
@mcabbott @fredrikekre @devmotion I've refactored the code so that in-place ldiv/mul are also supported. These changes should now be covered by tests with the updated DiffTests package. There are some linalg differences between 1.0 and the current 1.x, so some of the tests are disabled on 1.0; I guess that's the most straightforward way to handle the incompatibilities.
src/dual.jl
Outdated
```julia
@eval LinearAlgebra.mul!(y::StridedMatrix{T}, m::$MT, x::StridedMatrix{T}) where T <: Dual =
    _map_dual_components!((y, x) -> mul!(y, m, x), (y, x, _) -> mul!(y, m, x), y, x)

@eval Base.:*(m::$MT, x::StridedMatrix{<:Dual}) = mul!(similar(x, (size(m, 1), size(x, 2))), m, x)
```
I've lost track a bit, but since I was passing by:

- Why does this case call `_map_dual_components!` and not just reinterpret `y`?
- Why add a method to `*` here, won't it go to `mul!` anyway?
- Should there be any methods at all for 3-arg `mul!`, or can they all be on 5-arg `mul!` only, as that's eventually called?
@mcabbott Unfortunately, I also start to lose track, but AFAIR I was exploring these possibilities:

- `reinterpret()` would only work for the dual vectors (translating the operation into a normal matrix multiplication), but for dual matrices there's no representation as a normal-matrix linalg operation.
- `*` and `\` methods are required to overload the Base Julia methods that would promote the eltypes of all vectors and matrices to `Dual`, so we need to intercept that code path early.
- `mul!()`: AFAIR there are some 3-arg methods that don't call 5-arg methods, plus it was evolving from Julia 1.0 to 1.8, so it is hard to come up with the set of methods that is optimal for all the versions. This is a part of the PR that could potentially be more polished, but as the whole infrastructure of `mul!()`/`ldiv!()` methods in LinearAlgebra is nontrivial, I was waiting for the ForwardDiff devs' feedback and approval of the PR in principle before going forward.
Eventually we'll get there!

I do think you can reinterpret dual-mat * matorvec. `reinterpret(Float64, rand(3,4).+im) * rand(4, 5)` makes a 6×5 `Matrix{Float64}` via existing methods. I think you're suggesting that there is no method which makes a 2×3×5 array, but this is just reshape. I think that `@less mul!((rand(3).+im), (rand(3,3).+im), rand(3), true, false)` works like this, without ever calling `reinterpret` but just passing pointers.
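A sketch of that dual-on-the-left case (tag `Nothing` and `N = 2` are illustrative): the dual components run along the first, uncontracted dimension, so one real matmul suffices:

```julia
using ForwardDiff: Dual

N = 2
z = [Dual(rand(), rand(), rand()) for _ in 1:3, _ in 1:4]  # 3×4 Matrix{Dual}, N = 2
A = rand(4, 5)

Zf = reinterpret(Float64, z)                   # 9×4: values/partials along dim 1
Yf = Zf * A                                    # one real matmul, 9×5
y  = reinterpret(Dual{Nothing,Float64,N}, Yf)  # back to a 3×5 matrix of duals

Yf ≈ reinterpret(Float64, z * A)               # true, up to roundoff
```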
Agree that some of the other methods here need to catch `*` or `/` directly.

Re 1.0, the only reason not to drop it is not being sure what version of ForwardDiff to call that... I mean it shouldn't break on 1.0, but it's totally fine for 1.0 to miss these fast paths.
@mcabbott BLAS has complex matrix-matrix multiplication (zgemm()), so there's no need for reinterpretation there. The L×M matrix z of duals with N partials could be represented as an (N+1)×L×M array Z of reals. When multiplied by the constant K×L matrix A, the result y = A×z could be reinterpreted as an (N+1)×K×M array Y. My point is that L and K are the middle dimensions of the Z and Y arrays, respectively. So no combination of reshaping and transposing operations would make these dimensions the first or the last one (so that the resulting matrix is compatible with multiplication by A). One would need permutedims, which involves reshuffling array elements.
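A sketch of that dimension bookkeeping (names and sizes are illustrative; N = 1 partial):

```julia
using ForwardDiff: Dual

N = 1                                               # one partial per dual
z = [Dual(rand(), rand()) for _ in 1:3, _ in 1:4]   # L×M = 3×4 dual matrix
A = rand(5, 3)                                      # constant K×L matrix, K = 5

Z = reshape(reinterpret(Float64, z), N + 1, 3, 4)   # (N+1)×L×M array of reals

# The contracted index L is the *middle* dimension of Z; reshape can only
# merge adjacent dimensions, so no reshape/transpose exposes L first or last.
# A copying permutedims is needed before a single gemm can do the job:
Zp = permutedims(Z, (2, 1, 3))               # L×(N+1)×M
Yf = A * reshape(Zp, 3, (N + 1) * 4)         # K×((N+1)·M) in one real matmul
Y  = reshape(Yf, 5, N + 1, 4)                # values/partials along the middle dim
```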
I've rebased the PR and cleaned up commit history a bit. For tests to succeed, DiffTests 0.1.3 is required (JuliaDiff/DiffTests.jl#17).
The calculation of the gradient from a multivariate normal prior involves left-division by a triangular matrix.

Currently, when the covariance matrix is fixed and the random vector depends on the model parameters (the typical use case), this division goes through a fallback path in LinearAlgebra: the constant matrix gets promoted to the Dual type (i.e. it is copied each time the gradient is calculated), and then the generic triangular left-division implementation is called. For 100×100 and larger matrices this results in significant CPU and memory overhead.

However, when the matrix is constant, we don't need to convert it to Dual. We just have to left-divide the vector of dual values, as well as each vector of partials, by this matrix. Since these are operations on plain float vectors, LAPACK's trtrs() can be used for a much faster division.

This is what this PR does. To avoid excessive copying, it relies on a hack: the array of duals can be accessed as an array of floats, with each dual occupying N+1 consecutive values.
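A minimal sketch of that hack for the triangular left-division case (`ldiv_const` is a hypothetical name, not this PR's actual method; the real code also covers `mul!`, in-place variants, and dual matrices):

```julia
using LinearAlgebra
using ForwardDiff: Dual

function ldiv_const(m::LowerTriangular{Float64,Matrix{Float64}},
                    z::Vector{Dual{T,Float64,N}}) where {T,N}
    K = length(z)
    # Each dual is N+1 consecutive Float64s, so z can be viewed as an
    # (N+1)×K float matrix: row 1 the values, rows 2:N+1 the partials.
    Z = reshape(reinterpret(Float64, z), N + 1, K)
    # Gather the value/partials vectors as columns and solve them all with
    # one strided triangular solve (LAPACK trtrs), never promoting m to Dual.
    B = Matrix(transpose(Z))   # K×(N+1)
    ldiv!(m, B)
    y = similar(z)
    Y = reshape(reinterpret(Float64, y), N + 1, K)
    Y .= transpose(B)
    return y
end

# Matches the generic promoting fallback (compare the underlying floats):
m = LowerTriangular(rand(4, 4) + 4I)
z = Dual.(rand(4), rand(4))
reinterpret(Float64, ldiv_const(m, z)) ≈ reinterpret(Float64, m \ z)
```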