Workaround for performance regression introduced by FFTW

Convolutions in DSP currently rely on FFTW.jl, and a recent change in FFTW.jl (JuliaMath/FFTW.jl#105) introduced a large performance regression in `conv` whenever Julia is started with more than one thread. Since v1, FFTW.jl uses multi-threaded FFTW transforms by default whenever Julia has more than one thread. This new default makes small FFT problems run much more slowly and use much more memory. Because the overlap-save method of `conv` in DSP breaks a convolution into many small convolutions, and therefore performs a large number of small FFTW transforms, this change can make convolutions slower by two orders of magnitude, with a similar increase in memory use.

FFTW.jl does not provide an explicit way to set the number of threads used by an FFTW plan without changing a global variable. However, generating the plans with the planning flag set to `FFTW.PATIENT` (instead of FFTW.jl's default, `FFTW.ESTIMATE`) allows the planner to consider changing the number of threads. Adding this flag to the plans generated by the overlap-save convolution method seems to resolve the performance regression on multi-threaded instances of Julia.

Fixes JuliaDSP#399

Also see JuliaMath/FFTW.jl#121
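The planner-flag change described above can be tried in isolation. The following is a minimal sketch, assuming FFTW.jl is installed; the buffer length `nfft` and the test signal `x` are illustrative and not taken from DSP. It builds a real-to-complex plan and an unnormalized inverse plan with `FFTW.PATIENT`, then checks a round trip. The commented-out `FFTW.set_num_threads(1)` call is presumably the global alternative the message refers to.

```julia
using FFTW
using LinearAlgebra: mul!

# Illustrative FFT length and buffers (assumptions, not DSP's actual sizes).
nfft = 64
tdbuff = zeros(Float64, nfft)               # time-domain buffer
fdbuff = zeros(ComplexF64, nfft >> 1 + 1)   # frequency-domain buffer for rfft

# Global alternative: force single-threaded FFTW for all plans created later.
# FFTW.set_num_threads(1)

# Per-plan alternative: PATIENT lets the planner weigh using fewer threads.
# Planning with PATIENT may overwrite the buffers, so fill them afterwards.
p  = plan_rfft(tdbuff, flags = FFTW.PATIENT)
ip = plan_brfft(fdbuff, nfft, flags = FFTW.PATIENT)

x = rand(nfft)
tdbuff .= x
mul!(fdbuff, p, tdbuff)    # forward transform into fdbuff
mul!(tdbuff, ip, fdbuff)   # unnormalized inverse transform back into tdbuff
y = tdbuff ./ nfft         # brfft is unnormalized, so divide by nfft
```

Planning with `FFTW.PATIENT` takes longer than with the default flag, so this trade-off pays off mainly when a plan is reused for many transforms, as it is in overlap-save convolution.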
galenlynch · May 24, 2020 · 2253e37
1 parent f53fe27
commit 2253e37
Showing 1 changed file with 6 additions and 4 deletions.
diff --git a/src/dspbase.jl b/src/dspbase.jl
@@ -301,17 +301,19 @@ unnormalized.
     bufsize = ntuple(i -> i == 1 ? nffts[i] >> 1 + 1 : nffts[i], N)
     fdbuff = similar(u, Complex{T}, NTuple{N, Int}(bufsize))
 
-    p = plan_rfft(tdbuff)
-    ip = plan_brfft(fdbuff, nffts[1])
+    # PATIENT flag needed if Julia has more than one thread (See #339)
+    p = plan_rfft(tdbuff, flags = FFTW.PATIENT)
+    ip = plan_brfft(fdbuff, nffts[1], flags = FFTW.PATIENT)
 
     tdbuff, fdbuff, p, ip
 end
 
 @inline function os_prepare_conv(u::AbstractArray{<:Complex}, nffts)
     buff = similar(u, nffts)
 
-    p = plan_fft!(buff)
-    ip = plan_bfft!(buff)
+    # PATIENT flag needed if Julia has more than one thread (See #339)
+    p = plan_fft!(buff, flags = FFTW.PATIENT)
+    ip = plan_bfft!(buff, flags = FFTW.PATIENT)
 
     buff, buff, p, ip # Only one buffer for complex
 end