Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple FFTs at once #9

Closed
antoine-levitt opened this issue Jul 27, 2019 · 8 comments
Closed

Multiple FFTs at once #9

antoine-levitt opened this issue Jul 27, 2019 · 8 comments
Labels
feature New feature or request performance Performance regression or performance-related
Milestone

Comments

@antoine-levitt
Copy link
Member

antoine-levitt commented Jul 27, 2019

FFTW supports multiple FFTs, and it says in the manual (http://www.fftw.org/fftw3.pdf p. 32)

Plans obtained in this way can often be faster than calling FFTW multiple times for the individual transforms

We should benchmark to see if there's really an improvement on timing (this doesn't seem to be a BLAS3 situation where there should be a large impact, but performance is weird, so I don't know...)

This would also nicely let us use FFTW multithreading optimally automatically.

One tricky complication is that with locking, the number of vectors we apply the Hamiltonian on changes from iteration to iteration. Maybe have one plan for the full nband (which is always needed at the first iteration anyway), and then fall back to the 1-by-1 version if less are required. We should look at how other codes do it.

@mfherbst
Copy link
Member

Good idea. I would expect that for a large number of bands, using a "half-band" version is also reasonable, since it probably will take quite some iterations to go from 1 locked to half locked and in between you need to use the 1-band version for almost the complete band. But I agree, we should look at the other codes.

@antoine-levitt
Copy link
Member Author

So I did a bit of digging.

If I understand correctly QE does not do multiple FFTs at once (PW/src/vloc_psi.F90 calls fwfft, defined in FFTXlib/fft_interfaces.F90, without the optional argument howmany).

Abinit... has a m_fft.F90 file that is 4940 lines long (so about twice the size of DFTK right now), which as far as I understand is . That's just the high-level interface though; if you add the low-level wrapper around FFTW3, it adds 15k lines. But of course there's also the FFT routines from Goedecker, that are still used by default when running under MPI, which add another 10k lines. So the only way to know what is actually going on is to run it through a debugger.

@antoine-levitt
Copy link
Member Author

Just while I'm at it, here are some fun LOC stats. Counting just the "core" part (as best as I could find out what's core and what's not) of what DFTK does right now, excluding parallelisation etc. as far as I could:

Abinit: sloccount 28_numeric_noabirule/ 29_kpoints/ 32_util/ 41_xc_lowlevel/ 42_parser/ 44_abit* 46_diago/ 53_ffts/ 55_abiutil/ 56* 57* 56* 61* 62* 66_nonlocal/ 66_wfs/ 67_common/ 69_wfdesc/ 94_scfcv/ 98_main/ 83_cut3d/ => 190k

QE: sloccount LAXlib/ FFTXlib/ UtilXlib/ KS_Solvers/ PW/src/ => 85k (excluding the copy of an '97 version of FFTW they have in there)

@antoine-levitt
Copy link
Member Author

GPAW is comparatively pretty good: the core part is about 30k lines of python, supports three different basis sets, with about 10k of C code for performance-critical parts.

@antoine-levitt
Copy link
Member Author

So, reading the FFTW manual more closely, there is a "wisdom" mechanism for reusing measurements to create new plans; they say it's pretty smart, and so this might negate the problem of the changing nband (ie do the FFTW_MEASURE with the full nband, and reuse the wisdom to do a smaller number of bands)

@mfherbst
Copy link
Member

Ah that can be done quite easily in fact.

@mfherbst mfherbst added the feature New feature or request label Jul 29, 2019
@mfherbst mfherbst added this to the The first release milestone Jul 29, 2019
@mfherbst mfherbst added performance Performance regression or performance-related and removed feature New feature or request labels Jul 29, 2019
@mfherbst
Copy link
Member

mfherbst commented Aug 4, 2019

This issue is related to the TODO given in https://github.com/mfherbst/DFTK.jl/blob/master/src/core/PlaneWaveBasis.jl#L76

@mfherbst mfherbst added the feature New feature or request label Oct 28, 2019
@mfherbst mfherbst mentioned this issue Oct 28, 2019
@antoine-levitt
Copy link
Member Author

Closing in favour of #15

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request performance Performance regression or performance-related
Projects
None yet
Development

No branches or pull requests

2 participants