Multiple FFTs at once #9
Comments
Good idea. For a large number of bands, I would expect a "half-band" version to be reasonable as well, since it will probably take quite a few iterations to go from one band locked to half of them locked, and in between you would need the 1-band version for almost the complete set of bands. But I agree, we should look at the other codes.
So I did a bit of digging. If I understand correctly, QE does not do multiple FFTs at once (PW/src/vloc_psi.F90 calls Abinit... has a
Just while I'm at it, here are some fun LOC stats. Counting just the "core" part (as best as I could find out what's core and what's not) of what DFTK does right now, excluding parallelisation etc. as far as I could: Abinit: QE:
GPAW is comparatively pretty good: the core part is about 30k lines of Python and supports three different basis sets, with about 10k lines of C code for performance-critical parts.
So, reading the FFTW manual more closely, there is a "wisdom" mechanism for reusing measurements to create new plans; they say it's pretty smart, and so this might negate the problem of the changing
Ah, that can be done quite easily in fact.
This issue is related to the TODO given in https://github.com/mfherbst/DFTK.jl/blob/master/src/core/PlaneWaveBasis.jl#L76
Closing in favour of #15
FFTW supports multiple FFTs, and it says in the manual (http://www.fftw.org/fftw3.pdf p. 32)
We should benchmark to see if there's really an improvement in timing (this doesn't seem to be a BLAS3 situation where there should be a large impact, but performance is weird, so I don't know...).
This would also nicely let us use FFTW multithreading optimally automatically.
One tricky complication is that with locking, the number of vectors we apply the Hamiltonian to changes from iteration to iteration. Maybe have one plan for the full nband (which is always needed at the first iteration anyway), and then fall back to the 1-by-1 version if fewer are required. We should look at how other codes do it.
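To make the fallback idea concrete, here is a minimal sketch in plain Python. It is not DFTK or FFTW code: `ToyPlan` and `PlanDispatcher` are hypothetical names, and a naive O(n²) DFT stands in for a real FFT. The only assumption carried over from FFTW is that a batched plan is tied to a fixed number of transforms. The dispatcher greedily covers the active vectors with the largest plan that fits, so with plans for `nband` and 1 it reproduces the "full plan, else 1-by-1" strategy described above.

```python
import cmath


def naive_dft(x):
    """O(n^2) discrete Fourier transform of one vector (stand-in for a real FFT)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


class ToyPlan:
    """Hypothetical stand-in for an FFT plan tied to a fixed batch size."""

    def __init__(self, howmany):
        self.howmany = howmany
        self.calls = 0  # number of times this plan has been executed

    def execute(self, vectors):
        if len(vectors) != self.howmany:
            raise ValueError("plan was created for a different batch size")
        self.calls += 1
        return [naive_dft(v) for v in vectors]


class PlanDispatcher:
    """Greedily covers a batch of vectors with the largest plans that fit."""

    def __init__(self, batch_sizes):
        sizes = set(batch_sizes) | {1}  # the 1-by-1 plan is always available
        self.plans = {s: ToyPlan(s) for s in sizes}
        self.sizes = sorted(sizes, reverse=True)

    def transform(self, vectors):
        out, start = [], 0
        while start < len(vectors):
            remaining = len(vectors) - start
            # Largest plan whose batch size does not exceed what is left.
            size = next(s for s in self.sizes if s <= remaining)
            out.extend(self.plans[size].execute(vectors[start:start + size]))
            start += size
        return out


nband = 4
dispatcher = PlanDispatcher([nband])
bands = [[complex(b + j) for j in range(8)] for b in range(nband)]

full = dispatcher.transform(bands)         # first iteration: all bands active
partial = dispatcher.transform(bands[:3])  # one band locked: falls back to 1-by-1
```

Passing extra sizes, for example `PlanDispatcher([nband, nband // 2])`, would give the "half-band" variant suggested in the comments. Whether any of this pays off in practice is exactly what the proposed benchmark would have to show.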