FFTW can use 4x available threads #223

navidcy commented Nov 16, 2020

navidcy requested a review from glwagner November 16, 2020 00:24

navidcy commented Dec 16, 2020

@glwagner shall I merge this? Is there a verdict on whether it makes any difference?

@glwagner

only one way to find that out...

navidcy commented Dec 22, 2020

I made a "clean" decaying 2D turbulence script.

using FourierFlows, Printf, Random

import GeophysicalFlows.TwoDNavierStokes
import GeophysicalFlows.TwoDNavierStokes: energy, enstrophy
import GeophysicalFlows: peakedisotropicspectrum


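dev = CPU()      # device (CPU or GPU)
n, L = 1024, 2π  # grid resolution (n = 256 and n = 1024 in the runs below) and domain length

    dt = 2e-3  # timestep (the n = 256 log below is consistent with dt = 5e-3)
nsteps = 4000  # total number of steps
 nsubs = 20    # number of steps between each diagnostics update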

prob = TwoDNavierStokes.Problem(dev; nx=n, Lx=L, ny=n, Ly=L, dt=dt, stepper="FilteredRK4")

sol, clock, vars, grid = prob.sol, prob.clock, prob.vars, prob.grid
x, y = grid.x, grid.y

seed!(1234)
k₀, E₀ = 6, 0.5
ζ₀ = peakedisotropicspectrum(grid, k₀, E₀, mask=prob.timestepper.filter)
TwoDNavierStokes.set_zeta!(prob, ζ₀)

startwalltime = time()

for j = 0:(nsteps ÷ nsubs)
  # print diagnostics every 1000 steps
  if j % (1000 ÷ nsubs) == 0
    cfl = clock.dt * maximum([maximum(vars.u) / grid.dx, maximum(vars.v) / grid.dy])

    logmsg = @sprintf("step: %04d, t: %d, cfl: %.2f, walltime: %.2f min",
                      clock.step, clock.t, cfl, (time() - startwalltime) / 60)

    println(logmsg)
  end

  stepforward!(prob, nsubs)
  TwoDNavierStokes.updatevars!(prob)
end

@printf("walltime: %.2f min\n", (time() - startwalltime) / 60)

Running with n=256, I got the following. With the current setup:

step: 0000, t: 0, cfl: 0.46, walltime: 0.00 min
step: 1000, t: 5, cfl: 0.51, walltime: 0.16 min
step: 2000, t: 10, cfl: 0.46, walltime: 0.33 min
step: 3000, t: 15, cfl: 0.56, walltime: 0.51 min
step: 4000, t: 20, cfl: 0.41, walltime: 0.69 min
walltime: 0.70 min

and with FFTW.set_num_threads(4*threads):

step: 0000, t: 0, cfl: 0.46, walltime: 0.00 min
step: 1000, t: 5, cfl: 0.51, walltime: 0.18 min
step: 2000, t: 10, cfl: 0.46, walltime: 0.35 min
step: 3000, t: 15, cfl: 0.56, walltime: 0.51 min
step: 4000, t: 20, cfl: 0.41, walltime: 0.67 min
walltime: 0.67 min
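
For reference, a minimal sketch of how the 4x setting might be applied before building the problem. `threads` is not defined anywhere in this thread; taking it to be `Sys.CPU_THREADS` is an assumption, as is calling this from the script rather than from inside the package.

```julia
using FFTW

# Assumption: `threads` is the number of logical CPU threads on this machine.
threads = Sys.CPU_THREADS

# Ask FFTW to use 4x that many threads. Call this before constructing the
# problem so that FFTW plans created afterwards pick up the setting.
FFTW.set_num_threads(4 * threads)
```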

Hm... Then I cranked it up to n=1024. Results with the current setup:

step: 0000, t: 0, cfl: 0.79, walltime: 0.00 min
step: 1000, t: 2, cfl: 0.87, walltime: 2.38 min
step: 2000, t: 4, cfl: 0.70, walltime: 4.71 min
step: 3000, t: 6, cfl: 0.74, walltime: 7.10 min
step: 4000, t: 8, cfl: 0.78, walltime: 9.53 min
walltime: 9.58 min

and with FFTW.set_num_threads(4*threads):

step: 0000, t: 0, cfl: 0.79, walltime: 0.00 min
step: 1000, t: 2, cfl: 0.87, walltime: 2.34 min
step: 2000, t: 4, cfl: 0.70, walltime: 4.76 min
step: 3000, t: 6, cfl: 0.74, walltime: 7.26 min
step: 4000, t: 8, cfl: 0.78, walltime: 9.68 min
walltime: 9.72 min
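
A rough way to quantify the difference, using only the final walltimes printed in the logs above. Each configuration was run once, so treat these numbers as indicative rather than as a proper benchmark; the helper name `relchange` is made up for this sketch.

```julia
using Printf

# Final walltimes in minutes, copied from the logs above:
# resolution => (current setup, 4x threads)
walltimes = Dict(256 => (0.70, 0.67), 1024 => (9.58, 9.72))

# Relative change in percent when going from the current setup to 4x threads
# (negative means the 4x run was faster).
relchange(current, fourx) = 100 * (fourx - current) / current

for n in sort(collect(keys(walltimes)))
    current, fourx = walltimes[n]
    @printf("n = %4d: %+.1f%%\n", n, relchange(current, fourx))
end
```

This gives roughly 4% faster at n=256 and roughly 1.5% slower at n=1024, so the sign of the change is not even consistent between the two resolutions.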

So, @glwagner, based on the above I conclude that this PR makes no measurable difference to the walltime at either resolution. I'm closing it; feel free to reopen it if you think otherwise.

navidcy closed this Dec 22, 2020
navidcy deleted the fftw-4x branch February 25, 2021 21:43