This occurs with a basic Enzo-E cosmology test problem on TACC Vista. Running with --with-production on multiple nodes seems to periodically trigger a stall of about 30s, possibly a timeout. See the Projections traces comparing runs with and without --with-production. The behavior is independent of the network build (both mpi and netlrts) and the compiler (gcc or nvc). It only occurs when multiple nodes are used.
Hi James, could you try profiling the problematic case with NVIDIA Nsight Systems? (You may need to add -g to your --with-production build line to ensure the resulting binary has debug symbols.)
I tried a basic trace but don't see anything obvious when I look at the timeline, though I'm just learning to use Nsight Systems. The CPU appears to be at 99% utilization throughout, though.
Do you have any suggestions for nsys parameters, or tips for analysing the existing traces? I have a page at CharmIssue3850 with a link to reports for the 256-core run, though it's 1.3GB. Single reports are available as well, with an explicit link to the first.