fix missing ptx code within the binary #1960

psychocoderHPC · 2017-04-12T07:47:34Z (edited by ax3l)

Embedded PTX code within the binary was accidentally removed in #1933.
If the code is compiled for sm_20 and then run on, e.g., sm_35, this can result in segfaults; see #1953, #1954.

~~fix Double Free for Ionization (RNG-using) Simulations #1953~~ (not yet fixed)
fix sm_20 currently not working for LWFA example #1954 (tested)

@PrometheusPi, @n01r could you please check whether our issues are solved?

embedded ptx code within the binary was accidentally removed with ComputationalRadiationPhysics#1933

n01r · 2017-04-12T11:29:40Z

LaserWakefield example compiled for sm_20, run on a K20.
It included ions as well as ionization with the ADK model.

Runtime Test

__:~/paramSets/089_Issue1953BugOutOfMemoryWithIonizationADK/bin$ mpiexec -n 1 picongpu -d 1 1 1 -g 64 64 64 -s 2000 --e_macroParticlesCount.period 50 --i_macroParticlesCount.period 50
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per gpu: 1048576
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 52sec  47msec = 52 sec
  0 % =        0 | time elapsed:                  120msec | avg time per step:   0msec
  5 % =      100 | time elapsed:                  713msec | avg time per step:   5msec
 10 % =      200 | time elapsed:             1sec 445msec | avg time per step:   7msec
 15 % =      300 | time elapsed:             2sec 771msec | avg time per step:  13msec
 20 % =      400 | time elapsed:             4sec 315msec | avg time per step:  15msec
 25 % =      500 | time elapsed:             5sec 634msec | avg time per step:  13msec
 30 % =      600 | time elapsed:             6sec 762msec | avg time per step:  11msec
 35 % =      700 | time elapsed:             7sec 860msec | avg time per step:  10msec
 40 % =      800 | time elapsed:             8sec 936msec | avg time per step:  10msec
 45 % =      900 | time elapsed:            10sec   9msec | avg time per step:  10msec
 50 % =     1000 | time elapsed:            11sec  78msec | avg time per step:  10msec
 55 % =     1100 | time elapsed:            12sec 142msec | avg time per step:  10msec
 60 % =     1200 | time elapsed:            13sec 208msec | avg time per step:  10msec
 65 % =     1300 | time elapsed:            14sec 282msec | avg time per step:  10msec
 70 % =     1400 | time elapsed:            15sec 365msec | avg time per step:  10msec
 75 % =     1500 | time elapsed:            16sec 451msec | avg time per step:  10msec
 80 % =     1600 | time elapsed:            17sec 536msec | avg time per step:  10msec
 85 % =     1700 | time elapsed:            18sec 624msec | avg time per step:  10msec
 90 % =     1800 | time elapsed:            19sec 711msec | avg time per step:  10msec
 95 % =     1900 | time elapsed:            20sec 798msec | avg time per step:  10msec
100 % =     2000 | time elapsed:            21sec 890msec | avg time per step:  10msec
calculation  simulation time: 21sec 891msec = 21 sec
[kepler003:06823] *** Process received signal ***
[kepler003:06823] Signal: Segmentation fault (11)
[kepler003:06823] Signal code: Address not mapped (1)
[kepler003:06823] Failing at address: 0x35
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 6823 on node kepler003 exited on signal 11 (Segmentation fault).

As can be seen, the runtime test fails at the end of the simulation.

CUDA-MEMCHECK

~/paramSets/089_Issue1953BugOutOfMemoryWithIonizationADK/bin$ cuda-memcheck picongpu -d 1 1 1 -g 64 64 64 -s 10 
========= CUDA-MEMCHECK
========= Program hit cudaErrorSetOnActiveProcess (error 36) due to "cannot set while device is active in this process" on CUDA API call to cudaSetDeviceFlags. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2eea03]
=========     Host Frame:picongpu [0x661350]
=========     Host Frame:picongpu (_ZN5PMacc6detail18EnvironmentContext9setDeviceEi + 0x1fd) [0x4b3aad]
=========     Host Frame:picongpu (_ZN5PMacc11EnvironmentILj3EE11initDevicesENS_9DataSpaceILj3EEES3_ + 0x9c) [0x59f16c]
=========     Host Frame:picongpu (_ZN8picongpu12MySimulation10pluginLoadEv + 0x1bc) [0x5a571c]
=========     Host Frame:picongpu (_ZN8picongpu17SimulationStarterINS_21InitialiserControllerENS_16PluginControllerENS_12MySimulationEE10pluginLoadEv + 0x2b) [0x4d8f0b]
=========     Host Frame:picongpu (main + 0x8c) [0x496f0c]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf5) [0x21f45]
=========     Host Frame:picongpu [0x4972af]
=========
========= Program hit cudaErrorSetOnActiveProcess (error 36) due to "cannot set while device is active in this process" on CUDA API call to cudaGetLastError. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2eea03]
=========     Host Frame:picongpu [0x651443]
=========     Host Frame:picongpu (_ZN5PMacc6detail18EnvironmentContext9setDeviceEi + 0x267) [0x4b3b17]
=========     Host Frame:picongpu (_ZN5PMacc11EnvironmentILj3EE11initDevicesENS_9DataSpaceILj3EEES3_ + 0x9c) [0x59f16c]
=========     Host Frame:picongpu (_ZN8picongpu12MySimulation10pluginLoadEv + 0x1bc) [0x5a571c]
=========     Host Frame:picongpu (_ZN8picongpu17SimulationStarterINS_21InitialiserControllerENS_16PluginControllerENS_12MySimulationEE10pluginLoadEv + 0x2b) [0x4d8f0b]
=========     Host Frame:picongpu (main + 0x8c) [0x496f0c]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf5) [0x21f45]
=========     Host Frame:picongpu [0x4972af]
=========
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per gpu: 1048576
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 59sec 543msec = 59 sec
  0 % =        0 | time elapsed:                    0msec | avg time per step:   0msec
 10 % =        1 | time elapsed:                  845msec | avg time per step: 845msec
 20 % =        2 | time elapsed:             1sec 739msec | avg time per step: 894msec
 30 % =        3 | time elapsed:             2sec 634msec | avg time per step: 894msec
 40 % =        4 | time elapsed:             3sec 531msec | avg time per step: 896msec
 50 % =        5 | time elapsed:             4sec 430msec | avg time per step: 898msec
 60 % =        6 | time elapsed:             5sec 326msec | avg time per step: 896msec
 70 % =        7 | time elapsed:             6sec 223msec | avg time per step: 896msec
 80 % =        8 | time elapsed:             7sec 120msec | avg time per step: 896msec
 90 % =        9 | time elapsed:             8sec  17msec | avg time per step: 896msec
100 % =       10 | time elapsed:             8sec 914msec | avg time per step: 897msec
calculation  simulation time:  8sec 935msec = 8 sec
[kepler003:09356] *** Process received signal ***
[kepler003:09356] Signal: Segmentation fault (11)
[kepler003:09356] Signal code: Address not mapped (1)
[kepler003:09356] Failing at address: 0x35
========= Error: process didn't terminate successfully
=========        The application may have hit an error when dereferencing Unified Memory from the host. Please rerun the application under cuda-gdb or Nsight Eclipse Edition to catch host side errors.
========= Internal error (20)
========= No CUDA-MEMCHECK results found

psychocoderHPC · 2017-04-12T11:37:48Z

@n01r Could you please post the error message from stderr?

psychocoderHPC · 2017-04-12T11:39:15Z

@n01r Have you removed your build folder to start with a clean project?

n01r · 2017-04-12T12:00:42Z

@psychocoderHPC I ran picongpu from the terminal, so I posted the output from there.
Yes, I removed all contents from the build directory and even created a new parameter set.

psychocoderHPC · 2017-04-12T12:19:55Z

I tried test case 10 of the LWFA example, but compiled for sm_20, with the current pull request on a K20 and a K80.

cmake -DCUDA_ARCH=20 -DPARAM_OVERWRITES:LIST="-DPARAM_IONS=1;-DPARAM_IONIZATION=1" -DCMAKE_INSTALL_PREFIX=lwfaCrash/ -DPIC_EXTENSION_PATH=lwfaCrash/   ../dev

Environment

Currently Loaded Modulefiles:
  1) gcc/5.3.0                     4) openmpi/1.8.6.kepler.cuda80   7) numactl/2.0.7
  2) cmake/3.3.0                   5) boost/1.62.0                  8) valgrind/3.8.1
  3) cuda/8.0                      6) hdf5-parallel/1.8.15

Run

widera@kepler004:/bigdata/hplsim/scratch/widera/testPic/build_picongpu$ mpiexec -n 1 picongpu -d 1 1 1 -g 64 64 64 -s 2000 --e_macroParticlesCount.period 50 --i_macroParticlesCount.period 50
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per gpu: 1048576
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 55sec 865msec = 55 sec
  0 % =        0 | time elapsed:                    1msec | avg time per step:   0msec
  5 % =      100 | time elapsed:             1sec 293msec | avg time per step:  12msec
 10 % =      200 | time elapsed:             2sec 796msec | avg time per step:  15msec
...
 95 % =     1900 | time elapsed:            34sec 475msec | avg time per step:  17msec
100 % =     2000 | time elapsed:            36sec 280msec | avg time per step:  18msec
calculation  simulation time: 36sec 281msec = 36 sec
full simulation time:  1min 32sec 214msec = 92 sec

No errors visible.

@n01r After you started PIConGPU in the interactive shell, did it take ~45 seconds or more before the first output from the simulation appeared on the terminal? If not, then something is wrong with your setup.
The long pause after startup shows that the NVIDIA CUDA driver's JIT compiler is translating the sm_20 PTX code to sm_3X (the architecture currently in use).

psychocoderHPC · 2017-04-12T12:28:23Z

@n01r I have not enabled ADK for my test. Which test case could I use to enable ADK?

If I test my current setup with the dev branch, I can reproduce the error "what(): [CUDA] Error: invalid device function" from here.

ax3l · 2017-04-12T12:30:14Z

Configure with -t 10 in LWFA.

psychocoderHPC · 2017-04-12T12:32:15Z

@ax3l Test case 10 is BSI

psychocoderHPC · 2017-04-12T12:36:30Z

OK, found it: I need to change speciesDefinition.hpp from

particles::ionization::BSIEffectiveZ< PIC_Electrons >

// to

particles::ionization::AlgorithmADK< PIC_Electrons >

n01r · 2017-04-12T12:40:21Z

> @ax3l Test case 10 is BSI

@psychocoderHPC That is true, but I manually changed
particles::ionization::BSIEffectiveZ< PIC_Electrons >
to
particles::ionization::ADKLinPol< PIC_Electrons >

In cmakeFlags I also changed -DCUDA_ARCH=35 to -DCUDA_ARCH=20 for that test case, and then configured with -t 10.

ax3l · 2017-04-12T12:47:12Z

If #1953 is not fixed by this PR, let's move the discussion about the ADK/RNG cleanup problem back to its issue, because then these are definitely two independent problems.

fix missing ptx code wthin the binary

ba33025


psychocoderHPC assigned ax3l Apr 12, 2017

psychocoderHPC requested review from ax3l, PrometheusPi and n01r April 12, 2017 07:47

psychocoderHPC changed the title ~~fix missing ptx code wthin the binary~~ fix missing ptx code within the binary Apr 12, 2017

n01r mentioned this pull request Apr 12, 2017

Out-of-memory in Bremsstrahlung example #1961

Closed

ax3l added the "bug" (a bug in the project's code) and "component: core" (in PIConGPU, core application) labels Apr 12, 2017

ax3l added this to the Next Stable: 0.3.0 milestone Apr 12, 2017

psychocoderHPC mentioned this pull request Apr 12, 2017

Double Free for Ionization (RNG-using) Simulations #1953

Closed

ax3l approved these changes Apr 12, 2017


ax3l mentioned this pull request Apr 12, 2017

sm_20 currently not working for LWFA example #1954

Closed

ax3l merged commit 53fc080 into ComputationalRadiationPhysics:dev Apr 12, 2017

psychocoderHPC deleted the fix-cmakeMissingPTXInSource branch April 12, 2017 12:50
