fix missing ptx code within the binary #1960

psychocoderHPC
Member

@psychocoderHPC psychocoderHPC commented Apr 12, 2017

Embedded ptx code within the binary was accidentally removed with #1933.
If the code is compiled for sm_20 and then run on e.g. sm_35, it can result in segfaults, see #1953, #1954
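
To make concrete what "embedded PTX" means here, the following is a minimal sketch of the nvcc code-generation flags involved. It is not the project's actual CMake logic: `gencodeFlags` and its parameters are hypothetical names chosen for illustration. Only the `-gencode arch=...,code=...` syntax itself is standard nvcc.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of PIConGPU): builds nvcc code-generation
// flags for one architecture. `arch` is the compute capability without the
// dot, e.g. 20 or 35.
std::string gencodeFlags(int arch, bool embedPtx) {
    assert(arch >= 10);
    const std::string a = std::to_string(arch);
    // Machine code (SASS) for exactly this real architecture.
    std::string flags = "-gencode arch=compute_" + a + ",code=sm_" + a;
    if (embedPtx) {
        // Additionally keep the PTX in the binary, so that the driver can
        // JIT-compile it for newer GPUs at program start.
        flags += " -gencode arch=compute_" + a + ",code=compute_" + a;
    }
    return flags;
}
```

With `embedPtx == false` the binary only carries sm_20 machine code, which is the situation this pull request describes as broken when running on sm_35. With `embedPtx == true` the second `-gencode` entry adds the PTX that newer devices need.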

@PrometheusPi, @n01r could you please check if our issues are solved?

embedded ptx code within the binary was accidentally removed with ComputationalRadiationPhysics#1933
@psychocoderHPC psychocoderHPC changed the title from "fix missing ptx code wthin the binary" to "fix missing ptx code within the binary" Apr 12, 2017
@n01r
Member

n01r commented Apr 12, 2017

LaserWakefield example compiled with sm_20, run on k20.
It featured ions as well as ionization with the ADK model.

Runtime Test

__:~/paramSets/089_Issue1953BugOutOfMemoryWithIonizationADK/bin$ mpiexec -n 1 picongpu -d 1 1 1 -g 64 64 64 -s 2000 --e_macroParticlesCount.period 50 --i_macroParticlesCount.period 50
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per gpu: 1048576
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 52sec  47msec = 52 sec
  0 % =        0 | time elapsed:                  120msec | avg time per step:   0msec
  5 % =      100 | time elapsed:                  713msec | avg time per step:   5msec
 10 % =      200 | time elapsed:             1sec 445msec | avg time per step:   7msec
 15 % =      300 | time elapsed:             2sec 771msec | avg time per step:  13msec
 20 % =      400 | time elapsed:             4sec 315msec | avg time per step:  15msec
 25 % =      500 | time elapsed:             5sec 634msec | avg time per step:  13msec
 30 % =      600 | time elapsed:             6sec 762msec | avg time per step:  11msec
 35 % =      700 | time elapsed:             7sec 860msec | avg time per step:  10msec
 40 % =      800 | time elapsed:             8sec 936msec | avg time per step:  10msec
 45 % =      900 | time elapsed:            10sec   9msec | avg time per step:  10msec
 50 % =     1000 | time elapsed:            11sec  78msec | avg time per step:  10msec
 55 % =     1100 | time elapsed:            12sec 142msec | avg time per step:  10msec
 60 % =     1200 | time elapsed:            13sec 208msec | avg time per step:  10msec
 65 % =     1300 | time elapsed:            14sec 282msec | avg time per step:  10msec
 70 % =     1400 | time elapsed:            15sec 365msec | avg time per step:  10msec
 75 % =     1500 | time elapsed:            16sec 451msec | avg time per step:  10msec
 80 % =     1600 | time elapsed:            17sec 536msec | avg time per step:  10msec
 85 % =     1700 | time elapsed:            18sec 624msec | avg time per step:  10msec
 90 % =     1800 | time elapsed:            19sec 711msec | avg time per step:  10msec
 95 % =     1900 | time elapsed:            20sec 798msec | avg time per step:  10msec
100 % =     2000 | time elapsed:            21sec 890msec | avg time per step:  10msec
calculation  simulation time: 21sec 891msec = 21 sec
[kepler003:06823] *** Process received signal ***
[kepler003:06823] Signal: Segmentation fault (11)
[kepler003:06823] Signal code: Address not mapped (1)
[kepler003:06823] Failing at address: 0x35
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 6823 on node kepler003 exited on signal 11 (Segmentation fault).

As can be seen: the runtime test fails at the end of the simulation.

CUDA-MEMCHECK

~/paramSets/089_Issue1953BugOutOfMemoryWithIonizationADK/bin$ cuda-memcheck picongpu -d 1 1 1 -g 64 64 64 -s 10 
========= CUDA-MEMCHECK
========= Program hit cudaErrorSetOnActiveProcess (error 36) due to "cannot set while device is active in this process" on CUDA API call to cudaSetDeviceFlags. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2eea03]
=========     Host Frame:picongpu [0x661350]
=========     Host Frame:picongpu (_ZN5PMacc6detail18EnvironmentContext9setDeviceEi + 0x1fd) [0x4b3aad]
=========     Host Frame:picongpu (_ZN5PMacc11EnvironmentILj3EE11initDevicesENS_9DataSpaceILj3EEES3_ + 0x9c) [0x59f16c]
=========     Host Frame:picongpu (_ZN8picongpu12MySimulation10pluginLoadEv + 0x1bc) [0x5a571c]
=========     Host Frame:picongpu (_ZN8picongpu17SimulationStarterINS_21InitialiserControllerENS_16PluginControllerENS_12MySimulationEE10pluginLoadEv + 0x2b) [0x4d8f0b]
=========     Host Frame:picongpu (main + 0x8c) [0x496f0c]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf5) [0x21f45]
=========     Host Frame:picongpu [0x4972af]
=========
========= Program hit cudaErrorSetOnActiveProcess (error 36) due to "cannot set while device is active in this process" on CUDA API call to cudaGetLastError. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2eea03]
=========     Host Frame:picongpu [0x651443]
=========     Host Frame:picongpu (_ZN5PMacc6detail18EnvironmentContext9setDeviceEi + 0x267) [0x4b3b17]
=========     Host Frame:picongpu (_ZN5PMacc11EnvironmentILj3EE11initDevicesENS_9DataSpaceILj3EEES3_ + 0x9c) [0x59f16c]
=========     Host Frame:picongpu (_ZN8picongpu12MySimulation10pluginLoadEv + 0x1bc) [0x5a571c]
=========     Host Frame:picongpu (_ZN8picongpu17SimulationStarterINS_21InitialiserControllerENS_16PluginControllerENS_12MySimulationEE10pluginLoadEv + 0x2b) [0x4d8f0b]
=========     Host Frame:picongpu (main + 0x8c) [0x496f0c]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf5) [0x21f45]
=========     Host Frame:picongpu [0x4972af]
=========
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per gpu: 1048576
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 59sec 543msec = 59 sec
  0 % =        0 | time elapsed:                    0msec | avg time per step:   0msec
 10 % =        1 | time elapsed:                  845msec | avg time per step: 845msec
 20 % =        2 | time elapsed:             1sec 739msec | avg time per step: 894msec
 30 % =        3 | time elapsed:             2sec 634msec | avg time per step: 894msec
 40 % =        4 | time elapsed:             3sec 531msec | avg time per step: 896msec
 50 % =        5 | time elapsed:             4sec 430msec | avg time per step: 898msec
 60 % =        6 | time elapsed:             5sec 326msec | avg time per step: 896msec
 70 % =        7 | time elapsed:             6sec 223msec | avg time per step: 896msec
 80 % =        8 | time elapsed:             7sec 120msec | avg time per step: 896msec
 90 % =        9 | time elapsed:             8sec  17msec | avg time per step: 896msec
100 % =       10 | time elapsed:             8sec 914msec | avg time per step: 897msec
calculation  simulation time:  8sec 935msec = 8 sec
[kepler003:09356] *** Process received signal ***
[kepler003:09356] Signal: Segmentation fault (11)
[kepler003:09356] Signal code: Address not mapped (1)
[kepler003:09356] Failing at address: 0x35
========= Error: process didn't terminate successfully
=========        The application may have hit an error when dereferencing Unified Memory from the host. Please rerun the application under cuda-gdb or Nsight Eclipse Edition to catch host side errors.
========= Internal error (20)
========= No CUDA-MEMCHECK results found

@psychocoderHPC
Member Author

@n01r Could you please post the error message from stderr?

@psychocoderHPC
Member Author

@n01r Have you removed your build folder to start with a clean project?

@n01r
Member

n01r commented Apr 12, 2017

@psychocoderHPC I ran picongpu from terminal, so I posted the output from there.
Yes, I removed all contents from the build directory and even created a new parameter set.

@psychocoderHPC
Member Author

psychocoderHPC commented Apr 12, 2017

I tried test case 10 from the LWFA example, but compiled for sm_20, with the current pull request on k20 and k80.

cmake -DCUDA_ARCH=20 -DPARAM_OVERWRITES:LIST="-DPARAM_IONS=1;-DPARAM_IONIZATION=1" -DCMAKE_INSTALL_PREFIX=lwfaCrash/ -DPIC_EXTENSION_PATH=lwfaCrash/   ../dev

Environment

Currently Loaded Modulefiles:
  1) gcc/5.3.0                     4) openmpi/1.8.6.kepler.cuda80   7) numactl/2.0.7
  2) cmake/3.3.0                   5) boost/1.62.0                  8) valgrind/3.8.1
  3) cuda/8.0                      6) hdf5-parallel/1.8.15

Run

widera@kepler004:/bigdata/hplsim/scratch/widera/testPic/build_picongpu$ mpiexec -n 1 picongpu -d 1 1 1 -g 64 64 64 -s 2000 --e_macroParticlesCount.period 50 --i_macroParticlesCount.period 50
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF
PIConGPUVerbose PHYSICS(1) | Courant c*dt <= 1.00229 ? 1
PIConGPUVerbose PHYSICS(1) | species e: omega_p * dt <= 0.1 ? 0.0247974
PIConGPUVerbose PHYSICS(1) | species i: omega_p * dt <= 0.1 ? 0.000578698
PIConGPUVerbose PHYSICS(1) | y-cells per wavelength: 18.0587
PIConGPUVerbose PHYSICS(1) | macro particles per gpu: 1048576
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 6955.06
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 1.39e-16
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 4.16712e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 6.33563e-27
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 1.11432e-15
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 1.22627e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 40903.8
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 5.69418e-10
initialization time: 55sec 865msec = 55 sec
  0 % =        0 | time elapsed:                    1msec | avg time per step:   0msec
  5 % =      100 | time elapsed:             1sec 293msec | avg time per step:  12msec
 10 % =      200 | time elapsed:             2sec 796msec | avg time per step:  15msec
...
 95 % =     1900 | time elapsed:            34sec 475msec | avg time per step:  17msec
100 % =     2000 | time elapsed:            36sec 280msec | avg time per step:  18msec
calculation  simulation time: 36sec 281msec = 36 sec
full simulation time:  1min 32sec 214msec = 92 sec

No errors visible.

@n01r After you started PIConGPU in the interactive shell, did it take ~45 seconds or more before the first output from the simulation was shown on the terminal? If not, then something is wrong with your setup.
The long pause after the start shows that the NVIDIA CUDA just-in-time compiler is translating the sm_20 PTX code to sm_3X (the architecture currently in use).
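
The reasoning above can be sketched as a simplified model of how a CUDA binary is matched to a device. This is an illustration under stated assumptions, not the driver's real implementation: `EmbeddedCode`, `LoadPath` and `selectLoadPath` are hypothetical names. The model only encodes two general rules: machine code runs on devices of the same major version with an equal or higher minor revision, and PTX can be JIT-compiled for any device of equal or higher compute capability.

```cpp
#include <cassert>
#include <vector>

// Compute capabilities are written without the dot: 20 for sm_20, 35 for sm_35.
// Hypothetical description of what a binary embeds; not a CUDA data structure.
struct EmbeddedCode {
    std::vector<int> sass; // real architectures with machine code (code=sm_XY)
    std::vector<int> ptx;  // virtual architectures with PTX (code=compute_XY)
};

enum class LoadPath { NativeSass, JitFromPtx, NoKernelImage };

// Simplified model of which code a device with capability `device` can use.
LoadPath selectLoadPath(const EmbeddedCode& bin, int device) {
    assert(device >= 10);
    // Machine code is usable within the same major version if it was built
    // for the same or a lower minor revision.
    for (int arch : bin.sass) {
        if (arch / 10 == device / 10 && arch % 10 <= device % 10) {
            return LoadPath::NativeSass;
        }
    }
    // PTX can be JIT-compiled for any device of equal or higher capability;
    // this is the step that causes the long pause at start-up.
    for (int arch : bin.ptx) {
        if (arch <= device) {
            return LoadPath::JitFromPtx;
        }
    }
    // Nothing usable: kernel launches fail on this device.
    return LoadPath::NoKernelImage;
}
```

In this model, a binary with sm_20 machine code and compute_20 PTX yields `JitFromPtx` on an sm_35 device such as the k20, which matches the expected start-up delay. The same binary without PTX yields `NoKernelImage`, so a start without any delay on sm_35 suggests that the PTX is missing from the build.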

@psychocoderHPC
Member Author

@n01r I have not enabled ADK for my test. Which test could I use to enable ADK?

If I test my current setup with dev, I can reproduce the error what(): [CUDA] Error: invalid device function from here

@ax3l ax3l added the labels bug (a bug in the project's code) and component: core (in PIConGPU (core application)) Apr 12, 2017
@ax3l ax3l added this to the Next Stable: 0.3.0 milestone Apr 12, 2017
@ax3l
Member

ax3l commented Apr 12, 2017

configure for -t 10 in LWFA

@psychocoderHPC
Member Author

@ax3l Test case 10 is BSI

@psychocoderHPC
Member Author

ok found it: I need to change speciesDefinition.hpp

particles::ionization::BSIEffectiveZ< PIC_Electrons >

// to

particles::ionization::AlgorithmADK< PIC_Electrons >

@n01r
Member

n01r commented Apr 12, 2017

@ax3l Test case 10 is BSI

@psychocoderHPC That is true but I changed the
particles::ionization::BSIEffectiveZ< PIC_Electrons >
manually to
particles::ionization::ADKLinPol< PIC_Electrons >

and in cmakeFlags I changed -DCUDA_ARCH=35 to -DCUDA_ARCH=20, and then I configured with -t 10, the test case where I made that change.

@ax3l
Member

ax3l commented Apr 12, 2017

If #1953 is not fixed with this PR, let's move the discussion about the ADK/RNG cleanup problem back to its issue, because these are then definitely two independent problems.

@ax3l ax3l merged commit 53fc080 into ComputationalRadiationPhysics:dev Apr 12, 2017
@psychocoderHPC psychocoderHPC deleted the fix-cmakeMissingPTXInSource branch April 12, 2017 12:50