Call for Review: DASH Tutorial at HIPEAC Stockholm #1
Comments
@knuedd
To answer your questions about the implementation of the tutorial application (also added as comments in your code):
You are referring to my implementation of the histogram benchmark (bench.04.histo-tf). This variant is recommended for clusters. On a single node (< 48 cores) and for smaller ranges, the variant in
Another feature you could demonstrate is how threaded parallelism is automatically balanced for hardware capacities:
dash::util::UnitLocality uloc;
// number of threads available to this unit
auto n_threads = uloc.num_domain_threads();
#pragma omp parallel num_threads(n_threads)
{
  // For example, on a 32-core system:
  // - running with mpirun -n 8 -> all units run 4 threads
  // - running with mpirun -n 5 -> two units run 7 threads, three units run 6 threads
}
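The arithmetic behind the two examples in the comment can be sketched in plain C++, without DASH. This is only an illustration under assumptions: `threads_per_unit` is a hypothetical helper, not a DASH function, and giving the spare cores to the first units is an arbitrary choice here. Which units actually get the extra thread depends on the hardware locality that DASH detects.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of DASH: splits n_cores as evenly as
// possible among n_units and returns the thread count for each unit.
std::vector<int> threads_per_unit(int n_cores, int n_units) {
  assert(n_cores >= 0 && n_units > 0);
  const int base  = n_cores / n_units;  // threads every unit gets
  const int extra = n_cores % n_units;  // cores left over after an even split
  std::vector<int> threads(n_units, base);
  // Assumption: the first `extra` units receive one additional thread each.
  for (int u = 0; u < extra; ++u) {
    ++threads[u];
  }
  return threads;
}
```

Calling `threads_per_unit(32, 8)` gives 4 threads for every unit, and `threads_per_unit(32, 5)` gives two units with 7 threads and three units with 6, matching the numbers in the comment.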
Adapted program to recent changes in DASH-0.3.0
Hi all, I need some help again. Please look at the example dash-apps/dash_tutorial_2017-01/05-astro_count_objects/astro-benchmark.cpp, which is the tutorial code with some debug and benchmark output. I see bad bugs and might have done things totally wrong.
I see strange results when running it with different numbers of units. I already changed dash::Matrix to dash::NArray, so now the local extents are correct ... wait, I just realized one bug in the local pointer arithmetic. Well, ignore everything after /* *** part 5 ... */ for now.
But please look at /* *** part 4 ... */. There, every unit iterates over its local block with the local iterator and computes the sum of all pixel brightnesses. Do you have any idea why the results could be so strange?
Thanks, Andreas
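One way to narrow this down is to check the invariant that part 4 presumably relies on: the per-unit brightness sums must add up to the same total no matter how many units the image is split across. The following is a minimal plain C++ sketch of that check, without DASH. It assumes a row-major image that is blocked in both dimensions, and the names `local_brightness_sums` and `total_brightness` are hypothetical, not taken from the tutorial code.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: sums the pixels of each unit's block for a row-major
// image that is split into team_rows x team_cols blocks.
std::vector<long long> local_brightness_sums(
    const std::vector<int>& image, std::size_t rows, std::size_t cols,
    std::size_t team_rows, std::size_t team_cols) {
  assert(image.size() == rows * cols);
  assert(team_rows > 0 && team_cols > 0);
  // Block extents, rounded up so that the blocks cover the whole image.
  const std::size_t block_rows = (rows + team_rows - 1) / team_rows;
  const std::size_t block_cols = (cols + team_cols - 1) / team_cols;
  std::vector<long long> sums(team_rows * team_cols, 0);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      // Index of the unit that owns pixel (r, c) in this blocked layout.
      const std::size_t unit = (r / block_rows) * team_cols + (c / block_cols);
      sums[unit] += image[r * cols + c];
    }
  }
  return sums;
}

// Adds up the per-unit sums, as a global reduction would.
long long total_brightness(const std::vector<long long>& sums) {
  return std::accumulate(sums.begin(), sums.end(), 0LL);
}
```

If the total from a 1x1 run differs from the total of a 2x2 or 3x3 run on the same input, the problem is in how the data reaches the units rather than in the local loop itself.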
If you run the example with 9 units (3x3), the result is even worse. The reason is dash::copy: the local-to-global copy doesn't work well for more than one remote unit. I implemented the setup for the matrix without dash::copy and everything worked as expected. Maybe Tobias can explain why dash::copy behaves this way.
I'm on it. I suspect the recent refactoring runs have broken the semantics.
Dear DASH crew,
if you like, have a look at the main example for the upcoming DASH tutorial in Stockholm in January.
SPOILER ALERT: If you are a tutorial attendee, read further at your own risk. You will miss half the fun ;)
Please look at https://github.com/dash-project/dash-apps/tree/master/dash_tutorial_2017-01 --> 05-astro_count_objects/. The 02_ to 04_ examples are the same thing but with the later steps missing. The 01_intro/ is tbd. Read the README.md first for the dependencies.
The astro-assignment.cpp is what we should give to the attendees together with some explanations. The astro-solution.cpp is ... well ... the solution. Use 'make diff' to see the difference between the two.
First of all, let's discuss what you like and what you don't like. Keep in mind that we only have 3 hours and want to bring the key idea across. I also plan to have the example implemented with MPI RMA directly to demonstrate how many lines of code we save.
With this code I also discovered a few issues that I'll report under https://github.com/dash-project/dash.
Cheers, Andreas