Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call for Review: DASH Tutorial at HIPEAC Stockholm #1

Open
knuedd opened this issue Nov 15, 2016 · 6 comments
Open

Call for Review: DASH Tutorial at HIPEAC Stockholm #1

knuedd opened this issue Nov 15, 2016 · 6 comments

Comments

@knuedd
Copy link
Contributor

knuedd commented Nov 15, 2016

Dear DASH crew,

if you like have a look at the main example for the upcoming DASH tutorial in Stockholm in January.

SPOILER ALERT: If you are a tutorial attendee, read further on your own risk. You will miss half the fun ;)

Please look at https://github.com/dash-project/dash-apps/tree/master/dash_tutorial_2017-01 --> 05-astro_count_objects/. The 02_ to 04_ examples are the same thing but with the later steps missing. The 01_intro/ is tbd. Read the README.md first for the dependencies.

The astro-assignment.cpp is what we should give to the attendees together with some explanations. The astro-solution.cpp is ... well ... the solution. Use 'make diff' to see the difference between the two.

First of all, let's discuss what you like and what you don't like. Keep in mind that we only have 3 hours and want to bring the key idea across. I also plan to have the example implemented with MPI RMA directly to demonstrate how many lines of code we save.

With this code I also discovered a few issues, that I'll report under https://github.com/dash-project/dash.

Cheers, Andreas

@fuchsto
Copy link
Member

fuchsto commented Nov 15, 2016

@knuedd
Yes, please report issues as soon as possible, we will give them highest priority!
Also, I like that you give Hubble some love.

@fuchsto fuchsto changed the title Please Call for Review: DASH Tutorial at HIPEAC Stockholm Nov 15, 2016
@fuchsto
Copy link
Member

fuchsto commented Nov 15, 2016

To answer your questions from the implementation of the tutorial application

  1. The barrier after computation of local histograms is not needed.
    dash::transform ist a one-sided, non-collective operation and does not require any synchronization.
    Values in local histograms don't depend on other units' results so there is no need to synchronize
    local completion.
    Every unit just adds its local result onto the master unit's histogram whenever it is ready.
    We just have to synchronize units for the overall completion of the global histogram.
  2. The transformation function passed to dash::transform is executed as atomic operation on single
    elements in the range. Eventually, dash::transform corresponds to
    MPI_Accumulate + MPI_Win_flush (blocking until target signals remote completion)
    so consistency and behavior corresponds to the MPI specs:
    http://www.mpich.org/static/docs/v3.1/www3/MPI_Accumulate.html
    http://www.mpich.org/static/docs/v3.1/www3/MPI_Win_flush.html

Also added as comments in your code:

https://github.com/dash-project/dash-apps/blob/master/dash_tutorial_2017-01/04-astro_marker_color/astro.cpp#L284

https://github.com/dash-project/dash-apps/blob/master/dash_tutorial_2017-01/04-astro_marker_color/astro.cpp#L300

You are referring to my implementation of the histogram benchmark (bench.04.histo-tf):
https://github.com/dash-project/dash/blob/development/dash/examples/bench.04.histo-tf/main.cpp

This variant is recommended for clusters. On a single node (< 48 cores) and for smaller ranges, the variant in bench.04.histo is slightly more efficient but has terrible scaling.
Might be interesting to compare them as a motivation to use DASH algorithms whenever possible.

@fuchsto
Copy link
Member

fuchsto commented Nov 15, 2016

Another feature you could demonstrate is how threaded parallelism is automatically balanced for hardware capacities.
So if there is some time left on the schedule, integrating OpenMP for any process configuration is
pretty simple:

dash::util::UnitLocality uloc;
// number of threads available to this unit
auto n_threads = uloc.num_domain_threads();
#pragma omp parallel num_threads(n_threads)
{
   // For example, on a 32-core system:
   // - running with mpirun -n 8 -> all units run 4 threads
   // - running with mpirun -n 5 -> two units run 7 threads, three units run 6 threads
}

fmoessbauer added a commit that referenced this issue Nov 19, 2016
Adapted program to recent changes in DASH-0.3.0
@knuedd
Copy link
Contributor Author

knuedd commented Jan 10, 2017

Hi all,

I need some help again. Please look at the example dash-apps/dash_tutorial_2017-01/05-astro_count_objects/astro-benchmark.cpp which is the tutorial code with some debug and benchmark output.

I see bad bugs and might have done things totally wrong. I see strange results when running it with different numbers of units.

I already changed dash::Matrix to dash::NArray, now the local extents are correct ... wait, I just realized one bug when doing the local pointer arithmetic. Well, ignore everything after /* *** part 5 ... */ for now.

But look at /* *** part 4 ... */ please. There every unit iterates over its local block with the local iterator and computes the sum of all pixel brightnesses.
When you run it with 3 units, no block is underfilled and everything is fine. But with 4 units that are arranged as 2x2 units, then 3 out of 4 blocks are underfilled. The last unit (unit 3) produces the sum of all pixel brightnesses as 0! This means all the pixel values read from the image did not make it to that unit. It is always only the very last unit of you use any number of units that leads to an actual nxm arrangement of units with n,m>1.

Do you have any idea why this could be?

Thanks, Andreas

@dhinf
Copy link
Member

dhinf commented Jan 10, 2017

If you run the example with 9 units -> 3x3 the result is even worse. The reason is dash::copy. The local to global copy doesn't work well for more than one remote unit. I implemented the setup for the matrix without dash::copy and everything worked as expected. Maybe Tobias can explain why dash::copy behaves this way.

@fuchsto
Copy link
Member

fuchsto commented Jan 11, 2017

I'm on it, I suspect the recent refactoring runs to have harmed semantics.

pascalj pushed a commit that referenced this issue Sep 5, 2019
Add Kokkos regions to MiniMD for profiling.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment