High latencies #18
Comments
Just FYI: the same experiment with a discrete NVIDIA card connected via PCIe takes 2 microseconds. |
Update: |
Hi, |
Hi, |
Hi, On Thursday, February 4, 2016, Harald Lang [email protected] wrote:
Regards, Aditya Atluri, USA. |
The ROC driver only runs with a Haswell CPU and a FIJI-based GPU. I am having the team look into this to see if it is a regression, but it would help to understand which NVIDIA GPU (model number needed) you are comparing the APU to. Which APU is it, and what is its model number? greg |
Hi Greg, How about disabling integrated graphics on APU? —
Regards, Aditya Atluri, USA. |
Are you trying to run a FIJI card on an APU? We are only testing the FIJI card with Xeon E5 v3, Xeon E3, i7, and i5 (Haswell or newer), since the ROC driver and runtime need PCIe Gen 3 platform atomics. greg |
Hi Harald, Regards, Aditya Atluri, USA. |
Hi Greg, |
Hi Gregory, Alternatively, I can plug a GTX 970 into the APU system and re-run the measurements... @adityaatluri I'm going to profile the system as requested. I'll post the results ASAP. |
Hi Harald, |
Can I get the test you're running? I can do an A/B test on the same hardware with Fiji vs. Titan X. |
Hi Aditya, |
Hi Harald, |
Hi Aditya and Gregory, I pushed the code to Quickstart instructions:
The output is a little verbose. The important lines start with The dispatch functions can be found in |
Update: |
Hi Harald, |
Hi Aditya, the vector_copy sample runs without errors.
On the Godavari APU, the output looks exactly the same. The output of the profiler can be found here: https://gist.github.com/harald-lang/b132a4df7863ad4523f2 ... by the way... thank you very much for your help! :) |
Hi Aditya, I profiled the vector_copy as you suggested. Please refer to https://gist.github.com/harald-lang/b132a4df7863ad4523f2 |
Hi, |
Hi Aditya, I updated the profile at https://gist.github.com/harald-lang/b132a4df7863ad4523f2#gistcomment-1694953 Unfortunately, the results are approx. the same. |
Hi Harald, |
Hi Harald, Also, here are the numbers we ran on Titan and APU. |
I noticed very high latencies for kernel dispatches using AQL. Synchronous dispatches take up to 21 µs. Asynchronous (batch) dispatches help to hide latencies. However, kernel dispatching still takes 6 µs on average, which is still far too slow for fine-grained offloading.
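To make the two measurement modes concrete, here is a minimal timing sketch. It is not the benchmark used for the numbers in this issue and it makes no HSA runtime calls: `fake_dispatch` and `fake_wait` are hypothetical stand-ins for "submit one AQL packet" and "wait on the completion signal(s)". It only shows the difference between timing each submit-and-wait round trip and amortizing one wait over a whole batch.

```python
import time
from statistics import mean


def fake_dispatch():
    """Hypothetical stand-in for submitting one AQL packet (no GPU work)."""
    return time.perf_counter_ns()


def fake_wait(handles):
    """Hypothetical stand-in for waiting on the completion signal(s)."""
    return len(handles)


def time_synchronous(n, dispatch=fake_dispatch, wait=fake_wait):
    """Submit and wait for each dispatch separately.

    Returns a list with one latency per dispatch, in microseconds.
    """
    latencies = []
    for _ in range(n):
        start = time.perf_counter_ns()
        handle = dispatch()
        wait([handle])
        latencies.append((time.perf_counter_ns() - start) / 1000.0)
    return latencies


def time_batched(n, dispatch=fake_dispatch, wait=fake_wait):
    """Submit n dispatches, then wait once for all of them.

    Returns the amortized time per dispatch, in microseconds.
    """
    start = time.perf_counter_ns()
    handles = [dispatch() for _ in range(n)]
    wait(handles)
    return (time.perf_counter_ns() - start) / 1000.0 / n


if __name__ == "__main__":
    sync = time_synchronous(1000)
    print(f"synchronous: max {max(sync):.2f} us, mean {mean(sync):.2f} us")
    print(f"batched: {time_batched(1000):.2f} us per dispatch (amortized)")
```

With the stand-ins, the printed values only reflect Python call overhead. To compare against the figures above, the two callables would have to be replaced by real bindings to the dispatch and wait functions of the benchmark.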
In my experiments I set HSA_ENABLE_INTERRUPT to 0, which greatly improves robustness of the kernel offload times. With interrupts enabled, latencies vary from 6 to 15 microseconds.
System setup: