-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix MPSNDArrayDescriptor
wrapper
#502
Conversation
MPSNDArrayDescriptor
wrapper
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #502 +/- ##
==========================================
+ Coverage 71.04% 76.04% +5.00%
==========================================
Files 36 57 +21
Lines 1143 2793 +1650
==========================================
+ Hits 812 2124 +1312
- Misses 331 669 +338 ☔ View full report in Codecov by Sentry. |
65077c1
to
8654f9f
Compare
@maleadt It seems like PR benchmark jobs are not commenting and are being uploaded because Metal.jl/.github/workflows/Benchmark.yml Line 45 in 5a8daaa
Metal.jl/.github/workflows/Benchmark.yml Line 49 in 5a8daaa
pull_request to pull_request_target .
Should I add it to this PR or would it be better for you to commit the fix directly? |
Ah oops. Feel free to commit directly on master, because the modified We should probably also edit out or revert the upload on the |
Done. I assume the benchmark page will update next time a benchmark action runs?
I don't think I have permission to commit directly to main. I tried and got the following error:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metal Benchmarks
Benchmark suite | Current: 2b16b45 | Previous: 5a8daaa | Ratio |
---|---|---|---|
private array/construct |
26680.5 ns |
26673.5 ns |
1.00 |
private array/broadcast |
462104.5 ns |
458208 ns |
1.01 |
private array/random/randn/Float32 |
745125 ns |
852625 ns |
0.87 |
private array/random/randn!/Float32 |
631375 ns |
658334 ns |
0.96 |
private array/random/rand!/Int64 |
538625 ns |
555625 ns |
0.97 |
private array/random/rand!/Float32 |
577333 ns |
589208 ns |
0.98 |
private array/random/rand/Int64 |
772188 ns |
802125 ns |
0.96 |
private array/random/rand/Float32 |
621291.5 ns |
595625 ns |
1.04 |
private array/copyto!/gpu_to_gpu |
643020.5 ns |
679834 ns |
0.95 |
private array/copyto!/cpu_to_gpu |
604041.5 ns |
666958.5 ns |
0.91 |
private array/copyto!/gpu_to_cpu |
819812 ns |
518021 ns |
1.58 |
private array/accumulate/1d |
1316917 ns |
1339437.5 ns |
0.98 |
private array/accumulate/2d |
1371666 ns |
1374708 ns |
1.00 |
private array/iteration/findall/int |
2087834 ns |
2065062.5 ns |
1.01 |
private array/iteration/findall/bool |
1799333.5 ns |
1816354 ns |
0.99 |
private array/iteration/findfirst/int |
1689500 ns |
1690417 ns |
1.00 |
private array/iteration/findfirst/bool |
1664583 ns |
1658583 ns |
1.00 |
private array/iteration/scalar |
3601875 ns |
3599687.5 ns |
1.00 |
private array/iteration/logical |
3197625 ns |
3169854 ns |
1.01 |
private array/iteration/findmin/1d |
1753916 ns |
1746000 ns |
1.00 |
private array/iteration/findmin/2d |
1340791 ns |
1351313 ns |
0.99 |
private array/reductions/reduce/1d |
1046833 ns |
1041021 ns |
1.01 |
private array/reductions/reduce/2d |
655875 ns |
655041 ns |
1.00 |
private array/reductions/mapreduce/1d |
1035812.5 ns |
1040833 ns |
1.00 |
private array/reductions/mapreduce/2d |
660417 ns |
660979.5 ns |
1.00 |
private array/permutedims/4d |
2540625 ns |
2523625 ns |
1.01 |
private array/permutedims/2d |
1017917 ns |
1020958 ns |
1.00 |
private array/permutedims/3d |
1584167 ns |
1584104.5 ns |
1.00 |
private array/copy |
560166 ns |
580395.5 ns |
0.97 |
latency/precompile |
5263931459 ns |
5246987875 ns |
1.00 |
latency/ttfp |
6619689354 ns |
6666328874.5 ns |
0.99 |
latency/import |
1164853499.5 ns |
1165891041.5 ns |
1.00 |
integration/metaldevrt |
684146 ns |
718542 ns |
0.95 |
integration/byval/slices=1 |
1520958 ns |
1537583 ns |
0.99 |
integration/byval/slices=3 |
10693916.5 ns |
8342167 ns |
1.28 |
integration/byval/reference |
1622854.5 ns |
1552229 ns |
1.05 |
integration/byval/slices=2 |
2617791.5 ns |
2618584 ns |
1.00 |
kernel/indexing |
443583 ns |
458000 ns |
0.97 |
kernel/indexing_checked |
442916.5 ns |
473584 ns |
0.94 |
kernel/launch |
9513.833333333332 ns |
10069.5 ns |
0.94 |
metal/synchronization/stream |
14417 ns |
15479.5 ns |
0.93 |
metal/synchronization/context |
14833 ns |
15084 ns |
0.98 |
shared array/construct |
27600.75 ns |
26243.083333333336 ns |
1.05 |
shared array/broadcast |
448500 ns |
473062.5 ns |
0.95 |
shared array/random/randn/Float32 |
615624.5 ns |
806375 ns |
0.76 |
shared array/random/randn!/Float32 |
459250 ns |
662875 ns |
0.69 |
shared array/random/rand!/Int64 |
370083 ns |
555333 ns |
0.67 |
shared array/random/rand!/Float32 |
412667 ns |
589000 ns |
0.70 |
shared array/random/rand/Int64 |
670812.5 ns |
737791.5 ns |
0.91 |
shared array/random/rand/Float32 |
558104 ns |
636875 ns |
0.88 |
shared array/copyto!/gpu_to_gpu |
87708 ns |
89084 ns |
0.98 |
shared array/copyto!/cpu_to_gpu |
87417 ns |
84916 ns |
1.03 |
shared array/copyto!/gpu_to_cpu |
83375 ns |
77708 ns |
1.07 |
shared array/accumulate/1d |
1385083.5 ns |
1339729 ns |
1.03 |
shared array/accumulate/2d |
1386625 ns |
1387291 ns |
1.00 |
shared array/iteration/findall/int |
1960958 ns |
1813291 ns |
1.08 |
shared array/iteration/findall/bool |
1680125 ns |
1572375 ns |
1.07 |
shared array/iteration/findfirst/int |
1405875 ns |
1385750 ns |
1.01 |
shared array/iteration/findfirst/bool |
1377145.5 ns |
1363396 ns |
1.01 |
shared array/iteration/scalar |
157167 ns |
158084 ns |
0.99 |
shared array/iteration/logical |
2981958 ns |
2988208 ns |
1.00 |
shared array/iteration/findmin/1d |
1450020.5 ns |
1467875 ns |
0.99 |
shared array/iteration/findmin/2d |
1361562.5 ns |
1368229 ns |
1.00 |
shared array/reductions/reduce/1d |
724500 ns |
731459 ns |
0.99 |
shared array/reductions/reduce/2d |
654771 ns |
669917 ns |
0.98 |
shared array/reductions/mapreduce/1d |
731875 ns |
733042 ns |
1.00 |
shared array/reductions/mapreduce/2d |
654167 ns |
671958 ns |
0.97 |
shared array/permutedims/4d |
2332875 ns |
2426458 ns |
0.96 |
shared array/permutedims/2d |
1014874.5 ns |
1015708 ns |
1.00 |
shared array/permutedims/3d |
1573062.5 ns |
1591625 ns |
0.99 |
shared array/copy |
223667 ns |
250687 ns |
0.89 |
This comment was automatically generated by workflow using github-action-benchmark.
I realized automatically reversing dimensions in the descriptor as part of the low-level wrappers is a terrible idea.
Also a few more improvements
Also using this as an opportunity to test the [skip benchmark] and benchmarking from a fork. Edit: It works