1a0c48a
@JuliaRegistrator register
Registration pull request created: JuliaRegistries/General/117857
Tip: Release Notes
Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.
To add them here just re-invoke and the PR will be updated.
Tagging
After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.
This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:
Lux Benchmarks
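Each entry in this report lists three figures: two timings in nanoseconds followed by a ratio. The ratio appears to be the first timing divided by the second (for example, 716209 / 72062.5 ≈ 9.94), so values above 1 indicate that the first run was slower. The following is a minimal Python sketch of that arithmetic, assuming the first timing is the current run and the second is the previous one. The 1.5 cut-off and the helper names are hypothetical and are not taken from the benchmark configuration.

```python
# Rows copied from this report: (benchmark name, current ns, previous ns).
# Assumption: the first timing is the current run, the second the previous run.
ROWS = [
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s)", 716209.0, 72062.5),
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s)", 72208.0, 246375.0),
    ("Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA", 43839.0, 43880.0),
]

# Hypothetical cut-off chosen for illustration only.
THRESHOLD = 1.5


def ratio(current_ns, previous_ns):
    """Return current / previous; values above 1 mean the current run was slower."""
    return current_ns / previous_ns


def classify(current_ns, previous_ns, threshold=THRESHOLD):
    """Label a benchmark as slower, faster, or unchanged relative to the threshold."""
    r = ratio(current_ns, previous_ns)
    if r > threshold:
        return "slower"
    if r < 1.0 / threshold:
        return "faster"
    return "unchanged"


def summarize(rows):
    """Return (name, ratio rounded to two decimals, label) for each row."""
    return [(name, round(ratio(cur, prev), 2), classify(cur, prev))
            for name, cur, prev in rows]


if __name__ == "__main__":
    for name, r, label in summarize(ROWS):
        print(f"{r:6.2f}  {label:9}  {name}")
```

With a symmetric threshold, a ratio of 0.29 is flagged as a speed-up in the same way that 9.94 is flagged as a slowdown, which makes the large swings in the 1-thread and 4-thread rows easy to pick out.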
| Benchmark suite | Current | Previous | Ratio |
|-|-|-|-|
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 72166 ns | 70812.5 ns | 1.02 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 72208 ns | 246375 ns | 0.29 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 73792 ns | 73979.5 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 716209 ns | 72062.5 ns | 9.94 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 43839 ns | 43880 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 235500 ns | 308958.5 ns | 0.76 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 292209 ns | 793604 ns | 0.37 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 233771 ns | 233375 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2230250 ns | 282166 ns | 7.90 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 190422 ns | 190452 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 417042 ns | 415000 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 391834 ns | 849833 ns | 0.46 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 406666 ns | 415875 ns | 0.98 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2236292 ns | 300084 ns | 7.45 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1617833 ns | 1656000 ns | 0.98 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1192083 ns | 1060416 ns | 1.12 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1372417 ns | 1381084 ns | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3021645.5 ns | 2397416 ns | 1.26 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 212820 ns | 209181.5 ns | 1.02 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12269375 ns | 12352250 ns | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9554312.5 ns | 8849958.5 ns | 1.08 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9312229 ns | 9323292 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18597917 ns | 18049625 ns | 1.03 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1906055 ns | 1913367.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17342416.5 ns | 17385417 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14370124.5 ns | 13997625 ns | 1.03 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14320000 ns | 14382770.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21879042 ns | 21102542 ns | 1.04 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 119514416 ns | 122037708.5 ns | 0.98 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 174474958 ns | 148978417 ns | 1.17 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 148469291 ns | 147416375 ns | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 448126458.5 ns | 103471542 ns | 4.33 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5495054 ns | 5473319 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 581881396 ns | 584214291.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 535710708 ns | 924299000 ns | 0.58 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 443551604 ns | 438064084 ns | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1628778458 ns | 630297125 ns | 2.58 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 34985718 ns | 38161414 ns | 0.92 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 743015729 ns | 699393958.5 ns | 1.06 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 671791166 ns | 1004148063 ns | 0.67 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 629284000 ns | 601864833.5 ns | 1.05 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1731283646 ns | 740760041 ns | 2.34 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 860083.5 ns | 878333 ns | 0.98 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 826167 ns | 1659792 ns | 0.50 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 1216291.5 ns | 1217875 ns | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 790000 ns | 959125 ns | 0.82 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 263585.5 ns | 265941.5 ns | 0.99 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2689458 ns | 2700312.5 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 2459645.5 ns | 4392063 ns | 0.56 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 3304937 ns | 3302937.5 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3169583 ns | 3374167 ns | 0.94 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1050300 ns | 1052375 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 6771791 ns | 6807874.5 ns | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 6379542 ns | 6268583 ns | 1.02 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 6492417 ns | 6489125 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 8179208 ns | 7605063 ns | 1.08 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 211697 ns | 209645 ns | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 24030062 ns | 24577000 ns | 0.98 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 21863145.5 ns | 21078000 ns | 1.04 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 21209895.5 ns | 21705458 ns | 0.98 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 30339645.5 ns | 29719834 ns | 1.02 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1970788 ns | 1972195 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 37405667 ns | 48847875 ns | 0.77 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 34394458.5 ns | 45336083.5 ns | 0.76 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 45812625 ns | 34513729.5 ns | 1.33 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 49950334 ns | 49269917 ns | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 13380646 ns | 13343291 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 12414229 ns | 12534687.5 ns | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 12490042 ns | 12487312 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 16341833 ns | 14837625 ns | 1.10 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 512690 ns | 516082 ns | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 47504708 ns | 47462208 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 41894125 ns | 41011646 ns | 1.02 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 41039083 ns | 41180667 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 60879833 ns | 58172313 ns | 1.05 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3230861.5 ns | 3228529 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 74751500 ns | 97347333.5 ns | 0.77 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 68922833 ns | 146675958 ns | 0.47 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 91225334 ns | 69089125 ns | 1.32 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 101557250 ns | 98763625 ns | 1.03 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 283992291.5 ns | 286756312.5 ns | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 339835667 ns | 316971625 ns | 1.07 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 313710583.5 ns | 313660292 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 612185542 ns | 269927875 ns | 2.27 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 7106627 ns | 7102725 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 971662042 ns | 977913167 ns | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 878835625 ns | 1267309084 ns | 0.69 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 824019771 ns | 832686083.5 ns | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 2224878916.5 ns | 1109108334 ns | 2.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 33878542 ns | 33843856 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1404909708 ns | 1781568125 ns | 0.79 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1358005896 ns | 2031944646 ns | 0.67 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1650625792 ns | 1227897000 ns | 1.34 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2661202542 ns | 1668096458 ns | 1.60 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 1539792 ns | 1549791.5 ns | 0.99 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 1272125 ns | 3030500 ns | 0.42 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 1622375 ns | 1627020.5 ns | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2495749.5 ns | 2159334 ns | 1.16 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 269633 ns | 273208.5 ns | 0.99 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 7849125 ns | 7876334 ns | 1.00 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 6602687.5 ns | 12090041.5 ns | 0.55 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 7142875 ns | 7093625 ns | 1.01 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11764167 ns | 10467416.5 ns | 1.12 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1113950 ns | 1133081.5 ns | 0.98 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 192026500 ns | 192676916.5 ns | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 144990959 ns | 311575249.5 ns | 0.47 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 129332541.5 ns | 127609750 ns | 1.01 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 447329791.5 ns | 177249416 ns | 2.52 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4844326 ns | 4872584 ns | 0.99 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 623445292 ns | 784043583 ns | 0.80 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 512510000 ns | 1024411500 ns | 0.50 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 592532958 ns | 559332750 ns | 1.06 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1242092000 ns | 794744875 ns | 1.56 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 16447225 ns | 16268331 ns | 1.01 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1063792 ns | 1083500 ns | 0.98 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 975750 ns | 1648687.5 ns | 0.59 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 1351416.5 ns | 1362000 ns | 0.99 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1388875 ns | 1380541 ns | 1.01 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 271754 ns | 273825 ns | 0.99 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 4470209 ns | 4431666.5 ns | 1.01 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 3848624.5 ns | 6601874.5 ns | 0.58 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 4529125 ns | 4608020.5 ns | 0.98 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 6089375 ns | 5706917 ns | 1.07 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1134621 ns | 1148559 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 23850583 ns | 23836750 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 34071208 ns | 43746958 ns | 0.78 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 36834208 ns | 37494917 ns | 0.98 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132533000 ns | 35098834 ns | 3.78 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1820300 ns | 1834843 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 184146437.5 ns | 184322479.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 159330625 ns | 270582042 ns | 0.59 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 184542562.5 ns | 185507041 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 647733209 ns | 382851417 ns | 1.69 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 16521951 ns | 16531854 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 292798042 ns | 294425541 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 244693792 ns | 374504875 ns | 0.65 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 296969875.5 ns | 293550625.5 ns | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 646009938 ns | 433623500 ns | 1.49 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 762535917 ns | 764247416 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 492838000 ns | 816447791 ns | 0.60 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 438050083.5 ns | 432863312 ns | 1.01 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1947787375 ns | 859004750 ns | 2.27 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12469626 ns | 12470955 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 2013595563 ns | 1863360604 ns | 1.08 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 1549070125 ns | 2802913125 ns | 0.55 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 1579947145.5 ns | 1500209542 ns | 1.05 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 4927858875 ns | 2077162895.5 ns | 2.37 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49732109 ns | 49776341 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3065875 ns | 3069416.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2056250 ns | 2083292 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2274916 ns | 2310104 ns | 0.98 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6030541.5 ns | 4825875 ns | 1.25 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 578458 ns | 579507 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 25488666.5 ns | 25429750 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 19719458 ns | 18860542 ns | 1.05 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 18691208.5 ns | 18995771 ns | 0.98 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 39310417 ns | 36731291 ns | 1.07 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3197549 ns | 3198905 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 34796812.5 ns | 34791875 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 29570667 ns | 81624291 ns | 0.36 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 29341958 ns | 29509583.5 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45325625 ns | 42723125 ns | 1.06 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1645833 ns | 1656125 ns | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1199459 ns | 1096584 ns | 1.09 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1394895.5 ns | 1407125 ns | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3030917 ns | 2444459 ns | 1.24 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 214156 ns | 215245.5 ns | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12708499.5 ns | 12711292 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9952917 ns | 9213500 ns | 1.08 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9680666 ns | 9770917 ns | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 19008687.5 ns | 18346563 ns | 1.04 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1955141 ns | 1941213 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17685396 ns | 17682833 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14735750 ns | 14335354 ns | 1.03 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14612083 ns | 14654500 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22192833.5 ns | 21449958 ns | 1.03 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 23611041 ns | 23917958 ns | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 34102333 ns | 43787792 ns | 0.78 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 37338792 ns | 37445542 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132620542 ns | 35102208 ns | 3.78 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1824553 ns | 1837807 ns | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 301585020.5 ns | 302651708 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 228577270.5 ns | 340997792 ns | 0.67 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 191160459 ns | 191785000 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 600527188 ns | 389240750 ns | 1.54 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13961516.5 ns | 17931121 ns | 0.78 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 297018625 ns | 298716042 ns | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 251347292 ns | 431554958 ns | 0.58 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 301596896 ns | 297489208.5 ns | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 651795542 ns | 437979250 ns | 1.49 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 2419583 ns | 2420395.5 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 2415875 ns | 3814687.5 ns | 0.63 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 2283687.5 ns | 2371562.5 ns | 0.96 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 4100250 ns | 2407083 ns | 1.70 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 574636 ns | 570784 ns | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 6528500 ns | 6550584 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 6517354.5 ns | 10128729 ns | 0.64 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 6521916.5 ns | 6537000 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 11653375 ns | 6522791 ns | 1.79 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1367239 ns | 1368166.5 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 17536000.5 ns | 17591937.5 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 17510833 ns | 21731104 ns | 0.81 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 17511291 ns | 17540854 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 18970666 ns | 14118125 ns | 1.34 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 69583 ns | 65625 ns | 1.06 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 68459 ns | 578000 ns | 0.12 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 70042 ns | 70583 ns | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 725125 ns | 68770.5 ns | 10.54 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 47418 ns | 47926 ns | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 329958.5 ns | 331792 ns | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 326187.5 ns | 1017417 ns | 0.32 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 285625 ns | 316208 ns | 0.90 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2298750 ns | 331208 ns | 6.94 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 210185 ns | 214904 ns | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 440375 ns | 446708 ns | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 445167 ns | 1090500 ns | 0.41 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 429896 ns | 430604.5 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2260958 ns | 374959 ns | 6.03 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3056625 ns | 3044333 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2083417 ns | 2071416 ns | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2246583 ns | 2282063 ns | 0.98 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6008729.5 ns | 4836833 ns | 1.24 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 576345.5 ns | 580025 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 23667021 ns | 23703041.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18070167 ns | 17227208.5 ns | 1.05 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 18267041 ns | 18457937.5 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 38420083 ns | 36049312.5 ns | 1.07 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 3100732.5 ns | 3101815 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 34097125 ns | 34744792 ns | 0.98 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 27584000 ns | 80182375 ns | 0.34 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 29094416 ns | 29270125 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44555479.5 ns | 41879958 ns | 1.06 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 119734792 ns | 120065104.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 173704250 ns | 148623146 ns | 1.17 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 147841375 ns | 147898792 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447866083 ns | 107010541 ns | 4.19 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5452572 ns | 5474909 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 472213958 ns | 469653562.5 ns | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 534077167 ns | 926515542 ns | 0.58 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 434948646 ns | 432504750 ns | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1735672750 ns | 724938750 ns | 2.39 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 32301623 ns | 35169265.5 ns | 0.92 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 638421125 ns | 638211020.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 660879000 ns | 983543708 ns | 0.67 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 598776187.5 ns | 613930458.5 ns | 0.98 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1717902229 ns | 724015375 ns | 2.37 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 330166 ns | 420167 ns | 0.79 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 452791.5 ns | 1717812.5 ns | 0.26 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 329750 ns | 330104 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 2049250 ns | 420291 ns | 4.88 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 568084 ns | 565510.5 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 2007083.5 ns | 2023542 ns | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 1957667 ns | 5126750 ns | 0.38 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 2009791.5 ns | 2014333 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 7110375 ns | 2011167 ns | 3.54 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1325105 ns | 1320420 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 5779208 ns | 5781000 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 5777292 ns | 8875375 ns | 0.65 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 5772521 ns | 5781291 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 7808917 ns | 2873875 ns | 2.72 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 104666.5 ns | 102959 ns | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 104375 ns | 536583 ns | 0.19 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 105292 ns | 104500 ns | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 140750 ns | 103770.5 ns | 1.36 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 27866 ns | 27852 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 208750 ns | 210334 ns | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 209458 ns | 515750 ns | 0.41 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 209708 ns | 209292 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 333875 ns | 209896 ns | 1.59 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 219920 ns | 219036 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 706667 ns | 707208.5 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 706750 ns | 1040041 ns | 0.68 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 707166.5 ns | 706750 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 807417 ns | 687937.5 ns | 1.17 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 12437.5 ns | 13084 ns | 0.95 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 13395.5 ns | 441042 ns | 0.030372390838060773 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 14542 ns | 13958 ns | 1.04 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 54479.5 ns | 13354 ns | 4.08 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 27908 ns | 27961 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 25854.5 ns | 25792 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 25833 ns | 321521 ns | 0.08034622932872192 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 25875 ns | 25792 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 151125 ns | 26083 ns | 5.79 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 208325 ns | 210134.5 ns | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 46458 ns | 46084 ns | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 45042 ns | 335833.5 ns | 0.13 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 45959 ns | 46083 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 150875 ns | 27042 ns | 5.58 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 319195333 ns | 320271709 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 234131917 ns | 430953624.5 ns | 0.54 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 279995959 ns | 272585584 ns | 1.03 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 864882791.5 ns | 319252125 ns | 2.71 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7664558.5 ns | 7617926.5 ns | 1.01 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 1232626937.5 ns | 1240631125 ns | 0.99 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1005847687.5 ns | 1686654750.5 ns | 0.60 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 784149062.5 ns | 879062875 ns | 0.89 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 3031224208 ns | 1565071083 ns | 1.94 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 27144525 ns | 27143469 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 417750 ns | 418250 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 416459 ns | 622375 ns | 0.67 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 422103.5 ns | 430500 ns | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 1068209 ns | 415333 ns | 2.57 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 47170.5 ns | 47795.5 ns | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1085083 ns | 1080750 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 1064667 ns | 1617583 ns | 0.66 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 1085854.5 ns | 1080791.5 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 3028125 ns | 1082812.5 ns | 2.80 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 223831 ns | 225487.5 ns | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 3098396 ns | 3105916.5 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 3111521 ns | 3686708 ns | 0.84 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 3113958 ns | 3109708 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 4879084 ns | 3032000 ns | 1.61 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 447667 ns | 583375 ns | 0.77 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 527917 ns | 1019792 ns | 0.52 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 584791 ns | 581583 ns | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2202771 ns | 530292 ns | 4.15 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 569915 ns | 575876.5 ns | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 2138042 ns | 2132729 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 2129688 ns | 4908875 ns | 0.43 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 2137667 ns | 2119979.5 ns | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7240666.5 ns | 2135125 ns | 3.39 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1331738 ns | 1355203 ns | 0.98 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 7933875 ns | 7940395.5 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 7937583 ns | 10769750 ns | 0.74 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 7946916 ns | 7927833.5 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 9720958.5 ns | 4851791.5 ns | 2.00 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 6958 ns | 6541 ns | 1.06 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 7000 ns | 4375 ns | 1.60 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 7666.5 ns | 8229.5 ns | 0.93 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 4583 ns | 6875 ns | 0.67 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 24682 ns | 25194 ns | 0.98 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 7541 ns | 7875 ns | 0.96 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 7084 ns | 9250 ns | 0.77 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7875 ns | 7417 ns | 1.06 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 9292 ns | 7625 ns | 1.22 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 186807 ns | 191956.5 ns | 0.97 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 9084 ns | 9000 ns | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 9083 ns | 9459 ns | 0.96 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 9208.5 ns | 9062.5 ns | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 7167 ns | 6041 ns | 1.19 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 20083 ns | 19958 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 19458 ns | 15709 ns | 1.24 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 20750 ns | 21312 ns | 0.97 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 15292 ns | 19583 ns | 0.78 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 25020 ns | 25221 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 33959 ns | 34042 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 33959 ns | 30792 ns | 1.10 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 33542 ns | 33479.5 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 30625 ns | 33792 ns | 0.91 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 196006 ns | 200953.5 ns | 0.98 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 95104.5 ns | 94500 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 94875 ns | 91250 ns | 1.04 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 94583.5 ns | 94666 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 88541 ns | 92125 ns | 0.96 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 13500 ns | 13208 ns | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 11792 ns | 357500 ns | 0.03298461538461538 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 19042 ns | 14646 ns | 1.30 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 50750 ns | 13458 ns | 3.77 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 26010 ns | 26262 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 24250 ns | 23667 ns | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 23625 ns | 261229 ns | 0.09043789165827684 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 24125 ns | 23333 ns | 1.03 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 146750 ns | 24125 ns | 6.08 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 167198.5 ns | 171941 ns | 0.97 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 57250 ns | 57250 ns | 1 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 57250 ns | 283208 ns | 0.20 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 57500 ns | 57250 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 152209 ns | 34083 ns | 4.47 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 6584 ns | 6250 ns | 1.05 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 6542 ns | 3417 ns | 1.91 |
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s)
7479
ns7834
ns0.95
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s)
3292
ns6167
ns0.53
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA
22935
ns23362
ns0.98
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s)
5250
ns5417
ns0.97
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s)
5334
ns6875
ns0.78
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s)
5542
ns5667
ns0.98
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s)
7000
ns5292
ns1.32
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA
170791
ns176236
ns0.97
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s)
9667
ns9125
ns1.06
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s)
9000
ns8333
ns1.08
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s)
9250
ns9333
ns0.99
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s)
6083
ns6083
ns1
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)
106049667
ns107034791.5
ns0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)
117278625
ns128490375
ns0.91
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)
120820500.5
ns120207791
ns1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)
214863521
ns118338292
ns1.82
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA
2646634.5
ns2634352
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)
397432312.5
ns397535521
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)
375661416
ns482736437.5
ns0.78
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)
403543791.5
ns395708541
ns1.02
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)
890588792
ns634072166
ns1.40
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA
15247832
ns15152647.5
ns1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)
622643125
ns805393791.5
ns0.77
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)
576393229
ns959710667
ns0.60
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)
815424292
ns630441458
ns1.29
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)
1176606500
ns909742750
ns1.29
This comment was automatically generated by workflow using github-action-benchmark.
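
The ratio figures above appear to be the current timing divided by the previous one, so values above 1 would indicate a slowdown and values below 1 a speedup. The following is a minimal Python sketch under that assumption, recomputing the ratio for three rows taken from the results above. The helper names and the 2.0 alert threshold are hypothetical and are not taken from the workflow's configuration.

```python
# Assumption: ratio = current / previous, both in nanoseconds.
def ratio(current_ns, previous_ns):
    return current_ns / previous_ns

# (benchmark name, current ns, previous ns), copied from the results above.
rows = [
    ("mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s)", 7240666.5, 2135125.0),
    ("Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s)", 11792.0, 357500.0),
    ("Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s)", 50750.0, 13458.0),
]

# Hypothetical threshold, chosen only for illustration.
ALERT_THRESHOLD = 2.0

def slowdowns(rows, threshold=ALERT_THRESHOLD):
    """Return the names of benchmarks whose ratio exceeds the threshold."""
    return [name for name, cur, prev in rows if ratio(cur, prev) > threshold]

for name, cur, prev in rows:
    print(f"{name}: {ratio(cur, prev):.2f}")
```

With these three rows, the two single-thread entries exceed the hypothetical threshold, while the 4-thread entry shows a large apparent speedup, which may reflect an outlier in the previous run rather than a real improvement.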