Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 1 changed file with 10 additions and 10 deletions.
33e5432
Lux Benchmarks
| Benchmark suite | Current (ns) | Previous (ns) | Ratio |
|-|-|-|-|
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 411833 | 410479.5 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 322270.5 | 322979 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 322687.5 | 243583 | 1.32 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 739792 | 740125 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 43717 | 43310 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 592458 | 1312625 | 0.45 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 485750 | 2418334 | 0.20 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 472146 | 16373020.5 | 0.028836829465888718 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 916416 | 958000 | 0.96 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 193389 | 190740 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 732083 | 1378500 | 0.53 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 630020.5 | 2610979.5 | 0.24 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 590250 | 16066041 | 0.036738982553324744 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 1008000 | 967958 | 1.04 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1531625.5 | 1773750 | 0.86 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1199500 | 1093875 | 1.10 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1370166 | 1520104 | 0.90 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2432729.5 | 2458417 | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 211497 | 209499 | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12247917 | 12121583 | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9551854.5 | 8834833 | 1.08 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9290625 | 9223542 | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 17955583 | 17972771 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1916393.5 | 1903079 | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17351270.5 | 17300562 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14353042 | 13987625 | 1.03 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14309667 | 14513146 | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21080250 | 21072834 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 121821646 | 250439208 | 0.49 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 174069521 | 148115625 | 1.18 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 148056167 | 117228750 | 1.26 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 106139667 | 104041542 | 1.02 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5478633 | 5463821 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 596837750 | 1224682250 | 0.49 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 543667792 | 933837625 | 0.58 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 445085375 | 835803479 | 0.53 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 626736625 | 628560812 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 38176542 | 35032007 | 1.09 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 652965479.5 | 1141719792 | 0.57 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 674093584 | 983678666.5 | 0.69 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 632863021 | 1377974646 | 0.46 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 743445292 | 746244021 | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 849625 | 1114917 | 0.76 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 832854.5 | 1628542 | 0.51 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 1217000 | 4086771 | 0.30 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 966042 | 959792 | 1.01 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 266296.5 | 272035 | 0.98 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2721500 | 2981354.5 | 0.91 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 2466917 | 4115937.5 | 0.60 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 3314395.5 | 9608958 | 0.34 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3364958.5 | 3297500.5 | 1.02 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1061958 | 1076584 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2259875 | 2355125 | 0.96 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1580250 | 1453000 | 1.09 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1752416.5 | 1602646 | 1.09 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3779541 | 3770125 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 212874 | 215196 | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 20464770.5 | 20246500 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 17681833 | 16965833.5 | 1.04 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 17968916 | 18330417 | 0.98 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 26220958.5 | 26150209 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1983562 | 1980657 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 44361875 | 44324250 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 42037625 | 41015042 | 1.02 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 41240937.5 | 41295750 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 47003375 | 47634416 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4301083.5 | 4656667 | 0.92 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2876167 | 2867250 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2986437.5 | 2754917 | 1.08 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 7412625 | 7179750 | 1.03 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 515223 | 515735.5 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 40138542 | 40447166.5 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 34883937.5 | 33885499.5 | 1.03 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 33862542 | 34257187.5 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 51421084 | 51082812.5 | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2979770 | 3174195 | 0.94 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 88409354.5 | 109744583 | 0.81 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 84462416 | 135227938 | 0.62 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 83166916.5 | 270381750 | 0.31 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 93812228.5 | 95391167 | 0.98 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 143119041 | 270563333 | 0.53 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 186909958.5 | 161054417 | 1.16 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 160607000 | 125340042 | 1.28 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 149056313 | 146582812.5 | 1.02 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 7091795 | 7052057 | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 876576041.5 | 1502349770.5 | 0.58 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 819011417 | 1201703584 | 0.68 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 713621416.5 | 1090436625 | 0.65 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1026954750.5 | 1030635583 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 33962668 | 33863530 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1654338292 | 2004525437 | 0.83 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1556399750 | 1793970792 | 0.87 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1456365229 | 2094682166.5 | 0.70 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1581565875 | 1594796917 | 0.99 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 1500042 | 1816417 | 0.83 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 1281708 | 2535417 | 0.51 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 1629875 | 9580729.5 | 0.17 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2163395.5 | 2124083 | 1.02 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 262650.5 | 265598 | 0.99 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 7601959 | 9396125 | 0.81 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 6596916 | 11490250 | 0.57 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 7128375 | 25636708 | 0.28 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 10476396 | 10456812.5 | 1.00 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1087771 | 1095109 | 0.99 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 185964437.5 | 381007729.5 | 0.49 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 146352312.5 | 283558854 | 0.52 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 130050146 | 264714708 | 0.49 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 179543416.5 | 179954521 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4845696 | 4874412 | 0.99 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 643688917 | 1154043958 | 0.56 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 604191917 | 991918083 | 0.61 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 537019041 | 1078324541 | 0.50 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 663244750 | 668069084 | 0.99 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 16664478 | 16315510 | 1.02 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1073937.5 | 1054520.5 | 1.02 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 979688 | 1957562.5 | 0.50 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 1338583 | 6624334 | 0.20 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1380812 | 1352146 | 1.02 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 265966 | 267010 | 1.00 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6009021 | 6499937.5 | 0.92 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 4658625 | 13781958 | 0.34 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 4922187.5 | 20923250 | 0.24 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 5723978.5 | 5707062.5 | 1.00 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1137942.5 | 1115597.5 | 1.02 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 23733624.5 | 70442792 | 0.34 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 35284771.5 | 43467103.5 | 0.81 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 37100750.5 | 39734999.5 | 0.93 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 35260167 | 35200125 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1834016 | 1845136 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 184898625 | 356138708 | 0.52 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 160642834 | 270050583 | 0.59 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 144248000 | 254207104 | 0.57 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 271530583 | 271696541.5 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 16393096 | 16499812 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 296257000 | 395249958 | 0.75 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 245304833 | 396501625 | 0.62 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 301408687 | 738492916.5 | 0.41 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 446273791 | 447067000 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 656873875 | 1189294541 | 0.55 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 433591937.5 | 689030520.5 | 0.63 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 402349417 | 650962625 | 0.62 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 677798728.5 | 681961562 | 0.99 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12482697 | 12470086 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 1891955437.5 | 3681028375 | 0.51 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 1637549708 | 2822971000 | 0.58 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 1514000729 | 2698825750 | 0.56 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 2113439354.5 | 2121646854.5 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49760182 | 49909051 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3046500 | 3408458 | 0.89 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2098166 | 2063208 | 1.02 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2287292 | 2518458 | 0.91 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 4866125 | 4888750 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 582507.5 | 580004.5 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 25579833 | 25958666 | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 20277104 | 18964292 | 1.07 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 19545458 | 19447166.5 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 36687292 | 36745416.5 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2979368 | 3191777 | 0.93 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 35578625 | 55195125 | 0.64 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 28390167 | 81683979.5 | 0.35 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 30144895.5 | 174851250 | 0.17 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 42776229 | 42883916.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1650667 | 1788312.5 | 0.92 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1204458 | 1100250 | 1.09 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1396750 | 1558396 | 0.90 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2509645.5 | 2464688 | 1.02 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 218107 | 215197 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12697333 | 12518625 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9973959 | 9205333 | 1.08 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9758687 | 9628104 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18284458 | 18331625 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1944527.5 | 1949026.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17688854 | 17616875 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14754291 | 14310166 | 1.03 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14674374.5 | 14557291.5 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21468083.5 | 21449812.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 23681167 | 70367541.5 | 0.34 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 34404604 | 43412916.5 | 0.79 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 37545958 | 39742938 | 0.94 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 35268000 | 35448542 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1848561 | 1795063 | 1.03 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 190505958.5 | 360004208 | 0.53 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 237366917 | 346542937 | 0.68 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 194090667 | 307664333.5 | 0.63 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 460122917 | 463480458 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13928578 | 13962488.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 301146020.5 | 418770999.5 | 0.72 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 250240417 | 421592709 | 0.59 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 308748000 | 780166249.5 | 0.40 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 395462625 | 393782854 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1916083.5 | 1880375 | 1.02 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1556917 | 1570562.5 | 0.99 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1579625 | 1246416.5 | 1.27 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2659291.5 | 2596208.5 | 1.02 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 570148 | 564741 | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 6146812.5 | 9321042 | 0.66 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 5943834 | 13025292 | 0.46 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 5926041 | 33090166 | 0.18 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 6788041.5 | 6518396.5 | 1.04 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1353691.5 | 1351683.5 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 18785021 | 22256291 | 0.84 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 19131625 | 27788229 | 0.69 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 19125833 | 54815104 | 0.35 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 15678041 | 15723000 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 68937 | 660437.5 | 0.10 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 68625 | 564125.5 | 0.12 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 70792 | 1067959 | 0.06628718892766483 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 69854 | 68833 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 47405.5 | 48015 | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 287792 | 1518999.5 | 0.19 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 312812.5 | 1050917 | 0.30 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 280416 | 1571000 | 0.18 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 281521 | 325084 | 0.87 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 211915 | 216110 | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 444500 | 1555895.5 | 0.29 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 448250 | 1060292 | 0.42 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 391667 | 1624541 | 0.24 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 357041.5 | 374750 | 0.95 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3044791 | 3421708 | 0.89 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2094645.5 | 2057375 | 1.02 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2278916.5 | 2472729 | 0.92 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 4567208 | 4540646 | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 585440 | 585099 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 23578062.5 | 24053333 | 0.98 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18085666 | 17186833 | 1.05 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 16978625 | 17114833.5 | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 34976833 | 35115834 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2912837 | 3096781.5 | 0.94 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 33419374.5 | 53599104 | 0.62 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 27788708 | 80093333 | 0.35 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 27373667 | 172009854 | 0.16 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 42059688 | 42254666 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 118607334 | 249876333.5 | 0.47 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 173693458.5 | 148299229 | 1.17 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 147902833 | 116785208 | 1.27 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 108303292 | 106758125 | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5451158 | 5452339 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 470478958 | 1100542291 | 0.43 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 467481645.5 | 855735416.5 | 0.55 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 434223083.5 | 831274375 | 0.52 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 737222479.5 | 738168166.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 35181339 | 32317772.5 | 1.09 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 635200500 | 1001895729 | 0.63 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 665043396 | 966598875 | 0.69 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 582947041.5 | 1307543687 | 0.45 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 731724375 | 738405458 | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1304833 | 1230583 | 1.06 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 937167 | 962250 | 0.97 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 903709 | 796604 | 1.13 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 2036958 | 2036541 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 564089 | 567146.5 | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 2960625 | 5691500 | 0.52 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 2635667 | 6401396 | 0.41 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 2619417 | 25408000 | 0.10 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 3698292 | 3697229 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1319613 | 1332396 | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 6561416 | 9370333 | 0.70 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 6499959 | 13058291 | 0.50 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 6497875 | 32481708 | 0.20 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 4438375 | 4424396 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 39271 | 390896 | 0.10 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 32458.5 | 458604 | 0.07077674856739148 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 32062.5 | 2946292 | 0.01088232259395878 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 54437.5 | 54375 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 27919 | 28214 | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 179042 | 360312.5 | 0.50 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 175541 | 439417 | 0.40 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 175167 | 5063292 | 0.034595476618768974 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 190708.5 | 190708 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 219938 | 219423.5 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 442334 | 632709 | 0.70 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 463458.5 | 711770.5 | 0.65 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 442417 | 5249812.5 | 0.0842729145088515 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 429500 | 429750 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 13562.5 | 335333.5 | 0.04044481091212181 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 13437.5 | 393604 | 0.03413964288980803 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 14416 | 765792 | 0.018824955079185992 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 14375 | 13458 | 1.07 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 28121 | 28223 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 25917 | 286125 | 0.09057929226736566 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 25667 | 310708 | 0.08260810793413752 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 25625 | 733437.5 | 0.03493821900298253 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 26250 | 25916 | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 209865 | 209427 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 45437.5 | 302000 | 0.15 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 46479.5 | 328375 | 0.14 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 46041 | 842791.5 | 0.054629169848058504 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 28209 | 28333 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 318266167 | 602432125 | 0.53 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 238108104 | 430731937.5 | 0.55 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 203733333 | 392016750 | 0.52 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 322939875 | 322757833 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7668589 | 7676293 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 1098692854.5 | 2003927916.5 | 0.55 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 952627249.5 | 1623931938 | 0.59 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 856876291 | 1626427584 | 0.53 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 1173710250 | 1179210042 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 27280510.5 | 27131071 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 193124.5 | 523645.5 | 0.37 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 168542 | 450709 | 0.37 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 168187.5 | 2446250 | 0.06875319366377108 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 218458.5 | 219187.5 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 47292 | 47774.5 | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1214729 | 1875042 | 0.65 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 1095750 | 2602792 | 0.42 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 1014896 | 16587416.5 | 0.06118469383101341 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 1504666 | 1501583 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 222578.5 | 226318.5 | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 2298292 | 2982667 | 0.77 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 2283250 | 5736062.5 | 0.40 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 2158334 | 17019146 | 0.13 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 2476833 | 2470812.5 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1582437.5 | 1498583 | 1.06 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 1264833 | 1193771 | 1.06 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1174562.5 | 1029042 | 1.14 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2357375 | 2235875 | 1.05 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 571094.5 | 572216 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 3197541 | 5950125 | 0.54 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 2843042 | 4653916 | 0.61 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 2853458 | 27167500 | 0.11 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 3931104 | 3927896 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1330355 | 1342658.5 | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 8842250 | 11627667 | 0.76 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 8776708 | 14277520.5 | 0.61 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 8804292 | 36899542 | 0.24 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 6342000 | 6331458.5 | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 4625 | 2333 | 1.98 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2458 | 2166 | 1.13 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 2542 | 3333 | 0.76 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 2416 | 2646 | 0.91 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 24562 | 25097 | 0.98 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 7125 | 7333 | 0.97 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 7125 | 7125 | 1 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7417 | 7375 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 7292 | 7250 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 186417 | 189428.5 | 0.98 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 8541 | 8167 | 1.05 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 8500 | 8250 | 1.03 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 8709 | 8542 | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 6125 | 6083 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10625 | 10667 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 14792 | 14041.5 | 1.05 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 12000 | 11125 | 1.08 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 7500 | 7333 | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 24702.5 | 25251 | 0.98 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 21458 | 21917 | 0.98 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 21583 | 21708.5 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 22042 | 21750 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 21792 | 21916 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 196629 | 198645 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 56833 | 53625 | 1.06 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 59166 | 53500 | 1.11 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 57208 | 53625 | 1.07 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 54542 | 54583 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 28687.5 | 28395.5 | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 28709 | 28667 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 28792 | 28417 | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46041 | 46084 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 25795 | 26326 | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 44250 | 224125 | 0.20 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 47667 | 272959 | 0.17 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 44000 | 4409500 | 0.009978455607211702 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 63916 | 65708 | 0.97 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 167633.5 | 170084 | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 68417 | 240562 | 0.28 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 68292 | 290792 | 0.23 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 68083 | 4409209 | 0.015441091588083032 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 68125 | 71541 | 0.95 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 2500 | 1708.5 | 1.46 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 1750 | 1792 | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 1792 | 2541.5 | 0.71 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 1708 | 1917 | 0.89 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 23041 | 23384 | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 5375 | 5292 | 1.02 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 5083 | 5291 | 0.96 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 5416 | 5459 | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 5125 | 5208.5 | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 171497 | 173533 | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 8375 | 7417 | 1.13 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 8167 | 7500 | 1.09 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 8208 | 7708 | 1.06 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 5708 | 5625 | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 34068625 | 81107833 | 0.42 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 40361624.5 | 49783792 | 0.81 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 43432603.5 | 43745208 | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 56216958.5 | 56305270.5 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2631639 | 2634961 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 453239687.5 | 620785875 | 0.73 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 319327021 | 429264250 | 0.74 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 307674396 | 416731125 | 0.74 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 506119959 | 507694646.5 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 15174112 | 15139001 | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 735455458 | 871599625 | 0.84 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 706582229 | 839558208.5 | 0.84 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 743368604 | 1206593209 | 0.62 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 910398833 | 921408813 | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
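
Each benchmark entry above lists two timings in nanoseconds followed by a ratio, and the ratio matches the first timing divided by the second. The sketch below reproduces that arithmetic for three entries copied from the results. It assumes the first timing is the current run and the second the previous one, which is how github-action-benchmark normally orders them but is not stated in the comment itself; the function name `ratio` and the `samples` dictionary are hypothetical, not part of the tool.

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Return current / previous; a value below 1 means the current run took less time."""
    if previous_ns <= 0:
        raise ValueError("previous timing must be positive")
    return current_ns / previous_ns


# (current ns, previous ns) pairs copied from three entries in the results above.
# Treating the first number as "current" is an assumption, as noted in the lead-in.
samples = {
    "Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)": (411833, 410479.5),
    "Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s)": (322687.5, 243583),
    "Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s)": (592458, 1312625),
}

for name, (current_ns, previous_ns) in samples.items():
    # Two decimal places, matching how most ratios are printed in the results
    print(f"{name}: {ratio(current_ns, previous_ns):.2f}")
```

Under that reading, the many CPU entries with ratios well below 1 at 2, 4, and 8 threads indicate shorter timings in the current run, while the few entries above 1 (for example the 8-thread identity Dense forward pass at 1.32) indicate longer ones.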