VLIW5 speedup from vectorizing
magnumripper edited this page Dec 22, 2014 · 2 revisions
These figures are from a Juniper GPU (AMD VLIW5 architecture).
Natural vector size versus scalar
Ratio: 1.15406 real, 1.17261 virtual PBKDF2-HMAC-SHA1-opencl:Raw
Ratio: 1.17748 real, 1.22909 virtual RAKP-opencl, IPMI 2.0 RAKP (RMCP+):Many salts
Ratio: 1.15708 real, 1.17223 virtual RAKP-opencl, IPMI 2.0 RAKP (RMCP+):Only one salt
Ratio: 1.10558 real, 1.11278 virtual encfs-opencl, EncFS:Raw
Ratio: 1.12377 real, 1.11362 virtual krb5pa-sha1-opencl, Kerberos 5 AS-REQ Pre-Auth etype 17/18:Raw
Ratio: 2.30797 real, 2.27428 virtual ntlmv2-opencl, NTLMv2 C/R:Many salts
Ratio: 1.55751 real, 1.43351 virtual ntlmv2-opencl, NTLMv2 C/R:Only one salt
Ratio: 1.76623 real, 1.75933 virtual office2007-opencl, MS Office 2007 (50,000 iterations):Raw
Ratio: 1.77146 real, 1.74888 virtual office2010-opencl, MS Office 2010 (100,000 iterations):Raw
Ratio: 0.06893 real, 0.06890 virtual office2013-opencl, MS Office 2013 (100,000 iterations):Raw
Ratio: 1.11096 real, 1.10500 virtual sha1crypt-opencl, (NetBSD):Raw
Ratio: 1.11199 real, 1.11693 virtual wpapsk-opencl, WPA/WPA2 PSK:Raw
After tuning some formats to 2x vectors and turning vectorizing off completely for Office 2013
Ratio: 1.42069 real, 1.43795 virtual PBKDF2-HMAC-SHA1-opencl:Raw
Ratio: 1.37714 real, 1.38273 virtual RAKP-opencl, IPMI 2.0 RAKP (RMCP+):Many salts
Ratio: 1.17866 real, 1.14452 virtual RAKP-opencl, IPMI 2.0 RAKP (RMCP+):Only one salt
Ratio: 1.32869 real, 1.32331 virtual encfs-opencl, EncFS:Raw
Ratio: 1.43004 real, 1.39547 virtual krb5pa-sha1-opencl, Kerberos 5 AS-REQ Pre-Auth etype 17/18:Raw
Ratio: 2.30797 real, 2.21851 virtual ntlmv2-opencl, NTLMv2 C/R:Many salts
Ratio: 1.54382 real, 1.42823 virtual ntlmv2-opencl, NTLMv2 C/R:Only one salt
Ratio: 1.76929 real, 1.74862 virtual office2007-opencl, MS Office 2007 (50,000 iterations):Raw
Ratio: 1.77418 real, 1.76354 virtual office2010-opencl, MS Office 2010 (100,000 iterations):Raw
Ratio: 1.00000 real, 0.97561 virtual office2013-opencl, MS Office 2013 (100,000 iterations):Raw
Ratio: 1.33333 real, 1.34906 virtual sha1crypt-opencl, (NetBSD):Raw
Ratio: 1.42564 real, 1.41961 virtual wpapsk-opencl, WPA/WPA2 PSK:Raw
So even a register-starved device can gain over 2x speed from vectorizing. Only Office 2013 (64-bit) needed to run scalar for best performance. In fact, it too can run 21% faster than scalar at 2x, but the kernel duration became too long in wall-clock time.
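To see how much the tuning step itself contributed, the two lists of ratios can be compared format by format. The following is a minimal sketch, not part of the original benchmark tooling: the helper names `parse_ratios` and `tuning_gain` are hypothetical, and only a subset of the lines above is pasted in for brevity.

```python
import re

# Matches lines of the form "Ratio: <real> real, <virtual> virtual <name>".
RATIO_RE = re.compile(r"^Ratio:\s+([\d.]+) real,\s+([\d.]+) virtual\s+(.+)$")


def parse_ratios(text):
    """Return {benchmark name: real-time ratio} for every 'Ratio:' line."""
    ratios = {}
    for line in text.splitlines():
        match = RATIO_RE.match(line.strip())
        if match:
            ratios[match.group(3)] = float(match.group(1))
    return ratios


def tuning_gain(natural, tuned):
    """Return tuned/natural ratio for each benchmark present in both runs."""
    return {name: tuned[name] / natural[name]
            for name in natural if name in tuned}


# Subset of the "natural vector size" figures quoted above.
natural = parse_ratios("""
Ratio: 1.15406 real, 1.17261 virtual PBKDF2-HMAC-SHA1-opencl:Raw
Ratio: 2.30797 real, 2.27428 virtual ntlmv2-opencl, NTLMv2 C/R:Many salts
Ratio: 0.06893 real, 0.06890 virtual office2013-opencl, MS Office 2013 (100,000 iterations):Raw
Ratio: 1.11199 real, 1.11693 virtual wpapsk-opencl, WPA/WPA2 PSK:Raw
""")

# The same formats after tuning.
tuned = parse_ratios("""
Ratio: 1.42069 real, 1.43795 virtual PBKDF2-HMAC-SHA1-opencl:Raw
Ratio: 2.30797 real, 2.21851 virtual ntlmv2-opencl, NTLMv2 C/R:Many salts
Ratio: 1.00000 real, 0.97561 virtual office2013-opencl, MS Office 2013 (100,000 iterations):Raw
Ratio: 1.42564 real, 1.41961 virtual wpapsk-opencl, WPA/WPA2 PSK:Raw
""")

gains = tuning_gain(natural, tuned)
for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{gain:8.3f}x  {name}")
```

A gain of 1.0 means tuning left that format unchanged (as with NTLMv2, many salts), while a large gain marks a format where the natural vector size was a poor choice (Office 2013).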