Numba support #2
To be honest, I have not tried numba yet; I have only read the docs and watched some talks on it. My understanding is that its JIT can unroll loops into SIMD instructions, so I expect this to work with plain Python types, but I may be wrong. If this is correct, I think numba will also be interesting with PyPy, hence my interest in it. |
After spending many weeks on this last year, I'm fairly confident saying 1) numba does not operate in its fast mode with python arrays, and 2) numpy by itself will not be fast because this isn't a vectorized algorithm (the data point additions are incremental). Notably, the numpy.insert/delete performance is terrible. I haven't published my approximate histogram code yet. It isn't derived from streamhist or anything else. I guess I'll finally get around to it since nothing else is filling my needs. As a quick example, here is timing of stdlib bisect_left vs. your hand-coded one (4x slower) vs. njit of your hand-coded one (not faster).
import numba
import bisect

def _bisect_left(a, x):
    lo = 0
    hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if a[mid] < x: lo = mid+1
        else: hi = mid
    return lo

_bisect_left_njit = numba.njit(_bisect_left)
a = list(range(1000))
na = numba.typed.List(a)
%timeit bisect.bisect_left(a, 15)
%timeit _bisect_left(a, 15)
%timeit _bisect_left_njit(na, 15)
|
shows clearly that numba only works well with numpy arrays:
import numba
import numpy as np

def argmin_diff(a):
    # Find the smallest difference between adjacent items and where it occurs.
    i, i_m = 1, -1
    m = np.inf
    last_item = a[0]
    for item in a[1:]:
        d = item - last_item
        if d < m:
            m = d
            i_m = i-1
        last_item = item
        i += 1
    return i_m, m

argmin_diff_jit = numba.njit(argmin_diff)
a = list(range(1000))
na = numba.typed.List(a)
npa = np.array(a)
%timeit argmin_diff(a)
%timeit argmin_diff_jit(na)
%timeit argmin_diff_jit(npa)
|
There is something wrong here, because the bisect code in distogram is a copy of the stdlib. So it is exactly the same code: |
Read below that stdlib function definition:
# Overwrite above definitions with a fast C implementation
try:
    from _bisect import *
except ImportError:
    pass
(If tuple values are used, there is no reason to re-implement stdlib bisect in the first place.) |
ok thanks, I did not see that previously. |
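For anyone wanting to check this outside IPython, here is a minimal sketch using only the standard library. It reports whether `bisect.bisect_left` is a compiled built-in on the running interpreter and times it against a pure-Python copy. The helper names `py_bisect_left` and `is_c_accelerated` are made up for this example, and the timings will vary by machine and interpreter.

```python
import bisect
import timeit
import types


def py_bisect_left(a, x):
    """Pure-Python binary search, same logic as the hand-coded version above."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo


def is_c_accelerated(func):
    """Return True when func is a built-in (compiled) function, not Python code."""
    return isinstance(func, types.BuiltinFunctionType)


if __name__ == "__main__":
    a = list(range(1000))
    print("bisect_left is built-in:", is_c_accelerated(bisect.bisect_left))
    for name, fn in [("stdlib", bisect.bisect_left), ("pure Python", py_bisect_left)]:
        seconds = timeit.timeit(lambda: fn(a, 15), number=100_000)
        print(f"{name}: {seconds:.3f} s per 100,000 calls")
```

On CPython the first line should normally report a built-in, which would explain why an identical-looking Python copy is slower.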
In the implementations I've seen, a (point, count) tuple works fine and gives the most performance opportunities in pure Python. To update an item, simply write a new tuple. |
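A rough sketch of the tuple approach, assuming the bins are kept as a sorted list of (point, count) tuples. The function name `add_point` is hypothetical and this is not distogram's actual code; it only illustrates how stdlib bisect can be used directly on tuples and how an update replaces the tuple instead of mutating it.

```python
import bisect


def add_point(bins, value, count=1):
    """Add value to a sorted list of (point, count) tuples, in place."""
    # The 1-tuple (value,) sorts before every (value, n), so bisect_left
    # lands on an existing bin with the same point if there is one.
    i = bisect.bisect_left(bins, (value,))
    if i < len(bins) and bins[i][0] == value:
        # Tuples are immutable: update by writing a new tuple.
        bins[i] = (value, bins[i][1] + count)
    else:
        bins.insert(i, (value, count))
    return bins


bins = []
for v in (2.0, 1.0, 2.0):
    add_point(bins, v)
print(bins)
```

Merging bins once the list exceeds its capacity is left out here; only the insert/update path is shown.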
My |
So using stdlib bisect / tuples made a huge difference, or the old numbers were mistaken?
  Interpreter   Operation   Numpy   Req/s
  ============  ==========  ======  ==========
- CPython 3.7   update      no      65763
- CPython 3.7   update      yes     39277
+ CPython 3.7   update      no      436709
+ CPython 3.7   update      yes     251603
|
Yes, for CPython this is a huge bump. |
It looks like the laptop numbers need to be updated?
|
@MainRo would you consider adding me to the README credits as I helped make this package about 10x faster and produce exact histograms when under capacity? |
sure, I just updated the credits section of the README. |
From belm0:
I'm not sure what this means. PyPy has limitations (it is not a 1:1 replacement for CPython), and there are legacy applications which cannot transition to PyPy easily, or which are heavily dependent on numpy. For such applications, a numpy + numba implementation is useful. Having such an implementation does not preclude having a pure Python implementation which supports PyPy. They can exist alongside each other.
A numba-only implementation will not perform well, because numba does not support fast mode with Python arrays. The only way to get performance on this algorithm with numba is via numpy arrays.
I measured my pure Python implementation (no numba or numpy) vs. distogram. It is 20% faster (and less code, but I didn't compare closely). The implementation uses a "maintain cost function array" approach just as distogram does. So distogram appears to have some room for improvement.
My numba+numpy implementation is 20x faster than streamhist (with 64 bins).