-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up removal from a set #127
base: master
Are you sure you want to change the base?
Conversation
The method removes all elements that do not satisfy the hook, in one go. This is often faster than calling IndexOf then Remove multiple times.
Instead of using O(n) times an O(n) List.Remove(), use the new List.Filter() method to keep items that were not deleted.
All tests pass and I introduced a new test for the Filter() method. |
how is the current implementation O(n**2) as I only see a single loop in https://github.com/emirpasic/gods/blob/master/lists/doublylinkedlist/doublylinkedlist.go#L104 also check out the enumerable functions found in also O(n) AFAICS https://github.com/emirpasic/gods/blob/master/lists/doublylinkedlist/enumerable.go#L33 |
Indeed, I didn't see the Select method in enumerable; it can do the same as Filter (except that my proposed Filter does it in place and thus does not reallocate anything, whereas Select creates a new list). As for the current implementation, removing a single element in a doubly linked list is O(n). But the API to remove elements from a set accepts multiple elements, and removes each one separately, in a loop. In effect, in means the complexity of set removal is O(m × n) where m is the amount of items to remove and n the set size. Worse comes to worst, m = n/2 and we get O(n**2). My patch series make that removal O(n) regardless on the amount of elements to remove. It could build up on Select instead, but that would mean the set would reallocate its internal list for each removal, which is wasteful. |
It is common that these filter/select/remove/map (enumerable) functions return a new collection, AFAIK at least in Ruby/Java/Javascript, etc. You could still use the current Select function to have O(n) removal for multiple indexes given the condition function evaluates this, right? It would create a new collection, but it'd run in O(n). Now to think about it, I am not sure why I haven't made the Remove(i) a variadic argument function so that it can accept multiple indexes at once Remove(1,3,4,5, ...) etc.. Implementing that wouldn't even be a breaking change :) ref. https://github.com/emirpasic/gods/blob/master/lists/doublylinkedlist/doublylinkedlist.go#L104 |
...really strange as most of the other functions have variadic arguments :/ |
Let me reopen this PR to have a reminder to have that implemented for all lists, I would start making this a variadic function and applying it to all three list sub-types: |
You'd have to first loop through the list to find out (and remember in a toRemove list) all indices of elements that need removal, to then call the remove function with "toRemove...". A lot of allocations that can be avoided. But making remove accept variadic arguments is a good idea nevertheless. I don't think using Select is a very good idea: the common case is probably removing a single item from the set, and having the set re-create a new list each time is very wasteful. I agree that Select is good API in general, especially in go where append, etc. return the new slice, but in that case it is not optimal to use it. |
The idea behind "Filter" is that it was not index-based, so it would sidestep this issue. With the updated set, checking if an element needs to be filtered out of the doubly linked list is O(1), given its value. |
@FrnchFrgg so same can be achieved with the Select function, right? ...just not in place. |
Indeed, the same can be achieved with the Select function. At the very least, there should be a check for the number of items to remove from the set, to avoid doing a complete reallocation for the common case. The better way would be to change the "Filter" I introduced into an in-place "Select", i.e. reuse the exact same signature, and make it equivalent to list = list.Select(...). In most List structures it would be possible to do so without reallocation, but at worst harnessing the existing Select gives an easy implementation. |
Oh I just found out that Select is not yet part of the EnumerableWithIndex interface, it is commented out. |
@FrnchFrgg we are talking about the doubly linked list right? I think it should be, this guarantees it: https://github.com/emirpasic/gods/blob/master/lists/doublylinkedlist/enumerable.go#L10 And the select is right here: https://github.com/emirpasic/gods/blob/master/lists/doublylinkedlist/enumerable.go#L33 ` |
So it's implemented, but I could not enforce it here (some technical issue that I could not resolve without type checking, which I didn't want to do): https://github.com/emirpasic/gods/blob/master/containers/enumerable.go#L19 In other words you are right, not part of the EnumerableWithIndex, but you can still use Select for lists |
Introduce a new Filter() command to doubly linked lists that can perform several removals in a single O(n) sweep, and use that new method to make set removal O(n) instead of O(n**2)