
in operator is O(n) #277

Closed
ntrel opened this issue Jun 19, 2019 · 19 comments
Labels
Type: Optimization This issue/PR is related to optimizations that could be applied to V.

Comments

@ntrel
Contributor

ntrel commented Jun 19, 2019

https://vlang.io/docs#in

This operator does a linear search through an array taking O(n) time. Building it into the language with special short syntax encourages programmers to reach for that instead of considering what other functions they might use.

Instead of encouraging use of this operation, a method would be clearer:
[v1, v2, v3].contains(item)

V should be able to see that the array literal does not escape the function call, and allocate it on the stack. This is a good general optimization rather than just for linear find.

BTW for containers like maps that have better than O(n) search, having an in operator would be OK. This means in needs to be overloadable.
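A minimal V sketch of the contrast (illustrative only; it uses nothing beyond builtin arrays and maps):

```v
fn main() {
	// `in` on an array is a linear scan: O(n) comparisons in the worst case
	arr := [1, 2, 3]
	println(4 in arr) // false, after comparing against every element

	// `in` on a map is a hash lookup: amortized O(1)
	m := {'v1': 1, 'v2': 2}
	println('v1' in m) // true, via a single hash lookup
}
```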

@chanbakjsd added the "Type: Optimization" label Jun 25, 2019
@ntrel
Contributor Author

ntrel commented Jun 29, 2019

@lutherwenxu in on arrays can't be made better than O(n) ;-)

@MrOnlineCoder

in needs to be overloadable.

AFAIK, operator overloading is not planned in V.

In my opinion, the in syntax is clear and is something that some languages are missing. Moreover, the in operator can simply be shorthand for calling the .contains() method, so the following two pieces of code would be equivalent:

v in a
a.contains(v)

@ntrel
Contributor Author

ntrel commented Jun 30, 2019

@MrOnlineCoder:
https://vlang.io/docs#op

Allowing O(n) in for arrays but not O(k) in for maps seems misguided. If we have in it must be overloadable.

Choosing to use a.contains(v) means the programmer has to think about which function to use, perhaps choosing to use e.g. a binary search instead if in a tight loop. v in a doesn't require much thought: "let's use the language feature, that's probably best, and it's shortest".
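As an illustration of the trade-off being described, a hand-written binary search over a sorted array (a sketch; contains_sorted is a hypothetical helper, not part of V's standard library):

```v
// O(log n) membership test, valid only when `a` is sorted ascending
fn contains_sorted(a []int, v int) bool {
	mut lo := 0
	mut hi := a.len - 1
	for lo <= hi {
		mid := lo + (hi - lo) / 2
		if a[mid] == v {
			return true
		} else if a[mid] < v {
			lo = mid + 1
		} else {
			hi = mid - 1
		}
	}
	return false
}
```

In a tight loop over a large sorted array this beats a linear `in`, but nothing about the `v in a` syntax prompts the programmer to consider it.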

@Delta456
Member

Delta456 commented Oct 6, 2019

in is now optimized 62133c6

@Delta456 closed this as completed Oct 6, 2019
@ntrel
Contributor Author

ntrel commented Oct 6, 2019

@Delta456 that commit isn't relevant

@ntrel
Contributor Author

ntrel commented Oct 6, 2019

I think you meant b242e8d. That commit just does loop unrolling for a literal array. in is still O(n) on a runtime array.
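For concreteness, the unrolling in b242e8d is roughly this source-level rewrite (a sketch of the idea, not the actual generated code):

```v
fn main() {
	x := 2
	// before: `x in [1, 2, 3]` - materialize the literal, then scan it
	// after:  a chain of comparisons with no array allocated at all;
	// still linear in the number of elements, just without the loop
	println(x == 1 || x == 2 || x == 3)
}
```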

@hazohelet reopened this Oct 6, 2019
@Delta456
Member

Delta456 commented Oct 6, 2019

@ntrel my bad..

@medvednikov
Member

An interesting thing about it is that the linear algorithm is actually significantly faster for a small N.

In Ruby they recently changed their maps to use arrays for N < 10 (or 40, can't find the commit).

So a in [1,2,3] is perfectly fine as it is.

For large arrays it should be clear that a in numbers == numbers.contains(a), and both use O(n).
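A rough way to check the small-N claim (a sketch, not from the thread; timings are machine-dependent, so run with -prod before drawing conclusions):

```v
import time

fn main() {
	small := [1, 2, 3]
	mut m := map[int]bool{}
	for v in small {
		m[v] = true
	}
	mut hits := 0

	mut sw := time.new_stopwatch()
	for i in 0 .. 1_000_000 {
		x := i % 7
		if x in small { // linear scan over 3 elements
			hits++
		}
	}
	println('array `in`: ${sw.elapsed().microseconds()} us')

	sw = time.new_stopwatch()
	for i in 0 .. 1_000_000 {
		x := i % 7
		if x in m { // hash lookup
			hits++
		}
	}
	println('map `in`:   ${sw.elapsed().microseconds()} us')
	println(hits) // keep the loops from being optimized away
}
```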

@ntrel
Contributor Author

ntrel commented Oct 6, 2019

the linear algorithm is actually significantly faster for a small N.

Yes, I know. I've explained above why it's bad to encourage using an operator.

@dumblob
Contributor

dumblob commented May 17, 2021

For static arrays and non-mut arrays this can be made O(1) - see #6991 (comment). Probably worthwhile for arrays with a cumulative size bigger than a few cache lines.
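One way to picture the compile-time case (a hypothetical lowering, not what V emits today): when the set is known at compile time, `x in [10, 20, 30]` could become a match that the C compiler turns into a jump table or bit test instead of a scan:

```v
// hypothetical O(1)-friendly lowering for a compile-time-known set
fn in_known_set(x int) bool {
	return match x {
		10, 20, 30 { true }
		else { false }
	}
}
```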

@ntrel
Contributor Author

ntrel commented May 17, 2021

For static arrays and non-mut arrays this can be made O(1)

Only when the initializer is known at compile time.

@dumblob
Contributor

dumblob commented May 17, 2021

@ntrel of course - I was myself surprised how often that is the case.

yuyi98 pushed a commit to yuyi98/v that referenced this issue Jun 21, 2022
@ArtemkaKun
Contributor

Hi, to summarize everything that is going on here:

  • in and contains use an algorithm with O(n) complexity
  • the in operator can't be overloaded
  • @ntrel proposed not encouraging the use of in, since it has O(n) complexity, and optimizing it somehow
  • @dumblob proposed hash map conversion for static arrays to achieve O(1) complexity in some cases

My thoughts:

  • in is syntactic sugar for the contains() method. I suppose most people will not even care about the search algorithm's complexity in 99% of cases. Plus, Clang/GCC optimize this operation pretty heavily when you compile your V program with the -prod flag
  • Another search algorithm, like binary search, must get its own method and not be hidden under the in operator, because there is no silver bullet and any algorithm must be applied meaningfully. Since the linear algorithm is the simplest and the expected one, it should be used by default, in my opinion
  • Automatic static array -> hash map conversion shouldn't be applied by default. As before, this is not a silver bullet, and a programmer must understand whether they need this exact conversion or not. I can see something like an optimized_for_read parameter for static arrays, but certainly not as the default.

I decided to close this issue because:

  • It hasn't been active for 2 years now
  • It combines a discussion and at least 2 separate feature requests. I think it should be split into smaller, clearer issues stating what should be achieved.

If you think this issue still needs to be open, feel free to reopen it with a comment explaining why.

@dumblob
Contributor

dumblob commented May 26, 2023

I do not want to reopen but I would like to understand why static arrays should not use perfect hashing by default.

Perfect hashing does the same thing as jumping to an index. V arrays add bounds checking on top of plain C arrays. An in operator over an array indexed by a perfect hash function is technically equivalent to such a bounds check.

So why not make it the default? Is there even a single disadvantage (we are not speaking of extreme - yet common - cases such as very short arrays of less than ~32 octets)?
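To make the claimed equivalence concrete, a toy version of the idea (the modulus here is hand-picked for this particular key set; a real tool like gperf derives such a function automatically):

```v
const keys = [42, 7, 113] // arranged so that keys[k % 3] == k for every key

// membership via the toy perfect hash: one hash, one bounds-style guard,
// one comparison - the same cost shape as a bounds-checked index access
fn contains_ph(x int) bool {
	i := x % 3 // may be negative for negative x, hence the guard below
	return i >= 0 && i < keys.len && keys[i] == x
}
```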

@dumblob
Contributor

dumblob commented May 26, 2023

Btw. for non-static arrays of fixed-size items (e.g. integers, floats, all structs, fixed-length strings, etc.) there are also nice trivial optimizations possible, like limiting branching in a tight loop by halving the number of comparisons (it works for tiny arrays too). This should be a no-brainer.
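The comment doesn't name the exact technique, but one standard reading is the sentinel search, sketched here under that assumption (note the `mut` parameter - exactly the catch raised in the reply below):

```v
// plant the needle in the last slot so the scan needs no separate bounds
// check, cutting the per-iteration comparisons roughly in half
fn contains_sentinel(mut a []int, v int) bool {
	if a.len == 0 {
		return false
	}
	last := a[a.len - 1]
	if last == v {
		return true
	}
	a[a.len - 1] = v // sentinel: the loop is now guaranteed to terminate
	mut i := 0
	for a[i] != v {
		i++
	}
	a[a.len - 1] = last // restore the original element
	return i < a.len - 1
}
```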

@ArtemkaKun
Contributor

ArtemkaKun commented May 26, 2023

I do not want to reopen but I would like to understand why static arrays should not use perfect hashing by default.

Perfect hashing does the same thing as jumping to an index. V arrays add bounds checking on top of plain C arrays. An in operator over an array indexed by a perfect hash function is technically equivalent to such a bounds check.

So why not make it the default? Is there even a single disadvantage (we are not speaking of extreme - yet common - cases such as very short arrays of less than ~32 octets)?

I don't have a lot of experience in this area, but right now I see 2 disadvantages of converting a static array into a hash map by default:

  1. Not every array can be perfectly hashed - e.g. an array with non-unique values.
  2. Converting an array into a hash map can change expected memory usage. If this is done under the hood, a programmer may not expect it. I'm totally fine with having a built-in way to mark an array as "optimize with hash table"; it just needs to be explicit.

I'm also not sure if Clang/GCC already do that kind of optimization. I found no info about that.

If you think this should still be pursued - please create a new issue, and maybe people more experienced in optimizations than me will leave a comment on it.

@ArtemkaKun
Contributor

ArtemkaKun commented May 26, 2023

Btw. for non-static arrays of fixed-size items (e.g. integers, floats, all structs, fixed-length strings, etc.) there are also nice trivial optimizations possible, like limiting branching in a tight loop by halving the number of comparisons (it works for tiny arrays too). This should be a no-brainer.

Interesting concept, but unfortunately:

  • it can't be applied everywhere, because in V this implementation requires a mut array. If the array is not mut, a new clone of the array must be created, which will eat up all the gains from the optimized searching algorithm.
  • for me, this algorithm is 2 times slower in a development build (for a big 100,000-element array; small arrays are untested).

In the production build (Clang), the new algorithm is faster by about 17%, which can be handy. I will open a new issue about that.

If you want to review/retest my code, here it is -> https://pastebin.com/mvRLWNYs

╭─yuart@yuart in ~/Projects via V v0.3.4 took 10s
╰─λ v run test.v
true
SPENT     0.213 ms in in
true
SPENT     0.449 ms in small comp

╭─yuart@yuart in ~/Projects via V v0.3.4 took 304ms
╰─λ v run test.v
true
SPENT     0.212 ms in in
true
SPENT     0.453 ms in small comp

╭─yuart@yuart in ~/Projects via V v0.3.4 took 304ms
╰─λ v run test.v
true
SPENT     0.215 ms in in
true
SPENT     0.465 ms in small comp

╭─yuart@yuart in ~/Projects via V v0.3.4 took 307ms
╰─λ v -cc clang -prod run test.v
Note: building an optimized binary takes much longer. It shouldn't be used with `v run`.
Use `v run` without optimization, or build an optimized binary with -prod first, then run it separately.
true
SPENT     0.023 ms in in
true
SPENT     0.019 ms in small comp

╭─yuart@yuart in ~/Projects via V v0.3.4 took 4s
╰─λ v -cc clang -prod run test.v
Note: building an optimized binary takes much longer. It shouldn't be used with `v run`.
Use `v run` without optimization, or build an optimized binary with -prod first, then run it separately.
true
SPENT     0.023 ms in in
true
SPENT     0.018 ms in small comp

╭─yuart@yuart in ~/Projects via V v0.3.4 took 4s
╰─λ v -cc clang -prod run test.v
Note: building an optimized binary takes much longer. It shouldn't be used with `v run`.
Use `v run` without optimization, or build an optimized binary with -prod first, then run it separately.
true
SPENT     0.023 ms in in
true
SPENT     0.019 ms in small comp

@ArtemkaKun
Contributor

Created issue -> #18283

@dumblob
Contributor

dumblob commented May 28, 2023

I don't have a lot of experience in this area, but right now I see 2 disadvantages of converting a static array into a hash map by default:

  1. Not every array can be perfectly hashed - e.g. an array with non-unique values.

I see. Thanks for bringing this up. This is actually a misnomer, as e.g. gperf does not construct a strictly perfect hash. gperf handles these cases gracefully - the hash is (or can be) a function of the value as well as of the index/order. In other words, this is not an issue.

  2. Converting an array into a hash map can change expected memory usage. If this is done under the hood, a programmer may not expect it. I'm totally fine with having a built-in way to mark an array as "optimize with hash table"; it just needs to be explicit.

Yes and no 😉. Memory usage is already totally different from normal mutable/immutable arrays (the difference can easily be in the higher tens of percent, especially for small arrays). Shall we mark this somewhere too? If not, then why not allow the memory usage to again be closer to that of non-static arrays?

Btw. "optimize with hash table" would be overly misleading IMHO (it even sounds a little like an oxymoron). Beware that e.g. gperf generates code - especially for tiny arrays - which is equivalent to C indexing assembly-wise (i.e. just one add to the array base address). Something like "optimize for existence checks" would be much closer to what it does.

What I did not check is the iteration performance over gperf-generated arrays. But I have my reasons to believe the difference would not be significant if any at all.

I'm also not sure if Clang/GCC already do that kind of optimization. I found no info about that.

AFAIK they do not and there seem to be no plans to do so. It is too high-level for them IMHO.

If you think this should still be pursued - please create a new issue, and maybe people more experienced in optimizations than me will leave a comment on it.

I think so, but my situation does not allow me to devote much time to this. Could you at least document that it is a known pitfall and subject to change before V 1.0?
