Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

checker: Support integer and voidptr key types for maps #7503

Merged
merged 24 commits into from
Dec 27, 2020

Conversation

ntrel
Copy link
Contributor

@ntrel ntrel commented Dec 23, 2020

Part of #6991.

Allow maps with integer, rune and voidptr keys & implement basic indexing operations for them (see map_test.v additions). I have implemented non-string map literals too but I'll submit that separately for easier review. for and map.clone() also work (and are in the int keys test) but I have a list of other things which need fixing so for now I have added a warning saying non-string keys are WIP.

Add map.has_string_keys field - for now this is true when the key size is bigger than a pointer. When false, keys are assumed to be bitwise comparable (the parser enforces these assumptions).
Change DenseArray.clone_key to a map method to access map.has_string_keys (in future it will probably need to access a function pointer field of map).
Change DenseArray.push to only make space, not clone key (cloning needs knowledge of the key type).
Fix map.clone to clone keys too.

Update: use function pointers for hash, equal, clone and free key operations.

@ntrel
Copy link
Contributor Author

ntrel commented Dec 23, 2020

Unrelated gitly error:

Errors while retrieving meta data for module markdown:
Skipping module "markdown", since https://vpm.vlang.io reported that "markdown" does not exist.

@ntrel ntrel requested a review from ka-weihe December 23, 2020 13:14
@ntrel ntrel marked this pull request as ready for review December 23, 2020 13:16
vlib/builtin/map.v Outdated Show resolved Hide resolved
key := *&string(pkey)
hash.wyhash_c(key.str, u64(key.len), 0)
} else {
hash.wyhash_c(pkey, u64(m.key_bytes), 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not use wyhash_c for integers. We have wyhash64 in the cheaders, which should be a lot faster.

Copy link
Contributor Author

@ntrel ntrel Dec 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would need to have more if branches for each value of key_bytes to use wyhash64. E.g. this pull supports byte, i16, int, i64 sizes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wyhash64(*&u64(pkey), 0) does not work unless key_bytes is 8.

Copy link
Member

@ka-weihe ka-weihe Dec 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wyhash64(*&u64(pkey), 0) does not work unless key_bytes is 8.

Why can't you do:
wyhash64(u64(*&u32(pkey)), u64(0))

if the hash-function is passed a function pointer, we would just have a hash-function for each integer type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*&u32(pkey)

That reads 4 bytes from pkey when it may point to only 1, 2 or maybe 8 bytes.

Copy link
Member

@ka-weihe ka-weihe Dec 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was only an example for u32 keys. You would have to cast differently for each key type.

@ka-weihe
Copy link
Member

ka-weihe commented Dec 23, 2020

Nice work. I'm just a bit worried that these changes will hurt performance quite a lot, but I guess it is okay for now - I don't know, not up to me. I will probably improve it in the new year. When this approach is expanded to even more key types, then it will be extremely slow 😢

@ntrel
Copy link
Contributor Author

ntrel commented Dec 24, 2020

Rebased.

@ntrel
Copy link
Contributor Author

ntrel commented Dec 24, 2020

I've now added a hash_fn map field - this depends on #7541 for bootstrapping (hence tests failing ATM). This is initialized in new_map, and non-string keys now use wyhash64 as requested.

// for now assume anything bigger than a pointer is a string
has_string_keys := key_bytes > sizeof(voidptr)
mut hash_fn := voidptr(0)
match key_bytes {
Copy link
Member

@ka-weihe ka-weihe Dec 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a lot better now, but this match-statement should be in cgen. This can also be done in another PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we also need function pointers for free, clone and equal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now added function pointers for those 3 operations.

// The growth-factor is roughly 1.125 `(x + (x >> 3))`
[inline]
fn (mut d DenseArray) push(key voidptr, value voidptr) int {
fn (mut d DenseArray) push() int {
Copy link
Member

@ka-weihe ka-weihe Dec 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you remove the parameters from this method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, you describe this in #7538, but it shouldn't be named push if it takes no arguments. I'm fine with it being fixed in another PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now renamed to expand.

@ka-weihe ka-weihe closed this Dec 25, 2020
@ka-weihe ka-weihe reopened this Dec 25, 2020
@ntrel
Copy link
Contributor Author

ntrel commented Dec 26, 2020

This pull now depends on #7578 for bootstrap.

@ka-weihe
Copy link
Member

ka-weihe commented Dec 26, 2020

Good job @ntrel. My only complaint now is that the match-statement is not in cgen, but I'm totally fine with it being done in another PR. I approve.

@medvednikov
Copy link
Member

@ka-weihe

regarding performance, would simply converting a key to a string be better?

for example m[2] => m['2']

@medvednikov medvednikov reopened this Dec 27, 2020
@ka-weihe
Copy link
Member

@ka-weihe

regarding performance, would simply converting a key to a string be better?

for example m[2] => m['2']

I don't think so. What if the integer is "2147483647", hashing that with the string hash-function would take so much longer compared to the integer hash-function. The same can be said for the equality function. I think what we are doing here is the fastest we can get without generics and that is probably why Go does the same 🙂

@medvednikov
Copy link
Member

I see, thanks.

@medvednikov
Copy link
Member

Conflict after merging the other map PR.

@medvednikov medvednikov reopened this Dec 27, 2020
This prevents 'conflicting types' errors with some C compilers (even
though the types are the same).
/tmp/v/test_session_247731369751/x.10116584592008963799.tmp.c: In function ‘new_map’:
/tmp/v/test_session_247731369751/x.10116584592008963799.tmp.c:7428:18: error: ‘_t123’ undeclared (first use in this function); did you mean ‘_t122’?
 map_free_string* _t123; /* if prepend */
                  ^~~~~
                  _t122
@ntrel
Copy link
Contributor Author

ntrel commented Dec 27, 2020

@medvednikov Tests green!

@Delta456
Copy link
Member

@ntrel Wonderful work!! 🎉 🥳

@medvednikov medvednikov merged commit e813583 into vlang:master Dec 27, 2020
@medvednikov
Copy link
Member

Great job, thanks!

@ntrel ntrel deleted the map-int branch December 27, 2020 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants