Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove other derived type from hashmap #843

Open
wants to merge 49 commits into
base: master
Choose a base branch
from

Conversation

chuckyvt
Copy link
Contributor

@chuckyvt chuckyvt commented Jul 2, 2024

This is a proposed update to the hashmap routines to remove the 'other' derived type that is currently required to store values. I reviewed the PR history related to the hashmap routines to try to understand the reason for using the other derived type. I don't see a clear reason other than perhaps it was thought that directly using unlimited polymorphic type would require use of select type in the code? I should note up front that this change is not backwards compatible with existing code, as the get_other_data function will now require as class(*), allocatable data type in the other field, which is discussed a bit more below. But I do believe it simpler and a better long-term approach.

For an example. Currently to set a value, and then retrieve it, the code looks like:

type(other_type)      :: other
class(*), allocatable  :: data
type(chaining_hashmap_type) :: map

call set(other, 4.0)
call map%map_entry(key, other)

call map%get_other_data(key, other)
call get(other, data)

select type (data)
    type is (real)
        print *, 'Value = ', data
end select

This will simply too

class(*), allocatable  :: data
type(chaining_hashmap_type) :: map

call map%map_entry(key, 4.0)

call map%get_other_data(key, data)

select type (data)
    type is (real)
        print *, 'Value = ', data
end select

As mentioned above, this change is not backwards compatible with existing code that uses the other type. The map_entry part will function correctly, however the get_other_data routine will require a class(*), allocatable type data. I believe the hashmaps are still listed as experimental, and thus subject to change, so would like to think this isn't too detrimental.

I am submitting this in draft form, as the specification still need updating.

Note, would like to give credit to @LKedward, as his fhash package was used as a reference in developing this PR.

Update to hashmap routines to remove 'other' data derived wrapper type.
Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @chuckyvt for this proposition. I already thought about this complexity (see for example here). But I think that your solution is more elegant.

A question is: why was the DT other_type considered as needed in this implementation, while it seems to add only one level of abstraction? If there are no major reasons, I am in favour of implementing your PR.

src/stdlib_hashmap_chaining.f90 Show resolved Hide resolved
src/stdlib_hashmap_chaining.f90 Show resolved Hide resolved
src/stdlib_hashmap_chaining.f90 Show resolved Hide resolved
src/stdlib_hashmap_chaining.f90 Outdated Show resolved Hide resolved
src/stdlib_hashmaps.f90 Outdated Show resolved Hide resolved
src/stdlib_hashmap_open.f90 Outdated Show resolved Hide resolved
@jvdp1
Copy link
Member

jvdp1 commented Jul 2, 2024

@chuckyvt The other_type was also questionned during the implementation of the hashmap modules: see here, here, here, here, here, here, here, here, here, here, here.

I probably missed some coments on this topic. But it seems that the main concern regarding other_type vs class(*) is the performance.

chuckyvt and others added 19 commits July 2, 2024 22:30
Co-authored-by: Jeremie Vandenplas <[email protected]>
Remove reference of other_type that were missed in the initial commits.
In progress check to understand in-line allocation of other.
Tweak to see why CI test fails for this.
Test to see if stack size increase addresses CI failure.
Add double hyphen for GCC nomenclature.
Bug fix on linker line
Revert example/CMakesLists.txt back to original.
Test to see if updating to chaining hashmaps fixes the single CI failure.
Update to remove remaining references to the 'other_type' derived type.
Remove reference to other_type
Remove 'other_type' reference.
Remove reference to 'other_type'.
@chuckyvt chuckyvt marked this pull request as ready for review July 10, 2024 12:30
@chuckyvt chuckyvt marked this pull request as draft July 10, 2024 12:39
Try allocate(new_ent % other, source = other) one more time.
typo fix
@jvdp1 jvdp1 marked this pull request as ready for review July 10, 2024 18:32
@jvdp1 jvdp1 marked this pull request as draft July 10, 2024 18:33
Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @chuckyvt . LGTM, pending some very minor comments.
Out of curiosity, did you do some benchmarks between the two versions?

doc/specs/stdlib_hashmaps.md Outdated Show resolved Hide resolved
doc/specs/stdlib_hashmaps.md Outdated Show resolved Hide resolved
doc/specs/stdlib_hashmaps.md Outdated Show resolved Hide resolved
doc/specs/stdlib_hashmaps.md Outdated Show resolved Hide resolved
example/hashmaps/CMakeLists.txt Outdated Show resolved Hide resolved
example/hashmaps/example_hashmaps_get_other_data.f90 Outdated Show resolved Hide resolved
example/hashmaps/example_hashmaps_map_entry.f90 Outdated Show resolved Hide resolved
example/hashmaps/example_hashmaps_remove.f90 Outdated Show resolved Hide resolved
example/hashmaps/example_hashmaps_set_other_data.f90 Outdated Show resolved Hide resolved
chuckyvt and others added 3 commits July 11, 2024 08:52
Reverting back to inline allocation based on PR discussion.  Both approaches seemed to work fine.
@chuckyvt chuckyvt marked this pull request as ready for review July 11, 2024 13:05
@chuckyvt
Copy link
Contributor Author

I believe all requested changes in. I did do some prelim benchmarking. It appeared that mapping data may be slightly faster with this approach. (which makes sense to me, it takes out an interface). I haven't been able to compare getting data. I will work to pull that together and upload that comparison.

@chuckyvt
Copy link
Contributor Author

First of all, should note this is on Windows, with Intel IFX 2024.1 compiler in CMake Release mode.

I went through and ran the updated open and chaining test programs with this PR. (ie the _pr files). I also went back to the current version tests and made some updates to make it an equivalent comparison. (Removed the redundant dummy value and associated allocation, added a get(other,data) command so that on the get tests the final step is with an unlimited polymorphic value).

Long story short, it's difficult to make any judgements with any confidence. Comparing the files, I think I could convince myself that the PR has slightly better performance? But it's also possible it's just random. Without a Python 'timeit' type function that will repeat the commands many times, it's difficult to gauge.

test_open_maps_pr.txt
test_open_maps_baseline.txt
test_chaining_maps_pr.txt
test_chaining_maps_baseline.txt

Remove outdated comment.
@chuckyvt
Copy link
Contributor Author

Also, I have tried everything I can think of to get the one Windows CI 32 Bit GFortran build to run reliably, but no luck. It fails on the set_other_data example, which is a relatively simple case.

@jvdp1
Copy link
Member

jvdp1 commented Jul 14, 2024

Comparing the files, I think I could convince myself that the PR has slightly better performance? But it's also possible it's just random

Thank you @chuckyvt . These "benchmarks" ar eenough for me. It confirms that your changes do not reduce the performances. Thank you for running them.

doc/specs/stdlib_hashmaps.md Outdated Show resolved Hide resolved
@jvdp1
Copy link
Member

jvdp1 commented Jul 14, 2024

Also, I have tried everything I can think of to get the one Windows CI 32 Bit GFortran build to run reliably, but no luck. It fails on the set_other_data example, which is a relatively simple case.

@chuckyvt I tested several things. It seems it is related with the allocate statement of the other data. I could make things only worse. Could it be a compiler bug?

@jalvesz @perazz Any ideas on this issue?

@perazz
Copy link
Contributor

perazz commented Jul 15, 2024

@jvdp1 @chuckyvt the use of class(*), allocatable is to enable hashing of any derived types beyond the standard intrinsics, so, I believe that removal of other_type would introduce a backward incompatible change? I found some details at the original discussion here: #611

Perhaps to increase performance when using intrinsic types, imho the best approach would be to template the hashmap derived type:

type, abstract, extends(hashmap_type) :: chaining_hashmap_type
type, extends(chaining_hashmap_type) :: chaining_hashmap_type_generic
type, extends(chaining_hashmap_type) :: chaining_hashmap_type_int32
type, extends(chaining_hashmap_type) :: chaining_hashmap_type_int64
!....

but I understand that would be a significant deviation from the current PR.

Co-authored-by: Jeremie Vandenplas <[email protected]>
@chuckyvt
Copy link
Contributor Author

@jvdp1 @chuckyvt the use of class(*), allocatable is to enable hashing of any derived types beyond the standard intrinsics, so, I believe that removal of other_type would introduce a backward incompatible change? I found some details at the original discussion here: #611

Perhaps to increase performance when using intrinsic types, imho the best approach would be to template the hashmap derived type:

type, abstract, extends(hashmap_type) :: chaining_hashmap_type
type, extends(chaining_hashmap_type) :: chaining_hashmap_type_generic
type, extends(chaining_hashmap_type) :: chaining_hashmap_type_int32
type, extends(chaining_hashmap_type) :: chaining_hashmap_type_int64
!....

but I understand that would be a significant deviation from the current PR.

Just for clarity, this PR only affects the 'other' or value part of the hashmaps, there is no change to the code associated with the key interface and hashing. You are correct, this would be incompatible with existing "other_type" uses. This is covered in the description of the PR. Perhaps there is a way to publish a notice and/or simple transition guide along with this PR?

Finally, I've also thought about extending hashmaps to specific types as you mentioned. That is out of scope of this PR but there is nothing in this PR preventing that. Performance would improve and would not require use of 'select type' on retrieval which would be nice. However, I believe we would still need a class(*) interface to store mixed data type maps, and also to store derived types.

Test to see if removing associate construct fixes the one CI failure.
Another try to fix the CI
Hopefully this fixes the CI failures seen on 32 bit Gfortran.
Revert back to inline allocation.
Another try to fix CI issue
@chuckyvt
Copy link
Contributor Author

I've run out of idea to fix the 32-bit Windows CI failure, as I've tried everything I can think of. I do believe it's a compiler bug as I have never seen the issue on any of the other builds, and the set_other_data code is relatively simple. The only option I know of is to disable the set_other_data example, however it's worth noting that this is currently the only set_other_data test.

Based on this experience I may look into a future PR to remove the 32-bit windows Gfortran compiler and replace with Windows Intel IFX, which seems like much more relevant to the typical stdlib user.

Outside of that, I believe the code is in a final state with no other updates outstanding.

@perazz
Copy link
Contributor

perazz commented Jul 17, 2024

@chuckyvt I see that the mingW i686 setup uses gfortran-13.2.0. It's hard to guess without having the same setup in place, however I have non-trivial issues with gfortran>=13 with nested allocatable components and you may have hit the same compiler problem. Unfortunately that has to date no straightforward solution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants