-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove other derived type from hashmap #843
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @chuckyvt for this proposition. I already thought about this complexity (see for example here). But I think that your solution is more elegant.
A question is: why was the DT other_type
considered as needed in this implementation, while it seems to add only one level of abstraction? If there are no major reasons, I am in favour of implementing your PR.
Co-authored-by: Jeremie Vandenplas <[email protected]>
Co-authored-by: Jeremie Vandenplas <[email protected]>
Co-authored-by: Jeremie Vandenplas <[email protected]>
Remove reference of other_type that were missed in the initial commits.
…into remove_other_type
In progress check to understand in-line allocation of other.
Tweak to see why CI test fails for this.
Test to see if stack size increase addresses CI failure.
Add double hyphen for GCC nomenclature.
Bug fix on linker line
Revert example/CMakesLists.txt back to original.
Bugfix
Test to see if updating to chaining hashmaps fixes the single CI failure.
Update to remove remaining references to the 'other_type' derived type.
Remove reference to other_type
Remove 'other_type' reference.
Remove reference to 'other_type'.
Try allocate(new_ent % other, source = other) one more time.
Another try
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @chuckyvt . LGTM, pending some very minor comments.
Out of curiosity, did you do some benchmarks between the two versions?
Co-authored-by: Jeremie Vandenplas <[email protected]>
Co-authored-by: Jeremie Vandenplas <[email protected]>
Reverting back to inline allocation based on PR discussion. Both approaches seemed to work fine.
I believe all requested changes in. I did do some prelim benchmarking. It appeared that mapping data may be slightly faster with this approach. (which makes sense to me, it takes out an interface). I haven't been able to compare getting data. I will work to pull that together and upload that comparison. |
First of all, should note this is on Windows, with Intel IFX 2024.1 compiler in CMake Release mode. I went through and ran the updated open and chaining test programs with this PR. (ie the _pr files). I also went back to the current version tests and made some updates to make it an equivalent comparison. (Removed the redundant dummy value and associated allocation, added a get(other,data) command so that on the get tests the final step is with an unlimited polymorphic value). Long story short, it's difficult to make any judgements with any confidence. Comparing the files, I think I could convince myself that the PR has slightly better performance? But it's also possible it's just random. Without a Python 'timeit' type function that will repeat the commands many times, it's difficult to gauge. test_open_maps_pr.txt |
Remove outdated comment.
Also, I have tried everything I can think of to get the one Windows CI 32 Bit GFortran build to run reliably, but no luck. It fails on the set_other_data example, which is a relatively simple case. |
Thank you @chuckyvt . These "benchmarks" ar eenough for me. It confirms that your changes do not reduce the performances. Thank you for running them. |
@chuckyvt I tested several things. It seems it is related with the allocate statement of the other data. I could make things only worse. Could it be a compiler bug? |
@jvdp1 @chuckyvt the use of Perhaps to increase performance when using intrinsic types, imho the best approach would be to template the hashmap derived type: type, abstract, extends(hashmap_type) :: chaining_hashmap_type
type, extends(chaining_hashmap_type) :: chaining_hashmap_type_generic
type, extends(chaining_hashmap_type) :: chaining_hashmap_type_int32
type, extends(chaining_hashmap_type) :: chaining_hashmap_type_int64
!.... but I understand that would be a significant deviation from the current PR. |
Co-authored-by: Jeremie Vandenplas <[email protected]>
Just for clarity, this PR only affects the 'other' or value part of the hashmaps, there is no change to the code associated with the key interface and hashing. You are correct, this would be incompatible with existing "other_type" uses. This is covered in the description of the PR. Perhaps there is a way to publish a notice and/or simple transition guide along with this PR? Finally, I've also thought about extending hashmaps to specific types as you mentioned. That is out of scope of this PR but there is nothing in this PR preventing that. Performance would improve and would not require use of 'select type' on retrieval which would be nice. However, I believe we would still need a class(*) interface to store mixed data type maps, and also to store derived types. |
I've run out of idea to fix the 32-bit Windows CI failure, as I've tried everything I can think of. I do believe it's a compiler bug as I have never seen the issue on any of the other builds, and the set_other_data code is relatively simple. The only option I know of is to disable the set_other_data example, however it's worth noting that this is currently the only set_other_data test. Based on this experience I may look into a future PR to remove the 32-bit windows Gfortran compiler and replace with Windows Intel IFX, which seems like much more relevant to the typical stdlib user. Outside of that, I believe the code is in a final state with no other updates outstanding. |
@chuckyvt I see that the mingW i686 setup uses gfortran-13.2.0. It's hard to guess without having the same setup in place, however I have non-trivial issues with gfortran>=13 with nested allocatable components and you may have hit the same compiler problem. Unfortunately that has to date no straightforward solution |
This is a proposed update to the hashmap routines to remove the 'other' derived type that is currently required to store values. I reviewed the PR history related to the hashmap routines to try to understand the reason for using the other derived type. I don't see a clear reason other than perhaps it was thought that directly using unlimited polymorphic type would require use of select type in the code? I should note up front that this change is not backwards compatible with existing code, as the get_other_data function will now require as class(*), allocatable data type in the other field, which is discussed a bit more below. But I do believe it simpler and a better long-term approach.
For an example. Currently to set a value, and then retrieve it, the code looks like:
This will simply too
As mentioned above, this change is not backwards compatible with existing code that uses the other type. The map_entry part will function correctly, however the get_other_data routine will require a class(*), allocatable type data. I believe the hashmaps are still listed as experimental, and thus subject to change, so would like to think this isn't too detrimental.
I am submitting this in draft form, as the specification still need updating.
Note, would like to give credit to @LKedward, as his fhash package was used as a reference in developing this PR.