Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange datatype issues after PR #358 #385

Closed
qchempku2017 opened this issue Jul 10, 2023 · 3 comments · Fixed by #392
Closed

Strange datatype issues after PR #358 #385

qchempku2017 opened this issue Jul 10, 2023 · 3 comments · Fixed by #392
Assignees
Labels
bug Something isn't working

Comments

@qchempku2017
Copy link
Collaborator

qchempku2017 commented Jul 10, 2023

Weird datatype issues associated with smol.util.container classes appear after PR #358 .

Current Behavior

(1) When trying to test test_cofe/test_cluster_utils.py, the following issues pops up on my local computer (but not on CI):

IntArrayContainer = <class 'smol.utils.cluster.container.IntArray1DContainer'>
dim = 1, rng = Generator(PCG64) at 0x25C6BA67660
​
    @pytest.mark.parametrize(
        "IntArrayContainer, dim", [(IntArray1DContainer, 1), (IntArray2DContainer, 2)]
    )
    def test_int_container(IntArrayContainer, dim, rng):
        arrays = tuple(
            rng.integers(1, 5, size=dim * (rng.integers(1, 5),)) for _ in range(10)
        )
>       container = IntArrayContainer(arrays)
​
test_cluster_utils.py:19: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
smol\utils\cluster\container.pyx:230: in smol.utils.cluster.container.IntArray1DContainer.__cinit__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
​
>   ???
E   ValueError: Buffer dtype mismatch, expected 'long' but got 'long long'
​
smol\utils\cluster\container.pyx:251: ValueError

The tests were performed on a 64bit Windows 11 platform, with numpy=1.24.4.

The datatype generated by my random number generator was "long long" (int64, maybe?) rather than "long"(probably int32?). This could require an explicit specification in either the container class or the test to make the containers always store one of the types.

(2) When trying to initialize ClusterExpansion with a coef array that contains int, the following error pops up:

    @cached_property
    def eci(self):
        """Get the ECI for the cluster expansion.
    
        This just divides coefficients by the corresponding multiplicities.
        External terms are dropped since their fitted coefficients do not
        represent ECI.
        """
        num_ext_terms = len(self._subspace.external_terms)  # check for extra terms
        coefs = self.coefs[:-num_ext_terms] if num_ext_terms else self.coefs[:]
        eci = coefs.copy()
>       eci /= self._subspace.function_total_multiplicities
E       numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'divide' output from dtype('float64') to dtype('int32') with casting rule 'same_kind'
..\venv\lib\site-packages\smol\cofe\expansion.py:182: UFuncTypeError

This can be temporarily solved by force transforming coefs array to float type, or do as suggested here:
Discussion on stackoverflow

@qchempku2017 qchempku2017 added the bug Something isn't working label Jul 10, 2023
@lbluque
Copy link
Collaborator

lbluque commented Jul 11, 2023

  1. For the first issue, can you look into the actual data type being generated in the arrays variable and see if you force it to be of type int by explicitly adding dtype=int or using np.int_ (for more info: https://numpy.org/doc/stable/user/basics.types.html)

Seems like np.int64 is long long on windows and long on linux.

@lbluque
Copy link
Collaborator

lbluque commented Jul 11, 2023

  1. For the second issue: You can go ahead and implement the stack-overflow solution you linked.

However, why do you have/want coefficients or eci that are integers?

@qchempku2017
Copy link
Collaborator Author

  1. For the first issue, can you look into the actual data type being generated in the arrays variable and see if you force it to be of type int by explicitly adding dtype=int or using np.int_ (for more info: https://numpy.org/doc/stable/user/basics.types.html)

Seems like np.int64 is long long on windows and long on linux.

array.astype(np.int_) works. np.int_ is int32 on windows, as required by container. Simply enforcing dtype(int) also seems to work. I'll use np.int_.

qchempku2017 added a commit to qchempku2017/smol that referenced this issue Jul 21, 2023
lbluque added a commit that referenced this issue Jul 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants