BUG: Incompatible dtype warning when assigning boolean series with logical indexer #57338

Open
3 tasks done
m0nzderr opened this issue Feb 10, 2024 · 3 comments · May be fixed by #60127
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves PDEP6-related related to PDEP6 (not upcasting during setitem-like Series operations)
@m0nzderr

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

data1 = pd.Series([True, True, True], dtype=bool)
data2 = pd.Series([False, False, False], dtype=bool)
condition = pd.Series([False, True, False], dtype=bool)

data1[condition] = data2[condition] # > FutureWarning: Setting an item of incompatible dtype...

Issue Description

The assignment data1[condition] = data2[condition] emits a warning that claims the dtypes are incompatible:

FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in
a future version of pandas. Value '[False]' has dtype incompatible with bool, please
explicitly cast to a compatible dtype first. data1[condition] = data2[condition]

This is clearly not true, since both series have dtype bool.
This bug is somewhat related to the one reported in #56600, although the use case is different.
Interestingly, the problem disappears with another dtype such as int (the code below works as expected, with no warnings):

data1 = pd.Series([1, 2, 3], dtype=int)
data2 = pd.Series([4, 5, 6], dtype=int)
condition = pd.Series([False, True, False], dtype=bool)

data1[condition] = data2[condition] 

Note: Reproduced on pandas==2.2.0 and 3.0.0.dev0+292.g70f367194a
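
To compare the bool and int cases programmatically, the warning can be recorded instead of printed. This is a minimal sketch: the helper name `incompatible_dtype_warnings` is hypothetical, and the `TypeError` branch is an assumption about pandas versions that may turn the deprecation into an error.

```python
import warnings

import pandas as pd


def incompatible_dtype_warnings(target, source, mask):
    """Run target[mask] = source[mask] on a copy and collect dtype complaints."""
    target = target.copy()  # leave the caller's series untouched
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            target[mask] = source[mask]
        except TypeError as exc:
            # Assumption: a version that enforces the deprecation raises instead.
            return [str(exc)]
    return [str(w.message) for w in caught if "incompatible dtype" in str(w.message)]


condition = pd.Series([False, True, False], dtype=bool)

bool_messages = incompatible_dtype_warnings(
    pd.Series([True, True, True], dtype=bool),
    pd.Series([False, False, False], dtype=bool),
    condition,
)
int_messages = incompatible_dtype_warnings(
    pd.Series([1, 2, 3], dtype=int),
    pd.Series([4, 5, 6], dtype=int),
    condition,
)

print("bool:", bool_messages)
print("int: ", int_messages)
```

On the versions named in the note above, the bool list should contain the FutureWarning text while the int list stays empty; other versions may differ.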

Expected Behavior

There should be no warning.

Installed Versions

INSTALLED VERSIONS
------------------
commit : 70f3671
python : 3.10.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.133.1-microsoft-standard-WSL2
Version : #1 SMP Thu Oct 5 21:02:42 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 3.0.0.dev0+292.g70f367194a
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.8.2
setuptools : 59.6.0
pip : 24.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.21.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.12.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.4
qtpy : None
pyqt5 : None

@m0nzderr m0nzderr added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 10, 2024
@phofl
Member

phofl commented Feb 10, 2024

This actually upcasts to object, which means that the warning is kind of correct, but this should definitely work and continue to work.

I suspect that we do an align under the hood that causes this, cc @MarcoGorelli
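
The effect of such an alignment can be illustrated with `Series.reindex`, used here only as a stand-in for whatever pandas does internally (that substitution is an assumption, not something confirmed in this thread). Aligning the one-row selection back to the full index introduces missing labels, and a NumPy bool column cannot hold NaN, so it falls back to object; an int column becomes float64 instead.

```python
import pandas as pd

condition = pd.Series([False, True, False], dtype=bool)

bools = pd.Series([False, False, False], dtype=bool)
ints = pd.Series([4, 5, 6], dtype=int)

# Select one row (label 1), then align it back to the full index 0..2.
# Labels 0 and 2 are missing from the selection and are filled with NaN.
aligned_bools = bools[condition].reindex(bools.index)
aligned_ints = ints[condition].reindex(ints.index)

print(aligned_bools.dtype)  # object: bool has no NaN representation
print(aligned_ints.dtype)   # float64: ints are upcast to hold NaN
```

This would be consistent with the difference between bool and int reported above: the float value at the selected position can be cast back to int without loss, while an object array is not treated as bool-compatible.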

@phofl phofl added Indexing Related to indexing on series/frames, not to indexes themselves PDEP6-related related to PDEP6 (not upcasting during setitem-like Series operations) and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 10, 2024
@phofl phofl added this to the 3.0 milestone Feb 10, 2024
@m0nzderr
Author

@phofl I'm not familiar with the implementation and have no idea about the reasons for the upcast, but, intuitively, I see no reason for that to happen. Both series already have the same dtype, so any conversion would be unnecessary. It is also very counter-intuitive to have different behaviors among primitive types (i.e., no upcast in case of ints or floats, but an upcast in case of bools...).
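
One way to check that no conversion is needed is to strip the index from the right-hand side, so that the values are assigned positionally. This is a sketch of a possible workaround that nobody in this thread proposed; it assumes that a plain NumPy array on the right-hand side is not aligned by label.

```python
import pandas as pd

data1 = pd.Series([True, True, True], dtype=bool)
data2 = pd.Series([False, False, False], dtype=bool)
condition = pd.Series([False, True, False], dtype=bool)

# to_numpy() drops the index, leaving a plain bool array of the selected values.
data1[condition] = data2[condition].to_numpy()

print(data1.dtype)
print(data1.tolist())
```

The trade-off is that positional assignment silently ignores index labels, so it is only equivalent when both series share the same index, as they do here.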

@SpoopyPillow
Contributor

I think what's happening is that when we mask the series, we eventually try to set the items at the index array([1]) to array([False]) (which is the second element of data2). However, this array has object dtype. When we try to cast it to bool, pandas raises a LossySetitemError, which may be the problem.
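
The situation described here can be reproduced in isolation with an object-dtype array that holds only Python booleans. The sketch below uses the public `pandas.api.types.infer_dtype` to show that the contents are still recognisably boolean and that the cast to bool round-trips without loss. It does not call the internal cast that raises `LossySetitemError`, and it is not taken from the linked pull request.

```python
import numpy as np
from pandas.api.types import infer_dtype

# The value described above: a single False stored in an object-dtype array.
value = np.array([False], dtype=object)

inferred = infer_dtype(value)  # inspects the elements, not the array dtype
as_bool = value.astype(bool)   # explicit cast to a real bool array

# The cast is lossless if converting back gives the original elements.
roundtrip_ok = bool((as_bool.astype(object) == value).all())

print(value.dtype, inferred, as_bool.dtype, roundtrip_ok)
```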

@SpoopyPillow SpoopyPillow linked a pull request Oct 30, 2024 that will close this issue
5 tasks
3 participants