Modin version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest released version of Modin.
I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)
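Reproducible Example

import modin.pandas as mpd

data = {
    "ix": [1, 2, 1, 1, 2, 2],
    "iy": [1, 2, 2, 1, 2, 1],
    "col": ["b", "b", "a", "a", "a", "a"],
    "col_b": ["x", "y", "x", "y", "x", "y"],
    "foo": [7, 1, 0, 1, 2, 2],
    "bar": [9, 4, 0, 2, 0, 0],
}

# Pivot so that the columns become a two-level MultiIndex: ('foo', <col value>)
pivot_modin = mpd.DataFrame(data).pivot_table(
    values=['foo'],
    index=['ix'],
    columns=['col'],
    aggfunc='min',
    margins=False,
    observed=True,
)

# Select one column by its full tuple label, passed inside a list
pivot_modin.loc[:, [('foo', 'b')]]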
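Issue Description

Selecting a column of the pivot table with .loc and a list containing its full tuple label raises a KeyError in Modin: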
KeyError                                  Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('foo', 'b')

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/multi.py:3499, in MultiIndex.get_locs(self, seq)
   3498 try:
-> 3499     lvl_indexer = self._get_level_indexer(k, level=i, indexer=indexer)
   3500 except (InvalidIndexError, TypeError, KeyError) as err:
   3501     # InvalidIndexError e.g. non-hashable, fall back to treating
   3502     # this as a sequence of labels
   3503     # KeyError it can be ambiguous if this is a label or sequence
   3504     # of labels
   3505     # github.com/pandas-dev/pandas/issues/39424#issuecomment-871626708

File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/multi.py:3391, in MultiIndex._get_level_indexer(self, key, level, indexer)
   3390 else:
-> 3391     idx = self._get_loc_single_level_index(level_index, key)
   3393     if level > 0 or self._lexsort_depth == 0:
   3394         # Desired level is not sorted

File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/multi.py:2980, in MultiIndex._get_loc_single_level_index(self, level_index, key)
   2979 else:
-> 2980     return level_index.get_loc(key)

File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3811     raise InvalidIndexError(key)
-> 3812 raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     # InvalidIndexError. Otherwise we fall through and re-raise
   3816     # the TypeError.

KeyError: ('foo', 'b')

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'b'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[17], line 1
----> 1 pivot_modin.loc[:, [('foo', 'b')]]

File /opt/conda/lib/python3.10/site-packages/modin/logging/logger_decorator.py:144, in enable_logging.<locals>.decorator.<locals>.run_and_log(*args, **kwargs)
    129 """
    130 Compute function with logging if Modin logging is enabled.
    131
   (...)
    141 Any
    142 """
    143 if LogMode.get() == "disable":
--> 144     return obj(*args, **kwargs)
    146 logger = get_logger()
    147 logger.log(log_level, start_line)

File /opt/conda/lib/python3.10/site-packages/modin/pandas/indexing.py:666, in _LocIndexer.__getitem__(self, key)
    664 except KeyError:
    665     pass
--> 666 return self._helper_for__getitem__(
    667     key, *self._parse_row_and_column_locators(key)
    668 )

File /opt/conda/lib/python3.10/site-packages/modin/logging/logger_decorator.py:144, in enable_logging.<locals>.decorator.<locals>.run_and_log(*args, **kwargs)
    129 """
    130 Compute function with logging if Modin logging is enabled.
    131
   (...)
    141 Any
    142 """
    143 if LogMode.get() == "disable":
--> 144     return obj(*args, **kwargs)
    146 logger = get_logger()
    147 logger.log(log_level, start_line)

File /opt/conda/lib/python3.10/site-packages/modin/pandas/indexing.py:712, in _LocIndexer._helper_for__getitem__(self, key, row_loc, col_loc, ndim)
    709 if isinstance(row_loc, Series) and is_boolean_array(row_loc):
    710     return self._handle_boolean_masking(row_loc, col_loc)
--> 712 qc_view = self.qc.take_2d_labels(row_loc, col_loc)
    713 result = self._get_pandas_object_from_qc_view(
    714     qc_view,
    715     row_multiindex_full_lookup,
   (...)
    719     ndim,
    720 )
    722 if isinstance(result, Series):

File /opt/conda/lib/python3.10/site-packages/modin/core/storage_formats/pandas/query_compiler_caster.py:157, in apply_argument_cast.<locals>.cast_args(*args, **kwargs)
    155 kwargs = cast_nested_args_to_current_qc_type(kwargs, current_qc)
    156 args = cast_nested_args_to_current_qc_type(args, current_qc)
--> 157 return obj(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/modin/logging/logger_decorator.py:144, in enable_logging.<locals>.decorator.<locals>.run_and_log(*args, **kwargs)
    129 """
    130 Compute function with logging if Modin logging is enabled.
    131
   (...)
    141 Any
    142 """
    143 if LogMode.get() == "disable":
--> 144     return obj(*args, **kwargs)
    146 logger = get_logger()
    147 logger.log(log_level, start_line)

File /opt/conda/lib/python3.10/site-packages/modin/core/storage_formats/base/query_compiler.py:4217, in BaseQueryCompiler.take_2d_labels(self, index, columns)
   4197 def take_2d_labels(
   4198     self,
   4199     index,
   4200     columns,
   4201 ):
   4202     """
   4203     Take the given labels.
   4204
   (...)
   4215     Subset of this QueryCompiler.
   4216     """
-> 4217     row_lookup, col_lookup = self.get_positions_from_labels(index, columns)
   4218     if isinstance(row_lookup, slice):
   4219         ErrorMessage.catch_bugs_and_request_email(
   4220             failure_condition=row_lookup != slice(None),
   4221             extra_log=f"Only None-slices are acceptable as a slice argument in masking, got: {row_lookup}",
   4222         )

File /opt/conda/lib/python3.10/site-packages/modin/core/storage_formats/pandas/query_compiler_caster.py:157, in apply_argument_cast.<locals>.cast_args(*args, **kwargs)
    155 kwargs = cast_nested_args_to_current_qc_type(kwargs, current_qc)
    156 args = cast_nested_args_to_current_qc_type(args, current_qc)
--> 157 return obj(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/modin/logging/logger_decorator.py:144, in enable_logging.<locals>.decorator.<locals>.run_and_log(*args, **kwargs)
    129 """
    130 Compute function with logging if Modin logging is enabled.
    131
   (...)
    141 Any
    142 """
    143 if LogMode.get() == "disable":
--> 144     return obj(*args, **kwargs)
    146 logger = get_logger()
    147 logger.log(log_level, start_line)

File /opt/conda/lib/python3.10/site-packages/modin/core/storage_formats/base/query_compiler.py:4314, in BaseQueryCompiler.get_positions_from_labels(self, row_loc, col_loc)
   4312     axis_lookup = self.get_axis(axis).get_indexer_for(axis_loc)
   4313 else:
-> 4314     axis_lookup = self.get_axis(axis).get_locs(axis_loc)
   4315 elif is_boolean_array(axis_loc):
   4316     axis_lookup = boolean_mask_to_numeric(axis_loc)

File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/multi.py:3513, in MultiIndex.get_locs(self, seq)
   3509 raise err
   3510 # GH 39424: Ignore not founds
   3511 # GH 42351: No longer ignore not founds & enforced in 2.0
   3512 # TODO: how to handle IntervalIndex level? (no test cases)
-> 3513 item_indexer = self._get_level_indexer(
   3514     x, level=i, indexer=indexer
   3515 )
   3516 if lvl_indexer is None:
   3517     lvl_indexer = _to_bool_indexer(item_indexer)

File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/multi.py:3391, in MultiIndex._get_level_indexer(self, key, level, indexer)
   3388     return slice(i, j, step)
   3390 else:
-> 3391     idx = self._get_loc_single_level_index(level_index, key)
   3393     if level > 0 or self._lexsort_depth == 0:
   3394         # Desired level is not sorted
   3395         if isinstance(idx, slice):
   3396             # test_get_loc_partial_timestamp_multiindex

File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/multi.py:2980, in MultiIndex._get_loc_single_level_index(self, level_index, key)
   2978     return -1
   2979 else:
-> 2980     return level_index.get_loc(key)

File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807 if isinstance(casted_key, slice) or (
   3808     isinstance(casted_key, abc.Iterable)
   3809     and any(isinstance(x, slice) for x in casted_key)
   3810 ):
   3811     raise InvalidIndexError(key)
-> 3812 raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     # InvalidIndexError. Otherwise we fall through and re-raise
   3816     # the TypeError.
   3817     self._check_indexing_error(key)

KeyError: 'b'
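The last frames of the traceback suggest a possible reading of the failure: the list of tuple labels reaches MultiIndex.get_locs, which expects one selector per level rather than a list of full labels. The following is a minimal sketch of that reading using plain pandas only. The hand-built cols index is an assumption standing in for the pivot table's columns, and the sketch does not confirm which value Modin actually passes.

```python
import pandas as pd

# Hypothetical stand-in for the pivot table's columns in the report.
cols = pd.MultiIndex.from_tuples(
    [("foo", "a"), ("foo", "b")], names=[None, "col"]
)

# get_locs takes one selector per level, so a list holding a single tuple is
# read as the level-0 selector; "foo" is found there, "b" is not.
# pandas 2.x raises here; older versions ignored labels that were not found.
try:
    cols.get_locs([("foo", "b")])
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)

# One selector per level: "foo" for level 0, ["b"] for level 1.
per_level = cols.get_locs(["foo", ["b"]])

# Full-label lookup, the call on the other branch shown in the traceback.
full_label = cols.get_indexer_for([("foo", "b")])

print(list(per_level), list(full_label))
```

If this reading is right, the lookup would need to treat each tuple in the list as a complete label, which is what get_indexer_for does in the sketch.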
Expected Behavior

Modin should match pandas, which returns the selected column:
ix
1    7
2    1
Name: (foo, b), dtype: int64
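For comparison, here is the same pivot and selection as a sketch in plain pandas, which is the behavior the report expects Modin to reproduce. Only the import differs from the reproducible example. The names pivot_pandas and selected are introduced here for illustration.

```python
import pandas as pd

data = {
    "ix": [1, 2, 1, 1, 2, 2],
    "iy": [1, 2, 2, 1, 2, 1],
    "col": ["b", "b", "a", "a", "a", "a"],
    "col_b": ["x", "y", "x", "y", "x", "y"],
    "foo": [7, 1, 0, 1, 2, 2],
    "bar": [9, 4, 0, 2, 0, 0],
}

# Same pivot as in the report, built with pandas instead of Modin.
pivot_pandas = pd.DataFrame(data).pivot_table(
    values=["foo"],
    index=["ix"],
    columns=["col"],
    aggfunc="min",
    margins=False,
    observed=True,
)

# A list key keeps the result two-dimensional with a single column.
selected = pivot_pandas.loc[:, [("foo", "b")]]

# Print that one column as a Series.
print(selected[("foo", "b")])
```

The selected column holds the minimum of foo for col == "b" within each ix group, that is 7 for ix 1 and 1 for ix 2.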
Installed Versions

INSTALLED VERSIONS
commit : 3e951a6
python : 3.10.14
python-bits : 64
OS : Linux
OS-release : 5.15.154+
Version : #1 SMP Thu Jun 27 20:43:36 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : POSIX
LANG : C.UTF-8
LOCALE : None.None

Modin dependencies
modin : 0.32.0
ray : 2.24.0
dask : 2024.9.1
distributed : None

pandas dependencies
pandas : 2.2.3
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
pip : 24.0
Cython : 3.0.10
sphinx : None
IPython : 8.21.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.6.1
html5lib : 1.1
hypothesis : None
gcsfs : 2024.6.1
jinja2 : 3.1.4
lxml.etree : 5.3.0
matplotlib : 3.7.5
numba : 0.60.0
numexpr : 2.10.1
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 17.0.0
pyreadstat : None
pytest : 8.3.3
python-calamine : None
pyxlsb : None
s3fs : 2024.6.1
scipy : 1.14.1
sqlalchemy : 2.0.30
tables : 3.10.1
tabulate : 0.9.0
xarray : 2024.9.0
xlrd : None
xlsxwriter : None
zstandard : 0.23.0
tzdata : 2024.1
qtpy : None
pyqt5 : None
Spotted in the Narwhals CI. For now we'll just xfail and raise NotImplementedError for Modin in the pivot operation.