BUG: can't use list of tuples to select multiple columns when columns are a MultiIndex #7409

Open
3 tasks done
MarcoGorelli opened this issue Nov 13, 2024 · 1 comment
Labels: bug 🦗 (Something isn't working), Triage 🩹 (Issues that need triage)


Modin version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest released version of Modin.

  • I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)

Reproducible Example

import modin.pandas as mpd

data = {
    "ix": [1, 2, 1, 1, 2, 2],
    "iy": [1, 2, 2, 1, 2, 1],
    "col": ["b", "b", "a", "a", "a", "a"],
    "col_b": ["x", "y", "x", "y", "x", "y"],
    "foo": [7, 1, 0, 1, 2, 2],
    "bar": [9, 4, 0, 2, 0, 0],
}
pivot_modin = mpd.DataFrame(data).pivot_table(
    values=['foo'],
    index=['ix'],
    columns=['col'],
    aggfunc='min',
    margins=False,
    observed=True,
)
pivot_modin.loc[:, [('foo', 'b')]]

Issue Description

This raises:

KeyError                                  Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('foo', 'b')

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/multi.py:3499, in MultiIndex.get_locs(self, seq)
   3498 try:
-> 3499     lvl_indexer = self._get_level_indexer(k, level=i, indexer=indexer)
   3500 except (InvalidIndexError, TypeError, KeyError) as err:
   3501     # InvalidIndexError e.g. non-hashable, fall back to treating
   3502     #  this as a sequence of labels
   3503     # KeyError it can be ambiguous if this is a label or sequence
   3504     #  of labels
   3505     #  github.com/pandas-dev/pandas/issues/39424#issuecomment-871626708

File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/multi.py:3391, in MultiIndex._get_level_indexer(self, key, level, indexer)
   3390 else:
-> 3391     idx = self._get_loc_single_level_index(level_index, key)
   3393     if level > 0 or self._lexsort_depth == 0:
   3394         # Desired level is not sorted

File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/multi.py:2980, in MultiIndex._get_loc_single_level_index(self, level_index, key)
   2979 else:
-> 2980     return level_index.get_loc(key)

File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.

KeyError: ('foo', 'b')

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'b'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[17], line 1
----> 1 pivot_modin.loc[:, [('foo', 'b')]]

File /opt/conda/lib/python3.10/site-packages/modin/logging/logger_decorator.py:144, in enable_logging.<locals>.decorator.<locals>.run_and_log(*args, **kwargs)
    129 """
    130 Compute function with logging if Modin logging is enabled.
    131 
   (...)
    141 Any
    142 """
    143 if LogMode.get() == "disable":
--> 144     return obj(*args, **kwargs)
    146 logger = get_logger()
    147 logger.log(log_level, start_line)

File /opt/conda/lib/python3.10/site-packages/modin/pandas/indexing.py:666, in _LocIndexer.__getitem__(self, key)
    664     except KeyError:
    665         pass
--> 666 return self._helper_for__getitem__(
    667     key, *self._parse_row_and_column_locators(key)
    668 )

File /opt/conda/lib/python3.10/site-packages/modin/logging/logger_decorator.py:144, in enable_logging.<locals>.decorator.<locals>.run_and_log(*args, **kwargs)
    129 """
    130 Compute function with logging if Modin logging is enabled.
    131 
   (...)
    141 Any
    142 """
    143 if LogMode.get() == "disable":
--> 144     return obj(*args, **kwargs)
    146 logger = get_logger()
    147 logger.log(log_level, start_line)

File /opt/conda/lib/python3.10/site-packages/modin/pandas/indexing.py:712, in _LocIndexer._helper_for__getitem__(self, key, row_loc, col_loc, ndim)
    709 if isinstance(row_loc, Series) and is_boolean_array(row_loc):
    710     return self._handle_boolean_masking(row_loc, col_loc)
--> 712 qc_view = self.qc.take_2d_labels(row_loc, col_loc)
    713 result = self._get_pandas_object_from_qc_view(
    714     qc_view,
    715     row_multiindex_full_lookup,
   (...)
    719     ndim,
    720 )
    722 if isinstance(result, Series):

File /opt/conda/lib/python3.10/site-packages/modin/core/storage_formats/pandas/query_compiler_caster.py:157, in apply_argument_cast.<locals>.cast_args(*args, **kwargs)
    155     kwargs = cast_nested_args_to_current_qc_type(kwargs, current_qc)
    156     args = cast_nested_args_to_current_qc_type(args, current_qc)
--> 157 return obj(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/modin/logging/logger_decorator.py:144, in enable_logging.<locals>.decorator.<locals>.run_and_log(*args, **kwargs)
    129 """
    130 Compute function with logging if Modin logging is enabled.
    131 
   (...)
    141 Any
    142 """
    143 if LogMode.get() == "disable":
--> 144     return obj(*args, **kwargs)
    146 logger = get_logger()
    147 logger.log(log_level, start_line)

File /opt/conda/lib/python3.10/site-packages/modin/core/storage_formats/base/query_compiler.py:4217, in BaseQueryCompiler.take_2d_labels(self, index, columns)
   4197 def take_2d_labels(
   4198     self,
   4199     index,
   4200     columns,
   4201 ):
   4202     """
   4203     Take the given labels.
   4204 
   (...)
   4215         Subset of this QueryCompiler.
   4216     """
-> 4217     row_lookup, col_lookup = self.get_positions_from_labels(index, columns)
   4218     if isinstance(row_lookup, slice):
   4219         ErrorMessage.catch_bugs_and_request_email(
   4220             failure_condition=row_lookup != slice(None),
   4221             extra_log=f"Only None-slices are acceptable as a slice argument in masking, got: {row_lookup}",
   4222         )

File /opt/conda/lib/python3.10/site-packages/modin/core/storage_formats/pandas/query_compiler_caster.py:157, in apply_argument_cast.<locals>.cast_args(*args, **kwargs)
    155     kwargs = cast_nested_args_to_current_qc_type(kwargs, current_qc)
    156     args = cast_nested_args_to_current_qc_type(args, current_qc)
--> 157 return obj(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/modin/logging/logger_decorator.py:144, in enable_logging.<locals>.decorator.<locals>.run_and_log(*args, **kwargs)
    129 """
    130 Compute function with logging if Modin logging is enabled.
    131 
   (...)
    141 Any
    142 """
    143 if LogMode.get() == "disable":
--> 144     return obj(*args, **kwargs)
    146 logger = get_logger()
    147 logger.log(log_level, start_line)

File /opt/conda/lib/python3.10/site-packages/modin/core/storage_formats/base/query_compiler.py:4314, in BaseQueryCompiler.get_positions_from_labels(self, row_loc, col_loc)
   4312         axis_lookup = self.get_axis(axis).get_indexer_for(axis_loc)
   4313     else:
-> 4314         axis_lookup = self.get_axis(axis).get_locs(axis_loc)
   4315 elif is_boolean_array(axis_loc):
   4316     axis_lookup = boolean_mask_to_numeric(axis_loc)

File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/multi.py:3513, in MultiIndex.get_locs(self, seq)
   3509     raise err
   3510 # GH 39424: Ignore not founds
   3511 # GH 42351: No longer ignore not founds & enforced in 2.0
   3512 # TODO: how to handle IntervalIndex level? (no test cases)
-> 3513 item_indexer = self._get_level_indexer(
   3514     x, level=i, indexer=indexer
   3515 )
   3516 if lvl_indexer is None:
   3517     lvl_indexer = _to_bool_indexer(item_indexer)

File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/multi.py:3391, in MultiIndex._get_level_indexer(self, key, level, indexer)
   3388         return slice(i, j, step)
   3390 else:
-> 3391     idx = self._get_loc_single_level_index(level_index, key)
   3393     if level > 0 or self._lexsort_depth == 0:
   3394         # Desired level is not sorted
   3395         if isinstance(idx, slice):
   3396             # test_get_loc_partial_timestamp_multiindex

File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/multi.py:2980, in MultiIndex._get_loc_single_level_index(self, level_index, key)
   2978     return -1
   2979 else:
-> 2980     return level_index.get_loc(key)

File /opt/conda/lib/python3.10/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)

KeyError: 'b'
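
One reading of this traceback, checked here against plain pandas only (assuming pandas 2.x; this is not a confirmed diagnosis of Modin internals): the final frames show `get_positions_from_labels` passing the column locator to `MultiIndex.get_locs`. That method interprets its argument as one selector per level, so the tuple `('foo', 'b')` is taken as the selector for level 0 rather than as a full column label. `get_indexer_for`, which appears in the other branch of the same frame, treats each tuple as a complete label. The two-column index below is an illustrative stand-in for the pivot's columns.

```python
import pandas as pd

# Illustrative stand-in for the pivot table's column MultiIndex.
columns = pd.MultiIndex.from_tuples(
    [("foo", "a"), ("foo", "b")], names=[None, "col"]
)
key = [("foo", "b")]  # the column locator from the reproducer

# get_indexer_for treats each tuple as one complete column label.
positions = columns.get_indexer_for(key)
print(positions)

# get_locs expects one selector per level; this picks the same column.
per_level = columns.get_locs(["foo", "b"])
print(per_level)

# Passing the list of tuples to get_locs makes the tuple the selector for
# level 0, which is the call the traceback ends in.
try:
    result = columns.get_locs(key)
    print("get_locs returned", result)
except KeyError as err:
    print("get_locs raised KeyError:", err)
```

If this reading is right, the two nested `KeyError`s in the traceback line up with pandas first trying `('foo', 'b')` as a single level-0 label and then falling back to looking up `'foo'` and `'b'` individually in level 0.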

Expected Behavior

The same result as pandas:

ix
1    7
2    1
Name: (foo, b), dtype: int64
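
For comparison, here is a sketch of the same reproducer run against plain pandas (assuming pandas 2.x, as in the installed versions below). The variable names are illustrative. Note that in pandas the list-of-tuples key returns a one-column DataFrame, while a bare tuple key returns a Series like the one printed above.

```python
import pandas as pd

data = {
    "ix": [1, 2, 1, 1, 2, 2],
    "iy": [1, 2, 2, 1, 2, 1],
    "col": ["b", "b", "a", "a", "a", "a"],
    "col_b": ["x", "y", "x", "y", "x", "y"],
    "foo": [7, 1, 0, 1, 2, 2],
    "bar": [9, 4, 0, 2, 0, 0],
}
pivot_pandas = pd.DataFrame(data).pivot_table(
    values=["foo"],
    index=["ix"],
    columns=["col"],
    aggfunc="min",
    margins=False,
    observed=True,
)

# A list of tuples selects columns and keeps the result two-dimensional.
as_frame = pivot_pandas.loc[:, [("foo", "b")]]
# A bare tuple selects a single column as a Series.
as_series = pivot_pandas.loc[:, ("foo", "b")]

print(as_frame)
print(as_series)
```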

Installed Versions

INSTALLED VERSIONS

commit : 3e951a6
python : 3.10.14
python-bits : 64
OS : Linux
OS-release : 5.15.154+
Version : #1 SMP Thu Jun 27 20:43:36 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : POSIX
LANG : C.UTF-8
LOCALE : None.None

Modin dependencies

modin : 0.32.0
ray : 2.24.0
dask : 2024.9.1
distributed : None

pandas dependencies

pandas : 2.2.3
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
pip : 24.0
Cython : 3.0.10
sphinx : None
IPython : 8.21.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.6.1
html5lib : 1.1
hypothesis : None
gcsfs : 2024.6.1
jinja2 : 3.1.4
lxml.etree : 5.3.0
matplotlib : 3.7.5
numba : 0.60.0
numexpr : 2.10.1
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 17.0.0
pyreadstat : None
pytest : 8.3.3
python-calamine : None
pyxlsb : None
s3fs : 2024.6.1
scipy : 1.14.1
sqlalchemy : 2.0.30
tables : 3.10.1
tabulate : 0.9.0
xarray : 2024.9.0
xlrd : None
xlsxwriter : None
zstandard : 0.23.0
tzdata : 2024.1
qtpy : None
pyqt5 : None

MarcoGorelli added the bug 🦗 (Something isn't working) and Triage 🩹 (Issues that need triage) labels Nov 13, 2024
MarcoGorelli commented Nov 13, 2024

Spotted in the Narwhals CI. For now, we'll just xfail the test and raise NotImplementedError for Modin in the pivot operation.
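
As a possible stopgap (not what Narwhals does, and not verified against Modin), the column labels could be resolved to integer positions first and the selection done with `.iloc`, which avoids the label lookup path shown in the traceback. The helper name `select_columns_by_position` is hypothetical, and the sketch below is exercised on a plain pandas frame with unique column labels.

```python
import pandas as pd


def select_columns_by_position(df, keys):
    """Hypothetical helper: select full column labels via positional indexing.

    Assumes the column labels are unique.
    """
    positions = df.columns.get_indexer_for(keys)
    if (positions == -1).any():
        # get_indexer_for marks labels it cannot find with -1.
        missing = [k for k, p in zip(keys, positions) if p == -1]
        raise KeyError(missing)
    return df.iloc[:, positions]


# Small frame shaped like the pivot result, built directly for illustration.
frame = pd.DataFrame(
    [[0, 7], [2, 1]],
    index=pd.Index([1, 2], name="ix"),
    columns=pd.MultiIndex.from_tuples(
        [("foo", "a"), ("foo", "b")], names=[None, "col"]
    ),
)

subset = select_columns_by_position(frame, [("foo", "b")])
print(subset)
```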
