`selectors` should support slicing columns #15963

samukweku · 2024-04-30T07:06:15Z

Description

Hi team. I would like to suggest adding a slice method to selectors, so that users can select a contiguous slice of columns:

import polars as pl
import polars.selectors as cs

data = {'City': ['Houston', 'Austin', 'Hoover'],
 'State': ['Texas', 'Texas', 'Alabama'],
 'Name': ['Aria', 'Penelope', 'Niko'],
 'Mango': [4, 10, 90],
 'Orange': [10, 8, 14],
 'Watermelon': [40, 99, 43],
 'Gin': [16, 200, 34],
 'Vodka': [20, 33, 18]}

df = pl.DataFrame(data)

df

┌─────────┬─────────┬──────────┬───────┬────────┬────────────┬─────┬───────┐
│ City    ┆ State   ┆ Name     ┆ Mango ┆ Orange ┆ Watermelon ┆ Gin ┆ Vodka │
│ ---     ┆ ---     ┆ ---      ┆ ---   ┆ ---    ┆ ---        ┆ --- ┆ ---   │
│ str     ┆ str     ┆ str      ┆ i64   ┆ i64    ┆ i64        ┆ i64 ┆ i64   │
╞═════════╪═════════╪══════════╪═══════╪════════╪════════════╪═════╪═══════╡
│ Houston ┆ Texas   ┆ Aria     ┆ 4     ┆ 10     ┆ 40         ┆ 16  ┆ 20    │
│ Austin  ┆ Texas   ┆ Penelope ┆ 10    ┆ 8      ┆ 99         ┆ 200 ┆ 33    │
│ Hoover  ┆ Alabama ┆ Niko     ┆ 90    ┆ 14     ┆ 43         ┆ 34  ┆ 18    │
└─────────┴─────────┴──────────┴───────┴────────┴────────────┴─────┴───────┘

The slicing syntax could be (proposed, not an existing API):

df.select(cs.slice('Mango','Vodka')) # alternative - df.select(cs['Mango':'Vodka'])
shape: (3, 5)
┌───────┬────────┬────────────┬─────┬───────┐
│ Mango ┆ Orange ┆ Watermelon ┆ Gin ┆ Vodka │
│ ---   ┆ ---    ┆ ---        ┆ --- ┆ ---   │
│ i64   ┆ i64    ┆ i64        ┆ i64 ┆ i64   │
╞═══════╪════════╪════════════╪═════╪═══════╡
│ 4     ┆ 10     ┆ 40         ┆ 16  ┆ 20    │
│ 10    ┆ 8      ┆ 99         ┆ 200 ┆ 33    │
│ 90    ┆ 14     ┆ 43         ┆ 34  ┆ 18    │
└───────┴────────┴────────────┴─────┴───────┘


aut0clave · 2024-04-30T10:41:21Z

If you know what fields you want, why do you need a selector? Why not use a simple .select("Mango","Vodka")? Or the existing cs.by_name("Mango","Vodka")?

cmdlineluser · 2024-04-30T11:01:01Z

@aut0clave They want to extract the "range of columns" Mango .. Vodka

I believe first/last are the only selectors that are "positional"

>>> cs.first().meta.serialize()
'{"Nth":0}'

There is no .nth() selector, but it would be easy to add:

>>> import io
>>> df.select(pl.Expr.deserialize(io.StringIO("""{"Nth":3}""")))
shape: (3, 1)
┌───────┐
│ Mango │
│ ---   │
│ i64   │
╞═══════╡
│ 4     │
│ 10    │
│ 90    │
└───────┘

nth -> column name mapping is done here:

polars/crates/polars-plan/src/logical_plan/expr_expansion.rs (line 67 in 4b23768):

fn replace_nth(expr: Expr, schema: &Schema) -> Expr {

From what I can tell, there is nothing that goes the other way, i.e. column name -> nth - which I think would be needed in order to support this at the selector level?

samukweku · 2024-04-30T12:11:23Z

@cmdlineluser I'd assume there was a way to get the positions of the column names (maybe grab the positions via list.index from Python and pass them to the Rust end). I don't know much about the internal implementation, but I'm happy to learn. I'd also suggest, if the team feels this is a worthwhile addition, that the slicing be limited to column names only (numeric positions should not be supported).

alexander-beedie · 2024-04-30T13:24:56Z

> @cmdlineluser I'd assume there was a way to get the positions of the column names (maybe grab the positions via list.index from Python and pass them to the Rust end).

FYI: until we are actually evaluating a lazy query plan we may not know the position of all of the columns (eg: expanding a struct, or evaluating earlier selectors). Consequently we can't precompute and pass-down, because it's only at the lower level that we would know the answer (selectors are dynamic, evaluating internally at the point they are invoked) ;)

Offering index-based selection doesn't seem like a bad idea (we currently only support selection by name/dtype and the special cases of first/last, as noted by @cmdlineluser), but would need some internal additions to be possible 🤔
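
To illustrate why a position cannot simply be precomputed on the Python side, here is a minimal pure-Python sketch. It does not use Polars; the `unnest` helper and the column lists are hypothetical stand-ins for a schema that changes while a plan is evaluated. A position taken from the initial schema goes stale once an earlier step expands a struct column.

```python
def unnest(columns, struct_name, fields):
    """Replace one struct column with its fields (stand-in for a plan step)."""
    out = []
    for name in columns:
        if name == struct_name:
            out.extend(fields)
        else:
            out.append(name)
    return out


# Hypothetical schema in which "Fruit" is a struct column.
initial = ["City", "Fruit", "Gin", "Vodka"]

# Position of "Gin" computed up front, before the plan runs.
precomputed = initial.index("Gin")

# Schema after the struct has been expanded by an earlier step.
actual = unnest(initial, "Fruit", ["Mango", "Orange", "Watermelon"])

# The precomputed position now points at a different column.
stale = actual[precomputed]

# Only a lookup against the final schema gives the right position.
resolved = actual.index("Gin")
```

Here `stale` is `"Orange"` rather than `"Gin"`, which is why the name-to-position lookup has to happen where the final schema is known.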

samukweku · 2024-05-03T23:18:14Z

@cmdlineluser so something like cs.by_position, cs.by_range?

cmdlineluser · 2024-05-04T00:29:15Z

@alexander-beedie is the person to ask. (they created selectors :-D)

alexander-beedie · 2024-05-04T07:16:52Z

> @cmdlineluser so something like cs.by_position, cs.by_range?

Probably cs.by_index, which would take one or more index values, a range, or a slice (as range/slice can be directly expanded into a list of indexes, so internally we just need to handle that). Does need additional low-level support though.
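
The idea that a range or slice can be expanded into a plain list of indexes might look roughly like the sketch below. This is an illustration only, not the Polars implementation; `expand_indices` and its `n_columns` parameter are hypothetical names, and no bounds checking is done.

```python
def expand_indices(*items, n_columns):
    """Flatten ints, ranges and slices into a list of non-negative indexes."""
    indexes = []
    for item in items:
        if isinstance(item, slice):
            # slice.indices() resolves open ends and negative bounds.
            indexes.extend(range(*item.indices(n_columns)))
        elif isinstance(item, range):
            indexes.extend(item)
        else:
            indexes.append(item)
    # Normalise negative positions so that -1 means the last column.
    return [i + n_columns if i < 0 else i for i in indexes]


# One index, one range and one slice over an 8-column frame.
flat = expand_indices(0, range(3, 5), slice(-2, None), n_columns=8)
```

With eight columns, `flat` is `[0, 3, 4, 6, 7]`, so downstream code only ever has to handle a flat list of positions.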

alexander-beedie · 2024-07-03T13:51:28Z

FYI, forgot to update this issue, but we do now have a new cs.by_index selector which can take indices and ranges, which gets you some of the way there: #16217
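
As a possible stopgap for label-based slicing on an eager frame, the column names can be turned into a positional range first. The sketch below is pure Python; `label_range` is a hypothetical helper, not part of Polars, and it assumes both names exist and that the slice is inclusive at both ends, as in the original request.

```python
def label_range(columns, start, stop):
    """Return the inclusive positional range between two column names."""
    lo, hi = columns.index(start), columns.index(stop)
    if lo > hi:
        raise ValueError(f"{start!r} comes after {stop!r}")
    return range(lo, hi + 1)


# Column order taken from the example frame in the issue description.
columns = ["City", "State", "Name", "Mango", "Orange", "Watermelon", "Gin", "Vodka"]

positions = label_range(columns, "Mango", "Vodka")
selected = [columns[i] for i in positions]
```

Assuming a Polars version that includes cs.by_index, the result could then be passed along as `df.select(cs.by_index(label_range(df.columns, "Mango", "Vodka")))`. This only works where the column order is already known, so it does not address the lazy case discussed above.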

samukweku · 2024-07-03T14:02:39Z

Thanks @alexander-beedie. Looks good. Safe to assume that slicing with labels may be implemented at a future date?

alexander-beedie · 2024-07-03T16:17:40Z

> Thanks @alexander-beedie. Looks good. Safe to assume that slicing with labels may be implemented at a future date?

Probably, but no timeline; the 1.0 release (and a few quick point releases to address any related issues) has priority at the moment. And I'm on vacation for the next two weeks ;)

samukweku added the "enhancement" label (New feature or an improvement of an existing feature) on Apr 30, 2024

alexander-beedie added the "A-selectors" label (Area: column selectors) on Jun 9, 2024

alexander-beedie self-assigned this on Jul 3, 2024
