feat(postgres): more flexible map type implementation #10484

Open
cpcloud opened this issue Nov 13, 2024 Discussed in #10483 · 0 comments
Labels
feature Features or general enhancements postgres The PostgreSQL backend refactor Issues or PRs related to refactoring the codebase

cpcloud (Member) commented Nov 13, 2024

Discussed in #10483

Originally posted by augcollet November 13, 2024
Hello,

I need your help to resolve a specific problem...

Starting from the following data with the PostgreSQL backend:

```python
import ibis
from ibis import _
import os

# Connection settings are read from environment variables.
con = ibis.postgres.connect(
    user=os.getenv('POSTGRES_USER'),
    password=os.getenv('POSTGRES_PASSWORD'),
    host="postgres",
    port=os.getenv('POSTGRES_PORT'),
    database=os.getenv('POSTGRES_DB'),
)
ibis.set_backend(con)

# Sample data
t = ibis.memtable({
    'client_id': [0, 1, 0, 2, 3, 0, 1, 2, 3],
    'product': ['a', 'b', 'a', 'a', 'b', 'c', 'a', 'a', 'b'],
    'amount': [1.2, 2.5, 4.2, 12.7, 1.2, 3.8, 1.4, 3.8, 3],
})
```


I'm trying to compute, for each `client_id`, a map from each `product` to that client's total `amount` for it.
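To make the target result concrete, here is a plain-Python sketch (no Ibis, no database) of the same two-step aggregation on the sample data: sum `amount` per `(client_id, product)`, then gather one `{product: sum}` dict per client. The function name `products_and_sum_amounts` is only an illustrative choice.

```python
from collections import defaultdict

# The sample data from the memtable above, as plain Python lists.
client_id = [0, 1, 0, 2, 3, 0, 1, 2, 3]
product = ['a', 'b', 'a', 'a', 'b', 'c', 'a', 'a', 'b']
amount = [1.2, 2.5, 4.2, 12.7, 1.2, 3.8, 1.4, 3.8, 3]


def products_and_sum_amounts(client_ids, products, amounts):
    """Return {client_id: {product: total amount}} for the given rows."""
    totals = defaultdict(lambda: defaultdict(float))
    for cid, prod, amt in zip(client_ids, products, amounts):
        # Step 1: accumulate the amount per (client_id, product) pair.
        totals[cid][prod] += amt
    # Step 2: one plain dict of product -> sum per client_id.
    return {cid: dict(per_product) for cid, per_product in totals.items()}


expected = products_and_sum_amounts(client_id, product, amount)
print(expected)
```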

I tried the following approach:

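```python
data = (
    # Step 1: total amount per (client_id, product)
    t.group_by(['client_id', 'product'])
    .agg(sum_amount=_['amount'].sum())
    # Step 2: one map of product -> sum_amount per client_id
    .group_by(['client_id'])
    .agg(
        products_and_sum_amounts=ibis.map(
            _['product'].collect(),
            _['sum_amount'].collect(),
        )
    )
)
data.execute()
```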

I get the following error:

[Screenshot: error message]

It seems that Ibis implements `.map` on PostgreSQL with the `hstore` type, which stores only text keys and text values, so it cannot hold the numeric sums.

I have to cast the values to strings before calling `.collect()` to get a result.
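The cast workaround is presumably along the lines of `_['sum_amount'].cast('string').collect()` in the second aggregation. Because `hstore` stores text only, the sums then come back as strings. A minimal sketch of turning them back into numbers on the pandas side, assuming each map value arrives as a Python dict (or `None` for a NULL map); `restore_numeric_values` is a hypothetical helper, not part of Ibis:

```python
def restore_numeric_values(mapping):
    """Convert a {str: str} map (e.g. from a text-only hstore) to {str: float}.

    Assumption: `mapping` is a dict, or None for a NULL map. hstore values
    may themselves be NULL; those are kept as None.
    """
    if mapping is None:
        return None
    return {
        key: (None if value is None else float(value))
        for key, value in mapping.items()
    }


# Hypothetical row, as it might look after casting sum_amount to string.
print(restore_numeric_values({'a': '5.4', 'c': '3.8'}))
```

On the executed result this could be applied as `df['products_and_sum_amounts'].map(restore_numeric_values, na_action='ignore')`, with `df = data.execute()`.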

How can I get around this? For example, how can I build a JSON object instead of a `MapValue`?
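One way to get a JSON object with numeric values, rather than a text-only map, is to let the database build it. On PostgreSQL the aggregate for this is `jsonb_object_agg(key, value)`; with Ibis such a query could presumably be run through `con.sql(...)`, but that path is not verified here. So that the sketch runs without a Postgres server, it swaps in SQLite (stdlib `sqlite3`) and its analogous `json_group_object` aggregate, which needs an SQLite build with JSON support (built in since SQLite 3.38):

```python
import json
import sqlite3

# Stand-in for the Postgres table: in-memory SQLite with the same sample rows.
rows = list(zip(
    [0, 1, 0, 2, 3, 0, 1, 2, 3],
    ['a', 'b', 'a', 'a', 'b', 'c', 'a', 'a', 'b'],
    [1.2, 2.5, 4.2, 12.7, 1.2, 3.8, 1.4, 3.8, 3],
))

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (client_id INTEGER, product TEXT, amount REAL)')
conn.executemany('INSERT INTO t VALUES (?, ?, ?)', rows)

# Same two-step aggregation as the Ibis expression. json_group_object plays
# the role that jsonb_object_agg(product, sum_amount) would play on Postgres.
query = """
    SELECT client_id,
           json_group_object(product, sum_amount) AS products_and_sum_amounts
    FROM (
        SELECT client_id, product, SUM(amount) AS sum_amount
        FROM t
        GROUP BY client_id, product
    )
    GROUP BY client_id
    ORDER BY client_id
"""
result = {
    client: json.loads(obj)  # JSON numbers decode as Python floats
    for client, obj in conn.execute(query)
}
conn.close()
print(result)
```

On Postgres the inner query would stay the same and only the outer aggregate would change to `jsonb_object_agg(product, sum_amount)`.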

(My goal is to use the resulting pandas DataFrame with scikit-learn's DictVectorizer:
https://scikit-learn.org/1.5/modules/generated/sklearn.feature_extraction.DictVectorizer.html)
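`DictVectorizer` turns a list of `{feature_name: value}` dicts into a matrix with one row per dict and one column per distinct key. The stdlib-only sketch below mimics the dense layout it produces for numeric values (feature names sorted, missing keys filled with 0), just to show what the per-client dicts become; `dense_layout` is a hypothetical helper, not scikit-learn code:

```python
def dense_layout(dicts):
    """Mimic the dense output layout of DictVectorizer(sparse=False).

    Assumes all values are numeric. Columns are the sorted union of all keys
    (DictVectorizer sorts feature names by default); a key missing from a
    dict becomes 0.0 in that row.
    """
    feature_names = sorted({key for d in dicts for key in d})
    matrix = [[float(d.get(name, 0.0)) for name in feature_names] for d in dicts]
    return feature_names, matrix


# One dict per client, e.g. the products_and_sum_amounts column as a list.
dicts = [{'a': 5.4, 'c': 3.8}, {'a': 1.4, 'b': 2.5}, {'a': 16.5}, {'b': 4.2}]
names, X = dense_layout(dicts)
print(names)
print(X)
```

With scikit-learn installed, `DictVectorizer(sparse=False).fit_transform(dicts)` should give the same matrix, with column names from `get_feature_names_out()`; the list of dicts would come from something like `df['products_and_sum_amounts'].tolist()`.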

Thank you in advance for your support!

@cpcloud cpcloud added feature Features or general enhancements refactor Issues or PRs related to refactoring the codebase postgres The PostgreSQL backend labels Nov 13, 2024
Projects
Status: backlog
Development

No branches or pull requests

1 participant