
Phrase embeddings in context #108

Open

jnferfer opened this issue Feb 12, 2024 · 2 comments

Comments

@jnferfer

Hi,

I need to get the embedding of a word or a phrase within a sentence, where the sentence provides the context for the word/phrase.

For example, I need the different embedding values of big apple in these two sentences:

I'm living in the Big Apple since 2012
I ate a big apple yesterday

When using model.encode(), I can set the parameter output_value to "token_embeddings" to get token embeddings. However, I don't know how to properly map the output vectors onto the tokens that correspond to the big apple text. Is there a straightforward approach for this?
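For reference, a minimal version of the call being described; the model name is only a placeholder:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model name

# One row per token, including special tokens such as [CLS]/[SEP], so
# row positions do not line up one-to-one with whitespace-separated words.
token_embs = model.encode("I ate a big apple yesterday",
                          output_value="token_embeddings")
print(token_embs.shape)  # (num_tokens, hidden_dim)
```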

Thanks!

@hongjin-su
Copy link
Collaborator

You may first check the tokenization of the sentences, record the indices of the tokens that make up the desired words (e.g., big apple), and then select the token embeddings at those indices.
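For concreteness, here is a minimal sketch of that recipe, assuming a sentence-transformers-style model with a fast Hugging Face tokenizer (the model name and the phrase_token_embeddings helper are illustrative, not part of this repo). The tokenizer's offset mapping turns the phrase's character span into token indices:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model name

def phrase_token_embeddings(sentence, phrase):
    # Character span of the phrase (case-insensitive match).
    start = sentence.lower().find(phrase.lower())
    if start == -1:
        raise ValueError(f"{phrase!r} not found in {sentence!r}")
    end = start + len(phrase)

    # Each token gets a (char_start, char_end) pair; special tokens such
    # as [CLS]/[SEP] map to (0, 0) and never overlap the phrase span.
    # Note: offset mappings require a fast tokenizer, and for sentences
    # longer than model.max_seq_length the two tokenizations can diverge.
    encoding = model.tokenizer(sentence, return_offsets_mapping=True)
    idx = [
        i for i, (s, e) in enumerate(encoding["offset_mapping"])
        if s < end and e > start
    ]

    # encode() tokenizes the same way, so embedding rows align with idx.
    token_embs = model.encode(sentence, output_value="token_embeddings")
    return token_embs[idx]

embs = phrase_token_embeddings("I'm living in the Big Apple since 2012",
                               "Big Apple")
print(embs.shape)  # (num_phrase_tokens, hidden_dim)
```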

@jnferfer
Author

jnferfer commented Apr 21, 2024

Thanks! Then, if I want a single embedding for "big apple", how should I proceed? I'm averaging the embeddings of "big" and "apple", but I sometimes get odd results when comparing that averaged embedding against others.
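One common recipe (a general technique, not something specific to this repo) is to mean-pool the phrase's token vectors and L2-normalize the result before comparing with cosine similarity; unnormalized averages can have very different norms, which makes raw dot-product comparisons look odd. A minimal sketch, reusing the hypothetical phrase_token_embeddings helper from the sketch above:

```python
import torch
import torch.nn.functional as F

def phrase_embedding(sentence, phrase):
    token_embs = phrase_token_embeddings(sentence, phrase)  # helper from above
    pooled = token_embs.mean(dim=0)      # mean pooling over the phrase tokens
    return F.normalize(pooled, dim=0)    # unit length, so dot product = cosine

a = phrase_embedding("I'm living in the Big Apple since 2012", "Big Apple")
b = phrase_embedding("I ate a big apple yesterday", "big apple")
print(torch.dot(a, b).item())  # cosine similarity between the two usages
```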
