[FEATURE] Sentence Highlighter #145

Open
asfoorial opened this issue Mar 23, 2023 · 4 comments
Labels
backlog All the backlog features should be marked with this label Enhancements Increases software capabilities beyond original client specifications Features Introduces a new unit of functionality that satisfies a requirement help wanted Extra attention is needed

Comments

@asfoorial

Is your feature request related to a problem?

No

What solution would you like?

I would like to have a highlighter that supports the neural search capability. It should highlight the most relevant sentences in the documents returned by a neural search.

What alternatives have you considered?

There are no available alternatives at the moment. So the only choice is to develop one.

Do you have any additional context?

I tried to implement it myself but faced the following challenges:

  1. I had to implement my own neural-search plugin, since this one relies on KNNQuery, which does not store the query text. For example, in the code below, fieldContext.context.query() returns an instance of KNNQuery. I suggest that the neural-search plugin have its own NeuralQuery that extends KNNQuery and keeps neural-search-related attributes such as the query text. I hope there are other ways to get the query text at highlight time.

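@Override
public HighlightField highlight(FieldHighlightContext fieldContext) {
    // Prints a KNNQuery, which carries the vector but not the original query text.
    System.out.println("Query: " + fieldContext.context.query());
    return null; // highlighting logic omitted
}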

  2. The inferenceSentences method is asynchronous and notifies an ActionListener after the result is retrieved. If I call it inside the highlight method above, highlight returns before the ActionListener is notified, so it cannot get the embeddings needed to compute sentence similarity and pick the sentence to highlight. I had to implement my own synchronous inferSentences. Below is pseudocode of what I am trying to do.

@Override
public HighlightField highlight(FieldHighlightContext fieldContext) {
    // Pseudocode: getQueryText and getSentences are placeholders for logic not shown here.
    String queryText = getQueryText(fieldContext.context.query());
    List<String> sentences = getSentences(fieldContext);

    // Embed the query together with the sentences; the query goes first.
    List<String> inputs = new ArrayList<>();
    inputs.add(queryText);
    inputs.addAll(sentences);
    // inferSentences is my own synchronous variant of inferenceSentences.
    List<List<Float>> vectors = clientAccessor.inferSentences("U3R9CYcBOk2JRjrls0nH", inputs);

    // Pick the sentence whose embedding is most similar to the query embedding.
    double maxSim = Double.NEGATIVE_INFINITY;
    String maxSentence = null;
    if (!vectors.isEmpty()) {
        List<Float> queryEmbedding = vectors.get(0);
        for (int i = 1; i < vectors.size(); i++) {
            double sim = cosineSim(queryEmbedding, vectors.get(i));
            if (sim > maxSim) {
                maxSim = sim;
                maxSentence = inputs.get(i);
            }
        }
    }

    List<Text> responses = new ArrayList<>();
    if (maxSentence != null) {
        responses.add(new Text(maxSentence));
    }
    return new HighlightField(fieldContext.fieldName, responses.toArray(new Text[0]));
}
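
The two steps above (waiting for an asynchronous embedding call, then choosing the sentence closest to the query by cosine similarity) can be sketched in plain Java without any plugin classes. This is only an illustration under stated assumptions: `Listener`, `AsyncEmbedder`, `embedBlocking`, and `bestSentence` are hypothetical stand-ins invented for this sketch, not the plugin's `ActionListener` or its inference API, and the 5-second timeout is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SentenceHighlightSketch {

    /** Hypothetical callback interface; stands in for a listener-style API. */
    public interface Listener<T> {
        void onResponse(T result);
        void onFailure(Exception e);
    }

    /** Hypothetical asynchronous embedding call; one vector per input text. */
    public interface AsyncEmbedder {
        void embed(List<String> texts, Listener<List<float[]>> listener);
    }

    /** Bridges the callback to a blocking call by completing a future from the listener. */
    public static List<float[]> embedBlocking(AsyncEmbedder embedder, List<String> texts, long timeoutMillis) {
        CompletableFuture<List<float[]>> future = new CompletableFuture<>();
        embedder.embed(texts, new Listener<List<float[]>>() {
            @Override public void onResponse(List<float[]> result) { future.complete(result); }
            @Override public void onFailure(Exception e) { future.completeExceptionally(e); }
        });
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for embeddings", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("Embedding call failed or timed out", e);
        }
    }

    /** Cosine similarity of two equal-length vectors; returns 0 if either has zero norm. */
    public static double cosineSim(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Embeds the query followed by the sentences and returns the most similar sentence, or null. */
    public static String bestSentence(String queryText, List<String> sentences, AsyncEmbedder embedder) {
        if (sentences.isEmpty()) {
            return null;
        }
        List<String> inputs = new ArrayList<>();
        inputs.add(queryText);          // index 0 holds the query
        inputs.addAll(sentences);
        List<float[]> vectors = embedBlocking(embedder, inputs, 5_000);

        float[] queryVec = vectors.get(0);
        double maxSim = Double.NEGATIVE_INFINITY;
        String best = null;
        for (int i = 1; i < vectors.size(); i++) {
            double sim = cosineSim(queryVec, vectors.get(i));
            if (sim > maxSim) {
                maxSim = sim;
                best = inputs.get(i);
            }
        }
        return best;
    }
}
```

Blocking like this is the simplest bridge, but it may not be safe on a search thread: if the callback is delivered on the same thread pool that is being blocked, the wait could stall or deadlock, so whether this fits inside a highlighter is an open question for the maintainers.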

Having said the above, I hope you can tell me which route to take here. Is this feature going to be available in the plugin any time soon?

Thanks

@navneet1v
Collaborator

navneet1v commented Mar 23, 2023

@asfoorial This is an interesting feature, and I remember the same request for the highlight feature at the time the RFC was created for this plugin.

Is this feature going to be available in the plugin any time soon?

The highlight feature was not on our roadmap, as the team was busy making the plugin GA, but we would really like the plugin to have this feature.

@asfoorial I need to take a deeper look at the suggested approaches to see whether they are feasible.
In the meantime, can you describe the use case you are trying to solve with the highlight feature?

@navneet1v navneet1v added Enhancements Increases software capabilities beyond original client specifications Features Introduces a new unit of functionality that satisfies a requirement and removed untriaged labels Mar 23, 2023
@vamshin
Member

vamshin commented Mar 28, 2023

Please +1 if you are looking for this feature to help prioritize

@navneet1v navneet1v added the backlog All the backlog features should be marked with this label label Mar 28, 2023
@navneet1v navneet1v added the help wanted Extra attention is needed label Sep 15, 2023
@dswitzer

+1 for highlight over neural searches and hybrid searches.

I think this could be helpful when building RAG-based workflows, where you want to extract just the portion of a larger document that is being matched.

@dblock
Member

dblock commented Jan 6, 2025

[Catch All Triage - 1, 2, 3, 4]

Projects
Status: Backlog(Hot)
Development

No branches or pull requests

7 participants