
Automated feedback loop #75

Open
1 task
luarss opened this issue Oct 21, 2024 · 28 comments
@luarss (Collaborator) commented Oct 21, 2024

Think about an automatic mechanism for feeding the feedback we get from users back into our evaluation dataset.

Use it as part of our dataset's ground truth (good or bad).

Tasks

For this, we need to institute more metrics (tracked in #18).

Explicit feedback

  • Thumbs up / thumbs down. But this kind of feedback is rare.

Implicit feedback

  • Inferred from user behavior, such as response time, follow-up questions, or even actions like copying and pasting responses. Implicit feedback is far more abundant and can provide valuable insight into how well the LLM is performing. A rough sketch of turning these signals into a good/bad label follows after the references below.
  1. https://towardsdatascience.com/how-to-make-the-most-out-of-llm-production-data-simulated-user-feedback-843c444febc7
  2. https://www.nebuly.com/blog/llm-feedback-loop
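
A minimal sketch of that idea, assuming we collapse the signals into a good/bad label for the eval set; the field names and rules below are illustrative, not an agreed design:

```python
# Hypothetical sketch: deriving a ground-truth label for the eval dataset
# from explicit (thumbs) and implicit (behavioral) feedback signals.
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedbackSignals:
    thumbs_up: Optional[bool] = None       # explicit: True/False, None if not given
    copied_response: bool = False          # implicit: user copied the answer
    asked_followup_rephrase: bool = False  # implicit: user re-asked the same question


def ground_truth_label(signals: FeedbackSignals) -> Optional[str]:
    """Return 'good', 'bad', or None if there is not enough signal."""
    if signals.thumbs_up is not None:       # explicit feedback wins when present
        return "good" if signals.thumbs_up else "bad"
    if signals.copied_response:             # copying usually implies the answer was useful
        return "good"
    if signals.asked_followup_rephrase:     # rephrasing the question suggests a miss
        return "bad"
    return None
```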
@Kannav02 (Collaborator)

Hey @luarss!

How would you want the feedback to be stored? Do you want a feeder pipeline that feeds back into the evaluation dataset, so that each subsequent prompt answer is influenced by these metrics?

Do you also want the thumbs up / thumbs down feature to be implemented? I believe the functionality would be similar to how LLMs like ChatGPT take feedback, maybe once every 100 prompts or so?

If those are the requirements, can I work on this issue?

thank you!

@luarss (Collaborator, Author) commented Oct 28, 2024

Hi @Kannav02, thanks for your interest.

We have some implementation of the thumbs up/down interface in this PR: #61, done by Aviral (@error9098x). That should be merged soon.

What we need is a convenient way to store and access this feedback programmatically. Currently we use Google Sheets, but that isn't the most programming-friendly database. This feedback could be thumbs up/down, or actual text (suggestions). Happy to hear your suggestions on this.

At a minimum, we would need to store: Question, Response, Sources, Date/Time, Thumbs up/down, Feedback, Conversation history
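
For illustration, that minimal record could look something like this as a Pydantic model (assuming a Pydantic-based Python backend; the field names are placeholders, not a decided schema):

```python
# Illustrative sketch of the minimal feedback record described above.
from datetime import datetime, timezone
from typing import Optional

from pydantic import BaseModel, Field


class FeedbackRecord(BaseModel):
    question: str
    response: str
    sources: list[str] = Field(default_factory=list)
    timestamp: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
    thumbs_up: Optional[bool] = None      # explicit thumbs up/down, if given
    feedback_text: Optional[str] = None   # free-form suggestions from the user
    conversation_history: list[dict] = Field(default_factory=list)
```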

@Kannav02 (Collaborator)

Since the data is pretty structured, I believe something like this could be stored in a SQL-based database, maybe PostgreSQL.

I would consider MongoDB as well, since we should account for scalability given the amount of prompt feedback we will have. The trade-off is that it is schemaless, although we could work around that by enforcing the schema in the code itself.

Initially, when I saw this problem I thought Redis could be a game-changer given its fast-query capabilities, but there are quite a few fields here and I believe it would just complicate things.

In the end I believe it boils down to PostgreSQL vs. MongoDB. If it is only for the prompt services and feeding feedback back to the AI, I would go with MongoDB; but if you have other use cases for it, we might have to consider query consistency across the codebase, and then maybe PostgreSQL.
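
For context, storing and pulling such records from Python is only a few lines with pymongo; the connection string, database, and collection names below are placeholders:

```python
# Minimal sketch of storing and querying feedback with pymongo.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
feedback = client["orassistant"]["feedback"]

# Store one feedback document (schemaless, so the shape is enforced in code).
feedback.insert_one({
    "question": "How do I run global placement?",
    "response": "...",
    "sources": [],
    "thumbs_up": True,
    "feedback_text": None,
})

# Pull everything with a thumbs-down for review / the eval dataset.
negative = list(feedback.find({"thumbs_up": False}))
print(len(negative), "negative examples")
```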

@luarss (Collaborator, Author) commented Oct 29, 2024

I think MongoDB might be preferable, since our feedback might evolve over time and schemas could change. A nice checkpoint could be DB hosting and a basic web UI for filtering, querying, and sorting.

I know Postgres is relatively easy to set up, but I have no experience with Mongo. What is the setup and maintenance like?

@Kannav02 (Collaborator)

Postgres is good when we have laid-down schemas; even then we can have migrations, meaning that if you change fields or columns, an ORM like Prisma would help us with the migrations. But seeing the requirements of this feature now, I believe MongoDB would be the right choice.

If you agree, I can start working on the hosting part. As for the UI, what would you expect from it? Do you want an interactive dashboard-style UI to interact with MongoDB?

Thank you!

@luarss (Collaborator, Author) commented Oct 29, 2024

Some milestones:

  1. Interactive web UI
  2. Integration with ORAssistant: pending the PR merge, or you can try a small prototype with our existing Streamlit code in frontend/
  3. Export the dataset using Python

That would be a good amount for a PoC; then we can decide whether or not to go further with MongoDB/Postgres.

Maybe this will work? https://github.com/mongo-express/mongo-express
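
For reference, a minimal docker-compose sketch for running MongoDB together with mongo-express; the image tags, ports, and the absence of authentication here are simplifications, not a recommended production setup:

```yaml
# Sketch only: MongoDB plus the mongo-express admin UI.
services:
  mongodb:
    image: mongo:7
    ports:
      - "27017:27017"
    volumes:
      - mongo-data:/data/db

  mongo-express:
    image: mongo-express:latest
    ports:
      - "8081:8081"
    environment:
      ME_CONFIG_MONGODB_URL: mongodb://mongodb:27017
    depends_on:
      - mongodb

volumes:
  mongo-data:
```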

@Kannav02 (Collaborator)

So right now you wouldn't want this issue to focus on DB integration before the milestones, right?

And as for the web UI, what would you want in it? Is it related to the DB operations?

I can start working on these milestones if you want me to.

Thank you!

@luarss (Collaborator, Author) commented Oct 31, 2024

DB integration can be done hand in hand with the web UI; otherwise the web UI will not show any meaningful information. I would suggest spending most of the time on the DB integration and using a pre-built frontend like mongo-express that we can run as a Docker container.

Yes, you can start working on it :)

@Kannav02 (Collaborator)

Sure, I will start working on this and keep posting regular updates on this issue.

@Kannav02 (Collaborator) commented Nov 4, 2024

Hey @luarss,

Hope you're doing well.

I just had a question regarding the GOOGLE_APPLICATION_CREDENTIALS key in the backend folder: what is it supposed to do? Is it the one that provides access to service accounts, and is there a guide I could follow?

Thank you!

@error9098x (Collaborator)

Hi @Kannav02, currently we are using GOOGLE_APPLICATION_CREDENTIALS to access a service account that grants us access to use Gemini without the rate limit that is imposed when using the AI Studio platform. I hope this clarifies things.
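
For context, a minimal sketch of how that credential gets picked up by the Google client libraries; the path below is a placeholder (it just has to match wherever the key file actually lives):

```python
# Sketch: Google client libraries read GOOGLE_APPLICATION_CREDENTIALS automatically.
import os

import google.auth

# Point the libraries at the service-account key file (placeholder path).
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "backend/src/secret.json"

# Application Default Credentials now resolve to that service account.
credentials, project_id = google.auth.default()
print("Authenticated for project:", project_id)
```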

@Kannav02 (Collaborator) commented Nov 4, 2024

That's perfect, I appreciate your help

Thank you!

@Kannav02 (Collaborator) commented Nov 8, 2024

Just a quick update on this: I am still integrating the DB. One follow-up question: would you want a single big PR comprising all the features, or individual PRs for different features referencing this issue?

@luarss (Collaborator, Author) commented Nov 8, 2024

Individual, feature-based PRs are preferred and will be easier to review.

@Kannav02 (Collaborator)

Hey, so I was making and testing my MongoDB changes using the mock server, but when I switched to the main backend I got the error below. Is this the expected behavior right now with the backend, or should I look into something else?

(screenshot of the error, 2024-11-11 1:36 AM)

@luarss (Collaborator, Author) commented Nov 11, 2024

The main backend won't work because you need the Google credentials. Streamlit is entirely frontend-based, so you should be okay to proceed with the mock backend (to silence the errors).

@Kannav02 (Collaborator)

I did get the Google credentials on my end and then tried to run it using Docker Compose with those credentials. It failed at first due to some IAM issues, but I fixed that and the backend came up with docker-compose. Is it supposed to work with a certain Google project/credentials?

Thank you for your support

@luarss (Collaborator, Author) commented Nov 11, 2024

This is an example of the JSON credential you need. On top of this, you also need Vertex AI enabled for this credential; I can supply it to you once we are past the prototype stage.

Can I get your email please? You can drop me a PM.

(screenshot: example service-account JSON credential)
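
For reference, a service-account key file generally has this shape; every value below is a placeholder:

```json
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "0123456789abcdef",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "service-account-name@your-project-id.iam.gserviceaccount.com",
  "client_id": "123456789012345678901",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```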

@Kannav02 (Collaborator)

Thank you for the help

I did download the credentials file for the service account from my Google account. I have given it access to Vertex AI for the backend and the Sheets API for the frontend; I had a couple of free credits, so I decided to use those. However, the backend didn't respond to requests from the frontend.

I'll PM you as well

Once again thank you for the support!

@luarss (Collaborator, Author) commented Nov 11, 2024

Did you perhaps put the credentials in the correct directory?

For Docker Compose, you would have to ensure the file is called backend/src/secret.json.

@Kannav02 (Collaborator)

I believe the credentials were in the right directory, as the backend did compile and start to run (I have attached an image). It's just that the frontend was sending requests to an API endpoint that apparently doesn't exist on the backend, as I get a 404 Not Found error in the terminal.

(screenshot of the terminal output, 2024-11-11 11:52 AM)

@luarss (Collaborator, Author) commented Nov 11, 2024

I see, the Streamlit frontend is outdated and calling the old backend: https://github.com/The-OpenROAD-Project/ORAssistant/blob/master/backend/src/api/routers/chains.py

You can update it as part of the PR.

@Kannav02 (Collaborator)

Sure, I'll open a PR for this by today.

It's really wonderful to see minor inconsistencies causing code breaks, lol.

Thank you for your help on this, I appreciate it.

@Kannav02 (Collaborator)

I found the issue with the request being sent: in main.py in the API folder, where the routers are added, the router for chains wasn't being included, so I have added it now.
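
For illustration, this is the kind of wiring described above, assuming the backend is a FastAPI app and that chains.py exposes a `router` object (module paths here are simplified):

```python
# Sketch: making sure the chains router is actually registered on the app.
from fastapi import FastAPI

from .routers import chains  # e.g. the module at backend/src/api/routers/chains.py

app = FastAPI()
app.include_router(chains.router)  # this registration was the missing piece
```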

On another note, should we also add bind mounts to the Docker setup? I have found that if I make any changes, I have to restart docker-compose, and the container takes about 10-15 minutes to set up on my machine. Would this be a useful addition, considering the container would then respond to any changes made locally on the machine?

@luarss (Collaborator, Author) commented Nov 12, 2024

@Kannav02 We excluded it because in production we only have 1 chain running, the agent-retriever.

For bind mounts, I am fine with that as long as it is a separate target (maybe make-docker-dev) and a separate docker-compose file if needed.
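
Something along these lines could work as the separate dev compose file; this is only a sketch, and the service name, build context, container paths, and the make-docker-dev wiring are assumptions about the repo layout:

```yaml
# docker-compose.dev.yml (sketch): bind-mount the backend source so local edits
# are visible inside the container without rebuilding the image.
services:
  backend:
    build: ./backend
    volumes:
      - ./backend/src:/app/src      # bind mount: container picks up local changes
    environment:
      GOOGLE_APPLICATION_CREDENTIALS: /app/src/secret.json
```

A make-docker-dev target could then simply wrap `docker compose -f docker-compose.dev.yml up --build`.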

@Kannav02 (Collaborator)

Sure, I will make a separate docker-compose file for that.

Just a quick update as well: I have finally tested the entire flow of getting the feedback into Google Sheets. I am uploading an image for your reference.

(screenshot of the feedback flow result, 2024-11-12 1:57 PM)

Also, to follow up on the previous point: would the chains/listAll endpoint be used later on? There are endpoints in it that are still not implemented, like chains/ensemble. Should I still include it in my changes?

@luarss (Collaborator, Author) commented Nov 13, 2024

It could be used later on, but for now you may modify the frontend to do your tests.

@Kannav02 (Collaborator)

Hey, sorry for the late update, I was busy with school, but I have now opened a PR which gives an idea of how the schema would work. Do let me know if you want me to make any changes to it.

Once the PR is approved, I will start working on integrating and migrating away from Google Sheets, and on optimizing the design and the solution as well.

Just a side note: would you still want the Google Sheets functionality to remain, to support previous feedback, or once MongoDB is in place do you want me to remove the Google Sheets functionality entirely?

The PR linked to this comment is #100.
