Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OPIK-76 Add stream experiment items endpoint #294

Merged

Conversation

andrescrz
Copy link
Collaborator

@andrescrz andrescrz commented Sep 19, 2024

Details

This is meant to be used by SDK and to return at least these fields per experiment item: id, trace_id and dataset_item_id and that's exactly what it does, plus any other field in experiments table.

It has the same semantics as the other stream operations in the service: limit (default 500) and last retrieved id cursor.

The search by experiment name follows the same pattern as in the find experiment endpoints: contains regex and case insensitive.

Therefore, it can match experiment items per multiple experiments. For that reason, they're sorted by experiment id first.

Issues

OPIK-76

Testing

  • Added integration test covering all logical cases.

Documentation

N/A

@andrescrz andrescrz self-assigned this Sep 19, 2024
@andrescrz andrescrz added the enhancement New feature or request label Sep 19, 2024
@andrescrz andrescrz marked this pull request as ready for review September 19, 2024 17:20
@andrescrz andrescrz requested a review from a team as a code owner September 19, 2024 17:20
Copy link
Contributor

@thiagohora thiagohora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern only concerns are:

  1. What if we have too many experiments?
  2. Shouldn't the response format be responsibility of the API layer ?

@andrescrz andrescrz force-pushed the andrescrz/OPIK-76-add-stream-experiment-items-endpoint branch from f45e966 to 28b5ad0 Compare September 20, 2024 09:14
thiagohora
thiagohora previously approved these changes Sep 20, 2024
@andrescrz andrescrz force-pushed the andrescrz/OPIK-76-add-stream-experiment-items-endpoint branch from 28b5ad0 to 7506224 Compare September 20, 2024 12:44
@andrescrz
Copy link
Collaborator Author

andrescrz commented Sep 20, 2024

My concern only concerns are:

  1. What if we have too many experiments?
  2. Shouldn't the response format be responsibility of the API layer ?

Those are nice questions. Let me answer then:

Regarding 1: there are multiple alternatives to handle this:

  1. By using the current limit param: this wouldn't work in some cases, as we want to limit by the items.
  2. Adding a limit for experiments: this works but makes the endpoint more cumbersome and results will be less predictable.
  3. Optimising the experiments query to just return the id: this can be done on top any other improvement. I'd rather not to start with, in order to leave this DAO method more reusable for the future. Also, because it might by a not very impactful optimisation.

With all this in mind, I honestly think that the best option is to leave the endpoint like it is. Experiment names are randomly generated and in practice the number of matches within a workspace is going to be very low. We might never encounter an issue with this and if we do, we can always tackle 2, 3 or both.

Regarding 2, I've refactored the services and resources a bit and encapsulated the streaming logic in a Streamer class. I think the separation of responsibilities and reusability of code has improved a lot.

With this, adding new streaming endpoints is much more straight forward. Let me know what you think.

@thiagohora
Copy link
Contributor

  1. By using the current limit param: this wouldn't work in some cases, as we want to limit by the items.

Perfect. My concern was more with having a very large list of experimentIds passed to the second. But I agree this may not be an issue any time soon

@andrescrz andrescrz merged commit 90f5f49 into main Sep 20, 2024
7 checks passed
@andrescrz andrescrz deleted the andrescrz/OPIK-76-add-stream-experiment-items-endpoint branch September 20, 2024 14:04
@andrescrz
Copy link
Collaborator Author

andrescrz commented Sep 20, 2024

  1. By using the current limit param: this wouldn't work in some cases, as we want to limit by the items.

Perfect. My concern was more with having a very large list of experimentIds passed to the second. But I agree this may not be an issue any time soon

Yeah, I didn't explain myself well in my previous message, but that's exactly what I meant. We would also need to limit the number of retrieved experiments (and their IDs), in addition to the current limit for items.

But it's unlikely to be an issue for a very long period of time, so I propose to not to do it for now.

@thiagohora
Copy link
Contributor

Agree! An option would be to do the join or a subquery in the IN clause. But can discuss about it later, when if needed

dsblank pushed a commit that referenced this pull request Oct 4, 2024
* OPIK-76 Add stream experiment items endpoint

* Rev2: created Streamer class
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants