feat: just integration-otel to see traces #33
Conversation
Signed-off-by: Adrian Cole <[email protected]>
Oops, forgot to add the .env file that sets the correct variables for otel export etc. Will do when I get back to my computer.
Signed-off-by: Adrian Cole <[email protected]>
Added the .env file.
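(For illustration, not from this PR: OTLP export in the Python SDK is normally driven by the standard `OTEL_*` environment variables, which is presumably what the committed .env sets. A minimal sketch of the equivalent in code; the service name, endpoint and protocol values here are placeholders, not the actual contents of the file.)

```python
import os

# Standard OpenTelemetry SDK environment variables; the values below are
# placeholders, not the actual contents of the committed .env file.
os.environ.setdefault("OTEL_SERVICE_NAME", "goose-integration-tests")
os.environ.setdefault("OTEL_TRACES_EXPORTER", "otlp")
os.environ.setdefault("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")
os.environ.setdefault("OTEL_EXPORTER_OTLP_PROTOCOL", "http/protobuf")
```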
Oh, this looks quite helpful.
One thing this doesn't do is payloads. If desired, I can help with instrumentation (starting with openai) to add the GenAI spans, which would have the prompt and completion events, as well as tokens used etc. in metrics. That's big enough to be a separate follow-up PR.
cc @baxen for your thoughts.
P.S. I think GenAI spans are the best bang for the buck here. I can help with that at some point. It would get token and latency metrics plus spans that carry the payloads.
@codefromthecrypt Tested this out for some 403s I have been debugging on a provider. I know that the 403 response payload contains a useful error message (approximately "signature does not match payload") when I inspect it in a debugger, but it doesn't seem to get surfaced here. Is this a limitation of the TUI (all nodes are expanded in the screenshot below)?
@andrewblane Actually this is a limitation of the HTTP semantics of otel, which don't currently define request/response body logging. Incidentally, the GenAI-layer ones do, so happy to help put a layer above that which would carry the prompt/completion (as span events). Another way is to somehow patch the HTTP otel instrumentation to also log the bodies, despite that not being in the spec. Thoughts?
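(Illustrative sketch, not code from this PR: one way to do the "patch the HTTP instrumentation" option is a response hook, assuming the provider calls go through `requests` and `opentelemetry-instrumentation-requests`. The `http.response.body` attribute name is made up here; bodies are deliberately not part of the HTTP semantic conventions.)

```python
from opentelemetry.instrumentation.requests import RequestsInstrumentor

def response_hook(span, request, response):
    # Off-spec: attach small error bodies (e.g. a 403's "signature does not
    # match payload" message) to the client span so they show up in the trace.
    if response is not None and response.status_code >= 400:
        span.set_attribute("http.response.body", response.text[:2048])

RequestsInstrumentor().instrument(response_hook=response_hook)
```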
Hi @codefromthecrypt, thanks for putting up this PR! Firstly, I would like to make sure that my understanding of your changes in the PR is correct. Your changes include:
Please correct me if my understanding is incorrect. Some feedback: ❓ Currently the instrumentation can only be enabled for tests. Just wondering whether it would also be useful to enable instrumentation while using goose.
Added this to start the story in a better place: block/goose#75. With this in, any future instrumentation here, e.g. LLM spans, would be great. For those who don't know, the GenAI (aka LLM) span would be a parent to the HTTP one and look like this, but with whatever goose does:
So basically I can close this out and/or repurpose this PR for GenAI spans (for openai only, as more than that is too much for one PR). Then, when goose runs, it will have the token and prompt/completion data as well as the metadata about the underlying HTTP calls (which in otel do not include bodies). Lemme know which is preferred.
Yes, please close this PR. @baxen and I had a chat. At the moment we will use a collector to get the token usage and cost as a simple solution. In the long run, it makes sense to use OpenTelemetry trace data. When there are requirements to track more data from the HTTP calls in the future, we will consider getting them from OpenTelemetry instrumentation data. Also, just curious about the GenAI span which you showed in the image: does it require any specific handling to get these data (input_tokens, output_tokens), or are they tracked automatically? I could not find the code for tracking these data in this PR.
Gotcha and closing for now.
Yeah, basically it is scraped from the request/response data. Since the code handling openai here is custom, it would need to be reproduced. P.S. Kibana also has a "business function" of token counting which is independent of the telemetry aspect. In the near future you can expect standard dashboards which will help folks understand token spend in aggregate from telemetry. That said, products that do their own counting for UX reasons are expected to continue doing so.
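(Illustrative sketch, not code from this PR: roughly what a hand-rolled GenAI span around an OpenAI call could look like with the Python SDK, scraping token usage from the response. Attribute and event names follow the still-incubating GenAI semantic conventions and may change; the tracer scope name is made up.)

```python
import json
from openai import OpenAI
from opentelemetry import trace

tracer = trace.get_tracer("goose.providers.openai")  # hypothetical scope name
client = OpenAI()

def traced_chat(messages, model="gpt-4o-mini"):
    # The GenAI span becomes the parent of the auto-instrumented HTTP client span.
    with tracer.start_as_current_span(f"chat {model}") as span:
        span.set_attribute("gen_ai.system", "openai")
        span.set_attribute("gen_ai.request.model", model)
        # Prompt payload as a span event -- this is what plain HTTP spans lack.
        span.add_event("gen_ai.content.prompt", {"gen_ai.prompt": json.dumps(messages)})
        response = client.chat.completions.create(model=model, messages=messages)
        # Token usage is scraped from the provider response, not tracked automatically.
        span.set_attribute("gen_ai.usage.input_tokens", response.usage.prompt_tokens)
        span.set_attribute("gen_ai.usage.output_tokens", response.usage.completion_tokens)
        completion = response.choices[0].message.content
        span.add_event("gen_ai.content.completion", {"gen_ai.completion": completion})
        return completion
```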
This helps understand what's going on, as sometimes the errors that bubble up in tests aren't entirely helpful.
Note: I also looked at https://github.com/kuisathaverat/pytest_otel, which makes a single trace for all tests with subspans for the individual ones. I ran into dependency conflicts and raised an issue about it. We can do this in another PR later, should pytest_otel turn out to be a good fit for us.
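(Illustrative sketch, not pytest_otel's actual API: a conftest.py that produces a similar shape of data, one root span per session with a child span per test, assuming a tracer provider is already configured, e.g. via the OTLP environment variables sketched above.)

```python
# conftest.py (sketch)
import pytest
from opentelemetry import trace

tracer = trace.get_tracer("integration-tests")

@pytest.fixture(scope="session", autouse=True)
def _session_span():
    # Root span covering the whole test run.
    with tracer.start_as_current_span("integration test session"):
        yield

@pytest.fixture(autouse=True)
def _test_span(request, _session_span):
    # One child span per test, named after the pytest node id.
    with tracer.start_as_current_span(request.node.nodeid):
        yield
```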