TTFX optimization of aigenerate("whats the meaning of life"; model) #236

Closed
Sixzero opened this issue Nov 18, 2024 · 5 comments

@Sixzero
Collaborator

Sixzero commented Nov 18, 2024

Recently I decided to cut down on TTFX of EasyContext.jl and realized that TTFX of PromptingTools needs to improve a lot:

time julia -e 'using PromptingTools; @time ai"Hi there"gpt4om;'
[ Info: Tokens: 28 @ Cost: $0.0 in 6.8 seconds
  7.875881 seconds (11.58 M allocations: 785.347 MiB, 5.01% gc time, 99.61% compilation time)
julia -e 'using PromptingTools; @time ai"Hi there"gpt4om;'  8.28s user 0.73s system 106% cpu 8.469 total

Correct me if I did something wrong here.

@Sixzero
Collaborator Author

Sixzero commented Nov 18, 2024

I wonder if we could somehow bring it down to 0.3 seconds, which is the time for ai"Hello"echo in precompilation.jl.

@svilupp
Owner

svilupp commented Nov 18, 2024

There is already mocking like this: https://github.com/svilupp/PromptingTools.jl/blob/main/src/precompilation.jl

It seems that the majority of the time is spent on the HTTP call (as per our Slack chat), so we would need to make sure the right HTTP paths get precompiled, perhaps with a mock server to make sure the HTTP stack gets called.

Did you manage to isolate how much is the compilation vs the API request itself?
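
One rough way to separate the two, as a minimal sketch: time the same call twice in a single session. The first call pays for JIT compilation plus the network round trip, while the second should be close to the API request alone. This assumes an API key is already configured and reuses the `gpt4om` alias from the report above; `@time` also prints the share of time spent compiling.

```julia
using PromptingTools

# First call in a fresh session: compilation + network round trip.
@time ai"Hi there"gpt4om

# Second call in the same session: the code path is already compiled,
# so this is roughly the API request time on its own.
@time ai"Hi there"gpt4om
```

The difference between the two timings is an estimate of the compilation overhead; network latency varies between calls, so treat it as approximate.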

@Sixzero
Collaborator Author

Sixzero commented Nov 19, 2024

JuliaWeb/HTTP.jl#1194

Yes, it looks like HTTP.jl takes up 6 seconds and PromptingTools 0.3-0.4 seconds. Hopefully we will have a solution for this issue; it seems surreal, and I hope I am just missing something here.

@Sixzero changed the title from "TTFX optimization of aigenerate("text"; model)" to "TTFX optimization of aigenerate("whats the meaning of life"; model)" on Nov 19, 2024
@Sixzero closed this as completed on Nov 27, 2024
@Sixzero
Collaborator Author

Sixzero commented Nov 27, 2024

It got fixed in HTTP.jl

@svilupp
Owner

svilupp commented Nov 28, 2024

FYI.
It was fixed here: JuliaWeb/HTTP.jl#1201

It's in the HTTP.jl patch release 1.10.12, so set a lower bound on the HTTP dependency if you want to enforce it.
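
For illustration, enforcing that version could look like the fragment below. It is a sketch that assumes HTTP is listed as a direct dependency under `[deps]` in your own `Project.toml`. With Pkg's default caret semantics, this entry allows any HTTP version from 1.10.12 up to, but not including, 2.0.0.

```toml
# Project.toml (fragment; assumes HTTP is already under [deps])
[compat]
HTTP = "1.10.12"
```

After editing the file, running `Pkg.update()` or `Pkg.resolve()` should pick a version that satisfies the new bound.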
