TTFX optimization of aigenerate("whats the meaning of life"; model) #236

Sixzero · 2024-11-18T16:18:55Z

Recently I decided to cut down on TTFX of EasyContext.jl and realized that TTFX of PromptingTools needs to improve a lot:

time julia -e 'using PromptingTools; @time ai"Hi there"gpt4om;'
[ Info: Tokens: 28 @ Cost: $0.0 in 6.8 seconds
  7.875881 seconds (11.58 M allocations: 785.347 MiB, 5.01% gc time, 99.61% compilation time)
julia -e 'using PromptingTools; @time ai"Hi there"gpt4om;'  8.28s user 0.73s system 106% cpu 8.469 total

Correct me if I did something wrong here.

The text was updated successfully, but these errors were encountered:

Sixzero · 2024-11-18T16:20:15Z

I wonder if we could somehow bring it down 0.3 seconds, what is the time for ai"Hello"echo in the precompilation.jl .

svilupp · 2024-11-18T19:42:15Z

There is already mocking like this: https://github.com/svilupp/PromptingTools.jl/blob/main/src/precompilation.jl

It seems that the majority of the time is spent on the HTTP call (as per our Slack chat), so we would need to make sure the right HTTP paths get precompiled, perhaps with a mock server to make sure the HTTP stack gets called.

Did you manage to isolate how much is the compilation vs the API request itself?

Sixzero · 2024-11-19T10:06:07Z

JuliaWeb/HTTP.jl#1194

Yes, it looks like HTTP.jl takes up 6 seconds and 0.3-0.4 on PromptingTools, so hopefully we will have a solution for this issue, it seems surreal, I hope I am just missing here something.

Sixzero · 2024-11-27T22:51:43Z

It got fixed in HTTP.jl

svilupp · 2024-11-28T07:49:42Z

FYI.
It was fixed here: JuliaWeb/HTTP.jl#1201

It's in the patch release 1.10.12, so set the dep if you want to enforce it.

Sixzero changed the title ~~TTFX optimization of aigenerate("text"; model)~~ TTFX optimization of aigenerate("whats the meaning of life"; model) Nov 19, 2024

Sixzero closed this as completed Nov 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TTFX optimization of aigenerate("whats the meaning of life"; model) #236

TTFX optimization of aigenerate("whats the meaning of life"; model) #236

Sixzero commented Nov 18, 2024

Sixzero commented Nov 18, 2024 •

edited

Loading

svilupp commented Nov 18, 2024

Sixzero commented Nov 19, 2024

Sixzero commented Nov 27, 2024

svilupp commented Nov 28, 2024

TTFX optimization of aigenerate("whats the meaning of life"; model) #236

TTFX optimization of aigenerate("whats the meaning of life"; model) #236

Comments

Sixzero commented Nov 18, 2024

Sixzero commented Nov 18, 2024 • edited Loading

svilupp commented Nov 18, 2024

Sixzero commented Nov 19, 2024

Sixzero commented Nov 27, 2024

svilupp commented Nov 28, 2024

Sixzero commented Nov 18, 2024 •

edited

Loading