Replies: 1 comment
-
The task seems to be more complicated than I thought. As a first approximation, it would be necessary to start several backends via ggml_backend_rpc_start_server(backend, endpoint.c_str(), free_mem, total_mem), which is not good; instead, ggml_backend_cuda_init / ggml_backend would need to be "adapted".
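For context, here is a minimal sketch of what the "several backends" approach could look like: one RPC server per CUDA device, each bound to its own endpoint. The port scheme (50052 + device index) and the thread-per-server layout are assumptions for illustration, not the actual rpc-server.cpp code; only ggml_backend_cuda_get_device_count, ggml_backend_cuda_get_device_memory, ggml_backend_cuda_init and ggml_backend_rpc_start_server are real ggml APIs.

```cpp
// Sketch only: one RPC server per CUDA device (assumed layout, not the real rpc-server.cpp).
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

#include "ggml-backend.h"
#include "ggml-cuda.h"
#include "ggml-rpc.h"

int main() {
    int n_devices = ggml_backend_cuda_get_device_count();
    std::vector<std::thread> servers;

    for (int dev = 0; dev < n_devices; dev++) {
        // Each device gets its own backend and its own endpoint (port scheme is made up).
        std::string endpoint = "0.0.0.0:" + std::to_string(50052 + dev);
        servers.emplace_back([dev, endpoint]() {
            ggml_backend_t backend = ggml_backend_cuda_init(dev);

            size_t free_mem  = 0;
            size_t total_mem = 0;
            ggml_backend_cuda_get_device_memory(dev, &free_mem, &total_mem);

            printf("serving device %d on %s (%zu / %zu MB free)\n",
                   dev, endpoint.c_str(),
                   free_mem / (1024 * 1024), total_mem / (1024 * 1024));

            // Blocks serving RPC requests for this single device.
            ggml_backend_rpc_start_server(backend, endpoint.c_str(), free_mem, total_mem);
        });
    }
    for (auto & t : servers) {
        t.join();
    }
    return 0;
}
```

The drawback is exactly what makes this "not good": the client would have to be pointed at every endpoint separately, one per GPU, instead of seeing the host as a single RPC device.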
-
Hello, when using RPC we need to sum the GPU memory across all video cards. When the RPC server is running there may be any number of GPUs, and we need to count the memory of all of them.
llama.cpp/examples/rpc/rpc-server.cpp, line 116 in c05e8c9
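A minimal sketch of what the reporting side of this could look like: sum free and total memory over all CUDA devices and pass the sums to ggml_backend_rpc_start_server. The helper get_total_cuda_memory is hypothetical (it is not part of rpc-server.cpp); ggml_backend_cuda_get_device_count and ggml_backend_cuda_get_device_memory are real ggml APIs.

```cpp
// Sketch only: report the summed memory of all CUDA devices.
#include <cstddef>

#include "ggml-cuda.h"

// Hypothetical helper, not part of rpc-server.cpp.
static void get_total_cuda_memory(size_t * free_mem, size_t * total_mem) {
    *free_mem  = 0;
    *total_mem = 0;
    int n_devices = ggml_backend_cuda_get_device_count();
    for (int dev = 0; dev < n_devices; dev++) {
        size_t dev_free  = 0;
        size_t dev_total = 0;
        ggml_backend_cuda_get_device_memory(dev, &dev_free, &dev_total);
        *free_mem  += dev_free;
        *total_mem += dev_total;
    }
}
```

These sums could then be passed to ggml_backend_rpc_start_server(backend, endpoint.c_str(), free_mem, total_mem). Note this only changes the advertised memory; as the reply above points out, the single backend would still allocate on one device, so the backend itself would have to be adapted for a real multi-GPU solution.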