How do I load huggingface models? #1719
Comments
Hey @Zoher15, you should install SGLang (if you have a GPU) or Ollama (if you don't have a GPU). Follow the instructions here: https://dspy-docs.vercel.app/building-blocks/1-language_models/?h=using+locally+hosted+lms#using-locally-hosted-lms
Hi @okhat, I did go through that example. I was not aware of SGLang, so it seems that to use Hugging Face models on my GPU, I would need to figure out SGLang first? Is there some advantage to SGLang over HF that I'm missing? Best,
Yes, you need a server-client architecture with good batching to get acceptable speed with local models. Otherwise evaluation and optimization will have to be single-threaded and hence extremely slow.
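To illustrate the point about single-threaded evaluation, here is a minimal sketch using only the Python standard library. `fake_lm_call` is a hypothetical stand-in for a request to a local model server (it just sleeps), and the latency and thread count are made-up values; the sketch only shows why issuing requests concurrently against a server tends to beat calling a model one prompt at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_lm_call(prompt, latency=0.05):
    # Hypothetical stand-in for an HTTP request to a local model server.
    time.sleep(latency)
    return f"answer to: {prompt}"


def run_sequential(prompts):
    # One request at a time, as with an in-process model and no batching.
    return [fake_lm_call(p) for p in prompts]


def run_concurrent(prompts, num_threads=8):
    # Many requests in flight at once; a batching server can serve them together.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(fake_lm_call, prompts))


prompts = [f"question {i}" for i in range(16)]

start = time.perf_counter()
sequential_results = run_sequential(prompts)
sequential_time = time.perf_counter() - start

start = time.perf_counter()
concurrent_results = run_concurrent(prompts)
concurrent_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

Both runs return the same answers in the same order; only the wall-clock time differs.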
You don’t need to figure out anything per se. Just follow the 3-4 instructions there and let me know if you face any issues.
@okhat So are local, non-server-client HF models no longer going to be supported at all going forward?
@dzimmerman-nci We will experiment with things like SGLang's
So I followed the steps and for:
I receive an error about the OpenAI API key. Is this supposed to happen? The model is up and running.
Hi @Zoher15, I believe configuring your HuggingFace API Token via
I did the login. The only way I could resolve it was by setting the OpenAI token.
Ah, I see. Let me update that in the docs. To clarify, you just needed to set the api_key variable, but you can pass in an empty string, right?
Yes, you are right.
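For anyone landing here, the resolution above can be sketched roughly as follows. This is a sketch under assumptions, not the documented setup: the helper names `local_lm_kwargs` and `connect` are hypothetical, the port `7501` and the model ID are placeholders for whatever your local server was launched with, and the `openai/` prefix assumes the server exposes an OpenAI-compatible endpoint. The one detail taken from this thread is that `api_key` must be set, even if only to an empty string.

```python
def local_lm_kwargs(hf_model_id, port=7501, model_type="chat"):
    """Build keyword arguments for dspy.LM pointing at a local server.

    Hypothetical helper: the port and URL layout are assumptions about
    how the local OpenAI-compatible server was started.
    """
    return {
        "model": f"openai/{hf_model_id}",
        "api_base": f"http://localhost:{port}/v1",
        "api_key": "",  # must be set, but an empty string is enough (per this thread)
        "model_type": model_type,
    }


def connect(hf_model_id, **overrides):
    # dspy is imported lazily so the helper above works without it installed.
    import dspy

    lm = dspy.LM(**local_lm_kwargs(hf_model_id, **overrides))
    dspy.configure(lm=lm)
    return lm


# Placeholder model ID; use the one your server is actually serving.
kwargs = local_lm_kwargs("meta-llama/Meta-Llama-3-8B-Instruct")
print(kwargs)
```

With a server running, `connect("meta-llama/Meta-Llama-3-8B-Instruct")` would create and register the LM; passing `model_type="text"` requests text-completion mode instead of chat.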
Overall, SGLang seems fast, but I have to figure out a lot about it to get it to run the way HF was running before. I don't know what floating-point precision it loads the weights in. Even in 'text' mode it uses the user-assistant template, which I would like to get rid of. The transition is not as easy as just following three steps.
@Zoher15 Even in the text mode it's using the user-assistant template? That sounds different from what I'd like. Can you share more about how you identified this?
I used few-shot prompting with some hand-created examples. This is the template in text mode (from the model history). I am assuming this is how OpenAI's API processes it, so SGLang is reusing it for Hugging Face models, incorrectly assuming that all Hugging Face models are instruction-tuned with the same template:
@Zoher15 Not necessarily, this is just how DSPy's inspect_history prints things. If you pass That said, I see a few action items here:
Hi,
There seem to have been some big changes, and I cannot find a single example that shows me how to load the Hugging Face models that I was previously using with HF.model. Also, the DSPy AI tool is broken and no longer able to help.
Best,
Zoher