
How do I load huggingface models? #1719

Open
Zoher15 opened this issue Oct 29, 2024 · 15 comments

Comments

@Zoher15

Zoher15 commented Oct 29, 2024

Hi,

There seem to be some big changes, and I cannot find a single example showing how to load the Hugging Face models I was previously using with HF.model. Also, the DSPy AI tool is broken and no longer able to help.

Best,
Zoher

@okhat
Collaborator

okhat commented Oct 30, 2024

Hey @Zoher15 , you should install SGLang (if you have a GPU) or Ollama (if you don't have a GPU).

Follow the instructions here: https://dspy-docs.vercel.app/building-blocks/1-language_models/?h=using+locally+hosted+lms#using-locally-hosted-lms
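For anyone following along, this is roughly what that setup looks like. The model name, port, and launch flags below are illustrative, not prescriptive; check the SGLang and DSPy docs for the exact commands for your setup.

# In a separate shell, launch the SGLang server first, e.g.:
#   python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --port 7501
import dspy

# The "openai/" prefix routes the call through LiteLLM's OpenAI-compatible client,
# pointed at the local SGLang endpoint.
lm = dspy.LM(
    "openai/meta-llama/Meta-Llama-3-8B-Instruct",
    api_base="http://localhost:7501/v1",
)
dspy.configure(lm=lm)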

@Zoher15
Author

Zoher15 commented Oct 30, 2024

Hi @okhat,

I did go through that example. I was not aware of SGLang, so it seems that to use Hugging Face models on my GPU, I would need to figure out SGLang first? Is there some advantage to SGLang over HF that I'm missing?

Best,
Zoher

@okhat
Collaborator

okhat commented Oct 30, 2024

Yes, you need a server-client architecture with good batching to get acceptable speed with local models. Otherwise, evaluation and optimization will have to be single-threaded and hence extremely slow.
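Concretely, the server handles the batching while DSPy fans out requests from multiple threads. A minimal sketch of a threaded evaluation, where the program, devset, and metric are placeholders to swap for your own:

import dspy
from dspy.evaluate import Evaluate

# Placeholders: substitute your own program, devset, and metric.
program = dspy.Predict("question -> answer")
devset = [dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")]

def exact_match(example, pred, trace=None):
    return example.answer.strip().lower() == pred.answer.strip().lower()

# num_threads only pays off if the backing server can batch concurrent requests;
# against a plain in-process HF Transformers model these calls would serialize.
evaluate = Evaluate(devset=devset, metric=exact_match, num_threads=16, display_progress=True)
evaluate(program)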

@okhat
Collaborator

okhat commented Oct 30, 2024

You don’t need to figure out anything per se. Just follow the 3-4 instructions there and let me know if you face any issues.

@dzimmerman-nci

dzimmerman-nci commented Oct 31, 2024

@okhat so are local, non-server-client HF models no longer going to be supported at all going forward?

@okhat
Collaborator

okhat commented Nov 1, 2024

@dzimmerman-nci We will experiment with things like SGLang's Engine, which is not server-client. But standard HF Transformers without additional batching or serving infrastructure are not appropriate for DSPy, or really for any library targeted at using LMs at inference time.

@Zoher15
Author

Zoher15 commented Nov 1, 2024

> Hey @Zoher15 , you should install SGLang (if you have a GPU) or Ollama (if you don't have a GPU).
>
> Follow the instructions here: https://dspy-docs.vercel.app/building-blocks/1-language_models/?h=using+locally+hosted+lms#using-locally-hosted-lms

So I followed the steps, and with:

sglang_port = 7501
sglang_url = f"http://localhost:{sglang_port}/v1"
model = dspy.LM("openai/meta-llama/Meta-Llama-3-8B-Instruct", api_base=sglang_url, model_type='text')
dspy.configure(lm=model)

I receive an error about the OpenAI API key. Is this supposed to happen? The model is up and running.

LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

Traceback (most recent call last):
  File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/litellm/llms/OpenAI/openai.py", line 1625, in completion
    openai_client = OpenAI(
  File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/openai/_client.py", line 105, in __init__
    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/litellm/main.py", line 1346, in completion
    _response = openai_text_completions.completion(
  File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/litellm/llms/OpenAI/openai.py", line 1660, in completion
    raise OpenAIError(
litellm.llms.OpenAI.openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

@arnavsinghvi11
Collaborator

Hi @Zoher15, I believe configuring your Hugging Face API token via `huggingface-cli login` or `export HUGGINGFACEHUB_API_TOKEN=your_api_token` resolves this. Let me know if that doesn't work.

@Zoher15
Author

Zoher15 commented Nov 1, 2024

I did the login. The only way I resolved it was by setting the OpenAI token.

@arnavsinghvi11
Collaborator

Ah, I see. Let me update that in the docs. To clarify, you just needed to set the api_key variable, but you can pass in an empty string, right?

@Zoher15
Author

Zoher15 commented Nov 1, 2024

Yes, you are right.
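For anyone else hitting this, the working configuration is the snippet above with an explicit empty api_key (adjust the model and port for your own server):

import dspy

sglang_port = 7501
sglang_url = f"http://localhost:{sglang_port}/v1"

# api_key="" satisfies the OpenAI client that LiteLLM constructs under the hood;
# no real key is needed for a locally hosted server.
model = dspy.LM(
    "openai/meta-llama/Meta-Llama-3-8B-Instruct",
    api_base=sglang_url,
    api_key="",
    model_type="text",
)
dspy.configure(lm=model)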

@Zoher15
Author

Zoher15 commented Nov 1, 2024

Overall, SGLang seems fast, but I have to figure out a lot about it to get it running the way HF was running before. I don't know what floating-point precision the weights are loaded in. Even in 'text' mode it is using the user-assistant template, which I would like to get rid of. The transition is not as easy as just following three steps.

@okhat
Collaborator

okhat commented Nov 4, 2024

@Zoher15 Even in the text mode it's using the user-assistant template? That sounds different from what I'd like. Can you share more about how you identified this?

@Zoher15
Author

Zoher15 commented Nov 4, 2024

I used few-shot prompting and loaded some hand-created examples. This is the template in text mode (model history). I am assuming this is how OpenAI's API processes it, so SGLang is reusing it for Hugging Face, incorrectly assuming that all Hugging Face models are instruction-tuned with the same template:

User message:

[[ ## question ## ]]
Is the following sentence plausible? "Steven Stamkos hit the slant pass."

Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Assistant message:

[[ ## answer ## ]]
No

[[ ## completed ## ]]


User message:

[[ ## question ## ]]
Is the following sentence plausible? "Carlos Correa threw to first base"

Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## answer ## ]]
Yes

[[ ## completed ## ]]

@okhat
Collaborator

okhat commented Nov 5, 2024

@Zoher15 Not necessarily, this is just how DSPy's inspect_history prints things.

If you pass model_type="text", the model gets one string that concatenates the contents of the "messages" above into one blurb.

That said, I see a few action items here:

  • Handling model_type may need to happen at the Adapter level, perhaps in BaseAdapter.
  • Inspect history needs to be aware of that, so it shows things in a way that doesn't confuse users.
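In the meantime, a quick way to compare the pretty-printed history with what was actually sent (a sketch; the exact fields on the LM's history may differ across DSPy versions):

import dspy

# Pretty-printed view of the last call; the user/assistant roles shown here are a
# display convention and don't necessarily mirror the raw request.
dspy.inspect_history(n=1)

# Raw record of the last request. With model_type="text", this should contain the
# single concatenated prompt string rather than a chat-style messages list.
lm = dspy.settings.lm
print(lm.history[-1])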
