
How do I load huggingface models? #1719

Open
Zoher15 opened this issue Oct 29, 2024 · 15 comments

Comments

@Zoher15

Zoher15 commented Oct 29, 2024

Hi,

There seem to be some big changes, and I cannot find a single example showing how to load the Hugging Face models I was previously using with HF.model. Also, the DSPy AI tool is broken and no longer able to help.

Best,
Zoher

@okhat
Collaborator

okhat commented Oct 30, 2024

Hey @Zoher15 , you should install SGLang (if you have a GPU) or Ollama (if you don't have a GPU).

Follow the instructions here: https://dspy-docs.vercel.app/building-blocks/1-language_models/?h=using+locally+hosted+lms#using-locally-hosted-lms
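For anyone following along, this is roughly what that setup looks like. The model name, port, and launch flags below are illustrative, not prescriptive; check the SGLang and DSPy docs for the exact commands for your setup.

# In a separate shell, launch the SGLang server first, e.g.:
#   python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --port 7501
import dspy

# The "openai/" prefix routes the call through LiteLLM's OpenAI-compatible client,
# pointed at the local SGLang endpoint.
lm = dspy.LM(
    "openai/meta-llama/Meta-Llama-3-8B-Instruct",
    api_base="http://localhost:7501/v1",
)
dspy.configure(lm=lm)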

@Zoher15
Author

Zoher15 commented Oct 30, 2024

Hi @okhat,

I did go through that example. I was not aware of SGLang, so it seems that to use Hugging Face models on my GPU, I would need to figure out SGLang first? Is there some advantage to SGLang over HF that I'm missing?

Best,
Zoher

@okhat
Collaborator

okhat commented Oct 30, 2024

Yes, you need a server-client architecture with good batching to get acceptable speed with local models. Otherwise, evaluation and optimization will have to be single-threaded and hence extremely slow.
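Concretely, the server handles the batching while DSPy fans out requests from multiple threads. A minimal sketch of a threaded evaluation, where the program, devset, and metric are placeholders to swap for your own:

import dspy
from dspy.evaluate import Evaluate

# Placeholders: substitute your own program, devset, and metric.
program = dspy.Predict("question -> answer")
devset = [dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")]

def exact_match(example, pred, trace=None):
    return example.answer.strip().lower() == pred.answer.strip().lower()

# num_threads only pays off if the backing server can batch concurrent requests;
# against a plain in-process HF Transformers model these calls would serialize.
evaluate = Evaluate(devset=devset, metric=exact_match, num_threads=16, display_progress=True)
evaluate(program)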

@okhat
Collaborator

okhat commented Oct 30, 2024

You don’t need to figure out anything per se. Just follow the 3-4 instructions there and let me know if you face any issues.

@dzimmerman-nci

dzimmerman-nci commented Oct 31, 2024

@okhat so are local, non-server-client HF models no longer going to be supported at all going forward?

@okhat
Collaborator

okhat commented Nov 1, 2024

@dzimmerman-nci We will experiment with things like SGLang's Engine, which is not server-client. But standard HF Transformers without additional batching or serving infrastructure are not appropriate for DSPy, or really for any library targeted at using LMs at inference time.

@Zoher15
Author

Zoher15 commented Nov 1, 2024

> Hey @Zoher15 , you should install SGLang (if you have a GPU) or Ollama (if you don't have a GPU).
>
> Follow the instructions here: https://dspy-docs.vercel.app/building-blocks/1-language_models/?h=using+locally+hosted+lms#using-locally-hosted-lms

So I followed the steps, and with:

sglang_port = 7501
sglang_url = f"http://localhost:{sglang_port}/v1"
model = dspy.LM("openai/meta-llama/Meta-Llama-3-8B-Instruct", api_base=sglang_url, model_type='text')
dspy.configure(lm=model)

I receive an error about the OpenAI API key. Is this supposed to happen? The model is up and running.

LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

Traceback (most recent call last):
  File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/litellm/llms/OpenAI/openai.py", line 1625, in completion
    openai_client = OpenAI(
  File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/openai/_client.py", line 105, in __init__
    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/litellm/main.py", line 1346, in completion
    _response = openai_text_completions.completion(
  File "/data/zkachwal/miniconda3/envs/moderation/lib/python3.10/site-packages/litellm/llms/OpenAI/openai.py", line 1660, in completion
    raise OpenAIError(
litellm.llms.OpenAI.openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

@arnavsinghvi11
Collaborator

Hi @Zoher15, I believe configuring your Hugging Face API token via `huggingface-cli login` or `export HUGGINGFACEHUB_API_TOKEN=your_api_token` resolves this. Let me know if that doesn't work.

@Zoher15
Author

Zoher15 commented Nov 1, 2024

I did the login. The only way I resolved it was by setting the OpenAI token.

@arnavsinghvi11
Collaborator

Ah, I see. Let me update that in the docs. To clarify, you just needed to set the api_key variable, but you can pass in an empty string, right?

@Zoher15
Author

Zoher15 commented Nov 1, 2024

Yes, you are right.
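For anyone else hitting this, the working configuration is the snippet above with an explicit empty api_key (adjust the model and port for your own server):

import dspy

sglang_port = 7501
sglang_url = f"http://localhost:{sglang_port}/v1"

# api_key="" satisfies the OpenAI client that LiteLLM constructs under the hood;
# no real key is needed for a locally hosted server.
model = dspy.LM(
    "openai/meta-llama/Meta-Llama-3-8B-Instruct",
    api_base=sglang_url,
    api_key="",
    model_type="text",
)
dspy.configure(lm=model)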

@Zoher15
Author

Zoher15 commented Nov 1, 2024

Overall, SGLang seems fast, but I have to figure out a lot about it to get it running the way HF was running before. I don't know what floating-point precision the weights are loaded in. Even in 'text' mode it is using the user-assistant template, which I would like to get rid of. The transition is not as easy as just following three steps.

@okhat
Collaborator

okhat commented Nov 4, 2024

@Zoher15 Even in the text mode it's using the user-assistant template? That sounds different from what I'd like. Can you share more about how you identified this?

@Zoher15
Author

Zoher15 commented Nov 4, 2024

I used few-shot prompting and loaded some hand-created examples. This is the template in text mode (model history). I am assuming this is how OpenAI's API processes it, so SGLang is reusing it for Hugging Face, incorrectly assuming that all Hugging Face models are instruction-tuned with the same template:

User message:

[[ ## question ## ]]
Is the following sentence plausible? "Steven Stamkos hit the slant pass."

Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Assistant message:

[[ ## answer ## ]]
No

[[ ## completed ## ]]


User message:

[[ ## question ## ]]
Is the following sentence plausible? "Carlos Correa threw to first base"

Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## answer ## ]]
Yes

[[ ## completed ## ]]

@okhat
Collaborator

okhat commented Nov 5, 2024

@Zoher15 Not necessarily, this is just how DSPy's inspect_history prints things.

If you pass model_type="text", the model gets one string that concatenates the contents of the "messages" above into one blurb.

That said, I see a few action items here:

  • Handling model_type may need to happen at the Adapter level, perhaps in BaseAdapter.
  • Inspect history needs to be aware of that, so it shows things in a way that doesn't confuse users.
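In the meantime, a quick way to compare the pretty-printed history with what was actually sent (a sketch; the exact fields on the LM's history may differ across DSPy versions):

import dspy

# Pretty-printed view of the last call; the user/assistant roles shown here are a
# display convention and don't necessarily mirror the raw request.
dspy.inspect_history(n=1)

# Raw record of the last request. With model_type="text", this should contain the
# single concatenated prompt string rather than a chat-style messages list.
lm = dspy.settings.lm
print(lm.history[-1])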
