Hi,
I am sending requests to a llama-server using the OpenAI API. I also wrote the code in PyTorch without a server to compare the results. I noticed that in the first case the text generation does not stop after giving an answer and keeps telling me about climate change. When running the corresponding PyTorch code, the generation stops appropriately and the quality of the answer is much better.
This is a behaviour I would expect if there were an issue with the chat_template, but I am using the exact same format I found in the examples. This is the code in PyTorch:
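A minimal sketch of that path (the transformers-based setup, the model id, and the placeholder prompts below are assumptions, not the exact original snippet):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "system", "content": ""},
            {"role": "user", "content": ""}]

# apply_chat_template inserts the Llama 3 special tokens
# (<|start_header_id|>, <|eot_id|>, ...) automatically
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
    temperature=0.01,
    do_sample=True,
    # Llama 3 ends each turn with <|eot_id|>, so treat it as an EOS token
    eos_token_id=[tokenizer.eos_token_id,
                  tokenizer.convert_tokens_to_ids("<|eot_id|>")],
)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))

and this is the code using the OpenAI API: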
import openai

def stream_document():
    # OpenAI client setup, pointed at the local llama-server
    client = openai.OpenAI(
        base_url="",  # API server URL, e.g. "http://localhost:8080/v1"
        api_key="sk-no-key-required"
    )
    user_prompt = ""
    messages = [
        {"role": "system", "content": ""},
        {"role": "user", "content": user_prompt},
    ]
    response = client.chat.completions.create(
        # model="gpt-3.5-turbo",
        model="Llama3",
        messages=messages,
        stream=True,  # enable streaming
        temperature=0.01,
        max_completion_tokens=1024,
    )
    # Process each chunk of data as it comes in
    for chunk in response:
        # Each chunk carries a list of choices
        for choice in chunk.choices:
            # The delta holds the newly generated text
            if choice.delta and choice.delta.content:
                print(choice.delta.content, end='', flush=True)
    print("\nStream finished.")
Is there some way to pass special tokens or specify the chat template through the client?
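(For reference: with llama.cpp's llama-server the chat template is selected server-side rather than through the client, and stop sequences can be sent with each request. A minimal sketch of both, assuming a llama.cpp llama-server; the model file name is a placeholder:)

# Server side: select a built-in chat template when starting llama-server
#   ./llama-server -m model.gguf --chat-template llama3
# Client side: pass stop sequences with the request
response = client.chat.completions.create(
    model="Llama3",
    messages=messages,
    stream=True,
    temperature=0.01,
    max_completion_tokens=1024,
    stop=["<|eot_id|>"],  # halt generation at Llama 3's end-of-turn token
)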
Regards