I have gone through the notebooks but was not able to stream tokens from TensorRT-LLM.
Here's the issue:
Code used:
from langchain_nvidia_trt.llms import TritonTensorRTLLM
import time
import random

triton_url = "localhost:8001"
pload = {
    'tokens': 300,
    'server_url': triton_url,
    'model_name': "ensemble",
    'temperature': 1.0,
    'top_k': 1,
    'top_p': 0,
    'beam_width': 1,
    'repetition_penalty': 1.0,
    'length_penalty': 1.0
}
client = TritonTensorRTLLM(**pload)

LLAMA_PROMPT_TEMPLATE = (
    "<s>[INST] <<SYS>>"
    "{system_prompt}"
    "<</SYS>>"
    "[/INST] {context} </s><s>[INST] {question} [/INST]"
)

system_prompt = "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Please ensure that your responses are positive in nature."
context = ""
question = 'What is the fastest land animal?'
prompt = LLAMA_PROMPT_TEMPLATE.format(system_prompt=system_prompt, context=context, question=question)

start_time = time.time()
tokens_generated = 0
for val in client._stream(prompt):
    tokens_generated += 1
    print(val, end="", flush=True)
total_time = time.time() - start_time
print(f"\n--- Generated {tokens_generated} tokens in {total_time} seconds ---")
print(f"--- {tokens_generated/total_time} tokens/sec")
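To check that the timing loop itself is not the problem, the same measurement can be run against a stand-in generator that needs no Triton server. This is only a minimal sketch: `fake_stream` and `measure_stream` are hypothetical helper names made up for illustration, not part of `langchain_nvidia_trt`, and the stand-in yields whitespace-separated words rather than real model tokens.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_stream(text: str, delay: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in for a streaming client: yields one word at a time."""
    for word in text.split():
        if delay:
            time.sleep(delay)  # simulate per-chunk latency
        yield word + " "


def measure_stream(chunks: Iterable[str]) -> Tuple[str, int, float]:
    """Consume a stream, echoing each chunk as it arrives.

    Returns (full_text, chunk_count, elapsed_seconds).
    """
    start = time.perf_counter()
    parts = []
    for chunk in chunks:
        parts.append(chunk)
        print(chunk, end="", flush=True)
    elapsed = time.perf_counter() - start
    return "".join(parts), len(parts), elapsed


if __name__ == "__main__":
    stream = fake_stream("The cheetah is the fastest land animal.", delay=0.05)
    text, count, elapsed = measure_stream(stream)
    # Guard against a zero elapsed time on very fast streams.
    rate = count / elapsed if elapsed > 0 else float("inf")
    print(f"\n--- Generated {count} chunks in {elapsed:.3f} seconds ---")
    print(f"--- {rate:.1f} chunks/sec")
```

If the words appear one by one here but the Triton-backed loop prints everything at once (or nothing), the buffering is likely on the client or server side rather than in the loop.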