We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
llama-cpp-python 版本号 0.2.90 我的核心代码
def get_model(mmp_model, Q_model): chat_handler = MiniCPMv26ChatHandler(clip_model_path=mmp_model, verbose=False) llm = Llama( n_gpu_layers=-1, model_path=Q_model, chat_handler=chat_handler, n_ctx=1024, #draft_model=True
) return llm
#get model self.llm = get_model(settings.MMP_MODEL, settings.Q_MODEL) #infer result = self.llm.create_chat_completion( max_tokens=20, stop=['。'], messages=msgs ) ######################################################### 每次调用都会出现内存泄露,llama_chat_format.py 我手动释放image_emded 还是不行 后来,我使用llama.cpp 编译出 minicpmv-cli ,我更改部分代码,想执行多次,看是否有内存泄露问题,但是执行到100次左右出现问题:
这是llama.cpp的问题吗?
No response
- OS:Ubuntu 20.04 - Python:3.10 - Transformers: - PyTorch: - CUDA (`python -c 'import torch; print(torch.version.cuda)'`):12.2
The text was updated successfully, but these errors were encountered:
No branches or pull requests
是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?
该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?
当前行为 | Current Behavior
llama-cpp-python 版本号 0.2.90
我的核心代码
获取量化模型
def get_model(mmp_model, Q_model):
chat_handler = MiniCPMv26ChatHandler(clip_model_path=mmp_model, verbose=False)
llm = Llama(
n_gpu_layers=-1,
model_path=Q_model,
chat_handler=chat_handler,
n_ctx=1024,
#draft_model=True
#get model
self.llm = get_model(settings.MMP_MODEL, settings.Q_MODEL)
#infer
result = self.llm.create_chat_completion(
max_tokens=20,
stop=['。'],
messages=msgs
)
#########################################################
每次调用都会出现内存泄露,llama_chat_format.py
我手动释放image_emded 还是不行
后来,我使用llama.cpp 编译出 minicpmv-cli ,我更改部分代码,想执行多次,看是否有内存泄露问题,但是执行到100次左右出现问题:
期望行为 | Expected Behavior
这是llama.cpp的问题吗?
复现方法 | Steps To Reproduce
No response
运行环境 | Environment
备注 | Anything else?
No response
The text was updated successfully, but these errors were encountered: