[BUG] Does llama minicpmv-cli have a memory leak? #703

Liwx1014 · 2024-12-30T07:31:02Z

Is there an existing issue / discussion for this?

I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

I have searched the FAQ

Current Behavior

llama-cpp-python version: 0.2.90
My core code:

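```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import MiniCPMv26ChatHandler


# Load the quantized model
def get_model(mmp_model, Q_model):
    # The chat handler loads the CLIP/mmproj model used to embed images
    chat_handler = MiniCPMv26ChatHandler(clip_model_path=mmp_model, verbose=False)
    llm = Llama(
        n_gpu_layers=-1,  # offload all layers to the GPU
        model_path=Q_model,
        chat_handler=chat_handler,
        n_ctx=1024,
        # draft_model=True
    )
    return llm


# Excerpt from a class method: `settings` and `msgs` are defined elsewhere
# get model
self.llm = get_model(settings.MMP_MODEL, settings.Q_MODEL)
# infer
result = self.llm.create_chat_completion(
    max_tokens=20,
    stop=['。'],
    messages=msgs,
)
```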
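Memory leaks on every call, in llama_chat_format.py.

I tried manually freeing the image embedding, but the leak persisted.
Next, I built minicpmv-cli from llama.cpp and modified part of its code so that it runs inference repeatedly, to check whether it also leaks memory. However, a problem occurred after roughly 100 runs: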
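The repeated-run check described above can also be done from Python. The following is a minimal sketch, not part of the original report: the helper names `current_rss_bytes` and `rss_growth` are hypothetical, and reading `/proc/self/statm` assumes Linux (the reporter is on Ubuntu 20.04). It measures resident host memory only, so GPU memory used by `n_gpu_layers=-1` is not included.

```python
import gc
import os


def current_rss_bytes() -> int:
    """Resident set size of this process, read from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])  # 2nd field = resident pages
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def rss_growth(fn, iterations=100, warmup=5):
    """Call fn() repeatedly and return RSS growth in bytes after a warm-up phase.

    Warm-up calls are excluded so one-time allocations (caches, lazy
    initialization) are not mistaken for a per-call leak.
    """
    for _ in range(warmup):
        fn()
    gc.collect()  # drop unreachable Python objects before the baseline
    baseline = current_rss_bytes()
    for _ in range(iterations):
        fn()
    gc.collect()
    return current_rss_bytes() - baseline
```

For example, passing `lambda: llm.create_chat_completion(messages=msgs, max_tokens=20)` as `fn`, with `llm` and `msgs` as in the snippet above, would show whether host memory keeps growing roughly in proportion to the number of calls, which points to a per-call leak rather than a one-time allocation.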

Expected Behavior

Is this a llama.cpp issue?

Steps To Reproduce

No response

Environment

- OS:Ubuntu 20.04
- Python:3.10
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):12.2

Anything else?

No response
