-
I don't think anyone is working on YaRN long context for Qwen models. I'm not very familiar with the concept, and I personally find the 32k context enough for my needs. But there is no reason not to support it if it works, so feel free to add whatever changes are necessary.
-
I notice a lot of (I believe) fairly new changes related to Qwen models in convert_hf_to_gguf.py -- originally I thought they were probably in answer to bartowski's question, but the work appears incomplete.
I don't see any plan or public detail to align the work towards, which makes it hard to contribute. I was actually going to try to knock out bartowski's request after completing some work on a project of mine (llama-gguf-optimize), which I've now done, but as I look to begin, I see these changes that I wasn't expecting.
This was added with the MiniCPM support about 14 hours ago -- so not terribly long ago, and maybe there's something I am missing; I could use some help in finding it. Is there any plan to add support for YaRN long context, as described in my comment on bartowski's earlier post?
cc: @JFLFY2255