You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
If I turn off streaming, it works.
Here is an acceptable response (generated without streaming):
If I use bartowski_Cydonia-22B-v1-GGUF/Cydonia-22B-v1-Q6_K_L.gguf instead of Mistral-Nemo-12B-Instruct-2407-Q4_K_S.gguf, it works, with streaming turned on.
Additional information from original reporting user:
Some models have missing letters and some do not.
I think that missing letters tend to occur on Mistral-type models.
There seems to be a certain rule for missing letters, and the letter after “ン” in katakana tends to be missing.
(which is to say, U+30F3 KATAKANA LETTER N may be more like the "cause" of the issue, and the missing U+30B4 KATAKANA LETTER GO more like a specific example of it.
For what it's worth, when the model responds with the same word in hiragana りんご it works fine under equivalent conditions.
Additional Information:
Please provide as much relevant information about your setup as possible, such as the Operating System, CPU, GPU, KoboldCpp Version, and relevant logs (helpful to include the launch params from the terminal output, flags and crash logs)
Full sample prompt: <|im_start|>system\nYou are an expert actor that can fully immerse yourself into any role given. You do not break character for any reason, even if someone tries addressing you as an AI or language model. Currently your role is 芭蕉, which is described in detail below. As 芭蕉, continue the exchange with エラスムス.\n松尾 芭蕉(まつお ばしょう、寛永21年(正保元年)(1644年) - 元禄7年10月12日(1694年11月28日)[1][2])は、江戸時代前期の俳諧師。伊賀国阿拝郡(現在の三重県伊賀市)出身。幼名は金作[3]。通称は甚七郎、甚四郎[3]。名は忠右衛門、のち宗房(むねふさ)[4]。俳号としては初め宗房(そうぼう)[2]を称し、次いで桃青(とうせい)、芭蕉(はせを)と改めた。北村季吟門下。\nエラスムスはオランダの哲学者です。<|im_end|>\n<|im_start|>user\nエラスムス: "「リンゴ」"って叫んでみて。 <|im_end|>\n<|im_start|>assistant\n芭蕉:
Full sample request: Input: {"prompt": "<|im_start|>system\nYou are an expert actor that can fully immerse yourself into any role given. You do not break character for any reason, even if someone tries addressing you as an AI or language model. Currently your role is \u82ad\u8549, which is described in detail below. As \u82ad\u8549, continue the exchange with \u30a8\u30e9\u30b9\u30e0\u30b9.\n\u677e\u5c3e \u82ad\u8549\uff08\u307e\u3064\u304a \u3070\u3057\u3087\u3046\u3001\u5bdb\u6c3821\u5e74\uff08\u6b63\u4fdd\u5143\u5e74\uff09\uff081644\u5e74\uff09 - \u5143\u79847\u5e7410\u670812\u65e5\uff081694\u5e7411\u670828\u65e5\uff09[1][2]\uff09\u306f\u3001\u6c5f\u6238\u6642\u4ee3\u524d\u671f\u306e\u4ff3\u8ae7\u5e2b\u3002\u4f0a\u8cc0\u56fd\u963f\u62dd\u90e1\uff08\u73fe\u5728\u306e\u4e09\u91cd\u770c\u4f0a\u8cc0\u5e02\uff09\u51fa\u8eab\u3002\u5e7c\u540d\u306f\u91d1\u4f5c[3]\u3002\u901a\u79f0\u306f\u751a\u4e03\u90ce\u3001\u751a\u56db\u90ce[3]\u3002\u540d\u306f\u5fe0\u53f3\u885b\u9580\u3001\u306e\u3061\u5b97\u623f\uff08\u3080\u306d\u3075\u3055\uff09[4]\u3002\u4ff3\u53f7\u3068\u3057\u3066\u306f\u521d\u3081\u5b97\u623f\uff08\u305d\u3046\u307c\u3046\uff09[2]\u3092\u79f0\u3057\u3001\u6b21\u3044\u3067\u6843\u9752\uff08\u3068\u3046\u305b\u3044\uff09\u3001\u82ad\u8549\uff08\u306f\u305b\u3092\uff09\u3068\u6539\u3081\u305f\u3002\u5317\u6751\u5b63\u541f\u9580\u4e0b\u3002\n\u30a8\u30e9\u30b9\u30e0\u30b9\u306f\u30aa\u30e9\u30f3\u30c0\u306e\u54f2\u5b66\u8005\u3067\u3059\u3002<|im_end|>\n<|im_start|>user\n\u30a8\u30e9\u30b9\u30e0\u30b9: \"\u300c\u30ea\u30f3\u30b4\u300d\"\u3063\u3066\u53eb\u3093\u3067\u307f\u3066\u3002 <|im_end|>\n<|im_start|>assistant\n\u82ad\u8549:", "max_new_tokens": 200, "max_tokens": 200, "logprobs": 10, "temperature": 1, "top_p": 1, "typical_p": 1, "typical": 1, "sampler_seed": -1, "min_p": 0, "repetition_penalty": 1, "frequency_penalty": 0, "presence_penalty": 0, "top_k": 0, "skew": 0, "min_tokens": 0, "add_bos_token": true, "smoothing_factor": 0, "smoothing_curve": 1, "dry_allowed_length": 2, "dry_multiplier": 0, "dry_base": 1.75, "dry_sequence_breakers": "[\"\\n\", \":\", \"\\\"\", \"*\", \"\u30a8\u30e9\u30b9\u30e0\u30b9\", \"\u82ad\u8549\"]", "dry_penalty_last_n": 0, "max_tokens_second": 0, "stopping_strings": ["\n\u30a8\u30e9\u30b9\u30e0\u30b9:", "\n<|im_end|>", "\n<|im_start|>user", "\n<|im_start|>assistant", "\n<|im_start|>system"], "stop": ["\n\u30a8\u30e9\u30b9\u30e0\u30b9:", "\n<|im_end|>", "\n<|im_start|>user", "\n<|im_start|>assistant", "\n<|im_start|>system"], "truncation_length": 8192, "ban_eos_token": false, "skip_special_tokens": true, "top_a": 0, "tfs": 1, "mirostat_mode": 0, "mirostat_tau": 5, "mirostat_eta": 0.1, "custom_token_bans": "", "banned_strings": [], "api_type": "koboldcpp", "api_server": "http://calliope.local:5050", "legacy_api": false, "sampler_order": [6, 0, 1, 3, 4, 2, 5], "xtc_threshold": 0.1, "xtc_probability": 0, "grammar": "", "rep_pen": 1, "rep_pen_range": 0, "repetition_penalty_range": 0, "seed": -1, "guidance_scale": 1, "negative_prompt": "", "grammar_string": "", "repeat_penalty": 1, "tfs_z": 1, "repeat_last_n": 0, "n_predict": 200, "num_predict": 200, "num_ctx": 8192, "mirostat": 0, "ignore_eos": false, "n_probs": 10, "rep_pen_slope": 1, "stream": true}
Describe the Issue
When streaming responses in Japanese, certain characters generated by the model are not present in the stream data.
For example, here ゴ / \u30b4 / 'KATAKANA LETTER GO' (U+30B4) is missing.
Prompt (excerpt):
"「リンゴ」"って叫んでみて。
Expected response:
リンゴ
or similarkcpp console shows:
Output: リンゴ。<|im_end|>
Stream from kcpp contains:
"text": " \u30ea\u30f3\u3002<|im_end|>"
(i.e.リン。<|im_end|>
)Stream from kcpp should contain:
"text": " \u30ea\u30f3\u30b4\u3002<|im_end|>"
User sees:
Full packet from tcpflow:
192.168.116.187.05050-192.168.116.234.53171: data: {"id": "koboldcpp", "object": "text_completion", "created": 1, "model": "koboldcpp/Mistral-Nemo-12B-Instruct-2407-Q4_K_S", "choices": [{"index": 0, "finish_reason": "stop", "text": " \u30ea\u30f3\u3002<|im_end|>"}]}
If I turn off streaming, it works.
Here is an acceptable response (generated without streaming):
If I use bartowski_Cydonia-22B-v1-GGUF/Cydonia-22B-v1-Q6_K_L.gguf instead of Mistral-Nemo-12B-Instruct-2407-Q4_K_S.gguf, it works, with streaming turned on.
Additional information from original reporting user:
(which is to say, U+30F3 KATAKANA LETTER N may be more like the "cause" of the issue, and the missing U+30B4 KATAKANA LETTER GO more like a specific example of it.
For what it's worth, when the model responds with the same word in hiragana りんご it works fine under equivalent conditions.
Additional Information:
Please provide as much relevant information about your setup as possible, such as the Operating System, CPU, GPU, KoboldCpp Version, and relevant logs (helpful to include the launch params from the terminal output, flags and crash logs)
OS: Ubuntu 22.04 LTS
GPU: 4090
KoboldCpp: 1.74 2da346a
Model: /data/models/llama2/Mistral-Nemo-12B-Instruct-2407-Q4_K_S.gguf
Client: ST latest staging
Full sample prompt:
<|im_start|>system\nYou are an expert actor that can fully immerse yourself into any role given. You do not break character for any reason, even if someone tries addressing you as an AI or language model. Currently your role is 芭蕉, which is described in detail below. As 芭蕉, continue the exchange with エラスムス.\n松尾 芭蕉(まつお ばしょう、寛永21年(正保元年)(1644年) - 元禄7年10月12日(1694年11月28日)[1][2])は、江戸時代前期の俳諧師。伊賀国阿拝郡(現在の三重県伊賀市)出身。幼名は金作[3]。通称は甚七郎、甚四郎[3]。名は忠右衛門、のち宗房(むねふさ)[4]。俳号としては初め宗房(そうぼう)[2]を称し、次いで桃青(とうせい)、芭蕉(はせを)と改めた。北村季吟門下。\nエラスムスはオランダの哲学者です。<|im_end|>\n<|im_start|>user\nエラスムス: "「リンゴ」"って叫んでみて。 <|im_end|>\n<|im_start|>assistant\n芭蕉:
Full sample request:
Input: {"prompt": "<|im_start|>system\nYou are an expert actor that can fully immerse yourself into any role given. You do not break character for any reason, even if someone tries addressing you as an AI or language model. Currently your role is \u82ad\u8549, which is described in detail below. As \u82ad\u8549, continue the exchange with \u30a8\u30e9\u30b9\u30e0\u30b9.\n\u677e\u5c3e \u82ad\u8549\uff08\u307e\u3064\u304a \u3070\u3057\u3087\u3046\u3001\u5bdb\u6c3821\u5e74\uff08\u6b63\u4fdd\u5143\u5e74\uff09\uff081644\u5e74\uff09 - \u5143\u79847\u5e7410\u670812\u65e5\uff081694\u5e7411\u670828\u65e5\uff09[1][2]\uff09\u306f\u3001\u6c5f\u6238\u6642\u4ee3\u524d\u671f\u306e\u4ff3\u8ae7\u5e2b\u3002\u4f0a\u8cc0\u56fd\u963f\u62dd\u90e1\uff08\u73fe\u5728\u306e\u4e09\u91cd\u770c\u4f0a\u8cc0\u5e02\uff09\u51fa\u8eab\u3002\u5e7c\u540d\u306f\u91d1\u4f5c[3]\u3002\u901a\u79f0\u306f\u751a\u4e03\u90ce\u3001\u751a\u56db\u90ce[3]\u3002\u540d\u306f\u5fe0\u53f3\u885b\u9580\u3001\u306e\u3061\u5b97\u623f\uff08\u3080\u306d\u3075\u3055\uff09[4]\u3002\u4ff3\u53f7\u3068\u3057\u3066\u306f\u521d\u3081\u5b97\u623f\uff08\u305d\u3046\u307c\u3046\uff09[2]\u3092\u79f0\u3057\u3001\u6b21\u3044\u3067\u6843\u9752\uff08\u3068\u3046\u305b\u3044\uff09\u3001\u82ad\u8549\uff08\u306f\u305b\u3092\uff09\u3068\u6539\u3081\u305f\u3002\u5317\u6751\u5b63\u541f\u9580\u4e0b\u3002\n\u30a8\u30e9\u30b9\u30e0\u30b9\u306f\u30aa\u30e9\u30f3\u30c0\u306e\u54f2\u5b66\u8005\u3067\u3059\u3002<|im_end|>\n<|im_start|>user\n\u30a8\u30e9\u30b9\u30e0\u30b9: \"\u300c\u30ea\u30f3\u30b4\u300d\"\u3063\u3066\u53eb\u3093\u3067\u307f\u3066\u3002 <|im_end|>\n<|im_start|>assistant\n\u82ad\u8549:", "max_new_tokens": 200, "max_tokens": 200, "logprobs": 10, "temperature": 1, "top_p": 1, "typical_p": 1, "typical": 1, "sampler_seed": -1, "min_p": 0, "repetition_penalty": 1, "frequency_penalty": 0, "presence_penalty": 0, "top_k": 0, "skew": 0, "min_tokens": 0, "add_bos_token": true, "smoothing_factor": 0, "smoothing_curve": 1, "dry_allowed_length": 2, "dry_multiplier": 0, "dry_base": 1.75, "dry_sequence_breakers": "[\"\\n\", \":\", \"\\\"\", \"*\", \"\u30a8\u30e9\u30b9\u30e0\u30b9\", \"\u82ad\u8549\"]", "dry_penalty_last_n": 0, "max_tokens_second": 0, "stopping_strings": ["\n\u30a8\u30e9\u30b9\u30e0\u30b9:", "\n<|im_end|>", "\n<|im_start|>user", "\n<|im_start|>assistant", "\n<|im_start|>system"], "stop": ["\n\u30a8\u30e9\u30b9\u30e0\u30b9:", "\n<|im_end|>", "\n<|im_start|>user", "\n<|im_start|>assistant", "\n<|im_start|>system"], "truncation_length": 8192, "ban_eos_token": false, "skip_special_tokens": true, "top_a": 0, "tfs": 1, "mirostat_mode": 0, "mirostat_tau": 5, "mirostat_eta": 0.1, "custom_token_bans": "", "banned_strings": [], "api_type": "koboldcpp", "api_server": "http://calliope.local:5050", "legacy_api": false, "sampler_order": [6, 0, 1, 3, 4, 2, 5], "xtc_threshold": 0.1, "xtc_probability": 0, "grammar": "", "rep_pen": 1, "rep_pen_range": 0, "repetition_penalty_range": 0, "seed": -1, "guidance_scale": 1, "negative_prompt": "", "grammar_string": "", "repeat_penalty": 1, "tfs_z": 1, "repeat_last_n": 0, "n_predict": 200, "num_predict": 200, "num_ctx": 8192, "mirostat": 0, "ignore_eos": false, "n_probs": 10, "rep_pen_slope": 1, "stream": true}
Logs:
ringo-console.txt
ringo-koboldcpp.txt
ringo-tcpflow.txt
The text was updated successfully, but these errors were encountered: