Poro-34B-chat tokenizer support #7713

ezosa · 2024-06-03T12:10:06Z

Implemented pre-tokenizer support for Poro-34B-chat.

Added tokenizer type for Poro-34B-chat in convert-hf-to-gguf-update.py
Added the chkhsh for Poro-34B-chat in convert-hf-to-gguf.py
Added LLAMA_VOCAB_PRE_TYPE_PORO enum to llama.h
Added pre-tokenizer regex for LLAMA_VOCAB_PRE_TYPE_PORO to llama.cpp
Ran ./tests/test-tokenizer-0 ./models/ggml-vocab-Poro-34B-chat.gguf. Tests passed.

Related to PR #7328, since Poro and Viking share the same pre-tokenizer regex.
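For readers unfamiliar with the chkhsh step, here is a minimal sketch of the idea, assuming the checksum is a SHA-256 over the stringified list of token IDs produced by encoding a fixed check text. The helper names (`compute_chkhsh`, `register`, `identify`), the lookup table, and the token IDs are hypothetical and only illustrate the mechanism; they are not the conversion script's actual code.

```python
from hashlib import sha256

def compute_chkhsh(token_ids):
    """Fingerprint a tokenizer by hashing the string form of its token IDs."""
    return sha256(str(token_ids).encode()).hexdigest()

# Hypothetical lookup table: fingerprint -> pre-tokenizer name.
KNOWN_PRE_TOKENIZERS = {}

def register(token_ids, name):
    """Record the fingerprint of a known tokenizer under a pre-tokenizer name."""
    KNOWN_PRE_TOKENIZERS[compute_chkhsh(token_ids)] = name

def identify(token_ids):
    """Return the registered pre-tokenizer name, or None if unknown."""
    return KNOWN_PRE_TOKENIZERS.get(compute_chkhsh(token_ids))

# Placeholder IDs standing in for the encoding of the check text.
register([101, 202, 303], "poro-chat")
print(identify([101, 202, 303]))
print(identify([1, 2, 3]))
```

Two tokenizers that encode the check text differently get different fingerprints, which is why each new pre-tokenizer needs its own entry.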

akx · 2024-06-04T06:40:51Z

@ezosa Are you seeing the same issues as in #7328 (comment)? 🤔

ezosa · 2024-06-04T06:53:00Z

> @ezosa Are you seeing the same issues as in #7328 (comment)? 🤔

Weirdly, no. My tests all passed. Maybe the failed test is specific to Viking? I'll have a look at Viking soon.

src: 'Hello, y'all! How are you 😁 ?我想在apple工作1314151天～'
res: 'Hello, y'all! How are you 😁 ?我想在apple工作1314151天～'
tok: 17720 35 356 90701 24 2888 564 569 11892 234 2076 13217 37414 7359 21264 55110 1688 1581 45843 29066 65074 263 

src: 'ied 4 ½ months'
res: 'ied 4 ½ months'
tok: 907 802 51074 5481 

src: 'w048 7tuijk dsdfhu'
res: 'w048 7tuijk dsdfhu'
tok: 72235 2928 1158 507 72043 32710 3128 3836 

src: 'нещо на Български'
res: 'нещо на Български'
tok: 40411 12118 921 7866 24106 24892 1953 6197 13534 19610 

src: 'កាន់តែពិសេសអាចខលចេញ'
res: 'កាន់តែពិសេសអាចខលចេញ'
tok: 19523 233 104963 252 52087 244 19523 248 52087 235 19523 255 19523 139 19523 264 52087 234 19523 264 19523 119 104963 238 19523 234 19523 260 19523 238 52087 234 19523 242 

src: '🚀 (normal) 😶‍🌫️ (multiple emojis concatenated) ✅ (only emoji that has its own token)'
res: '🚀 (normal) 😶‍🌫️ (multiple emojis concatenated) ✅ (only emoji that has its own token)'
tok: 4318 259 233 365 16007 32 11892 138 102753 117264 128 26036 365 66533 2953 106742 65851 708 32 38132 238 365 5864 88269 451 773 920 1974 7023 32 

Tests passed
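The round-trip property visible in this log (each `res` equals its `src`) can also be checked mechanically. The sketch below assumes the `src:` / `res:` / `tok:` layout shown above; `parse_cases` and `failed_round_trips` are hypothetical helpers, not part of the llama.cpp test suite.

```python
def parse_cases(log_text):
    """Group 'src:', 'res:' and 'tok:' lines into one dict per test case."""
    cases, current = [], {}
    for line in log_text.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key in ("src", "res", "tok"):
            current[key] = value.strip()
            if key == "tok":  # 'tok' is the last line of each case
                cases.append(current)
                current = {}
    return cases

def failed_round_trips(cases):
    """Return the cases whose detokenized text differs from the source text."""
    return [c for c in cases if c.get("src") != c.get("res")]

LOG = """\
src: 'ied 4 ½ months'
res: 'ied 4 ½ months'
tok: 907 802 51074 5481 

src: 'w048 7tuijk dsdfhu'
res: 'w048 7tuijk dsdfhu'
tok: 72235 2928 1158 507 72043 32710 3128 3836 
"""

cases = parse_cases(LOG)
print(len(cases), "cases,", len(failed_round_trips(cases)), "failed")
```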

jonabur · 2024-06-04T07:27:58Z

The regex @ezosa used is slightly different from the one I used, which likely explains the difference in test results.
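To illustrate how a small regex difference changes pre-tokenization, here is a sketch with two made-up patterns. Neither is the Poro or the Viking regex; they differ only in how digits are grouped, and Python's `re` module stands in for the regex engine used in llama.cpp.

```python
import re

# Hypothetical patterns for illustration only.
PATTERN_A = re.compile(r"\d{1,3}|[^\d\s]+|\s+")  # digits in groups of up to three
PATTERN_B = re.compile(r"\d|[^\d\s]+|\s+")       # every digit on its own

def pre_tokenize(pattern, text):
    """Split text into the chunks that would later be passed to BPE merging."""
    return pattern.findall(text)

text = "工作1314151天"
chunks_a = pre_tokenize(PATTERN_A, text)
chunks_b = pre_tokenize(PATTERN_B, text)
print(chunks_a)
print(chunks_b)
```

Both splits cover the full input, but the chunk boundaries differ, so the resulting token IDs can differ and a test that compares against reference tokens can pass with one regex and fail with the other.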

github-actions · 2024-06-07T06:10:56Z

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 561 iterations 🚀

Expand details for performance related PR only

Concurrent users: 8, duration: 10m
HTTP request : avg=8325.25ms p(95)=19115.68ms fails=, finish reason: stop=516 truncated=45
Prompt processing (pp): avg=92.16tk/s p(95)=374.46tk/s
Token generation (tg): avg=46.27tk/s p(95)=50.98tk/s
ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=master commit=de60204de3e39f15853f4fb6bdbe48a6ef18589e

[Chart: llamacpp:prompt_tokens_seconds over time, 1717740021 --> 1717740649; llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 561 iterations]

[Chart: llamacpp:predicted_tokens_seconds over time, 1717740021 --> 1717740649; llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 561 iterations]

[Chart: llamacpp:kv_cache_usage_ratio over time, 1717740021 --> 1717740649; llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 561 iterations]

[Chart: llamacpp:requests_processing over time, 1717740021 --> 1717740649; llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 561 iterations]

jonabur · 2024-06-14T07:37:35Z

Any hope on getting this merged?

convert-hf-to-gguf-update.py

Co-authored-by: Georgi Gerganov

ezosa

Changed Poro-34B-chat to poro-chat in the relevant files

convert-hf-to-gguf-update.py

llama.cpp

ezosa added 2 commits May 27, 2024 14:09

support for Poro chat pre-tokenizer

f353414

add support for Poro pre-tokenizer

d8033d9

github-actions bot added the python label Jun 3, 2024

mofosyne added the enhancement and Review Complexity : Low labels Jun 5, 2024

Merge branch 'master' into master

de60204

ggerganov approved these changes Jun 14, 2024

convert-hf-to-gguf-update.py (outdated)

ezosa and others added 3 commits June 14, 2024 13:06

Update convert-hf-to-gguf-update.py

a75f69a

Co-authored-by: Georgi Gerganov

Change Poro-34B-chat to poro-chat

cd974f1

Change Poro-34B-chat to poro-chat

5d676a2

ezosa commented Jun 14, 2024


ggerganov reviewed Jun 14, 2024

convert-hf-to-gguf-update.py (outdated)

Update convert-hf-to-gguf-update.py

1c03036

ggerganov reviewed Jun 14, 2024

llama.cpp (outdated)

Update llama.cpp

af01910

ggerganov merged commit 41b9260 into ggerganov:master Jun 14, 2024
56 of 66 checks passed

ezosa mentioned this pull request Jun 20, 2024

Viking tokenizer support #7328

Closed
