Poro-34B-chat tokenizer support #7713

Merged (8 commits) on Jun 14, 2024

ezosa (Contributor) commented Jun 3, 2024

Implemented pre-tokenizer support for Poro-34B-chat.

  • Added tokenizer type for Poro-34B-chat in convert-hf-to-gguf-update.py
  • Added the chkhsh for Poro-34B-chat in convert-hf-to-gguf.py
  • Added LLAMA_VOCAB_PRE_TYPE_PORO enum to llama.h
  • Added pre-tokenizer regex for LLAMA_VOCAB_PRE_TYPE_PORO to llama.cpp
  • Ran ./tests/test-tokenizer-0 ./models/ggml-vocab-Poro-34B-chat.gguf. Tests passed.

Related to PR #7328 since Poro and Viking share the same pre-tokenizer regex
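
The checksum step in the list above can be illustrated with a minimal sketch. It assumes the converter recognizes a pre-tokenizer by hashing the token IDs produced for a fixed probe string and looking the hash up in a table. The tokenizer, probe text, and registry below (`toy_encode`, `PROBE_TEXT`, `KNOWN_PRE_TOKENIZERS`) are hypothetical stand-ins, not the values used for Poro-34B-chat.

```python
import hashlib


def fingerprint(token_ids):
    """Hash the string form of a token-ID list to get a stable fingerprint."""
    return hashlib.sha256(str(token_ids).encode()).hexdigest()


def toy_encode(text):
    # Stand-in tokenizer (hypothetical): one "token" per UTF-8 byte.
    return list(text.encode("utf-8"))


# Hypothetical probe string; a real converter would use its own fixed text.
PROBE_TEXT = "Hello, y'all! 1314151"

# Hypothetical registry mapping fingerprints to pre-tokenizer names.
KNOWN_PRE_TOKENIZERS = {fingerprint(toy_encode(PROBE_TEXT)): "toy-bytes"}


def detect_pre_tokenizer(encode):
    """Return the registered name for this tokenizer, or None if unknown."""
    return KNOWN_PRE_TOKENIZERS.get(fingerprint(encode(PROBE_TEXT)))


print(detect_pre_tokenizer(toy_encode))
```

A tokenizer whose output for the probe text differs in any token gets a different fingerprint, so it is reported as unknown until its hash is registered.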

@github-actions bot added the python label Jun 3, 2024

akx (Contributor) commented Jun 4, 2024

@ezosa Are you seeing the same issues as in #7328 (comment)? 🤔

ezosa (Contributor Author) commented Jun 4, 2024

> @ezosa Are you seeing the same issues as in #7328 (comment)? 🤔

Weirdly, no. My tests all passed. Maybe the failed test is specific to Viking? I'll have a look at Viking soon.

src: 'Hello, y'all! How are you 😁 ?我想在apple工作1314151天~'
res: 'Hello, y'all! How are you 😁 ?我想在apple工作1314151天~'
tok: 17720 35 356 90701 24 2888 564 569 11892 234 2076 13217 37414 7359 21264 55110 1688 1581 45843 29066 65074 263 

src: 'ied 4 ½ months'
res: 'ied 4 ½ months'
tok: 907 802 51074 5481 

src: 'w048 7tuijk dsdfhu'
res: 'w048 7tuijk dsdfhu'
tok: 72235 2928 1158 507 72043 32710 3128 3836 

src: 'нещо на Български'
res: 'нещо на Български'
tok: 40411 12118 921 7866 24106 24892 1953 6197 13534 19610 

src: 'កាន់តែពិសេសអាចខលចេញ'
res: 'កាន់តែពិសេសអាចខលចេញ'
tok: 19523 233 104963 252 52087 244 19523 248 52087 235 19523 255 19523 139 19523 264 52087 234 19523 264 19523 119 104963 238 19523 234 19523 260 19523 238 52087 234 19523 242 

src: '🚀 (normal) 😶‍🌫️ (multiple emojis concatenated) ✅ (only emoji that has its own token)'
res: '🚀 (normal) 😶‍🌫️ (multiple emojis concatenated) ✅ (only emoji that has its own token)'
tok: 4318 259 233 365 16007 32 11892 138 102753 117264 128 26036 365 66533 2953 106742 65851 708 32 38132 238 365 5864 88269 451 773 920 1974 7023 32 

Tests passed
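
The src/res pairs above suggest a round-trip check: tokenize the source, detokenize the IDs, and compare the result with the original text. The following is a minimal sketch of that idea only. It uses a hypothetical byte-level tokenizer rather than the Poro vocabulary, and the real test may compare more than this (for example, the token IDs themselves).

```python
def encode(text):
    # Toy byte-level tokenizer (hypothetical): one token ID per UTF-8 byte.
    return list(text.encode("utf-8"))


def decode(token_ids):
    # Inverse of encode: rebuild the bytes and decode them as UTF-8.
    return bytes(token_ids).decode("utf-8")


def round_trip_failures(samples):
    """Return the samples whose decoded tokens differ from the source."""
    return [s for s in samples if decode(encode(s)) != s]


samples = ["ied 4 ½ months", "w048 7tuijk dsdfhu", "нещо на Български"]
failures = round_trip_failures(samples)
print("Tests passed" if not failures else f"Failed: {failures}")
```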

jonabur commented Jun 4, 2024

The regex @ezosa used is slightly different from the one I used, so that likely explains the difference in test results.
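
To illustrate how a small regex difference can change the outcome, here is a sketch with two hypothetical pre-tokenizer patterns. Neither is the Poro or Viking regex; they differ only in whether digits are split one at a time or kept as a run, and that alone changes the pieces handed to the BPE merge step.

```python
import re

# Both patterns are hypothetical; neither is the Poro or Viking regex.
PATTERN_A = re.compile(r"\d|[^\W\d_]+|\s+|[^\w\s]+|_+")   # digits one at a time
PATTERN_B = re.compile(r"\d+|[^\W\d_]+|\s+|[^\w\s]+|_+")  # digit runs kept whole


def pre_tokenize(pattern, text):
    """Split text into the chunks that would be tokenized independently."""
    return pattern.findall(text)


text = "w048 7tuijk"
print(pre_tokenize(PATTERN_A, text))  # ['w', '0', '4', '8', ' ', '7', 'tuijk']
print(pre_tokenize(PATTERN_B, text))  # ['w', '048', ' ', '7', 'tuijk']
```

Both splits cover the whole input, so a round-trip check would pass for either, while a comparison of token IDs against a reference could fail for one of them.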

@mofosyne added the enhancement and Review Complexity : Low labels Jun 5, 2024

github-actions bot (Contributor) commented Jun 7, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 561 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8325.25ms p(95)=19115.68ms fails=, finish reason: stop=516 truncated=45
  • Prompt processing (pp): avg=92.16tk/s p(95)=374.46tk/s
  • Token generation (tg): avg=46.27tk/s p(95)=50.98tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=master commit=de60204de3e39f15853f4fb6bdbe48a6ef18589e

[Chart: llamacpp:prompt_tokens_seconds, bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 561 iterations, x-axis 1717740021 --> 1717740649]
[Chart: llamacpp:predicted_tokens_seconds, same run and x-axis range]

Details

[Chart: llamacpp:kv_cache_usage_ratio, same run and x-axis range]
[Chart: llamacpp:requests_processing, same run and x-axis range]

jonabur commented Jun 14, 2024

Any hope of getting this merged?

Review thread on convert-hf-to-gguf-update.py (outdated, resolved)

ezosa (Contributor Author) left a comment

Changed Poro-34B-chat to poro-chat in the relevant files

Review thread on llama.cpp (outdated, resolved)

@ggerganov merged commit 41b9260 into ggerganov:master Jun 14, 2024
56 of 66 checks passed
@ezosa mentioned this pull request Jun 20, 2024

Labels: enhancement, python, Review Complexity : Low
5 participants