llama-server <embedding> exited with status code -1 #3056

Open
Gnomesenpai opened this issue Sep 3, 2024 · 12 comments

Comments

@Gnomesenpai

Gnomesenpai commented Sep 3, 2024

Describe the bug
llama-server exited with status code -1

Information about your version
Unable to get version as it will not start. Docker image used:

REPOSITORY                        TAG                                IMAGE ID       CREATED         SIZE
tabbyml/tabby                     latest                             bc5a49b31c6f   6 days ago      2.64GB

Information about your GPU

Tue Sep  3 20:27:02 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla P4                       On  |   00000000:13:00.0 Off |                  Off |
| N/A   67C    P0             25W /   75W |     795MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Additional context
I removed my old Tabby setup, pulled the new container, and set a new data folder; however, it fails with the error:

Starting...2024-09-03T19:25:59.506655Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:98: llama-server <embedding> exited with status code -1, args: `Command { std: "/opt/tabby/bin/llama-server" "-m" "/data/models/TabbyML/Nomic-Embed-Text/ggml/model.gguf" "--cont-batching" "--port" "30888" "-np" "1" "--log-disable" "--ctx-size" "4096" "-ngl" "9999" "--embedding" "--ubatch-size" "4096", kill_on_drop: true }`
@wsxiaoys
Member

wsxiaoys commented Sep 3, 2024

Hi - could you also share the command being used to start the docker container?

@Gnomesenpai
Author

Gnomesenpai commented Sep 3, 2024

Sure, it's the one in the getting started guide, just with a different file path.

version: '3.5'

services:
  tabby:
    restart: always
    image: tabbyml/tabby
    command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda
    volumes:
      - "./data:/data"
    ports:
      - 8080:8080
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

@wsxiaoys
Member

wsxiaoys commented Sep 3, 2024

Thanks - could you execute the quoted command in the container (e.g. docker exec -it ...) and share its output?

/opt/tabby/bin/llama-server -m /data/models/TabbyML/Nomic-Embed-Text/ggml/model.gguf --cont-batching --port 30888 -np 1 --ctx-size 4096 -ngl 9999 --embedding --ubatch-size 4096
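
For example, something like this (the container name is just a placeholder for whatever docker ps shows for the tabby service):

docker exec -it <tabby-container> /opt/tabby/bin/llama-server -m /data/models/TabbyML/Nomic-Embed-Text/ggml/model.gguf --cont-batching --port 30888 -np 1 --ctx-size 4096 -ngl 9999 --embedding --ubatch-size 4096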

@Gnomesenpai
Author

> /opt/tabby/bin/llama-server -m /data/models/TabbyML/Nomic-Embed-Text/ggml/model.gguf --cont-batching --port 30888 -np 1 --ctx-size 4096 -ngl 9999 --embedding --ubatch-size 4096

root@f59755a58746:/# /opt/tabby/bin/llama-server -m /data/models/TabbyML/Nomic-Embed-Text/ggml/model.gguf --cont-batching --port 30888 -np 1 --ctx-size 4096 -ngl 9999 --embedding --ubatch-size 4096
Illegal instruction (core dumped)

@wsxiaoys
Member

wsxiaoys commented Sep 3, 2024

Could you share the contents of your /proc/cpuinfo? I assume it does not come with avx2 support. If that's the case, this is a known issue, as documented in #2597.

The error logging could use some improvement, though
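
If it's quicker, a simple check for the flag (assuming a standard Linux /proc layout) is:

grep -q avx2 /proc/cpuinfo && echo "avx2 present" || echo "no avx2"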

@Gnomesenpai
Author

Gnomesenpai commented Sep 3, 2024

> Could you share the contents of your /proc/cpuinfo? I assume it does not come with avx2 support. If that's the case, this is a known issue, as documented in #2597.
>
> The error logging could use some improvement, though

That may be it: lack of AVX2. Which version was the last that didn't require it?

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz
stepping        : 4
microcode       : 0x42e
cpu MHz         : 3499.999
cache size      : 15360 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust smep arat md_clear flush_l1d arch_capabilities
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown retbleed
bogomips        : 6999.99
clflush size    : 64
cache_alignment : 64
address sizes   : 45 bits physical, 48 bits virtual
power management:

@wsxiaoys
Member

wsxiaoys commented Sep 3, 2024

I believe we need AVX2 support in all versions following the migration from ctranslate2 to llama.cpp (0.5+).

@Gnomesenpai
Author

It should be possible to fall back to AVX; it's in llama.cpp:

ggerganov/llama.cpp#1430
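
For what it's worth, llama.cpp can apparently be built with AVX2 disabled at configure time; something like the following (the GGML_* CMake flag names are taken from llama.cpp's build options, so treat them as an assumption):

cmake -B build -DGGML_AVX=ON -DGGML_AVX2=OFF -DGGML_FMA=OFF
cmake --build build --config Release --target llama-server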

@wsxiaoys
Member

wsxiaoys commented Sep 3, 2024

The last time I checked, people still needed to compile llama.cpp separately for AVX/AVX2. Given that llama.cpp itself still distributes different builds for AVX, I guess that's still the case?

@Gnomesenpai
Author

Looks like they have separated out NOAVX, AVX, and AVX2. Would it be possible to implement this as branches? I won't lie, it's outside my depth of knowledge. I just remember running something else on this server before that used llama.cpp with just AVX.

@wsxiaoys
Member

wsxiaoys commented Sep 3, 2024

This PR should do the trick: #3057 - you might want to try building the Docker image from that branch.

@Gnomesenpai
Author

> This PR should do the trick: #3057 - you might want to try building the Docker image from that branch.

So that would be cloning the repo and then building the Docker image from the provided docker/Dockerfile.cuda, with the PR change applied locally?
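
Roughly like this, I assume (fetching the PR by number and the image tag are just guesses on my part):

git clone --recurse-submodules https://github.com/TabbyML/tabby.git
cd tabby
git fetch origin pull/3057/head:pr-3057
git checkout pr-3057
git submodule update --init --recursive
docker build -f docker/Dockerfile.cuda -t tabby:pr-3057 .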
