llama-server <embedding> exited with status code -1 #3056

Open
Gnomesenpai opened this issue Sep 3, 2024 · 12 comments

Comments

@Gnomesenpai

Gnomesenpai commented Sep 3, 2024

Describe the bug
llama-server exited with status code -1

Information about your version
Unable to get version as it will not start. Docker image used:

REPOSITORY                        TAG                                IMAGE ID       CREATED         SIZE
tabbyml/tabby                     latest                             bc5a49b31c6f   6 days ago      2.64GB

Information about your GPU

Tue Sep  3 20:27:02 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla P4                       On  |   00000000:13:00.0 Off |                  Off |
| N/A   67C    P0             25W /   75W |     795MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Additional context
I removed my old Tabby setup, pulled the new container, and set a new data folder; however, it fails with the error:

Starting...2024-09-03T19:25:59.506655Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:98: llama-server <embedding> exited with status code -1, args: `Command { std: "/opt/tabby/bin/llama-server" "-m" "/data/models/TabbyML/Nomic-Embed-Text/ggml/model.gguf" "--cont-batching" "--port" "30888" "-np" "1" "--log-disable" "--ctx-size" "4096" "-ngl" "9999" "--embedding" "--ubatch-size" "4096", kill_on_drop: true }`
@wsxiaoys
Member

wsxiaoys commented Sep 3, 2024

Hi - could you also share the command being used to start the docker container?

@Gnomesenpai
Author

Gnomesenpai commented Sep 3, 2024

Sure, it's the one in the getting started guide, just with a different file path.

version: '3.5'

services:
  tabby:
    restart: always
    image: tabbyml/tabby
    command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda
    volumes:
      - "./data:/data"
    ports:
      - 8080:8080
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

@wsxiaoys
Member

wsxiaoys commented Sep 3, 2024

Thanks - could you execute the quoted command in the container (e.g. docker exec -it ...) and share its output?

/opt/tabby/bin/llama-server -m /data/models/TabbyML/Nomic-Embed-Text/ggml/model.gguf --cont-batching --port 30888 -np 1 --ctx-size 4096 -ngl 9999 --embedding --ubatch-size 4096
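
For example, something like this (the container name is just a placeholder for whatever docker ps shows for the tabby service):

docker exec -it <tabby-container> /opt/tabby/bin/llama-server -m /data/models/TabbyML/Nomic-Embed-Text/ggml/model.gguf --cont-batching --port 30888 -np 1 --ctx-size 4096 -ngl 9999 --embedding --ubatch-size 4096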

@Gnomesenpai
Author

> /opt/tabby/bin/llama-server -m /data/models/TabbyML/Nomic-Embed-Text/ggml/model.gguf --cont-batching --port 30888 -np 1 --ctx-size 4096 -ngl 9999 --embedding --ubatch-size 4096

root@f59755a58746:/# /opt/tabby/bin/llama-server -m /data/models/TabbyML/Nomic-Embed-Text/ggml/model.gguf --cont-batching --port 30888 -np 1 --ctx-size 4096 -ngl 9999 --embedding --ubatch-size 4096
Illegal instruction (core dumped)

@wsxiaoys
Member

wsxiaoys commented Sep 3, 2024

Could you share the contents of your /proc/cpuinfo? I assume it does not come with avx2 support. If that's the case, this is a known issue, as documented in #2597.

The error logging could use some improvement, though
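
If it's quicker, a simple check for the flag (assuming a standard Linux /proc layout) is:

grep -q avx2 /proc/cpuinfo && echo "avx2 present" || echo "no avx2"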

@Gnomesenpai
Author

Gnomesenpai commented Sep 3, 2024

> Could you share the contents of your /proc/cpuinfo? I assume it does not come with avx2 support. If that's the case, this is a known issue, as documented in #2597.
>
> The error logging could use some improvement, though

That may be it: lack of AVX2. Which version was the last that didn't require it?

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz
stepping        : 4
microcode       : 0x42e
cpu MHz         : 3499.999
cache size      : 15360 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust smep arat md_clear flush_l1d arch_capabilities
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown retbleed
bogomips        : 6999.99
clflush size    : 64
cache_alignment : 64
address sizes   : 45 bits physical, 48 bits virtual
power management:

@wsxiaoys
Member

wsxiaoys commented Sep 3, 2024

I believe we need AVX2 support in all versions following the migration from ctranslate2 to llama.cpp (0.5+).

@Gnomesenpai
Author

It should be possible to fall back to AVX; it's in llama.cpp:

ggerganov/llama.cpp#1430
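
For what it's worth, llama.cpp can apparently be built with AVX2 disabled at configure time; something like the following (the GGML_* CMake flag names are taken from llama.cpp's build options, so treat them as an assumption):

cmake -B build -DGGML_AVX=ON -DGGML_AVX2=OFF -DGGML_FMA=OFF
cmake --build build --config Release --target llama-server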

@wsxiaoys
Member

wsxiaoys commented Sep 3, 2024

The last time I checked, people still needed to compile llama.cpp separately for AVX/AVX2. Given that llama.cpp itself still distributes different builds for AVX, I guess that's still the case?

@Gnomesenpai
Author

Looks like they have separated out NOAVX, AVX, and AVX2. Would it be possible to implement this as branches? I won't lie, it's outside my depth of knowledge. I just remember running something else on this server before that used llama.cpp with just AVX.

@wsxiaoys
Member

wsxiaoys commented Sep 3, 2024

This PR should do the trick: #3057 - you might want to try building the Docker image from that branch.

@Gnomesenpai
Author

> This PR should do the trick: #3057 - you might want to try building the Docker image from that branch.

So that would be cloning the repo and then building the Docker image from the provided docker/Dockerfile.cuda, with the PR change applied locally?
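
Roughly like this, I assume (fetching the PR by number and the image tag are just guesses on my part):

git clone --recurse-submodules https://github.com/TabbyML/tabby.git
cd tabby
git fetch origin pull/3057/head:pr-3057
git checkout pr-3057
git submodule update --init --recursive
docker build -f docker/Dockerfile.cuda -t tabby:pr-3057 .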
