Clarification on GPU Setup? #975
-
After digging around some more, I think I found the problem. After starting Python in the venv, importing torch, and checking its version, it seems that the CPU-only build of PyTorch is installed. Maybe khoj only installed the CPU version of PyTorch?
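The check described above can be sketched roughly as follows. This is only a minimal illustration, not something khoj provides: the helper names `torch_build_tag` and `describe_torch_install` are made up here, and the sketch assumes the build shows up as a `+cpu` or `+cuXXX` suffix on the version string, as in the versions reported later in this comment.

```python
import importlib.util


def torch_build_tag(version: str) -> str:
    """Return the build suffix of a PyTorch version string.

    "2.2.2+cpu" gives "cpu", "2.2.2+cu121" gives "cu121",
    and a version without a suffix gives "".
    """
    _, _, tag = version.partition("+")
    return tag


def describe_torch_install() -> str:
    """Summarise the PyTorch install in the current environment."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch  # imported lazily so torch_build_tag works without torch

    tag = torch_build_tag(torch.__version__) or "no build tag"
    return (
        f"torch {torch.__version__} ({tag}), "
        f"built for CUDA: {torch.version.cuda}, "
        f"CUDA available: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(describe_torch_install())
```

Run it with the venv's own Python interpreter, so that it inspects the same environment khoj uses rather than a system-wide install.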
I'm going to try uninstalling PyTorch and reinstalling the GPU version to see whether that fixes the issue.
Edit: FAILED. Even after reinstalling the GPU version of PyTorch and confirming that it works with CUDA, khoj still loads the model into RAM and runs on the CPU. Maybe this error during the initial configuration is causing problems?
The above error doesn't always happen, though: I reinstalled khoj and the error didn't occur that time. Khoj still runs whether that error occurs or not.
Edit 2: SOME ACTIVITY? I decided to take a closer look again today and noticed something: when running khoj, although it neither loads the model into VRAM nor runs it on the GPU...
Just to confirm it's not a fluke, I uninstalled torch and let khoj install PyTorch itself. It installed the CPU-only version (2.2.2+cpu), which did not load anything onto the GPU, whereas manually installing a CUDA-compatible PyTorch version (2.2.2+cu121) resulted in khoj loading something into GPU memory (361 MiB).
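One way to see which process holds that GPU memory is to ask nvidia-smi for its per-process list. The sketch below is an assumption-laden illustration rather than part of khoj: the function names are hypothetical, and it relies on nvidia-smi's `--query-compute-apps` CSV output, which may report `[N/A]` for memory on some setups (those rows are skipped).

```python
import shutil
import subprocess

# Ask nvidia-smi for one "pid, process_name, used_memory" row per process.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text: str) -> list[tuple[int, str, int]]:
    """Parse "pid, name, MiB" rows into (pid, name, used_mib) tuples."""
    rows = []
    for line in csv_text.splitlines():
        pieces = line.split(",")
        if len(pieces) < 3:
            continue  # blank or malformed line
        pid = pieces[0].strip()
        name = ",".join(pieces[1:-1]).strip()
        mem = pieces[-1].strip()
        if pid.isdigit() and mem.isdigit():  # skips rows such as "[N/A]"
            rows.append((int(pid), name, int(mem)))
    return rows


def gpu_processes() -> list[tuple[int, str, int]]:
    """Return the processes currently using GPU memory, if nvidia-smi exists."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for pid, name, used_mib in gpu_processes():
        print(f"{pid}\t{used_mib} MiB\t{name}")
```

Running it once with khoj stopped and once with khoj running should show whether the extra allocation belongs to the khoj process.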
-
Hey @STORMFIRE007, a few follow-up questions. Thanks for the detailed investigation!
The teeny spike in GPU after installing the correct
-
I think the issue you're having might stem from several possible misconfigurations. Let's troubleshoot step by step to make sure your RTX 2060 is used to run the model with CUDA support.
1. Use a GPU-capable model format such as GPTQ (quantized models) or AWQ (Activation-aware Weight Quantization) from Hugging Face or similar sources. Log in with huggingface-cli login, then download the model, replacing the model and revision with your own: huggingface-cli download facebook/opt-125m --revision gptq
2. Build with CUDA support.
3. Reinstall the model/tool using the GPU-optimized GGML library.
I hope this resolves your problem. If not, let's connect and I will help you resolve it myself.
-
Hi everyone,
I’m having trouble getting the LLM model to run on my GPU (RTX 2060) despite following the setup guide for CUDA support. I set the environment variable as instructed:
However, the model still seems to run on the CPU, loading into RAM instead of VRAM. It also defaults to downloading GGUF models, which I believe can only run on CPUs?
I’m wondering if I’m missing something in the configuration, or if there’s a need to manually specify different model types (like GPTQ or AWQ from Hugging Face) to enable GPU usage? Is there a specific step I should follow to ensure the model runs on my RTX 2060 with CUDA support?
CUDA seems to be installed and the GPU is detected, as shown by the nvcc and nvidia-smi output.
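For anyone repeating this check, here is a small sketch that looks for both tools on the PATH. The function name `find_cuda_tools` is hypothetical. Note that finding nvcc and nvidia-smi only confirms the toolkit and driver are visible; it says nothing about whether the Python packages in the venv were built with CUDA support.

```python
import shutil
from typing import Optional


def find_cuda_tools(names=("nvcc", "nvidia-smi")) -> dict[str, Optional[str]]:
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, path in find_cuda_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```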