LLVM ERROR: out of memory #368
Thank you for the report! Could you post how the model was generated and the model config file you used to load it into Triton?
Possibly related: dmlc/treelite#364. If that is indeed the underlying issue, the …
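If the linked Treelite issue is in play, a first sanity check (a sketch; run it in the environment that generated the model) is to record which library versions are involved, for comparison against whatever the FIL backend bundles:

```python
# Print the Treelite and XGBoost versions in the environment that
# generated the model.
import treelite
import xgboost

print("treelite:", treelite.__version__)
print("xgboost:", xgboost.__version__)
```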
Hi @wphicks, thanks for your quick response. Sorry for the late reply. For model generation and saving:

```python
# Import required libraries
import os
import subprocess

import numpy
from xgboost import XGBClassifier
# Note: the original data-generation lines were lost in extraction;
# make_classification/train_test_split are a reconstruction.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate dummy data to perform binary classification
seed = 7
test_size = 0.33
X, y = make_classification(random_state=seed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=test_size, random_state=seed)

model = XGBClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Test Accuracy: %.2f" % (accuracy_score(y_test, y_pred) * 100.0))

# Create directory to save the model
os.makedirs('/opt/tritonserver/notebooks/simple-xgboost/model_repository/fil/1',
            exist_ok=True)

# Save your xgboost model as xgboost.model
# For more information on saving xgboost models, check
# https://xgboost.readthedocs.io/en/latest/python/python_intro.html#training
# The model can also be dumped to JSON format
model.save_model('/opt/tritonserver/notebooks/simple-xgboost/model_repository/fil/1/xgboost.model')

triton_process = subprocess.Popen(
    ["tritonserver",
     "--model-repository=/opt/tritonserver/notebooks/simple-xgboost/model_repository"],
    stdout=subprocess.PIPE, preexec_fn=os.setsid)
```

--------config-------

name: "fil" # Name of the model directory (fil in our case)
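The config above is cut off after its first line. For reference, a full config.pbtxt for an XGBoost classifier under the FIL backend typically looks something like this sketch (the feature count, batch size, and threshold are placeholder values; the parameter names follow the fil_backend README):

```
name: "fil"
backend: "fil"
max_batch_size: 8192
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 20 ]  # placeholder: number of input features
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
instance_group [{ kind: KIND_CPU }]
parameters [
  {
    key: "model_type"
    value: { string_value: "xgboost" }
  },
  {
    key: "output_class"
    value: { string_value: "true" }
  },
  {
    key: "threshold"
    value: { string_value: "0.5" }
  }
]
```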
For building the docker image: …
Hmmm... I don't see why that particular model would trigger that Treelite issue, so we may need to dig deeper. Can you try the …
Apologies; I was too hasty when I was thinking about this before. As soon as I saw …

Can you give us a little more detail on exactly how you got this error? Are there any more details available on the workflow? LLVM should not be involved with Triton at all at the deployment stage.
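Since LLVM should not be loaded at deployment time, one way to check what else might be pulling it in is to look at which backends ship in the container image (a sketch in Python to match the script-driven workflow above; the path is the standard Triton container layout):

```python
import os

# Each subdirectory here is a backend available to Triton in this image
# (standard container layout; adjust the path if the image was customized).
print(sorted(os.listdir("/opt/tritonserver/backends")))
```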
Hi @wphicks,

localhost/triton_fil   latest   8fdf060142f9   3 weeks ago   12.4 GB

After running the docker image I am able to access the environment, but not the Jupyter notebook, so I created a python script instead:

----------------- sample.py --------------------
(the same model-generation, save, and tritonserver-launch script shown above, ending with the same config beginning name: "fil")

Finally, while running sample.py, the "LLVM" error appears. I cross-verified the model, config.pbtxt, and the structure of the model repo.
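For reference, the repository structure being cross-verified should match the standard Triton layout (paths as used in the script above):

```
model_repository/
└── fil/
    ├── config.pbtxt
    └── 1/
        └── xgboost.model
```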
Hi @wphicks,

When I looked into it further, another backend (pytorch) could be the reason for the LLVM issue. However, I'm more interested in trying out the FIL backend, so I kept only the FIL backend in the Triton backend directory, and I'm facing the error below.

I1121 10:41:01.087972 1 model_lifecycle.cc:462] loading: fil:1

Do we have any specific minimal memory requirement for the FIL backend to start? Thanks
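As a quick way to see how much memory the container actually has available before Triton starts (a sketch; it reads the standard Linux /proc interface):

```python
# Print total and currently-available system memory as seen inside the
# container by parsing /proc/meminfo (Linux-only).
with open("/proc/meminfo") as f:
    for line in f:
        if line.startswith(("MemTotal", "MemAvailable")):
            print(line.strip())
```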
@sandeepb2013 Could you try with an officially-released Triton Docker image and enable …
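The suggestion above is cut off; if it refers to verbose logging (tritonserver's --log-verbose flag), the launch in the script above would become:

```python
import os
import subprocess

# Assumes the truncated suggestion was verbose logging; --log-verbose=1
# turns on Triton's verbose log output.
triton_process = subprocess.Popen(
    ["tritonserver",
     "--model-repository=/opt/tritonserver/notebooks/simple-xgboost/model_repository",
     "--log-verbose=1"],
    stdout=subprocess.PIPE, preexec_fn=os.setsid)
```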
root@lees1:~/work/fil_backend# docker run --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /root/work/fil_backend/models:/models --name tritonserver nvcr.io/nvidia/tritonserver:23.08-py3 tritonserver --model-repository=/models
=========config.pbtxt============
root@lees1:~/work/fil_backend# docker run --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /root/work/fil_backend/models:/models --name tritonserver fil_23 tritonserver --model-repository=/models
=============================
root@2ff024ed2346:/opt/tritonserver/tmp/simple-xgboost# python3 sample.py
Test Accuracy: 51.24
/usr/local/lib/python3.10/dist-packages/xgboost/core.py:160: UserWarning: [09:16:55] WARNING: /workspace/src/c_api/c_api.cc:1240: Saving into deprecated binary model format, please consider using `json` or `ubj`. Model format will default to JSON in XGBoost 2.2 if not specified.
  warnings.warn(smsg, UserWarning)
root@2ff024ed2346:/opt/tritonserver/tmp/simple-xgboost# WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 35
I1030 09:17:00.890915 1358 libtorch.cc:2507] TRITONBACKEND_Initialize: pytorch
I1030 09:17:00.892801 1358 libtorch.cc:2517] Triton TRITONBACKEND API version: 1.15
I1030 09:17:00.893583 1358 libtorch.cc:2523] 'pytorch' TRITONBACKEND API version: 1.15
W1030 09:17:00.895411 1358 pinned_memory_manager.cc:237] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I1030 09:17:00.896514 1358 cuda_memory_manager.cc:117] CUDA memory pool disabled
I1030 09:17:00.933129 1358 model_lifecycle.cc:462] loading: fil:1
I1030 09:17:00.947223 1358 initialize.hpp:43] TRITONBACKEND_Initialize: fil
I1030 09:17:00.948097 1358 backend.hpp:47] Triton TRITONBACKEND API version: 1.15
I1030 09:17:00.948809 1358 backend.hpp:52] 'fil' TRITONBACKEND API version: 1.15
I1030 09:17:00.950459 1358 model_initialize.hpp:37] TRITONBACKEND_ModelInitialize: fil (version 1)
I1030 09:17:00.988559 1358 instance_initialize.hpp:46] TRITONBACKEND_ModelInstanceInitialize: fil_0_0 (CPU device 0)
LLVM ERROR: out of memory
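Separately from the crash itself, the deprecation warning in the output above can be avoided by saving the model as JSON, which the FIL backend also loads (a sketch; the filename is illustrative, and model_type in config.pbtxt would then be "xgboost_json" per the fil_backend README):

```python
# Save in JSON instead of the deprecated binary format; pair this with
# model_type "xgboost_json" in the FIL config.pbtxt.
model.save_model(
    "/opt/tritonserver/notebooks/simple-xgboost/model_repository/fil/1/xgboost.json"
)
```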