LLVM ERROR: out of memory #368

Open
sandeepb2013 opened this issue Oct 30, 2023 · 15 comments

@sandeepb2013

root@2ff024ed2346:/opt/tritonserver/tmp/simple-xgboost# python3 sample.py
Test Accuracy: 51.24
/usr/local/lib/python3.10/dist-packages/xgboost/core.py:160: UserWarning: [09:16:55] WARNING: /workspace/src/c_api/c_api.cc:1240: Saving into deprecated binary model format, please consider using json or ubj. Model format will default to JSON in XGBoost 2.2 if not specified.
warnings.warn(smsg, UserWarning)
root@2ff024ed2346:/opt/tritonserver/tmp/simple-xgboost# WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 35
I1030 09:17:00.890915 1358 libtorch.cc:2507] TRITONBACKEND_Initialize: pytorch
I1030 09:17:00.892801 1358 libtorch.cc:2517] Triton TRITONBACKEND API version: 1.15
I1030 09:17:00.893583 1358 libtorch.cc:2523] 'pytorch' TRITONBACKEND API version: 1.15
W1030 09:17:00.895411 1358 pinned_memory_manager.cc:237] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I1030 09:17:00.896514 1358 cuda_memory_manager.cc:117] CUDA memory pool disabled
I1030 09:17:00.933129 1358 model_lifecycle.cc:462] loading: fil:1
I1030 09:17:00.947223 1358 initialize.hpp:43] TRITONBACKEND_Initialize: fil
I1030 09:17:00.948097 1358 backend.hpp:47] Triton TRITONBACKEND API version: 1.15
I1030 09:17:00.948809 1358 backend.hpp:52] 'fil' TRITONBACKEND API version: 1.15
I1030 09:17:00.950459 1358 model_initialize.hpp:37] TRITONBACKEND_ModelInitialize: fil (version 1)
I1030 09:17:00.988559 1358 instance_initialize.hpp:46] TRITONBACKEND_ModelInstanceInitialize: fil_0_0 (CPU device 0)
LLVM ERROR: out of memory

@wphicks
Collaborator

wphicks commented Oct 30, 2023

Thank you for the report! Could you post how the model was generated and the model config file you used to load it into Triton?

@wphicks
Collaborator

wphicks commented Oct 30, 2023

Possibly related: dmlc/treelite#364. If that is indeed the underlying issue, the use_experimental_optimizations flag may be a workaround for the moment.

@sandeepb2013
Author

sandeepb2013 commented Nov 3, 2023

Hi @wphicks, thanks for your quick response, and sorry for the late reply.

Here is the script used for model generation and saving:


# Import required libraries

import numpy
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

import os
import signal
import subprocess

# Generate dummy data to perform binary classification

seed = 7
features = 9 # number of sample features
samples = 10000 # number of samples
X = numpy.random.rand(samples, features).astype('float32')
Y = numpy.random.randint(2, size=samples)

test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

model = XGBClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test Accuracy: {:.2f}".format(accuracy * 100.0))

# Create directory to save the model
# Save your xgboost model as xgboost.model
# For more information on saving xgboost model check https://xgboost.readthedocs.io/en/latest/python/python_intro.html#training
# Model can also be dumped to json format

model.save_model('/opt/tritonserver/notebooks/simple-xgboost/model_repository/fil/1/xgboost.model')

triton_process = subprocess.Popen(["tritonserver", "--model-repository=/opt/tritonserver/notebooks/simple-xgboost/model_repository"], stdout=subprocess.PIPE, preexec_fn=os.setsid)
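
The script comment above notes that the model can also be saved as JSON, and the XGBoost warning in the log recommends JSON or UBJ over the binary format. A minimal sketch of how the saved file name would have to line up with `model_type` in config.pbtxt, assuming the default file names described in the FIL backend documentation. The helper `model_type_for` and the mapping table are illustrative, not part of any library, and nothing in this thread confirms that switching formats resolves the error:

```python
from pathlib import PurePosixPath

# Assumption (from the FIL backend docs as I understand them):
# each model_type has a default file name inside the version directory.
DEFAULT_MODEL_FILES = {
    "xgboost": "xgboost.model",
    "xgboost_json": "xgboost.json",
    "lightgbm": "model.txt",
    "treelite_checkpoint": "checkpoint.tl",
}


def model_type_for(path):
    """Return the model_type whose default file name matches `path`, or None."""
    name = PurePosixPath(path).name
    for model_type, filename in DEFAULT_MODEL_FILES.items():
        if name == filename:
            return model_type
    return None


saved = "/opt/tritonserver/notebooks/simple-xgboost/model_repository/fil/1/xgboost.model"
print(model_type_for(saved))
print(model_type_for(saved.replace("xgboost.model", "xgboost.json")))
```

If the model were saved as `xgboost.json` instead, the `model_type` parameter in config.pbtxt would need to change to match.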

--------config-------

name: "fil" # Name of the model directory (fil in our case)
backend: "fil" # Triton FIL backend for deploying forest models
max_batch_size: 8192
input [
{
name: "input__0"
data_type: TYPE_FP32
dims: [ 9 ] # Input feature dimensions, in our sample case it's 9
}
]
output [
{
name: "output__0"
data_type: TYPE_FP32
dims: [ 1 ] # Output 2 for binary classification model
}
]
instance_group [{ kind: KIND_CPU }]
parameters [
{
key: "model_type"
value: { string_value: "xgboost" }
},
{
key: "predict_proba"
value: { string_value: "false" }
},
{
key: "output_class"
value: { string_value: "true" }
},
{
key: "threshold"
value: { string_value: "0.5" }
},
{
key: "algo"
value: { string_value: "ALGO_AUTO" }
},
{
key: "storage_type"
value: { string_value: "AUTO" }
},
{
key: "blocks_per_sm"
value: { string_value: "0" }
}
]

@sandeepb2013
Author

@wphicks
Collaborator

wphicks commented Nov 3, 2023

Hmmm... I don't see why that particular model would trigger that Treelite issue, so we may need to dig deeper. Can you try the use_experimental_optimizations flag and let me know if you can successfully run the model with that flag?

@wphicks
Collaborator

wphicks commented Nov 4, 2023

Apologies; I was too hasty when I was thinking about this before. As soon as I saw LLVM, I was thinking about Treelite compiled models, but the FIL backend does not and has never invoked Treelite compiled models. CPU execution is performed through GTIL or our internal optimized CPU implementation.

Can you give us a little more detail on exactly how you got this error? Are there any more details available on the workflow? LLVM should not be involved with Triton at all at the deployment stage.

@sandeepb2013
Author

Hi @wphicks,

Using the build script (https://github.com/triton-inference-server/fil_backend/blob/main/docs/build.md), I was able to build 2 Docker images:

REPOSITORY            TAG     IMAGE ID      CREATED      SIZE
localhost/triton_fil  latest  8fdf060142f9  3 weeks ago  12.4 GB

After running the Docker image I could access the environment but not the Jupyter notebook, so I created a Python script:

----------------- sample.py--------------------
import numpy
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

import os
import signal
import subprocess

# Generate dummy data to perform binary classification

seed = 7
features = 9 # number of sample features
samples = 10000 # number of samples
X = numpy.random.rand(samples, features).astype('float32')
Y = numpy.random.randint(2, size=samples)

test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

model = XGBClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test Accuracy: {:.2f}".format(accuracy * 100.0))

# Create directory to save the model
# Save your xgboost model as xgboost.model
# For more information on saving xgboost model check https://xgboost.readthedocs.io/en/latest/python/python_intro.html#training
# Model can also be dumped to json format

model.save_model('/opt/tritonserver/notebooks/simple-xgboost/model_repository/fil/1/xgboost.model')

triton_process = subprocess.Popen(["tritonserver", "--model-repository=/opt/tritonserver/notebooks/simple-xgboost/model_repository"], stdout=subprocess.PIPE, preexec_fn=os.setsid)
-------------------------config.pbtxt---------------

name: "fil" # Name of the model directory (fil in our case)
backend: "fil" # Triton FIL backend for deploying forest models
max_batch_size: 8192
input [
{
name: "input__0"
data_type: TYPE_FP32
dims: [ 9 ] # Input feature dimensions, in our sample case it's 9
}
]
output [
{
name: "output__0"
data_type: TYPE_FP32
dims: [ 1 ] # Output 2 for binary classification model
}
]
instance_group [{ kind: KIND_CPU }]
parameters [
{
key: "model_type"
value: { string_value: "xgboost" }
},
{
key: "predict_proba"
value: { string_value: "false" }
},
{
key: "output_class"
value: { string_value: "true" }
},
{
key: "threshold"
value: { string_value: "0.5" }
},
{
key: "algo"
value: { string_value: "ALGO_AUTO" }
},
{
key: "storage_type"
value: { string_value: "AUTO" }
},
{
key: "blocks_per_sm"
value: { string_value: "0" }
}
]


Finally, while running sample.py, the "LLVM ERROR: out of memory" message appears.

I have cross-verified the model, config.pbtxt, and the structure of the model repository.

@sandeepb2013
Author

[Image: Screenshot 2023-11-07 at 2.11.34 PM]

@sandeepb2013
Author

Hi @wphicks,
Any further pointers would really help. Thanks in advance.

@sandeepb2013
Author

When I looked into this further, it seemed that another backend (pytorch) could be the source of the LLVM message. However, I am mainly interested in trying out the FIL backend, so I kept only the FIL backend in the Triton backends directory, and I am now facing the error below.

I1121 10:41:01.087972 1 model_lifecycle.cc:462] loading: fil:1
I1121 10:41:01.088345 1 backend_model.cc:364] Adding default backend config setting: default-max-batch-size,4
I1121 10:41:01.088435 1 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/fil/libtriton_fil.so
I1121 10:41:01.092161 1 initialize.hpp:43] TRITONBACKEND_Initialize: fil
I1121 10:41:01.092195 1 backend.hpp:47] Triton TRITONBACKEND API version: 1.15
I1121 10:41:01.092203 1 backend.hpp:52] 'fil' TRITONBACKEND API version: 1.15
I1121 10:41:01.092240 1 model_initialize.hpp:37] TRITONBACKEND_ModelInitialize: fil (version 1)
I1121 10:41:01.093017 1 model_config_utils.cc:1872] ModelConfig 64-bit fields:
I1121 10:41:01.093053 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::default_priority_level
I1121 10:41:01.093061 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I1121 10:41:01.093068 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::max_queue_delay_microseconds
I1121 10:41:01.093074 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::priority_levels
I1121 10:41:01.093081 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::priority_queue_policy::key
I1121 10:41:01.093088 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I1121 10:41:01.093095 1 model_config_utils.cc:1874] ModelConfig::ensemble_scheduling::step::model_version
I1121 10:41:01.093102 1 model_config_utils.cc:1874] ModelConfig::input::dims
I1121 10:41:01.093110 1 model_config_utils.cc:1874] ModelConfig::input::reshape::shape
I1121 10:41:01.093117 1 model_config_utils.cc:1874] ModelConfig::instance_group::secondary_devices::device_id
I1121 10:41:01.093123 1 model_config_utils.cc:1874] ModelConfig::model_warmup::inputs::value::dims
I1121 10:41:01.093130 1 model_config_utils.cc:1874] ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I1121 10:41:01.093138 1 model_config_utils.cc:1874] ModelConfig::optimization::cuda::graph_spec::input::value::dim
I1121 10:41:01.093145 1 model_config_utils.cc:1874] ModelConfig::output::dims
I1121 10:41:01.093152 1 model_config_utils.cc:1874] ModelConfig::output::reshape::shape
I1121 10:41:01.093159 1 model_config_utils.cc:1874] ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I1121 10:41:01.093166 1 model_config_utils.cc:1874] ModelConfig::sequence_batching::max_sequence_idle_microseconds
I1121 10:41:01.093173 1 model_config_utils.cc:1874] ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I1121 10:41:01.093234 1 model_config_utils.cc:1874] ModelConfig::sequence_batching::state::dims
I1121 10:41:01.093244 1 model_config_utils.cc:1874] ModelConfig::sequence_batching::state::initial_state::dims
I1121 10:41:01.093252 1 model_config_utils.cc:1874] ModelConfig::version_policy::specific::versions
I1121 10:41:01.094102 1 instance_initialize.hpp:46] TRITONBACKEND_ModelInstanceInitialize: fil_0_0 (CPU device 0)
I1121 10:41:01.094137 1 backend_model_instance.cc:69] Creating instance fil_0_0 on CPU using artifact ''
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
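
For comparing runs like the ones in this thread, a small log triage helper can make it easier to see which backends were initialized and what the last message was before the crash. This is only a sketch: the function name `triage` is made up for illustration, and the patterns are taken from the log lines quoted in this issue rather than from any documented Triton log format.

```python
import re

# Patterns copied from the log excerpts in this issue (not an official format).
BACKEND_INIT = re.compile(r"TRITONBACKEND_Initialize: (\w+)")
FATAL_MARKERS = ("LLVM ERROR: out of memory", "std::bad_alloc")


def triage(log_text):
    """Return (initialized backends, last non-empty line before the fatal marker, marker)."""
    backends, previous = [], None
    for line in log_text.splitlines():
        for marker in FATAL_MARKERS:
            if marker in line:
                return backends, previous, marker
        match = BACKEND_INIT.search(line)
        if match:
            backends.append(match.group(1))
        if line.strip():
            previous = line
    return backends, previous, None


sample_log = """\
I1121 10:41:01.092161 1 initialize.hpp:43] TRITONBACKEND_Initialize: fil
I1121 10:41:01.094137 1 backend_model_instance.cc:69] Creating instance fil_0_0 on CPU using artifact ''
terminate called after throwing an instance of 'std::bad_alloc'
"""
backends, last_line, marker = triage(sample_log)
print(backends, marker)
print(last_line)
```

Run against the earlier log, the same helper would list both `pytorch` and `fil` as initialized, which is one way to check whether removing the pytorch backend changed anything other than the wording of the failure.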

@sandeepb2013
Author

sandeepb2013 commented Nov 21, 2023

Is there a specific minimum memory requirement for the FIL backend to start? Thanks.
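
When asking about memory requirements it can help to report how much memory the container is actually allowed to use. A minimal sketch, assuming a Linux container where the limit is exposed through the usual cgroup v2 or v1 files; the helper name `memory_limit_bytes` is hypothetical, and this check is not something the maintainers requested in this thread:

```python
from pathlib import Path

# Usual cgroup v2 and v1 locations for the memory limit inside a Linux container.
CGROUP_LIMIT_FILES = (
    "/sys/fs/cgroup/memory.max",
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",
)


def memory_limit_bytes(candidates=CGROUP_LIMIT_FILES):
    """Return the first numeric limit found (in bytes), or None if none is readable."""
    for candidate in candidates:
        try:
            raw = Path(candidate).read_text().strip()
        except OSError:
            continue  # file missing or unreadable: try the next location
        if raw.isdigit():
            return int(raw)
        # A non-numeric value such as "max" (cgroup v2) means no limit is set here.
    return None


limit = memory_limit_bytes()
if limit is None:
    print("no cgroup memory limit found")
else:
    print(f"memory limit: {limit / 2**30:.2f} GiB")
```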

@wphicks
Collaborator

wphicks commented Nov 27, 2023

@sandeepb2013 Could you try with an officially-released Triton Docker image and enable use_experimental_optimizations in your config.pbtxt? The memory requirements should be quite modest, though they'll depend on the details of the model. If you still run into issues, can you see how far you get running either the fraud detection or FAQ notebook before a cell fails?
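
For reference, the flag is set as one more entry in the `parameters [ ... ]` list of config.pbtxt, in the same key/string_value shape as the other entries posted in this thread. A small sketch that renders such an entry; the helper `fil_parameter` is hypothetical and exists only to show the shape of the text:

```python
def fil_parameter(key, value):
    """Render one entry for the `parameters [ ... ]` list in config.pbtxt."""
    return (
        "{\n"
        f'key: "{key}"\n'
        f'value: {{ string_value: "{value}" }}\n'
        "}"
    )


print(fil_parameter("use_experimental_optimizations", "true"))
```

The rendered entry goes inside the existing `parameters` list, separated from the previous entry by a comma.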

@sandeepb2013
Author

sandeepb2013 commented Nov 28, 2023

root@lees1:~/work/fil_backend# docker run --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /root/work/fil_backend/models:/models --name tritonserver nvcr.io/nvidia/tritonserver:23.08-py3 tritonserver --model-repository=/models
"""

== Triton Inference Server ==

NVIDIA Release 23.08 (build 66820947)
Triton Server Version 2.37.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .

WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 35
I1128 08:57:49.478413 1 libtorch.cc:2507] TRITONBACKEND_Initialize: pytorch
I1128 08:57:49.478480 1 libtorch.cc:2517] Triton TRITONBACKEND API version: 1.15
I1128 08:57:49.478494 1 libtorch.cc:2523] 'pytorch' TRITONBACKEND API version: 1.15
W1128 08:57:49.478588 1 pinned_memory_manager.cc:237] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I1128 08:57:49.478721 1 cuda_memory_manager.cc:117] CUDA memory pool disabled
I1128 08:57:49.481254 1 model_lifecycle.cc:462] loading: fil:1
I1128 08:57:49.490362 1 initialize.hpp:43] TRITONBACKEND_Initialize: fil
I1128 08:57:49.490404 1 backend.hpp:47] Triton TRITONBACKEND API version: 1.15
I1128 08:57:49.490413 1 backend.hpp:52] 'fil' TRITONBACKEND API version: 1.15
I1128 08:57:49.490465 1 model_initialize.hpp:37] TRITONBACKEND_ModelInitialize: fil (version 1)
I1128 08:57:49.492124 1 instance_initialize.hpp:46] TRITONBACKEND_ModelInstanceInitialize: fil_0_0 (CPU device 0)
LLVM ERROR: out of memory
[4f6fbf8992de:1 :0:54] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
==== backtrace (tid: 54) ====
0 0x0000000000042520 __sigaction() ???:0
1 0x0000000000028898 abort() ???:0
2 0x000000000261cdab getInferLibVersion() ???:0
3 0x00000000000ae9a3 operator new() ???:0
4 0x0000000000219479 std::vector<char, std::allocator<char> >::_M_default_append() ???:0
5 0x0000000000212e3a (anonymous namespace)::XGBTree::Load() xgboost.cc:0
6 0x0000000000213fe4 (anonymous namespace)::ParseStream() xgboost.cc:0
7 0x0000000000215d5a treelite::frontend::LoadXGBoostModel() ???:0
8 0x00000000001656a5 triton::backend::fil::load_tl_base_model() ???:0
9 0x00000000001deaad triton::backend::fil::RapidsModel::load() ???:0
10 0x00000000001e0860 triton::backend::rapids::triton_api::instance_initialize<triton::backend::rapids::TritonModelState<triton::backend::fil::RapidsSharedState>, triton::backend::rapids::ModelInstanceState<triton::backend::fil::RapidsModel, triton::backend::fil::RapidsSharedState> >() ???:0
11 0x00000000001a0116 triton::core::TritonModelInstance::ConstructAndInitializeInstance() :0
12 0x00000000001a1356 triton::core::TritonModelInstance::CreateInstance() :0
13 0x0000000000185bd5 triton::core::TritonModel::PrepareInstances(inference::ModelConfig const&, std::vector<std::shared_ptr<triton::core::TritonModelInstance>, std::allocator<std::shared_ptr<triton::core::TritonModelInstance> > >, std::vector<std::shared_ptr<triton::core::TritonModelInstance>, std::allocator<std::shared_ptr<triton::core::TritonModelInstance> > >)::{lambda()#1}::operator()() backend_model.cc:0
14 0x0000000000186216 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<triton::core::Status>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<triton::core::TritonModel::PrepareInstances(inference::ModelConfig const&, std::vector<std::shared_ptr<triton::core::TritonModelInstance>, std::allocator<std::shared_ptr<triton::core::TritonModelInstance> > >, std::vector<std::shared_ptr<triton::core::TritonModelInstance>, std::allocator<std::shared_ptr<triton::core::TritonModelInstance> > >)::{lambda()#1}> >, triton::core::Status> >::_M_invoke() backend_model.cc:0
15 0x000000000019131d std::__future_base::_State_baseV2::_M_do_set() :0
16 0x0000000000099f68 pthread_mutexattr_setkind_np() ???:0
17 0x000000000017dadb std::__future_base::_Deferred_state<std::thread::_Invoker<std::tuple<triton::core::TritonModel::PrepareInstances(inference::ModelConfig const&, std::vector<std::shared_ptr<triton::core::TritonModelInstance>, std::allocator<std::shared_ptr<triton::core::TritonModelInstance> > >, std::vector<std::shared_ptr<triton::core::TritonModelInstance>, std::allocator<std::shared_ptr<triton::core::TritonModelInstance> > >)::{lambda()#1}> >, triton::core::Status>::_M_complete_async() backend_model.cc:0
18 0x000000000018b865 triton::core::TritonModel::PrepareInstances() :0
19 0x0000000000190682 triton::core::TritonModel::Create() :0
20 0x0000000000273230 triton::core::ModelLifeCycle::CreateModel() :0
21 0x0000000000276923 std::_Function_handler<void (), triton::core::ModelLifeCycle::AsyncLoad(triton::core::ModelIdentifier const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, inference::ModelConfig const&, bool, bool, std::shared_ptr<triton::core::TritonRepoAgentModelList> const&, std::function<void (triton::core::Status)>&&)::{lambda()#2}>::_M_invoke() model_lifecycle.cc:0
22 0x00000000003bfe52 std::thread::_State_impl<std::thread::_Invoker<std::tuple<triton::common::ThreadPool::ThreadPool(unsigned long)::{lambda()#1}> > >::_M_run() thread_pool.cc:0
23 0x00000000000dc253 std::error_code::default_error_condition() ???:0
24 0x0000000000094b43 pthread_condattr_setpshared() ???:0
25 0x0000000000125bb4 clone() ???:0

"""
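
Frames 3 to 5 of the backtrace show the allocation failing while a `std::vector` of `char` is being resized inside Treelite's XGBoost binary loader (`XGBTree::Load`). One possible reading, not confirmed anywhere in this thread, is that a size field was decoded from bytes laid out differently from what the loader expected, so the requested size was enormous regardless of how much memory the machine has. The toy reader below is not Treelite code; it only illustrates how a length-prefixed reader can end up with an implausible size when given unexpected bytes:

```python
import struct


def read_length_prefixed(buffer, max_reasonable=1 << 30):
    """Decode a little-endian uint64 length prefix and refuse implausible sizes."""
    (length,) = struct.unpack_from("<Q", buffer, 0)
    if length > max_reasonable:
        raise ValueError(f"implausible length {length}; wrong file format?")
    return buffer[8:8 + length]


# A well-formed record: length 3 followed by three payload bytes.
print(read_length_prefixed(struct.pack("<Q", 3) + b"abc"))

# Bytes in an unexpected layout (here, the start of some JSON text).
try:
    read_length_prefixed(b'{"learner": {}}')
except ValueError as err:
    print(err)
```

A reader without the sanity check would go on to request that many bytes, which is the kind of request that surfaces as an out-of-memory error or `std::bad_alloc`.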

@sandeepb2013
Author

=========config.pbtxt============
name: "fil" # Name of the model directory (fil in our case)
backend: "fil" # Triton FIL backend for deploying forest models
max_batch_size: 8192
input [
{
name: "input__0"
data_type: TYPE_FP32
dims: [ 9 ] # Input feature dimensions, in our sample case it's 9
}
]
output [
{
name: "output__0"
data_type: TYPE_FP32
dims: [ 1 ] # Output 2 for binary classification model
}
]
instance_group [{ kind: KIND_AUTO }]
parameters [
{
key: "model_type"
value: { string_value: "xgboost" }
},
{
key: "predict_proba"
value: { string_value: "true" }
},
{
key: "output_class"
value: { string_value: "true" }
},
{
key: "threshold"
value: { string_value: "0.5" }
},
{
key: "algo"
value: { string_value: "ALGO_AUTO" }
},
{
key: "storage_type"
value: { string_value: "AUTO" }
},
{
key: "blocks_per_sm"
value: { string_value: "0" }
},
{
key: "use_experimental_optimizations"
value: { string_value: "true" }
}
]
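
This config sets `predict_proba` to "true" while the output `dims` stay at `[ 1 ]`. As far as I understand the FIL backend documentation, the output dimension is expected to equal the number of classes when `predict_proba` is enabled and 1 otherwise; treat that rule as an assumption here. A rough sketch of a consistency check along those lines (the function `check_output_dims` and its regex-based parsing are illustrative only, and there is no indication in the thread that this mismatch causes the crash):

```python
import re


def check_output_dims(config_text, num_classes=2):
    """Return a warning string if output dims look inconsistent, else None."""
    proba = re.search(
        r'key:\s*"predict_proba"\s*value:\s*\{\s*string_value:\s*"(\w+)"',
        config_text,
    )
    # First `dims` entry that follows the start of the `output [` block.
    dims = re.search(r"output\s*\[.*?dims:\s*\[\s*(\d+)\s*\]", config_text, re.DOTALL)
    if not proba or not dims:
        return None
    # Assumed rule: one value per class with predict_proba, a single value otherwise.
    expected = num_classes if proba.group(1) == "true" else 1
    actual = int(dims.group(1))
    if actual != expected:
        return f"predict_proba={proba.group(1)} but output dims=[{actual}], expected [{expected}]"
    return None


config = """
output [
{
name: "output__0"
data_type: TYPE_FP32
dims: [ 1 ]
}
]
parameters [
{
key: "predict_proba"
value: { string_value: "true" }
}
]
"""
print(check_output_dims(config))
```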

@sandeepb2013
Author

root@lees1:~/work/fil_backend# docker run --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /root/work/fil_backend/models:/models --name tritonserver fil_23 tritonserver --model-repository=/models

=============================
== Triton Inference Server ==

NVIDIA Release 23.08 (build 66820947)
Triton Server Version 2.37.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .

W1129 06:31:14.927633 1 pinned_memory_manager.cc:237] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I1129 06:31:14.927720 1 cuda_memory_manager.cc:117] CUDA memory pool disabled
I1129 06:31:14.929950 1 model_lifecycle.cc:462] loading: fil:1
I1129 06:31:14.938431 1 initialize.hpp:43] TRITONBACKEND_Initialize: fil
I1129 06:31:14.938468 1 backend.hpp:47] Triton TRITONBACKEND API version: 1.15
I1129 06:31:14.938477 1 backend.hpp:52] 'fil' TRITONBACKEND API version: 1.15
I1129 06:31:14.938513 1 model_initialize.hpp:37] TRITONBACKEND_ModelInitialize: fil (version 1)
I1129 06:31:14.940011 1 instance_initialize.hpp:46] TRITONBACKEND_ModelInstanceInitialize: fil_0_0 (CPU device 0)
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
