Unable to load numpy module in a DALI backend #223

Open
mvpel opened this issue Jan 24, 2024 · 3 comments

mvpel commented Jan 24, 2024

I'm using a very close approximation of the https://docs.nvidia.com/deeplearning/dali/user-guide/docs/math.html example to try to scale 0-255 RGB image values to 0.0-1.0 floating point numbers, due to the way our inference models were trained.

I tried running it without an "import numpy as np" line at first, which threw a NameError, but when I added that line, I got a "no module named numpy" error as Triton was working to load the model:

I0124 23:43:09.142134 210089 dali_backend.cc:43] TRITONBACKEND_Initialize: dali
I0124 23:43:09.142195 210089 dali_backend.cc:50] Triton TRITONBACKEND API version: 1.10
I0124 23:43:09.142203 210089 dali_backend.cc:54] 'dali' TRITONBACKEND API version: 1.10
I0124 23:43:09.142209 210089 dali_backend.cc:71] backend configuration:
{"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I0124 23:43:09.142289 210089 dali_backend.cc:119] TRITONBACKEND_ModelInitialize: image_one255_494x648x3 (version 1)
I0124 23:43:09.142295 210089 dali_backend.cc:131] Repository location: /triton.repos.d/image_one255_494x648x3
I0124 23:43:09.142300 210089 dali_backend.cc:142] backend state is 'backend state'
Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "<frozen importlib._bootstrap>", line 553, in module_from_spec
AttributeError: 'NoneType' object has no attribute 'loader'
Traceback (most recent call last):
  File "<string>", line 7, in <module>
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/triton.repos.d/image_one255_494x648x3/1/dali.py", line 1, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
I0124 23:43:10.164297 210089 dali_backend.cc:170] TRITONBACKEND_ModelFinalize: delete model state
E0124 23:43:10.164338 210089 model_lifecycle.cc:596] failed to load 'image_one255_494x648x3' version 1: Unknown: DALI Backend error: Failed to load model file. The program looked in the following locations: /triton.repos.d/image_one255_494x648x3/1/dali.py, /triton.repos.d/image_one255_494x648x3/1/dali.py. Please make sure that the model exists in any of the locations and is properly serialized or can be properly serialized.

Here's my full pipeline in the dali.py:

import numpy as np
import nvidia.dali as dali
from nvidia.dali.plugin.triton import autoserialize
import nvidia.dali.types as types

@dali.plugin.triton.autoserialize
@dali.pipeline_def(batch_size=256, num_threads=4, device_id=0, output_dtype=types.FLOAT, output_ndim=[3])
def one255_pipe():
    images = dali.fn.external_source(device="cpu", name="DALI_INPUT_0")
    images = dali.fn.decoders.image(images, device="cpu")
    images = images / types.Constant(np.float32([255.0, 255.0, 255.0]))
    return images

I'm testing this under Triton v22.08, using the NGC Triton container, because of software approval requirements on our program.

Thanks for any suggestions you can offer!

mvpel (Author) commented Jan 25, 2024

It occurred to me to try importing sys to check sys.path, and I found:

-------> sys.path is:  ['', '/opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python38.zip',
'/opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.8', 
'/opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.8/lib-dynload', 
'/opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.8/site-packages']

The only references to numpy in these paths were a collection of .h files:

Apptainer> find . -name 'numpy*'
./lib/python3.8/site-packages/nvidia/dali/include/dali/operators/reader/loader/numpy_loader.h
./lib/python3.8/site-packages/nvidia/dali/include/dali/operators/reader/loader/numpy_loader_gpu.h
./lib/python3.8/site-packages/nvidia/dali/include/dali/operators/reader/numpy_reader_gpu_op.h
./lib/python3.8/site-packages/nvidia/dali/include/dali/operators/reader/numpy_reader_op.h
Apptainer>
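
A minimal sketch of the same check done from Python, in case it helps anyone reproduce this. `module_available` is a hypothetical helper name, not part of DALI or Triton; it relies only on the standard library's `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found on `sys.path`:

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if a top-level module `name` can be found on sys.path."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Show which directories this interpreter actually searches.
    for entry in sys.path:
        print("sys.path entry:", repr(entry))
    print("numpy importable:", module_available("numpy"))
```

Placed temporarily at the top of dali.py (without the `__main__` guard), the print calls should show up in the Triton log while the model is loading.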

I threw in a sys.path.append() to add the /usr/local Python installation's path to sys.path, and that seems to have enabled NumPy to load. The model load still printed a traceback, but the pipeline appears to have loaded successfully:

I0125 15:23:32.870758 1421448 dali_backend.cc:119] TRITONBACKEND_ModelInitialize: image_one255_494x648x3 (version 1)
I0125 15:23:32.870836 1421448 dali_backend.cc:131] Repository location: /triton.repos.d/image_one255_494x648x3
I0125 15:23:32.870843 1421448 dali_backend.cc:142] backend state is 'backend state'
Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "<frozen importlib._bootstrap>", line 553, in module_from_spec
AttributeError: 'NoneType' object has no attribute 'loader'
I0125 15:23:34.544454 1421448 dali_model.h:175] DALI pipeline from file /triton.repos.d/image_one255_494x648x3/1/dali.py
loaded successfully.
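
A sketch of that kind of workaround, under stated assumptions: the directory below is a placeholder (the exact path is not given above), and `add_search_path` is a hypothetical helper. It also assumes both Python installations are the same minor version, since compiled packages such as NumPy are built for a specific one.

```python
import os
import sys


def add_search_path(path: str) -> bool:
    """Append `path` to sys.path if it is an existing directory not already listed.

    Returns True if the path was added, False otherwise.
    """
    if not os.path.isdir(path) or path in sys.path:
        return False
    sys.path.append(path)  # appended last, so earlier entries still take precedence
    return True


if __name__ == "__main__":
    # Placeholder location (assumption): substitute the site-packages or
    # dist-packages directory of the interpreter that actually has NumPy.
    add_search_path("/usr/local/lib/python3.8/dist-packages")
```

In a dali.py model file the call would go at module level, before the `import numpy as np` line, rather than under a `__main__` guard.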

Hopefully it will work as intended in spite of the error. Any idea what might be going on? The message is pretty ambiguous.

With respect to NumPy, did I miss a step somewhere? Maybe I need to add NumPy to the DALI backend's conda environment (dalienv)?

banasraf (Collaborator) commented Jan 26, 2024

Hey @mvpel
The easiest solution for this case would be not to use numpy at all. You can use the Constant type like this:

images = images / types.Constant(255.0, dtype=types.FLOAT)
# or even better
images = images / 255.
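
To illustrate why the scalar form is enough: dividing every channel by the scalar 255 gives the same values as dividing by a per-channel constant [255, 255, 255], because all three divisors are identical. The sketch below shows this in plain Python rather than DALI, so it runs without the backend; the function names are hypothetical.

```python
from typing import List, Sequence


def scale_per_channel(pixel: Sequence[int], divisors: Sequence[float]) -> List[float]:
    """Divide each channel by its own divisor (the per-channel constant form)."""
    return [value / d for value, d in zip(pixel, divisors)]


def scale_scalar(pixel: Sequence[int], divisor: float) -> List[float]:
    """Divide every channel by one scalar (the `images / 255.` form)."""
    return [value / divisor for value in pixel]


if __name__ == "__main__":
    rgb = [0, 128, 255]
    print(scale_per_channel(rgb, [255.0, 255.0, 255.0]))
    print(scale_scalar(rgb, 255.0))
```

Both calls print the same list, so the NumPy array adds nothing for this particular normalization.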

mvpel (Author) commented Jan 29, 2024

> Hey @mvpel The easiest solution for this case would be not to use numpy at all. You can use the Constant type like this:
>
> images = images / types.Constant(255.0, dtype=types.FLOAT)
> # or even better
> images = images / 255.

Nice, thanks! I'm puzzled that the example in the DALI math user guide didn't take that approach.
