opencv: misc cleanups; fix CUDA build #339619
6132dac to 26c8023
26c8023 to 1da478e
Nixpkgs Default Config (
Whaaaaaat, is that really a thing?
It’s a thing but apparently it doesn’t really show up in perf measurements at all.
So the fix is to both pin CUDA 12.3 and disable LTO?
Correct. As usual, not something I set out to do, but discovered because I ran into a build failure and then tried to reproduce on master... and it failed there, too.
Yes, although truthfully you'd only see performance differences in super-frequently evaluated code (like I personally do it because IMHO it's cleaner than repeatedly reaching into
@SomeoneSerge OpenCV’s CMake configuration files require consumers to use the exact same version of CUDA packages. Is that something I should patch out, or should I make consumers use the same version? I’m more a fan of patching it out given you could have multiple packages in a closure, each of which tries to mandate a global CUDA version. Thoughts?
I wasn't aware of this behaviour. This would maybe have been OK if OpenCV were more flexible about which versions OpenCV itself can be built against. But if we are forced to use an older CUDA for OpenCV, and that forces us to use an older CUDA for everything else, that's clearly wrong.
To clarify, "this" above refers to patching out OpenCV's CMake requirement for CUDA version, correct? As in, it would be wrong to patch out the requirement because we should respect that it wants us to use a particular CUDA version for everything? EDIT: I've got it in my TODOs to verify that we don't run into diamond-dependency issues where we have multiple versions of CUDA libraries in scope and packages load them arbitrarily because they're all in the same namespace. If you have any tips for testing that, I'd appreciate it -- I don't have any OpenCV code on hand I could think of to use to test that.
I'm converting this to a draft until I've done some testing on the packages in the original post to see what happens when we have multiple versions of CUDA libraries in the path.
No, it refers to OpenCV enforcing that consumers use the same CUDA version. It would've been OK for them to enforce that (we'd just compute a fixpoint), but they additionally enforce an upper bound on the CUDA version, which limits all downstream packages.
- Removed cuda support, currently broken NixOS/nixpkgs#339619 - Remove prismlauncher override since PR merged - Remove nose3 and replaced with nose
1da478e to da7281a
NOTE: This is with OpenCV's default CMake configuration, which causes build errors for downstream CMake projects relying on OpenCV and Result of 341 packages failed to build:
318 packages built:
da7281a to 20f3ec9
This is with the "fix" of patching out OpenCV's CMake configuration requirement that the CUDA version match exactly. Result of 68 packages marked as broken and skipped:
134 packages failed to build:
518 packages built:
@SomeoneSerge both of the main tests work:
$ nix run --impure .#cudaPackages.tests.test-opencv-with-default-cuda-then-torch-with-default-cuda
OpenCV version: 4.9.0
*** CUDA Device Query (Runtime API) version (CUDART static linking) ***
Device count: 1
Device 0: "NVIDIA GeForce RTX 4090"
CUDA Driver Version / Runtime Version 12.50 / 12.30
CUDA Capability Major/Minor version number: 8.9
Total amount of global memory: 24118 MBytes (25289621504 bytes)
GPU Clock Speed: 2.61 GHz
Max Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072,65536), 3D=(16384,16384,16384)
Max Layered Texture Size (dim) x layers 1D=(32768) x 2048, 2D=(32768,32768) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.50, CUDA Runtime Version = 12.30, NumDevs = 1
OpenCV CUDA device: None
General configuration for OpenCV 4.9.0 =====================================
Version control: unknown
Extra modules:
Location (extra): /build/source/opencv_contrib
Version control (extra): unknown
Platform:
Timestamp: 1980-01-01T00:00:00Z
Host: Linux 6.8.12 x86_64
CMake: 3.29.6
CMake generator: Unix Makefiles
CMake build tool: /nix/store/axrdky652lsmif6m5i8b55q91v4ly4gy-gnumake-4.4.1/bin/make
Configuration: Release
CPU/HW features:
Baseline: SSE SSE2 SSE3
requested: SSE3
Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
SSE4_1 (18 files): + SSSE3 SSE4_1
SSE4_2 (2 files): + SSSE3 SSE4_1 POPCNT SSE4_2
FP16 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
AVX (9 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
AVX2 (38 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
AVX512_SKX (8 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX
C/C++:
Built as dynamic libs?: YES
C++ standard: 11
C++ Compiler: /nix/store/68chgznhnw6hf3wb98nnfkzsl4q8ws5g-gcc-wrapper-12.4.0/bin/g++ (ver 12.4.0)
C++ flags (Release): -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -fopenmp -O3 -DNDEBUG -DNDEBUG
C++ flags (Debug): -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -fopenmp -g -O0 -DDEBUG -D_DEBUG
C Compiler: /nix/store/68chgznhnw6hf3wb98nnfkzsl4q8ws5g-gcc-wrapper-12.4.0/bin/gcc
C flags (Release): -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fopenmp -O3 -DNDEBUG -DNDEBUG
C flags (Debug): -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fopenmp -g -O0 -DDEBUG -D_DEBUG
Linker flags (Release): -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
Linker flags (Debug): -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
ccache: NO
Precompiled headers: NO
Extra dependencies: m pthread cudart_static dl rt nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cufft -L/nix/store/493nswiflls1r5ckjwrwi14kgxfb38gl-cuda_cudart-12.3.52-static/lib -L/nix/store/3dyw8dzj9ab4m8hv5dpyx7zii8d0w6fi-glibc-2.39-52/lib -L/nix/store/iqphxa64b9ar5fgxyslx157syq9l024y-libnpp-12.2.2.32-lib/lib -L/nix/store/skrc7rj8gb7qm0a31qa8iins3vl9dzk9-libcublas-12.3.2.9-lib/lib -L/nix/store/0z2kzvga150nwza0v0r779rskj8p94k9-libcufft-11.0.11.19-lib/lib
3rdparty dependencies:
OpenCV modules:
To be built: alphamat aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
Disabled: world
Disabled by dependency: -
Unavailable: cannops cvv freetype java julia matlab ovis python2 sfm viz
Applications: tests apps
Documentation: NO
Non-free algorithms: NO
GUI: NONE
GTK+: NO
VTK support: NO
Media I/O:
ZLib: /nix/store/rqs1zrcncqz3966khjndg1183cpdnqxs-zlib-1.3.1/lib/libz.so (ver 1.3.1)
JPEG: /nix/store/56y3hxpkl382s34i0p8hka3dg4na1vkp-libjpeg-turbo-3.0.3/lib/libjpeg.so (ver 62)
WEBP: /nix/store/nj3grvd1g7njfzxzmw07ca17sbi354yy-libwebp-1.4.0/lib/libwebp.so (ver encoder: 0x020f)
PNG: /nix/store/mp79jmmfs2bfjmnac72c1kxn7im1px38-libpng-apng-1.6.43/lib/libpng.so (ver 1.6.43)
TIFF: /nix/store/x0nvqydsb8p48k7kz0qpgd0qi4amzjpv-libtiff-4.6.0/lib/libtiff.so (ver 42 / 4.6.0)
JPEG 2000: OpenJPEG (ver 2.5.2)
OpenEXR: IlmImf-2_5 Imath-2_5 Half-2_5 Iex-2_5 IexMath-2_5 IlmThread-2_5 (ver 2.5.10)
HDR: YES
SUNRASTER: YES
PXM: YES
PFM: YES
Video I/O:
DC1394: NO
FFMPEG: YES
avcodec: YES (60.31.102)
avformat: YES (60.16.100)
avutil: YES (58.29.100)
swscale: YES (7.5.100)
avresample: NO
GStreamer: YES (1.24.3)
v4l/v4l2: YES (linux/videodev2.h)
Parallel framework: OpenMP
Trace: YES (with Intel ITT)
Other third-party libraries:
VA: YES
Lapack: YES (/nix/store/7dnr43zwh6mr4pd0qslfrdqq9myd1ffv-openblas-0.3.28/lib/libopenblas.so)
Eigen: YES (ver 3.4.0)
Custom HAL: NO
Protobuf: /nix/store/4p7dyslavx8yllsr7k15q9vmdzi14fmk-protobuf-21.12/lib/libprotobuf.so.3.21.12.0 (3.21.12.0)
Flatbuffers: builtin/3rdparty (23.5.9)
NVIDIA CUDA: YES (ver 12.3, CUFFT CUBLAS FAST_MATH)
NVIDIA GPU arch: 89
NVIDIA PTX archs: 89
OpenCL: YES (INTELVA)
Include path: /build/source/3rdparty/include/opencl/1.2
Link libraries: /nix/store/d6bvbwydh6fbgmvab8bpbjrg1zdcavik-ocl-icd-2.3.2/lib/libOpenCL.so
Python 3:
Interpreter: /nix/store/h3i0acpmr8mrjx07519xxmidv8mpax4y-python3-3.12.5/bin/python3 (ver 3.12.5)
Libraries: /nix/store/h3i0acpmr8mrjx07519xxmidv8mpax4y-python3-3.12.5/lib/libpython3.12.so (ver 3.12.5)
numpy: /nix/store/fvxlmgcjanv8j0qxzxxgxigq1344zn39-python3.12-numpy-1.26.4/lib/python3.12/site-packages/numpy/core/include (ver 1.26.4)
install path: lib/python3.12/site-packages
Python (for build): /nix/store/h3i0acpmr8mrjx07519xxmidv8mpax4y-python3-3.12.5/bin/python3
Java:
ant: NO
Java: NO
JNI: NO
Java wrappers: NO
Java tests: NO
Install to: /nix/store/2sblq3w11nrj0sncw506x78wcnl450id-opencv-4.9.0
-----------------------------------------------------------------
Torch version: 2.4.0
Torch CUDA device: _CudaDeviceProperties(name='NVIDIA GeForce RTX 4090', major=8, minor=9, total_memory=24118MB, multi_processor_count=128)
$ nix run --impure .#cudaPackages.tests.test-torch-with-default-cuda-then-opencv-with-default-cuda
Torch version: 2.4.0
Torch CUDA device: _CudaDeviceProperties(name='NVIDIA GeForce RTX 4090', major=8, minor=9, total_memory=24118MB, multi_processor_count=128)
OpenCV version: 4.9.0
*** CUDA Device Query (Runtime API) version (CUDART static linking) ***
Device count: 1
Device 0: "NVIDIA GeForce RTX 4090"
CUDA Driver Version / Runtime Version 12.50 / 12.30
CUDA Capability Major/Minor version number: 8.9
Total amount of global memory: 24118 MBytes (25289621504 bytes)
GPU Clock Speed: 2.61 GHz
Max Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072,65536), 3D=(16384,16384,16384)
Max Layered Texture Size (dim) x layers 1D=(32768) x 2048, 2D=(32768,32768) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.50, CUDA Runtime Version = 12.30, NumDevs = 1
OpenCV CUDA device: None
General configuration for OpenCV 4.9.0 =====================================
Version control: unknown
Extra modules:
Location (extra): /build/source/opencv_contrib
Version control (extra): unknown
Platform:
Timestamp: 1980-01-01T00:00:00Z
Host: Linux 6.8.12 x86_64
CMake: 3.29.6
CMake generator: Unix Makefiles
CMake build tool: /nix/store/axrdky652lsmif6m5i8b55q91v4ly4gy-gnumake-4.4.1/bin/make
Configuration: Release
CPU/HW features:
Baseline: SSE SSE2 SSE3
requested: SSE3
Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
SSE4_1 (18 files): + SSSE3 SSE4_1
SSE4_2 (2 files): + SSSE3 SSE4_1 POPCNT SSE4_2
FP16 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
AVX (9 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
AVX2 (38 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
AVX512_SKX (8 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX
C/C++:
Built as dynamic libs?: YES
C++ standard: 11
C++ Compiler: /nix/store/68chgznhnw6hf3wb98nnfkzsl4q8ws5g-gcc-wrapper-12.4.0/bin/g++ (ver 12.4.0)
C++ flags (Release): -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -fopenmp -O3 -DNDEBUG -DNDEBUG
C++ flags (Debug): -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -fopenmp -g -O0 -DDEBUG -D_DEBUG
C Compiler: /nix/store/68chgznhnw6hf3wb98nnfkzsl4q8ws5g-gcc-wrapper-12.4.0/bin/gcc
C flags (Release): -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fopenmp -O3 -DNDEBUG -DNDEBUG
C flags (Debug): -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fopenmp -g -O0 -DDEBUG -D_DEBUG
Linker flags (Release): -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
Linker flags (Debug): -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
ccache: NO
Precompiled headers: NO
Extra dependencies: m pthread cudart_static dl rt nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cufft -L/nix/store/493nswiflls1r5ckjwrwi14kgxfb38gl-cuda_cudart-12.3.52-static/lib -L/nix/store/3dyw8dzj9ab4m8hv5dpyx7zii8d0w6fi-glibc-2.39-52/lib -L/nix/store/iqphxa64b9ar5fgxyslx157syq9l024y-libnpp-12.2.2.32-lib/lib -L/nix/store/skrc7rj8gb7qm0a31qa8iins3vl9dzk9-libcublas-12.3.2.9-lib/lib -L/nix/store/0z2kzvga150nwza0v0r779rskj8p94k9-libcufft-11.0.11.19-lib/lib
3rdparty dependencies:
OpenCV modules:
To be built: alphamat aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
Disabled: world
Disabled by dependency: -
Unavailable: cannops cvv freetype java julia matlab ovis python2 sfm viz
Applications: tests apps
Documentation: NO
Non-free algorithms: NO
GUI: NONE
GTK+: NO
VTK support: NO
Media I/O:
ZLib: /nix/store/rqs1zrcncqz3966khjndg1183cpdnqxs-zlib-1.3.1/lib/libz.so (ver 1.3.1)
JPEG: /nix/store/56y3hxpkl382s34i0p8hka3dg4na1vkp-libjpeg-turbo-3.0.3/lib/libjpeg.so (ver 62)
WEBP: /nix/store/nj3grvd1g7njfzxzmw07ca17sbi354yy-libwebp-1.4.0/lib/libwebp.so (ver encoder: 0x020f)
PNG: /nix/store/mp79jmmfs2bfjmnac72c1kxn7im1px38-libpng-apng-1.6.43/lib/libpng.so (ver 1.6.43)
TIFF: /nix/store/x0nvqydsb8p48k7kz0qpgd0qi4amzjpv-libtiff-4.6.0/lib/libtiff.so (ver 42 / 4.6.0)
JPEG 2000: OpenJPEG (ver 2.5.2)
OpenEXR: IlmImf-2_5 Imath-2_5 Half-2_5 Iex-2_5 IexMath-2_5 IlmThread-2_5 (ver 2.5.10)
HDR: YES
SUNRASTER: YES
PXM: YES
PFM: YES
Video I/O:
DC1394: NO
FFMPEG: YES
avcodec: YES (60.31.102)
avformat: YES (60.16.100)
avutil: YES (58.29.100)
swscale: YES (7.5.100)
avresample: NO
GStreamer: YES (1.24.3)
v4l/v4l2: YES (linux/videodev2.h)
Parallel framework: OpenMP
Trace: YES (with Intel ITT)
Other third-party libraries:
VA: YES
Lapack: YES (/nix/store/7dnr43zwh6mr4pd0qslfrdqq9myd1ffv-openblas-0.3.28/lib/libopenblas.so)
Eigen: YES (ver 3.4.0)
Custom HAL: NO
Protobuf: /nix/store/4p7dyslavx8yllsr7k15q9vmdzi14fmk-protobuf-21.12/lib/libprotobuf.so.3.21.12.0 (3.21.12.0)
Flatbuffers: builtin/3rdparty (23.5.9)
NVIDIA CUDA: YES (ver 12.3, CUFFT CUBLAS FAST_MATH)
NVIDIA GPU arch: 89
NVIDIA PTX archs: 89
OpenCL: YES (INTELVA)
Include path: /build/source/3rdparty/include/opencl/1.2
Link libraries: /nix/store/d6bvbwydh6fbgmvab8bpbjrg1zdcavik-ocl-icd-2.3.2/lib/libOpenCL.so
Python 3:
Interpreter: /nix/store/h3i0acpmr8mrjx07519xxmidv8mpax4y-python3-3.12.5/bin/python3 (ver 3.12.5)
Libraries: /nix/store/h3i0acpmr8mrjx07519xxmidv8mpax4y-python3-3.12.5/lib/libpython3.12.so (ver 3.12.5)
numpy: /nix/store/fvxlmgcjanv8j0qxzxxgxigq1344zn39-python3.12-numpy-1.26.4/lib/python3.12/site-packages/numpy/core/include (ver 1.26.4)
install path: lib/python3.12/site-packages
Python (for build): /nix/store/h3i0acpmr8mrjx07519xxmidv8mpax4y-python3-3.12.5/bin/python3
Java:
ant: NO
Java: NO
JNI: NO
Java wrappers: NO
Java tests: NO
Install to: /nix/store/2sblq3w11nrj0sncw506x78wcnl450id-opencv-4.9.0
-----------------------------------------------------------------
As part of further cleanup, I'd love to move CUDA samples and various other things into the
Could you elaborate?
    "-DCUDA_FAST_MATH=ON"
    "-DCUDA_NVCC_FLAGS=--expt-relaxed-constexpr"
  ] ++ optionals enableCuda [
    (cmakeBool "CUDA_FAST_MATH" true)
Side note: this looks suspicious. If it's anything like the respective CPU flag, we may want to keep away from it.
Here's a weird thing I stumbled upon while building
let
  nixpkgs = builtins.getFlake "github:NixOS/nixpkgs/nixos-unstable"; # 99dc8785f6a0adac95f5e2ab05cc2e1bf666d172
  pkgs = import nixpkgs {
    system = "x86_64-linux";
    config.cudaSupport = true;
    config.allowUnfree = true;
    overlays = [ (fin: _: { cudaPackages = fin.cudaPackages_12_3; }) ];
  };
in
pkgs.opencv4.override { enablePython = true; }
builds fine, i.e.
This is not a suggestion to replace one for the other, I don't know which is better. But it might help while looking into why LTO is broken.
32fa788 to dd50598
dd50598 to ec77f27
Ah yes, sorry. I believe that rather than filtering the package set based on package name, derivations are known to be tests based on where they're located (i.e., a Having a
Tbh I'm still not aware of any use of pkgs/test/cuda, other than that some people evaluate it as part of a larger routine.
pythonPackages:
let
  effectiveOpenCV = pythonPackages.opencv4.override (prevAttrs: {
    cudaPackages = if useOpenCVDefaultCuda then prevAttrs.cudaPackages else cudaPackages;
Why are we referring to prevAttrs here?
I suspect we actually want s/prevAttrs.cudaPackages/finalAttrs/ here?
Also, why do we need this branch? Isn't the cudaPackages passed from the caller enough? What exactly are we testing here?
> Why are we referring to prevAttrs here?

The override attribute also accepts a function of the form prevAttrs: <attr set>, where prevAttrs is the previous set of attributes passed to (or resolved by) callPackage.

> I suspect we actually want s/prevAttrs.cudaPackages/finalAttrs/ here?

The cudaPackages provided to opencv4 should be the one given by top-level (or the one automatically provided by callPackage) when useOpenCVDefaultCuda is true, thus essentially passing through the previous value for cudaPackages. When useOpenCVDefaultCuda is false, cudaPackages should be the version of cudaPackages from the enclosing scope (e.g., cudaPackages_11_8 if the derivation is from cudaPackages_11_8.tests.<whatever>).
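For illustration only, here is a minimal sketch of the pass-through behaviour described above. It is not the code from this PR: makeOverridable below is a hypothetical, simplified stand-in for the override mechanism that callPackage provides, the opencv4 here is a toy, and plain strings stand in for whole cudaPackages sets so that the expression evaluates without nixpkgs.

```nix
let
  # Hypothetical, simplified stand-in for the override mechanism that
  # callPackage attaches to a package; the real one lives in nixpkgs' lib.
  makeOverridable = f: args:
    (f args) // {
      # Accept either an attribute set or a function of the previous arguments.
      override = newArgs:
        makeOverridable f (
          args // (if builtins.isFunction newArgs then newArgs args else newArgs)
        );
    };

  # Toy "opencv4": a string stands in for a whole cudaPackages set.
  opencv4 = makeOverridable
    ({ cudaPackages }: { name = "opencv4-with-${cudaPackages}"; })
    { cudaPackages = "cudaPackages-default"; };

  # Mirrors the branch under review: keep the previous argument or use
  # the package set from the enclosing scope.
  effectiveOpenCV = { useOpenCVDefaultCuda, cudaPackages }:
    opencv4.override (prevAttrs: {
      cudaPackages =
        if useOpenCVDefaultCuda then prevAttrs.cudaPackages else cudaPackages;
    });
in
{
  default = (effectiveOpenCV {
    useOpenCVDefaultCuda = true;
    cudaPackages = "cudaPackages_11_8";
  }).name;
  scoped = (effectiveOpenCV {
    useOpenCVDefaultCuda = false;
    cudaPackages = "cudaPackages_11_8";
  }).name;
}
```

In this toy model, default keeps whatever cudaPackages the package was originally called with, while scoped picks up the value from the enclosing scope.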
> Also, why do we need this branch? Isn't the cudaPackages passed from the caller enough? What exactly are we testing here?

In short, it's a matrix of tests of packages against versions of the cudaPackages package set.

Because the tests exist in each cudaPackages* package set, it makes sense to have a copy of the test which builds not with the cudaPackages argument supplied at the top-level or by callPackage, but with the cudaPackages from the enclosing scope. In this way, we're testing a single package (well, in the case of this PR, OpenCV and PyTorch) against multiple different versions of CUDA.

If we want to hold the version of cudaPackages used by both packages fixed (meaning they are either set to some specific version at the top-level due to compatibility requirements or use the default version by way of cudaPackages being populated by callPackage), it doesn't make sense for the test to live in cudaPackages.tests because the tests are irrespective of the version of cudaPackages.
Does that make sense?
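To make the matrix idea concrete, here is a rough sketch under stated assumptions: the scope names, the defaultCuda value, and the test names are all hypothetical, and strings stand in for real package sets and derivations. It only shows how each enclosing scope could get one test per combination of "default" versus "scoped" CUDA for the two packages.

```nix
let
  # Hypothetical names; strings stand in for real package sets.
  defaultCuda = "cudaPackages_12_3";
  scopes = [ "cudaPackages_11_8" "cudaPackages_12_3" ];

  label = useDefault: if useDefault then "default" else "scoped";

  # Tests for one enclosing scope: each package uses either the default
  # CUDA package set or the one from the enclosing scope.
  testsFor = scopeCuda:
    let
      pick = useDefault: if useDefault then defaultCuda else scopeCuda;
    in
    builtins.listToAttrs (
      builtins.concatMap
        (opencvDefault:
          map
            (torchDefault: {
              name = "opencv-${label opencvDefault}-torch-${label torchDefault}";
              value = {
                opencvCuda = pick opencvDefault;
                torchCuda = pick torchDefault;
              };
            })
            [ true false ])
        [ true false ]
    );
in
# One attribute per enclosing scope, each holding its four tests.
builtins.listToAttrs (
  map (scope: { name = scope; value = testsFor scope; }) scopes
)
```

Note that in this sketch opencv-default-torch-default evaluates to the same value under every scope, which is the point made above about tests that do not depend on the enclosing cudaPackages.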
ec77f27 to 4302ca2
4302ca2 to 7edb29b
Rebased and force-pushed to avoid merge conflicts.
Description of changes
- lib instead of accessing directly (avoids repeated lookups)
- cmake* helper functions instead of the custom opencvFlag helper
- cudaFlags.cmakeCudaArchitecturesString rather than constructing it manually
- Multi-line strings ('') rather than inserting newline characters in a string (")
- cudaPackages.tests attribute set for tests which should be available for each version of cudaPackages
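As a small illustration of the string change (the contents are made up, not taken from the OpenCV expression), an indented string written with '' evaluates to the same value as a double-quoted string with explicit newline escapes, since Nix strips the common leading indentation:

```nix
let
  # Double-quoted string with explicit newline escapes.
  withEscapes = "line one\nline two\n";

  # Indented ('') string: the common leading indentation is stripped.
  multiLine = ''
    line one
    line two
  '';
in
withEscapes == multiLine
```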
Note
As part of removing OpenCV's CMake CUDA version consistency checks, I have opened #341650 to track discussion, issues, and features related to having multiple versions of the same CUDA libraries in a single closure.
Note
As part of the move away from the opencvFlag helper, I noticed that the expression was passing the wrong arguments to control building with LTO: the helper function was being used with ENABLE_LTO, passing WITH_ENABLE_LTO to CMake. The correct variable is ENABLE_LTO. Building with LTO is actually broken. This issue is being tracked in #343123.