Notes:
- we should build Bazel, ml_dtypes and tensorboard first and install them in the directory for CPU-only software (double-check if and why they are not there yet)
  - Bazel/6.3.1 is installed, but not Bazel/6.1.0, which is a dependency for this PR
  - ml_dtypes is not installed ... not sure if it should be (see comment/question for tensorboard below) ... OR it's a new dependency for TensorFlow (check easyconfig for CPU-only version)
  - tensorboard/2.13.0 is available as an extension of the CPU-only installation of TensorFlow/2.13.0-foss-2023a ... we might want to install the extension under the GPU directory?
- we should check why cuDNN is installed again (in the directory for CPU-only software) ... maybe related to switching to EESSI-extend/2023.06-easybuild and the installation path not being configured correctly
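To make the "CPU-only directory vs GPU directory" question concrete, here is a minimal sketch that classifies an installation path by prefix. The two prefixes are copied from the job reports in this PR; the helper `install_target` and the example paths are hypothetical and not part of EESSI or EasyBuild.

```python
from pathlib import PurePosixPath

# Prefixes as they appear in the job reports of this PR.
CPU_PREFIX = PurePosixPath("2023.06/software/linux/x86_64/amd/zen2")
ACCEL_PREFIX = CPU_PREFIX / "accel" / "nvidia" / "cc80"


def _starts_with(path: PurePosixPath, prefix: PurePosixPath) -> bool:
    """True if the leading components of path equal those of prefix."""
    return path.parts[: len(prefix.parts)] == prefix.parts


def install_target(path: str) -> str:
    """Hypothetical helper: return 'accel', 'cpu', or 'other' for a path."""
    p = PurePosixPath(path)
    # Check the accelerator prefix first, because it is nested inside the CPU prefix.
    if _starts_with(p, ACCEL_PREFIX):
        return "accel"
    if _starts_with(p, CPU_PREFIX):
        return "cpu"
    return "other"


# Example paths are illustrative only (VERSION is a placeholder).
print(install_target(f"{ACCEL_PREFIX}/software/cuDNN/VERSION"))  # accel
print(install_target(f"{CPU_PREFIX}/software/cuDNN/VERSION"))    # cpu
```

A check like this could be run over the entries of a build tarball to spot packages that ended up under the CPU-only prefix by mistake.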
New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.11/pr_808/28458
| date | job status | comment |
| --- | --- | --- |
| Nov 12 11:32:37 UTC 2024 | submitted | job id 28458 awaits release by job manager |
| Nov 12 11:32:49 UTC 2024 | released | job awaits launch by Slurm scheduler |
| Nov 12 11:37:52 UTC 2024 | running | job 28458 is running |
| Nov 12 11:52:06 UTC 2024 | finished | 😢 FAILURE |
Details
✅ job output file slurm-28458.out
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
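The checks in the report above can be read as a list of patterns that must or must not occur in the job output. The following is a minimal sketch of that logic. The pattern strings are taken from the report, but the functions `evaluate` and `job_succeeded` are assumptions for illustration and not the bot's actual implementation.

```python
# (pattern, must_be_present) pairs, inferred from the report above.
CHECKS = [
    ("ERROR:", False),
    ("FAILED:", False),
    ("required modules missing:", False),
    ("No missing installations", True),
    (".tar.gz created!", True),
]


def evaluate(log_text):
    """Return a list of (pattern, found, ok) tuples for the given job output."""
    results = []
    for pattern, must_be_present in CHECKS:
        found = pattern in log_text
        # A check passes when presence matches the expectation.
        results.append((pattern, found, found == must_be_present))
    return results


def job_succeeded(log_text):
    """The job counts as successful only if every check passes."""
    return all(ok for _, _, ok in evaluate(log_text))


if __name__ == "__main__":
    # Hypothetical log excerpt that reproduces the pass/fail pattern of job 28458.
    sample = "ERROR: something went wrong\nexample.tar.gz created!\n"
    for pattern, found, ok in evaluate(sample):
        print("ok  " if ok else "FAIL", repr(pattern), "found" if found else "not found")
```

Under these assumptions, job 28458 fails two checks: an `ERROR:` message was found, and `No missing installations` was not.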
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1731411645.tar.gz
size: 698 MiB (732482400 bytes)
entries: 71
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
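The artefact summary above groups tarball entries into modules, software and other, and reports the size in MiB. A minimal sketch of how such a summary could be derived from a list of entry names follows. The functions `summarize` and `to_mib` and the example entry names are hypothetical; only the prefix and the byte count come from the report.

```python
PREFIX = "2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80"


def summarize(names, prefix=PREFIX):
    """Group tarball entry names into modules, software and other."""
    modules_dir = prefix + "/modules/all/"
    software_dir = prefix + "/software/"
    summary = {"modules": [], "software": [], "other": []}
    for name in names:
        if name.startswith(modules_dir):
            summary["modules"].append(name)
        elif name.startswith(software_dir):
            summary["software"].append(name)
        else:
            summary["other"].append(name)
    return summary


def to_mib(size_bytes):
    """Convert bytes to whole MiB (1 MiB = 2**20 bytes), rounding down."""
    return size_bytes // 2**20


if __name__ == "__main__":
    # With a real tarball, the names could come from tarfile.open(path).getnames().
    example = [PREFIX + "/modules/all/Example/1.0.lua", PREFIX + "/some-other-file"]
    print({kind: len(items) for kind, items in summarize(example).items()})
    print(to_mib(732482400), "MiB")
```

In the report above, all 71 entries fall into the "other" group, which is consistent with the suspicion that the actual installations went to a different directory.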
New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.11/pr_808/28459
| date | job status | comment |
| --- | --- | --- |
| Nov 12 13:10:47 UTC 2024 | submitted | job id 28459 awaits release by job manager |
| Nov 12 13:11:15 UTC 2024 | released | job awaits launch by Slurm scheduler |
| Nov 12 13:17:18 UTC 2024 | running | job 28459 is running |
The job failed, and the job manager crashed when trying to update the table above with an update that was too large (~300 KB) ... this might be related to the install path being wrong (CPU vs GPU directory)
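One way to guard against oversized updates is to truncate the text before posting it. The sketch below assumes a limit of 65536 characters for a comment body; treat that number as an assumption to verify against the GitHub API documentation. The function `truncate_comment` is hypothetical and not part of the bot.

```python
ASSUMED_LIMIT = 65536  # assumed maximum comment length in characters
MARKER = "\n... (truncated)"


def truncate_comment(text, limit=ASSUMED_LIMIT, marker=MARKER):
    """Return text unchanged if it fits, else cut it and append a marker."""
    if limit < len(marker):
        raise ValueError("limit is smaller than the truncation marker")
    if len(text) <= limit:
        return text
    # Reserve room for the marker so the result never exceeds the limit.
    keep = limit - len(marker)
    return text[:keep] + marker


if __name__ == "__main__":
    oversized = "x" * 300_000  # roughly the size mentioned in the comment above
    print(len(truncate_comment(oversized)))
```

Truncation only hides the symptom. If the update is large because the build log lists installations into the wrong directory, the install path still needs fixing.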
PR to debug issues building TensorFlow v2.15.1 with CUDA v12.1.1
tensorflow.py easyblock that solves an ImportError issue with libnccl.so.2. See "tweak libpaths in TensorFlow easyblock by adding directory containing libnccl.so.2" (easybuilders/easybuild-easyblocks#3497).
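The ImportError for libnccl.so.2 mentioned above suggests that the directory holding that library was missing from the library search path. As an illustration only, the sketch below locates directories containing the library and prepends them to a colon-separated path string. It is not the change made in easybuilders/easybuild-easyblocks#3497, and the helpers `find_lib_dirs` and `prepend_paths` are hypothetical.

```python
import os


def find_lib_dirs(root, libname="libnccl.so.2"):
    """Return the sorted list of directories under root that contain libname."""
    return sorted(d for d, _subdirs, files in os.walk(root) if libname in files)


def prepend_paths(current, new_dirs):
    """Prepend new_dirs to a path string such as LD_LIBRARY_PATH, skipping duplicates."""
    existing = [p for p in current.split(os.pathsep) if p]
    fresh = [d for d in new_dirs if d not in existing]
    return os.pathsep.join(fresh + existing)


if __name__ == "__main__":
    # NCCL_ROOT is a hypothetical variable pointing at an NCCL installation.
    root = os.environ.get("NCCL_ROOT", ".")
    updated = prepend_paths(os.environ.get("LD_LIBRARY_PATH", ""), find_lib_dirs(root))
    print(updated)
```

Printing the result instead of exporting it keeps the sketch side-effect free. A real fix would apply the path inside the build environment that runs the TensorFlow import check.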