TensorFlow ROCm support? #120

Open
tuxplorer opened this issue Jan 21, 2023 · 1 comment

tuxplorer commented Jan 21, 2023

Hello everyone,

Recently I tried Reverb with the AMD GPU build of TensorFlow.
Doing it the straightforward way, in a completely new venv:

  • pip3 install tensorflow-rocm
  • pip3 install dm-reverb

got me this error:

>>> import tensorflow
2023-01-21 14:18:31.516305: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> import reverb
Traceback (most recent call last):
  File "/tmp/venv/lib/python3.10/site-packages/reverb/pybind.py", line 4, in <module>
    from .libpybind import *
ImportError: /tmp/venv/lib/python3.10/site-packages/reverb/libschema_cc_proto.so: undefined symbol: _ZNK6google8protobuf7Message25InitializationErrorStringB5cxx11Ev
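
A note on reading that symbol: the `B5cxx11` fragment is how the Itanium C++ ABI mangling encodes the `[abi:cxx11]` tag, which libstdc++ attaches to functions that use the newer `std::string` layout. One plausible reading, which is an assumption and not something confirmed in this thread, is that the prebuilt dm-reverb wheel and the tensorflow-rocm wheel were built with different `_GLIBCXX_USE_CXX11_ABI` settings or against different protobuf builds. The sketch below uses only string handling; the helper names `has_cxx11_abi_tag` and `cxx11_abi_setting` are hypothetical, and the tag check is a substring heuristic, not a full demangler.

```python
import re
from typing import Iterable, Optional

# Itanium C++ ABI mangling encodes an ABI tag as "B" + <length><name>,
# so the libstdc++ "cxx11" tag shows up as "B5cxx11" in a mangled symbol.
CXX11_TAG = "B5cxx11"


def has_cxx11_abi_tag(mangled_symbol: str) -> bool:
    """Heuristic: True if a mangled symbol appears to carry [abi:cxx11]."""
    return mangled_symbol.startswith("_Z") and CXX11_TAG in mangled_symbol


def cxx11_abi_setting(compile_flags: Iterable[str]) -> Optional[int]:
    """Return the value of -D_GLIBCXX_USE_CXX11_ABI=<n>, or None if absent."""
    pattern = re.compile(r"^-D_GLIBCXX_USE_CXX11_ABI=(\d+)$")
    for flag in compile_flags:
        match = pattern.match(flag)
        if match:
            return int(match.group(1))
    return None


# The symbol from the ImportError above.
symbol = "_ZNK6google8protobuf7Message25InitializationErrorStringB5cxx11Ev"
print("carries cxx11 ABI tag:", has_cxx11_abi_tag(symbol))
```

With TensorFlow importable, `cxx11_abi_setting(tf.sysconfig.get_compile_flags())` should report which ABI setting the installed TensorFlow binary was compiled with, which can then be compared against the flags used for the Reverb build.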

So I tried to recompile Reverb myself, basically using an AMD-provided Docker base image:
ARG cpu_base_image="rocm/tensorflow-build:latest-python3.10-rocm5.4.0"

With a little tinkering (I will provide the details if necessary) everything compiled, but now I'm stuck at the linking step with the message below.
It's a bit strange, as the local compiler's libraries seem to define those symbols.

ERROR: /root/.cache/bazel/_bazel_root/a8a0a4aa310fa7eb496b741cf02da395/external/com_github_grpc_grpc/src/compiler/BUILD:80:18: Linking of rule '@com_github_grpc_grpc//src/compiler:grpc_cpp_plugin' failed (Exit 1): process-wrapper failed: error executing command 
  (cd /root/.cache/bazel/_bazel_root/a8a0a4aa310fa7eb496b741cf02da395/sandbox/processwrapper-sandbox/548/execroot/reverb && \
  exec env - \
    LD_LIBRARY_PATH=/opt/rh/devtoolset-9/root/usr/lib64:/opt/rh/devtoolset-9/root/usr/lib:/opt/rh/devtoolset-9/root/usr/lib64/dyninst:/opt/rh/devtoolset-9/root/usr/lib/dyninst:/usr/local/lib64 \
    PATH=/opt/rh/devtoolset-9/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    PWD=/proc/self/cwd \
    TMPDIR=/tmp \
  /root/.cache/bazel/_bazel_root/install/46850c2a96e4b4b07623822a03209f74/process-wrapper '--timeout=0' '--kill_delay=15' /opt/rh/devtoolset-9/root/usr/bin/gcc @bazel-out/host/bin/external/com_github_grpc_grpc/src/compiler/grpc_cpp_plugin-2.params) process-wrapper failed: error executing command 
  (cd /root/.cache/bazel/_bazel_root/a8a0a4aa310fa7eb496b741cf02da395/sandbox/processwrapper-sandbox/548/execroot/reverb && \
  exec env - \
    LD_LIBRARY_PATH=/opt/rh/devtoolset-9/root/usr/lib64:/opt/rh/devtoolset-9/root/usr/lib:/opt/rh/devtoolset-9/root/usr/lib64/dyninst:/opt/rh/devtoolset-9/root/usr/lib/dyninst:/usr/local/lib64 \
    PATH=/opt/rh/devtoolset-9/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    PWD=/proc/self/cwd \
    TMPDIR=/tmp \
  /root/.cache/bazel/_bazel_root/install/46850c2a96e4b4b07623822a03209f74/process-wrapper '--timeout=0' '--kill_delay=15' /opt/rh/devtoolset-9/root/usr/bin/gcc @bazel-out/host/bin/external/com_github_grpc_grpc/src/compiler/grpc_cpp_plugin-2.params)
bazel-out/host/bin/external/com_github_grpc_grpc/src/compiler/_objs/grpc_cpp_plugin/cpp_plugin.o:cpp_plugin.cc:function grpc_cpp_generator::ClassName(google::protobuf::Descriptor const*, bool): error: undefined reference to 'std::__throw_out_of_range_fmt(char const*, ...)'
bazel-out/host/bin/external/com_github_grpc_grpc/src/compiler/_objs/grpc_cpp_plugin/cpp_plugin.o:cpp_plugin.cc:function ProtoBufFile::package_parts() const: error: undefined reference to 'std::__throw_out_of_range_fmt(char const*, ...)'
bazel-out/host/bin/external/com_github_grpc_grpc/src/compiler/_objs/grpc_cpp_plugin/cpp_plugin.o:cpp_plugin.cc:function CppGrpcGenerator::Generate(google::protobuf::FileDescriptor const*, std::string const&, google::protobuf::compiler::GeneratorContext*, std::string*) const: error: undefined reference to 'std::__throw_out_of_range_fmt(char const*, ...)'
bazel-out/host/bin/external/com_github_grpc_grpc/src/compiler/_objs/grpc_cpp_plugin/cpp_plugin.o:cpp_plugin.cc:function CppGrpcGenerator::Generate(google::protobuf::FileDescriptor const*, std::string const&, google::protobuf::compiler::GeneratorContext*, std::string*) const: error: undefined reference to 'std::__throw_out_of_range_fmt(char const*, ...)'
bazel-out/host/bin/external/com_google_protobuf/_objs/protobuf/dynamic_message.o:dynamic_message.cc:function google::protobuf::DynamicMessageFactory::GetPrototypeNoLock(google::protobuf::Descriptor const*) [clone .cold]: error: undefined reference to '__cxa_throw_bad_array_new_length'
collect2: error: ld returned 1 exit status
Target //reverb/pip_package:build_pip_package failed to build
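
Both missing symbols, `std::__throw_out_of_range_fmt` and `__cxa_throw_bad_array_new_length`, come from the libstdc++ runtime, and the failing command invokes `gcc` rather than `g++`. That hints, as a guess only, that libstdc++ is either missing from the link line or that an older copy is being picked up. To see quickly which symbols a long Bazel log is complaining about, the `undefined reference to '...'` messages can be tallied. This is a minimal sketch: `count_undefined_references` is a hypothetical name, and the regular expression assumes the message format shown in the log above.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the tail of a linker diagnostic such as:
#   ... error: undefined reference to '__cxa_throw_bad_array_new_length'
UNDEFINED_REF = re.compile(r"undefined reference to [`'](?P<symbol>.+)'\s*$")


def count_undefined_references(log_lines: Iterable[str]) -> Counter:
    """Count how often each symbol is reported as an undefined reference."""
    counts: Counter = Counter()
    for line in log_lines:
        match = UNDEFINED_REF.search(line)
        if match:
            counts[match.group("symbol")] += 1
    return counts


# Shortened lines in the same format as the linker output above.
sample_log = [
    "cpp_plugin.o:cpp_plugin.cc:function ProtoBufFile::package_parts() const: "
    "error: undefined reference to 'std::__throw_out_of_range_fmt(char const*, ...)'",
    "dynamic_message.o:dynamic_message.cc:function f() [clone .cold]: "
    "error: undefined reference to '__cxa_throw_bad_array_new_length'",
    "collect2: error: ld returned 1 exit status",
]

for symbol, hits in count_undefined_references(sample_log).most_common():
    print(hits, symbol)
```

The resulting symbol list can then be checked against the exported symbols of whichever libstdc++ the linker actually resolves.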
Collaborator

tfboyd commented Feb 4, 2023

These build errors are hard to figure out, and I doubt you will get much help. I'm not saying this to be mean; I just believe in transparency, as it would frustrate me if I were you. We don't use AMD hardware internally (same with macOS), so we don't maintain or test it. Reverb is used heavily but is mostly in maintenance mode, given that it is "feature complete" and performs well. Here are my only tips. I do the stable builds just because I kind of like doing it.

  • When I build the Reverb nightly and the stable releases, I use the same instructions and code.
  • For the latest stable ROCm release, the Reverb source would have been at this tag: https://github.com/deepmind/reverb/tree/v0.10.0. I don't think that will fix your issue, but you will have the Reverb source that was used to build against TensorFlow 2.11 in December.

Wishing you luck. I just fought a horrible build issue that ended up being silly but took me days and asking (begging) some experts to look.
