
[Bug] Cannot install torch-npu==2.3.1, torch==2.3.1 and torchvision==0.18.1 because these package versions have conflicting dependencies. #2745

Open
jiabao-wang opened this issue Nov 13, 2024 · 3 comments
@jiabao-wang

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

When I run:

DOCKER_BUILDKIT=1 docker build -t lmdeploy-aarch64-ascend:latest \
    -f docker/Dockerfile_aarch64_ascend .

the build fails with:

ERROR: Cannot install torch-npu==2.3.1, torch==2.3.1 and torchvision==0.18.1 because these package versions have conflicting dependencies.
341.8
341.8 The conflict is caused by:
341.8 The user requested torch==2.3.1
341.8 torchvision 0.18.1 depends on torch==2.3.1
341.8 torch-npu 2.3.1 depends on torch==2.3.1+cpu
341.8
341.8 To fix this you could try to:
341.8 1. loosen the range of package versions you've specified
341.8 2. remove package versions to allow pip to attempt to solve the dependency conflict
341.8
341.8 ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Dockerfile_aarch64_ascend:110

 109 |     # timm is required for internvl2 model
 110 | >>> RUN --mount=type=cache,target=/root/.cache/pip \
 111 | >>>     pip3 install torch==2.3.1 torchvision==0.18.1 torch-npu==2.3.1 && \
 112 | >>>     pip3 install transformers timm && \
 113 | >>>     pip3 install dlinfer-ascend
 114 |

ERROR: failed to solve: process "/bin/bash -c pip3 install torch==2.3.1 torchvision==0.18.1 torch-npu==2.3.1 && pip3 install transformers timm && pip3 install dlinfer-ascend" did not complete successfully: exit code: 1
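For reference, the conflict comes down to PEP 440 local version labels: a bare pin like torch==2.3.1 (what torchvision requires) is satisfied by either 2.3.1 or 2.3.1+cpu, but torch-npu's strict pin torch==2.3.1+cpu is not satisfied by the plain 2.3.1 wheel that the PyPI mirror serves, so the resolver has no torch candidate that satisfies both at once. A minimal sketch of this matching rule (pin_matches is a hypothetical helper for illustration, not pip's actual code):

```python
def pin_matches(required: str, candidate: str) -> bool:
    """Sketch of PEP 440 '==' matching with local version labels.

    If the pin carries a local label (e.g. '2.3.1+cpu'), the candidate
    must match it exactly; if the pin is bare (e.g. '2.3.1'), the
    candidate's local label is ignored.
    """
    if "+" in required:
        return candidate == required
    return candidate.split("+")[0] == required

# torchvision's bare pin accepts either build of torch 2.3.1 ...
print(pin_matches("2.3.1", "2.3.1+cpu"))   # True
# ... but torch-npu's pin rejects the plain wheel available on the mirror.
print(pin_matches("2.3.1+cpu", "2.3.1"))   # False
```

Since the +cpu builds are only published on the PyTorch CPU index, not on PyPI, pip correctly reports ResolutionImpossible here.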

Reproduction

DOCKER_BUILDKIT=1 docker build -t lmdeploy-aarch64-ascend:latest
-f docker/Dockerfile_aarch64_ascend .

Environment

Atlas-800-Model-3010
Ascend Docker Runtime has already been installed.

Error traceback

DOCKER_BUILDKIT=1 docker build -t lmdeploy-aarch64-ascend:latest     -f docker/Dockerfile_aarch64_ascend .
[+] Building 1038.2s (15/18)                                                                                                                                                                     docker:default
 => [internal] load build definition from Dockerfile_aarch64_ascend                                                                                                                                        0.0s
 => => transferring dockerfile: 5.15kB                                                                                                                                                                     0.0s
 => [internal] load .dockerignore                                                                                                                                                                          0.0s
 => => transferring context: 2B                                                                                                                                                                            0.0s
 => [internal] load metadata for docker.io/library/ubuntu:20.04                                                                                                                                            3.3s
 => [build_temp 1/2] FROM docker.io/library/ubuntu:20.04@sha256:8e5c4f0285ecbb4ead070431d29b576a530d3166df73ec44affc1cd27555141b                                                                          11.5s
 => => resolve docker.io/library/ubuntu:20.04@sha256:8e5c4f0285ecbb4ead070431d29b576a530d3166df73ec44affc1cd27555141b                                                                                      0.0s
 => => sha256:8e5c4f0285ecbb4ead070431d29b576a530d3166df73ec44affc1cd27555141b 6.69kB / 6.69kB                                                                                                             0.0s
 => => sha256:e5a6aeef391a8a9bdaee3de6b28f393837c479d8217324a2340b64e45a81e0ef 424B / 424B                                                                                                                 0.0s
 => => sha256:6013ae1a63c2ee58a8949f03c6366a3ef6a2f386a7db27d86de2de965e9f450b 2.30kB / 2.30kB                                                                                                             0.0s
 => => sha256:d9802f032d6798e2086607424bfe88cb8ec1d6f116e11cd99592dcaf261e9cd2 27.51MB / 27.51MB                                                                                                           9.8s
 => => extracting sha256:d9802f032d6798e2086607424bfe88cb8ec1d6f116e11cd99592dcaf261e9cd2                                                                                                                  1.4s
 => [internal] load build context                                                                                                                                                                         27.3s
 => => transferring context: 3.56GB                                                                                                                                                                       27.2s
 => [base_builder 2/6] WORKDIR /tmp                                                                                                                                                                        1.6s
 => [base_builder 3/6] RUN sed -i 's@http://.*.ubuntu.com@http://mirrors.tuna.tsinghua.edu.cn@g' /etc/apt/sources.list &&     apt update &&     apt install --no-install-recommends ca-certificates -y &  84.3s
 => [build_temp 2/2] COPY . /tmp                                                                                                                                                                          15.3s
 => [copy_temp 1/1] RUN rm -rf /tmp/*.run                                                                                                                                                                  0.3s
 => [base_builder 4/6] RUN umask 0022  &&     wget https://repo.huaweicloud.com/python/3.10.5/Python-3.10.5.tar.xz &&     tar -xf Python-3.10.5.tar.xz && cd Python-3.10.5 && ./configure --prefix=/usr/  99.0s
 => [base_builder 5/6] RUN --mount=type=cache,target=/root/.cache/pip pip3 config set global.index-url http://mirrors.aliyun.com/pypi/simple &&     pip3 config set global.trusted-host mirrors.aliyun.c  53.2s
 => [base_builder 6/6] RUN if [ ! -d "/lib64" ];     then         mkdir /lib64 && ln -sf /lib/ld-linux-aarch64.so.1 /lib64/ld-linux-aarch64.so.1;     fi                                                   0.5s
 => [cann_builder 1/3] RUN --mount=type=cache,target=/tmp,from=build_temp,source=/tmp     umask 0022 &&     mkdir -p /usr/local/Ascend/driver &&     if [ "all" != "all" ];     then         CHIPOPTION  441.9s
 => [cann_builder 2/3] RUN echo "source /usr/local/Ascend/ascend-toolkit/set_env.sh" >> ~/.bashrc &&     echo "source /usr/local/Ascend/nnal/atb/set_env.sh --cxx_abi=0" >> ~/.bashrc &&     . ~/.bashrc   0.4s
 => ERROR [cann_builder 3/3] RUN --mount=type=cache,target=/root/.cache/pip     pip3 install torch==2.3.1 torchvision==0.18.1 torch-npu==2.3.1 &&     pip3 install transformers timm &&     pip3 instal  342.3s
------
 > [cann_builder 3/3] RUN --mount=type=cache,target=/root/.cache/pip     pip3 install torch==2.3.1 torchvision==0.18.1 torch-npu==2.3.1 &&     pip3 install transformers timm &&     pip3 install dlinfer-ascend:
0.306 ERROR: ld.so: object '/lib/aarch64-linux-gnu/libGLdispatch.so.0' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
0.309 ERROR: ld.so: object '/lib/aarch64-linux-gnu/libGLdispatch.so.0' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
0.830 Looking in indexes: http://mirrors.aliyun.com/pypi/simple
1.137 Collecting torch==2.3.1
1.339   Downloading http://mirrors.aliyun.com/pypi/packages/cb/e2/1bd899d3eb60c6495cf5d0d2885edacac08bde7a1407eadeb2ab36eca3c7/torch-2.3.1-cp310-cp310-manylinux1_x86_64.whl (779.1 MB)
107.3      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 779.1/779.1 MB 10.1 MB/s eta 0:00:00
110.0 Collecting torchvision==0.18.1
110.0   Downloading http://mirrors.aliyun.com/pypi/packages/08/04/17425bf3c0620465ee182cea5c674db4debab87ed0627145d38039cb2a9e/torchvision-0.18.1-cp310-cp310-manylinux1_x86_64.whl (7.0 MB)
110.7      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.0/7.0 MB 10.3 MB/s eta 0:00:00
110.9 Collecting torch-npu==2.3.1
111.1   Downloading http://mirrors.aliyun.com/pypi/packages/a6/e1/60664898a464930397632eb718a4330dd9b394d543394fd07d7b837abef4/torch_npu-2.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.7 MB)
112.2      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.7/11.7 MB 10.8 MB/s eta 0:00:00
112.4 Collecting filelock (from torch==2.3.1)
112.5   Downloading http://mirrors.aliyun.com/pypi/packages/b9/f8/feced7779d755758a52d1f6635d990b8d98dc0a29fa568bbe0625f18fdf3/filelock-3.16.1-py3-none-any.whl (16 kB)
112.5 Collecting typing-extensions>=4.8.0 (from torch==2.3.1)
112.6   Downloading http://mirrors.aliyun.com/pypi/packages/26/9f/ad63fc0248c5379346306f8668cda6e2e2e9c95e01216d2b8ffd9ff037d0/typing_extensions-4.12.2-py3-none-any.whl (37 kB)
112.6 Requirement already satisfied: sympy in /usr/local/python3.10.5/lib/python3.10/site-packages (from torch==2.3.1) (1.13.3)
112.7 Collecting networkx (from torch==2.3.1)
112.7   Downloading http://mirrors.aliyun.com/pypi/packages/b9/54/dd730b32ea14ea797530a4479b2ed46a6fb250f682a9cfb997e968bf0261/networkx-3.4.2-py3-none-any.whl (1.7 MB)
112.8      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 11.3 MB/s eta 0:00:00
112.9 Collecting jinja2 (from torch==2.3.1)
113.0   Downloading http://mirrors.aliyun.com/pypi/packages/31/80/3a54838c3fb461f6fec263ebf3a3a41771bd05190238de3486aae8540c36/jinja2-3.1.4-py3-none-any.whl (133 kB)
113.1 Collecting fsspec (from torch==2.3.1)
113.1   Downloading http://mirrors.aliyun.com/pypi/packages/c6/b2/454d6e7f0158951d8a78c2e1eb4f69ae81beb8dca5fee9809c6c99e9d0d0/fsspec-2024.10.0-py3-none-any.whl (179 kB)
113.3 Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.3.1)
113.3   Downloading http://mirrors.aliyun.com/pypi/packages/b6/9f/c64c03f49d6fbc56196664d05dba14e3a561038a81a638eeb47f4d4cfd48/nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
115.5      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 10.7 MB/s eta 0:00:00
115.7 Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.3.1)
115.7   Downloading http://mirrors.aliyun.com/pypi/packages/eb/d5/c68b1d2cdfcc59e72e8a5949a37ddb22ae6cade80cd4a57a84d4c8b55472/nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
115.8      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 11.9 MB/s eta 0:00:00
115.8 Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.3.1)
115.8   Downloading http://mirrors.aliyun.com/pypi/packages/7e/00/6b218edd739ecfc60524e585ba8e6b00554dd908de2c9c66c1af3e44e18d/nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
117.1      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 11.1 MB/s eta 0:00:00
117.2 Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch==2.3.1)
117.6   Downloading http://mirrors.aliyun.com/pypi/packages/ff/74/a2e2be7fb83aaedec84f391f082cf765dfb635e7caa9b49065f73e4835d8/nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)
193.2      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.7/731.7 MB 6.6 MB/s eta 0:00:00
195.4 Collecting nvidia-cublas-cu12==12.1.3.1 (from torch==2.3.1)
195.5   Downloading http://mirrors.aliyun.com/pypi/packages/37/6d/121efd7382d5b0284239f4ab1fc1590d86d34ed4a4a2fdb13b30ca8e5740/nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
240.8      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 8.2 MB/s eta 0:00:00
242.1 Collecting nvidia-cufft-cu12==11.0.2.54 (from torch==2.3.1)
242.1   Downloading http://mirrors.aliyun.com/pypi/packages/86/94/eb540db023ce1d162e7bea9f8f5aa781d57c65aed513c33ee9a5123ead4d/nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
254.1      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 10.2 MB/s eta 0:00:00
254.5 Collecting nvidia-curand-cu12==10.3.2.106 (from torch==2.3.1)
254.6   Downloading http://mirrors.aliyun.com/pypi/packages/44/31/4890b1c9abc496303412947fc7dcea3d14861720642b49e8ceed89636705/nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
260.1      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 10.2 MB/s eta 0:00:00
260.3 Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch==2.3.1)
260.4   Downloading http://mirrors.aliyun.com/pypi/packages/bc/1d/8de1e5c67099015c834315e333911273a8c6aaba78923dd1d1e25fc5f217/nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
272.5      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 10.2 MB/s eta 0:00:00
273.0 Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch==2.3.1)
273.0   Downloading http://mirrors.aliyun.com/pypi/packages/65/5b/cfaeebf25cd9fdec14338ccb16f6b2c4c7fa9163aefcf057d86b9cc248bb/nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
290.5      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 11.2 MB/s eta 0:00:00
291.2 Collecting nvidia-nccl-cu12==2.20.5 (from torch==2.3.1)
291.2   Downloading http://mirrors.aliyun.com/pypi/packages/4b/2a/0a131f572aa09f741c30ccd45a8e56316e8be8dfc7bc19bf0ab7cfef7b19/nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)
306.9      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 176.2/176.2 MB 11.3 MB/s eta 0:00:00
307.5 Collecting nvidia-nvtx-cu12==12.1.105 (from torch==2.3.1)
307.5   Downloading http://mirrors.aliyun.com/pypi/packages/da/d3/8057f0587683ed2fcd4dbfbdfdfa807b9160b809976099d36b8f60d08f03/nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
307.6 Collecting triton==2.3.1 (from torch==2.3.1)
307.7   Downloading http://mirrors.aliyun.com/pypi/packages/d7/69/8a9fde07d2d27a90e16488cdfe9878e985a247b2496a4b5b1a2126042528/triton-2.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (168.1 MB)
339.7      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 168.1/168.1 MB 4.8 MB/s eta 0:00:00
340.3 Requirement already satisfied: numpy in /usr/local/python3.10.5/lib/python3.10/site-packages (from torchvision==0.18.1) (1.24.0)
341.0 Collecting pillow!=8.3.*,>=5.3.0 (from torchvision==0.18.1)
341.1   Downloading http://mirrors.aliyun.com/pypi/packages/41/c3/94f33af0762ed76b5a237c5797e088aa57f2b7fa8ee7932d399087be66a8/pillow-11.0.0-cp310-cp310-manylinux_2_28_x86_64.whl (4.4 MB)
341.7      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.4/4.4 MB 7.0 MB/s eta 0:00:00
341.8 INFO: pip is looking at multiple versions of torch-npu to determine which version is compatible with other requirements. This could take a while.
341.8 ERROR: Cannot install torch-npu==2.3.1, torch==2.3.1 and torchvision==0.18.1 because these package versions have conflicting dependencies.
341.8
341.8 The conflict is caused by:
341.8     The user requested torch==2.3.1
341.8     torchvision 0.18.1 depends on torch==2.3.1
341.8     torch-npu 2.3.1 depends on torch==2.3.1+cpu
341.8
341.8 To fix this you could try to:
341.8 1. loosen the range of package versions you've specified
341.8 2. remove package versions to allow pip to attempt to solve the dependency conflict
341.8
341.8 ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
------
Dockerfile_aarch64_ascend:110
--------------------
 109 |     # timm is required for internvl2 model
 110 | >>> RUN --mount=type=cache,target=/root/.cache/pip \
 111 | >>>     pip3 install torch==2.3.1 torchvision==0.18.1 torch-npu==2.3.1 && \
 112 | >>>     pip3 install transformers timm && \
 113 | >>>     pip3 install dlinfer-ascend
 114 |
--------------------
ERROR: failed to solve: process "/bin/bash -c pip3 install torch==2.3.1 torchvision==0.18.1 torch-npu==2.3.1 &&     pip3 install transformers timm &&     pip3 install dlinfer-ascend" did not complete successfully: exit code: 1
@CyCle1024
Collaborator

CyCle1024 commented Nov 13, 2024

@jiabao-wang Hi, are you building the docker image on an x86_64 platform? Currently, the Dockerfile only supports aarch64. For x86_64, the dlinfer package has not been published to PyPI, which also explains the problem you mentioned above.
There is a workaround for this case, but it has not been released yet.

@CyCle1024 CyCle1024 self-assigned this Nov 13, 2024
@CyCle1024
Collaborator

@jiabao-wang Here is a new Dockerfile for the Ascend x86_64 platform. It has only been tested for building on an x86_64 machine; model inference has not been tested yet, since we don't have an x86_64 Ascend NPU machine.

FROM ubuntu:20.04 as base_builder

WORKDIR /tmp

ARG http_proxy
ARG https_proxy
ARG DEBIAN_FRONTEND=noninteractive

RUN sed -i 's@http://.*.ubuntu.com@http://mirrors.tuna.tsinghua.edu.cn@g' /etc/apt/sources.list && \
    apt update && \
    apt install --no-install-recommends ca-certificates -y && \
    apt install --no-install-recommends bc wget -y && \
    apt install --no-install-recommends git curl gcc make g++ pkg-config unzip -y && \
    apt install --no-install-recommends libsqlite3-dev libblas3 liblapack3 gfortran vim -y && \
    apt install --no-install-recommends liblapack-dev libblas-dev libhdf5-dev libffi-dev -y && \
    apt install --no-install-recommends libssl-dev zlib1g-dev xz-utils cython3 python3-h5py -y && \
    apt install --no-install-recommends libopenblas-dev libgmpxx4ldbl liblzma-dev -y && \
    apt install --no-install-recommends libicu66 libxml2 pciutils libgl1-mesa-glx libbz2-dev -y && \
    apt install --no-install-recommends libreadline-dev libncurses5 libncurses5-dev libncursesw5 -y && \
    sed -i 's@http://mirrors.tuna.tsinghua.edu.cn@https://mirrors.tuna.tsinghua.edu.cn@g' /etc/apt/sources.list && \
    apt clean && rm -rf /var/lib/apt/lists/*

ARG PYVERSION=3.10.5

ENV LD_LIBRARY_PATH=/usr/local/python${PYVERSION}/lib: \
    PATH=/usr/local/python${PYVERSION}/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

RUN umask 0022  && \
    wget https://repo.huaweicloud.com/python/${PYVERSION}/Python-${PYVERSION}.tar.xz && \
    tar -xf Python-${PYVERSION}.tar.xz && cd Python-${PYVERSION} && ./configure --prefix=/usr/local/python${PYVERSION} --enable-shared && \
    make -j 16 && make install && \
    ln -sf /usr/local/python${PYVERSION}/bin/python3 /usr/bin/python3 && \
    ln -sf /usr/local/python${PYVERSION}/bin/python3 /usr/bin/python && \
    ln -sf /usr/local/python${PYVERSION}/bin/pip3 /usr/bin/pip3 && \
    ln -sf /usr/local/python${PYVERSION}/bin/pip3 /usr/bin/pip && \
    cd .. && \
    rm -rf Python*

RUN --mount=type=cache,target=/root/.cache/pip pip3 config set global.index-url http://mirrors.aliyun.com/pypi/simple && \
    pip3 config set global.trusted-host mirrors.aliyun.com && \
    pip3 install -U pip && \
    pip3 install wheel==0.43.0 scikit-build==0.18.0 numpy==1.24 setuptools==69.5.1 && \
    pip3 install decorator sympy cffi && \
    pip3 install cmake ninja pyyaml && \
    pip3 install pathlib2 protobuf attrs attr scipy && \
    pip3 install requests psutil absl-py

ENV LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/hdf5/serial:$LD_LIBRARY_PATH

FROM ubuntu:20.04 as build_temp
COPY . /tmp

FROM base_builder as cann_builder

ARG ASCEND_BASE=/usr/local/Ascend
ARG TOOLKIT_PATH=$ASCEND_BASE/ascend-toolkit/latest

ENV LD_LIBRARY_PATH=\
$ASCEND_BASE/driver/lib64:\
$ASCEND_BASE/driver/lib64/common:\
$ASCEND_BASE/driver/lib64/driver:\
$ASCEND_BASE/driver/tools/hccn_tool/:\
$TOOLKIT_PATH/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/x86_64/:\
$LD_LIBRARY_PATH

# run files should be placed at the root dir of repo
ARG CHIP=all
ARG TOOLKIT_PKG=Ascend-cann-toolkit_*.run
ARG KERNELS_PKG=Ascend-cann-kernels-*.run
ARG NNAL_PKG=Ascend-cann-nnal_*.run

RUN --mount=type=cache,target=/tmp,from=build_temp,source=/tmp \
    umask 0022 && \
    mkdir -p $ASCEND_BASE/driver && \
    if [ "$CHIP" != "all" ]; \
    then \
        CHIPOPTION="--chip=$CHIP"; \
    else \
        CHIPOPTION=""; \
    fi && \
    chmod +x $TOOLKIT_PKG $KERNELS_PKG $NNAL_PKG && \
    ./$TOOLKIT_PKG --quiet --install --install-path=$ASCEND_BASE --install-for-all $CHIPOPTION && \
    ./$KERNELS_PKG --quiet --install --install-path=$ASCEND_BASE --install-for-all && \
    . /usr/local/Ascend/ascend-toolkit/set_env.sh && \
    ./$NNAL_PKG --quiet --install --install-path=$ASCEND_BASE && \
    rm -f $TOOLKIT_PKG $KERNELS_PKG $NNAL_PKG

ENV GLOG_v=2 \
    LD_LIBRARY_PATH=$TOOLKIT_PATH/lib64:$LD_LIBRARY_PATH \
    TBE_IMPL_PATH=$TOOLKIT_PATH/opp/op_impl/built-in/ai_core/tbe \
    PATH=$TOOLKIT_PATH/ccec_compiler/bin:$PATH \
    ASCEND_OPP_PATH=$TOOLKIT_PATH/opp \
    ASCEND_AICPU_PATH=$TOOLKIT_PATH

ENV PYTHONPATH=$TBE_IMPL_PATH:$PYTHONPATH

SHELL ["/bin/bash", "-c"]
RUN echo "source /usr/local/Ascend/ascend-toolkit/set_env.sh" >> ~/.bashrc && \
    echo "source /usr/local/Ascend/nnal/atb/set_env.sh --cxx_abi=0" >> ~/.bashrc && \
    . ~/.bashrc

# dlinfer
# timm is required for internvl2 model
WORKDIR /opt/
RUN --mount=type=cache,target=/root/.cache/pip \
    pip3 install torch==2.3.1+cpu torchvision==0.18.1+cpu --index-url=https://download.pytorch.org/whl/cpu && \
    pip3 install torch-npu==2.3.1 && \
    pip3 install transformers timm && \
    git clone https://github.com/DeepLink-org/dlinfer.git && \
    cd dlinfer && DEVICE=ascend python setup.py develop

# lmdeploy
FROM build_temp as copy_temp
RUN rm -rf /tmp/*.run

FROM cann_builder as final_builder
COPY --from=copy_temp /tmp /opt/lmdeploy
WORKDIR /opt/lmdeploy

RUN --mount=type=cache,target=/root/.cache/pip \
    sed -i '/triton/d' requirements/runtime.txt && \
    pip3 install -v --no-build-isolation -e .

@jiabao-wang
Author

jiabao-wang commented Nov 14, 2024

@CyCle1024
I have built the docker image following the Dockerfile for Ascend x86_64, but when I try to run:

docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-ascend:latest lmdeploy check_env

I get the following error:
(base) wjb@ubuntu-Atlas-800-Model-3010:~$ docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-ascend:latest lmdeploy check_env
Traceback (most recent call last):
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1778, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/local/python3.10.5/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/generation/utils.py", line 115, in <module>
    from accelerate.hooks import AlignDevicesHook, add_hook_to_module
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/accelerate/__init__.py", line 16, in <module>
    from .accelerator import Accelerator
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/accelerate/accelerator.py", line 36, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/accelerate/utils/__init__.py", line 126, in <module>
    from .modeling import (
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 31, in <module>
    from ..state import AcceleratorState
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/accelerate/state.py", line 64, in <module>
    if is_npu_available(check_device=False):
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/accelerate/utils/imports.py", line 362, in is_npu_available
    import torch_npu  # noqa: F401
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/__init__.py", line 16, in <module>
    import torch_npu.npu
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/npu/__init__.py", line 119, in <module>
    from torch_npu.utils.error_code import ErrCode, pta_error, prof_error
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/utils/__init__.py", line 1, in <module>
    from ._module import _apply_module_patch
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/utils/_module.py", line 26, in <module>
    from torch_npu.npu.amp.autocast_mode import autocast
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/npu/amp/__init__.py", line 6, in <module>
    from .grad_scaler import GradScaler  # noqa: F401
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/npu/amp/grad_scaler.py", line 8, in <module>
    from torch.amp.grad_scaler import _MultiDeviceReplicator, OptState, _refresh_per_optimizer_state
ModuleNotFoundError: No module named 'torch.amp.grad_scaler'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1778, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/local/python3.10.5/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/models/auto/modeling_auto.py", line 21, in <module>
    from .auto_factory import (
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 40, in <module>
    from ...generation import GenerationMixin
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1766, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1780, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
No module named 'torch.amp.grad_scaler'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/python3.10.5/bin/lmdeploy", line 33, in <module>
    sys.exit(load_entry_point('lmdeploy', 'console_scripts', 'lmdeploy')())
  File "/usr/local/python3.10.5/bin/lmdeploy", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/local/python3.10.5/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/usr/local/python3.10.5/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/opt/lmdeploy/lmdeploy/__init__.py", line 3, in <module>
    from .api import client, pipeline, serve
  File "/opt/lmdeploy/lmdeploy/api.py", line 5, in <module>
    from .archs import autoget_backend_config, get_task
  File "/opt/lmdeploy/lmdeploy/archs.py", line 6, in <module>
    from lmdeploy.serve.vl_async_engine import VLAsyncEngine
  File "/opt/lmdeploy/lmdeploy/serve/vl_async_engine.py", line 8, in <module>
    from lmdeploy.vl.engine import ImageEncoder
  File "/opt/lmdeploy/lmdeploy/vl/engine.py", line 12, in <module>
    from lmdeploy.vl.model.builder import load_vl_model
  File "/opt/lmdeploy/lmdeploy/vl/model/builder.py", line 7, in <module>
    from .internvl import InternVLVisionModel
  File "/opt/lmdeploy/lmdeploy/vl/model/internvl.py", line 7, in <module>
    from transformers import AutoConfig, AutoModel, CLIPImageProcessor
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1767, in __getattr__
    value = getattr(module, name)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1766, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1780, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.auto.modeling_auto because of the following error (look up to see its traceback):
Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
No module named 'torch.amp.grad_scaler'
