如何在 Docker 容器中安装支持 CUDA 的 PyTorch？

Question

我正在尝试在构建了 conda 环境的服务器上构建一个 Docker 容器。除了支持 CUDA 的 PyTorch 之外，所有其他要求都得到满足（我可以让 PyTorch 在没有 CUDA 的情况下工作，但是没问题）。我如何确保 PyTorch 正在使用 CUDA？

这是 Dockerfile :

# Use nvidia/cuda image
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04

# set bash as current shell
RUN chsh -s /bin/bash

# install anaconda
RUN apt-get update
RUN apt-get install -y wget bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 git mercurial subversion && \
        apt-get clean
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O ~/anaconda.sh && \
        /bin/bash ~/anaconda.sh -b -p /opt/conda && \
        rm ~/anaconda.sh && \
        ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
        echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
        find /opt/conda/ -follow -type f -name '*.a' -delete && \
        find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
        /opt/conda/bin/conda clean -afy

# set path to conda
ENV PATH /opt/conda/bin:$PATH


# setup conda virtual environment
COPY ./requirements.yaml /tmp/requirements.yaml
RUN conda update conda \
    && conda env create --name camera-seg -f /tmp/requirements.yaml \
    && conda install -y -c conda-forge -n camera-seg flake8

# From the pythonspeed tutorial; Make RUN commands use the new environment
SHELL ["conda", "run", "-n", "camera-seg", "/bin/bash", "-c"]

# PyTorch with CUDA 10.2
RUN conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch

RUN echo "conda activate camera-seg" > ~/.bashrc
ENV PATH /opt/conda/envs/camera-seg/bin:$PATH

当我尝试构建此容器 (docker build -t camera-seg .) 时出现以下错误：

.....

Step 10/12 : RUN conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
 ---> Running in e0dd3e648f7b
ERROR conda.cli.main_run:execute(34): Subprocess for 'conda run ['/bin/bash', '-c', 'conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch']' command failed.  (See above for error)

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init <SHELL_NAME>

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.



The command 'conda run -n camera-seg /bin/bash -c conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch' returned a non-zero code: 1

这是requirements.yaml:

name: camera-seg
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6
  - numpy
  - pillow
  - yaml
  - pyyaml
  - matplotlib
  - jupyter
  - notebook
  - tensorboardx
  - tensorboard
  - protobuf
  - tqdm

当我将 pytorch、torchvision 和 cudatoolkit=10.2 放入 requirements.yaml 时，PyTorch 已成功安装，但无法识别 CUDA ( torch.cuda.is_available() returns False ).

我尝试了各种解决方案，例如 this, this and this 和它们的一些不同组合，但都无济于事。

非常感谢任何帮助。谢谢。

Answer 1

经过多次尝试，我终于成功了。在这里发布答案以防对任何人有帮助。

基本上，我通过 pip（在 conda 环境中）安装了 pytorch 和 torchvision，并像往常一样通过 conda 安装了其余依赖项.

这是最终 Dockerfile 的样子：

# Use nvidia/cuda image
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04

# set bash as current shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]

# install anaconda
RUN apt-get update
RUN apt-get install -y wget bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 git mercurial subversion && \
        apt-get clean
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O ~/anaconda.sh && \
        /bin/bash ~/anaconda.sh -b -p /opt/conda && \
        rm ~/anaconda.sh && \
        ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
        echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
        find /opt/conda/ -follow -type f -name '*.a' -delete && \
        find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
        /opt/conda/bin/conda clean -afy

# set path to conda
ENV PATH /opt/conda/bin:$PATH


# setup conda virtual environment
COPY ./requirements.yaml /tmp/requirements.yaml
RUN conda update conda \
    && conda env create --name camera-seg -f /tmp/requirements.yaml

RUN echo "conda activate camera-seg" >> ~/.bashrc
ENV PATH /opt/conda/envs/camera-seg/bin:$PATH
ENV CONDA_DEFAULT_ENV $camera-seg

这就是 requirements.yaml 的样子：

name: camera-seg
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6
  - pip
  - numpy
  - pillow
  - yaml
  - pyyaml
  - matplotlib
  - jupyter
  - notebook
  - tensorboardx
  - tensorboard
  - protobuf
  - tqdm
  - pip:
    - torch
    - torchvision

然后我使用命令 docker build -t camera-seg . 构建容器，PyTorch 现在能够识别 CUDA。

如何在 Docker 容器中安装支持 CUDA 的 PyTorch？

How to conda install CUDA enabled PyTorch in a Docker container?

python-3.x

docker

anaconda

pytorch