How to conda install CUDA-enabled PyTorch in a Docker container?
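I am trying to build a Docker container on a server, with a conda environment set up inside the container. All requirements are met except for CUDA-enabled PyTorch (I can get PyTorch working without CUDA with no problem). How do I make sure PyTorch is using CUDA?
Here is the Dockerfile: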
# Use nvidia/cuda image
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
# set bash as current shell
RUN chsh -s /bin/bash
# install anaconda
RUN apt-get update
RUN apt-get install -y wget bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 git mercurial subversion && \
apt-get clean
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O ~/anaconda.sh && \
/bin/bash ~/anaconda.sh -b -p /opt/conda && \
rm ~/anaconda.sh && \
ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
find /opt/conda/ -follow -type f -name '*.a' -delete && \
find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
/opt/conda/bin/conda clean -afy
# set path to conda
ENV PATH /opt/conda/bin:$PATH
# setup conda virtual environment
COPY ./requirements.yaml /tmp/requirements.yaml
RUN conda update conda \
&& conda env create --name camera-seg -f /tmp/requirements.yaml \
&& conda install -y -c conda-forge -n camera-seg flake8
# From the pythonspeed tutorial; Make RUN commands use the new environment
SHELL ["conda", "run", "-n", "camera-seg", "/bin/bash", "-c"]
# PyTorch with CUDA 10.2
RUN conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
RUN echo "conda activate camera-seg" > ~/.bashrc
ENV PATH /opt/conda/envs/camera-seg/bin:$PATH
When I try to build this container (docker build -t camera-seg .), I get the following error:
.....
Step 10/12 : RUN conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
---> Running in e0dd3e648f7b
ERROR conda.cli.main_run:execute(34): Subprocess for 'conda run ['/bin/bash', '-c', 'conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch']' command failed. (See above for error)
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run
$ conda init <SHELL_NAME>
Currently supported shells are:
- bash
- fish
- tcsh
- xonsh
- zsh
- powershell
See 'conda init --help' for more information and options.
IMPORTANT: You may need to close and restart your shell after running 'conda init'.
The command 'conda run -n camera-seg /bin/bash -c conda activate camera-seg && conda install pytorch torchvision cudatoolkit=10.2 -c pytorch' returned a non-zero code: 1
Here is requirements.yaml:
name: camera-seg
channels:
- defaults
- conda-forge
dependencies:
- python=3.6
- numpy
- pillow
- yaml
- pyyaml
- matplotlib
- jupyter
- notebook
- tensorboardx
- tensorboard
- protobuf
- tqdm
When I put pytorch, torchvision and cudatoolkit=10.2 into requirements.yaml, PyTorch installs successfully but does not recognize CUDA (torch.cuda.is_available() returns False).
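When torch.cuda.is_available() returns False, it helps to separate two cases: a CPU-only build of PyTorch was installed, or a CUDA build was installed but no GPU is visible inside the container. The following is a minimal diagnostic sketch, not part of the original setup; the function name cuda_report is hypothetical, and it only uses torch.__version__, torch.version.cuda, torch.cuda.is_available() and torch.cuda.device_count().

```python
import importlib.util


def cuda_report():
    """Collect facts that help explain why torch.cuda.is_available() is False."""
    # Fail softly if PyTorch is not installed in this interpreter at all.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None means a CPU-only build of PyTorch was installed.
        "built_with_cuda": torch.version.cuda,
        # False with a CUDA build suggests no GPU or driver is visible here.
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If built_with_cuda is None, the package resolver picked a CPU-only build. If it shows a CUDA version but cuda_available is False, the likely cause is that the container was started without GPU access (for example, without docker run --gpus all on a host with the NVIDIA Container Toolkit).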
I have tried various solutions, for example this, this and this, as well as several combinations of them, but all to no avail.
Any help is much appreciated. Thanks.
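After many attempts, I finally got it working. Posting the answer here in case it helps anyone.
Basically, I installed pytorch and torchvision through pip (from within the conda environment) and the rest of the dependencies through conda as usual.
This is what the final Dockerfile looks like: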
# Use nvidia/cuda image
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
# set bash as current shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# install anaconda
RUN apt-get update
RUN apt-get install -y wget bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 git mercurial subversion && \
apt-get clean
RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh -O ~/anaconda.sh && \
/bin/bash ~/anaconda.sh -b -p /opt/conda && \
rm ~/anaconda.sh && \
ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
find /opt/conda/ -follow -type f -name '*.a' -delete && \
find /opt/conda/ -follow -type f -name '*.js.map' -delete && \
/opt/conda/bin/conda clean -afy
# set path to conda
ENV PATH /opt/conda/bin:$PATH
# setup conda virtual environment
COPY ./requirements.yaml /tmp/requirements.yaml
RUN conda update conda \
&& conda env create --name camera-seg -f /tmp/requirements.yaml
RUN echo "conda activate camera-seg" >> ~/.bashrc
ENV PATH /opt/conda/envs/camera-seg/bin:$PATH
ENV CONDA_DEFAULT_ENV camera-seg
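The two ENV lines at the end are what make the camera-seg interpreter the default without running conda activate. A quick way to confirm this from inside the container is a small check like the sketch below; the helper names active_environment and uses_env are hypothetical, and the check assumes the environment lives under a directory named after the environment, as in /opt/conda/envs/camera-seg.

```python
import os
import sys


def active_environment():
    """Describe which Python interpreter and conda environment are in use."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "conda_default_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


def uses_env(env_name, prefix=None):
    """Return True if the interpreter prefix ends in the given environment name."""
    prefix = sys.prefix if prefix is None else prefix
    return os.path.basename(os.path.normpath(prefix)) == env_name


if __name__ == "__main__":
    for key, value in active_environment().items():
        print(f"{key}: {value}")
    print("running in camera-seg:", uses_env("camera-seg"))
```

Running this with plain python inside the container should report a prefix under /opt/conda/envs/camera-seg if the PATH ordering took effect.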
And this is what requirements.yaml looks like:
name: camera-seg
channels:
- defaults
- conda-forge
dependencies:
- python=3.6
- pip
- numpy
- pillow
- yaml
- pyyaml
- matplotlib
- jupyter
- notebook
- tensorboardx
- tensorboard
- protobuf
- tqdm
- pip:
  - torch
  - torchvision
Then I built the container with the command docker build -t camera-seg . and PyTorch is now able to recognize CUDA.
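To check the result beyond torch.cuda.is_available(), a short smoke test can run an actual computation on the GPU. This is only a sketch under assumptions: the function name smoke_test is hypothetical, and the container is assumed to be started with GPU access (for example docker run --gpus all camera-seg on a host with the NVIDIA Container Toolkit). The script falls back to the CPU when no GPU is visible.

```python
try:
    import torch
except ImportError:  # lets the script fail softly outside the container
    torch = None


def smoke_test(size=256):
    """Multiply two random matrices and report which device did the work."""
    if torch is None:
        return None
    # Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)
    product = a @ b
    return product.device.type


if __name__ == "__main__":
    print("matmul ran on:", smoke_test())
```

A printed value of cuda means the tensor operation executed on the GPU; cpu means PyTorch is installed but could not see a CUDA device in this container.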