After building dockerfile: ModuleNotFoundError: No module named 'numpy'
I have to run a Python program on RedHat 8, so I pulled the Red Hat Docker image and wrote a Dockerfile as follows:
FROM redhat/ubi8:latest
RUN echo "nameserver 9.9.9.9" >> /etc/resolv.conf && mkdir /home/spark && mkdir /home/spark/spark && mkdir /home/spark/ETL && mkdir /usr/lib/java && mkdir /usr/share/oracle
# set environment vars
ENV SPARK_HOME /home/spark/spark
ENV JAVA_HOME /usr/lib/java
# install packages
RUN \
echo "nameserver 9.9.9.9" >> /etc/resolv.conf && \
yum install -y rsync && yum install -y wget && yum install -y python3-pip && \
yum install -y openssh-server && yum install -y openssh-clients && \
yum install -y unzip && yum install -y python38 && yum install -y nano
# create ssh keys
RUN \
echo "nameserver 9.9.9.9" >> /etc/resolv.conf && \
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa && \
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && \
chmod 0600 ~/.ssh/authorized_keys
# copy ssh config
COPY ssh_config /root/.ssh/config
COPY spark-3.1.2-bin-hadoop3.2.tgz /home/
COPY jdk-8u25-linux-x64.tar.gz /home/
COPY instantclient-basic-linux.x64-19.8.0.0.0dbru.zip /home
COPY etl /home/ETL/
RUN \
tar -zxvf /home/spark-3.1.2-bin-hadoop3.2.tgz -C /home/spark && mv -v /home/spark/spark-3.1.2-bin-hadoop3.2/* $SPARK_HOME && \
tar -zxvf /home/jdk-8u25-linux-x64.tar.gz -C /home/spark && mv -v /home/spark/jdk1.8.0_25/* $JAVA_HOME && \
unzip /home/instantclient-basic-linux.x64-19.8.0.0.0dbru.zip -d /home/spark && mv -v /home/spark/instantclient_19_8 /usr/share/oracle && \
echo "export JAVA_HOME=$JAVA_HOME" >> ~/.bashrc && \
echo "export PATH=$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:/usr/share/oracle/instantclient_19_8" >> ~/.bashrc && \
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/share/oracle/instantclient_19_8" >> ~/.bashrc && \
echo "PYTHONPATH = $PYTHONPATH:/usr/bin/python3.8" >> ~/.bashrc && \
echo "alias python=/usr/bin/python3.8" >> ~/.bashrc
#WARNING: Running pip install with root privileges is generally not a good idea. Try `python3.8 -m pip install --user` instead.
# so I have to create a user
RUN echo "nameserver 9.9.9.9" >> /etc/resolv.conf
RUN useradd -d /home/spark/myuser myuser
USER myuser
WORKDIR /home/spark/myuser
ENV PATH="/home/spark/myuser/.local/bin:$PATH"
RUN \
python3.8 -m pip install --user pandas && \
python3.8 -m pip install --user cx-Oracle && \
python3.8 -m pip install --user persiantools && \
python3.8 -m pip install --user pyspark && \
python3.8 -m pip install --user py4j && \
python3.8 -m pip install --user python-dateutil && \
python3.8 -m pip install --user pytz && \
python3.8 -m pip install --user setuptools && \
python3.8 -m pip install --user six && \
python3.8 -m pip install --user numpy
# copy spark configs
ADD spark-env.sh $SPARK_HOME/conf/
ADD workers $SPARK_HOME/conf/
# expose various ports
EXPOSE 7012 7013 7014 7015 7016 8881 8081 7077
In addition, I use this script to copy the config files and build the Docker image:
#!/bin/bash
cp /etc/ssh/ssh_config .
cp /opt/spark/conf/spark-env.sh .
cp /opt/spark/conf/workers .
sudo docker build -t my_docker .
echo "Script Finished."
The image builds without any errors. Then I create a tar file from the built image with this command:
sudo docker save my_docker > my_docker.tar
After that I copy my_docker.tar to another computer and load it:
sudo docker load < my_docker.tar
sudo docker run -it my_docker
Unfortunately, when I run my program inside the Docker container, I get errors about Python packages such as numpy, pyspark, and pandas:
File "/home/spark/ETL/test/main.py", line 3, in <module>
import cst_utils as cu
File "/home/spark/ETL/test/cst_utils.py", line 5, in <module>
import group_state as gs
File "/home/spark/ETL/test/group_state.py", line 1, in <module>
import numpy as np
ModuleNotFoundError: No module named 'numpy'
I also tried installing the Python packages inside the Docker container and then committing the container, but after exiting and re-entering the container, none of the Python packages are installed.
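For reference, this is roughly the commit workflow I tried (the container ID and the new tag below are placeholders):
sudo docker run -it my_docker bash
# inside the container:
python3.8 -m pip install numpy
exit
# back on the host, commit the stopped container into a new image tag
sudo docker commit <container-id> my_docker:patched
# running `sudo docker run -it my_docker` again starts a fresh container
# from the original image, so anything installed but never committed
# (or committed under a different tag) is gone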
Could you please point out what is wrong with my approach?
Any help is really appreciated.
Apart from any issues with the Dockerfile setup itself, set these in your spark-env.sh to make sure Spark uses the same environment that pip installed into:
export PYSPARK_PYTHON="/usr/bin/python3.8"
export PYSPARK_DRIVER_PYTHON="/usr/bin/python3.8"
Keep in mind that Spark SQL DataFrames should really be used rather than numpy, and that you do not need to pip install pyspark, since it is already part of the Spark package you downloaded.
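If you want to rely on the PySpark bundled with the download instead of the pip wheel, a minimal sketch of the extra spark-env.sh lines would be (the exact py4j zip name under $SPARK_HOME/python/lib depends on your Spark version):
# make the bundled PySpark importable for /usr/bin/python3.8
export PYTHONPATH="$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH"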
I tried your code, stripping out most of the things that (to me) seemed unrelated to the problem.
I found that moving
echo "alias python=/usr/bin/python3.8" >> ~/.bashrc
down, below USER myuser, fixed it. Before that I found that python was not found, and it turned out that python3 did not have numpy either, while python3.8 did. So there is some confusion there, and maybe something in your full example makes it even murkier.
But do try moving that statement, because ~/.bashrc is not the same file once you switch users.
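A minimal sketch of that reordering, trimmed to the relevant lines of your Dockerfile:
RUN useradd -d /home/spark/myuser myuser
USER myuser
WORKDIR /home/spark/myuser
# from here on, ~/.bashrc means /home/spark/myuser/.bashrc, not /root/.bashrc
RUN echo "alias python=/usr/bin/python3.8" >> ~/.bashrc
RUN python3.8 -m pip install --user numpy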
The problem is solved. I changed the Dockerfile: first, I no longer define any user; second, I set PYSPARK_PYTHON, so every package now imports without errors.
The Dockerfile looks like this:
FROM redhat/ubi8:latest
RUN echo "nameserver 9.9.9.9" >> /etc/resolv.conf
RUN mkdir /home/spark && mkdir /home/ETL && mkdir /usr/lib/java && mkdir /usr/share/oracle
# set environment vars
ENV SPARK_HOME /home/spark
ENV JAVA_HOME /usr/lib/java
# install packages
RUN \
echo "nameserver 9.9.9.9" >> /etc/resolv.conf && \
yum -y update && \
yum install -y libaio && \
yum install -y libaio.so.1 && \
dnf install -y libnsl* && \
yum install -y rsync && yum install -y wget && yum install -y python3-pip && yum install -y openssh-server && yum install -y openssh-clients && \
yum install -y unzip && yum install -y python38 && yum install -y nano
#WARNING: Running pip install with root privileges is generally not a good idea. Try `python3.8 -m pip install --user` instead.
# It is just a warning
RUN echo "nameserver 9.9.9.9" >> /etc/resolv.conf && \
python3.8 -m pip install pandas && \
python3.8 -m pip install cx-Oracle && \
python3.8 -m pip install persiantools && \
python3.8 -m pip install pyspark && \
python3.8 -m pip install py4j && \
python3.8 -m pip install python-dateutil && \
python3.8 -m pip install pytz && \
python3.8 -m pip install setuptools && \
python3.8 -m pip install numpy && \
python3.8 -m pip install six
# create ssh keys
RUN echo "nameserver 9.9.9.9" >> /etc/resolv.conf && \
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa && \
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && \
chmod 0600 ~/.ssh/authorized_keys
# copy ssh config
COPY ssh_config /root/.ssh/config
COPY spark-3.1.2-bin-hadoop3.2.tgz /home
COPY jdk-8u25-linux-x64.tar.gz /home
COPY instantclient-basic-linux.x64-21.4.0.0.0dbru.zip /home
COPY etl /home/ETL/
RUN \
tar -zxvf /home/spark-3.1.2-bin-hadoop3.2.tgz -C /home && mv -v /home/spark-3.1.2-bin-hadoop3.2/* $SPARK_HOME && \
tar -zxvf /home/jdk-8u25-linux-x64.tar.gz -C /home && mv -v /home/jdk1.8.0_25/* $JAVA_HOME && \
unzip /home/instantclient-basic-linux.x64-21.4.0.0.0dbru.zip -d /home
RUN \
echo "export JAVA_HOME=$JAVA_HOME" >> ~/.bashrc && \
echo "export SPARK_HOME=$SPARK_HOME" >> ~/.bashrc && \
echo "export PATH=$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:/home/instantclient_21_4" >> ~/.bashrc && echo "export LD_LIBRARY_PATH=/home/instantclient_21_4" >> ~/.bashrc && echo "alias python=/usr/bin/python3.8" >> ~/.bashrc && \
echo "export PYTHONPATH=$SPARK_HOME/python:/usr/bin/python3.8" >> ~/.bashrc && echo "export PYSPARK_PYTHON=/usr/bin/python3.8" >> ~/.bashrc
ENV LD_LIBRARY_PATH="/home/instantclient_21_4"
ENV PYTHONPATH="$SPARK_HOME/python:/usr/bin/python3.8"
ENV PATH="$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:/home/instantclient_21_4"
RUN \
touch /etc/ld.so.conf.d/instantclient.conf && \
echo "#path of instant client" >> /etc/ld.so.conf.d/instantclient.conf && \
echo "/home/instantclient_21_4" >> /etc/ld.so.conf.d/instantclient.conf && \
ldconfig
# copy spark configs
ADD spark-env.sh $SPARK_HOME/conf/
ADD workers $SPARK_HOME/conf/
# expose various ports
EXPOSE 7012 7013 7014 7015 7016 8881 8081 7077
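As a quick sanity check after building, assuming the image is tagged my_docker as in the build script above:
sudo docker build -t my_docker .
# confirm the interpreter Spark will use can see the installed packages
sudo docker run --rm my_docker /usr/bin/python3.8 -c "import numpy, pandas, pyspark; print(numpy.__version__)"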
I hope this is useful to others.