Python subprocess: read returncode is sometimes different from returned code

Question

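I have a Python script that calls another Python script using subprocess.Popen. I know the called code always returns 10, which means it failed.

My problem is that the caller reads 10 only about 75% of the time. The other 25% of the time it reads 0 and mistakes the called program's failure code for a success. Same command, same environment, apparently at random.

Environment: Python 2.7.10, Linux Redhat 6.4. The code presented here is a (very) simplified version, but I can still reproduce the problem with it.

This is the called script, constant_return.py: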

#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-

"""
Simplified called code
"""
import sys

if __name__ == "__main__":
    sys.exit(10)

This is the caller code:

#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-

"""
Simplified version of the calling code
"""

try:
    import sys
    import subprocess
    import threading

except Exception, eImp:
    print "Error while loading Python library : %s" % eImp
    sys.exit(100)


class BizarreProcessing(object):
    """
    Simplified caller class
    """

    def __init__(self):
        """
        Classic initialization
        """
        object.__init__(self)


    def logPipe(self, isStdOut_, process_):
        """
        Simplified log handler
        """
        try:
            if isStdOut_:
                output = process_.stdout
                logfile = open("./log_out.txt", "wb")
            else:
                output = process_.stderr
                logfile = open("./log_err.txt", "wb")

            #Read pipe content as long as the process is running
            while (process_.poll() == None):
                text = output.readline()
                if (text != '' and text.strip() != ''):
                    logfile.write(text)

        #When the process is finished, there might still be lines remaining in the pipe
            output.readlines()
            for oneline in output.readlines():
                if (oneline != None and oneline.strip() != ''):
                    logfile.write(text)
        finally:
            logfile.close()


    def startProcessing(self):
        """
        Launch process
        """

        # Simplified command line definition
        command = "/absolute/path/to/file/constant_return.py"

        # Execute command in a new process
        process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

        #Launch a thread to gather called programm stdout and stderr
        #This to avoid a deadlock with pipe filled and such
        stdoutTread = threading.Thread(target=self.logPipe, args=(True, process))
        stdoutTread.start()
        stderrThread = threading.Thread(target=self.logPipe, args=(False, process))
        stderrThread.start()

        #Wait for the end of the process and get process result
        stdoutTread.join()
        stderrThread.join()
        result = process.wait()

        print("returned code: " + str(result))

        #Send it back to the caller
        return (result)


#
# Main
#
if __name__ == "__main__":

    # Execute caller code
    processingInstance = BizarreProcessing()
    aResult = processingInstance.startProcessing()

    #Return the code
    sys.exit(aResult)

This is what I type in bash to execute the caller script:

for res in {1..100}
do
    /path/to/caller/script.py
    echo $? >> /tmp/returncodelist.txt
done

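It seems to be somehow connected to the way I read the called program's output, because when I create the subprocess with process = subprocess.Popen(command, shell=True, stdout=sys.stdout, stderr=sys.stderr) and remove all the Thread stuff, it reads the correct return code (but it no longer logs the way I want...)

Any idea what I did wrong?

Thanks a lot for your help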

Answer 1

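logPipe is also checking whether the process is alive to determine whether there is more data to read. This is not correct: you should be checking whether the pipe has reached EOF, either by looking for a zero-length read or by using output.readlines(). The I/O pipes can outlive the process.

This simplifies logPipe significantly. Change logPipe as below: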

  def logPipe(self, isStdOut_, process_):
      """
      Simplified log handler
      """
      try:
          if isStdOut_:
              output = process_.stdout
              logfile = open("./log_out.txt", "wb")
          else:
              output = process_.stderr
              logfile = open("./log_err.txt", "wb")

          #Read pipe content until EOF; the with block closes the pipe afterwards
          with output:
              for text in output:
                  if text.strip(): # ... checks if it's not an empty string
                      logfile.write(text)

      finally:
          logfile.close()

Second, don't join your logging threads until after process.wait(), for the same reason: the I/O pipes can outlive the process.
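
As a minimal sketch of both points (read each pipe to EOF, and let only the main thread wait on the child), the caller could be restructured roughly as follows. The names drain, run_and_log and CHILD are illustrative and not from the original code. CHILD is an inline stand-in for constant_return.py so that the example is self-contained, the command is passed as a list without shell=True, and lines are collected in lists rather than written to log_out.txt and log_err.txt.

```python
import subprocess
import sys
import threading

# Stand-in for constant_return.py (an assumption, so the sketch is self-contained):
# the child writes one line to each stream and exits with code 10.
CHILD = [sys.executable, "-c",
         "import sys; print('out'); sys.stderr.write('err\\n'); sys.exit(10)"]


def drain(pipe, sink):
    """Append non-blank lines from pipe to sink until EOF, then close the pipe."""
    try:
        # readline() returns an empty bytes object only at EOF.
        for line in iter(pipe.readline, b""):
            if line.strip():
                sink.append(line)
    finally:
        pipe.close()


def run_and_log(command):
    """Run command, drain both pipes in threads, return (code, out, err)."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    out_lines, err_lines = [], []
    # The threads receive only the pipe objects, never the Popen object,
    # so they cannot call poll() or wait() on it.
    threads = [
        threading.Thread(target=drain, args=(process.stdout, out_lines)),
        threading.Thread(target=drain, args=(process.stderr, err_lines)),
    ]
    for thread in threads:
        thread.start()
    # Only this thread waits on the child and reads its exit code.
    code = process.wait()
    # Join afterwards: each reader stops when its pipe reaches EOF.
    for thread in threads:
        thread.join()
    return code, out_lines, err_lines


if __name__ == "__main__":
    code, out_lines, err_lines = run_and_log(CHILD)
    print("returned code: " + str(code))
```

Passing the pipes instead of the process object is a design choice of this sketch: it makes it impossible for a reader thread to call poll() by accident.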

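What I think is happening under the covers is that a SIGPIPE is being emitted and mishandled somewhere, possibly misconstrued as the process termination condition. This is because the pipe is being closed on one end or the other without being flushed. SIGPIPE can sometimes be a nuisance in larger applications; it may be that the Python library swallows it or does something naive with it.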

Edit: As @Blackjack points out, SIGPIPE is automatically ignored by Python. So that rules out SIGPIPE misbehavior. A second theory, though: the documentation for Popen.poll() states:

Check if child process has terminated. Set and return returncode attribute.

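If you strace this (e.g. strace -f -o strace.log ./caller.py), it appears to be done via wait4(WNOHANG). You have two threads waiting with WNOHANG and one waiting normally, but only one call will return correctly with the process exit code, because a child's exit status can be collected only once. If there is no locking in the implementation of Popen.poll(), then there is quite likely a race to assign process.returncode, or a potential failure to assign it correctly. Limiting your Popen.wait()/poll() calls to a single thread should be a good way to avoid this. See man waitpid.

Edit: As an aside, if you can live with holding all the stdout/stderr data in memory, Popen.communicate() is much easier to use and does not require logPipe or background threads at all.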

https://docs.python.org/2/library/subprocess.html#subprocess.Popen.communicate
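
A minimal sketch of the communicate() approach, under the same assumptions as before: CHILD is an inline stand-in for constant_return.py, and run_and_capture is an illustrative name that does not appear in the original code. communicate() reads both pipes to EOF and waits for the child, so the return code is collected in exactly one place.

```python
import subprocess
import sys

# Stand-in for constant_return.py (an assumption, so the sketch is self-contained):
# the child writes one line to each stream and exits with code 10.
CHILD = [sys.executable, "-c",
         "import sys; print('out'); sys.stderr.write('err\\n'); sys.exit(10)"]


def run_and_capture(command):
    """Run command and return (returncode, stdout_bytes, stderr_bytes)."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    # communicate() drains both pipes and waits for the child to exit;
    # returncode is set by the time it returns.
    out, err = process.communicate()
    return process.returncode, out, err


if __name__ == "__main__":
    code, out, err = run_and_capture(CHILD)
    print("returned code: " + str(code))
```

The captured bytes can then be written to the log files in one go. The trade-off is that nothing is logged until the child has exited, and the whole output has to fit in memory.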
