Low-overhead method of reading from a popen handle

Question

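I've inherited code that goes into a busy loop reading the output of a subprocess and looking for a keyword, but I'd like it to work with lower overhead. The code is as follows: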

def stdout_search(self, file, keyword):
    s = ''
    while True:
        c = file.read(1)
        if not c:
            return None
        if c != '\r' and c != '\n':
            s += c
            continue
        s = s.strip()
        if keyword in s:
            break
        s = ''
    i = s.find(keyword) + len(keyword)
    return s[i:]

def scan_output(self, file, ev):
    while not ev.wait(0):
        s = self.stdout_search(file, 'Keyword:')
        if not s:
            break
        # Do something useful with s
        offset = #calculate offset
        wx.CallAfter(self.offset_label.SetLabel, offset)
        #time.sleep(0.03)

The output of the Popen'd process looks something like:

Keyword: 1 of 100
Keyword: 2 of 100
...etc...

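Uncommenting the time.sleep(0.03) at the end of scan_output brings the load on a single core down from 100% to an acceptable 25% or so, but unfortunately the offset label then redraws in fits and starts: although I'm reading a frame count from 30 fps playback, the label often updates less than once per second. How can I implement this code with a more correct way of waiting for input?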

BTW, the full code may be found here.

Answer 1

Reading one byte at a time is inefficient. See Reading binary file in Python and looping over each byte.
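As a rough illustration of reading in chunks instead of bytes, here is a minimal sketch, assuming Python 3 and a pipe that is read only through its file descriptor. The name iter_keyword_values and the chunk_size parameter are made up for this example. os.read() returns as soon as some data is available, so the loop sleeps in the kernel instead of spinning, and each wakeup handles many bytes at once. Like the code in the question, it splits on both '\r' and '\n' and drops an unterminated last line.

```python
import os
import re
import subprocess
import sys


def iter_keyword_values(fd, keyword, chunk_size=4096):
    """Yield the bytes that follow `keyword` on each line read from descriptor `fd`."""
    pending = b''
    while True:
        chunk = os.read(fd, chunk_size)  # blocks until some data is available
        if not chunk:                    # EOF: the child closed its end of the pipe
            break
        pending += chunk
        # Split on '\r' or '\n'; the last piece may be an incomplete line,
        # so it is kept in `pending` until more data arrives.
        *lines, pending = re.split(b'[\r\n]', pending)
        for line in lines:
            _, found, rest = line.strip().partition(keyword)
            if found:
                yield rest


if __name__ == '__main__':
    # Hypothetical child process that prints output like the one in the question.
    child_code = "for i in range(1, 4): print('Keyword: %d of 3' % i)"
    proc = subprocess.Popen([sys.executable, '-c', child_code],
                            stdout=subprocess.PIPE)
    for value in iter_keyword_values(proc.stdout.fileno(), b'Keyword:'):
        print(value.decode().strip())
    proc.wait()
```

The generator still blocks the calling thread while it waits for data, so in a GUI program it would have to run outside the main loop.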

If you don't need immediate feedback, use Popen.communicate() to get all output at once.
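A minimal sketch of the communicate() approach, assuming Python 3; collect_keyword_values is a name invented for this example. Nothing is reported until the child exits, so it only fits the case where progress updates are not needed.

```python
import subprocess
import sys


def collect_keyword_values(cmd, keyword):
    """Run `cmd` to completion; return the text after `keyword` on each matching line."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    output, _ = proc.communicate()  # blocks until the child exits
    values = []
    for line in output.splitlines():
        _, found, rest = line.partition(keyword)
        if found:
            values.append(rest.strip())
    return values


if __name__ == '__main__':
    # Hypothetical child process standing in for the real one.
    child_code = "for i in range(1, 4): print('Keyword: %d of 3' % i)"
    print(collect_keyword_values([sys.executable, '-c', child_code], 'Keyword:'))
```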

To avoid freezing your GUI, you could put IO into a background thread. It is a simple, portable option for blocking IO that supports incremental reads.
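The background-thread option might look roughly like this, assuming Python 3; start_reader and on_value are hypothetical names. The thread blocks in readline(), which uses no CPU while waiting. Note that readline() splits on '\n' only, so a child that ends its lines with a bare '\r' would need a different splitting step.

```python
import queue
import subprocess
import sys
import threading


def start_reader(pipe, keyword, on_value):
    """Read `pipe` line by line in a daemon thread; call `on_value` for each match."""
    def run():
        for line in iter(pipe.readline, b''):  # blocking read, stops at EOF
            _, found, rest = line.partition(keyword)
            if found:
                on_value(rest.strip())
        pipe.close()

    thread = threading.Thread(target=run)
    thread.daemon = True  # do not keep the program alive just for the reader
    thread.start()
    return thread


if __name__ == '__main__':
    # Hypothetical child process standing in for the real one.
    child_code = "for i in range(1, 4): print('Keyword: %d of 3' % i)"
    proc = subprocess.Popen([sys.executable, '-c', child_code],
                            stdout=subprocess.PIPE)
    values = queue.Queue()
    start_reader(proc.stdout, b'Keyword:', values.put).join()
    proc.wait()
    while not values.empty():
        print(values.get().decode())
```

In the wx code from the question, on_value could presumably be a small function that forwards the value to the GUI thread with wx.CallAfter, as scan_output already does.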

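To process output as soon as the child process flushes it, you could use asynchronous I/O such as Tkinter's createfilehandler(), Gtk's io_add_watch(), etc.: you provide a callback, and the GUI calls it when the next chunk of data is ready.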
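The toolkit calls named above need a running GUI main loop, so the sketch below swaps in the standard library's selectors module to show the same "call me when the pipe is readable" pattern without any GUI. It assumes Python 3 on a Unix-like system (select-style waiting on pipes does not work on Windows). watch_pipe and on_chunk are names made up for this example.

```python
import os
import selectors


def watch_pipe(pipe, on_chunk, chunk_size=4096):
    """Call `on_chunk(data)` whenever `pipe` has data; return once it reaches EOF."""
    sel = selectors.DefaultSelector()
    sel.register(pipe, selectors.EVENT_READ, on_chunk)
    fd = pipe.fileno()
    done = False
    while not done:
        # A GUI toolkit would perform this wait inside its own main loop.
        for key, _ in sel.select():
            data = os.read(fd, chunk_size)  # does not block: fd is ready
            if data:
                key.data(data)              # key.data is the registered callback
            else:                           # EOF: the writer closed the pipe
                sel.unregister(pipe)
                done = True
    sel.close()


if __name__ == '__main__':
    import subprocess
    import sys
    # Hypothetical child process standing in for the real one.
    child_code = "for i in range(1, 4): print('Keyword: %d of 3' % i)"
    proc = subprocess.Popen([sys.executable, '-c', child_code],
                            stdout=subprocess.PIPE)
    watch_pipe(proc.stdout, lambda data: print(repr(data)))
    proc.wait()
```

The callback receives raw chunks, not lines, so a chunk may end in the middle of a line.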

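If the child flushes data too often, the callback may just read a chunk and put it into a buffer; you can then process the buffer every X seconds using Tkinter's widget.after() or Gtk's GObject.timeout_add(), or whenever it reaches a certain size or a certain number of lines, etc.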
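The buffering idea can be sketched independently of any toolkit. In this hypothetical example (Python 3; OutputBuffer and latest_value are invented names), feed() is what the I/O callback would call for every chunk, and drain() is what a timer would call at the chosen interval. Since only the newest value matters for a label, latest_value() picks the last matching line and ignores the rest. Lines are assumed to end in '\n'.

```python
class OutputBuffer:
    """Accumulate raw chunks; hand back complete lines only when asked."""

    def __init__(self):
        self._pending = b''  # bytes of a line that is not finished yet
        self._lines = []     # complete lines not yet handed out

    def feed(self, chunk):
        """Cheap step for the I/O callback: store the chunk, split off full lines."""
        *lines, self._pending = (self._pending + chunk).split(b'\n')
        self._lines.extend(lines)

    def drain(self):
        """Step for the timer: return the buffered lines and clear them."""
        lines, self._lines = self._lines, []
        return lines


def latest_value(lines, keyword):
    """Return the text after `keyword` on the last matching line, or None."""
    for line in reversed(lines):
        _, found, rest = line.partition(keyword)
        if found:
            return rest.strip()
    return None


if __name__ == '__main__':
    buf = OutputBuffer()
    buf.feed(b'Keyword: 1 of 100\nKeyword: 2 of')
    buf.feed(b' 100\n')
    print(latest_value(buf.drain(), b'Keyword:'))
```

With this split, the GUI is touched once per timer tick no matter how often the child flushes.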

To read until 'Keyword:', you could use code similar to asyncio's readuntil(). See also How to read records terminated by custom separator from file in python?
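For reference, this is roughly how readuntil() itself is used on a subprocess pipe, assuming Python 3.7+ (asyncio is not available on Python 2.7, where similar logic would have to be written by hand). keyword_values is a name invented for this example. readuntil() raises asyncio.IncompleteReadError when the stream ends before the separator is seen, which the sketch treats as normal termination; it can also raise asyncio.LimitOverrunError if the separator does not show up within the reader's buffer limit, which is not handled here.

```python
import asyncio
import sys


async def keyword_values(cmd, keyword=b'Keyword:'):
    """Return the rest of the line after each `keyword` in the child's stdout."""
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE)
    values = []
    while True:
        try:
            # Skip everything up to and including the keyword.
            await proc.stdout.readuntil(keyword)
        except asyncio.IncompleteReadError:  # EOF before another keyword
            break
        rest = await proc.stdout.readline()
        values.append(rest.strip())
    await proc.wait()
    return values


if __name__ == '__main__':
    # Hypothetical child process standing in for the real one.
    child_code = "for i in range(1, 4): print('Keyword: %d of 3' % i)"
    print(asyncio.run(keyword_values([sys.executable, '-c', child_code])))
```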

Tags: python, io, subprocess, popen, python-2.7