Sending multiple piped commands via subprocess with explicit quotations

Question

I have been trying to execute piped commands via the subprocess module, but I am running into some issues.

I have seen the solutions proposed below, but none has solved my problem:
- sending a sequence (list) of arguments
- several Popen commands using subprocess.PIPE
- using shell=True

I would like to avoid the third option, shell=True, although it did produce the expected results on my test system.

Here is the command that works in the terminal, which I would like to replicate:

tr -c "[:alpha:]" " " < some\ file\ name_raw.txt | sed -E "s/ +/ /g" | tr "[:upper:]" "[:lower:]" > clean_in_one_command.txt

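This command cleans the file as required. It first runs tr on the input file, which has spaces in its name. The output is passed to sed, which collapses each run of spaces into a single space, and the result is passed to tr again to make everything lowercase.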

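After several iterations, I broke it down to the simplest form, implementing the second method above: multiple instances of Popen, passing information using subprocess.PIPE. It is verbose, but hopefully makes debugging easier: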

from subprocess import run, Popen, PIPE

cmd1_func = ['tr']
cmd1_flags = ['-c']
cmd1_arg1 = [r'"[:alpha:]\"']
cmd1_arg2 = [r'" "']
cmd1_pass_input = ['<']
cmd1_infile = ['some file name_raw.txt']
cmd1 = cmd1_func + cmd1_flags + cmd1_arg1 + cmd1_arg2 + cmd1_pass_input + cmd1_infile
print("Command 1:", cmd1)    # just to see if things look fine

cmd2_func = ['sed']
cmd2_flags = ['-E']
cmd2_arg = [r'"s/ +/ /g\"']
cmd2 = cmd2_func + cmd2_flags + cmd2_arg
print("command 2:", cmd2)

cmd3_func = ['tr']
cmd3_arg1 = ["\"[:upper:]\""]
cmd3_arg2 = ["\"[:lower:]\""]
cmd3_pass_output = ['>']
cmd3_outfile = [output_file_abs]
cmd3 = cmd3_func + cmd3_arg1 + cmd3_arg2 + cmd3_pass_output + cmd3_outfile
print("command 3:", cmd3)

# run first command into first process
proc1, _ = Popen(cmd1, stdout=PIPE)
# pass its output as input to second process
proc2, _ = Popen(cmd2, stdin=proc1.stdout, stdout=PIPE)
# close first process
proc1.stdout.close()
# output of second process into third process
proc3, _ = Popen(cmd3, stdin=proc2.stdout, stdout=PIPE)
# close second process output
proc2.stdout.close()
# save any output from final process to a logger
output = proc3.communicate()[0]

I would then simply write the output to a text file, but the program never gets that far, because I receive the following error:

usage: tr [-Ccsu] string1 string2
       tr [-Ccu] -d string1
       tr [-Ccu] -s string1
       tr [-Ccu] -ds string1 string2
sed: 1: ""s/ +/ /g\"": invalid command code "
usage: tr [-Ccsu] string1 string2
       tr [-Ccu] -d string1
       tr [-Ccu] -s string1
       tr [-Ccu] -ds string1 string2

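This suggests that my arguments are not being passed correctly. It seems that both ' and " quotes are being passed to sed as ". I definitely need one of them. If I put only one set in my list, the quotes are stripped completely from the command, which also breaks it.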
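To illustrate what each program actually receives when the command is typed in a terminal, here is a minimal sketch using shlex.split, which follows POSIX shell quoting rules. It only splits strings and runs no external program. The variable names are made up for this illustration, and each stage is split on its own because the |, < and > operators are shell syntax rather than arguments.

```python
import shlex

# The three pipeline stages, written exactly as they appear in the terminal
# command (without the shell-only redirections and pipes).
stage_1 = 'tr -c "[:alpha:]" " "'
stage_2 = 'sed -E "s/ +/ /g"'
stage_3 = 'tr "[:upper:]" "[:lower:]"'

# The quotes only group characters into a single argument; they are removed
# before the program is started, so tr and sed never see them.
argv_1 = shlex.split(stage_1)
argv_2 = shlex.split(stage_2)
argv_3 = shlex.split(stage_3)

print(argv_1)  # ['tr', '-c', '[:alpha:]', ' ']
print(argv_2)  # ['sed', '-E', 's/ +/ /g']
print(argv_3)  # ['tr', '[:upper:]', '[:lower:]']
```

Lists like these can be handed to Popen unchanged. Adding literal quote characters inside the list items, as in '"s/ +/ /g"', makes the quotes part of the sed script itself, which is consistent with the "invalid command code" error shown above.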

Things I have tried:

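- not declaring raw string literals for the strings where I need explicit quotes
- escaping and double-escaping the explicit quotes
- passing the entire command as one list to the subprocess.Popen and subprocess.run functions
- using the shlex package to deal with the quotes
- removing the parts cmd3_pass_output = ['>'] and cmd3_outfile = [output_file_abs] so that only the raw (piped) output is handled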

Am I missing something, or will I be forced to use shell=True?

Answer 1

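This program appears to do what you want. Each process must be started separately. As you build them, the output of one is piped to the input of the next. The files are handled independently and are used at the start and end of the pipeline. Because no shell is involved, each argument is passed to the program exactly as written, so no extra quotes are needed.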

#! /usr/bin/env python3
import subprocess


def main():
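    # The two file objects take the place of the shell's `<` and `>`.
    with open('raw.txt', 'r') as stdin, open('clean.txt', 'w') as stdout:
        step_1 = subprocess.Popen(
            ('tr', '-c', '[:alpha:]', ' '),
            stdin=stdin,
            stdout=subprocess.PIPE
        )
        step_2 = subprocess.Popen(
            ('sed', '-E', 's/ +/ /g'),
            stdin=step_1.stdout,
            stdout=subprocess.PIPE
        )
        # Close the parent's copy so step_1 gets SIGPIPE if step_2 exits early.
        step_1.stdout.close()
        step_3 = subprocess.Popen(
            ('tr', '[:upper:]', '[:lower:]'),
            stdin=step_2.stdout,
            stdout=stdout
        )
        step_2.stdout.close()
        # Wait for every stage so that none is left behind as a zombie.
        for step in (step_1, step_2, step_3):
            step.wait()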


if __name__ == '__main__':
    main()
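The same idea can be generalized to any number of stages. The following is only a sketch under stated assumptions: run_pipeline is a hypothetical helper name (not part of subprocess), the files are opened in binary mode, and each stage is given as an argument list without shell quoting.

```python
import subprocess


def run_pipeline(commands, in_path, out_path):
    """Run `cmd1 < in_path | cmd2 | ... > out_path` without a shell.

    `commands` is a list of argument lists. Returns the exit code of
    every stage, in order. (Hypothetical helper, for illustration only.)
    """
    procs = []
    with open(in_path, 'rb') as src, open(out_path, 'wb') as dst:
        upstream = src
        for index, argv in enumerate(commands):
            is_last = index == len(commands) - 1
            proc = subprocess.Popen(
                argv,
                stdin=upstream,
                # Only the final stage writes to the output file.
                stdout=dst if is_last else subprocess.PIPE,
            )
            if procs:
                # The new child holds its own copy of the pipe; close ours
                # so the previous stage gets SIGPIPE if this one exits early.
                upstream.close()
            upstream = proc.stdout
            procs.append(proc)
        # All stages run concurrently; wait for each one to finish.
        return [proc.wait() for proc in procs]


# Example call for the pipeline in the question (file names taken from it):
# run_pipeline(
#     [
#         ['tr', '-c', '[:alpha:]', ' '],
#         ['sed', '-E', 's/ +/ /g'],
#         ['tr', '[:upper:]', '[:lower:]'],
#     ],
#     'some file name_raw.txt',
#     'clean_in_one_command.txt',
# )
```

The file name with spaces needs no escaping here, since it is passed to open() directly and never parsed by a shell. Checking the returned exit codes is left to the caller.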
