subprocess stdout readline() encoding error in "find" command in linux

Question

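I have looked at the other questions about this same error. But I don't want to specify an encoding; I just want to skip to the next line. Is it possible to ignore errors in readline() and read the next line?

I'm using the find utility to get files older than 30 days. It returns the files with their full paths. But when a different user ran the code against another path, he got an encoding error. So if there is an error in stdout.readline(), I want to skip that line and move on to the next one. Does stdout.readline() allow skipping errors?

Also, in this scenario of find results, can I use utf-8 encoding and be sure the paths are read correctly?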

find_cmd = ['find', '/a/b', '-mtime', f'+30', '-readable', '-type', 'f', '-print']
j = ' '.join(find_cmd)
proc = subprocess.Popen(j, universal_newlines=True, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

while True:
  file = proc.stdout.readline().replace('\n', '') #Error here 'utf-8' codec can't decode byte 0xe4 in position 1478: invalid continuation byte
  if not file: break
  movefile(file)

Answer 1

Change

find_cmd = ['find', '/a/b', '-mtime', f'+30', '-readable', '-type', 'f', '-print']

to (redirecting errors to /dev/null)

find_cmd = ['find', '/a/b', '-mtime', f'+30', '-readable', '-type', 'f', '-print','&> /dev/null']

The error should then not appear.

Answer 2

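If the output from find is not guaranteed to be UTF-8, don't use universal_newlines=True (also known as text=True from Python 3.7 on).

You can selectively decode as you read, and skip entries which are not valid UTF-8 if that's what you want.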
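The selective decoding can be tried out without running find at all. The following is only a sketch: decode_valid is a hypothetical helper name, and the byte strings are made-up stand-ins for lines read from a pipe opened in binary mode.

```python
def decode_valid(raw_lines, encoding='utf-8'):
    """Yield decoded lines, skipping any that are not valid in `encoding`."""
    for raw in raw_lines:
        try:
            yield raw.rstrip(b'\n').decode(encoding)
        except UnicodeDecodeError:
            # Invalid bytes: skip this entry and move on to the next line
            continue


# Made-up sample input: the second entry contains the Latin-1 byte 0xe4
# followed by an ASCII letter, which is not valid UTF-8
sample = [b'/a/b/ok.txt\n', b'/a/b/h\xe4ngt.txt\n', b'/a/b/\xc3\xa4.txt\n']
decoded = list(decode_valid(sample))
```

Only the first and third entries survive; the second one triggers the same "invalid continuation byte" condition as in the question and is dropped.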

Also, for the love of $dmr, don't join your perfectly good list back together just so you can waste an unnecessary shell=True on it.
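To illustrate why the list form is preferable, here is a small sketch. It assumes a stand-in child process (the current Python interpreter, not find) and a made-up argument containing a space and shell metacharacters.

```python
import subprocess
import sys

# Hypothetical argument containing a space and shell metacharacters
tricky = '/a/b/my dir; echo oops'

# Stand-in child (replaces find for this demo): echoes its first argument
child = [sys.executable, '-c', 'import sys; print(sys.argv[1])', tricky]

# Without shell=True each list item reaches the program as one argument,
# so no quoting is needed and nothing is interpreted by a shell
result = subprocess.run(child, stdout=subprocess.PIPE)
echoed = result.stdout.decode('ascii').strip()
```

With ' '.join() and shell=True, the same argument would have been split at the space and the part after the semicolon run as a separate command.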

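Finally, if you don't want error messages from find to show up as file names, don't redirect stderr to stdout. Either don't redirect stderr at all, so that the messages are displayed on the console, or direct stderr to subprocess.DEVNULL if you want to discard them entirely.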
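The difference between the two stderr settings can be seen in a small sketch. The child process here is a stand-in (the current Python interpreter printing one made-up path and one made-up error message), not find itself.

```python
import subprocess
import sys

# Stand-in child (replaces find for this demo): prints one path on stdout
# and one error message on stderr
child = [
    sys.executable, '-c',
    "import sys; print('/a/b/file'); "
    "print('Permission denied', file=sys.stderr)"]

# stderr=STDOUT mixes the error message into the data being read
mixed = subprocess.run(
    child, stdout=subprocess.PIPE, stderr=subprocess.STDOUT).stdout

# stderr=DEVNULL discards it, so stdout carries only file names
clean = subprocess.run(
    child, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL).stdout
```

In the mixed case, the loop in the question would end up calling movefile() on the text of the error message.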

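import logging
import subprocess

find_cmd = [
    'find', '/a/b', '-mtime', '+30', '-readable',
    '-type', 'f', '-print']
# No text=True / universal_newlines=True, so stdout yields raw bytes
proc = subprocess.Popen(find_cmd, stdout=subprocess.PIPE)

for line in proc.stdout:
  filename = line.rstrip(b'\n')
  try:
    file = filename.decode('utf-8')
  except UnicodeDecodeError:
    logging.info('Skipping non-UTF8 filename %r', filename)
    continue
  movefile(file)  # movefile() is the function from the question

# Popen() has no check= parameter, so check find's exit status by hand
if proc.wait() != 0:
  raise subprocess.CalledProcessError(proc.returncode, find_cmd)

Note that subprocess.Popen() does not accept check=True (that keyword belongs to subprocess.run()), so the code checks find's exit status itself once the output has been read; if you want to ignore find failures, take that check out again.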
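If files with non-UTF-8 names should be moved as well rather than skipped, one alternative (a different technique from the skip-on-error approach in this answer) is the surrogateescape error handler. The sketch below uses a made-up path and only shows that decoding and re-encoding is lossless.

```python
# Made-up path containing the Latin-1 byte 0xe4, which is not valid UTF-8
raw = b'/a/b/h\xe4ngt.txt'

# surrogateescape maps each undecodable byte to a lone surrogate code point
# instead of raising UnicodeDecodeError
name = raw.decode('utf-8', errors='surrogateescape')

# Encoding with the same handler restores the original bytes exactly
restored = name.encode('utf-8', errors='surrogateescape')
```

On Linux, os.fsdecode() and os.fsencode() apply this handler together with the filesystem encoding, and subprocess.Popen() accepts encoding= and errors= arguments since Python 3.6, so the pipe can also be decoded this way directly. Printing such a name may still raise UnicodeEncodeError, so whether this fits depends on what movefile() does with the path.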
