Loop executing a command with as many arguments as possible

Question

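I have a program that can process ~256 files at once (edit: the limit is given by the number of command-line arguments); I need to run it on the many files I have (more than 100k).

To do this, I currently use a simple loop that calls my program once per file, one file at a time.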

```shell
FILESLIST="$(find /folder/where/the/files/are/ -name '*.xml' 2>/dev/null)"
FILESTAB=($FILESLIST)

for f in "${FILESTAB[@]}"
do
    ./myProgram "$f" || break
done
```

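But to speed up processing, I need to pass several files to my program at a time, like this:

```shell
./myProgram path/to/file1.xml path/to/file2.xml ...
```

I thought of something like the following, but I can't find a good way to do it (see the comments):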

```shell
FILESLIST="$(find /folder/where/the/files/are/ -name '*.xml' 2>/dev/null)"
FILESTAB=($FILESLIST)

while [ ${#FILESTAB[@]} -gt 256 ]
do
    ListOf256FilesNames=$FILESTAB[0:256]          # << My problem is to do this
    FILESTAB=$FILESTAB[256:end] # shifting array  # <<   and do this too

    ./myProgram $ListOf256FilesNames  # << this works supposing the 2 lines before work
done

./myProgram $FILESTAB  # do the work for remaining files
```

Is there a way to do something like this, or do you have another idea for how to approach it?

Answer 1

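Assuming your real goal is to avoid going over the maximum amount of space the operating system allows for environment variables and arguments, you are better off letting `find` or `xargs` do the work for you. (This also avoids accidentally going over the limit when a batch happens to contain a bunch of unusually long filenames, and avoids wasting CPU on extra processes when your filenames are short enough that more of them would fit in each call.)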
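To see the limit being referred to on a particular machine, a small bash sketch like the following can help. It assumes `getconf ARG_MAX` reports a numeric byte count (as it does on Linux and macOS). The `env | wc -c` figure is only a rough estimate of how much of that budget the environment already uses, and `xargs --show-limits` is a GNU-only flag, so on other `xargs` implementations that line just prints an error.

```shell
# Combined byte limit on argv + environment for a single exec() call.
# The value is system-dependent.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max bytes"

# Rough estimate of what the environment already consumes from that budget:
# NAME=value strings plus one newline each (standing in for NUL terminators);
# pointer overhead is not counted, so this is not an exact accounting.
env_bytes=$(( $(env | wc -c) ))
echo "environment: roughly $env_bytes bytes"

# GNU xargs can report the limits it will actually apply to its commands.
# --show-limits is GNU-specific; elsewhere this prints an error, which is tolerated.
xargs --show-limits < /dev/null 2>&1 | head -n 6 || true
```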

Best practice: let `find` do the splitting

Unlike `-exec ... {} ... \;`, which runs one command per filename, `-exec ... {} +` puts as many arguments as will fit onto each command line. This form has been POSIX-compliant since 2007.

```shell
find /folder/where/the/files/are -name '*.xml' -exec ./myProgram '{}' +
```
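The difference between the two `-exec` forms can be checked with a throwaway experiment. In this bash sketch, the temporary directory and five dummy files exist only for the demonstration, and the inline `sh -c '...' sh` script is a hypothetical stand-in for `./myProgram` that just records how many arguments each invocation received.

```shell
# Scratch directory with five empty dummy .xml files.
workdir=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$workdir/file$i.xml"; done

# Hypothetical stand-in for ./myProgram: appends its argument count ($#)
# to the file named by $LOG. The trailing "sh" fills $0, so the filenames
# land in $1, $2, ...
stub='echo "$#" >> "$LOG"'

# Form 1: one invocation per file.
export LOG="$workdir/semicolon.log"
find "$workdir" -name '*.xml' -exec sh -c "$stub" sh '{}' \;

# Form 2: as many files per invocation as fit on one command line.
export LOG="$workdir/plus.log"
find "$workdir" -name '*.xml' -exec sh -c "$stub" sh '{}' +

# $(( ... )) strips the leading spaces some wc implementations print.
semicolon_calls=$(( $(wc -l < "$workdir/semicolon.log") ))
plus_calls=$(( $(wc -l < "$workdir/plus.log") ))
plus_args=$(cat "$workdir/plus.log")
echo "with \\; : $semicolon_calls invocations"
echo "with +  : $plus_calls invocation(s), $plus_args arguments"

rm -rf "$workdir"
```

With five short names, `\;` logs five invocations of one argument each, while `+` logs a single invocation carrying all five. With 100k files, `+` would still split the work into several invocations, each as large as the system limit allows.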

Alternative: GNU `xargs`

Using `find -print0` and `xargs -0` provides similar functionality in a way that is compatible with older tools:

```shell
find /folder/where/the/files/are -name '*.xml' -print0 | xargs -0 ./myProgram
```
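The point of the NUL delimiters is easiest to see with awkward filenames. This bash sketch (temporary directory and filenames invented for the demo) compares how many words the question's `FILESTAB=($FILESLIST)` approach produces with how many arguments actually reach a command through `-print0 | xargs -0`. Here `sh -c 'echo "$#"' sh` is a hypothetical stand-in for `./myProgram` that only reports its argument count.

```shell
workdir=$(mktemp -d)   # assumes the temp path itself contains no whitespace
# Three files, two of which have names that defeat whitespace splitting.
: > "$workdir/plain.xml"
: > "$workdir/with space.xml"
: > "$workdir/with"$'\n'"newline.xml"

# The question's approach: unquoted expansion splits on spaces and newlines.
naive=( $(find "$workdir" -name '*.xml') )
naive_count=${#naive[@]}

# NUL-delimited: each path arrives as exactly one argument.
safe_count=$(find "$workdir" -name '*.xml' -print0 |
  xargs -0 sh -c 'echo "$#"' sh)

echo "word-split array: $naive_count elements"
echo "xargs -0:         $safe_count arguments"
rm -rf "$workdir"
```

The word-split array ends up with 5 elements (the two awkward names each break into two words), while `xargs -0` passes exactly 3 arguments.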

If you really want to tell `xargs` to pass no more than 256 arguments per invocation, you can use `xargs -n 256 -0 ./myProgram`.
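A quick way to confirm what `-n 256` does is to feed it a known number of files. This bash sketch creates 600 dummy files (an arbitrary count) in a temporary directory and again uses an inline `sh -c 'echo "$#"' sh` as a hypothetical stand-in for `./myProgram`. With names this short, the byte limit is never reached, so each batch is capped by `-n` alone.

```shell
workdir=$(mktemp -d)
# 600 empty dummy files with short names.
for ((i = 1; i <= 600; i++)); do : > "$workdir/f$i.xml"; done

# Each invocation prints how many arguments it received: one line per batch.
batches=$(find "$workdir" -name '*.xml' -print0 |
  xargs -0 -n 256 sh -c 'echo "$#"' sh)
echo "$batches"
rm -rf "$workdir"
```

The printed batch sizes are 256, 256 and 88, since 600 = 2 * 256 + 88.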

Suboptimal: exactly what was asked

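```shell
# Read NUL-delimited paths from find into an array, so names containing
# spaces or newlines stay intact.
files=()
while IFS= read -r -d '' filename; do
  files+=( "$filename" )
done < <(find /folder/where/the/files/are/ -name '*.xml' -print0)

# Walk the array 256 entries at a time; "${files[@]:$idx:256}" expands to
# at most 256 elements starting at index $idx.
for ((idx=0; idx<${#files[@]}; idx+=256)); do
  ./myProgram "${files[@]:$idx:256}"
done
```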
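Since the slicing expansion `${array[@]:offset:length}` is the piece the question was missing, here is a scaled-down bash sketch of the same loop, using ten invented filenames and a batch size of 4 instead of 256, and printing each batch's size instead of calling `./myProgram`. The offset and length are evaluated as arithmetic, so bare variable names work, and a slice that runs past the end of the array simply stops at the last element.

```shell
# Ten invented filenames; 'c d.xml' contains a space on purpose.
files=(a.xml b.xml 'c d.xml' e.xml f.xml g.xml h.xml i.xml j.xml k.xml)
batch_size=4            # stands in for 256
batch_counts=()

for ((idx = 0; idx < ${#files[@]}; idx += batch_size)); do
  # Up to batch_size elements starting at idx; quoting keeps 'c d.xml' whole.
  batch=( "${files[@]:idx:batch_size}" )
  batch_counts+=( "${#batch[@]}" )
  printf 'batch at index %d: %d files\n' "$idx" "${#batch[@]}"
done
```

This prints batches of 4, 4 and 2 files starting at indices 0, 4 and 8; the name with a space is still counted as a single file.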
