dd: reading a binary file in blocks of size N returns less data than N
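I need to process a large binary file in segments. Conceptually this is similar to split, but instead of writing each segment to a file, I need to take that segment and send it as input to another process. I thought I could use dd to read/write the file in chunks, but the results aren't at all what I expected. For example, if I try: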
dd if=some_big_file bs=1M |
while : ; do
dd bs=1M count=1 | processor
done
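...the output size is actually 131,072 bytes rather than 1,048,576.
Can anyone tell me why I'm not seeing the output blocked into 1M chunks, and how I can better accomplish what I'm trying to do?
Thanks.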
First of all, you don't need the first dd. cat file | while or done < file would do the trick as well.
dd bs=1M count=1 may return less than 1M; see "When is dd suitable for copying data? (or, when are read() and write() partial)".
Instead of dd count=…, use head with the (non-POSIX) option -c …:
file=some_big_file
(( m = 1024 ** 2 ))
(( blocks = ($(stat -c %s "$file") + m - 1) / m ))
for ((i=0; i<blocks; ++i)); do
head -c "$m" | processor
done < "$file"
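Or, using only POSIX utilities for the data handling but very inefficient (note that the loop itself still relies on bash's read -N and arithmetic syntax):
(( octM = 4 * 1024 * 1024 ))   # 1 MiB of data, at 4 characters (\ooo) per byte
someCommand | od -v -to1 -An | tr -d '\n' | tr ' ' '\\' |
while IFS= read -rN "$octM" block || [ -n "$block" ]; do   # also handle the final partial block
printf %b "$block" | processor
done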
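Since the question describes the task as conceptually like split, it may be worth checking whether GNU split's --filter option fits. It is not part of the answer above and is a GNU coreutils extension, so treat this as a sketch under that assumption. The wrapper name split_to_command is hypothetical; split runs the given command once per chunk through the shell, with the chunk on its standard input.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run COMMAND once per SIZE-byte chunk of FILE.
# Assumes GNU split (the --filter option is a GNU extension, not POSIX).
split_to_command() {
    local file=$1 size=$2 cmd=$3
    # Each chunk is piped to "$cmd" instead of being written to an output file.
    split -b "$size" --filter="$cmd" -- "$file"
}
```

For the case in the question this would be called as split_to_command some_big_file 1M processor.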
According to the dd manual:
bs=bytes
[...] if no data-transforming conv option is specified, input is copied to the output as soon as it's read, even if it is smaller than the block size.
So try dd iflag=fullblock:
fullblock
Accumulate full blocks from input. The read system call may return early if a full block is not available. When that happens, continue calling read to fill the remainder of the block. This flag can be used only with iflag. This flag is useful with pipes for example as they may return short reads. In that case, this flag is needed to ensure that a count= argument is interpreted as a block count rather than a count of read operations.
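A minimal sketch of how the flag quoted above could be applied to the loop from the question, assuming GNU dd (iflag=fullblock is not in POSIX). The function name process_chunks and the temporary file are illustrative choices, not part of the original answers. The temporary file is there so the loop can tell when the input is exhausted, which the original while : loop never does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: process_chunks FILE CHUNK_BYTES COMMAND [ARGS...]
# Feeds FILE to COMMAND in chunks of CHUNK_BYTES (the last chunk may be shorter).
process_chunks() {
    local file=$1 size=$2
    shift 2
    local tmp
    tmp=$(mktemp) || return 1
    while :; do
        # iflag=fullblock makes dd keep calling read() until it has a whole
        # block or reaches end of input, so count=1 means one full block.
        dd bs="$size" count=1 iflag=fullblock of="$tmp" 2>/dev/null
        # An empty chunk means there is nothing left to read.
        [ -s "$tmp" ] || break
        # The command reads the chunk, not the loop's own standard input.
        "$@" < "$tmp"
    done < "$file"
    rm -f "$tmp"
}
```

With the names from the question this would be process_chunks some_big_file 1048576 processor. If the data comes from a pipe rather than a file, the same loop body can read from that pipe instead of using the done < "$file" redirection.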