Java 8 - Most effective way to merge List<byte[]> to byte[]
I have a library that returns some binary data as a list of byte arrays. Those byte[] need to be merged into a single InputStream.
This is my current implementation:
public static InputStream foo(List<byte[]> binary) {
byte[] streamArray = null;
binary.forEach(bin -> {
org.apache.commons.lang.ArrayUtils.addAll(streamArray, bin);
});
return new ByteArrayInputStream(streamArray);
}
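But this is quite CPU-intensive. Is there a better way?
Thanks for all the answers. I ran a performance test. These are my results:
- Function: 'NicolasFilotto' => 68.04 ms on average over 100 calls
- Function: 'NicolasFilottoEstSize' => 65.24 ms on average over 100 calls
- Function: 'NicolasFilottoSequenceInputStream' => 63.09 ms on average over 100 calls
- Function: 'Saka1029_1' => 63.06 ms on average over 100 calls
- Function: 'Saka1029_2' => 0.79 ms on average over 100 calls
- Function: 'Coco' => 541.60 ms on average over 10 calls
I'm not sure whether 'Saka1029_2' was measured correctly...
This is the execute function: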
private static double execute(Callable<InputStream> funct, int times) throws Exception {
List<Long> executions = new ArrayList<>(times);
for (int idx = 0; idx < times; idx++) {
BufferedReader br = null;
long startTime = System.currentTimeMillis();
InputStream is = funct.call();
br = new BufferedReader(new InputStreamReader(is));
String line = null;
while ((line = br.readLine()) != null) {}
executions.add(System.currentTimeMillis() - startTime);
}
return calculateAverage(executions);
}
Note that I read every input stream.
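For anyone who wants to cross-check numbers like these, here is a rough sketch of an alternative harness. It is not the code used for the results above; the class name DrainBenchmark and the expectedBytes parameter are my own assumptions. It drains raw bytes instead of going through a BufferedReader, times with System.nanoTime(), and fails if a stream delivers fewer bytes than expected, which would expose a stream that ends early. It does no JIT warm-up, so a dedicated tool such as JMH is still the better choice for serious measurements.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;

final class DrainBenchmark {

    // Reads the stream to the end in 8 KiB chunks and returns the number of bytes seen.
    static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    // Average nanoseconds per call; throws if a stream yields an unexpected byte count.
    static double averageNanos(Callable<InputStream> funct, int times, long expectedBytes)
            throws Exception {
        long elapsed = 0;
        for (int idx = 0; idx < times; idx++) {
            long start = System.nanoTime();
            long seen = drain(funct.call());
            elapsed += System.nanoTime() - start;
            if (seen != expectedBytes) {
                throw new IllegalStateException(
                        "expected " + expectedBytes + " bytes but read " + seen);
            }
        }
        return (double) elapsed / times;
    }
}
```

The byte-count check is the important part: a suspiciously fast result is easier to trust once you know the whole payload was actually consumed.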
These are the implementations used:
public static class NicolasFilotto implements Callable<InputStream> {
private final List<byte[]> binary;
public NicolasFilotto(List<byte[]> binary) {
this.binary = binary;
}
@Override
public InputStream call() throws Exception {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
for (byte[] bytes : binary) {
baos.write(bytes, 0, bytes.length);
}
return new ByteArrayInputStream(baos.toByteArray());
}
}
public static class NicolasFilottoSequenceInputStream implements Callable<InputStream> {
private final List<byte[]> binary;
public NicolasFilottoSequenceInputStream(List<byte[]> binary) {
this.binary = binary;
}
@Override
public InputStream call() throws Exception {
return new SequenceInputStream(
Collections.enumeration(
binary.stream().map(ByteArrayInputStream::new).collect(Collectors.toList())));
}
}
public static class NicolasFilottoEstSize implements Callable<InputStream> {
private final List<byte[]> binary;
private final int lineSize;
public NicolasFilottoEstSize(List<byte[]> binary, int lineSize) {
this.binary = binary;
this.lineSize = lineSize;
}
@Override
public InputStream call() throws Exception {
ByteArrayOutputStream baos = new ByteArrayOutputStream(binary.size() * lineSize);
for (byte[] bytes : binary) {
baos.write(bytes, 0, bytes.length);
}
return new ByteArrayInputStream(baos.toByteArray());
}
}
public static class Saka1029_1 implements Callable<InputStream> {
private final List<byte[]> binary;
public Saka1029_1(List<byte[]> binary) {
this.binary = binary;
}
@Override
public InputStream call() throws Exception {
byte[] all = new byte[binary.stream().mapToInt(a -> a.length).sum()];
int pos = 0;
for (byte[] bin : binary) {
int length = bin.length;
System.arraycopy(bin, 0, all, pos, length);
pos += length;
}
return new ByteArrayInputStream(all);
}
}
public static class Saka1029_2 implements Callable<InputStream> {
private final List<byte[]> binary;
public Saka1029_2(List<byte[]> binary) {
this.binary = binary;
}
@Override
public InputStream call() throws Exception {
int size = binary.size();
return new InputStream() {
int i = 0, j = 0;
@Override
public int read() throws IOException {
if (i >= size) return -1;
if (j >= binary.get(i).length) {
++i;
j = 0;
}
if (i >= size) return -1;
return binary.get(i)[j++];
}
};
}
}
public static class Coco implements Callable<InputStream> {
private final List<byte[]> binary;
public Coco(List<byte[]> binary) {
this.binary = binary;
}
@Override
public InputStream call() throws Exception {
byte[] streamArray = new byte[0];
for (byte[] bin : binary) {
streamArray = org.apache.commons.lang.ArrayUtils.addAll(streamArray, bin);
}
return new ByteArrayInputStream(streamArray);
}
}
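You can use a ByteArrayOutputStream to store the content of each byte array in the list. To make it efficient, create the ByteArrayOutputStream instance with an initial size that matches the target size as closely as possible. So if you know the size, or at least the average size of the byte arrays, you should use it. The code would be: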
public static InputStream foo(List<byte[]> binary) {
ByteArrayOutputStream baos = new ByteArrayOutputStream(ARRAY_SIZE * binary.size());
for (byte[] bytes : binary) {
baos.write(bytes, 0, bytes.length);
}
return new ByteArrayInputStream(baos.toByteArray());
}
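If no ARRAY_SIZE estimate is available, the exact target size can be computed with one extra pass over the list. The following is only a sketch of that idea; the class name PresizedMerge and the helper concat are made up for illustration. Note that toByteArray() still makes one final copy of the buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.List;

final class PresizedMerge {

    // Concatenates all arrays using a buffer sized exactly once up front.
    static byte[] concat(List<byte[]> binary) {
        int total = 0;
        for (byte[] bytes : binary) {
            // addExact throws ArithmeticException instead of silently overflowing int.
            total = Math.addExact(total, bytes.length);
        }
        ByteArrayOutputStream baos = new ByteArrayOutputStream(total);
        for (byte[] bytes : binary) {
            baos.write(bytes, 0, bytes.length);
        }
        return baos.toByteArray();
    }

    static InputStream foo(List<byte[]> binary) {
        return new ByteArrayInputStream(concat(binary));
    }
}
```

Because the initial capacity equals the final size, the internal buffer never has to grow while writing.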
Another approach is to use a SequenceInputStream to logically concatenate ByteArrayInputStream instances, each representing one element of the list, as follows:
public static InputStream foo(List<byte[]> binary) {
return new SequenceInputStream(
Collections.enumeration(
binary.stream().map(ByteArrayInputStream::new).collect(Collectors.toList())
)
);
}
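The interesting part of this approach is that you don't need to copy anything: you only create ByteArrayInputStream instances that use the byte arrays as they are.
To avoid collecting the result into a List, which has a cost, especially if your initial List is big, you can call iterator() directly, as suggested by @Holger. We then simply need to convert the iterator into an enumeration, which can be done with IteratorUtils.asEnumeration(iterator) from Apache Commons Collections. The final code would be: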
public static InputStream foo(List<byte[]> binary) {
return new SequenceInputStream(
IteratorUtils.asEnumeration(
binary.stream().map(ByteArrayInputStream::new).iterator()
)
);
}
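If you would rather not pull in Apache Commons Collections just for this, the iterator-to-enumeration step is small enough to write by hand. A minimal sketch with plain JDK classes follows; the class name LazySequence and its asEnumeration helper are hypothetical, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;

final class LazySequence {

    // Adapts an Iterator to the older Enumeration interface that SequenceInputStream expects.
    static <T> Enumeration<T> asEnumeration(Iterator<T> it) {
        return new Enumeration<T>() {
            @Override
            public boolean hasMoreElements() {
                return it.hasNext();
            }

            @Override
            public T nextElement() {
                return it.next();
            }
        };
    }

    static InputStream foo(List<byte[]> binary) {
        // Each ByteArrayInputStream wraps its array without copying it.
        Iterator<ByteArrayInputStream> streams =
                binary.stream().map(ByteArrayInputStream::new).iterator();
        return new SequenceInputStream(asEnumeration(streams));
    }
}
```

The wrapper streams are created one at a time as SequenceInputStream asks for the next element, so no intermediate List is built.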
Try this.
public static InputStream foo(List<byte[]> binary) {
byte[] all = new byte[binary.stream().mapToInt(a -> a.length).sum()];
int pos = 0;
for (byte[] bin : binary) {
int length = bin.length;
System.arraycopy(bin, 0, all, pos, length);
pos += length;
}
return new ByteArrayInputStream(all);
}
Or
public static InputStream foo(List<byte[]> binary) {
    int size = binary.size();
    return new InputStream() {
        int i = 0, j = 0;
        @Override
        public int read() throws IOException {
            // Skip exhausted (or empty) arrays until one still has bytes left.
            while (i < size && j >= binary.get(i).length) {
                ++i;
                j = 0;
            }
            if (i >= size) return -1;
            // Mask to 0..255 so that a 0xFF byte is not mistaken for end of stream (-1).
            return binary.get(i)[j++] & 0xFF;
        }
    };
}
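One thing to keep in mind with a hand-written InputStream: read() must return a value in the range 0 to 255, or -1 at end of stream, so a byte has to be masked with 0xFF before it is returned. Also, the default read(byte[], int, int) calls read() once per byte, so overriding it with a bulk copy usually helps consumers that read through a buffer. Below is a sketch of both points; the class name SegmentedInputStream is my own, and this is an illustration rather than a tuned implementation.

```java
import java.io.InputStream;
import java.util.List;

final class SegmentedInputStream extends InputStream {
    private final List<byte[]> segments;
    private int i = 0; // index of the current array
    private int j = 0; // offset within the current array

    SegmentedInputStream(List<byte[]> segments) {
        this.segments = segments;
    }

    // Moves past exhausted or empty arrays; returns false once all data is consumed.
    private boolean advance() {
        while (i < segments.size() && j >= segments.get(i).length) {
            i++;
            j = 0;
        }
        return i < segments.size();
    }

    @Override
    public int read() {
        if (!advance()) return -1;
        // Mask so the result is 0..255 and never collides with the -1 end marker.
        return segments.get(i)[j++] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) return 0;
        if (!advance()) return -1;
        // Copy as much as possible from the current array in a single call.
        byte[] current = segments.get(i);
        int n = Math.min(len, current.length - j);
        System.arraycopy(current, j, b, off, n);
        j += n;
        return n;
    }
}
```

The bulk read may return fewer bytes than requested when it reaches the end of one array; that is allowed by the InputStream contract, and callers simply read again.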