Java 8 - Most effective way to merge List<byte[]> to byte[]

Question

I have a library that returns some binary data as a list of byte arrays. Those byte[] need to be merged into a single InputStream.

This is my current implementation:

public static InputStream foo(List<byte[]> binary) {
    // ArrayUtils.addAll returns a new array, so the result must be reassigned.
    byte[] streamArray = new byte[0];
    for (byte[] bin : binary) {
        streamArray = org.apache.commons.lang.ArrayUtils.addAll(streamArray, bin);
    }
    return new ByteArrayInputStream(streamArray);
}

But this is quite CPU intensive. Is there a better way?

Thanks for all the answers. I ran a performance test. These are my results:

Function: 'NicolasFilotto' => 68.04 ms on average over 100 calls
Function: 'NicolasFilottoEstSize' => 65.24 ms on average over 100 calls
Function: 'NicolasFilottoSequenceInputStream' => 63.09 ms on average over 100 calls
Function: 'Saka1029_1' => 63.06 ms on average over 100 calls
Function: 'Saka1029_2' => 0.79 ms on average over 100 calls
Function: 'Coco' => 541.60 ms on average over 10 calls

I'm not sure whether 'Saka1029_2' is measured correctly...

This is the execute function:

private static double execute(Callable<InputStream> funct, int times) throws Exception {
    List<Long> executions = new ArrayList<>(times);

    for (int idx = 0; idx < times; idx++) {
        BufferedReader br = null;
        long startTime = System.currentTimeMillis();
        InputStream is = funct.call();
        br = new BufferedReader(new InputStreamReader(is));
        String line = null;
        while ((line = br.readLine()) != null) {}
        executions.add(System.currentTimeMillis() - startTime);
    }

    return calculateAverage(executions);
}

Note that I read each input stream to the end.
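The calculateAverage helper called by execute is not shown in this post. A minimal sketch, assuming it returns the arithmetic mean of the recorded durations in milliseconds, might look like this; the class name BenchmarkSupport and the behavior for an empty list are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

class BenchmarkSupport {

    // Hypothetical stand-in for the calculateAverage helper that the post does not show.
    static double calculateAverage(List<Long> executions) {
        return executions.stream()
                .mapToLong(Long::longValue)
                .average()
                .orElse(0.0); // an empty list yields 0.0 here, by assumption
    }

    public static void main(String[] args) {
        // Three sample durations in milliseconds.
        System.out.println(calculateAverage(Arrays.asList(60L, 70L, 80L))); // prints 70.0
    }
}
```

Since System.currentTimeMillis() has coarse resolution and the loop has no warm-up phase, averages of this kind are probably best read as rough indications only.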

These are the implementations used:

public static class NicolasFilotto implements Callable<InputStream> {

    private final List<byte[]> binary;

    public NicolasFilotto(List<byte[]> binary) {
        this.binary = binary;
    }

    @Override
    public InputStream call() throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        for (byte[] bytes : binary) {
            baos.write(bytes, 0, bytes.length);
        }
        return new ByteArrayInputStream(baos.toByteArray());
    }

}

public static class NicolasFilottoSequenceInputStream implements Callable<InputStream> {

    private final List<byte[]> binary;

    public NicolasFilottoSequenceInputStream(List<byte[]> binary) {
        this.binary = binary;
    }

    @Override
    public InputStream call() throws Exception {
        return new SequenceInputStream(
                Collections.enumeration(
                        binary.stream().map(ByteArrayInputStream::new).collect(Collectors.toList())));
    }

}

public static class NicolasFilottoEstSize implements Callable<InputStream> {

    private final List<byte[]> binary;
    private final int lineSize;

    public NicolasFilottoEstSize(List<byte[]> binary, int lineSize) {
        this.binary = binary;
        this.lineSize = lineSize;
    }

    @Override
    public InputStream call() throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(binary.size() * lineSize);
        for (byte[] bytes : binary) {
            baos.write(bytes, 0, bytes.length);
        }
        return new ByteArrayInputStream(baos.toByteArray());
    }

}

public static class Saka1029_1 implements Callable<InputStream> {

    private final List<byte[]> binary;

    public Saka1029_1(List<byte[]> binary) {
        this.binary = binary;
    }

    @Override
    public InputStream call() throws Exception {
        byte[] all = new byte[binary.stream().mapToInt(a -> a.length).sum()];
        int pos = 0;
        for (byte[] bin : binary) {
            int length = bin.length;
            System.arraycopy(bin, 0, all, pos, length);
            pos += length;
        }
        return new ByteArrayInputStream(all);
    }

}

public static class Saka1029_2 implements Callable<InputStream> {

    private final List<byte[]> binary;

    public Saka1029_2(List<byte[]> binary) {
        this.binary = binary;
    }

    @Override
    public InputStream call() throws Exception {
        int size = binary.size();
        return new InputStream() {
            int i = 0, j = 0;

            @Override
            public int read() throws IOException {
                if (i >= size) return -1;
                if (j >= binary.get(i).length) {
                    ++i;
                    j = 0;
                }
                if (i >= size) return -1;
                return binary.get(i)[j++];
            }
        };
    }

}

public static class Coco implements Callable<InputStream> {

    private final List<byte[]> binary;

    public Coco(List<byte[]> binary) {
        this.binary = binary;
    }

    @Override
    public InputStream call() throws Exception {
        byte[] streamArray = new byte[0];
        for (byte[] bin : binary) {
            streamArray = org.apache.commons.lang.ArrayUtils.addAll(streamArray, bin);
        }
        return new ByteArrayInputStream(streamArray);
    }

}

Answer 1

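You can use a ByteArrayOutputStream to store the content of each byte array in your list, but to make it efficient, we need to create the ByteArrayOutputStream instance with an initial size that matches the target size as closely as possible. So if you know the size, or at least the average size of the byte arrays, you should use it. The code would be: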

public static InputStream foo(List<byte[]> binary) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream(ARRAY_SIZE * binary.size());
    for (byte[] bytes : binary) {
        baos.write(bytes, 0, bytes.length);
    }
    return new ByteArrayInputStream(baos.toByteArray());
}
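If ARRAY_SIZE is not known up front, one option is to sum the array lengths first so that the buffer never has to grow. The following is a minimal sketch of that idea, not part of the original answer; the class name ExactSizeMerge and the method name merge are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

class ExactSizeMerge {

    // Hypothetical helper: sizes the buffer exactly so it never has to grow.
    static InputStream merge(List<byte[]> binary) {
        int total = 0;
        for (byte[] bytes : binary) {
            total += bytes.length; // assumes the total length fits in an int
        }
        ByteArrayOutputStream baos = new ByteArrayOutputStream(total);
        for (byte[] bytes : binary) {
            baos.write(bytes, 0, bytes.length);
        }
        // toByteArray() still makes one final copy of the internal buffer.
        return new ByteArrayInputStream(baos.toByteArray());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = merge(Arrays.asList(new byte[] {1, 2}, new byte[] {3}));
        System.out.println(in.available()); // number of merged bytes
    }
}
```

The extra pass over the list only reads each array's length, so it should be cheap compared with copying the data.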

Another approach is to use a SequenceInputStream to logically concatenate the ByteArrayInputStream instances, each representing one element of the list, as follows:

public static InputStream foo(List<byte[]> binary) {
    return new SequenceInputStream(
        Collections.enumeration(
            binary.stream().map(ByteArrayInputStream::new).collect(Collectors.toList())
        )
    );
}

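What is interesting about this approach is that you don't need to copy anything: you only create ByteArrayInputStream instances, which use the byte arrays as they are.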
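To illustrate that no bytes are copied, the sketch below changes one of the source arrays after the stream has been created and then reads the stream; the change is visible to the reader. The class name SequenceDemo and the method name concat are hypothetical names used only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class SequenceDemo {

    // Same construction as in the answer, wrapped in a hypothetical helper.
    static InputStream concat(List<byte[]> binary) {
        return new SequenceInputStream(
            Collections.enumeration(
                binary.stream().map(ByteArrayInputStream::new).collect(Collectors.toList())));
    }

    public static void main(String[] args) throws IOException {
        byte[] first = {1, 2};
        byte[] second = {3};
        InputStream in = concat(Arrays.asList(first, second));
        first[0] = 9; // modified after the stream was created
        int b;
        while ((b = in.read()) != -1) {
            System.out.print(b + " "); // prints: 9 2 3
        }
    }
}
```

The flip side is that the caller should not modify the arrays while the stream is still being read.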

To avoid collecting the result as a List, which has a cost, especially if your initial List is big, you can directly call iterator() as proposed by @Holger. Then we simply need to convert the iterator into an enumeration, which can be done with IteratorUtils.asEnumeration(iterator) from Apache Commons Collections. The final code would be:

public static InputStream foo(List<byte[]> binary) {
    return new SequenceInputStream(
        IteratorUtils.asEnumeration(
            binary.stream().map(ByteArrayInputStream::new).iterator()
        )
    );
}
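If adding Apache Commons Collections just for this conversion is not desirable, a small adapter can be written with the JDK alone. This is a sketch under that assumption; the class LazySequence and its asEnumeration method are hypothetical and are not the Commons implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;

class LazySequence {

    // Hypothetical JDK-only stand-in for IteratorUtils.asEnumeration.
    static <T> Enumeration<T> asEnumeration(Iterator<T> it) {
        return new Enumeration<T>() {
            @Override public boolean hasMoreElements() { return it.hasNext(); }
            @Override public T nextElement() { return it.next(); }
        };
    }

    static InputStream concat(List<byte[]> binary) {
        // The wrapping streams are created lazily, one per array, as they are needed.
        Iterator<ByteArrayInputStream> streams =
            binary.stream().map(ByteArrayInputStream::new).iterator();
        return new SequenceInputStream(asEnumeration(streams));
    }

    public static void main(String[] args) throws IOException {
        InputStream in = concat(Arrays.asList(new byte[] {1}, new byte[] {2, 3}));
        int b;
        while ((b = in.read()) != -1) {
            System.out.print(b + " "); // prints: 1 2 3
        }
    }
}
```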

Answer 2

Try this.

public static InputStream foo(List<byte[]> binary) {
    byte[] all = new byte[binary.stream().mapToInt(a -> a.length).sum()];
    int pos = 0;
    for (byte[] bin : binary) {
        int length = bin.length;
        System.arraycopy(bin, 0, all, pos, length);
        pos += length;
    }
    return new ByteArrayInputStream(all);
}

Or

public static InputStream foo(List<byte[]> binary) {
    int size = binary.size();
    return new InputStream() {
        int i = 0, j = 0;
        @Override
        public int read() throws IOException {
            // Skip exhausted or empty arrays before reading.
            while (i < size && j >= binary.get(i).length) {
                ++i;
                j = 0;
            }
            if (i >= size) return -1;
            // Mask to 0..255 so that a 0xFF byte is not mistaken for end of stream.
            return binary.get(i)[j++] & 0xFF;
        }
    };
}
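A stream that only implements read() is consumed one byte at a time, because the default bulk read in InputStream simply calls read() in a loop. The sketch below is one possible variant, not part of the original answer, that also overrides read(byte[], int, int) to copy whole runs with System.arraycopy. The class name ChunkedInputStream and its fields are hypothetical. It masks each byte with 0xFF, since the InputStream contract requires read() to return a value from 0 to 255, or -1 at end of stream.

```java
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

class ChunkedInputStream extends InputStream {

    private final List<byte[]> chunks;
    private int chunk = 0;  // index of the current array
    private int offset = 0; // read position inside the current array

    ChunkedInputStream(List<byte[]> chunks) {
        this.chunks = chunks;
    }

    // Skips exhausted or empty arrays; returns false once all data is consumed.
    private boolean advance() {
        while (chunk < chunks.size() && offset >= chunks.get(chunk).length) {
            chunk++;
            offset = 0;
        }
        return chunk < chunks.size();
    }

    @Override
    public int read() {
        if (!advance()) return -1;
        return chunks.get(chunk)[offset++] & 0xFF; // unsigned value, 0..255
    }

    @Override
    public int read(byte[] b, int off, int len) {
        if (off < 0 || len < 0 || len > b.length - off) throw new IndexOutOfBoundsException();
        if (len == 0) return 0;
        if (!advance()) return -1;
        // Copy as much as possible from the current array in one call.
        byte[] current = chunks.get(chunk);
        int n = Math.min(len, current.length - offset);
        System.arraycopy(current, offset, b, off, n);
        offset += n;
        return n;
    }

    public static void main(String[] args) {
        ChunkedInputStream in = new ChunkedInputStream(
            Arrays.asList(new byte[] {(byte) 0xFF}, new byte[0], new byte[] {1}));
        System.out.println(in.read()); // prints 255
        System.out.println(in.read()); // prints 1
        System.out.println(in.read()); // prints -1
    }
}
```

The bulk read may return fewer bytes than requested when it reaches the end of an array, which the InputStream contract allows; callers are expected to loop until they get -1.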
