Why is multi-threaded code using CompletableFuture slower than single-threaded code?

Question

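I am trying to improve the performance of existing code in my project, which currently runs in a single thread. The code does the following:

1. Get a first list of 10000000 objects.
2. Get a second list of 10000000 objects.
3. Merge the two (after some changes) into a third list.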

    Instant s = Instant.now();
    List<Integer> l1 = getFirstList();
    List<Integer> l2 = getSecondList();
    List<Integer> l3 = new ArrayList<>();
    l3.addAll(l1);
    l3.addAll(l2);
    Instant e = Instant.now();
    System.out.println("Execution time: " + Duration.between(s, e).toMillis());

Below are the sample methods for getting and combining the lists:

    private static List<Integer> getFirstList() {
        System.out.println("First list is being created by: " + Thread.currentThread().getName());
        List<Integer> l = new ArrayList<>();
        for (int i = 0; i < 10000000; i++) {
            l.add(i);
        }
        return l;
    }

    private static List<Integer> getSecondList() {
        System.out.println("Second list is being created by: " + Thread.currentThread().getName());
        List<Integer> l = new ArrayList<>();
        for (int i = 10000000; i < 20000000; i++) {
            l.add(i);
        }
        return l;
    }

    private static List<Integer> combine(List<Integer> l1, List<Integer> l2) {
        System.out.println("Third list is being created by: " + Thread.currentThread().getName());
        ArrayList<Integer> l3 = new ArrayList<>();
        l3.addAll(l1);
        l3.addAll(l2);
        return l3;
    }

I am trying to rewrite the above code as follows:

    ExecutorService executor = Executors.newFixedThreadPool(10);
    Instant start = Instant.now();
    CompletableFuture<List<Integer>> cf1 = CompletableFuture.supplyAsync(() -> getFirstList(), executor);
    CompletableFuture<List<Integer>> cf2 = CompletableFuture.supplyAsync(() -> getSecondList(), executor);

    CompletableFuture<Void> cf3 = cf1.thenAcceptBothAsync(cf2, (l1, l2) -> combine(l1, l2), executor);
    try {
        cf3.get();
    } catch (InterruptedException e) {
        e.printStackTrace();
    } catch (ExecutionException e) {
        e.printStackTrace();
    }
    Instant end = Instant.now();
    System.out.println("Execution time: " + Duration.between(start, end).toMillis());

    executor.shutdown();

The single-threaded code takes 4-5 seconds, while the multi-threaded code takes more than 6 seconds. Am I doing something wrong?

Answer 1

In the single-threaded variant, l3.addAll(l1); l3.addAll(l2); takes the elements of l1 and l2 from the processor cache (they were put there while executing getFirstList and getSecondList).

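In the parallel variant, the method combine() runs on a different processor core with an empty cache and has to fetch all elements from main memory, which is much slower.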

Answer 2

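You are executing these methods for the first time, so they start in interpreted mode. To speed up their first execution, the optimizer has to replace them while they are running (called on-stack replacement), which does not always give the same performance as re-entering the optimized result. Doing this concurrently seems to be even worse, at least for Java 8, as I got entirely different results with Java 11.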

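So the first step is to insert an explicit call, e.g. getFirstList(); getSecondList();, before the measurement, to see how the code performs when it is not being invoked for the first time.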
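A minimal sketch of this warm-up check, assuming a hypothetical helper buildList(from, to) that has the same shape as getFirstList()/getSecondList() from the question. It times the same call twice in one JVM; if the second number is clearly lower, the first measurement was dominated by first-execution effects. The class and method names are illustrative only, and a proper benchmark harness would give more reliable numbers.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class WarmUpSketch {

    // Hypothetical helper: same loop as getFirstList()/getSecondList(),
    // with the range passed in as parameters.
    public static List<Integer> buildList(int from, int to) {
        List<Integer> l = new ArrayList<>();
        for (int i = from; i < to; i++) {
            l.add(i);
        }
        return l;
    }

    // Runs buildList once and returns the elapsed wall-clock time in milliseconds.
    public static long timeMillis(int from, int to) {
        Instant s = Instant.now();
        buildList(from, to);
        return Duration.between(s, Instant.now()).toMillis();
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        long first = timeMillis(0, n);   // starts in interpreted mode
        long second = timeMillis(0, n);  // code has already been executed once
        System.out.println("First call:  " + first + " ms");
        System.out.println("Second call: " + second + " ms");
    }
}
```

The absolute numbers depend on the JVM version and heap settings, so only the difference between the two calls is meaningful here.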

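Another aspect is garbage collection. Some JVMs start with a small initial heap and perform a full GC every time the heap is expanded, which affects all threads.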

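So the second step would be to start the JVM with -Xms1G (or even better, -Xms2G), so that it begins with a reasonable heap size for the number of objects you are going to create.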
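One way to see whether the initial heap size matters on a given JVM is to report the committed heap and the number of collections around the list-building loop. The sketch below is an illustration under assumptions: the class name and helper methods are hypothetical, and it uses only the standard Runtime and GarbageCollectorMXBean APIs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class HeapCheckSketch {

    // Sums the collection counts of all collectors.
    // A collector may report -1 when the count is undefined; such values are skipped.
    public static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Heap currently committed to the JVM, in MiB.
    public static long committedHeapMiB() {
        return Runtime.getRuntime().totalMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        long gcBefore = totalGcCount();
        System.out.println("Heap before: " + committedHeapMiB() + " MiB");

        List<Integer> l = new ArrayList<>();
        for (int i = 0; i < 10_000_000; i++) {
            l.add(i);
        }

        System.out.println("Heap after:  " + committedHeapMiB() + " MiB");
        System.out.println("Collections during the loop: " + (totalGcCount() - gcBefore));
        System.out.println("Elements: " + l.size());
    }
}
```

Running it once as `java HeapCheckSketch` and once as `java -Xms2G HeapCheckSketch` should show whether the heap had to grow during the loop and how many collections that caused.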

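But note that the third step of the original code, adding the intermediate result lists to the final result list (which happens sequentially in either case), has a significant impact on performance.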

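So the third step would be to replace the construction of the final list with l3 = new ArrayList<>(l1.size() + l2.size()) in both variants, to ensure that the list has an appropriate initial capacity.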
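A sketch of what the presized combine step might look like. The range helper and buildAndCombine are hypothetical names introduced for illustration. The sketch also uses thenCombine instead of the question's thenAcceptBothAsync so that the combined list is actually returned; that substitution is an assumption about what the caller wants, not something stated in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PresizedCombineSketch {

    // Hypothetical stand-in for getFirstList()/getSecondList().
    public static List<Integer> range(int from, int to) {
        List<Integer> l = new ArrayList<>();
        for (int i = from; i < to; i++) {
            l.add(i);
        }
        return l;
    }

    // The final list is created with its full capacity up front,
    // so addAll never has to grow and copy the backing array.
    public static List<Integer> combine(List<Integer> l1, List<Integer> l2) {
        List<Integer> l3 = new ArrayList<>(l1.size() + l2.size());
        l3.addAll(l1);
        l3.addAll(l2);
        return l3;
    }

    // Builds both lists on separate threads and combines them once both are done.
    public static List<Integer> buildAndCombine(int n) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<List<Integer>> cf1 =
                    CompletableFuture.supplyAsync(() -> range(0, n), executor);
            CompletableFuture<List<Integer>> cf2 =
                    CompletableFuture.supplyAsync(() -> range(n, 2 * n), executor);
            // thenCombine passes both results to combine and yields its return value.
            return cf1.thenCombine(cf2, PresizedCombineSketch::combine).join();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> result = buildAndCombine(10_000_000);
        System.out.println("Combined size: " + result.size());
    }
}
```

The same presized constructor applies to the single-threaded variant, where l3 is created directly before the two addAll calls.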

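With Java 8, the combination of these steps resulted in less than a second for the sequential execution and less than half a second for the multi-threaded execution.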

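Java 11 has a much better starting point, needing only about a second out of the box, so these improvements yielded a less dramatic speedup there. It also seems to have a higher memory consumption for this code.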

Tags: asynchronous, java-8, completable-future