Java: the fastest way to filter a List with 1m objects

Question

I have a list of ProductDTO objects and a Product.

The list can contain anywhere from 100 objects to 1m objects.

I read this list from a CSV file.

This is how I filter it at the moment:

productDtos.parallelStream()
    .filter(i -> i.getName().equals(product.getName()))
    .filter(i -> Objects.equals(i.getCode(), product.getCode()))
    .map(Product::new)
    // getting object here

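So, what is the best way to process it? I think I should use multithreading, with one thread starting from the beginning of the list and another starting from the end.

Any ideas on how to speed up filtering a list when the data is large? Thanks.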

Answer 1

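What I have done before is build a map and call get on it, which avoids filtering the list in a loop.

For example, if you have N codes for one product, you can do this: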

Map<String, Map<String, List<ProductDTO>>> productDtoByNameAndCode = productDtos.stream()
    .collect(groupingBy(ProductDTO::getName, groupingBy(ProductDTO::getCode)));

Then, for each product, you only need to do:

List<ProductDTO> correspondingProductDTOs = productDtoByNameAndCode.get(product.getName()).get(product.getCode());

This way, you do not have to filter the whole list again for every product.
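A minimal, self-contained sketch of this map-based lookup follows. The ProductDTO class here is a hypothetical stand-in with only name and code fields, and the helper names index and find are made up for illustration. It assumes name and code are never null, because groupingBy rejects null keys. The find helper uses getOrDefault so that a product whose name or code is missing from the index yields an empty list instead of a NullPointerException.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ProductLookupSketch {

    // Hypothetical stand-in for the ProductDTO from the question.
    static final class ProductDTO {
        private final String name;
        private final String code;

        ProductDTO(String name, String code) {
            this.name = name;
            this.code = code;
        }

        String getName() { return name; }
        String getCode() { return code; }
    }

    // Build the two-level index once: name -> code -> matching DTOs.
    // Assumption: neither name nor code is null (groupingBy rejects null keys).
    static Map<String, Map<String, List<ProductDTO>>> index(List<ProductDTO> dtos) {
        return dtos.stream().collect(
                Collectors.groupingBy(ProductDTO::getName,
                        Collectors.groupingBy(ProductDTO::getCode)));
    }

    // Look up a (name, code) pair; returns an empty list when either is absent.
    static List<ProductDTO> find(Map<String, Map<String, List<ProductDTO>>> index,
                                 String name, String code) {
        return index
                .getOrDefault(name, Collections.<String, List<ProductDTO>>emptyMap())
                .getOrDefault(code, Collections.<ProductDTO>emptyList());
    }

    public static void main(String[] args) {
        List<ProductDTO> dtos = Arrays.asList(
                new ProductDTO("apple", "A1"),
                new ProductDTO("apple", "A2"),
                new ProductDTO("pear", "P1"),
                new ProductDTO("apple", "A1"));

        Map<String, Map<String, List<ProductDTO>>> byNameAndCode = index(dtos);

        System.out.println(find(byNameAndCode, "apple", "A1").size()); // prints 2
        System.out.println(find(byNameAndCode, "plum", "X9").size());  // prints 0
    }
}
```

Building the index costs one pass over the list, so it pays off when many products are looked up against the same list. For a single lookup, one filtering pass is likely just as good.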

Answer 2

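First, I see that you have loaded all of the productDtos into memory. That can make your memory consumption very high. I suggest reading the CSV file line by line and filtering each line as it is read. In that case, your code could look like this: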

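import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;

public class Csv {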
    public static void main(String[] args) {
        File file = new File("your.csv");
        try (final BufferedReader br = new BufferedReader(new FileReader(file))) {
            final List<String> filtered = br.lines().parallel()
                    .map(Csv::toYourDTO)
                    .filter(Csv::yourFilter)
                    .collect(Collectors.toList());
            System.out.println(filtered);
        } catch (IOException e) {
            //todo something with the error
        }
    }

    private static boolean yourFilter(String s) {
        return true; //todo
    }

    private static String toYourDTO(String s) {
        return "";//todo
    }
}
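To make the skeleton above more concrete, here is a hedged sketch with the two placeholders filled in. It assumes a simple "name,code" line layout with no header row and no quoted or escaped commas, which may not match the real file. ProductDTO, parseLine, and filter are hypothetical names chosen for the example. The stream is kept sequential here, since whether parallel() helps for line-by-line reading is something to measure rather than assume.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class CsvFilterSketch {

    // Hypothetical stand-in for the ProductDTO from the question.
    static final class ProductDTO {
        private final String name;
        private final String code;

        ProductDTO(String name, String code) {
            this.name = name;
            this.code = code;
        }

        String getName() { return name; }
        String getCode() { return code; }
    }

    // Assumption: each line is "name,code" with no quoted or escaped commas.
    // A missing or empty code column is mapped to null.
    static ProductDTO parseLine(String line) {
        String[] parts = line.split(",", -1);
        String name = parts[0].trim();
        String code = parts.length > 1 && !parts[1].trim().isEmpty() ? parts[1].trim() : null;
        return new ProductDTO(name, code);
    }

    // Read lazily, line by line, and keep only the rows matching name and code.
    // Only the matching rows are held in memory, not the whole file.
    static List<ProductDTO> filter(BufferedReader reader, String name, String code) {
        return reader.lines()
                .map(CsvFilterSketch::parseLine)
                .filter(dto -> dto.getName().equals(name)
                        && Objects.equals(dto.getCode(), code))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // In-memory data keeps the sketch runnable without a file on disk.
        String csv = "apple,A1\npear,P1\napple,A2\napple,A1\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            System.out.println(filter(reader, "apple", "A1").size()); // prints 2
        }
    }
}
```

For a real file, the StringReader could be swapped for Files.newBufferedReader(Paths.get("your.csv")), leaving the rest unchanged.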

Tags: java, multithreading, bigdata, java-stream