How to read in CSV columns in any order using CSVParser from apache.commons

Question

I have a csv file containing data in the following format:

    id,first,last,city
    1,john,doe,austin
    2,jane,mary,seattle

Currently, I am reading the csv with the following code:

    String path = "./data/data.csv";
    Map<Integer, User> map = new HashMap<>();

    // Open the reader inside try-with-resources so it is closed and its IOException is caught
    try (Reader reader = Files.newBufferedReader(Paths.get(path));
         CSVParser csvParser = new CSVParser(reader, CSVFormat.DEFAULT)) {

        List<CSVRecord> csvRecords = csvParser.getRecords();

        for (int i = 0; i < csvRecords.size(); i++) {

            if (0 < i) { // skip over header
                CSVRecord csvRecord = csvRecords.get(i);
                // Columns are read by position: 0 = id, 1 = first, 2 = last, 3 = city
                User currentUser = new User(
                        Integer.parseInt(csvRecord.get(0)),
                        csvRecord.get(1),
                        csvRecord.get(2),
                        csvRecord.get(3)
                );
                map.put(currentUser.getId(), currentUser);
            }
        }
    } catch (IOException e) {
        System.out.println(e);
    }

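This gets the correct values, but if the columns come in a different order, such as [city,last,id,first], they are read incorrectly, because the reading is hard-coded to the order [id,first,last,city]. (The User object must also be created with its fields in the exact order id, first, last, city.)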

I know I can use the 'withHeader' option, but that also requires me to define the header column order up front, like this:

    String header = "id,first,last,city";
    CSVParser csvParser = new CSVParser(reader, CSVFormat.EXCEL.withHeader(header.split(",")));
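A header passed to withHeader(...) is applied to the columns by position. It is not matched against the header row in the file. The sketch below illustrates this by parsing the reordered layout last,first,id,city with the fixed header id,first,last,city. The class and method names are hypothetical, and the sketch assumes Commons CSV 1.x is on the classpath. withSkipHeaderRecord() is added so the file's own header line is not returned as data. get("id") then returns whatever is in column 0, which is a last name.

```java
import java.io.IOException;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class WithHeaderPitfall {

    // Hypothetical helper: parse with a hard-coded header and return "id" of the first data row
    static String idOfFirstRow(String csv) throws IOException {
        CSVFormat format = CSVFormat.EXCEL
                .withHeader("id", "first", "last", "city") // names are assigned by column position
                .withSkipHeaderRecord();                   // drop the file's own header line
        try (CSVParser parser = CSVParser.parse(csv, format)) {
            CSVRecord firstRow = parser.iterator().next();
            return firstRow.get("id");
        }
    }

    public static void main(String[] args) throws IOException {
        String reordered = "last,first,id,city\r\ndoe,john,1,austin\r\n";
        // Column 0 of the data row holds the last name, so this prints "doe", not "1"
        System.out.println(idOfFirstRow(reordered));
    }
}
```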

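I also know there is a built-in method getHeaderNames(), but it only returns the headers after I have already passed them in as a string (hard-coding them again). So if I pass in the header string "last,first,id,city", it returns exactly the same strings in the list.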

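Is there a way to combine these pieces so that I can read in the csv whatever the column order is, and still construct my 'User' object with the fields passed in order (id, first, last, city)?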

Answer 1

We need to tell the parser to handle the header row for us. We specify that as part of the CSVFormat, so we'll create a custom format like this:

    CSVFormat csvFormat = CSVFormat.RFC4180.withFirstRecordAsHeader();
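When the format uses withFirstRecordAsHeader(), the header names are read from the file itself. As a result, getHeaderNames() and getHeaderMap() reflect whatever column order the file has, and none of it is hard-coded. The following is a minimal sketch with hypothetical class and method names. It assumes a recent Commons CSV release, since getHeaderNames() is missing from the oldest 1.x versions. Newer releases deprecate the with... methods in favor of CSVFormat.Builder, but those methods still compile.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

public class HeaderFromFile {

    // Header names in the order they appear in the file's first line
    static List<String> headerNames(String csv) throws IOException {
        try (CSVParser parser = CSVParser.parse(csv, CSVFormat.RFC4180.withFirstRecordAsHeader())) {
            return parser.getHeaderNames();
        }
    }

    // Zero-based column index for each header name, as read from the file
    static Map<String, Integer> headerMap(String csv) throws IOException {
        try (CSVParser parser = CSVParser.parse(csv, CSVFormat.RFC4180.withFirstRecordAsHeader())) {
            return parser.getHeaderMap();
        }
    }

    public static void main(String[] args) throws IOException {
        String csv = "last,first,id,city\r\ndoe,john,1,austin\r\n";
        System.out.println(headerNames(csv));          // [last, first, id, city]
        System.out.println(headerMap(csv).get("id"));  // 2
    }
}
```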

The question's code uses DEFAULT, but this format is based on RFC4180 instead. Comparing them side by side:

    DEFAULT                               RFC4180                       Comment
    ===================================   ===========================   ========================
    withDelimiter(',')                    withDelimiter(',')            Same
    withQuote('"')                        withQuote('"')                Same
    withRecordSeparator("\r\n")           withRecordSeparator("\r\n")   Same
    withIgnoreEmptyLines(true)            withIgnoreEmptyLines(false)   Don't ignore blank lines
    withAllowDuplicateHeaderNames(true)   -                             Don't allow duplicates
    ===================================   ===========================   ========================
                                          withFirstRecordAsHeader()     We need this
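For this data, the blank-line row of the table is the difference most likely to matter. The sketch below counts the records each format produces for input that has an empty line in the middle. The class and method names are hypothetical, and the sketch assumes Commons CSV on the classpath. DEFAULT skips the blank line. RFC4180 should return it as a record with a single empty value. With withFirstRecordAsHeader(), calling get("first") on such a record would likely fail because the record has only one value.

```java
import java.io.IOException;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

public class EmptyLineBehavior {

    // Count the records a format produces (no header handling, so every line counts)
    static int countRecords(String csv, CSVFormat format) throws IOException {
        try (CSVParser parser = CSVParser.parse(csv, format)) {
            return parser.getRecords().size();
        }
    }

    public static void main(String[] args) throws IOException {
        String csv = "a,b\n\n1,2\n"; // blank line between the two rows
        System.out.println(countRecords(csv, CSVFormat.DEFAULT)); // blank line skipped
        System.out.println(countRecords(csv, CSVFormat.RFC4180)); // blank line kept as a record
    }
}
```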

With this change, we can call get(String name) instead of get(int i):

    User currentUser = new User(
            Integer.parseInt(csvRecord.get("id")),
            csvRecord.get("first"),
            csvRecord.get("last"),
            csvRecord.get("city")
    );

Note that CSVParser implements Iterable<CSVRecord>, so we can use a for-each loop, which makes the code look like this:

    String path = "./data/data.csv";

    Map<Integer, User> map = new HashMap<>();
    try (CSVParser csvParser = new CSVParser(Files.newBufferedReader(Paths.get(path)),
                                             CSVFormat.RFC4180.withFirstRecordAsHeader())) {
        for (CSVRecord csvRecord : csvParser) {
            User currentUser = new User(
                    Integer.parseInt(csvRecord.get("id")),
                    csvRecord.get("first"),
                    csvRecord.get("last"),
                    csvRecord.get("city")
            );
            map.put(currentUser.getId(), currentUser);
        }
    }

This code parses the file correctly even if the column order changes, for example to:

    last,first,id,city
    doe,john,1,austin
    mary,jane,2,seattle
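To check that claim end to end, here is a self-contained sketch that runs the answer's for-each loop on both column orders. The question does not show the User class, so the one below is a hypothetical minimal stand-in with fields in id, first, last, city order. The sketch also reads from a StringReader rather than ./data/data.csv so that it runs without a file, and it assumes Commons CSV on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class ReadAnyOrder {

    // Hypothetical stand-in for the question's User class
    static final class User {
        final int id;
        final String first;
        final String last;
        final String city;

        User(int id, String first, String last, String city) {
            this.id = id;
            this.first = first;
            this.last = last;
            this.city = city;
        }

        int getId() { return id; }

        @Override
        public String toString() {
            return id + "," + first + "," + last + "," + city;
        }
    }

    // The answer's loop, taking any Reader: columns are looked up by header name, not position
    static Map<Integer, User> readUsers(Reader reader) throws IOException {
        Map<Integer, User> map = new HashMap<>();
        try (CSVParser csvParser = new CSVParser(reader, CSVFormat.RFC4180.withFirstRecordAsHeader())) {
            for (CSVRecord csvRecord : csvParser) {
                User currentUser = new User(
                        Integer.parseInt(csvRecord.get("id")),
                        csvRecord.get("first"),
                        csvRecord.get("last"),
                        csvRecord.get("city"));
                map.put(currentUser.getId(), currentUser);
            }
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        String original = "id,first,last,city\r\n1,john,doe,austin\r\n2,jane,mary,seattle\r\n";
        String reordered = "last,first,id,city\r\ndoe,john,1,austin\r\nmary,jane,2,seattle\r\n";
        // Both column orders yield the same User for id 1
        System.out.println(readUsers(new StringReader(original)).get(1));  // 1,john,doe,austin
        System.out.println(readUsers(new StringReader(reordered)).get(1)); // 1,john,doe,austin
    }
}
```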

java

csv

parsing

apache-commons