How to read a header from a specific line with CsvHelper?

I'm trying to read a CSV file where the header is on line 3:

some crap line
some empty line
COL1,COL2,COl3,...
val1,val2,val3
val1,val2,val3

How can I tell CsvHelper that the header is not on the first line?

I tried skipping 2 lines with Read(), but the subsequent call to ReadHeader() throws an exception saying the header has already been read.

using (var csv = new CsvReader(new StreamReader(stream), csvConfiguration)) {
   csv.Read();
   csv.Read();
   csv.ReadHeader();
   .....

If I set csvConfiguration.HasHeaderRecord to false, ReadHeader() fails again.

Try this:

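using (var reader = new StreamReader(stream)) {
    // Consume the two preliminary lines before CsvHelper sees the stream.
    reader.ReadLine();
    reader.ReadLine();
    using (var csv = new CsvReader(reader)) {
        // The reader is now positioned at the header line.
        csv.ReadHeader();
    }
}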
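The idea in the snippet above, consuming the preliminary lines from the underlying reader before any CSV parsing starts, can be sketched with nothing but the base class library. Note that `PreambleSkipper` and `SkipLines` are hypothetical names made up for this illustration, not part of CsvHelper, and the sample text is the one from the question.

```csharp
using System;
using System.IO;

public static class PreambleSkipper
{
    // Hypothetical helper: consumes up to `count` physical lines from the reader.
    // Returns the number of lines actually skipped (fewer if the input ends early).
    public static int SkipLines(TextReader reader, int count)
    {
        int skipped = 0;
        while (skipped < count && reader.ReadLine() != null)
        {
            skipped++;
        }
        return skipped;
    }

    public static void Main()
    {
        var text = "some crap line\nsome empty line\nCOL1,COL2,COl3\nval1,val2,val3\n";
        using var reader = new StringReader(text);

        SkipLines(reader, 2);

        // The reader is now positioned at the header line; a CSV reader
        // created over `reader` at this point would start from here.
        Console.WriteLine(reader.ReadLine()); // prints "COL1,COL2,COl3"
    }
}
```

One caveat with this approach: it counts physical lines, so it assumes the preliminary lines contain no quoted fields with embedded line breaks.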

This isn't any better than Evk's answer, but I was interested.

The CsvConfiguration class appears to have a Func callback named ShouldSkipRecord that can be wired up to implement custom logic.

https://github.com/JoshClose/CsvHelper/tree/master/src/CsvHelper

CsvConfiguration.cs

/// <summary>
/// Gets or sets the callback that will be called to
/// determine whether to skip the given record or not.
/// This overrides the <see cref="SkipEmptyRecords"/> setting.
/// </summary>
public virtual Func<string[], bool> ShouldSkipRecord { get; set; }

CsvReader.cs

/// <summary>
/// Advances the reader to the next record.
/// If HasHeaderRecord is true (true by default), the first record of
/// the CSV file will be automatically read in as the header record
/// and the second record will be returned.
/// </summary>
/// <returns>True if there are more records, otherwise false.</returns>
public virtual bool Read()
{
    if (doneReading)
    {
        throw new CsvReaderException(DoneReadingExceptionMessage);
    }

    if (configuration.HasHeaderRecord && headerRecord == null)
    {
        ReadHeader();
    }

    do
    {
        currentRecord = parser.Read();
    }
    while (ShouldSkipRecord());

    currentIndex = -1;
    hasBeenRead = true;

    if (currentRecord == null)
    {
        doneReading = true;
    }

    return currentRecord != null;
}

/// <summary>
/// Checks if the current record should be skipped or not.
/// </summary>
/// <returns><c>true</c> if the current record should be skipped, <c>false</c> otherwise.</returns>
protected virtual bool ShouldSkipRecord()
{
    if (currentRecord == null)
    {
        return false;
    }

    return configuration.ShouldSkipRecord != null
        ? configuration.ShouldSkipRecord(currentRecord)
        : configuration.SkipEmptyRecords && IsRecordEmpty(false);
}

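Unfortunately, it appears that you would have to set HasHeaderRecord to false, and then set it back to true, before calling ReadHeader() or calling Read() on the third line, because the ShouldSkipRecord logic in Read() runs after the ReadHeader() logic.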

As of CsvHelper 27.0, the problem no longer reproduces. It is now possible to read the header from any line. This may have been implemented as early as Release 3.0.0 from 2017, which included, according to the change log:

3.0.0

Read more than 1 header row.

Thus the following code now just works, and has worked for some time:

var csvText = "some crap line\nsome empty line\nCOL1,COL2,COl3\nval1,val2,val3\nval1,val2,val3\n\n";
using var stream = new MemoryStream(Encoding.UTF8.GetBytes(csvText));

var csvConfiguration = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    // Your settings here.
};
using (var csv = new CsvReader(new StreamReader(stream), csvConfiguration))
{
    csv.Read(); // Read in the first row "some crap line"
    csv.Read(); // Read in the second row "some empty line"
    csv.Read(); // Read in the third row which is the actual header.
    csv.ReadHeader(); // Process the currently read row as the header.

    Assert.AreEqual(3, csv.HeaderRecord.Length);
    Assert.AreEqual(@"COL1,COL2,COl3", String.Join(",", csv.HeaderRecord));
}

Successful demo fiddle #1 here.

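Warning: be aware that CsvHelper skips blank lines by default. If some of the preliminary lines to be skipped may or may not be blank, csv.Read() may silently read past them and then also consume your header, causing the wrong line to be used as the header row!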

Failing demo fiddle #2 here.
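A minimal sketch of that failure mode, assuming CsvHelper 27.x with its default configuration. The input text is made up for illustration: it is the sample from the question, except that the second line is genuinely blank. The class name `BlankLinePitfall` is likewise hypothetical.

```csharp
using System;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

public static class BlankLinePitfall
{
    public static void Main()
    {
        // Illustrative input: the second preliminary line is blank.
        var csvText = "some crap line\n\nCOL1,COL2,COl3\nval1,val2,val3\nval1,val2,val3\n";

        // Default configuration, so IgnoreBlankLines keeps its default value.
        var csvConfiguration = new CsvConfiguration(CultureInfo.InvariantCulture);

        using var csv = new CsvReader(new StringReader(csvText), csvConfiguration);
        csv.Read();       // Reads "some crap line".
        csv.Read();       // The blank line is skipped, so this lands on the real header row.
        csv.Read();       // This lands on the first data row.
        csv.ReadHeader(); // The first data row is now treated as the header.

        // Print whatever ended up as the header so the mismatch is visible.
        Console.WriteLine(string.Join(",", csv.HeaderRecord));
    }
}
```

If the behavior described in the warning holds for your CsvHelper version, the printed header is the first data row rather than COL1,COL2,COl3.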

To avoid this possibility and deterministically skip a specific number of lines at the beginning of the file, you must set CsvConfiguration.IgnoreBlankLines = false. However, this property cannot be modified once the CsvReader is created, so if you still need to skip blank data lines after the header, this can be accomplished with a ShouldSkipRecord callback:

bool ignoreBlankLines = false;
var csvConfiguration = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    IgnoreBlankLines = false,
    ShouldSkipRecord = (args) => !ignoreBlankLines ? false : args.Record.Length == 0 || args.Record.Length == 1 && string.IsNullOrEmpty(args.Record[0]),
    // Your settings here.
};
using (var csv = new CsvReader(new StreamReader(stream), csvConfiguration))
{
    csv.Read(); // Read in the first row "some crap line"
    csv.Read(); // Read in the second row, which is empty.
    csv.Read(); // Read in the third row which is the actual header.
    csv.ReadHeader(); // Process the currently read row as the header.
    ignoreBlankLines = true; // Now that the header has been read, ignore blank data lines.
}

Successful demo fiddle #3 here.