How to read a header from a specific line with CsvHelper?
I'm trying to read a CSV file in which the header is on line 3:
some crap line
some empty line
COL1,COL2,COl3,...
val1,val2,val3
val1,val2,val3
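How do I tell CsvHelper that the header is not on the first line?
I tried to skip 2 lines with Read(), but the subsequent call to ReadHeader() throws an exception saying that the header has already been read.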
using (var csv = new CsvReader(new StreamReader(stream), csvConfiguration)) {
    csv.Read();
    csv.Read();
    csv.ReadHeader();
    .....
If I set csvConfiguration.HasHeaderRecord to false, ReadHeader() fails again.
Try this:
using (var reader = new StreamReader(stream)) {
    reader.ReadLine();
    reader.ReadLine();
    using (var csv = new CsvReader(reader)) {
        csv.ReadHeader();
    }
}
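The trick above relies on StreamReader.ReadLine() consuming exactly one physical line per call, blank or not, so the CSV reader only ever sees the text from line 3 onward. Here is a minimal BCL-only sketch of that idea (no CsvHelper involved). `ReadHeaderAfter` is a hypothetical helper, the sample makes line 2 genuinely blank, and its naive comma split stands in for real CSV parsing (it does not handle quoted fields).

```csharp
using System;
using System.IO;

// Hypothetical helper: discard `linesToSkip` physical lines, then treat the next line as the header.
// ReadLine() consumes one line per call whether or not that line is blank.
static string[] ReadHeaderAfter(TextReader reader, int linesToSkip)
{
    for (int i = 0; i < linesToSkip; i++)
    {
        if (reader.ReadLine() == null)
            throw new InvalidDataException($"Input ended before line {linesToSkip + 1}.");
    }
    string headerLine = reader.ReadLine()
        ?? throw new InvalidDataException("No header line found.");
    // Naive split for illustration only; a real CSV parser must handle quoting.
    return headerLine.Split(',');
}

var text = "some crap line\n\nCOL1,COL2,COl3\nval1,val2,val3\n";
using var input = new StringReader(text);
string[] header = ReadHeaderAfter(input, 2);
Console.WriteLine(string.Join("|", header)); // COL1|COL2|COl3
Console.WriteLine(input.ReadLine());         // val1,val2,val3
```

Because the blank second line is still counted, the reader is left positioned on the first data row after the header.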
This isn't any better than Evk's answer, but I was curious.
The CsvConfiguration class appears to have a Func callback named ShouldSkipRecord that can be hooked up to implement custom logic.
https://github.com/JoshClose/CsvHelper/tree/master/src/CsvHelper
CsvConfiguration.cs
/// <summary>
/// Gets or sets the callback that will be called to
/// determine whether to skip the given record or not.
/// This overrides the <see cref="SkipEmptyRecords"/> setting.
/// </summary>
public virtual Func<string[], bool> ShouldSkipRecord { get; set; }
CsvReader.cs
/// <summary>
/// Advances the reader to the next record.
/// If HasHeaderRecord is true (true by default), the first record of
/// the CSV file will be automatically read in as the header record
/// and the second record will be returned.
/// </summary>
/// <returns>True if there are more records, otherwise false.</returns>
public virtual bool Read()
{
    if (doneReading)
    {
        throw new CsvReaderException(DoneReadingExceptionMessage);
    }
    if (configuration.HasHeaderRecord && headerRecord == null)
    {
        ReadHeader();
    }
    do
    {
        currentRecord = parser.Read();
    }
    while (ShouldSkipRecord());
    currentIndex = -1;
    hasBeenRead = true;
    if (currentRecord == null)
    {
        doneReading = true;
    }
    return currentRecord != null;
}
/// <summary>
/// Checks if the current record should be skipped or not.
/// </summary>
/// <returns><c>true</c> if the current record should be skipped, <c>false</c> otherwise.</returns>
protected virtual bool ShouldSkipRecord()
{
    if (currentRecord == null)
    {
        return false;
    }
    return configuration.ShouldSkipRecord != null
        ? configuration.ShouldSkipRecord(currentRecord)
        : configuration.SkipEmptyRecords && IsRecordEmpty(false);
}
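Unfortunately, it seems you would have to set HasHeaderRecord to false and then set it back to true before calling ReadHeader() or calling Read() on the third line, because the ShouldSkipRecord logic in Read() runs after the ReadHeader() logic.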
As of CsvHelper 27.0 the problem no longer reproduces: the header can now be read from any row. This was probably implemented as early as Release 3.0.0 from 2017, which, according to the change log, included:
3.0.0
Read more than 1 header row.
So the following code now works correctly, and has for some time:
using System;
using System.Globalization;
using System.IO;
using System.Text;
using CsvHelper;
using CsvHelper.Configuration;

var csvText = "some crap line\nsome empty line\nCOL1,COL2,COl3\nval1,val2,val3\nval1,val2,val3\n\n";
using var stream = new MemoryStream(Encoding.UTF8.GetBytes(csvText));
var csvConfiguration = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    // Your settings here.
};
using (var csv = new CsvReader(new StreamReader(stream), csvConfiguration))
{
    csv.Read();       // Read in the first row "some crap line"
    csv.Read();       // Read in the second row "some empty line"
    csv.Read();       // Read in the third row, which is the actual header.
    csv.ReadHeader(); // Process the currently read row as the header.
    // Assert comes from your unit-test framework (e.g. NUnit).
    Assert.AreEqual(3, csv.HeaderRecord.Length);
    Assert.AreEqual(@"COL1,COL2,COl3", String.Join(",", csv.HeaderRecord));
}
Successful demo fiddle #1 here.
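Warning: note that CsvHelper skips blank lines by default, so if some of the preliminary lines to be skipped may or may not be blank, csv.Read() may silently read past them and then consume your header as well, causing the wrong row to be used as the header row!
Failing demo fiddle #2 here.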
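To see how the wrong row ends up as the header, here is a rough BCL-only simulation; it is not CsvHelper's code. `ReadRecords` is a hypothetical stand-in for a CSV reader's Read() that, like CsvHelper's default, can silently skip blank lines. Taking the third record from input whose second line is blank lands on a data row when blank lines are skipped, but on the header when every line is counted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical stand-in for repeated Read() calls: yields one record per line,
// optionally skipping blank lines without counting them as records.
static IEnumerable<string> ReadRecords(TextReader reader, bool skipBlankLines)
{
    for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
    {
        if (skipBlankLines && line.Length == 0)
            continue; // blank line consumed silently
        yield return line;
    }
}

const string text = "some crap line\n\nCOL1,COL2,COl3\nval1,val2,val3\n";

// "Read()" three times, then treat the current record as the header.
string headerWhenSkipping = ReadRecords(new StringReader(text), skipBlankLines: true).ElementAt(2);
string headerWhenCounting = ReadRecords(new StringReader(text), skipBlankLines: false).ElementAt(2);

Console.WriteLine(headerWhenSkipping); // val1,val2,val3  (wrong row)
Console.WriteLine(headerWhenCounting); // COL1,COL2,COl3
```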
To avoid this possibility and deterministically skip a specific number of lines at the beginning of the file, you must set CsvConfiguration.IgnoreBlankLines = false. However, this property cannot be modified once the CsvReader has been created, so if you still need to skip blank data lines, you can do so with a ShouldSkipRecord callback:
bool ignoreBlankLines = false;
var csvConfiguration = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    IgnoreBlankLines = false,
    // Skip empty records only after ignoreBlankLines has been set to true.
    ShouldSkipRecord = args => ignoreBlankLines
        && (args.Record.Length == 0 || (args.Record.Length == 1 && string.IsNullOrEmpty(args.Record[0]))),
    // Your settings here.
};
using (var csv = new CsvReader(new StreamReader(stream), csvConfiguration))
{
    csv.Read();       // Read in the first row "some crap line"
    csv.Read();       // Read in the second row, which is empty.
    csv.Read();       // Read in the third row, which is the actual header.
    csv.ReadHeader(); // Process the currently read row as the header.
    ignoreBlankLines = true; // Now that the header has been read, ignore blank data lines.
    // Read the data rows here.
}
Successful demo fiddle #3 here.
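The ShouldSkipRecord workaround works because a C# lambda captures the ignoreBlankLines variable itself, not its value at the moment the configuration is built, so flipping the flag after ReadHeader() changes the callback's behavior even though the configuration can no longer be modified. A minimal BCL-only sketch of that pattern follows; the plain Func<string[], bool> and the in-memory `records` array are illustrative stand-ins, not CsvHelper API.

```csharp
using System;
using System.Collections.Generic;

bool ignoreBlankLines = false;

// Stand-in for the ShouldSkipRecord callback: the lambda captures the variable,
// so later assignments to ignoreBlankLines are visible inside it.
Func<string[], bool> shouldSkip = record =>
    ignoreBlankLines && (record.Length == 0 || (record.Length == 1 && string.IsNullOrEmpty(record[0])));

string[][] records =
{
    new[] { "some crap line" },
    new[] { "" },                     // blank preamble line: must NOT be skipped
    new[] { "COL1", "COL2", "COl3" }, // header
    new[] { "" },                     // blank data line: should be skipped
    new[] { "val1", "val2", "val3" },
};

var kept = new List<string>();
foreach (var record in records)
{
    if (shouldSkip(record)) continue;
    kept.Add(string.Join(",", record));
    // Once the third record (the header) has been taken, start ignoring blank rows,
    // mirroring `ignoreBlankLines = true;` after csv.ReadHeader().
    if (kept.Count == 3) ignoreBlankLines = true;
}

Console.WriteLine(kept.Count); // 4
```

The blank preamble line is kept, so the header stays at a fixed position, while the blank line after the header is dropped.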