Sylvan CSV Reader C# Check for Missing Column in CSV
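@MarkPflug I need to read 12 columns out of files that have between 45 and 85 columns each. The data comes from multiple CSV files (hundreds of them). Here is the problem: quite often one or two of those columns are missing from some of the CSV files. Using the Sylvan CSV reader NuGet package, how do I check in C# whether a column is missing from a CSV file? Here is some code: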
// Create a reader
CsvDataReader reader = CsvDataReader.Create(file, new CsvDataReaderOptions { ResultSetMode = ResultSetMode.MultiResult });
// Get column by name from csv. This is where the error occurs only in the files that have missing columns. I store these and then use them in a GetString(Ordinal).
reader.GetOrdinal("HomeTeam");
reader.GetOrdinal("AwayTeam");
reader.GetOrdinal("Referee");
reader.GetOrdinal("FTHG");
reader.GetOrdinal("FTAG");
reader.GetOrdinal("Division");
// There is more data here, but anyway you get the point.
// Here I run the reader and for each piece of data I run my database write method.
while (await reader.ReadAsync())
{
await AddEntry(idCounter.ToString(), idCounter.ToString(), attendance, referee, division, date, home_team, away_team, fthg, ftag, hthg, htag, ftr, htr);
}
I have tried the following:
// This still causes it to go out of bounds.
if(reader.GetOrdinal("Division") < reader.FieldCount)
// only if the ordinal exists then assign it in a temp variable
else
// skip this column (set the data in add entry method to "")
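Looking at the source code, GetOrdinal appears to throw an exception if the column name is not found or is ambiguous. That is also why the FieldCount comparison above never gets a chance to run: GetOrdinal throws before the comparison happens. As such, I would expect you could do this: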
int blah1Ord = -1;
try{ blah1Ord = reader.GetOrdinal("blah1"); } catch { }
int blah2Ord = -1;
try{ blah2Ord = reader.GetOrdinal("blah2"); } catch { }
while (await reader.ReadAsync())
{
var x = new Whatever();
if(blah1Ord > -1) x.Blah1 = reader.GetString(blah1Ord);
if(blah2Ord > -1) x.Blah2 = reader.GetString(blah2Ord);
}
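And so on. This effectively probes whether each column exists: if it does not, the ordinal stays at -1, and you then use that to decide whether to read the column.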
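The same "ordinal stays -1" idea can also be sketched without try/catch by scanning the header names once. This is only a sketch: `FindOrdinal` is a hypothetical helper, not part of Sylvan, and it assumes the reader exposes its headers through the standard `IDataRecord` members `FieldCount` and `GetName` (which should hold for any `DbDataReader`-derived reader). An in-memory `DataTable` stands in for a CSV file here so the example runs without any NuGet package.

```csharp
using System;
using System.Data;

// Hypothetical helper (not part of Sylvan): returns the ordinal of the column
// whose header matches `name` (ignoring case), or -1 when no such column exists.
static int FindOrdinal(IDataRecord record, string name)
{
    for (int i = 0; i < record.FieldCount; i++)
    {
        if (string.Equals(record.GetName(i), name, StringComparison.OrdinalIgnoreCase))
            return i;
    }
    return -1;
}

// Stand-in for a CSV file that has no "Referee" column.
var table = new DataTable();
table.Columns.Add("HomeTeam", typeof(string));
table.Columns.Add("AwayTeam", typeof(string));
table.Rows.Add("Leeds", "Hull");

using var reader = table.CreateDataReader();

// Resolve the ordinals once, before the read loop.
int homeOrd = FindOrdinal(reader, "HomeTeam");
int refereeOrd = FindOrdinal(reader, "Referee");

while (reader.Read())
{
    // Fall back to an empty string when the column is absent from this file.
    string home = homeOrd > -1 ? reader.GetString(homeOrd) : "";
    string referee = refereeOrd > -1 ? reader.GetString(refereeOrd) : "";
    Console.WriteLine($"{home} / {referee}");
}
```

Unlike the try/catch version, this does not distinguish an ambiguous (duplicated) header from a unique one; it simply returns the first match.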
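Incidentally, I have been dealing with CSVs that have poor, misspelled, or partial header names, and I have found myself getting the column schema and searching it for partial matches, like this: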
using var cdr = CsvDataReader.Create(sr);
var cs = await cdr.GetColumnSchemaAsync();
var sc = StringComparison.OrdinalIgnoreCase;
var blah1Ord = cs.FirstOrDefault(c => c.ColumnName.Contains("blah1", sc))?.ColumnOrdinal ?? -1;
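I have started using the Sylvan library and it is really powerful.
Not sure if this helps you, but if you bind to an entity with the generic DataBinder.Create<T> method, you can do the following to get the columns in the CSV file that are not mapped to any entity property: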
var dataBinderOptions = new DataBinderOptions()
{
// AllColumns is required to throw UnboundMemberException
BindingMode = DataBindingMode.AllColumns,
};
IDataBinder<TEntity> binder;
try
{
binder = DataBinder.Create<TEntity>(dataReader, dataBinderOptions);
}
catch (UnboundMemberException ex)
{
// Use ex.UnboundColumns to get the unmapped columns
readResult.ValidationProblems.Add($"Unmapped columns: {String.Join(", ", ex.UnboundColumns)}");
return;
}
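The binder snippet reports CSV columns that have no matching entity property. The reverse check, which expected columns are absent from a given file, can be sketched without the binder by comparing a list of required names against the reader's headers. `FindMissingColumns` is a hypothetical helper, not a Sylvan API, and the sketch assumes only the standard `IDataRecord` members `FieldCount` and `GetName`. A `DataTable` again stands in for the CSV file.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper (not part of Sylvan): returns the required column names
// that do not appear among the reader's headers, compared case-insensitively.
static List<string> FindMissingColumns(IDataRecord record, IEnumerable<string> required)
{
    var present = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    for (int i = 0; i < record.FieldCount; i++)
    {
        present.Add(record.GetName(i));
    }
    return required.Where(name => !present.Contains(name)).ToList();
}

// Stand-in for a CSV file that lacks "AwayTeam" and "Referee".
var table = new DataTable();
table.Columns.Add("HomeTeam", typeof(string));
table.Columns.Add("FTHG", typeof(string));

using var reader = table.CreateDataReader();
var missing = FindMissingColumns(reader, new[] { "HomeTeam", "AwayTeam", "FTHG", "Referee" });

Console.WriteLine(missing.Count == 0
    ? "All required columns present"
    : "Missing: " + string.Join(", ", missing));
```

Running a check like this once per file, before the read loop, lets you log or skip incomplete files instead of failing partway through.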