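CsvHelper error "No header record found" when reading a CSV stream
Below is the code I use to read a CSV file from a stream, but it fails with the error "No header record found". The library version is 15.0, and I am already calling .ToList() as some solutions suggest, but the error persists. The download method, the table field class, and the stream-reading method follow.
Note also that if I pass the source as a MemoryStream I get the expected result, but it fails if I pass it as a plain Stream. I need the Stream version to work because I want to avoid writing the whole file into memory each time.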
public async Task<Stream> DownloadBlob(string containerName, string fileName, string connectionString)
{
    // MemoryStream memoryStream = new MemoryStream();
    // Fall back to the local storage emulator when no connection string is supplied.
    if (string.IsNullOrEmpty(connectionString))
    {
        connectionString = @"UseDevelopmentStorage=true";
        containerName = "testblobs";
    }
    Microsoft.Azure.Storage.CloudStorageAccount storageAccount = Microsoft.Azure.Storage.CloudStorageAccount.Parse(connectionString);
    CloudBlobClient serviceClient = storageAccount.CreateCloudBlobClient();
    CloudBlobContainer container = serviceClient.GetContainerReference(containerName);
    CloudBlockBlob blob = container.GetBlockBlobReference(fileName);
    if (!blob.Exists())
    {
        throw new Exception($"Blob Not found");
    }
    // Return a read stream over the blob instead of copying it into a MemoryStream.
    return await blob.OpenReadAsync();
}
public class TableField
{
    public string Name { get; set; }
    public string Type { get; set; }
    // Maps the textual column type to the CLR type used when reading a field.
    public Type DataType
    {
        get
        {
            switch (Type.ToUpper())
            {
                case "STRING":
                    return typeof(string);
                case "INT":
                    return typeof(int);
                case "BOOL":
                case "BOOLEAN":
                    return typeof(bool);
                case "FLOAT":
                case "SINGLE":
                case "DOUBLE":
                    return typeof(double);
                case "DATETIME":
                    return typeof(DateTime);
                default:
                    throw new NotSupportedException($"CSVColumn data type '{Type}' not supported");
            }
        }
    }
}
private IEnumerable<Dictionary<string, EntityProperty>> ReadCSV(Stream source, IEnumerable<TableField> cols)
{
    using (TextReader reader = new StreamReader(source, Encoding.UTF8))
    {
        var cache = new TypeConverterCache();
        cache.AddConverter<float>(new CSVSingleConverter());
        cache.AddConverter<double>(new CSVDoubleConverter());
        var csv = new CsvReader(reader,
            new CsvHelper.Configuration.CsvConfiguration(global::System.Globalization.CultureInfo.InvariantCulture)
            {
                Delimiter = ";",
                HasHeaderRecord = true,
                CultureInfo = global::System.Globalization.CultureInfo.InvariantCulture,
                TypeConverterCache = cache
            });
        csv.Read();
        csv.ReadHeader();
        var map = (
            from col in cols
            from src in col.Sources()
            let index = csv.GetFieldIndex(src, isTryGet: true)
            where index != -1
            select new { col.Name, Index = index, Type = col.DataType }).ToList();
        while (csv.Read())
        {
            yield return map.ToDictionary(
                col => col.Name,
                col => EntityProperty.CreateEntityPropertyFromObject(csv.GetField(col.Type, col.Index)));
        }
    }
}
Stream reading code:
public async Task<Stream> ReadStream(string containerName, string digestFileName, string fileName, string connectionString)
{
    string data = string.Empty;
    string fileExtension = Path.GetExtension(fileName);
    var contents = await DownloadBlob(containerName, digestFileName, connectionString);
    return contents;
}
Sample CSV to read:
PartitionKey;Time;RowKey;State;RPM;Distance;RespirationConfidence;HeartBPM
te123;2020-11-06T13:33:37.593Z;10;1;8;20946;26;815
te123;2020-11-06T13:33:37.593Z;4;2;79944;8;36635;6
te123;2020-11-06T13:33:37.593Z;3;3;80042;9;8774;5
te123;2020-11-06T13:33:37.593Z;1;4;0;06642;6925;37
te123;2020-11-06T13:33:37.593Z;6;5;04740;74753;94628;21
te123;2020-11-06T13:33:37.593Z;7;6;6;2;14;629
te123;2020-11-06T13:33:37.593Z;9;7;126;86296;9157;05
te123;2020-11-06T13:33:37.593Z;5;8;5;3;7775;08
te123;2020-11-06T13:33:37.593Z;2;9;44363;65;70;229
te123;2020-11-06T13:33:37.593Z;8;10;02;24666;2;2
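I tried to reproduce the problem with version 15.0 of the library, but it failed at the classes CSVSingleConverter and CSVDoubleConverter. However, with CsvHelper's standard classes, reading the header works: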
using System;
using System.IO;
using System.Text;
using CsvHelper;
using CsvHelper.TypeConversion;
namespace ConsoleApp2
{
    class Program
    {
        static void Main(string[] args)
        {
            using (Stream stream = new FileStream(@"e:\demo.csv", FileMode.Open, FileAccess.Read))
            {
                ReadCSV(stream);
            }
        }
        private static void ReadCSV(Stream source)
        {
            using (TextReader reader = new StreamReader(source, Encoding.UTF8))
            {
                var cache = new TypeConverterCache();
                cache.AddConverter<float>(new SingleConverter());
                cache.AddConverter<double>(new DoubleConverter());
                var csv = new CsvReader(reader,
                    new CsvHelper.Configuration.CsvConfiguration(global::System.Globalization.CultureInfo.InvariantCulture)
                    {
                        Delimiter = ";",
                        HasHeaderRecord = true,
                        CultureInfo = global::System.Globalization.CultureInfo.InvariantCulture,
                        TypeConverterCache = cache
                    });
                csv.Read();
                csv.ReadHeader();
                foreach (string headerRow in csv.Context.HeaderRecord)
                {
                    Console.WriteLine(headerRow);
                }
            }
        }
    }
}
I changed the lines...
    cache.AddConverter<float>(new CSVSingleConverter());
    cache.AddConverter<double>(new CSVDoubleConverter());
...to...
    cache.AddConverter<float>(new SingleConverter());
    cache.AddConverter<double>(new DoubleConverter());
I put the CSV data into a UTF-8 text file. The console output is:
PartitionKey
Time
RowKey
State
RPM
Distance
RespirationConfidence
HeartBPM
Edit 2020-12-24:
Posted the entire source text, not just part of it.
Try setting the source stream back to the beginning.
private IEnumerable<Dictionary<string, EntityProperty>> ReadCSV(Stream source, IEnumerable<TableField> cols)
{
    source.Position = 0;
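To illustrate why rewinding can matter, here is a minimal sketch that uses only the base class library, not CsvHelper. It assumes the stream was already read to the end somewhere before ReadCSV is called, which the question does not confirm. The helper name RewindIfPossible is hypothetical. It checks CanSeek first, because setting Position on a non-seekable stream throws NotSupportedException.

```csharp
using System;
using System.IO;
using System.Text;

static class RewindDemo
{
    // Hypothetical helper: rewinds only when the stream supports seeking.
    static void RewindIfPossible(Stream source)
    {
        if (source.CanSeek)
        {
            source.Position = 0;
        }
    }

    static void Main()
    {
        byte[] bytes = Encoding.UTF8.GetBytes("PartitionKey;Time\nte123;2020-11-06");
        using (var stream = new MemoryStream(bytes))
        {
            // Simulate a stream that something else has already consumed.
            stream.Seek(0, SeekOrigin.End);

            RewindIfPossible(stream);

            using (var reader = new StreamReader(stream, Encoding.UTF8))
            {
                // Without the rewind, ReadLine would return null here,
                // which is the situation in which a CSV parser finds no header row.
                Console.WriteLine(reader.ReadLine());
            }
        }
    }
}
```

If the stream cannot seek, the only options are to open it again or to make sure nothing reads from it before the CSV reader does.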
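You also can't use yield return there. It defers execution of the code until you access the IEnumerable<Dictionary<string, EntityProperty>> returned from the ReadCSV method. The problem is that by then you have already left the using statement for the TextReader, which CsvHelper needs in order to read your data, so you get a NullReferenceException.
Either remove the yield return: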
var result = new List<Dictionary<string, EntityProperty>>();
while (csv.Read())
{
    // Add to result
}
return result;
Or pass the TextReader into your method. Any enumeration of the IEnumerable<Dictionary<string, EntityProperty>> must happen before leaving the using statement, which disposes the TextReader that the CsvReader needs:
IEnumerable<Dictionary<string, EntityProperty>> result;
using (TextReader reader = new StreamReader(source, Encoding.UTF8))
{
    // Calling ToList() will enumerate your yield statement
    result = ReadCSV(reader, cols).ToList();
}
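The deferred execution described in this answer can be seen in a small sketch that uses only the base class library. It is an illustration under assumptions, not the asker's code: ReadLines is a hypothetical stand-in for ReadCSV, and a StringReader stands in for the blob stream. With this reader the failure surfaces as an ObjectDisposedException; the exact exception type depends on which reader and parser are involved.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class DeferredDemo
{
    // Iterator method: nothing in this body runs until the result is enumerated.
    static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    static void Main()
    {
        IEnumerable<string> lazy;
        List<string> eager;

        using (TextReader reader = new StringReader("PartitionKey;Time\nte123;2020-11-06"))
        {
            lazy = ReadLines(reader);            // only creates the iterator, reads nothing
            eager = ReadLines(reader).ToList();  // enumerated while the reader is still open
        }

        Console.WriteLine(eager.Count);

        try
        {
            // First enumeration of 'lazy' happens here, after the reader was disposed.
            lazy.ToList();
        }
        catch (ObjectDisposedException)
        {
            Console.WriteLine("reader was already disposed");
        }
    }
}
```

The design point is the same as in the answer: whoever owns the using block has to force enumeration (for example with ToList()) before the block ends.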
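Related to my answer to your other question (it has more detail; you can read it there): I had no problems connecting CsvHelper to a blob storage source stream.
This is the code used (I took the CSV data you posted, put it in a file, and uploaded it as a blob):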
public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
    }
    private async void button1_Click(object sender, EventArgs e)
    {
        var cstr = "YOUR CONNSTR HERE";
        var bbc = new BlockBlobClient(cstr, "temp", "ankit.csv");
        var s = await bbc.OpenReadAsync(new BlobOpenReadOptions(true) { BufferSize = 16384 });
        var sr = new StreamReader(s);
        var csv = new CsvHelper.CsvReader(sr, new CsvConfiguration(CultureInfo.CurrentCulture) { HasHeaderRecord = true, Delimiter = ";" });
        //try by read/getrecord
        while (await csv.ReadAsync())
        {
            var rec = csv.GetRecord<X>();
            Console.WriteLine(rec.PartitionKey);
        }
        var x = new X();
        //try by await foreach
        // (the reader is forward-only, so use one approach or the other per stream)
        await foreach (var r in csv.EnumerateRecordsAsync(x))
        {
            Console.WriteLine(r.PartitionKey);
        }
    }
}
class X
{
    public string PartitionKey { get; set; }
}
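I hit the same error, 'No header found...', after successfully reading the same file hundreds of times. I added delimiter=",":
reader = csv.reader(filename, delimiter=",")
That solved the problem. I think that if the delimiter is not specified, csv_reader tries to determine it and fails after a while, possibly because of a memory leak? Comma is the default, but if the reader has to determine it programmatically, it is more likely to fail.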