CsvHelper barfing on bad dates (on EF migration with Configuration.Seed())

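I have an old file of Excel data to import into my ASP.NET application. (I'm using EF and the Configuration.Seed() method.) Some of the date fields are just... not dates. Each of the several hundred rows has about 10 dates, so I'm not going to check them by hand. How can I get CsvHelper to skip (and log) rows with bad dates?

The closest thing I could find is here: https://github.com/JoshClose/CsvHelper/issues/956

My code currently looks like this: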

protected override void Seed(pmg2_tracker_net.DAL.Pmg2TrackerContext context)
{
    //  This method will be called after migrating to the latest version.

    //  You can use the DbSet<T>.AddOrUpdate() helper extension method 
    //  to avoid creating duplicate seed data.

    Assembly assembly = Assembly.GetExecutingAssembly();
    string resourceName = "pmg2_tracker_net.DAL.CustomCalJobTracker.csv";
    using (Stream stream = assembly.GetManifestResourceStream(resourceName))
    {
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
        {
            CsvReader csvReader = new CsvReader(reader);
            csvReader.Configuration.HeaderValidated = null;
            csvReader.Configuration.MissingFieldFound = null;
            while (csvReader.Read())
            {
                var assignment = csvReader.GetRecord<Assignment>();
                var status = csvReader.GetField<string>("Overall Job Status");
                assignment.Status = context.Statuses.Local.Single(s => s.Designation == status);
                context.Assignments.AddOrUpdate(a => a.Status, assignment);
            }
        }
    }

}

It produces this error message:

CsvHelper.ReaderException: An unexpected error occurred. ---> System.FormatException: String was not recognized as a valid DateTime.
   at System.DateTimeParse.Parse(String s, DateTimeFormatInfo dtfi, DateTimeStyles styles)
   at System.DateTime.Parse(String s, IFormatProvider provider, DateTimeStyles styles)
   at CsvHelper.TypeConversion.DateTimeConverter.ConvertFromString(String text, IReaderRow row, MemberMapData memberMapData)
   at lambda_method(Closure )
   at CsvHelper.Expressions.RecordCreator.Create[T]()
   at CsvHelper.Expressions.RecordManager.Create[T]()
   at CsvHelper.CsvReader.GetRecord[T]()
   --- End of inner exception stack trace ---

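BadDataFound() doesn't seem to do anything for me. The date conversion seems to happen at a level it can't deal with.

Also, for some reason, I can't catch the GetRecord() call in a try/catch. It barfs, then the loop continues, and the thing blows up on the load into the database: System.InvalidOperationException: Sequence contains no matching element

I'd think a library like this would handle this sort of thing gracefully, so I keep thinking I'm missing something obvious.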


As suggested, I tried following the pattern here: https://github.com/JoshClose/CsvHelper/issues/1205

I added this configuration line to my code:

csvReader.Configuration.RegisterClassMap<AssignmentMap>();

My ClassMap file looks like this:

using CsvHelper;
using CsvHelper.Configuration;
using CsvHelper.TypeConversion;
using pmg2_tracker_net.Models;
using System;

namespace pmg2_tracker_net.DAL
{
    public class AssignmentMap : ClassMap<Assignment>
    {
        public AssignmentMap()
        {
            Map(m => m.Id);
            <SNIP>
            Map(m => m.OriginatorName);
            Map(m => m.InitiatedDate).TypeConverter<CustomDateTimeConverter>();
            Map(m => m.TechScreeningRequestDate).TypeConverter<CustomDateTimeConverter>();
            Map(m => m.TechScreeningCompletionDate).TypeConverter<CustomDateTimeConverter>();
            <SNIP>
            Map(m => m.CommentsOnClosingTime);
        }
    }
}

public class CustomDateTimeConverter : DateTimeConverter
{
    public override object ConvertFromString(string text, IReaderRow row, MemberMapData memberMapData)
    {
        try
        {
            return base.ConvertFromString(text, row, memberMapData);
        }
        catch (TypeConverterException)
        {
            Console.WriteLine(text);
            return default(DateTime);
        }
        catch
        {
            throw;
        }
    }
}

The stack trace shows that it is using my CustomDateTimeConverter:

CsvHelper.ReaderException: An unexpected error occurred. ---> System.FormatException: String was not recognized as a valid DateTime.
   at System.DateTimeParse.Parse(String s, DateTimeFormatInfo dtfi, DateTimeStyles styles)
   at System.DateTime.Parse(String s, IFormatProvider provider, DateTimeStyles styles)
   at CsvHelper.TypeConversion.DateTimeConverter.ConvertFromString(String text, IReaderRow row, MemberMapData memberMapData)
   at **CustomDateTimeConverter**.ConvertFromString(String text, IReaderRow row, MemberMapData memberMapData) in C:\Users\hq785\Projects\pmg2_tracker_net\pmg2_tracker_net\DAL\AssignmentMap.cs:line 72
   at lambda_method(Closure )
   at CsvHelper.Expressions.RecordCreator.Create[T]()
   at CsvHelper.Expressions.RecordManager.Create[T]()
   at CsvHelper.CsvReader.GetRecord[T]()
   --- End of inner exception stack trace ---

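I can't break into the catch. I'm left with the impression that, because I'm calling this from the Entity Framework Seed() method via the Update-Database migration command, it's "wrapped" in some way that keeps me from really interacting with the running code. (I've updated the title to reflect this complication.)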
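For what it's worth, the trace above offers another reading: the exception passing through CustomDateTimeConverter.ConvertFromString is a System.FormatException, which catch (TypeConverterException) does not match, so it falls through to the bare catch and gets rethrown. Here is a minimal sketch of a converter that also handles FormatException, assuming the same CsvHelper version as above. The name LenientDateTimeConverter is hypothetical, and I haven't verified this under Update-Database.

```csharp
using System;
using CsvHelper;
using CsvHelper.Configuration;
using CsvHelper.TypeConversion;

// Hypothetical name; a variant of the CustomDateTimeConverter shown above.
public class LenientDateTimeConverter : DateTimeConverter
{
    public override object ConvertFromString(string text, IReaderRow row, MemberMapData memberMapData)
    {
        try
        {
            return base.ConvertFromString(text, row, memberMapData);
        }
        // The stack trace shows FormatException coming out of the base converter;
        // TypeConverterException is kept as well in case the base class throws that instead.
        catch (Exception ex) when (ex is FormatException || ex is TypeConverterException)
        {
            Console.WriteLine("Unable to convert '{0}' to a date.", text);
            return default(DateTime);
        }
    }
}
```

It would be registered the same way as before, e.g. Map(m => m.InitiatedDate).TypeConverter<LenientDateTimeConverter>();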

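Coming from the Rails world, this all seems too clever by half. I guess all I can do is parse the fields "by hand" (re: https://joshclose.github.io/CsvHelper/examples/reading/reading-by-hand/).

Or make a separate PS script, so that it isn't running under the Seed() method. (What's stopping me from trying that is figuring out how to run it "under" the EF layer, like rails runner.)

Or process the CSV file into a series of SQL inserts and clean everything up by hand. Sigh.

So I finally broke the fields out by hand and wrapped them in DateTime.TryParse: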

    protected override void Seed(Pmg2TrackerContext context)
    {
        //  This method will be called after migrating to the latest version.

        //  You can use the DbSet<T>.AddOrUpdate() helper extension method 
        //  to avoid creating duplicate seed data.

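        //  Offer to attach a debugger (e.g. another Visual Studio instance)
        //  so breakpoints work while Seed() runs under Update-Database.
        if (!System.Diagnostics.Debugger.IsAttached)
            System.Diagnostics.Debugger.Launch();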

        Assembly assembly = Assembly.GetExecutingAssembly();
        string resourceName = "pmg2_tracker_net.DAL.CustomCalJobTracker.csv";
        using (Stream stream = assembly.GetManifestResourceStream(resourceName))
        {
            using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
            {
                CsvReader csvReader = new CsvReader(reader);
                csvReader.Configuration.HeaderValidated = null;
                csvReader.Configuration.MissingFieldFound = null;
                var records = new List<Assignment>();
                csvReader.Read();
                csvReader.ReadHeader();
                while (csvReader.Read())
                {
                    var assignment = new Assignment
                    {
                        Id = csvReader.GetField<int>("Id"),
                        Vpcr = csvReader.GetField("Vpcr"),
                        <SNIP>
                        InitiatedDate = ConvertBadDate(csvReader.GetField("InitiatedDate")),
                        TechScreeningRequestDate = ConvertBadDate(csvReader.GetField("TechScreeningRequestDate")),
                        TechScreeningCompletionDate = ConvertBadDate(csvReader.GetField("TechScreeningCompletionDate")),
                        <SNIP>
                    };
                    string status_string = csvReader.GetField<string>("Status");
                    assignment.Status = context.Statuses.First(s => s.Designation == status_string);
                    context.Assignments.AddOrUpdate(a => a.Vpcr, assignment);
                }
            }
        }

        base.Seed(context);

    }

    private DateTime ConvertBadDate(string PossibleDate)
    {
        DateTime val;
        if (DateTime.TryParse(PossibleDate, out val))
        {
            Console.WriteLine("Converted '{0}' to {1}.", PossibleDate, val);
            return val;
        }
        else
        {
            Console.WriteLine("Unable to convert '{0}' to a date.", PossibleDate);
            return DateTime.Now;
        }
    }
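
One caveat with ConvertBadDate above: returning DateTime.Now on failure stores a plausible-looking date, so bad rows become hard to spot later. Here is a minimal sketch of a variant that returns null instead. It assumes the Assignment date properties could be made nullable (DateTime?) and that the invariant culture fits the exported dates. Both are assumptions, and the names DateParsing and ParseDateOrNull are hypothetical.

```csharp
using System;
using System.Globalization;

public static class DateParsing
{
    // Returns the parsed date, or null (after logging) when the text is not a date.
    public static DateTime? ParseDateOrNull(string possibleDate)
    {
        DateTime val;
        // Invariant culture is an assumption; swap in the culture the Excel export used.
        if (DateTime.TryParse(possibleDate, CultureInfo.InvariantCulture, DateTimeStyles.None, out val))
        {
            return val;
        }

        Console.WriteLine("Unable to convert '{0}' to a date.", possibleDate);
        return null;
    }
}
```

Usage would mirror the code above, e.g. InitiatedDate = DateParsing.ParseDateOrNull(csvReader.GetField("InitiatedDate")), with the unparseable values left as NULL in the database.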

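As a bonus, I also discovered the trick of launching the process in another instance of Visual Studio, where I can set breakpoints, and found a few small problems with the rest of my initial schema.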