Unable to detect bad data for bulk import from CSV using Microsoft.ACE.OLEDB.12

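For example, if a date column contains a stray letter, the value is treated as null and I get no warning.

I have exhausted Microsoft's documentation and found no indication that this behavior can be changed. In all of my Google searches I found only one article on the subject, and it says the behavior cannot be changed.

The schema.ini is created in code, but this is what it looks like.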

[NewEmployees.csv]
ColNameHeader=True
Format=CSVDelimited
DateTimeFormat=dd-MMM-yy
Col1=FirstName Text
Col2=LastName Text
Col3="Hire Date" Date

Below are the most relevant lines of code.

string strSql = "SELECT * FROM [" + FileUpload1.FileName + "]";
string strCSVConnString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + targetFolder + ";" + "Extended Properties='text;HDR=YES;'";
OleDbDataAdapter oleda = new OleDbDataAdapter(strSql, strCSVConnString);
DataTable importData = new DataTable();
oleda.Fill(importData);

GridView1.DataSource = importData;
GridView1.DataBind();

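If anyone wants the complete ASP.NET code, it is below. It lets the user select a file on their computer, creates a folder named after the current date and time, creates schema.ini and saves it to that folder, saves the uploaded CSV file to the folder, then queries the CSV file and binds it to a GridView. It is nice code, but it is useless if it cannot detect bad data.

Code behind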

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;

using System.IO;
using System.Data;
using System.Data.OleDb;

using System.Data.SqlClient;

namespace WebApplication1
{
    public partial class EmployeeImport : System.Web.UI.Page
    {
        public string GetDateTimeStampedFolderName()
        {
            return string.Format("{0:yyyy-MM-dd_hh-mm-ss-tt}", DateTime.Now);
        }

        public void CreateSchemIni(string targetFolder, string fileName)
        {
            // The using block flushes and closes the writer and its file on exit,
            // so no explicit Close/Dispose calls are needed.
            using (StreamWriter writer = new StreamWriter(Path.Combine(targetFolder, "schema.ini"), false))
            {
                // The section name must match the CSV file name in the same folder.
                writer.WriteLine("[" + fileName + "]");
                writer.WriteLine("ColNameHeader=True");
                writer.WriteLine("Format=CSVDelimited");
                writer.WriteLine("DateTimeFormat=dd-MMM-yy");
                writer.WriteLine("Col1=FirstName Text");
                writer.WriteLine("Col2=LastName Text");
                writer.WriteLine("Col3=\"Hire Date\" Date");
            }
        }

        private void UploadAndImport()
        {
            if (FileUpload1.HasFile)
            {
                string targetFolder = Server.MapPath("~/Uploads/Employees/" + GetDateTimeStampedFolderName());

                if (System.IO.Directory.Exists(targetFolder) == false)
                {
                    System.IO.Directory.CreateDirectory(targetFolder);
                }

                FileUpload1.SaveAs(Path.Combine(targetFolder, FileUpload1.FileName));

                CreateSchemIni(targetFolder, FileUpload1.FileName);

                string strSql = "SELECT * FROM [" + FileUpload1.FileName + "]";
                string strCSVConnString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + targetFolder + ";" + "Extended Properties='text;HDR=YES;'";
                OleDbDataAdapter oleda = new OleDbDataAdapter(strSql, strCSVConnString);
                DataTable importData = new DataTable();
                oleda.Fill(importData);

                GridView1.DataSource = importData;
                GridView1.DataBind();
            }
        }

        protected void UploadButton_Click(object sender, EventArgs e)
        {
            if (FileUpload1.HasFile)
            {
                UploadAndImport();
            }
        }
    }
}

ASPX

<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="EmployeeImport.aspx.cs" Inherits="WebApplication1.EmployeeImport" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
</head>
<body>
    <form id="form1" runat="server">
    <div>

         <asp:FileUpload ID="FileUpload1" runat="server" />

        <br />
        <asp:Button ID="UploadButton" runat="server" Text="Upload" 
            onclick="UploadButton_Click" />
        <asp:GridView ID="GridView1" runat="server">
        </asp:GridView>

    </div>
    </form>
</body>
</html>
try
{
    oleda.Fill(importData);
}
catch(Exception) // put break point here
{
    throw;
}

See if you get an exception now.

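Importing data with Microsoft.ACE.OLEDB.12.0 and a schema.ini can cause 2 major silent-but-deadly problems. I am posting solutions for both. Although one of them applies only to SQL Server, a similar solution may work for other databases.

  1. Bad data, such as a date with a letter in it like "5/20/2016a", is converted to null with no exception or warning that this has happened. The import happily carries on and corrupts your data.
  2. Column types in schema.ini are assigned by ordinal position, and the headers in the CSV are ignored completely. If the columns in the CSV are in the wrong order, you get no exception or warning, and your data is corrupted.

For example, if schema.ini contains: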

Col1=FirstName Text
Col2=LastName Text
Col3="Hire Date" Date

and the CSV has FirstName and LastName in the wrong order:

LastName,FirstName,HireDate
Smith,Jon,5/1/2016
Moore,Larry,5/15/2016

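The ACE driver is not smart enough to recognize that the headers are in the wrong order, and the data is imported incorrectly.

Solution to problem 1 - bad data

The solution I came up with is to use schema.ini to declare every column as a Text field and to use System.Data.SqlClient.SqlBulkCopy to import the data into SQL Server. When SqlBulkCopy encounters bad data, it is smart enough to throw an exception and block the import of the entire CSV, even if only the last record is bad.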
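
Once every column arrives as text, the values can also be checked in code before anything is sent to the database, which gives a per-row message instead of a single exception from the bulk copy. The following is only a minimal sketch of that idea, not part of the code in this post: the class name ImportValidation, the method FindBadDates, and the "M/d/yyyy" format used in the usage note are assumptions chosen for illustration. Empty or null values are reported as invalid as well.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

// Hypothetical helper, not part of the original post.
public static class ImportValidation
{
    // Returns one message per row whose value in columnName does not parse
    // as a date in the given exact format. An empty list means every row parsed.
    public static List<string> FindBadDates(DataTable table, string columnName, string format)
    {
        var errors = new List<string>();

        for (int i = 0; i < table.Rows.Count; i++)
        {
            // Convert.ToString turns DBNull into an empty string, which fails the parse below.
            string raw = Convert.ToString(table.Rows[i][columnName]);

            DateTime parsed;
            bool ok = DateTime.TryParseExact(
                raw, format, CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);

            if (!ok)
            {
                // Row numbers are 1-based and count data rows only, not the header line.
                errors.Add("Row " + (i + 1) + ": invalid " + columnName + " '" + raw + "'");
            }
        }

        return errors;
    }
}
```

A call such as ImportValidation.FindBadDates(importData, "Hire Date", "M/d/yyyy") could run right after oleda.Fill(importData), with the bulk copy skipped whenever the returned list is not empty.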

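Solution to problem 2 - CSV columns out of order, or missing/extra columns

To solve this, I created 2 DataTables, one of which is filled with the schema only and no data. The schema-only fill must be done before schema.ini is created, because once schema.ini exists the headers in the CSV are ignored.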

DataTable importData = new DataTable();
DataTable importDataSourceSchema = new DataTable();

// Fill the schema prior to creating the schema.ini, as this is the only way to get the headers from the CSV
oleda.FillSchema(importDataSourceSchema, System.Data.SchemaType.Source);
CreateSchemIni(targetFolder, FileUpload1.FileName);
oleda.Fill(importData);

Then I created a function to verify that the headers in the CSV are in the correct order and that the CSV contains the correct number of columns:

private bool ValidateHeaders(DataTable importData, DataTable importDataSourceSchema)
{
    // Stop on a count mismatch: the index-based comparison below would
    // otherwise run past the end of the shorter column collection.
    if (importData.Columns.Count != importDataSourceSchema.Columns.Count)
    {
        ValidationLabel.Text = ValidationLabel.Text + "<br />Wrong number of columns";
        return false;
    }

    bool isValid = true;

    // Names from schema.ini (importData) must match the CSV headers position by position.
    for (int i = 0; i < importData.Columns.Count; i++)
    {
        if (importData.Columns[i].ColumnName != importDataSourceSchema.Columns[i].ColumnName)
        {
            ValidationLabel.Text = ValidationLabel.Text + "<br />Error finding column " + importData.Columns[i].ColumnName;
            isValid = false;
        }
    }
    return isValid;
}

Then I call ValidateHeaders before performing the bulk import:

if (ValidateHeaders(importData, importDataSourceSchema))
{
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy([Add your ConnectionString here]))
    {
        bulkCopy.DestinationTableName = "dbo.EmployeeImport";
        bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("FirstName", "FirstName"));
        bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("LastName", "LastName"));
        bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("Hire Date", "HireDate"));
        try
        {
            bulkCopy.WriteToServer(importData);
            ValidationLabel.Text = "Success";
            GridView1.DataSource = importData;
            GridView1.DataBind();
        }
        catch (Exception e)
        {
            ValidationLabel.Text = e.Message;
        }
    }
}
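
As an alternative to filling a second DataTable, the header line can be read straight from the CSV file and compared with the expected names. This is only a rough sketch under stated assumptions, not the approach used in this post: the class CsvHeaderCheck and its methods are hypothetical, the split on commas assumes header names contain no commas themselves, and names are compared case-insensitively.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original post.
public static class CsvHeaderCheck
{
    // Compares a CSV header line with the expected column names, position by position.
    // Returns a list of problems; an empty list means the headers match.
    public static List<string> CompareHeaders(string headerLine, string[] expected)
    {
        var errors = new List<string>();

        // Naive split: assumes no header name contains a comma.
        // Surrounding spaces and double quotes are trimmed.
        string[] actual = headerLine.Split(',')
                                    .Select(h => h.Trim().Trim('"'))
                                    .ToArray();

        if (actual.Length != expected.Length)
        {
            errors.Add("Expected " + expected.Length + " columns but found " + actual.Length);
        }

        // Compare only the positions that exist in both arrays.
        int count = Math.Min(actual.Length, expected.Length);
        for (int i = 0; i < count; i++)
        {
            if (!string.Equals(actual[i], expected[i], StringComparison.OrdinalIgnoreCase))
            {
                errors.Add("Column " + (i + 1) + ": expected '" + expected[i]
                           + "' but found '" + actual[i] + "'");
            }
        }

        return errors;
    }

    // Reads only the first line of the file and checks it against the expected names.
    public static List<string> CheckFile(string csvPath, string[] expected)
    {
        string headerLine = File.ReadLines(csvPath).FirstOrDefault() ?? "";
        return CompareHeaders(headerLine, expected);
    }
}
```

CheckFile could be called with the saved upload path and new[] { "FirstName", "LastName", "Hire Date" }; because it never goes through the ACE driver, it does not matter whether schema.ini already exists when it runs.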

Below is the complete proof-of-concept code, written for ASP.NET WebForms.

ASPX

<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="EmployeeImport.aspx.cs" Inherits="WebApplication1.EmployeeImport" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
</head>
<body>
    <form id="form1" runat="server">
    <div>

         <asp:FileUpload ID="FileUpload1" runat="server" />

        <br />
        <asp:Button ID="UploadButton" runat="server" Text="Upload" 
            onclick="UploadButton_Click" />

        <br />
        Data Imported: <asp:Label ID="ValidationLabel" runat="server" ForeColor="Red"></asp:Label>
        <asp:GridView ID="GridView1" runat="server">
        </asp:GridView>

    </div>
    </form>
</body>
</html>

Code behind

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;

using System.IO;
using System.Data;
using System.Data.OleDb;

using System.Data.SqlClient;

namespace WebApplication1
{
    public partial class EmployeeImport : System.Web.UI.Page
    {
        public string GetDateTimeStampedFolderName()
        {
            return string.Format("{0:yyyy-MM-dd_hh-mm-ss-tt}", DateTime.Now);
        }

        public void CreateSchemIni(string targetFolder, string fileName)
        {
            // The using block flushes and closes the writer and its file on exit,
            // so no explicit Close/Dispose calls are needed.
            using (StreamWriter writer = new StreamWriter(Path.Combine(targetFolder, "schema.ini"), false))
            {
                // The section name must match the CSV file name in the same folder.
                writer.WriteLine("[" + fileName + "]");
                writer.WriteLine("ColNameHeader=True");
                writer.WriteLine("Format=CSVDelimited");
                // Every column is declared as Text so that bad values reach SqlBulkCopy unchanged.
                writer.WriteLine("Col1=FirstName Text");
                writer.WriteLine("Col2=LastName Text");
                writer.WriteLine("Col3=\"Hire Date\" Text");
            }
        }

        private bool ValidateHeaders(DataTable importData, DataTable importDataSourceSchema)
        {
            // Stop on a count mismatch: the index-based comparison below would
            // otherwise run past the end of the shorter column collection.
            if (importData.Columns.Count != importDataSourceSchema.Columns.Count)
            {
                ValidationLabel.Text = ValidationLabel.Text + "<br />Wrong number of columns";
                return false;
            }

            bool isValid = true;

            // Names from schema.ini (importData) must match the CSV headers position by position.
            for (int i = 0; i < importData.Columns.Count; i++)
            {
                if (importData.Columns[i].ColumnName != importDataSourceSchema.Columns[i].ColumnName)
                {
                    ValidationLabel.Text = ValidationLabel.Text + "<br />Error finding column " + importData.Columns[i].ColumnName;
                    isValid = false;
                }
            }

            return isValid;
        }

        private void UploadAndImport()
        {
            if (FileUpload1.HasFile)
            {
                string targetFolder = Server.MapPath("~/Uploads/Employees/" + GetDateTimeStampedFolderName());

                if (System.IO.Directory.Exists(targetFolder) == false)
                {
                    System.IO.Directory.CreateDirectory(targetFolder);
                }

                FileUpload1.SaveAs(Path.Combine(targetFolder, FileUpload1.FileName));



                string strSql = "SELECT * FROM [" + FileUpload1.FileName + "]";
                string strCSVConnString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + targetFolder + ";" + "Extended Properties='text;HDR=YES;'";
                using (OleDbDataAdapter oleda = new OleDbDataAdapter(strSql, strCSVConnString))
                {
                    DataTable importData = new DataTable();
                    DataTable importDataSourceSchema = new DataTable();

                    // Fill the schema prior to creating the schema.ini, as this is the only way to get the headers from the CSV
                    oleda.FillSchema(importDataSourceSchema, System.Data.SchemaType.Source);
                    CreateSchemIni(targetFolder, FileUpload1.FileName);
                    oleda.Fill(importData);

                    if (ValidateHeaders(importData, importDataSourceSchema))
                    {
                        using (SqlBulkCopy bulkCopy = new SqlBulkCopy([Add your ConnectionString here], SqlBulkCopyOptions.TableLock))
                        {
                            bulkCopy.DestinationTableName = "dbo.EmployeeImport";
                            bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("FirstName", "FirstName"));
                            bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("LastName", "LastName"));
                            bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("Hire Date", "HireDate"));
                            try
                            {
                                bulkCopy.WriteToServer(importData);
                                ValidationLabel.Text = "Success";
                                GridView1.DataSource = importData;
                                GridView1.DataBind();
                            }
                            catch (Exception e)
                            {
                                ValidationLabel.Text = e.Message;
                            }
                        }


                    }
                }
            }
        }

        protected void UploadButton_Click(object sender, EventArgs e)
        {
            if (FileUpload1.HasFile)
            {
                ValidationLabel.Text = "";
                UploadAndImport();
            }
        }
    }
}