Search and replace inside string column in DataTable is slow?

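I am fetching the distinct words in a string column of a DataTable (.dt) and then replacing each unique value with another value, so essentially changing words to other words. Both approaches listed below work; however, for 90k records the process is not very fast. Is there a way to speed up either approach?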

The first approach is as follows:

   'fldNo is column number in dt
   For Each Word As String In DistinctWordList
      Dim myRow() As DataRow
      myRow = dt.Select(MyColumnName & "='" & Word & "'")
      For Each row In myRow
         row(fldNo) = dicNewWords(Word)
      Next
   Next

The second, LINQ-based approach is below, and it is not really fast either:

   Dim flds as new List(of String)
   flds.Add(myColumnName)
   For Each Word As String In DistinctWordsList
     Dim rowData() As DataRow = dt.AsEnumerable().Where(Function(f) flds.Where(Function(el) f(el) IsNot DBNull.Value AndAlso f(el).ToString = Word).Count = flds.Count).ToArray
     ReDim foundrecs(rowData.Count)
     Cnt = 0
     For Each row As DataRow In rowData
       Dim Index As Integer = dt.Rows.IndexOf(row)
       foundrecs(Cnt) = Index + 1 'row.RowId
       Cnt += 1
     Next
     For i = 0 To Cnt
       dt(foundrecs(i))(fldNo) = dicNewWords(Word)
     Next 
   Next

So you have your dictionary of replacements:

Dim d as New Dictionary(Of String, String)
d("foo") = "bar"
d("baz") = "buf"

You can apply them to the ReplaceMe column of the table:

Dim rep as String = Nothing
For Each r as DataRow In dt.Rows
  If d.TryGetValue(r.Field(Of String)("ReplaceMe"), rep) Then r("ReplaceMe") = rep 
Next r

On my machine, 1 million replacements take 340 ms. I can reduce that to 260 ms by using the column number rather than the name: If d.TryGetValue(r.Field(Of String)(0), rep) Then r(0) = rep
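If the column position is not known in advance, one option is to look up its ordinal once before the loop and then index by number inside it. The following is only a sketch under that assumption: the module name ReplaceByOrdinal and the helper ReplaceValues are hypothetical and do not come from the question, and the null check is an addition, since Dictionary.TryGetValue throws when passed a Nothing key. I have not timed this version.

```vb
Imports System.Collections.Generic
Imports System.Data

Module ReplaceByOrdinal
    ' Hypothetical helper (not from the original post): replaces the values in
    ' one string column using a lookup dictionary, in a single pass over the rows.
    Sub ReplaceValues(dt As DataTable, columnName As String, d As Dictionary(Of String, String))
        ' Resolve the column ordinal once, outside the loop.
        ' Assumes the column exists; dt.Columns(columnName) is Nothing otherwise.
        Dim ordinal As Integer = dt.Columns(columnName).Ordinal
        Dim rep As String = Nothing
        For Each r As DataRow In dt.Rows
            ' Field(Of String) returns Nothing for DBNull cells
            Dim current As String = r.Field(Of String)(ordinal)
            ' Skip nulls, then replace only when the dictionary has a mapping
            If current IsNot Nothing AndAlso d.TryGetValue(current, rep) Then r(ordinal) = rep
        Next
    End Sub
End Module
```

It would be called as ReplaceValues(dt, "ReplaceMe", d). On .NET Framework, Field(Of T) needs a reference to System.Data.DataSetExtensions.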

Timing:

    'setup, fill a dict with string replacements like "1" -> "11", "7" -> "17"
    Dim d As New Dictionary(Of String, String)
    For i = 0 To 9
        d(i.ToString()) = (i + 10).ToString()
    Next

    'put a million rows in a datatable, randomly assign dictionary keys as row values
    Dim dt As New DataTable
    dt.Columns.Add("ReplaceMe")
    Dim r As New Random()
    Dim k = d.Keys.ToArray()
    For i = 1 To 1000000
        dt.Rows.Add(k(r.Next(k.Length)))
    Next

    'what range of values do we have in our dt?
    Dim minToMaxBefore = dt.Rows.Cast(Of DataRow).Min(Function(ro) ro.Field(Of String)("ReplaceMe")) & " - " & dt.Rows.Cast(Of DataRow).Max(Function(ro) ro.Field(Of String)("ReplaceMe"))

    'it's a crappy way to time, but it'll prove the point
    Dim start = DateTime.Now

    Dim rep As String = Nothing
    For Each ro As DataRow In dt.Rows
        If d.TryGetValue(ro.Field(Of String)("ReplaceMe"), rep) Then ro("ReplaceMe") = rep
    Next

    Dim ennd = DateTime.Now

    'what range of values do we have now
    Dim minToMaxAfter = dt.Rows.Cast(Of DataRow).Min(Function(ro) ro.Field(Of String)("ReplaceMe")) & " - " & dt.Rows.Cast(Of DataRow).Max(Function(ro) ro.Field(Of String)("ReplaceMe"))


    MessageBox.Show($"min to max before of {minToMaxBefore} became {minToMaxAfter} proving replacements occurred, it took {(ennd - start).TotalMilliseconds} ms for 1 million replacements")
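The timing code above uses DateTime.Now, which its own comment admits is a crude way to time. A possible alternative is System.Diagnostics.Stopwatch, which is designed for measuring elapsed time. The sketch below assumes a hypothetical helper named TimeMs (not part of the original answer) that runs a piece of work once and returns the elapsed milliseconds. The figures quoted above were not re-measured with it.

```vb
Imports System
Imports System.Diagnostics

Module TimingSketch
    ' Hypothetical helper (not from the original post): runs the given work
    ' once and returns the elapsed wall-clock time in milliseconds.
    Function TimeMs(work As Action) As Double
        Dim sw As Stopwatch = Stopwatch.StartNew()
        work()
        sw.Stop()
        Return sw.Elapsed.TotalMilliseconds
    End Function
End Module
```

The replacement loop could then be wrapped in a multi-line lambda, for example Dim ms = TimeMs(Sub() ... End Sub) with the For Each loop as its body, and ms reported in place of (ennd - start).TotalMilliseconds.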