将文本行拆分为单词并根据投票决定哪个是正确的
Split text lines into words and decide which one is correct based on voting
以下代码将每行拆分为单词,并将每行中的第一个单词存储到数组列表中,将第二个单词存储到另一个数组列表中,依此类推。然后它从每个列表中选择最频繁的单词作为正确的单词。
Module Module1
Sub Main()
Dim correctLine As String = ""
Dim line1 As String = "Canda has more than ones official language"
Dim line2 As String = "Canada has more than one oficial languages"
Dim line3 As String = "Canada has nore than one official lnguage"
Dim line4 As String = "Canada has nore than one offical language"
Dim wordsOfLine1() As String = line1.Split(" ")
Dim wordsOfLine2() As String = line2.Split(" ")
Dim wordsOfLine3() As String = line3.Split(" ")
Dim wordsOfLine4() As String = line4.Split(" ")
For i As Integer = 0 To wordsOfLine1.Length - 1
Dim wordAllLinesTemp As New List(Of String)(New String() {wordsOfLine1(i), wordsOfLine2(i), wordsOfLine3(i), wordsOfLine4(i)})
Dim counts = From n In wordAllLinesTemp
Group n By n Into Group
Order By Group.Count() Descending
Select Group.First
correctLine = correctLine & counts.First & " "
Next
correctLine = correctLine.Remove(correctLine.Length - 1)
Console.WriteLine(correctLine)
Console.ReadKey()
End Sub
End Module
我的问题:我怎样才能让它适用于不同字数的行。我的意思是这里每行的长度是 7 个单词,for 循环使用这个长度 (length-1)。假设第 3 行包含 5 个单词。
编辑: 不小心在最短的地方有正确的索引。
据我所知,您正在尝试查看哪条线最接近正确的线。
您可以使用以下代码获取编辑距离:
Public Function LevDist(ByVal s As String,
ByVal t As String) As Integer
Dim n As Integer = s.Length
Dim m As Integer = t.Length
Dim d(n + 1, m + 1) As Integer
If n = 0 Then
Return m
End If
If m = 0 Then
Return n
End If
Dim i As Integer
Dim j As Integer
For i = 0 To n
d(i, 0) = i
Next
For j = 0 To m
d(0, j) = j
Next
For i = 1 To n
For j = 1 To m
Dim cost As Integer
If t(j - 1) = s(i - 1) Then
cost = 0
Else
cost = 1
End If
d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1),
d(i - 1, j - 1) + cost)
Next
Next
Return d(n, m)
End Function
然后,这将用于确定哪条线最近:
Dim correctLine As String = ""
Dim line1 As String = "Canda has more than ones official language"
Dim line2 As String = "Canada has more than one oficial languages"
Dim line3 As String = "Canada has nore than one official lnguage"
Dim line4 As String = "Canada has nore than one offical language"
Dim lineArray As new ArrayList
Dim countArray As new ArrayList
lineArray.Add(line1)
lineArray.Add(line2)
lineArray.Add(line3)
lineArray.Add(line4)
For i = 0 To lineArray.Count - 1
countArray.Add(LevDist(lineArray(i), correctLine))
Next
Dim shortest As Integer = Integer.MaxValue
Dim correctIndex As Integer = 0
For i = 0 To countArray.Count - 1
If countArray(i) <= shortest Then
correctIndex = i
shortest = countArray(i)
End If
Next
Console.WriteLine(lineArray(correctIndex))
以下代码将每行拆分为单词,并将每行中的第一个单词存储到数组列表中,将第二个单词存储到另一个数组列表中,依此类推。然后它从每个列表中选择最频繁的单词作为正确的单词。
Module Module1
Sub Main()
Dim correctLine As String = ""
Dim line1 As String = "Canda has more than ones official language"
Dim line2 As String = "Canada has more than one oficial languages"
Dim line3 As String = "Canada has nore than one official lnguage"
Dim line4 As String = "Canada has nore than one offical language"
Dim wordsOfLine1() As String = line1.Split(" ")
Dim wordsOfLine2() As String = line2.Split(" ")
Dim wordsOfLine3() As String = line3.Split(" ")
Dim wordsOfLine4() As String = line4.Split(" ")
For i As Integer = 0 To wordsOfLine1.Length - 1
Dim wordAllLinesTemp As New List(Of String)(New String() {wordsOfLine1(i), wordsOfLine2(i), wordsOfLine3(i), wordsOfLine4(i)})
Dim counts = From n In wordAllLinesTemp
Group n By n Into Group
Order By Group.Count() Descending
Select Group.First
correctLine = correctLine & counts.First & " "
Next
correctLine = correctLine.Remove(correctLine.Length - 1)
Console.WriteLine(correctLine)
Console.ReadKey()
End Sub
End Module
我的问题:我怎样才能让它适用于不同字数的行。我的意思是这里每行的长度是 7 个单词,for 循环使用这个长度 (length-1)。假设第 3 行包含 5 个单词。
编辑: 不小心在最短的地方有正确的索引。
据我所知,您正在尝试查看哪条线最接近正确的线。
您可以使用以下代码获取编辑距离:
Public Function LevDist(ByVal s As String,
ByVal t As String) As Integer
Dim n As Integer = s.Length
Dim m As Integer = t.Length
Dim d(n + 1, m + 1) As Integer
If n = 0 Then
Return m
End If
If m = 0 Then
Return n
End If
Dim i As Integer
Dim j As Integer
For i = 0 To n
d(i, 0) = i
Next
For j = 0 To m
d(0, j) = j
Next
For i = 1 To n
For j = 1 To m
Dim cost As Integer
If t(j - 1) = s(i - 1) Then
cost = 0
Else
cost = 1
End If
d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1),
d(i - 1, j - 1) + cost)
Next
Next
Return d(n, m)
End Function
然后,这将用于确定哪条线最近:
Dim correctLine As String = ""
Dim line1 As String = "Canda has more than ones official language"
Dim line2 As String = "Canada has more than one oficial languages"
Dim line3 As String = "Canada has nore than one official lnguage"
Dim line4 As String = "Canada has nore than one offical language"
Dim lineArray As new ArrayList
Dim countArray As new ArrayList
lineArray.Add(line1)
lineArray.Add(line2)
lineArray.Add(line3)
lineArray.Add(line4)
For i = 0 To lineArray.Count - 1
countArray.Add(LevDist(lineArray(i), correctLine))
Next
Dim shortest As Integer = Integer.MaxValue
Dim correctIndex As Integer = 0
For i = 0 To countArray.Count - 1
If countArray(i) <= shortest Then
correctIndex = i
shortest = countArray(i)
End If
Next
Console.WriteLine(lineArray(correctIndex))