C# Distinct in LINQ query
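I ran into a problem after changing some code. The idea is this: I am counting words across documents, but each document should contribute at most one copy of each word, for example: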
Document 1 = Smith Smith Smith Smith => Smith x1
Document 2 = Smith Alan Alan => Smith x1, Alan x1
Document 3 = John John => John x1
But the totals across all documents should be:
Smith x2 (in 2 documents out of 3), Alan x1 (1 out of 3 documents), John x1 (1 out of 3 documents)
I think it worked before I added a separate distinct option (with distinct = false it also counts every occurrence); now every word just comes out as 1.
The earlier code:
    private Dictionary<string, int> tempDict = new Dictionary<string, int>();

    private void Splitter(string[] file)
    {
        tempDict = file
            .SelectMany(i => File.ReadAllLines(i)
                .SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
                .AsParallel()
                .Select(word => word.ToLower())
                .Distinct())
            .GroupBy(word => word)
            .ToDictionary(g => g.Key, g => g.Count());
    }
It was supposed to be changed so that it returns the dictionary, but while building the application I changed it to the following code:
    private Dictionary<string, int> Splitter(string[] file, bool distinct, bool pairs)
    {
        var query = file
            .SelectMany(i => File.ReadLines(i)
                .SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
                .AsParallel()
                .Select(word => word.ToLower())
                .Where(word => !word.All(char.IsDigit)));
        if (distinct)
        {
            query = query.Distinct();
        }
        if (pairs)
        {
            var pairWise = query.Pairwise((first, second) => string.Format("{0} {1}", first, second));
            return query
                .Concat(pairWise)
                .GroupBy(word => word)
                .ToDictionary(g => g.Key, g => g.Count());
        }
        return query
            .GroupBy(word => word)
            .ToDictionary(g => g.Key, g => g.Count());
    }
Also note that query = file.Distinct(); just returns the document names, so the fix has to be something else.
Edit: this is how I call the method:
    private void EnterDocument(object sender, RoutedEventArgs e)
    {
        List<string> myFile = new List<string>();
        OpenFileDialog openFileDialog = new OpenFileDialog();
        openFileDialog.Multiselect = true;
        openFileDialog.Filter = "All files (*.*)|*.*|Text files (*.txt)|*.txt";
        if (openFileDialog.ShowDialog() == true)
        {
            foreach (string filename in openFileDialog.FileNames)
            {
                myFile.Add(filename);
            }
        }
        string[] myFiles = myFile.ToArray();
        myDatabase = Splitter(myFiles, true, false);
    }
Distinct() removes duplicates from your IEnumerable, so calling it before the following...
    return query
        .GroupBy(word => word)
        .ToDictionary(g => g.Key, g => g.Count());
...produces a list of all the unique words, but each with a count of 1.
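To see the effect in isolation, here is a minimal sketch that applies the same Distinct()-then-GroupBy order to the three example documents. The in-memory `documents` array is a hypothetical stand-in for the files read with File.ReadLines, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the three example documents.
string[][] documents =
{
    new[] { "smith", "smith", "smith", "smith" },
    new[] { "smith", "alan", "alan" },
    new[] { "john", "john" },
};

// Flatten first, then Distinct(): duplicates are removed across ALL documents,
// so every group formed afterwards holds exactly one element.
Dictionary<string, int> globalDistinct = documents
    .SelectMany(words => words)
    .Distinct()
    .GroupBy(word => word)
    .ToDictionary(g => g.Key, g => g.Count());

foreach (var pair in globalDistinct)
{
    Console.WriteLine($"{pair.Key} x{pair.Value}");
}
```

Every word is reported with x1, including smith, even though it appears in two of the three documents.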
Edit:
To work around the words of all files being merged into one sequence before Distinct() runs, you could do this:
    List<string> allFilesWords = new List<string>();
    foreach (var filename in file)
    {
        // Build the word sequence for this one file only.
        var fileQuery = File.ReadLines(filename)
            .SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .AsParallel()
            .Select(word => word.ToLower())
            .Where(word => !word.All(char.IsDigit));
        // Distinct() is applied per file, so each file adds a word at most once.
        allFilesWords.AddRange(fileQuery.Distinct());
    }
    // Each group's size is now the number of files that contain the word.
    return allFilesWords
        .GroupBy(word => word)
        .ToDictionary(g => g.Key, g => g.Count());
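The same per-file idea can also be written as a single query by keeping Distinct() inside the SelectMany lambda, which is where the earlier version of the question's code had it. The sketch below is only an illustration: the `documents` array is a hypothetical in-memory replacement for the files, ToLowerInvariant() is swapped in for ToLower(), and top-level statements assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the three example documents.
string[][] documents =
{
    new[] { "Smith", "Smith", "Smith", "Smith" },
    new[] { "Smith", "Alan", "Alan" },
    new[] { "John", "John" },
};

// Distinct() runs inside the SelectMany lambda, once per document,
// so a word survives at most once per document before grouping.
Dictionary<string, int> documentFrequency = documents
    .SelectMany(words => words
        .Select(word => word.ToLowerInvariant())
        .Distinct())
    .GroupBy(word => word)
    .ToDictionary(g => g.Key, g => g.Count());

foreach (var pair in documentFrequency)
{
    Console.WriteLine($"{pair.Key} x{pair.Value}");
}
```

With this input, smith is counted twice (two documents contain it), while alan and john are counted once each, which matches the totals the question asks for.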