Taking a file and splitting it into 2 groups
Below I have a piece of code that splits a data file into two groups, A and B.
string path = @"c:\users\povermyer\documents\visual studio 2013\Projects\DanProject\PNRS\PNRS.log";
string[] lines = System.IO.File.ReadAllLines(path);
var count = File.ReadLines(path).Count();
List<string> groupA = lines.Take(7678).ToList();
List<string> groupB = lines.Skip(7678).Take(5292).ToList();
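To be clear, the first group takes the first 7678 lines of the file, and the second group skips those 7678 lines and takes the remaining 5292 lines. The only problem is that a future file may not contain exactly 7678 lines followed by 5292 lines. I do know that the first group begins with A and ends with A, and that the second group begins with B and ends with B. So my question is: how can I change the code above so that it splits the file into 2 groups based on how each group begins and ends?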
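Also, the beginning and ending markers are not just the letter on its own. For example, the beginning of A is
***********BEGIN PROCESSING A PNRS*********** and the end is ************END PROCESSING A PNRS************
The same goes for group B. Please help!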
How about this:
List<string> groupA = lines.Where(s => s.StartsWith("A") && s.EndsWith("A")).ToList();
List<string> groupB = lines.Where(s => s.StartsWith("B") && s.EndsWith("B")).ToList();
Oh, and I know this wasn't part of your question, but... instead of
var count = File.ReadLines(path).Count();
...why not simply do this:
var count = lines.Length;
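It avoids reading the file twice.
In case you ever need to split into more groups, you may want to consider storing your groups in a Dictionary<string, List<string>>, where the key is the group name and the value is a list containing only that group's data.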
Update
If I understand the scenario, suppose your data looks like this:
"***********BEGIN PROCESSING A PNRS*********** the beginning is 1 ************END PROCESSING A PNRS************",
"***********BEGIN PROCESSING A PNRS*********** the beginning is 2 ************END PROCESSING A PNRS************",
"***********BEGIN PROCESSING B PNRS*********** and the end is 1 ************END PROCESSING B PNRS************",
"***********BEGIN PROCESSING B PNRS*********** and the end is 2 ************END PROCESSING B PNRS************",
"***********BEGIN PROCESSING AB PNRS*********** good morning to you 1 ************END PROCESSING AB PNRS************",
"***********BEGIN PROCESSING AB PNRS*********** good morning to you 2 ************END PROCESSING AB PNRS************"
And you want it grouped like this:
A:
[0] the beginning is 1
[1] the beginning is 2
B:
[0] and the end is 1
[1] and the end is 2
AB:
[0] good morning to you 1
[1] good morning to you 2
This is probably best handled with regular expressions, and I would still recommend storing everything in a Dictionary<string, List<string>>.
New code
/// <summary>
/// Separates the List of string data into groups of data
/// </summary>
/// <param name="data">Array of string data</param>
/// <param name="groupNames">Array of group names</param>
/// <returns>Dictionary of List of string data broken into groups</returns>
private static Dictionary<string, List<string>> SeparateGroups(string[] data, params string[] groupNames)
{
    return groupNames.ToDictionary(
        groupName => groupName,
        groupName => data.Select(d => {
            // Verbatim string (@) so \* and \s reach the regex engine; {{ }} escapes braces for String.Format
            Match m = Regex.Match(d, String.Format(@"^\*{{11,}}BEGIN PROCESSING {0} PNRS\*{{11,}}\s(.*)\s\*{{11,}}END PROCESSING {0} PNRS\*{{11,}}$", groupName));
            return m.Success ? m.Groups[1].Value : String.Empty;
        }).Where(s => !String.IsNullOrEmpty(s)).ToList()
    );
}
Usage:
string[] groupNames = new[] { "A", "B", "AB" };
string[] lines = new[] {
    "***********BEGIN PROCESSING A PNRS*********** the beginning is 1 ************END PROCESSING A PNRS************",
    "***********BEGIN PROCESSING A PNRS*********** the beginning is 2 ************END PROCESSING A PNRS************",
    "***********BEGIN PROCESSING B PNRS*********** and the end is 1 ************END PROCESSING B PNRS************",
    "***********BEGIN PROCESSING B PNRS*********** and the end is 2 ************END PROCESSING B PNRS************",
    "***********BEGIN PROCESSING AB PNRS*********** good morning to you 1 ************END PROCESSING AB PNRS************",
    "***********BEGIN PROCESSING AB PNRS*********** good morning to you 2 ************END PROCESSING AB PNRS************"
};
int count = lines.Length;
Dictionary<string, List<string>> groups = SeparateGroups(lines, groupNames);
foreach (string key in groups.Keys)
{
    Console.WriteLine(key + ":");
    foreach (string value in groups[key])
    {
        Console.WriteLine(value);
    }
}
Result:
A:
the beginning is 1
the beginning is 2
B:
and the end is 1
and the end is 2
AB:
good morning to you 1
good morning to you 2
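As a variation on the code above, the group names do not have to be passed in. The sketch below is only an illustration under the same assumption about the line format: it captures the name from the BEGIN marker and uses a backreference so that the END marker must carry the same name. The names PnrsGrouping, SeparateGroupsByMarker and LinePattern are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PnrsGrouping
{
    // Hypothetical pattern: "name" is captured from the BEGIN marker, and
    // \k<name> requires the END marker to repeat exactly the same name.
    private static readonly Regex LinePattern = new Regex(
        @"^\*{11,}BEGIN PROCESSING (?<name>\w+) PNRS\*{11,}\s(?<body>.*)\s\*{11,}END PROCESSING \k<name> PNRS\*{11,}$");

    // Groups every matching line by the name found in its markers.
    // Lines that do not match the pattern are skipped.
    public static Dictionary<string, List<string>> SeparateGroupsByMarker(IEnumerable<string> data)
    {
        return data
            .Select(line => LinePattern.Match(line))
            .Where(m => m.Success)
            .GroupBy(m => m.Groups["name"].Value)
            .ToDictionary(
                g => g.Key,
                g => g.Select(m => m.Groups["body"].Value).ToList());
    }
}
```

Calling PnrsGrouping.SeparateGroupsByMarker(lines) on the sample array above should give the same three keys (A, B, AB) without listing them up front. The trade-off is that a typo in a marker silently creates a new group instead of being ignored.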
Old code
/// <summary>
/// Separates the List of string data into groups of data
/// </summary>
/// <param name="data">Array of string data</param>
/// <param name="groupNames">Array of group names</param>
/// <returns>Dictionary of List of string data broken into groups</returns>
private Dictionary<string, List<string>> SeparateGroups(string[] data, params string[] groupNames)
{
    return groupNames.ToDictionary(
        groupName => groupName,
        groupName => data.Where(ag => ag.StartsWith(groupName) && ag.EndsWith(groupName)).ToList()
    );
}
Usage:
string[] groupNames = new[] { "A", "B", "AB" };
string[] lines = File.ReadAllLines(filePath);
int count = lines.Length;
Dictionary<string, List<string>> groups = SeparateGroups(lines, groupNames);
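The question can also be read differently: the BEGIN and END markers might each sit on a line of their own, with the group's data on the lines in between. That is an assumption, not something the question confirms. If that is the layout, a single pass that tracks which block is currently open may be a better fit. In this sketch, PnrsBlocks, SplitBlocks and the two patterns are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PnrsBlocks
{
    // Assumed marker format: asterisks, the marker text, asterisks, nothing else on the line.
    private static readonly Regex Begin = new Regex(@"^\*+BEGIN PROCESSING (\w+) PNRS\*+$");
    private static readonly Regex End = new Regex(@"^\*+END PROCESSING (\w+) PNRS\*+$");

    // Collects the lines between each BEGIN marker and its matching END marker.
    // Lines outside any block are dropped.
    public static Dictionary<string, List<string>> SplitBlocks(IEnumerable<string> lines)
    {
        var groups = new Dictionary<string, List<string>>();
        string current = null; // name of the open block, or null when outside a block

        foreach (string line in lines)
        {
            string trimmed = line.Trim();

            Match begin = Begin.Match(trimmed);
            if (begin.Success)
            {
                current = begin.Groups[1].Value;
                if (!groups.ContainsKey(current))
                    groups[current] = new List<string>();
                continue;
            }

            Match end = End.Match(trimmed);
            if (end.Success && end.Groups[1].Value == current)
            {
                current = null; // close the block; the marker itself is not stored
                continue;
            }

            // An END marker for a different name is treated as ordinary data here.
            if (current != null)
                groups[current].Add(line);
        }

        return groups;
    }
}
```

With this reading, PnrsBlocks.SplitBlocks(File.ReadLines(path)) would replace the hard-coded Take(7678) and Skip(7678) calls, and the group sizes no longer need to be known in advance.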