How to split input text multiple times
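I have an input file structured as follows: years are separated by "-", studies are separated by "=", and each student is separated from their performance by "\t". My goal is to parse the input file to get the numbers. Once I have the numbers, I need the last number on each line, which represents the student's performance as a percentage. The problem is that when I split the input on the hyphen I get a list, and I don't know what to do next, because I cannot call split on it again now that it is a list.
Basically, it is a long list of names, each followed by a performance time in milliseconds and then a performance percentage, like this:
Frank Pierre 1398 81. It is the second number that I want to retrieve for every name in the list, because that number is the percentage.
I have already managed to retrieve the numbers by looping over all items in the input file with a for loop and appending them to a new list if they are integers. The problem is that my solution relies on the final numbers all being less than or equal to one hundred (because they are percentages): it removes them from the new list and adds them to a separate list of percentages. I would like the program to handle any input file with the same structure in a more general way.
Imagine a file with the same structure in which the first number after a student's name is sometimes below 100. My program would treat it as a percentage because it is below 100, but it is not! Only the second number, the one after the first, is the percentage. That is why I think it would be better to parse the input file so that the numbers are separated from everything else, and then pick out the second number, for example by index. I just don't know how to do that.
It would be great if someone knows how to get this done. The code has to be Python 2.7, I cannot use any external modules, and I have to define the functions myself. All I need is a list of the second numbers so that I can run my analysis on them.
I currently have the following code: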
with open("statistics_input.txt", "r") as input:
    information = input.read()
splitted = information.split('-')
first = splitted[0]
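The problem is that I now have a list with 6 entries, one per year, and I don't know how to parse it any further. I started by putting the first year in a variable, but how do I retrieve the numbers for that year and then repeat the process for every year?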
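There are many different ways to do what you are attempting. I have a couple of suggestions, though:
- After splitting on '-' you have a list, as you said. The entries in that list are all strings, however, so if you want to split one year's records into studies, you can take one string from the list and split it on '='. That gives you another list whose entries are still strings, and they can be processed in the same way.
- To get the last number on a line, you can split the line on spaces (' ') and take the last element of the resulting list. You need to know that the line is a student line (and not a year or study marker), but it sounds like you may already have that figured out.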
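The two suggestions above can be put together roughly as follows. This is only a sketch: the function name parse_percentages and the small sample string are made up for illustration, and it assumes that every student line ends in two whitespace-separated integers and that no name contains '-' or '='. It sticks to constructs that behave the same in Python 2.7 and Python 3.

```python
def parse_percentages(text):
    """Return the last number of every student line as an int."""
    percentages = []
    for year_block in text.split('-'):            # one string per year
        for study_block in year_block.split('='):  # one string per study
            for line in study_block.split('\n'):
                fields = line.split()              # splits on tabs and spaces
                # A student line has a name followed by two numbers; year and
                # study lines have fewer fields, so they are skipped here.
                if (len(fields) >= 3 and fields[-1].isdigit()
                        and fields[-2].isdigit()):
                    percentages.append(int(fields[-1]))
    return percentages


# Hypothetical sample in the layout described in the question.
sample = (
    "1999\n"
    "=AI\n"
    "Frank Pierre\t1398 81\n"
    "Eline Hijma\t430 23\n"
    "-2002\n"
    "=W\n"
    "Wan Bos\t70 29\n"
)
print(parse_percentages(sample))  # [81, 23, 29]
```

Because the percentage is picked by position (the last field) rather than by value, a time below 100, such as the 70 in the last sample line, is not mistaken for a percentage.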
You already know how to open and read the file, so I will skip that part. Assuming the file contents have been read into the variable text, this code:
import json

data = {}
years = text.split('\n-')  # text is the full contents of the input file
for y in years:
    lines = y.split('\n')
    year = lines[0]  # the first line of each block is the year
    data[year] = {}
    subject = 'none'
    for s in lines[1:]:
        # short lines and lines starting with '=' are subject headers
        if len(s) < 5 or s[0] == '=':
            subject = s
            data[year][subject] = []
            continue
        # a student line: name, a tab, then "time percentage"
        name, result = s.split('\t')
        data[year][subject].append((name, result))
print(json.dumps(data, indent=4))
gives the following result:
{
"1999": {
"I": [
[
"Willem Jan van Steen",
"9859 77"
],
[
"Guillaume Kielmann",
"5264 77"
],
[
"Guillaume Bos",
"8200 6"
],
[
"Matty Klop",
"9066 42"
],
[
"Atze Klop",
"3318 45"
],
[
"Sven Kielmann",
"1160 63"
],
[
"Wartie Hijma",
"1904 65"
],
[
"Matty Evers",
"2516 100"
],
[
"Matty Bos",
"2941 99"
],
[
"Pieter van der Ploeg",
"8873 80"
],
[
"Jan Willem van Zeist",
"3934 95"
],
[
"Thilo van Steen",
"9665 61"
],
[
"Wan van Raamsdonk",
"1771 86"
],
[
"Henri Fokkink",
"7484 59"
],
[
"Jan Willem Evers",
"9709 82"
]
],
"=AI": [
[
"Sven Swarttouw",
"2604 73"
],
[
"Eline van Raamsdonk",
"9771 60"
],
[
"Herbert van der Ploeg",
"9325 41"
],
[
"Eline Hijma",
"430 23"
],
[
"Pieter Hijma",
"8203 65"
],
[
"Eline Silvis Cividjian",
"2700 79"
]
],
"=W": [
[
"Guillaume Zeggers",
"290 47"
],
[
"Natalia van Raamsdonk",
"2751 55"
],
[
"Wartie Zeggers",
"3079 92"
],
[
"Atze Swarttouw",
"9474 30"
],
[
"Rene Pierre",
"2125 62"
],
[
"Pieter van Mantgem",
"3023 67"
],
[
"Jan Willem Hijma",
"7441 86"
]
],
"=BWI": [
[
"Rene Zeggers",
"7679 8"
],
[
"Matty van Mantgem",
"7431 44"
],
[
"Sven van Raamsdonk",
"7248 46"
],
[
"Eline Pierre",
"5731 86"
],
[
"Maarten Kielmann",
"7162 59"
],
[
"Atze Zeggers",
"7065 72"
],
[
"Eline van Mantgem",
"830 78"
],
[
"Natalia van Steen",
"6321 49"
],
[
"Frank van Raamsdonk",
"1380 31"
],
[
"Pieter Bos",
"9639 94"
],
[
"Andy Zeggers",
"5232 78"
],
[
"Andy van Raamsdonk",
"1256 69"
],
[
"Eline Gude",
"4101 40"
],
[
"Matty Fokkink",
"9839 89"
],
[
"Natalia Hijma",
"203 11"
],
[
"Henri Bos",
"6728 66"
],
[
"Guillaume van der Ploeg",
"9998 48"
],
[
"Jan Willem van Steen",
"760 79"
],
[
"Matty Pierre",
"337 96"
],
[
"Wan Gude",
"3811 39"
]
],
"=ECTR": [
[
"Frank Swarttouw",
"6484 49"
],
[
"Wan Hijma",
"9845 36"
],
[
"Herbert Silvis Cividjian",
"1544 84"
],
[
"Natalia Kielmann",
"646 21"
]
]
},
"2002": {
"I": [
[
"Eline van Steen",
"7817 11"
],
[
"Andy van Steen",
"9212 51"
],
[
"Frank van Zeist",
"233 27"
],
[
"Rene Swarttouw",
"5695 68"
],
[
"Wan Bos",
"7039 29"
],
[
"Eline van der Ploeg",
"4410 99"
],
[
"Wartie van der Ploeg",
"2526 20"
],
[
"Sven Bos",
"4694 98"
],
[
"Wartie Swarttouw",
"5371 70"
],
[
"Thilo van Zeist",
"10009 77"
],
[
"Guillaume Fokkink",
"4125 86"
],
[
"Atze Bos",
"4227 97"
],
[
"Pieter Silvis Cividjian",
"9491 15"
],
[
"Sven Evers",
"6994 41"
]
],
"=AI": [
[
"Matty van Steen",
"9702 40"
],
[
"Thilo Silvis Cividjian",
"5553 42"
],
[
"Herbert van Raamsdonk",
"6867 90"
],
[
"Wartie Evers",
"2086 81"
],
[
"Jan Willem Bos",
"1566 92"
],
[
"Maarten van Mantgem",
"8960 92"
],
[
"Sven van Zeist",
"8629 74"
],
[
"Matty van Raamsdonk",
"496 41"
],
[
"Willem Jan Evers",
"1853 11"
],
[
"Guillaume van Zeist",
"9729 62"
],
[
"Maarten Klop",
"8653 74"
],
[
"Henri van der Ploeg",
"6755 39"
]
],
"=W": [
[
"Herbert Kielmann",
"2135 99"
],
[
"Andy van Mantgem",
"8033 49"
],
[
"Guillaume Gude",
"5356 52"
],
[
"Herbert Bos",
"1435 47"
],
[
"Pieter Gude",
"9460 36"
],
[
"Jan Willem van der Ploeg",
"8403 25"
],
[
"Wan van Mantgem",
"9672 68"
]
],
The names and scores can be printed like this:
for year in data.values():
    for subject in year.values():
        for student in subject:
            # student is (name, "time percentage"); print only the last number
            print("%s %s" % (student[0], student[1].split()[1]))
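Since the question asks for a list of the percentages rather than printed output, the same loop can collect them as integers instead. The sketch below is an illustration, not part of the answer above: collect_percentages is a made-up helper name, and the small data dictionary is hand-built in the same shape as the result shown earlier so that the snippet runs on its own.

```python
def collect_percentages(data):
    """Flatten data[year][subject] = [(name, "time percent"), ...] to ints."""
    percentages = []
    # Plain dicts are unordered in Python 2.7, so iterate over sorted keys
    # to get a repeatable order.
    for year in sorted(data):
        for subject in sorted(data[year]):
            for name, result in data[year][subject]:
                # result looks like "2604 73"; the last field is the percentage
                percentages.append(int(result.split()[-1]))
    return percentages


# Hand-built excerpt in the same shape as the parsed result.
data = {
    "1999": {"=AI": [("Sven Swarttouw", "2604 73"), ("Eline Hijma", "430 23")]},
    "2002": {"=W": [("Herbert Kielmann", "2135 99")]},
}
print(collect_percentages(data))  # [73, 23, 99]
```

The returned list can then be passed to whatever analysis functions you define yourself, for example to compute an average.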