Search for the same string repeated in a line of a file
In a file, I have several lines with this structure:
> Present one time: "Instance: ...Edition: ..."
> Present two times: "Instance: ...Edition: ...Instance: ...Edition: ..."
> Present n times: "Instance: ...Edition: ... [n] Instance: ...Edition: ..."
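The structure can appear once per line or several times on the same line. The idea is to read the file line by line, isolate the values represented by "..." and write them to an Excel file. I can do this, but only when the structure appears once in a line. If the structure appears more than once on a line, only the values of the first occurrence are saved.
This is my code: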
#READ FILE
for i in fin:
    if "Instance:" in i:
        instance = ((i.split('Instance:'))[1].split('Edition')[0])
        worksheet.write(row, col, instance)
    if "Edition:" in i:
        edition = ((i.split('Edition:'))[1].split('\n')[0])
        worksheet.write(row, col, edition)
    row += 1
Any idea how to solve this?
Note that this only works if your input ends with a newline character.
If it does not, you can add one like this: s += '\n'
s = '''Instance: A Edition: Limited
Instance: B Edition: Common Instance: C Edition: 2020 Instance: D Edition: Bla
'''
result = []
# Indices where the current Instance and Edition values begin
start_in = start_ed = None
for i in range(len(s)):
    # Reaching the end of a data item
    if s[i:i+9] == 'Instance:' or s[i] == '\n':
        if start_in and start_ed:
            # The Instance value ends where the 8-character 'Edition:' label starts
            result.append(
                (s[start_in:start_ed-8].strip(), s[start_ed:i].strip())
            )
        start_in = start_ed = None
    if s[i:i+9] == 'Instance:':
        start_in = i+9
    if s[i:i+8] == 'Edition:':
        start_ed = i+8
print(result)
[('A', 'Limited'), ('B', 'Common'), ('C', '2020'), ('D', 'Bla')]
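To tie this back to the line-by-line loop in the question, the scan can be wrapped in a function and called once per line. This is only a minimal sketch, not part of the original answer: `extract_pairs` and `rows_for_lines` are hypothetical names, the lines come from a list so the example runs on its own, and the column numbers in the comment are assumptions.

```python
def extract_pairs(line):
    """Return every (instance, edition) pair found in one line."""
    # The scan relies on a trailing newline, so add one if it is missing.
    if not line.endswith('\n'):
        line += '\n'
    pairs = []
    start_in = start_ed = None
    for i in range(len(line)):
        # A new 'Instance:' label or the end of the line closes the current pair.
        if line.startswith('Instance:', i) or line[i] == '\n':
            if start_in is not None and start_ed is not None:
                pairs.append((line[start_in:start_ed - 8].strip(),
                              line[start_ed:i].strip()))
            start_in = start_ed = None
        if line.startswith('Instance:', i):
            start_in = i + 9
        if line.startswith('Edition:', i):
            start_ed = i + 8
    return pairs


def rows_for_lines(lines):
    """Collect the pairs of all lines into one list, one pair per output row."""
    rows = []
    for line in lines:
        rows.extend(extract_pairs(line))
    return rows


lines = [
    'Instance: A Edition: Limited\n',
    'Instance: B Edition: Common Instance: C Edition: 2020\n',
    'no structure on this line\n',
]
for row, (instance, edition) in enumerate(rows_for_lines(lines)):
    # In the question's script, worksheet.write(row, 0, instance) and
    # worksheet.write(row, 1, edition) would go here (column numbers assumed).
    print(row, instance, edition)
```

Because every pair gets its own row index, a line that contains the structure several times produces several rows instead of overwriting one.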
Edit: with a Version field, as requested
s = '''Instance: A Edition: Limited Version: 1
Instance: B Edition: Common Version: 2 Instance: C Edition: 2020 Version: 3 Instance: D Edition: Bla Version: 4
'''
result = []
# Indices where the current Instance, Edition and Version values begin
start_in = start_ed = start_vs = None
for i in range(len(s)):
    # Reaching the end of a data item
    if s[i:i+9] == 'Instance:' or s[i] == '\n':
        if start_in and start_ed and start_vs:
            # Each value ends where the next 8-character label starts
            result.append((
                s[start_in:start_ed-8].strip(),
                s[start_ed:start_vs-8].strip(),
                s[start_vs:i].strip()
            ))
        start_in = start_ed = start_vs = None
    if s[i:i+9] == 'Instance:':
        start_in = i+9
    if s[i:i+8] == 'Edition:':
        start_ed = i+8
    if s[i:i+8] == 'Version:':
        start_vs = i+8
print(result)
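Since each additional field repeats the same pattern, the scan could be generalized to a list of labels. The following is a hedged sketch rather than the answer's own code: `extract_records` is a hypothetical helper, and it assumes the labels always appear in the given order within a record.

```python
def extract_records(text, labels):
    """Scan text once and return one tuple of values per complete record.

    labels lists the field labels in the order they appear in a record,
    for example ['Instance:', 'Edition:', 'Version:'].
    """
    if not text.endswith('\n'):
        text += '\n'
    records = []
    starts = [None] * len(labels)  # index where each field's value begins
    for i in range(len(text)):
        # The first label or a newline marks the end of the current record.
        if text.startswith(labels[0], i) or text[i] == '\n':
            if all(start is not None for start in starts):
                # Each value ends where the next label begins; the last ends at i.
                ends = [starts[k + 1] - len(labels[k + 1])
                        for k in range(len(labels) - 1)] + [i]
                records.append(tuple(text[b:e].strip()
                                     for b, e in zip(starts, ends)))
            starts = [None] * len(labels)
        for k, label in enumerate(labels):
            if text.startswith(label, i):
                starts[k] = i + len(label)
    return records


sample = '''Instance: A Edition: Limited Version: 1
Instance: B Edition: Common Version: 2 Instance: C Edition: 2020 Version: 3
'''
print(extract_records(sample, ['Instance:', 'Edition:', 'Version:']))
```

Adding another field then only means appending its label to the list, as long as the order assumption holds.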
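An alternative solution using a regular expression. This is shorter but may be harder to read and maintain:
import re
# s is the two-field sample string from the first example.
# Inside a character class, | is a literal pipe, so the class is written [\w\s].
r = re.findall(r'Instance:([\w\s]+?)Edition:([\w\s]+?)(?=Instance|\n)', s)
print(r)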
[(' A ', ' Limited'), (' B ', ' Common '), (' C ', ' 2020 '), (' D ', ' Bla')]
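If you don't want whitespace around the matches, you can apply strip to all elements as I did in the other solution, or you can modify the regular expression to read Instance: ([\w...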
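The first option, applying strip to every captured value, might look like the sketch below. The name `cleaned` is introduced here for illustration only, and the character class is written `[\w\s]`, since a `|` inside square brackets would be treated as a literal pipe.

```python
import re

s = '''Instance: A Edition: Limited
Instance: B Edition: Common Instance: C Edition: 2020 Instance: D Edition: Bla
'''

matches = re.findall(r'Instance:([\w\s]+?)Edition:([\w\s]+?)(?=Instance|\n)', s)
# Strip the surrounding whitespace from every captured value.
cleaned = [tuple(value.strip() for value in match) for match in matches]
print(cleaned)
# [('A', 'Limited'), ('B', 'Common'), ('C', '2020'), ('D', 'Bla')]
```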