Using Python, how can I separate lines of text using pattern matching and store them in different text files?
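Below is a sample. The real log is very long, but I've only pasted a snippet of it.
I need to extract the lines between the ---------------------------------- separators and store each piece of information in its own text file.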
Like:
------------------
info1
------------------
info2
------------------
info3
------------------
Output:
fetch info1 and store it into file1.txt
fetch info2 and store it into file2.txt
fetch info3 and store it into file3.txt
And so on...
+++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++
**This is the text data :**
------------------------------------------------------------------------
revision88 106 | rohit | 2018-06-08 13:41:46 +0530 (Fri, 08 Jun 2018) | 1 line
initial code import from FinanavialAnalytics branch
------------------------------------------------------------------------
revision88 99 | dhammdip.sawate | 2018-06-04 20:59:48 +0530 (Mon, 04 Jun 2018) | 1 line
Added Little Bit Java Support.!
Index: resources.properties
===================================================================
--- resources.properties (revision 98)
+++ resources.properties (revision 99)
@@ -1,15 +1,15 @@
####################Elastsic Search#########################
ElasticClusterName=UProbe
-ElasticHost=192.168.0.91
+ElasticHost=192.168.0.73
ElasticPort=19300
-esSQLURL=http://192.168.0.91:19200/_sql?sql=
+esSQLURL=http://192.168.0.73:19200/_sql?sql=
resultsize =1024
@@ -72,45 +72,65 @@
secfile /home/sandeep/Desktop/LIC/Uprobe-LIC/Uprobe-Dev.seed
licfile /home/sandeep/Desktop/LIC/Uprobe-LIC/Uprobe-Dev.lic
------------------------------------------------------------------------
revision88 | sandeep.yadav | 2018-05-31 15:31:26 +0530 (Thu, 31 May 2018) | 1 line
Acc_Ref Data front-end side functionality with validation done.
------------------------------------------------------------------------
Try this:
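# Every line equal to the separator starts a new numbered file;
# anything before the first separator ends up in temp.txt.
separator = "------------------------------------------------------------------------"

with open("log.txt") as lg:
    fl = open("temp.txt", "w")
    cnt = 0
    for line in lg:
        # rstrip so a separator on the last line (no trailing newline) still matches
        if line.rstrip("\r\n") == separator:
            fl.close()
            cnt += 1
            fl = open("file{}.txt".format(cnt), "w")
        else:
            fl.write(line)
    fl.close()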
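A variant of the same line-by-line idea, sketched with `itertools.groupby`: consecutive lines are grouped by whether they are separator lines, and each non-separator group is written to its own file. It assumes a separator is any line made only of six or more dashes (the same `len > 5` threshold one of the other answers uses), and `split_log`, `is_separator` and `name_template` are hypothetical names. Unlike the loop above, it never creates a `temp.txt` or an empty file after the last separator.

```python
import itertools


def is_separator(line):
    """Assumed rule: a line made only of six or more dashes (ignoring whitespace)."""
    stripped = line.strip()
    return len(stripped) > 5 and set(stripped) == {"-"}


def split_log(log_path, name_template="file{}.txt"):
    """Stream log_path and write each block between separator lines to its own file.

    Returns the names of the files written, in order.
    """
    written = []
    with open(log_path) as log:
        # groupby yields runs of consecutive lines sharing the same is_separator() value
        for sep, group in itertools.groupby(log, key=is_separator):
            if sep:
                continue  # the separator line(s) themselves are not written anywhere
            lines = list(group)  # only one block is held in memory at a time
            if not any(line.strip() for line in lines):
                continue  # skip whitespace-only blocks, e.g. after the last separator
            name = name_template.format(len(written) + 1)
            with open(name, "w") as out:
                out.writelines(lines)
            written.append(name)
    return written
```

On the sample log in the question, `split_log("log.txt")` should write `file1.txt`, `file2.txt` and `file3.txt`. Lines such as `--- resources.properties (revision 98)` or the `=====` line are not treated as separators because they contain other characters. Any text before the first separator would become `file1.txt`.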
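This can be done even without regular expressions.
I'm assuming the main text file is named 'text.txt' and sits in the same directory, and that you want to save the new files in that same directory. Change the file paths to suit your needs. This should work for you: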
separator = '------------------------------------------------------------------------'

with open('./text.txt', 'r') as content:
    # Split on the separator and drop empty or whitespace-only pieces
    paragraphs = [p for p in content.read().split(separator) if p.strip()]

for index, para in enumerate(paragraphs):
    filepath = './new_file' + str(index) + '.txt'
    with open(filepath, 'w') as file:
        file.write(para)
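Why the pieces need filtering: `str.split` also returns whatever sits before the first separator and after the last one, even when that is an empty string or just a newline. A small illustration, where `SEP` is only a short stand-in for the long dash line:

```python
# SEP stands in for the long line of dashes used in the real log
text = "SEP\ninfo1\nSEP\ninfo2\nSEP\n"

pieces = text.split("SEP")
print(pieces)  # ['', '\ninfo1\n', '\ninfo2\n', '\n']

# Keep only pieces that contain something other than whitespace
kept = [p for p in pieces if p.strip()]
print(kept)  # ['\ninfo1\n', '\ninfo2\n']
```

A filter on `x != ''` alone removes the leading `''` but keeps the trailing `'\n'`, which would produce an extra, nearly empty file; testing `p.strip()` drops both.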
If the log file isn't too large (say, not in the 1 GB range), you can read it all into memory and split it:
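with open('log.log') as f:
    content = f.read()

chunks = content.split('------------------------------------------------------------------------')
# Skip empty pieces (e.g. before the first separator) so info1.txt holds the first entry
chunks = [c for c in chunks if c.strip()]
for idx, info in enumerate(chunks):
    with open('info{}.txt'.format(idx + 1), 'w') as f:
        f.write(info)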
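If you'd rather keep the splitting separate from the file writing (handy for testing), here is a sketch using `pathlib`. The names `split_entries`, `write_entries` and `out_dir` are hypothetical, the 72-dash `SEPARATOR` is an assumption based on the Subversion log format shown in the question, and stripping the surrounding newlines from each entry is a choice rather than something the question asks for.

```python
from pathlib import Path

SEPARATOR = "-" * 72  # assumed: the svn log separator line is 72 dashes


def split_entries(text, separator=SEPARATOR):
    """Return the non-empty chunks between separators, without surrounding newlines."""
    return [chunk.strip("\n") for chunk in text.split(separator) if chunk.strip()]


def write_entries(entries, out_dir="."):
    """Write entries to info1.txt, info2.txt, ... inside out_dir; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for idx, entry in enumerate(entries, start=1):
        path = out / "info{}.txt".format(idx)
        path.write_text(entry + "\n")
        paths.append(path)
    return paths
```

Usage would be something like `write_entries(split_entries(Path("log.log").read_text()))`. Like the answer above, this still reads the whole file into memory.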
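I figured the "========..." line is also one of the patterns, so I used the re module; that way you can add more patterns if needed.
[The original version used re.compile("-+|=+").]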
import re

with open("file.txt", "r") as input_file:
    text = input_file.read()

# Find every run of dashes; only runs longer than 5 characters count as separators
regex = re.compile(r"-+")
mo = regex.findall(text)
lines = text.split("\n")
mo_wanted_patterns = [pattern for pattern in mo if len(pattern) > 5]
print(mo_wanted_patterns)

output_text = []
for index, line in enumerate(lines):
    # A line is a separator only if the whole line is one of those long dash runs
    if line in mo_wanted_patterns:
        filepath = 'new_file' + str(index) + '.txt'
        with open(filepath, 'w') as file:
            file.write("\n".join(output_text))
        output_text = []
    else:
        output_text.append(line)
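Edit: I noticed my code is a lot more complex than what the others have provided. Using a regular expression makes things more complicated, but I'd be curious to know whether it works for you.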
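A shorter regex route, sketched under the assumption that a separator is a line consisting only of six or more dashes (mirroring the `len(pattern) > 5` check above). Anchoring the pattern with `^` and `$` under `re.MULTILINE` makes `re.split` cut only at whole separator lines, so diff lines such as `--- resources.properties (revision 98)` or `-ElasticHost=...` are left alone. Adding `|^=+$` would also split on the `=====` line and break up the diff in revision 99, so it is left out. `split_on_separators` is a hypothetical name.

```python
import re

# Assumed rule: a separator is a line made only of 6+ dashes (trailing spaces allowed)
SEPARATOR_LINE = re.compile(r"^-{6,}[ \t]*$", re.MULTILINE)


def split_on_separators(text):
    """Split text at separator lines and drop whitespace-only pieces."""
    return [piece.strip("\n") for piece in SEPARATOR_LINE.split(text) if piece.strip()]


# A trimmed excerpt of the log from the question
sample = (
    "------------------------------------------------------------------------\n"
    "revision88 99 | dhammdip.sawate | 2018-06-04 20:59:48 +0530 (Mon, 04 Jun 2018) | 1 line\n"
    "--- resources.properties (revision 98)\n"
    "-ElasticHost=192.168.0.91\n"
    "------------------------------------------------------------------------\n"
    "revision88 | sandeep.yadav | 2018-05-31 15:31:26 +0530 (Thu, 31 May 2018) | 1 line\n"
    "------------------------------------------------------------------------\n"
)

for i, entry in enumerate(split_on_separators(sample), start=1):
    print("file{}.txt <- {!r}".format(i, entry))
```

Writing each entry to `file1.txt`, `file2.txt`, ... then works the same way as in the other answers.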