RegEx Python Find and Print to a new document

Question

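Sorry if this is a bunch of dumb questions, but I have a few things to ask. Basically, what I want to do is take a file that is being sent to me, in which a bunch of data that should be on separate lines is clumped together, sort it out, and then print each statement on its own line. What I don't know is how to create a new document for everything to be dumped into, nor how to print into that document with each item on its own new line.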

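I decided to try tackling this with regular expressions and Python. I want my code to look for any one of four specific strings (MTH|, SCN|, ENG|, or HST|) and then copy everything after it until it runs into one of those four strings again. At that point I need it to stop, record everything it copied, and then start copying the new string. I need it to read past newlines and ignore them, which I hope to accomplish with re.DOTALL.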

Basically, I want my code to take something like this:

MTH|stuffstuffstuffSCN|stuffstuffstuffENG|stuffstuffstuffHST|stuffstu
ffstuffSCN|stuffstuffstuffENG|stuffstuffstuffHST|stuffstuffstuffMTH|s
tuffstuffstuffSCN|stuffstuffstuffENG|stuffstuffstuff

And turn it into something nice and readable like this:

MTH|stuffstuffstuff

SCN|stuffstuffstuff 

ENG|stuffstuffstuff

HST|stuffstuffstuff

SCN|stuffstuffstuff

ENG|stuffstuffstuff

HST|stuffstuffstuff

MTH|stuffstuffstuff

SCN|stuffstuffstuff

ENG|stuffstuffstuff

while also creating a new document and pasting it all into that .txt file. So far, my code looks like this:

import re
re.DOTALL
from __future__ import print_function
NDoc = raw_input("Enter name of to-be-made document")
log = open("C:\Users\XYZ\Desktop\Python\NDoc.txt", "w")
#Need help with this^ How do I make new file instead of opening a file?

nl = list()
file = raw_input("Enter a file to be sorted")
xfile = open(file)

for line in xfile:
        l=line.strip()
        n=re.findall('^([MTH|SCN|ENG|HST][|].)$[MTH|SCN|ENG|HST][|]',l)
                           #Edited out some x's here that I left in, sorry
            if len(n) > 0:
                nl.append(n)
for item in nl:
    print(item, file = log)

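In the starting file, stuffstuffstuff can be numbers, letters, and various symbols (including |), but MTH|, SCN|, ENG|, and HST| never occur anywhere other than where they are supposed to be, so I want to look specifically for those 4 strings as my start and end.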

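Other than being able to create a new document and paste each item in the list onto a separate line in it, will the above code do what I'm trying to do? Can I scan .txt files and Excel files? I won't have a file to test it on until Friday, but I should have most of the work done by then.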

Oh, and also, for things like this:

import.re
re.DOTALL
from __future__ import print_function

Do I have to set anything up externally? Are these plugins or things I need to import, or are they all built into Python?

Answer 1

This regex will take your string and put newlines between each of the strings you want separated:

re.sub("(\B)(?=((MTH|SCN|ENG|HST)[|]))","\n\n",line)

Here's the code I used to test it:

from __future__ import print_function
import re
#NDoc = raw_input("Enter name of to-be-made document")
#log = open("C:\Users\XYZ\Desktop\Python\NDoc.txt", "w")
#Need help with this^ How do I make new file instead of opening a file?

#nl = list()
#file = raw_input("Enter a file to be sorted")
xfile = open("file2")

for line in xfile:
    l = line.strip()
    # Insert a blank line before every marker that follows other text
    n = re.sub(r"(\B)(?=((MTH|SCN|ENG|HST)[|]))", "\n\n", l)
    if len(n) > 0:
        nl = n.split("\n")
        for item in nl:
            print(item)

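I tested this version with input data that had no newlines. I also have a version that works with newlines. If this doesn't work, let me know and I'll post that version.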

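The main environment changes I made are that I'm reading from a file named "file2" located in the same directory as the Python script, and I'm just writing the output to the screen.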
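
As a quick check of what the substitution does, the sketch below runs the pattern on a shortened version of the sample from the question (assuming Python 3). One thing worth knowing: `\B` in front of a marker only matches when the preceding character is a letter, digit, or underscore, so a record that ends in a symbol such as `|` would not be split. `ANYWHERE_PATTERN` is a hypothetical variant, not part of the answer above, that inserts the break before every marker except one at the very start of the string.

```python
import re

# Pattern from the answer: a non-word-boundary followed by one of the markers.
ANSWER_PATTERN = r"(\B)(?=((MTH|SCN|ENG|HST)[|]))"
# Hypothetical variant (an assumption, not the answer's pattern):
# any position except the start of the string, followed by a marker.
ANYWHERE_PATTERN = r"(?!^)(?=(?:MTH|SCN|ENG|HST)[|])"

# Shortened sample in the style of the question's data.
sample = "MTH|stuffSCN|stuffENG|stuffHST|stuff"
separated = re.sub(ANSWER_PATTERN, "\n\n", sample)
print(separated)

# A record that ends in a symbol right before the next marker.
tricky = "MTH|a+b|SCN|c"
unchanged = re.sub(ANSWER_PATTERN, "\n\n", tricky)   # \B does not match after "|"
fixed = re.sub(ANYWHERE_PATTERN, "\n\n", tricky)     # the variant still splits
print(repr(unchanged))
print(repr(fixed))
```

Since the question says the data can contain symbols (including `|`), the variant may be the safer choice, but that depends on what the real file looks like.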

The version below assumes there are newlines in your data and simply reads the whole file at once:

from __future__ import print_function
import re
#NDoc = raw_input("Enter name of to-be-made document")   
#log = open("C:\Users\XYZ\Desktop\Python\NDoc.txt", "w")
#Need help with this^ How do I make new file instead of opening a file?

#nl = list()
#file = raw_input("Enter a file to be sorted")
xfile = open("file")

line = xfile.read()
l=line.strip()
l=re.sub("\n","",l)
n=re.sub("(\B)(?=((MTH|SCN|ENG|HST)[|]))","\n\n",l)
print(n)
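
Both versions above print to the screen, so the part of the question about creating a new document is still open. In Python, `open(path, "w")` creates the file if it does not exist (and overwrites it if it does), so no separate "create" step is needed. Both `re` and `__future__` ship with Python's standard library, so nothing has to be installed. What follows is a minimal sketch, assuming Python 3; the function name `split_to_file` and the file names in the usage note are hypothetical, and it uses `re.findall` rather than the `re.sub` approach of the answer.

```python
import re

# One record: a marker, then as little text as possible up to the next
# marker or the end of the text.
RECORD = re.compile(
    r"(?:MTH|SCN|ENG|HST)[|].*?(?=(?:MTH|SCN|ENG|HST)[|]|\Z)"
)

def split_to_file(src_path, dst_path):
    """Write each MTH|/SCN|/ENG|/HST| record from src_path on its own line in dst_path."""
    with open(src_path) as src:
        text = src.read()
    # Remove line breaks first, so a record (or even a marker) that was
    # wrapped across two lines is joined back together.
    text = text.replace("\r", "").replace("\n", "")
    records = RECORD.findall(text)
    # Mode "w" creates dst_path if it is missing.
    with open(dst_path, "w") as dst:
        for record in records:
            dst.write(record + "\n")
    return records
```

A call might look like `split_to_file("input.txt", "NDoc.txt")`. For a Windows path, a raw string such as `r"C:\Users\XYZ\Desktop\Python\NDoc.txt"` avoids backslash-escape problems. Any text that appears before the first marker is dropped by this sketch.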

python

regex

export