Open a file, reformat, and write to a new file in Python 3
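I'm new to Python (a few weeks in). I'm taking the Python for Everybody course on Coursera and decided to extend some of the ideas into an app I want to write.
I want to take a txt file of writing quotes, strip out some unneeded characters and newlines, and then write the newly formatted strings to a new file. That file will be used to display random quotes in the terminal (the latter isn't needed here).
Entries in the txt file look like this: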
“The road to hell is paved with works-in-progress.”
—Philip Roth, WD some other stuff here
“Some other quote.”
—Another Author, Blah blah
I want to write the following to a new file:
"The road to hell is paved with works-in-progress." —Philip Roth
"Some other quote." —Another Author
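I want to remove the newline between the quote and the author and replace it with a space. I also want to remove everything from the comma after the author onward (so each entry is just: quote [space] author). The file has 73 of these, so I want to run through the file making these changes and then write the newly formatted quotes to a new file. The final output would just be: "blah blah blah" -Author
I've tried various approaches, and I'm currently iterating through the file in a for loop, writing the two segments into lists that I thought I would then join. But I'm stuck: now that I have the two lists, I can't seem to join them, and I'm not sure whether this is the right way to do it or whether it's overkill. Any help would be greatly appreciated. Any thoughts?
Code so far: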
fh = open('quotes_source.txt')
quote = list()
author = list()
for line in fh:
    # Find quote segment and assign to a string variable
    if line.startswith('“'):
        phrase_end = line.find('”')+1
        phrase_start = line.find('“')
        phrase = line[phrase_start:phrase_end]
        quote.append(phrase)
    # Find author segment and assign to a string variable
    if line.startswith('—'):
        name_end = line.find(',')
        name = line[:name_end]
        author.append(name)
print(quote)
print(author)
quote_line="“The road to hell is paved with works-in-progress.”\n—Philip Roth, WD some other stuff here\n"
quote_line=quote_line.replace("\n","")
quote_line=quote_line.split(",")
formatted_quote=""
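If you aren't sure there is only one comma in a line:
“Tit for tat.”\n—Someone Roth, blah blah\n #only one comma
“Tit for tat, tit for tat”\n—Someone Roth, blah blah\n #more than one comma
len_quote_list=len(quote_line)-1
# concatenate every part except the last one (the text after the final comma);
# note that the commas consumed by split() are not put back
for part in range(0,len_quote_list):
    formatted_quote+=quote_line[part]
formatted_quote+="\n"
or, when there is only one comma:
formatted_quote=quote_line[0]+"\n"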
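The snippet above removes the newline without putting a space in its place, and its loop drops any commas that belong to the quote itself. As a minimal sketch of one way around both, assuming each entry is exactly two lines (quote first, author second), you could split on the first newline and then cut the author line at its first comma. The helper name format_entry is made up for illustration:

```python
def format_entry(entry):
    """Turn a two-line 'quote / —author, extra' entry into 'quote —author'."""
    # Split only at the first newline: quote on line one, author on line two.
    quote, author_line = entry.strip().split("\n", 1)
    # Cut the author line at its first comma; commas inside the quote are untouched.
    author = author_line.split(",", 1)[0]
    return quote.strip() + " " + author.strip()


one_comma = "“Tit for tat.”\n—Someone Roth, blah blah\n"
many_commas = "“Tit for tat, tit for tat”\n—Someone Roth, blah, blah\n"
print(format_entry(one_comma))
print(format_entry(many_commas))
```

Because the comma split is applied to the author line only, it does not matter how many commas the quote or the trailing text contains. An entry without a newline would raise a ValueError here, so real data may need a check first.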
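You don't need regex for a simple task like this. You were actually on the right track, but you got bogged down trying to parse everything instead of just streaming the file and deciding where to cut.
Based on your data, you want to cut on the lines that begin with — (denoting the author), and you want to cut such a line from its first comma onward. Presumably you also want to drop empty lines. So a simple stream modifier would look like: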
# open quotes_source.txt for reading and quotes_processed.txt for writing
with open("quotes_source.txt", "r", encoding="utf-8") as f_in,\
        open("quotes_processed.txt", "w", encoding="utf-8") as f_out:
    for line in f_in:  # read the input file line by line
        line = line.strip()  # strip leading/trailing whitespace, including the newline
        if not line:  # ignore blank lines
            continue
        if line[0] == "—":  # we found the dash!
            # write a space, everything up to the first comma, and a newline at the end
            f_out.write(" " + line.split(",", 1)[0] + "\n")
        else:
            f_out.write(line)  # a quote line, write it immediately
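And that's all there is to it. As long as there are no other newlines in your data, it will produce the result you want, i.e. for a quotes_source.txt file containing: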
“The road to hell is paved with works-in-progress.”
—Philip Roth, WD some other stuff here
“The only thing necessary for the triumph of evil is for good men to do nothing.”
—Edmund Burke, whatever there is
“You know nothing John Snow.”
—The wildling Ygritte, "A Dance With Dragons" - George R.R. Martin
it will produce a quotes_processed.txt file containing:
“The road to hell is paved with works-in-progress.” —Philip Roth
“The only thing necessary for the triumph of evil is for good men to do nothing.” —Edmund Burke
“You know nothing John Snow.” —The wildling Ygritte
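If you would rather keep the two-list approach from your question, the built-in zip() is one way to join the lists, since it pairs items by position. This is only a sketch: join_quotes is a hypothetical helper name, and it assumes every quote has exactly one author so that both lists end up the same length.

```python
def join_quotes(quotes, authors):
    """Pair each quote with the author at the same position, separated by a space."""
    # zip() stops at the shorter list, so a missing author would silently
    # drop a quote; this sketch assumes both lists have the same length.
    return [q + " " + a for q, a in zip(quotes, authors)]


quote = ["“The road to hell is paved with works-in-progress.”", "“Some other quote.”"]
author = ["—Philip Roth", "—Another Author"]
for entry in join_quotes(quote, author):
    print(entry)
```

The streaming version avoids holding anything in memory and cannot get the two lists out of step, which is why it is usually the simpler choice for this kind of file.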