Convert a .txt file to .csv, where each line goes to a new column and each paragraph goes to a new row

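I'm fairly new to working with txt and json datasets. I have a dialog dataset in a txt file that I want to convert to a csv file, with each line becoming a column. When the next dialog begins (the next paragraph), it should start a new row, so that I end up with data in this format: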
Header = ['Q1' , 'A1' , 'Q2' , 'A2' .......]

Here is the data for reference (this file is in txt format): dialog data

1 hello hello what can i help you with today
2 may i have a table in a moderate price range for two in rome with italian cuisine i'm on it
3 <SILENCE> ok let me look into some options for you
4 <SILENCE> api_call italian rome two moderate

1 hi    hello what can i help you with today
2 can you make a restaurant reservation in a expensive price range with british cuisine in rome for eight people    i'm on it
3 <SILENCE> ok let me look into some options for you
4 <SILENCE> api_call british rome eight expensive

1 hi    hello what can i help you with today
2 may i have a table in london with spanish cuisine i'm on it
3 <SILENCE> how many people would be in your party
4 we will be six    which price range are looking for
5 i am looking for a moderate restaurant    ok let me look into some options for you
6 <SILENCE> api_call spanish london six moderate

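A CSV file is a list of strings separated by commas, with rows separated by newline characters (\n).

Because of this simple layout, it is generally a poor fit for strings that may themselves contain commas, such as dialog, unless those fields are quoted.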
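To illustrate the comma problem, here is a minimal sketch using Python's standard csv module. The helper name to_csv_line and the sample strings are made up for this example; the point is only that a naive join splits a field containing a comma, while csv.writer quotes it.

```python
import csv
import io

def to_csv_line(fields):
    """Serialize one row with csv.writer, which quotes fields containing commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(fields)
    return buffer.getvalue()

fields = ["yes, please", "ok"]   # hypothetical two-column row

naive = ",".join(fields)         # the comma inside the first field is not protected
quoted = to_csv_line(fields)     # the first field is wrapped in double quotes

# Reading the naive line back yields three columns instead of two.
print(next(csv.reader([naive])))
print(next(csv.reader([quoted.rstrip("\n")])))
```
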

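That said, for your input file you can use a regular expression to replace every single newline with a comma, which effectively satisfies the requirement that "each new line becomes a column and each new paragraph becomes a new row".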

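import re

# Read the whole dialog file into one string.
with open('input.txt', 'r') as reader:
    text = reader.read()

# Replace the newline that ends each non-blank line with a comma.
# (...) captures the three characters before the newline and \1 puts them
# back, so the blank line between paragraphs is not matched and its newline
# ends the row. This assumes every line has at least three characters;
# each row ends with a trailing comma.
text = re.sub(r"(...)\n", r"\1,", text)
print(text)

with open('output.csv', 'w') as writer:
    writer.write(text)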
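If the dialog lines can contain commas, an alternative is to split the text into paragraphs and let the csv module handle the quoting. The following is only a sketch under a few assumptions: paragraphs are separated by one or more blank lines, each line becomes one column, and the function names and the line_1, line_2, ... header are placeholders invented for this example. Splitting each line further into a question and an answer (Q1, A1, ...) would need a reliable separator, such as a tab, inside each line.

```python
import csv
import io
import re

def dialogs_to_rows(text):
    """Return one list of lines per paragraph (dialog)."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return [
        [line.strip() for line in paragraph.splitlines() if line.strip()]
        for paragraph in paragraphs
    ]

def rows_to_csv(rows):
    """Write the rows as CSV text, padding short rows with empty cells."""
    width = max(len(row) for row in rows)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Placeholder header: one column name per line position.
    writer.writerow([f"line_{i}" for i in range(1, width + 1)])
    for row in rows:
        writer.writerow(row + [""] * (width - len(row)))
    return buffer.getvalue()

# Small inline sample standing in for the contents of the txt file.
sample = "1 hi\n2 a table, please\n\n1 hello\n"
print(rows_to_csv(dialogs_to_rows(sample)))
```

To use it on real files, read input.txt into a string as above and write the returned text to output.csv; opening the output file with newline="" avoids extra blank lines on Windows.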

Working example here