Convert a .txt file to .csv, where each line goes to a new column and each paragraph goes to a new row
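I'm fairly new to working with txt and json datasets. I have a dialog dataset in a txt file that I want to convert to a csv file, with each line becoming a column. When the next dialog begins (the next paragraph), it should start on a new row, so that I get data in the format
Header = ['Q1' , 'A1' , 'Q2' , 'A2' .......]
Here is the data for reference (this file is in txt format):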
dialog data
1 hello hello what can i help you with today
2 may i have a table in a moderate price range for two in rome with italian cuisine i'm on it
3 <SILENCE> ok let me look into some options for you
4 <SILENCE> api_call italian rome two moderate

1 hi hello what can i help you with today
2 can you make a restaurant reservation in a expensive price range with british cuisine in rome for eight people i'm on it
3 <SILENCE> ok let me look into some options for you
4 <SILENCE> api_call british rome eight expensive

1 hi hello what can i help you with today
2 may i have a table in london with spanish cuisine i'm on it
3 <SILENCE> how many people would be in your party
4 we will be six which price range are looking for
5 i am looking for a moderate restaurant ok let me look into some options for you
6 <SILENCE> api_call spanish london six moderate
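A CSV file is a list of strings separated by commas, with rows separated by newlines (\n).
Because of this simple layout, it is generally a poor fit for strings that may themselves contain commas, such as dialog.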
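To make the comma problem concrete, here is a minimal sketch using Python's standard csv module. The two utterances are made-up examples, not taken from the dataset above. It compares a naive comma join with csv.writer, which wraps any field containing a comma in double quotes.

```python
import csv
import io

# Hypothetical utterances; the second one contains a comma.
row = ["hello", "sure, let me check"]

# Naive joining makes the comma inside the text look like a column break.
naive_line = ",".join(row)

# csv.writer quotes the field that contains a comma.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
quoted_line = buffer.getvalue().strip()

# Parsing each line back shows the difference in column count.
naive_fields = next(csv.reader([naive_line]))
quoted_fields = next(csv.reader([quoted_line]))
print(len(naive_fields), len(quoted_fields))
```

The naive line parses back into three fields, while the quoted line round-trips to the original two.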
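That said, for your input file you can use a regular expression to replace every single newline with a comma, which effectively satisfies the requirement that "each new line becomes a column and each new paragraph becomes a new row".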
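import re
# Read the whole file at once so the blank lines between paragraphs stay visible.
with open('input.txt', 'r') as reader:
    text = reader.read()
# Match a newline that is neither preceded nor followed by another newline,
# so the blank line between paragraphs is left untouched.
text = re.sub(r"(?<!\n)\n(?!\n)", r",", text)
print(text)
with open('output.csv', 'w') as writer:
    writer.write(text)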
Working example here.
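If the output also needs the Q1, A1, Q2, A2 header from the question, and utterances may contain commas, a variant built on the csv module could look like the sketch below. It rests on assumptions the post does not confirm: that dialogs are separated by blank lines, that each line starts with a turn number, and that a tab separates the question from the answer within a line. The function names dialogs_to_rows and write_dialog_csv are hypothetical.

```python
import csv
import re

def dialogs_to_rows(text):
    """Turn blank-line-separated dialogs into lists of utterances."""
    rows = []
    # One or more blank lines separate paragraphs (dialogs).
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        row = []
        for line in paragraph.splitlines():
            # Drop the leading turn number, e.g. "1 hello" -> "hello".
            line = re.sub(r"^\d+\s+", "", line.strip())
            # Assumption: a tab separates the question from the answer.
            question, _, answer = line.partition("\t")
            row.extend([question, answer])
        rows.append(row)
    return rows

def write_dialog_csv(text, path):
    """Write one dialog per CSV row under a Q1, A1, Q2, A2, ... header."""
    rows = dialogs_to_rows(text)
    # Each turn contributes two columns; size the header to the longest dialog.
    turns = max(len(row) for row in rows) // 2
    header = [f"{kind}{i}" for i in range(1, turns + 1) for kind in ("Q", "A")]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return header, rows
```

Calling write_dialog_csv(open('input.txt').read(), 'output.csv') would then write one row per dialog. Shorter dialogs simply yield rows with fewer fields than the header, and csv.writer quotes any utterance that contains a comma.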