Convert a text file into a DataFrame with multiple custom delimiters in Python
I am new to Python. I have a txt file that contains data such as:
0: 480x640 2 persons, 1 cat, 1 clock, 1: 480x640 2 persons, 1 chair, Done. date (0.635s) Tue, 05 April 03:54:02
0: 480x640 3 persons, 1 cat, 1 laptop, 1 clock, 1: 480x640 4 persons, 2 chairs, Done. date (0.587s) Tue, 05 April 03:54:05
0: 480x640 3 persons, 1 chair, 1: 480x640 4 persons, 2 chairs, Done. date (0.582s) Tue, 05 April 03:54:07
I want to convert it into a pandas DataFrame by splitting on multiple delimiters.
I tried this code:
import pandas as pd
student_csv = pd.read_csv('output.txt', names=['a', 'b','date','status'], sep='[0: 480x640, 1: 480x640 , date]')
student_csv.to_csv('txttocsv.csv', index = None)
Now, how can I convert it into a pandas DataFrame like this:
a b c
2 persons 2 persons, Done Tue, 05 April 03:54:02
How can I convert the text file into a DataFrame?
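It is tricky to know exactly what your splitting rules are, but you can use a regular expression as the separator.
Here is a working example that splits the two lists and the date into columns. You may need to adjust it to your exact rules: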
df = pd.read_csv('output.txt', sep=r'(?:,\s*|^)(?:\d+: \d+x\d+|Done[^)]+\)\s*)',
                 header=None, engine='python', names=(None, 'a', 'b', 'date')).iloc[:, 1:]
Output:
a b date
0 2 persons, 1 cat, 1 clock 2 persons, 1 chair Tue, 05 April 03:54:02
1 3 persons, 1 cat, 1 laptop, 1 clock 4 persons, 2 chairs Tue, 05 April 03:54:05
2 3 persons, 1 chair 4 persons, 2 chairs Tue, 05 April 03:54:07
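To see what that separator does, it can help to apply the same pattern to a single line with re.split. The sketch below is only an illustration: the sample line is copied from the question, and the names line, pattern, fields and bad_fields are made up for this example. It shows why the first column is empty (the pattern matches at the very start of the line), which is the column that names=(None, ...) and .iloc[:, 1:] discard. It also splits the same line with the bracketed pattern from the question, which is a character class and therefore matches any single one of the listed characters rather than the whole phrases.

```python
import re

# One sample line copied from the question.
line = ("0: 480x640 2 persons, 1 cat, 1 clock, 1: 480x640 2 persons, 1 chair, "
        "Done. date (0.635s) Tue, 05 April 03:54:02")

# Same pattern as the sep argument in the read_csv call above.
pattern = r'(?:,\s*|^)(?:\d+: \d+x\d+|Done[^)]+\)\s*)'

fields = re.split(pattern, line)
for field in fields:
    print(repr(field))

# The separator from the question: square brackets make a character class,
# so the line is cut at every single space, comma, colon, digit 0/1/4/6/8, etc.
bad_fields = re.split(r'[0: 480x640, 1: 480x640 , date]', line)
print(len(fields), len(bad_fields))
```

The list items keep a leading space because the space after 480x640 is not part of the separator; calling .str.strip() on the columns afterwards removes it.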
You can use | in the sep parameter to combine multiple delimiters:
df = pd.read_csv('data.txt', sep=r'0: 480x640|1: 480x640|date \(.*\)',
                 engine='python', names=('None', 'a', 'b', 'c')).drop('None', axis=1)
print(df)
a b \
0 2 persons, 1 cat, 1 clock, 2 persons, 1 chair, Done.
1 3 persons, 1 cat, 1 laptop, 1 clock, 4 persons, 2 chairs, Done.
2 3 persons, 1 chair, 4 persons, 2 chairs, Done.
c
0 Tue, 05 April 03:54:02
1 Tue, 05 April 03:54:05
2 Tue, 05 April 03:54:07
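With this separator the columns still carry the surrounding spaces, the trailing commas and the "Done." marker. If you want them removed, one possible cleanup is sketched below. This is an assumption about what the desired result looks like, not part of the answer above: io.StringIO stands in for data.txt so the snippet runs on its own, and the two sample lines are copied from the question.

```python
import io
import pandas as pd

# StringIO stands in for data.txt; the lines are copied from the question.
sample = io.StringIO(
    "0: 480x640 2 persons, 1 cat, 1 clock, 1: 480x640 2 persons, 1 chair, "
    "Done. date (0.635s) Tue, 05 April 03:54:02\n"
    "0: 480x640 3 persons, 1 chair, 1: 480x640 4 persons, 2 chairs, "
    "Done. date (0.582s) Tue, 05 April 03:54:07\n"
)

df = pd.read_csv(
    sample,
    sep=r'0: 480x640|1: 480x640|date \(.*\)',
    engine='python',
    names=('None', 'a', 'b', 'c'),
).drop('None', axis=1)

# Remove the whitespace left around each field by the separator.
for col in ('a', 'b', 'c'):
    df[col] = df[col].str.strip()

# Column a ends with a dangling comma; column b ends with ", Done."
df['a'] = df['a'].str.rstrip(', ')
df['b'] = df['b'].str.replace(r',\s*Done\.$', '', regex=True)

print(df)
```

If you would rather keep "Done" as its own status column, as the names list in the question suggests, adding Done\. as another alternative in the separator is a different option, but the column names would then need adjusting.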