Reading data from a txt file with a varying number of columns and saving it as a DataFrame
I have a data.txt file that looks like this:
1000
1 2 3
4 5 6
2000
11 12 13
14 15 16
I would like to convert it into a DataFrame like this:
1000 1 2 3
1000 4 5 6
2000 11 12 13
2000 14 15 16
I am new to Python and have tried several approaches, but none of them worked. Any help would be greatly appreciated.
import pandas as pd

# read the file one line per row, skipping blank lines
# (newer pandas versions raise an error for `sep='\n'` in `read_csv`),
# then use `str.split` to get the columns
with open('data.txt') as f:
    obj = pd.Series([line.strip() for line in f if line.strip()])
df = obj.str.split(expand=True)
# a label line (`1000` or `2000`) has a single value, so column 1 is null there
cond = df[1].isnull()
# column 4 stores the labels `1000` and `2000`;
# use `ffill()` to fill the missing values with the previous label
df.loc[cond, 4] = df.loc[cond, 0]
df[4] = df[4].ffill()
# reorder the columns and filter out the label rows
df = df.loc[~cond, [4, 0, 1, 2]]
df.to_csv('demo.txt', sep=' ', index=False, header=False)
# IPython/Jupyter shell command to show the written file
!cat demo.txt
# 1000 1 2 3
# 1000 4 5 6
# 2000 11 12 13
# 2000 14 15 16
df:
      4   0   1   2
1  1000   1   2   3
2  1000   4   5   6
4  2000  11  12  13
5  2000  14  15  16
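For comparison, the same idea (carry the most recent label forward onto each data row) can also be sketched with a plain Python loop before building the DataFrame. This is only an illustrative alternative, not the method above: the function name `read_labeled_blocks` is hypothetical, and it assumes every value is an integer and that a line holding a single value is always a label.

```python
import io

import pandas as pd


def read_labeled_blocks(lines):
    """Prefix each data row with the most recent single-value label line.

    Hypothetical helper: assumes all values are integers and that a line
    containing exactly one value is a label for the rows that follow it.
    """
    rows = []
    label = None
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) == 1:
            label = int(parts[0])  # remember the current label
        else:
            rows.append([label] + [int(p) for p in parts])
    return pd.DataFrame(rows)


# Sample input copied from the question, wrapped in a file-like object
sample = "1000\n1 2 3\n4 5 6\n2000\n11 12 13\n14 15 16\n"
result = read_labeled_blocks(io.StringIO(sample))
print(result.to_string(index=False, header=False))
```

With a real file, the same function could be called as `read_labeled_blocks(open('data.txt'))`, since it only needs an iterable of lines. Unlike the `str.split` approach, the resulting index is renumbered from 0 and the columns hold integers rather than strings.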