将多列折叠(或组合)成两个单独的列 python
Collapse(or combine) multiple columns into two separate columns python
我有一个dataframe
如下图:
8964_real 8964_imag 8965_real 8965_imag 8966_real 8966_imag 8967_real ... 8984_imag 8985_real 8985_imag 8986_real 8986_imag 8987_real 8987_imag
0 112.653120 0.000000 117.104887 0.000000 127.593406 0.000000 129.522106 ... 0.000000 125.423552 0.000000 127.888477 0.000000 136.160979 0.000000
1 -0.315831 16.363974 -2.083329 22.443628 -2.166950 15.026253 0.110502 ... -26.613220 8.454297 -35.000742 11.871405 -24.914035 7.448329 -16.370041
2 -1.863497 10.672129 -6.152232 15.980813 -5.679352 18.976117 -5.775777 ... -11.131600 -18.990022 -9.520732 -11.947319 -4.641286 -17.104710 -5.691642
3 -6.749938 14.870590 -12.222749 15.012352 -10.501423 9.345518 -9.103459 ... -2.860546 -29.862724 -5.237663 -28.791194 -5.685985 -24.565608 -10.385683
4 -2.991405 -10.332938 -4.097638 -10.204587 -12.056221 -5.684882 -12.861357 ... 0.821902 -8.787235 -1.521650 -3.798446 -2.390519 -6.527762 -1.145998
我必须转换上面的数据框,这样 "_real"
列中的值应该在一列下,"_imag"
下的值应该在另一列下
最后应该有两列,一列用于 real and other for imag
。最有效的方法是什么?
我参考这个 link 。但这对一个专栏来说很好,但我需要两个。
另一个想法,我得到的是使用 regex to select columns containing "real"
并按照上面所说的 link 做(对于 imag 也类似),但觉得它有点圆。
感谢任何帮助。
编辑:
例如,real
应该像
real
112.653120
-0.315831
-1.863497
-6.749938
-2.991405
---------
117.104887
-2.083329
-6.152232
-12.222749
-4.097638
---------
127.593406
-2.166950
-5.679352
-10.501423
-12.056221
我画了虚线表示清楚
通过 split
创建 MultiIndex
,因此可以通过 DataFrame.stack
重塑:
df.columns = df.columns.str.split('_', expand=True)
print (df.head(10))
8964 8965 8966 \
real imag real imag real imag
0 112.653120 0.000000 117.104887 0.000000 127.593406 0.000000
1 -0.315831 16.363974 -2.083329 22.443628 -2.166950 15.026253
2 -1.863497 10.672129 -6.152232 15.980813 -5.679352 18.976117
3 -6.749938 14.870590 -12.222749 15.012352 -10.501423 9.345518
4 -2.991405 -10.332938 -4.097638 -10.204587 -12.056221 -5.684882
8967 8984 8985 8986 \
real imag real imag real imag
0 129.522106 0.000000 125.423552 0.000000 127.888477 0.000000
1 0.110502 -26.613220 8.454297 -35.000742 11.871405 -24.914035
2 -5.775777 -11.131600 -18.990022 -9.520732 -11.947319 -4.641286
3 -9.103459 -2.860546 -29.862724 -5.237663 -28.791194 -5.685985
4 -12.861357 0.821902 -8.787235 -1.521650 -3.798446 -2.390519
8987
real imag
0 136.160979 0.000000
1 7.448329 -16.370041
2 -17.104710 -5.691642
3 -24.565608 -10.385683
4 -6.527762 -1.145998
df = df.stack(0).reset_index(level=0, drop=True).rename_axis('a').reset_index()
print (df.head(10))
a imag real
0 8964 0.000000 112.653120
1 8965 0.000000 117.104887
2 8966 0.000000 127.593406
3 8967 NaN 129.522106
4 8984 0.000000 NaN
5 8985 0.000000 125.423552
6 8986 0.000000 127.888477
7 8987 0.000000 136.160979
8 8964 16.363974 -0.315831
9 8965 22.443628 -2.083329
编辑:对于新的数据结构,可以通过 ravel
:
重塑值
a = df.filter(like='real')
b = df.filter(like='imag')
c = a.columns.str.replace('_real', '').astype(int)
print (c)
Int64Index([8964, 8965, 8966, 8967, 8985, 8986, 8987], dtype='int64')
df = pd.DataFrame({'r':a.T.to_numpy().ravel(), 'i':b.T.to_numpy().ravel()},
index=np.tile(c, len(df)))
print (df.head(10))
r i
8964 112.653120 0.000000
8965 -0.315831 16.363974
8966 -1.863497 10.672129
8967 -6.749938 14.870590
8985 -2.991405 -10.332938
8986 117.104887 0.000000
8987 -2.083329 22.443628
8964 -6.152232 15.980813
8965 -12.222749 15.012352
8966 -4.097638 -10.204587
我有一个dataframe
如下图:
8964_real 8964_imag 8965_real 8965_imag 8966_real 8966_imag 8967_real ... 8984_imag 8985_real 8985_imag 8986_real 8986_imag 8987_real 8987_imag
0 112.653120 0.000000 117.104887 0.000000 127.593406 0.000000 129.522106 ... 0.000000 125.423552 0.000000 127.888477 0.000000 136.160979 0.000000
1 -0.315831 16.363974 -2.083329 22.443628 -2.166950 15.026253 0.110502 ... -26.613220 8.454297 -35.000742 11.871405 -24.914035 7.448329 -16.370041
2 -1.863497 10.672129 -6.152232 15.980813 -5.679352 18.976117 -5.775777 ... -11.131600 -18.990022 -9.520732 -11.947319 -4.641286 -17.104710 -5.691642
3 -6.749938 14.870590 -12.222749 15.012352 -10.501423 9.345518 -9.103459 ... -2.860546 -29.862724 -5.237663 -28.791194 -5.685985 -24.565608 -10.385683
4 -2.991405 -10.332938 -4.097638 -10.204587 -12.056221 -5.684882 -12.861357 ... 0.821902 -8.787235 -1.521650 -3.798446 -2.390519 -6.527762 -1.145998
我必须转换上面的数据框,这样 "_real"
列中的值应该在一列下,"_imag"
下的值应该在另一列下
最后应该有两列,一列用于 real and other for imag
。最有效的方法是什么?
我参考这个 link 。但这对一个专栏来说很好,但我需要两个。
另一个想法,我得到的是使用 regex to select columns containing "real"
并按照上面所说的 link 做(对于 imag 也类似),但觉得它有点圆。
感谢任何帮助。
编辑:
例如,real
应该像
real
112.653120
-0.315831
-1.863497
-6.749938
-2.991405
---------
117.104887
-2.083329
-6.152232
-12.222749
-4.097638
---------
127.593406
-2.166950
-5.679352
-10.501423
-12.056221
我画了虚线表示清楚
通过 split
创建 MultiIndex
,因此可以通过 DataFrame.stack
重塑:
df.columns = df.columns.str.split('_', expand=True)
print (df.head(10))
8964 8965 8966 \
real imag real imag real imag
0 112.653120 0.000000 117.104887 0.000000 127.593406 0.000000
1 -0.315831 16.363974 -2.083329 22.443628 -2.166950 15.026253
2 -1.863497 10.672129 -6.152232 15.980813 -5.679352 18.976117
3 -6.749938 14.870590 -12.222749 15.012352 -10.501423 9.345518
4 -2.991405 -10.332938 -4.097638 -10.204587 -12.056221 -5.684882
8967 8984 8985 8986 \
real imag real imag real imag
0 129.522106 0.000000 125.423552 0.000000 127.888477 0.000000
1 0.110502 -26.613220 8.454297 -35.000742 11.871405 -24.914035
2 -5.775777 -11.131600 -18.990022 -9.520732 -11.947319 -4.641286
3 -9.103459 -2.860546 -29.862724 -5.237663 -28.791194 -5.685985
4 -12.861357 0.821902 -8.787235 -1.521650 -3.798446 -2.390519
8987
real imag
0 136.160979 0.000000
1 7.448329 -16.370041
2 -17.104710 -5.691642
3 -24.565608 -10.385683
4 -6.527762 -1.145998
df = df.stack(0).reset_index(level=0, drop=True).rename_axis('a').reset_index()
print (df.head(10))
a imag real
0 8964 0.000000 112.653120
1 8965 0.000000 117.104887
2 8966 0.000000 127.593406
3 8967 NaN 129.522106
4 8984 0.000000 NaN
5 8985 0.000000 125.423552
6 8986 0.000000 127.888477
7 8987 0.000000 136.160979
8 8964 16.363974 -0.315831
9 8965 22.443628 -2.083329
编辑:对于新的数据结构,可以通过 ravel
:
a = df.filter(like='real')
b = df.filter(like='imag')
c = a.columns.str.replace('_real', '').astype(int)
print (c)
Int64Index([8964, 8965, 8966, 8967, 8985, 8986, 8987], dtype='int64')
df = pd.DataFrame({'r':a.T.to_numpy().ravel(), 'i':b.T.to_numpy().ravel()},
index=np.tile(c, len(df)))
print (df.head(10))
r i
8964 112.653120 0.000000
8965 -0.315831 16.363974
8966 -1.863497 10.672129
8967 -6.749938 14.870590
8985 -2.991405 -10.332938
8986 117.104887 0.000000
8987 -2.083329 22.443628
8964 -6.152232 15.980813
8965 -12.222749 15.012352
8966 -4.097638 -10.204587