Python Pandas Column Sort by Count
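I'm still new to Python and pandas. I'm trying to work out how to sort the columns of my data by the number of rows in each column.
The number of rows in a column, excluding NaN, is:
np.count_nonzero(~np.isnan(df.iloc[:, j]))
where j is the column number.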
import numpy as np
import pandas as pd
x = np.nan
data = {"A": [22133.97, 22151.06, 22836.96, 22035.42, 23306.39, 23881.29, 23206.37], "B": [95.924, 107.1005, 123.5775, 107.8946, x, x, x], "C": [74.169, 74.075, 77.564, 76.338, 79.356, 81.666, x], "D": [36.205, 35.435, 36.542, 34.424, 37.457, x, x], "E": [68.048, 65.554, 68.093, 68.37, 74.233, 77.095, 75.156]}
dates = pd.date_range('1/1/2000', periods = 7)
df = pd.DataFrame(data, index = dates)
The dataset now looks like this:
A B C D E
2000-01-01 22133.97 95.9240 74.169 36.205 68.048
2000-01-02 22151.06 107.1005 74.075 35.435 65.554
2000-01-03 22836.96 123.5775 77.564 36.542 68.093
2000-01-04 22035.42 107.8946 76.338 34.424 68.370
2000-01-05 23306.39 NaN 79.356 37.457 74.233
2000-01-06 23881.29 NaN 81.666 NaN 77.095
2000-01-07 23206.37 NaN NaN NaN 75.156
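Is there a built-in pandas function that sorts the columns as shown below? (That is, column A has 7 rows, so it stays where it is; then find the column with the next-highest count, E, move it forward to sit next to A, and so on.)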
A E C D B
2000-01-01 22133.97 68.048 74.169 36.205 95.9240
2000-01-02 22151.06 65.554 74.075 35.435 107.1005
2000-01-03 22836.96 68.093 77.564 36.542 123.5775
2000-01-04 22035.42 68.370 76.338 34.424 107.8946
2000-01-05 23306.39 74.233 79.356 37.457 NaN
2000-01-06 23881.29 77.095 81.666 NaN NaN
2000-01-07 23206.37 75.156 NaN NaN NaN
Any help is much appreciated. Thanks!
I think this code should work.
df = df[df.isna().sum().sort_values().keys()]
Output:
>>> df[df.isna().sum().sort_values().keys()]
A E C D B
2000-01-01 22133.97 68.048 74.169 36.205 95.9240
2000-01-02 22151.06 65.554 74.075 35.435 107.1005
2000-01-03 22836.96 68.093 77.564 36.542 123.5775
2000-01-04 22035.42 68.370 76.338 34.424 107.8946
2000-01-05 23306.39 74.233 79.356 37.457 NaN
2000-01-06 23881.29 77.095 81.666 NaN NaN
2000-01-07 23206.37 75.156 NaN NaN NaN
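For comparison, here is a minimal sketch of the same idea using df.count(), which returns the number of non-NaN values per column directly. The names counts, order and sorted_df are only illustrative, and the kind="mergesort" argument is my own addition (not part of the answer above) so that columns with equal counts, such as A and E, keep their original relative order.

```python
import numpy as np
import pandas as pd

x = np.nan
data = {
    "A": [22133.97, 22151.06, 22836.96, 22035.42, 23306.39, 23881.29, 23206.37],
    "B": [95.924, 107.1005, 123.5775, 107.8946, x, x, x],
    "C": [74.169, 74.075, 77.564, 76.338, 79.356, 81.666, x],
    "D": [36.205, 35.435, 36.542, 34.424, 37.457, x, x],
    "E": [68.048, 65.554, 68.093, 68.37, 74.233, 77.095, 75.156],
}
df = pd.DataFrame(data, index=pd.date_range("1/1/2000", periods=7))

# Number of non-NaN values in each column
counts = df.count()

# Negate the counts so an ascending sort puts the fullest columns first;
# mergesort is stable, so tied columns stay in their original order.
order = (-counts).sort_values(kind="mergesort").index

sorted_df = df[order]
print(sorted_df.columns.tolist())
```

Because the columns are only reordered by label, the values and the date index are left untouched.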