Create a new column that depends on the value ranges of another column in pandas

Question

I have a sample pandas DataFrame:

datetime               column1
2021.04.10 01:00:00    10
2021.04.11 02:00:00    15
2021.04.11 03:00:00     5
2021.04.11 04:00:00    20
2021.04.11 05:00:00    15
2021.04.11 06:00:00     2

I want to create a new column named position that gives 25% when the value of column1 is less than 10, 40% when the value is >= 10 and < 15, and 100% when the value is >= 15.

The expected output looks like this:

datetime               column1   position
2021.04.10 01:00:00    10        40%
2021.04.11 02:00:00    15        100%
2021.04.11 03:00:00     5        25%
2021.04.11 04:00:00    20        100%
2021.04.11 05:00:00    15        100%
2021.04.11 06:00:00     2        25%
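To reproduce the problem, the sample can be rebuilt as a DataFrame. This is only a sketch: the question shows printed output rather than construction code, so the variable name `df` and the decision to parse `datetime` with the format string `%Y.%m.%d %H:%M:%S` are assumptions.

```python
import pandas as pd

# Assumed reconstruction of the sample shown in the question.
df = pd.DataFrame({
    "datetime": [
        "2021.04.10 01:00:00",
        "2021.04.11 02:00:00",
        "2021.04.11 03:00:00",
        "2021.04.11 04:00:00",
        "2021.04.11 05:00:00",
        "2021.04.11 06:00:00",
    ],
    "column1": [10, 15, 5, 20, 15, 2],
})

# Optional: parse the dotted date strings into real timestamps.
df["datetime"] = pd.to_datetime(df["datetime"], format="%Y.%m.%d %H:%M:%S")

print(df)
```

Parsing the timestamps is not needed for the binning itself, which only reads `column1`.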

Answer 1

`pd.cut`

Here is one approach that uses pd.cut to bin (categorize) the values in column1 into discrete intervals with predefined labels.

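import numpy as np
import pandas as pd
# right=False makes each bin left-closed: [-inf, 10), [10, 15), [15, inf)
df['position'] = pd.cut(df['column1'],
                        bins=[-np.inf, 10, 15, np.inf],
                        labels=['25%', '40%', '100%'], right=False)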

              datetime  column1 position
0  2021.04.10 01:00:00       10      40%
1  2021.04.11 02:00:00       15     100%
2  2021.04.11 03:00:00        5      25%
3  2021.04.11 04:00:00       20     100%
4  2021.04.11 05:00:00       15     100%
5  2021.04.11 06:00:00        2      25%
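The `right=False` argument is what places the boundary values 10 and 15 in the upper bin, as the question requires. The sketch below compares it with the default `right=True`. The names `left_closed`, `right_closed`, and `comparison`, and the reconstruction of the `column1` values, are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of column1 from the question.
df = pd.DataFrame({"column1": [10, 15, 5, 20, 15, 2]})

bins = [-np.inf, 10, 15, np.inf]
labels = ["25%", "40%", "100%"]

# right=False: intervals are [-inf, 10), [10, 15), [15, inf)
left_closed = pd.cut(df["column1"], bins=bins, labels=labels, right=False)

# Default right=True: intervals are (-inf, 10], (10, 15], (15, inf]
right_closed = pd.cut(df["column1"], bins=bins, labels=labels)

comparison = pd.DataFrame({
    "column1": df["column1"],
    "right=False": left_closed,
    "right=True": right_closed,
})
print(comparison)
```

With the default setting, the rows where column1 is exactly 10 or 15 fall into the lower bin (25% and 40% respectively), which does not match the requested rule. Note that pd.cut returns a categorical column; `.astype(str)` converts it to plain strings if that is preferred.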

Answer 2

Pandas apply may arrive late, but it always shows up :).

df['position'] = df['column1'].apply(lambda value: '25%' if value < 10 else ('40%' if value < 15 else '100%'))

print(df)

              datetime  column1 position
0  2021.04.10 01:00:00       10      40%
1  2021.04.11 02:00:00       15     100%
2  2021.04.11 03:00:00        5      25%
3  2021.04.11 04:00:00       20     100%
4  2021.04.11 05:00:00       15     100%
5  2021.04.11 06:00:00        2      25%
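Since `apply` calls the lambda once per row, a vectorized form may be preferable on larger frames. One option, which is a swapped-in alternative and not part of the answer above, is `numpy.select`, sketched here with the same thresholds. The names `conditions` and `choices` and the reconstruction of `column1` are assumptions.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of column1 from the question.
df = pd.DataFrame({"column1": [10, 15, 5, 20, 15, 2]})

# Conditions are checked in order; the first one that is True wins,
# so the second condition effectively means 10 <= value < 15.
conditions = [df["column1"] < 10, df["column1"] < 15]
choices = ["25%", "40%"]

# Rows matching no condition (value >= 15) get the default label.
df["position"] = np.select(conditions, choices, default="100%")

print(df)
```

The default is given as a string so that it has the same type as the entries in `choices`.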
