Multiplying a pandas column by a yearly coefficient in a fast way

Question

I have a dataframe with a datetime index:

import pandas as pd

df = pd.DataFrame(
    {'test': [1, 1, 1, 1, 1, 1]},
    index=[
        '2018-01-01', '2018-01-02', '2018-01-03',
        '2019-01-03', '2019-01-02', '2020-01-02'
    ]
)
df.index = pd.to_datetime(df.index)

I also have a yearly parameter:

yearly_parameter = [1, 2, 3]

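I would like to multiply the column 'test' efficiently (in a vectorized way?) by the corresponding yearly parameter in the list yearly_parameter (the first value is for 2018, the second for 2019, and the third for 2020). How can I do that efficiently? Is a list a good way to store these yearly parameters for the calculation?

I expect the following result in a column, say 'answer':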

df['answer'] = [1, 1, 1, 2, 2, 3]

print(df)

            test  answer
2018-01-01     1       1
2018-01-02     1       1
2018-01-03     1       1
2019-01-03     1       2
2019-01-02     1       2
2020-01-02     1       3

Thank you very much for your help,

Pierre

Answer 1

`pd.factorize`

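Use factorize to establish an order of years that corresponds to the elements of yearly_parameter. The multiplication then reduces to efficient integer-array indexing.

This expects yearly_parameter to be at least as long as the number of unique years in df.index.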

import numpy as np

# f: one integer code per row; y: distinct years in order of first appearance
f, y = pd.factorize(df.index.year)
yearly_parameter = np.array([1, 2, 3])
df.assign(answer=df.test.values * yearly_parameter[f])

            test  answer
2018-01-01     1       1
2018-01-02     1       1
2018-01-03     1       1
2019-01-03     1       2
2019-01-02     1       2
2020-01-02     1       3
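As a minimal sketch of what the factorize step produces, the snippet below rebuilds the question's frame and inspects the codes and the distinct years. The length check is an added safeguard, not part of the answer above, and the names codes, years and answer are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'test': [1, 1, 1, 1, 1, 1]},
    index=pd.to_datetime([
        '2018-01-01', '2018-01-02', '2018-01-03',
        '2019-01-03', '2019-01-02', '2020-01-02',
    ]),
)
yearly_parameter = np.array([1, 2, 3])

# f holds one integer code per row; y holds the distinct years
# in order of first appearance.
f, y = pd.factorize(df.index.year)
codes = f.tolist()                 # [0, 0, 0, 1, 1, 2]
years = [int(v) for v in y]        # [2018, 2019, 2020]

# Added safeguard (an assumption, not in the original answer):
# fail early if there are fewer parameters than distinct years.
if len(yearly_parameter) < len(y):
    raise ValueError('need one parameter per distinct year')

# Indexing the parameter array with the codes expands it to one value per row.
answer = (df['test'].to_numpy() * yearly_parameter[f]).tolist()
```

Because yearly_parameter[f] is plain NumPy integer indexing, no Python-level loop over rows is involved.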

`np.unique`

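Note that the pd.factorize version assumes that yearly_parameter aligns its first element with the first year observed. If you intend the first element to correspond to the minimum year observed, use pd.factorize(df.index.year, sort=True). Or better yet, if you are going to sort anyway, use the equivalent calculation in NumPy: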

# np.unique returns (sorted unique years, codes), the reverse order of pd.factorize
y, f = np.unique(df.index.year, return_inverse=True)
yearly_parameter = np.array([1, 2, 3])
df.assign(answer=df.test.values * yearly_parameter[f])

            test  answer
2018-01-01     1       1
2018-01-02     1       1
2018-01-03     1       1
2019-01-03     1       2
2019-01-02     1       2
2020-01-02     1       3
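On the question's data the two variants agree, because the years already appear in ascending order. To see where they differ, here is a small sketch on a hypothetical index (not from the question) in which the latest year comes first. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical index where the latest year appears first.
idx = pd.to_datetime(['2020-01-02', '2018-01-01', '2019-01-03', '2018-01-02'])
yearly_parameter = np.array([1, 2, 3])

# Order of appearance: 2020 gets code 0, so it picks up the first parameter.
f_seen, y_seen = pd.factorize(idx.year)

# Sorted: 2018 gets code 0, matching "the first value is for 2018".
f_sorted, y_sorted = pd.factorize(idx.year, sort=True)

# NumPy equivalent of the sorted variant; note the swapped return order.
y_np, f_np = np.unique(idx.year, return_inverse=True)

by_appearance = yearly_parameter[f_seen].tolist()     # [1, 2, 3, 2]
by_sorted_year = yearly_parameter[f_sorted].tolist()  # [3, 1, 2, 1]
by_numpy = yearly_parameter[f_np].tolist()            # [3, 1, 2, 1]
```

If the parameters are meant to follow calendar order, the sorted variants are the safer choice when the index is not guaranteed to be chronological.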
