How to use a pandas interval to lookup values, to fill another dataframe

Question

I have two dataframes, df1:

  x  id
 35  4
 55  3
 92  2
 99  5

and df2:

id  x               val
1   (0.0, 50.0]     1.2
2   (90.0, inf]     0.5
3   (0.0, 50.0]     8.9
3  (50.0, 90.0]     9.9
4   (0.0, 50.0]     4.3
4  (50.0, 90.0]     1.1
4   (90.0, inf]     2.9
5  (50.0, 90.0]     3.2
5   (90.0, inf]     5.1

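I want to add a new column x_new to the first dataframe, df1, whose value depends on a lookup table in the second dataframe, df2. Depending on the values of id and x, there is a specific multiplier that gives the new value x_new: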

  x  id   x_new
 35  4    35*4.3
 55  3    55*9.9 
 92  2    ...
 99  5    ...

The value ranges in the second dataframe were created with pandas cut (pd.cut):

df2 = df.groupby(['id', pd.cut(df.x, [0,50,90,np.inf])]).apply(lambda x: np.average(x['var1']/x['var2'], weights=x['var1'])).reset_index(name='val')
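A minimal sketch of how intervals like the ones in df2 arise, assuming a made-up raw frame df: only the column names id, x, var1 and var2 come from the snippet above, the values are hypothetical. pd.cut assigns each x to a right-closed bin such as (0.0, 50.0]. Instead of apply with np.average, the weighted average is rebuilt from two grouped sums, since np.average(var1/var2, weights=var1) equals sum(var1 * var1/var2) / sum(var1); observed=True keeps only the (id, bin) pairs that actually occur.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: only the column names (id, x, var1, var2) come from
# the question; the values are made up for illustration.
df = pd.DataFrame({
    'id':   [1, 1, 2, 3, 3],
    'x':    [10, 40, 95, 20, 60],
    'var1': [2.0, 1.0, 3.0, 1.0, 4.0],
    'var2': [1.0, 1.0, 2.0, 1.0, 2.0],
})

# pd.cut maps each x to a right-closed bin: (0, 50], (50, 90] or (90, inf]
bins = pd.cut(df['x'], [0, 50, 90, np.inf])

# np.average(var1/var2, weights=var1) == sum(var1 * var1/var2) / sum(var1),
# so the weighted average can be built from two grouped sums instead of apply
tmp = df.assign(weighted=df['var1'] * df['var1'] / df['var2'])

# observed=True keeps only the (id, bin) pairs that actually occur in the data
sums = tmp.groupby(['id', bins], observed=True)[['weighted', 'var1']].sum()

# The bins Series is named 'x', so the resulting columns are id, x, val
df2 = (sums['weighted'] / sums['var1']).reset_index(name='val')
print(df2)
```

The x column of the resulting df2 holds pd.Interval values, which is what makes the interval lookups in the answer possible.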

My idea was to start from the built-in pandas lookup function:

df1['x_new'] = df.lookup(df.index, df['id'])

I don't know how to make it work.

For more information about the code, see also my.

Answer 1

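A value can be tested for membership in a pd.Interval:
- 40 in pd.Interval(0.0, 50.0, closed='right') evaluates to True
Likewise, if a pd.Interval is in the index, passing a value to .loc will find the interval that contains it.
- df2.loc[(3, 35)] will return 8.9, because 35 falls in (0.0, 50.0] for id 3
- Because df2 is multi-indexed, the index values are passed as a tuple.
- If a value from df1 does not exist in df2's index, a KeyError is raised, so you may want to write a function that uses try-except.
  - df1_in_df2 = df1[df1.id.isin(df2.index.get_level_values(0))] selects the rows of df1 whose id appears in df2.index; it checks only the id, so an x outside every interval for that id still raises KeyError.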
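The try-except idea from the bullets can be sketched as follows. lookup_val is a hypothetical helper name, and the two extra df1 rows (id 1 with x 60, id 7 with x 10) are made up to show both KeyError cases: an id that has no interval containing x, and an id missing from df2 entirely. The lookup itself is the same .loc-on-(id, interval) mechanism described above, applied to the val column.

```python
import numpy as np
import pandas as pd

# A single right-closed interval: 40 and 50 are inside, 0 is not
iv = pd.Interval(0.0, 50.0, closed='right')
print(40 in iv, 50 in iv, 0 in iv)  # True True False

# df2 indexed by (id, interval), rebuilt from the question's table
bins = pd.IntervalIndex.from_breaks([0, 50, 90, np.inf], closed='right')
df2 = pd.DataFrame({
    'id': [1, 2, 3, 3, 4, 4, 4, 5, 5],
    'x': [bins[k] for k in [0, 2, 0, 1, 0, 1, 2, 1, 2]],
    'val': [1.2, 0.5, 8.9, 9.9, 4.3, 1.1, 2.9, 3.2, 5.1],
}).set_index(['id', 'x'])


def lookup_val(table, id_, x):
    """Hypothetical helper: return val for (id_, x), or NaN if .loc raises KeyError."""
    try:
        # .loc on the (id, interval) MultiIndex finds the interval containing x
        return table['val'].loc[(id_, x)]
    except KeyError:
        return np.nan


# df1 from the question plus two made-up rows that have no match in df2
df1 = pd.DataFrame({'id': [4, 3, 2, 5, 1, 7], 'x': [35, 55, 92, 99, 60, 10]})
df1['val'] = [lookup_val(df2, i, x) for i, x in zip(df1['id'], df1['x'])]
df1['x_new'] = df1['x'] * df1['val']
print(df1)
```

Returning NaN keeps every df1 row; rows without a match simply get NaN in val and x_new instead of stopping the whole computation.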

import pandas as pd
import numpy as np

# set up dataframes
df1 = pd.DataFrame({'id': [4, 3, 2, 5], 'x': [35, 55, 92, 99]})
# the three right-closed bins used in df2
i1 = pd.Interval(0.0, 50.0, closed='right')
i2 = pd.Interval(50.0, 90.0, closed='right')
i3 = pd.Interval(90.0, np.inf, closed='right')
df2 = pd.DataFrame({'id': [1, 2, 3, 3, 4, 4, 4, 5, 5],
                    'x': [i1, i3, i1, i2, i1, i2, i3, i2, i3],
                    'val': [1.2, 0.5, 8.9, 9.9, 4.3, 1.1, 2.9, 3.2, 5.1]})

# set id and x as the index of df2
df2 = df2.set_index(['id', 'x'])

# display(df2)
                 val
id x                
1  (0.0, 50.0]   1.2
2  (90.0, inf]   0.5
3  (0.0, 50.0]   8.9
   (50.0, 90.0]  9.9
4  (0.0, 50.0]   4.3
   (50.0, 90.0]  1.1
   (90.0, inf]   2.9
5  (50.0, 90.0]  3.2
   (90.0, inf]   5.1

# use a lambda expression to pass id and x of df1 as index labels to df2 and return val
df1['val'] = df1.apply(lambda x: df2.loc[(x['id'], x['x'])], axis=1)

# multiply x by val to get x_new
df1['x_new'] = df1.x.mul(df1.val)

# display(df1)
   id   x  val  x_new
0   4  35  4.3  150.5
1   3  55  9.9  544.5
2   2  92  0.5   46.0
3   5  99  5.1  504.9
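As an alternative to row-wise .loc calls (a different technique from the one in this answer, sketched on the same example data), df1 can be merged with a flat, non-indexed df2 on id, keeping only the pairs whose interval contains x. The interval column is renamed to the hypothetical name bin so it does not clash with df1's x. With how='inner', df1 rows whose id is absent from df2 are dropped, and rows whose x falls outside every interval for its id are dropped by the filter.

```python
import numpy as np
import pandas as pd

# Same example data as the answer; df2 stays flat (not indexed) and its
# interval column uses the hypothetical name 'bin' to avoid clashing with df1.x
df1 = pd.DataFrame({'id': [4, 3, 2, 5], 'x': [35, 55, 92, 99]})
bins = pd.IntervalIndex.from_breaks([0, 50, 90, np.inf], closed='right')
df2 = pd.DataFrame({
    'id': [1, 2, 3, 3, 4, 4, 4, 5, 5],
    'bin': [bins[k] for k in [0, 2, 0, 1, 0, 1, 2, 1, 2]],
    'val': [1.2, 0.5, 8.9, 9.9, 4.3, 1.1, 2.9, 3.2, 5.1],
})

# Pair every df1 row with all df2 rows sharing its id (inner join drops ids absent from df2)
merged = df1.merge(df2, on='id', how='inner')

# Keep only the pairs whose interval contains x
mask = [x in b for x, b in zip(merged['x'], merged['bin'])]
result = merged[mask].reset_index(drop=True)

result['x_new'] = result['x'] * result['val']
print(result[['id', 'x', 'val', 'x_new']])
```

The merge temporarily holds one row per (df1 row, interval for that id) pair, which stays small when each id has only a few bins.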
