How to use the map function with a MultiIndex DataFrame in pandas?

Question

I have a dataframe as shown below

import numpy as np
import pandas as pd

df = pd.DataFrame({'source_code':['11','11','12','13','14',np.nan],
                   'source_description':['test1', 'test1','test2','test3',np.nan,'test5'],
                   'key_id':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]})

I also have a hash_file dataframe, as shown below

hash_file = pd.DataFrame({'source_id':['11','12','13','14','15'],
                          'source_code':['test1','test2','test3','test4','test5'],
                          'hash_id':[911,512,713,814,616]})
id_file =  hash_file.set_index(['source_id','source_code'])['hash_id']

The (source_id, source_code) pairs in id_file never contain duplicates; they are always unique.

Now, I would like to fill the key_id column in df by matching source_code and source_description in df against the source_id and source_code columns from hash_file.

So, I tried the following

df['key_id'] = df['source_code','source_description'].map(id_file)

It threw an error

KeyError: ('source_code', 'source_description')

So, I tried another approach, shown below

df['key_id'] = df[['source_code','source_description']].map(id_file)

It raised a different error

AttributeError: 'DataFrame' object has no attribute 'map'

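I expect my output to look as shown below. Note that there may be NAs in between, and the match must be case-insensitive: the comparison between the index of id_file and the columns of df must ignore case.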

I would prefer to use only the map approach, but any other elegant approach is also welcome.

source_code source_description  key_id
11            test1              911
11            test1              911
12            test2              512
13            test3              713
14             NaN               814
NaN           test5              616

Answer 1

This looks like a fairly standard merge, with some renaming:

(df.merge(hash_file, left_on = ['source_code','source_description'], right_on = ['source_id','source_code'])
    .drop(columns = ['key_id','source_id','source_code_y'])
    .rename(columns = {'source_code_x':'source_code','hash_id':'key_id'})
)

Output


    source_code source_description  key_id
0   11          test1               911
1   11          test1               911
2   12          test2               512
3   13          test3               713
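The merge above is an inner join, so rows of df without a matching pair (including the rows with NaN keys) are dropped. If every row of df should be kept, a left join is one option. The following is a minimal sketch using the data from the question; the names `lookup` and `merged` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'source_code': ['11', '11', '12', '13', '14', np.nan],
                   'source_description': ['test1', 'test1', 'test2', 'test3', np.nan, 'test5'],
                   'key_id': [np.nan] * 6})
hash_file = pd.DataFrame({'source_id': ['11', '12', '13', '14', '15'],
                          'source_code': ['test1', 'test2', 'test3', 'test4', 'test5'],
                          'hash_id': [911, 512, 713, 814, 616]})

# Rename the lookup columns up front so both frames share the key names
# ('lookup' is a hypothetical helper name, not part of the original answer).
lookup = hash_file.rename(columns={'source_id': 'source_code',
                                   'source_code': 'source_description',
                                   'hash_id': 'key_id'})

# how='left' keeps every row of df; unmatched pairs get NaN in key_id.
merged = (df.drop(columns='key_id')
            .merge(lookup, on=['source_code', 'source_description'], how='left'))
print(merged)
```

Because the (source_id, source_code) pairs are unique, the left join returns exactly one row per row of df, in the original order.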

Using `map` (for the updated input values in the question):

df['key_id'] = df.set_index(['source_code','source_description']).index.map(id_file)

Output

    source_code source_description  key_id
0   11          test1               911.0
1   11          test1               911.0
2   12          test2               512.0
3   13          test3               713.0
4   14          NaN                 NaN
5   NaN         test5               NaN
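The `map` output above matches exact pairs only, so it covers neither the case-insensitive requirement nor the 814 and 616 values in the question's expected output. One possible way to handle both is sketched below, assuming that lower-casing both sides is an acceptable notion of "case-insensitive", and that rows with one missing key should fall back to the other key alone (an assumption read off the expected output, not something the question states). The names `id_lower`, `keys`, `by_id`, and `by_desc` are hypothetical, and one description is upper-cased here purely to exercise the case-insensitive match.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'source_code': ['11', '11', '12', '13', '14', np.nan],
                   # 'TEST2' is upper-cased on purpose (illustrative change)
                   'source_description': ['test1', 'test1', 'TEST2', 'test3', np.nan, 'test5'],
                   'key_id': [np.nan] * 6})
hash_file = pd.DataFrame({'source_id': ['11', '12', '13', '14', '15'],
                          'source_code': ['test1', 'test2', 'test3', 'test4', 'test5'],
                          'hash_id': [911, 512, 713, 814, 616]})
id_file = hash_file.set_index(['source_id', 'source_code'])['hash_id']

# Lower-case both levels of the lookup index.
id_lower = id_file.copy()
id_lower.index = pd.MultiIndex.from_arrays(
    [id_file.index.get_level_values(0).str.lower(),
     id_file.index.get_level_values(1).str.lower()])

# Build lower-cased (source_code, source_description) keys and map the pairs.
keys = pd.MultiIndex.from_arrays([df['source_code'].str.lower(),
                                  df['source_description'].str.lower()])
df['key_id'] = keys.map(id_lower)

# Assumed fallback: where exactly one key is missing, match on the other alone.
# This relies on each single key being unique in hash_file, as it is here.
by_id = hash_file.set_index(hash_file['source_id'].str.lower())['hash_id']
by_desc = hash_file.set_index(hash_file['source_code'].str.lower())['hash_id']
only_code = df['source_description'].isna()
only_desc = df['source_code'].isna()
df.loc[only_code, 'key_id'] = df.loc[only_code, 'source_code'].str.lower().map(by_id)
df.loc[only_desc, 'key_id'] = df.loc[only_desc, 'source_description'].str.lower().map(by_desc)
print(df)
```

If the single-key fallback is not wanted, the last block can be dropped; the pair lookup alone then leaves NaN for the two incomplete rows, as in the output above.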
