Replace column values in Pandas DataFrame using dict with tuple key

I have two Pandas DataFrames: one contains the data I want to update, and the other provides a lookup, keyed by a MultiIndex, for the values to set.

For example, I have two CSV files:

fruit.csv

Fruit,Color,State,more,data
Apple,Red,Good,etc.,etc.
Apple,Green,Mouldy,etc.,etc.
Apple,Green,Excellent,etc.,etc.
Pear,Red,Excellent,etc.,etc.
Pear,Green,Good,etc.,etc.
Lime,Green,Bad,etc.,etc.

rating.csv

Fruit,State,Rating
Apple,Excellent,11
Apple,Good,8
Apple,Bad,4
Apple,Mouldy,0
Pear,Excellent,9
Pear,Good,5
Pear,Bad,2
Pear,Mouldy,1
Lime,Excellent,10
Lime,Good,7
Lime,Bad,5
Lime,Mouldy,2

I read them into DataFrames:

from pathlib import Path
import pandas as pd

# Both CSV files live next to this script
static_data_dir = Path(__file__).resolve().parent
fruit = pd.read_csv(static_data_dir.joinpath("fruit.csv"), index_col=["Fruit","Color"])
rating = pd.read_csv(static_data_dir.joinpath("rating.csv"), index_col=["Fruit","State"])

                  State  more  data
Fruit Color
Apple Red         Good  etc.  etc.
      Green     Mouldy  etc.  etc.
      Green  Excellent  etc.  etc.
Pear  Red    Excellent  etc.  etc.
      Green       Good  etc.  etc.
Lime  Green        Bad  etc.  etc.

                 Rating
Fruit State
Apple Excellent      11
      Good            8
      Bad             4
      Mouldy          0
Pear  Excellent       9
      Good            5
      Bad             2
      Mouldy          1
Lime  Excellent      10
      Good            7
      Bad             5
      Mouldy          2

Now I want to replace the State values in the fruit DataFrame with the Rating values from the rating DataFrame, giving the following result.

                  State  more  data
Fruit Color
Apple Red            8  etc.  etc.
      Green          0  etc.  etc.
      Green         11  etc.  etc.
Pear  Red            9  etc.  etc.
      Green          5  etc.  etc.
Lime  Green          5  etc.  etc.

In effect, I want to use pandas.Series.replace, passing in a dict with tuple keys, but this does not seem to be supported.

{'Rating': {('Apple', 'Bad'): 4,
            ('Apple', 'Excellent'): 11,
            ('Apple', 'Good'): 8,
            ('Apple', 'Mouldy'): 0,
            ('Lime', 'Bad'): 5,
            ('Lime', 'Excellent'): 10,
            ('Lime', 'Good'): 7,
            ('Lime', 'Mouldy'): 2,
            ('Pear', 'Bad'): 2,
            ('Pear', 'Excellent'): 9,
            ('Pear', 'Good'): 5,
            ('Pear', 'Mouldy'): 1}}

What is the best way to achieve this?

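Read the two CSV files as ordinary DataFrames (without setting an index), then merge them on the Fruit and State columns, using how="left" so that the keys come from the fruit DataFrame. Because a left merge keeps the rows of fruit in their original order and each (Fruit, State) pair appears only once in rating, the resulting Rating column lines up row for row with fruit. Finally, set the Fruit and Color columns as the index.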

import pandas as pd

fruit = pd.read_csv("fruit.csv")
rating = pd.read_csv("rating.csv")

fruit['State'] = fruit.merge(rating, on=["Fruit", "State"], how="left")["Rating"]

fruit.set_index(["Fruit","Color"], inplace=True)
print(fruit)

             State  more  data
Fruit Color                   
Apple Red        8  etc.  etc.
      Green      0  etc.  etc.
      Green     11  etc.  etc.
Pear  Red        9  etc.  etc.
      Green      5  etc.  etc.
Lime  Green      5  etc.  etc.
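
If you would rather keep the MultiIndex DataFrames from the question and work with the tuple-keyed dict directly, a plain Python lookup is one option. The following is only a sketch, not a replacement for the merge above: it inlines the CSV contents with io.StringIO so that it runs on its own, and the names FRUIT_CSV, RATING_CSV, lookup, and keys are made up for illustration.

```python
import io

import pandas as pd

# Inline copies of fruit.csv and rating.csv so the sketch is self-contained
FRUIT_CSV = """Fruit,Color,State,more,data
Apple,Red,Good,etc.,etc.
Apple,Green,Mouldy,etc.,etc.
Apple,Green,Excellent,etc.,etc.
Pear,Red,Excellent,etc.,etc.
Pear,Green,Good,etc.,etc.
Lime,Green,Bad,etc.,etc.
"""

RATING_CSV = """Fruit,State,Rating
Apple,Excellent,11
Apple,Good,8
Apple,Bad,4
Apple,Mouldy,0
Pear,Excellent,9
Pear,Good,5
Pear,Bad,2
Pear,Mouldy,1
Lime,Excellent,10
Lime,Good,7
Lime,Bad,5
Lime,Mouldy,2
"""

# Same index layout as in the question
fruit = pd.read_csv(io.StringIO(FRUIT_CSV), index_col=["Fruit", "Color"])
rating = pd.read_csv(io.StringIO(RATING_CSV), index_col=["Fruit", "State"])

# A Series with a MultiIndex converts to a dict keyed by tuples,
# e.g. {("Apple", "Good"): 8, ...}
lookup = rating["Rating"].to_dict()

# Build one (Fruit, State) key per row: Fruit comes from the index,
# State from the column that is being replaced
keys = zip(fruit.index.get_level_values("Fruit"), fruit["State"])

# Assigning a list is positional, so the duplicated (Apple, Green)
# index entries are not a problem
fruit["State"] = [lookup[key] for key in keys]

print(fruit)
```

Note that lookup[key] raises a KeyError if a (Fruit, State) pair is missing from rating.csv, whereas the left merge would leave NaN in that row. Which behavior is preferable depends on how strict you want to be about unmatched rows.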