Replace column values in a Pandas DataFrame using a dict with tuple keys
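I have two Pandas DataFrames: one contains the data I want to update, and the other provides a lookup, keyed on a MultiIndex, for the values to set.
For example, I have two CSV files: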
fruit.csv
Fruit,Color,State,more,data
Apple,Red,Good,etc.,etc.
Apple,Green,Mouldy,etc.,etc.
Apple,Green,Excellent,etc.,etc.
Pear,Red,Excellent,etc.,etc.
Pear,Green,Good,etc.,etc.
Lime,Green,Bad,etc.,etc.
rating.csv
Fruit,State,Rating
Apple,Excellent,11
Apple,Good,8
Apple,Bad,4
Apple,Mouldy,0
Pear,Excellent,9
Pear,Good,5
Pear,Bad,2
Pear,Mouldy,1
Lime,Excellent,10
Lime,Good,7
Lime,Bad,5
Lime,Mouldy,2
I read them into DataFrames:
from pathlib import Path
import pandas as pd

static_data_dir = Path(__file__).resolve().parent
fruit = pd.read_csv(static_data_dir.joinpath("fruit.csv"), index_col=["Fruit", "Color"])
rating = pd.read_csv(static_data_dir.joinpath("rating.csv"), index_col=["Fruit", "State"])
                State more  data
Fruit Color
Apple Red        Good etc.  etc.
      Green    Mouldy etc.  etc.
      Green Excellent etc.  etc.
Pear  Red   Excellent etc.  etc.
      Green      Good etc.  etc.
Lime  Green       Bad etc.  etc.

                 Rating
Fruit State
Apple Excellent      11
      Good            8
      Bad             4
      Mouldy          0
Pear  Excellent       9
      Good            5
      Bad             2
      Mouldy          1
Lime  Excellent      10
      Good            7
      Bad             5
      Mouldy          2
Now I want to replace the State values in the fruit DataFrame with the Rating values from the rating DataFrame, giving the result below.
            State more  data
Fruit Color
Apple Red       8 etc.  etc.
      Green     0 etc.  etc.
      Green    11 etc.  etc.
Pear  Red       9 etc.  etc.
      Green     5 etc.  etc.
Lime  Green     5 etc.  etc.
Effectively, I want to use pandas.Series.replace but pass it a dict with tuple keys, which does not seem to be supported.
{'Rating': {('Apple', 'Bad'): 4,
            ('Apple', 'Excellent'): 11,
            ('Apple', 'Good'): 8,
            ('Apple', 'Mouldy'): 0,
            ('Lime', 'Bad'): 5,
            ('Lime', 'Excellent'): 10,
            ('Lime', 'Good'): 7,
            ('Lime', 'Mouldy'): 2,
            ('Pear', 'Bad'): 2,
            ('Pear', 'Excellent'): 9,
            ('Pear', 'Good'): 5,
            ('Pear', 'Mouldy'): 1}}
How would I best achieve this?
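Read both CSV files as plain DataFrames, then merge them on the Fruit and State columns, setting how="left" so that the keys from the fruit DataFrame are used. Finally, set the Fruit and Color columns as the index.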
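import pandas as pd

fruit = pd.read_csv("fruit.csv")
rating = pd.read_csv("rating.csv")

# Left merge keeps every fruit row in its original order; this assumes each
# (Fruit, State) pair appears only once in rating, so the row count is unchanged.
fruit["State"] = fruit.merge(rating, on=["Fruit", "State"], how="left")["Rating"]
fruit.set_index(["Fruit", "Color"], inplace=True)
print(fruit)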
            State more  data
Fruit Color
Apple Red       8 etc.  etc.
      Green     0 etc.  etc.
      Green    11 etc.  etc.
Pear  Red       9 etc.  etc.
      Green     5 etc.  etc.
Lime  Green     5 etc.  etc.
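If you would rather keep the MultiIndex from the question and use the tuple-keyed dict directly, one possible approach is sketched below. It is only an illustration, not part of the original answer: the CSV text from the question is inlined with io.StringIO so the snippet runs on its own, and the names FRUIT_CSV, RATING_CSV, lookup and fruit_names are made up for the example. It assumes every (Fruit, State) pair in fruit has an entry in rating; a missing pair would come back as None.

```python
import io

import pandas as pd

# CSV contents copied from the question, inlined so the sketch is self-contained.
FRUIT_CSV = """Fruit,Color,State,more,data
Apple,Red,Good,etc.,etc.
Apple,Green,Mouldy,etc.,etc.
Apple,Green,Excellent,etc.,etc.
Pear,Red,Excellent,etc.,etc.
Pear,Green,Good,etc.,etc.
Lime,Green,Bad,etc.,etc.
"""

RATING_CSV = """Fruit,State,Rating
Apple,Excellent,11
Apple,Good,8
Apple,Bad,4
Apple,Mouldy,0
Pear,Excellent,9
Pear,Good,5
Pear,Bad,2
Pear,Mouldy,1
Lime,Excellent,10
Lime,Good,7
Lime,Bad,5
Lime,Mouldy,2
"""

# Same index columns as in the question.
fruit = pd.read_csv(io.StringIO(FRUIT_CSV), index_col=["Fruit", "Color"])
rating = pd.read_csv(io.StringIO(RATING_CSV), index_col=["Fruit", "State"])

# A Series with a two-level MultiIndex converts to a dict keyed by
# (Fruit, State) tuples, i.e. the inner dict shown in the question.
lookup = rating["Rating"].to_dict()

# Build one (Fruit, State) key per fruit row and look each one up.
# lookup.get returns None for a pair that is not in rating (assumed not to happen here).
fruit_names = fruit.index.get_level_values("Fruit")
fruit["State"] = [lookup.get(key) for key in zip(fruit_names, fruit["State"])]

print(fruit)
```

Compared with the merge, this keeps the (Fruit, Color) index in place throughout, at the cost of a Python-level loop over the rows, which may matter for large frames.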