How to map values from different columns into one column
I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({"SK":["EYF","EYF","RMK","MB","RMK","GYF","RMK","MYF"],
"SA":["a","b","tm","tmb","tm","cd","tms","alb"],
"C":["","11","12","13","","15","16","17"]})
df
I want to map some of the values of "SK", "SA" and "C" to a new column:
df["D"]= df["SK"].map({"EYF":1,"MB":2,"GYF":3})
df
df["D"]= df["SA"].map({"tm":4})
df
df["D"]= df["C"].map({"16":5,"17":6})
df
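But each time I run the next map call, the values that the previous call put into column "D" become NaN.
I would like to get a df in which column "D" keeps the values from all three mappings.
Any help would be appreciated.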
You can create three Series and then fill the missing values of the earlier Series with values from the later ones, using Series.fillna or Series.combine_first:
a = df["SK"].map({"EYF":1,"MB":2,"GYF":3})
b = df["SA"].map({"tm":4})
c = df["C"].map({"16":5,"17":6})
df["D"] = a.fillna(b).fillna(c)
#alternative
df["D"] = a.combine_first(b).combine_first(c)
print (df)
    SK   SA   C    D
0  EYF    a      1.0
1  EYF    b  11  1.0
2  RMK   tm  12  4.0
3   MB  tmb  13  2.0
4  RMK   tm      4.0
5  GYF   cd  15  3.0
6  RMK  tms  16  5.0
7  MYF  alb  17  6.0
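If the number of column/mapping pairs grows, the same chaining can be written as a loop. The following is only a sketch: `rules` and `map_first_match` are hypothetical names introduced here for illustration, not part of pandas. It assumes every mapped Series shares the index of `df`, which holds because each one comes from `df[col].map(...)`.

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({
    "SK": ["EYF", "EYF", "RMK", "MB", "RMK", "GYF", "RMK", "MYF"],
    "SA": ["a", "b", "tm", "tmb", "tm", "cd", "tms", "alb"],
    "C": ["", "11", "12", "13", "", "15", "16", "17"],
})

# Hypothetical rule list: (source column, mapping dict).
# Entries earlier in the list take priority over later ones.
rules = [
    ("SK", {"EYF": 1, "MB": 2, "GYF": 3}),
    ("SA", {"tm": 4}),
    ("C", {"16": 5, "17": 6}),
]


def map_first_match(frame, rules):
    """Map each column separately, then keep the first non-missing value per row."""
    mapped = [frame[col].map(mapping) for col, mapping in rules]
    # combine_first fills NaN in the left Series with values from the right one.
    return reduce(lambda left, right: left.combine_first(right), mapped)


df["D"] = map_first_match(df, rules)
print(df)
```

Reordering the entries of `rules` changes which mapping wins when several of them match the same row.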
If more than one mapping matches the same row, the order of the calls decides which value takes priority:
df = pd.DataFrame({"SK":["EYF","EYF"],
"SA":["a","tm"],
"C":["16","17"]})
a = df["SK"].map({"EYF":1,"MB":2,"GYF":3})
b = df["SA"].map({"tm":4})
c = df["C"].map({"16":5,"17":6})
df["D1"] = a.fillna(b).fillna(c)
df["D2"] = b.fillna(a).fillna(c)
df["D3"] = c.fillna(b).fillna(a)
print (df)
    SK  SA   C  D1   D2  D3
0  EYF   a  16   1  1.0   5
1  EYF  tm  17   1  4.0   6
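In the output above, D1 and D3 print as integers while D2 prints as floats. That is because map returns a float64 Series as soon as at least one value has no match (NaN is a float), and fillna keeps the dtype of the Series it is called on. If integer output is preferred, one option is to cast to the nullable "Int64" dtype. The small frame below is made up for illustration and is not taken from the question.

```python
import pandas as pd

# Illustrative data: the last row matches neither mapping.
df = pd.DataFrame({"SK": ["EYF", "RMK", "XX"],
                   "SA": ["a", "tm", "zz"]})

a = df["SK"].map({"EYF": 1})  # unmatched rows become NaN, so dtype is float64
b = df["SA"].map({"tm": 4})

d = a.fillna(b)               # still float64: 1.0, 4.0, NaN
d_int = d.astype("Int64")     # nullable integer dtype: 1, 4, <NA>

print(d.dtype, d_int.dtype)
print(d_int)
```

Rows without any match stay missing after the cast; they show up as <NA> instead of NaN.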