Pandas: Retain the column values listed in a dictionary for each key and make the other columns blank

I have a DataFrame:

import pandas as pd

df = pd.DataFrame([["A","X",98,56,1,2,3,4], ["B","Z",79,54,36,3,4,8], ["C","Y",98,56,2,5,6,7],["A","Y",79,54,36,12,13,24], ["B","X",98,56,3,6,7,8], ["C","Z",48,51,85,5,6,5]], columns=["id","key","c1","c2","c3","c4","C5","C6"])

I have a dictionary:

dic = {"X":['c1','c3'],"Y":['c2','c4'],"Z":['c5','c6']}

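Based on the key column of df, I want to look up the columns listed for that key in the dictionary dic, keep the row values only in those columns, and make the other values in the row blank.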

For example, for key X in df, keep the values in c1 and c3, as listed in the dictionary, and make the other columns blank.

Expected output:

df_out = pd.DataFrame([["A","X",98,"",1,"","",""], ["B","Z","","","","",4,8], ["C","Y","",56,"",5,"",""],["A","Y","",54,"",12,"",""], ["B","X",98,"",3,"","",""], ["C","Z","","","","",6,5]], columns=["id","key","c1","c2","c3","c4","C5","C6"])

How can I do this?

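Use Index.difference to get the columns that are not listed for each key, and set them to empty strings with DataFrame.loc. Note that the dictionary values must match the column names exactly, so 'C5' and 'C6' are written in upper case here: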

dic = {"X":['c1','c3'],"Y":['c2','c4'],"Z":['C5','C6']}

for k, v in dic.items():
    df.loc[df.key == k, df.columns.difference(v + ['id', 'key'])] = ''

print(df)
  id key  c1  c2 c3  c4 C5 C6
0  A   X  98      1          
1  B   Z                 4  8
2  C   Y      56      5      
3  A   Y      54     12      
4  B   X  98      3          
5  C   Z                 6  5
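
As an alternative, the same result can be sketched with a boolean mask and DataFrame.where instead of assigning in place. This is only a minimal sketch under a few assumptions: the names keep, always_keep and out are made up for illustration, and the cast to object is there because newer pandas versions may warn about or reject writing a string into an integer column.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", "X", 98, 56, 1, 2, 3, 4],
     ["B", "Z", 79, 54, 36, 3, 4, 8],
     ["C", "Y", 98, 56, 2, 5, 6, 7],
     ["A", "Y", 79, 54, 36, 12, 13, 24],
     ["B", "X", 98, 56, 3, 6, 7, 8],
     ["C", "Z", 48, 51, 85, 5, 6, 5]],
    columns=["id", "key", "c1", "c2", "c3", "c4", "C5", "C6"],
)
dic = {"X": ["c1", "c3"], "Y": ["c2", "c4"], "Z": ["C5", "C6"]}

# Assumption: these columns are never blanked (hypothetical helper name).
always_keep = ["id", "key"]

# Boolean mask with the same shape as df: True where a value is kept.
keep = pd.DataFrame(False, index=df.index, columns=df.columns)
keep[always_keep] = True
for k, cols in dic.items():
    # For rows whose key is k, keep only the columns listed for k.
    keep.loc[df["key"] == k, cols] = True

# Cast to object so empty strings can sit next to integers,
# then replace every value outside the mask with ''.
out = df.astype(object).where(keep, "")
print(out)
```

Unlike the loop over df.loc above, this leaves df itself unchanged and returns the blanked copy in out, which can be convenient if the original numbers are still needed afterwards.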