在 pandas 数据框 python 中匹配列并突出显示结果
Match columns and Highlight results in pandas dataframe python
我想在 pandas 数据框中找到两列之间的匹配项并突出显示两列的结果,所以我做了以下代码:
df=pd.DataFrame({'Name':['Zea mays','Zea mays subsp. mexicana','Zea mays subsp. parviglumis'],'ID':[1,2,3],'type':[1.1,1.2,1.3],
'Name.1':['Zea mays subsp. huehuetenangensis','Zea mays subsp. mays','Zea mays'],'ID.1':[1,2,3],'type.1':[1.1,1.2,1.3],
'Name.2':['Zea nicaraguensis','Zea luxurians','Zea perennis'],'ID.2':[1,2,3],'type.2':[1.1,1.2,1.3],
'Name.3':['Capsicum annuum','Capsicum frutescens','Capsicum chinense'],'ID.3':[1,2,3],'type.3':[1.1,1.2,1.3]})
def in_statements(s):
color = 'yellow'
if np.where(str(s.iloc[4]) == str(s.iloc[8])):
color = 'yellow'
else:
color = 'black'
return 'background-color: %s' % color
df.style.applymap(in_statements)
但是,它给了我这个错误:“(“'str' 对象没有属性 'iloc'”,'occurred at index Samples')”
这是输入的例子:
它应该是这样的:
有人能指出我正确的方向吗?谢谢
如果要突出显示 Name
列中的重复值,请使用 Styler.apply
with mask creted by DataFrame.filter
for columns with Name
, reshaped by DataFrame.stack
, get duplicates by Series.duplicated
and reshaped back by Series.unstack
:
df=pd.DataFrame({'Name':['Dog','Dog.1','Dog.3'],'ID':[1,2,3],'type':[1.1,1.2,1.3],
'Name.1':['Dog','cat','Dog.3'],'ID.1':[1,2,3],'type.1':[1.1,1.2,1.3],
'Name.2':['cat','cat.12','Dog.1'],'ID.2':[1,2,3],'type.2':[1.1,1.2,1.3],
'Name.3':['cat.7','cat.13','Dog.3'],'ID.3':[1,2,3],'type.3':[1.1,1.2,1.3]})
print (df)
Name ID type Name.1 ID.1 type.1 Name.2 ID.2 type.2 Name.3 ID.3 \
0 Dog 1 1.1 Dog 1 1.1 cat 1 1.1 cat.7 1
1 Dog.1 2 1.2 cat 2 1.2 cat.12 2 1.2 cat.13 2
2 Dog.3 3 1.3 Dog.3 3 1.3 Dog.1 3 1.3 Dog.3 3
type.3
0 1.1
1 1.2
2 1.3
def color(x):
c1 = 'background-color: yellow'
c = ''
#compare columns
names = x.filter(like='Name')
mask = names.stack().duplicated(keep=False).unstack()
#DataFrame with same index and columns names as original filled empty strings
df1 = pd.DataFrame(c, index=x.index, columns=x.columns)
#modify values of df1 column by boolean mask
df1 = names.mask(mask, c1).reindex(df1.columns, axis=1, fill_value='')
return df1
df.style.apply(color, axis=None).to_excel('df.xlsx', engine='openpyxl', index=False)
我想在 pandas 数据框中找到两列之间的匹配项并突出显示两列的结果,所以我做了以下代码:
df=pd.DataFrame({'Name':['Zea mays','Zea mays subsp. mexicana','Zea mays subsp. parviglumis'],'ID':[1,2,3],'type':[1.1,1.2,1.3],
'Name.1':['Zea mays subsp. huehuetenangensis','Zea mays subsp. mays','Zea mays'],'ID.1':[1,2,3],'type.1':[1.1,1.2,1.3],
'Name.2':['Zea nicaraguensis','Zea luxurians','Zea perennis'],'ID.2':[1,2,3],'type.2':[1.1,1.2,1.3],
'Name.3':['Capsicum annuum','Capsicum frutescens','Capsicum chinense'],'ID.3':[1,2,3],'type.3':[1.1,1.2,1.3]})
def in_statements(s):
color = 'yellow'
if np.where(str(s.iloc[4]) == str(s.iloc[8])):
color = 'yellow'
else:
color = 'black'
return 'background-color: %s' % color
df.style.applymap(in_statements)
但是,它给了我这个错误:“(“'str' 对象没有属性 'iloc'”,'occurred at index Samples')”
这是输入的例子:
它应该是这样的:
有人能指出我正确的方向吗?谢谢
如果要突出显示 Name
列中的重复值,请使用 Styler.apply
with mask creted by DataFrame.filter
for columns with Name
, reshaped by DataFrame.stack
, get duplicates by Series.duplicated
and reshaped back by Series.unstack
:
df=pd.DataFrame({'Name':['Dog','Dog.1','Dog.3'],'ID':[1,2,3],'type':[1.1,1.2,1.3],
'Name.1':['Dog','cat','Dog.3'],'ID.1':[1,2,3],'type.1':[1.1,1.2,1.3],
'Name.2':['cat','cat.12','Dog.1'],'ID.2':[1,2,3],'type.2':[1.1,1.2,1.3],
'Name.3':['cat.7','cat.13','Dog.3'],'ID.3':[1,2,3],'type.3':[1.1,1.2,1.3]})
print (df)
Name ID type Name.1 ID.1 type.1 Name.2 ID.2 type.2 Name.3 ID.3 \
0 Dog 1 1.1 Dog 1 1.1 cat 1 1.1 cat.7 1
1 Dog.1 2 1.2 cat 2 1.2 cat.12 2 1.2 cat.13 2
2 Dog.3 3 1.3 Dog.3 3 1.3 Dog.1 3 1.3 Dog.3 3
type.3
0 1.1
1 1.2
2 1.3
def color(x):
c1 = 'background-color: yellow'
c = ''
#compare columns
names = x.filter(like='Name')
mask = names.stack().duplicated(keep=False).unstack()
#DataFrame with same index and columns names as original filled empty strings
df1 = pd.DataFrame(c, index=x.index, columns=x.columns)
#modify values of df1 column by boolean mask
df1 = names.mask(mask, c1).reindex(df1.columns, axis=1, fill_value='')
return df1
df.style.apply(color, axis=None).to_excel('df.xlsx', engine='openpyxl', index=False)