Highlight pandas df errors based on conditions
Good day SO community,
I'm running into a problem while trying to highlight errors in my df row by row.
import pandas as pd

reference_dict = {'jobclass' : ['A','B'], 'Jobs' : ['Teacher','Plumber']}
dict = {'jobclass': ['A','C','A'], 'Jobs': ['Teacher', 'Plumber','Policeman']}
df = pd.DataFrame(data=dict)

def highlight_rows(df):
    for i in df.index:
        if df.jobclass[i] in reference_dict['jobclass']:
            print(df.jobclass[i])
            return 'background-color: green'

df.style.apply(highlight_rows, axis = 1)
I get the error:
TypeError: ('string indices must be integers', 'occurred at index 0')
What I hope to get is my df with the values not found in my reference_dict highlighted.
Any help would be greatly appreciated.. Cheers!
Edit:
x = {'jobclass' : ['A','B'], 'Jobs' : ['Teacher','Plumber']}
d = {'jobclass': ['A','C','A'], 'Jobs': ['Teacher', 'Plumber','Policeman']}
df = pd.DataFrame(data=d)
print(df)

def highlight_rows(s):
    ret = ["" for i in s.index]
    for i in df.index:
        if df.jobclass[i] not in x['jobclass']:
            ret[s.index.get_loc('Jobs')] = "background-color: yellow"
    return ret

df.style.apply(highlight_rows, axis = 1)
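I tried this and it highlighted the whole column instead of the specific row value I want.. =/
Good day to you too!
"What I hope to get is my df with values not found in my reference_dict being highlighted."
If the values you want highlighted are the ones that are not in reference_dict, do you mean the following function?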
def highlight_rows(df):
    for i in df.index:
        if df.jobclass[i] not in reference_dict['jobclass']:
            print(df.jobclass[i])
            return 'background-color: green'
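Note that with axis=1 the function is handed one row at a time as a Series, so df.index inside it holds the column labels and df.jobclass is a single string; indexing that string with a label is what raises the "string indices must be integers" error. Below is a minimal sketch of a row-wise variant, assuming the goal is to colour every cell of a row whose jobclass is missing from reference_dict. The name highlight_row and the yellow colour are my own choices, not something from the question.

```python
import pandas as pd

reference_dict = {'jobclass': ['A', 'B'], 'Jobs': ['Teacher', 'Plumber']}
df = pd.DataFrame({'jobclass': ['A', 'C', 'A'],
                   'Jobs': ['Teacher', 'Plumber', 'Policeman']})

def highlight_row(row):
    # With axis=1 each call receives one row as a Series whose index
    # holds the column labels, so row['jobclass'] is a single value.
    known = row['jobclass'] in reference_dict['jobclass']
    style = '' if known else 'background-color: yellow'
    # Return one CSS string per cell in the row.
    return [style] * len(row)

# Call the function on each row directly to see what it returns per row.
styles = [highlight_row(row) for _, row in df.iterrows()]
print(styles)
```

Passing the function as df.style.apply(highlight_row, axis=1) should then colour only the row with jobclass C.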
Either way, why highlight the rows when you can isolate them? It seems you want to see all the job classes in df that are not in reference_dict.
import pandas as pd
reference_dict = {'jobclass' : ['A','B'], 'Jobs' : ['Teacher','Plumber']}
data_dict = {'jobclass': ['A','C','A'], 'Jobs': ['Teacher', 'Plumber','Policeman']}
ref_df = pd.DataFrame(reference_dict)
df = pd.DataFrame(data_dict)
# Merge the two tables together. how='outer' includes jobclasses which the
# DataFrames do not have in common. Columns Jobs_x and Jobs_y are generated
# automatically because both tables have a column named Jobs.
outliers = df.merge(ref_df, how='outer', on='jobclass')
# Jobs_y is null when there is no matching jobclass in the reference
# DataFrame, so we can take advantage of that by filtering.
outliers = outliers[outliers['Jobs_y'].isnull()]
# Drop the junk column now that it has been used for filtering.
outliers = outliers.drop('Jobs_y', axis=1)
print("The reference DataFrame is:")
print(ref_df,'\n')
print("The input DataFrame is:")
print(df,'\n')
print("The result is a list of all the jobclasses not in the reference DataFrame and what job is with it:")
print(outliers)
The result is:
The reference DataFrame is:
  jobclass     Jobs
0        A  Teacher
1        B  Plumber

The input DataFrame is:
  jobclass       Jobs
0        A    Teacher
1        C    Plumber
2        A  Policeman

The result is a list of all the jobclasses not in the reference DataFrame and what job is with it:
  jobclass   Jobs_x
2        C  Plumber
This may be a tangent, but it's what I would do. I had no idea you could highlight rows in pandas, cool trick.
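If the only goal is to isolate those rows, a boolean mask built with Series.isin is a shorter route than the merge. This is a swapped-in technique rather than the merge approach above, and the names not_in_reference and outliers are just illustrative. A minimal sketch, assuming the same two dictionaries:

```python
import pandas as pd

reference_dict = {'jobclass': ['A', 'B'], 'Jobs': ['Teacher', 'Plumber']}
data_dict = {'jobclass': ['A', 'C', 'A'],
             'Jobs': ['Teacher', 'Plumber', 'Policeman']}
df = pd.DataFrame(data_dict)

# Boolean mask: True where jobclass is absent from the reference list.
not_in_reference = ~df['jobclass'].isin(reference_dict['jobclass'])

# Keep only the rows flagged by the mask; no helper columns to drop.
outliers = df[not_in_reference]
print(outliers)
```

Unlike the merge, this keeps the original row labels and column names, so the C row stays at index 1 with its Jobs column intact.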
You can use merge with the indicator parameter to find the unmatched values, and then create a DataFrame of styles:
import pandas as pd

x = {'jobclass' : ['A','B'], 'Jobs' : ['Teacher','Plumber']}
d = {'jobclass': ['A','C','A'], 'Jobs': ['Teacher', 'Plumber','Policeman']}
df = pd.DataFrame(data=d)
print (df)
  jobclass       Jobs
0        A    Teacher
1        C    Plumber
2        A  Policeman
Details:
print (df.merge(pd.DataFrame(x) , on='jobclass', how='left', indicator=True))
  jobclass     Jobs_x   Jobs_y     _merge
0        A    Teacher  Teacher       both
1        C    Plumber      NaN  left_only
2        A  Policeman  Teacher       both
def highlight_rows(s):
    c1 = 'background-color: yellow'
    c2 = ''
    df1 = pd.DataFrame(x)
    m = s.merge(df1, on='jobclass', how='left', indicator=True)['_merge'] == 'left_only'
    df2 = pd.DataFrame(c2, index=s.index, columns=s.columns)
    df2.loc[m, 'Jobs'] = c1
    return df2

df.style.apply(highlight_rows, axis = None)
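One caveat, as far as I can tell: merge returns a fresh 0..n-1 index, so the mask m only lines up with df2 when the DataFrame has the default index. Here is a sketch of a variant that builds the mask with isin instead, which keeps the original row labels. The function name highlight_unmatched and the 'r1'..'r3' labels are hypothetical, added only to show the alignment.

```python
import pandas as pd

x = {'jobclass': ['A', 'B'], 'Jobs': ['Teacher', 'Plumber']}
d = {'jobclass': ['A', 'C', 'A'], 'Jobs': ['Teacher', 'Plumber', 'Policeman']}
# Hypothetical non-default row labels, to show the mask still aligns.
df = pd.DataFrame(data=d, index=['r1', 'r2', 'r3'])

def highlight_unmatched(data):
    # Start from an empty style for every cell, same shape as the data.
    styles = pd.DataFrame('', index=data.index, columns=data.columns)
    # isin keeps the row labels of data, so the mask aligns with styles.
    mask = ~data['jobclass'].isin(x['jobclass'])
    styles.loc[mask, 'Jobs'] = 'background-color: yellow'
    return styles

# Call the function directly to inspect the style table it produces.
styles = highlight_unmatched(df)
print(styles)
```

It is meant to be used the same way, df.style.apply(highlight_unmatched, axis=None), and should colour only the Jobs cell of the unmatched row.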