How to assign an item in a pandas dataframe after checking for conditions?
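I'm iterating over a pandas dataframe (originally a csv file) and checking each row of specific columns for specific keywords. If a keyword appears at least once, I add 1 to the score. There are about 7 keywords, and if the score is >= 6 I want to assign a string (here "Software and application developer") to another column, but only in that row, and also record the score. Unfortunately, the score is the same everywhere, which I find hard to believe. Here is my code so far: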
for row in data.iterrows():
    devScore=0
    if row[1].str.contains("developer").any() | row[1].str.contains("developpeur").any():
        devScore=devScore+1
    if row[1].str.contains("symfony").any():
        devScore=devScore+1
    if row[1].str.contains("javascript").any():
        devScore=devScore+1
    if row[1].str.contains("java").any() | row[1].str.contains("jee").any():
        devScore=devScore+1
    if row[1].str.contains("php").any():
        devScore=devScore+1
    if row[1].str.contains("html").any() | row[1].str.contains("html5").any():
        devScore=devScore+1
    if row[1].str.contains("application").any() | row[1].str.contains("applications").any():
        devScore=devScore+1
    if devScore>=6:
        data["occupation"]="Software and application developer"
        data["score"]=devScore
Here you are assigning a constant to the entire column, so every row ends up with the same value:
data["occupation"]="Software and application developer"
data["score"]=devScore
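A minimal sketch of the difference, using a hypothetical three-row frame rather than the question's CSV: a scalar assigned to a column is broadcast to every row, so each time the question's loop reaches a row with a score >= 6 it overwrites the whole column, and every row ends up with the score of the last qualifying row. `.loc` with a row label and a column name writes a single cell instead.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's CSV
data = pd.DataFrame({"text": ["php dev", "java dev", "manager"]})

# A scalar assigned to a column is broadcast to every row...
data["score"] = 2
print(data["score"].tolist())  # [2, 2, 2]

# ...whereas .loc with a row label and a column name touches one cell only
data.loc[1, "score"] = 5
print(data["score"].tolist())  # [2, 5, 2]
```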
They should be:
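for idx, row in data.iterrows():
    devScore = 0
    # ... same keyword checks as in the question, but using row.str.contains(...)
    #     instead of row[1].str.contains(...) ...
    if devScore >= 6:
        # .loc[idx, col] writes to this row only, not to the whole column
        data.loc[idx, "occupation"] = "Software and application developer"
        data.loc[idx, "score"] = devScore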
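For completeness, a self-contained sketch of the corrected loop, assuming the question's keyword checks are kept as they are. The sample `data` frame and the `keyword_groups` list are hypothetical stand-ins for the question's CSV and `if` chain. Because `str.contains` does substring matching, "java" also matches "javascript", just as in the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's CSV
data = pd.DataFrame({
    "title": ["php symfony developer", "sales manager"],
    "skills": ["javascript html5 jee", "excel"],
    "desc": ["web applications", "b2b"],
})

# Hypothetical: one entry per point in the question's if-chain;
# any word of a group found in the row scores that group once
keyword_groups = [
    ["developer", "developpeur"],
    ["symfony"],
    ["javascript"],
    ["java", "jee"],
    ["php"],
    ["html", "html5"],
    ["application", "applications"],
]

for idx, row in data.iterrows():
    devScore = 0  # reset for every row
    for group in keyword_groups:
        # row.str.contains tests each cell; .any() checks the whole row
        if any(row.str.contains(word).any() for word in group):
            devScore += 1
    if devScore >= 6:
        # write to this row only, not to the whole column
        data.loc[idx, "occupation"] = "Software and application developer"
        data.loc[idx, "score"] = devScore

print(data[["occupation", "score"]])
```

Rows that never reach 6 are left as NaN in both new columns. `iterrows` is slow on large frames, which is one reason to prefer a vectorized approach such as the `goodwords` answer.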
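Just maintain a list of the wanted words, goodwords; the code below implements the logic you are looking for without iterating over the rows.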
import random
import numpy as np
import pandas as pd

goodwords = ["developer", "developpeur", "symfony", "javascript", "java", "jee",
             "php", "html", "html5", "application", "applications"]
prefix = ["a", "the", "junior"]
company = ["apple", "facebook", "alibaba", "grab"]

# build a dataframe where wanted text may occur in a number of columns
# (about 2/3 of the cells get a goodword, the rest get "manager")
df = pd.DataFrame([
    {col: f"{random.choice(prefix)} "
          f"{random.choice(goodwords) if random.randint(0, 2) <= 1 else 'manager'} "
          f"at {random.choice(company)}"
     for col in "abcdefgh"}
    for _ in range(10)])

# start with a truth matrix that only contains False
matches = pd.DataFrame(False, index=df.index, columns=df.columns)
# build up Trues where a goodword is in the text
for w in goodwords:
    matches = matches | df.apply(lambda col: col.str.contains(w))

# per the spec, only set the score column where it is >= 6
# score = number of cells in the row that contain at least one goodword (True == 1)
df = (df.assign(match=matches.sum(axis=1),
                score=lambda dfa: np.where(dfa["match"].ge(6), dfa["match"], np.nan),
                occupation=lambda dfa: np.where(dfa["match"].ge(6), "Software and application developer", "wannabe"))
        .drop(columns="match"))
Output (the data is random, so each run differs):
a | b | c | d | e | f | g | h | score | occupation
the java at grab | junior manager at grab | the html5 at apple | the applications at grab | junior manager at grab | junior application at grab | junior manager at grab | junior applications at alibaba | NaN | wannabe
a manager at facebook | junior application at grab | junior manager at grab | junior symfony at grab | the applications at grab | junior symfony at alibaba | junior developer at apple | a javascript at grab | 6.0 | Software and application developer
junior applications at apple | a php at grab | a manager at grab | junior applications at grab | junior manager at facebook | a php at facebook | the jee at facebook | junior javascript at apple | 6.0 | Software and application developer
the html5 at grab | a jee at apple | junior html5 at apple | a manager at grab | a manager at apple | the manager at grab | the javascript at facebook | the php at apple | NaN | wannabe
a applications at grab | junior developer at grab | a manager at grab | the manager at alibaba | a php at grab | junior manager at facebook | the manager at grab | a javascript at apple | NaN | wannabe
a manager at grab | junior manager at apple | a manager at grab | junior manager at alibaba | the javascript at alibaba | junior java at apple | a applications at grab | the manager at apple | NaN | wannabe
the jee at facebook | the html at apple | junior applications at grab | junior developpeur at facebook | the manager at apple | the javascript at grab | junior jee at grab | a developpeur at facebook | 7.0 | Software and application developer
junior developer at alibaba | the manager at facebook | a jee at grab | a manager at grab | the manager at facebook | the applications at grab | a manager at alibaba | junior application at grab | NaN | wannabe
the manager at apple | junior application at alibaba | the application at facebook | junior manager at grab | junior manager at apple | junior manager at apple | the manager at apple | the symfony at alibaba | NaN | wannabe
junior html5 at apple | the applications at alibaba | a manager at grab | junior manager at grab | junior html5 at facebook | junior manager at alibaba | junior applications at grab | junior developer at grab | NaN | wannabe
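The goodwords answer scores the number of cells that contain any wanted word, so the same keyword repeated in several columns raises the score; that is not quite the question's rule of one point per keyword group. If the question's rule is what you want, a vectorized sketch under that assumption joins each row's text into one string and tests every group once. The sample frame and `keyword_groups` are hypothetical.

```python
import re

import pandas as pd

# Hypothetical sample data; any string columns would do
data = pd.DataFrame({
    "title": ["php symfony developer", "php developer"],
    "skills": ["javascript html5 jee", "php php php"],
    "desc": ["web applications", "developer"],
})

# Hypothetical: one point per group, mirroring the question's if-chain
keyword_groups = [
    ["developer", "developpeur"],
    ["symfony"],
    ["javascript"],
    ["java", "jee"],
    ["php"],
    ["html", "html5"],
    ["application", "applications"],
]

# Join each row's cells into one string so a group counts at most once per row
text = data.astype(str).apply(" ".join, axis=1)

# One regex alternation per group (re.escape guards against special characters),
# tested against all rows at once; True/False become 1/0 and are summed
score = sum(
    text.str.contains("|".join(map(re.escape, group))).astype(int)
    for group in keyword_groups
)

is_dev = score >= 6
data["score"] = score.where(is_dev)  # NaN below the threshold
data.loc[is_dev, "occupation"] = "Software and application developer"
print(data)
```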