How to assign an item in a pandas dataframe after checking for conditions?

Question

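I am iterating over a pandas dataframe (originally a csv file) and checking each row of a particular column for specific keywords. If a keyword appears at least once, I add 1 to a score. There are about 7 keywords, and if the score is >=6, I want to assign a string (here "Software and application developer") to an item in another column (but in the same row) and store the score. Unfortunately, the score ends up the same everywhere, which I find hard to believe. This is my code so far: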

for row in data.iterrows():
    devScore=0
    if row[1].str.contains("developer").any() | row[1].str.contains("developpeur").any():
        devScore=devScore+1
    if row[1].str.contains("symfony").any():
        devScore=devScore+1
    if row[1].str.contains("javascript").any():
        devScore=devScore+1
    if row[1].str.contains("java").any() | row[1].str.contains("jee").any():
        devScore=devScore+1
    if row[1].str.contains("php").any():
        devScore=devScore+1
    if row[1].str.contains("html").any() | row[1].str.contains("html5").any():
        devScore=devScore+1
    if row[1].str.contains("application").any() | row[1].str.contains("applications").any():
        devScore=devScore+1
    if devScore>=6:
        data["occupation"]="Software and application developer"
        data["score"]=devScore

Answer 1

Here you are assigning a constant to the entire column, so every row gets the same value:

data["occupation"]="Software and application developer"
data["score"]=devScore

They should be:

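for idx, row in data.iterrows():
    # ... keyword checks that compute devScore, as in the question ...
    # (with this unpacking, `row` is the Series that the question accesses as row[1])
    data.loc[idx, "occupation"]="Software and application developer"
    data.loc[idx, "score"]=devScore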
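
For completeness, here is a minimal, self-contained sketch of the row-wise fix above. The sample data and the column name "description" are assumptions made for illustration, and the keyword groups are copied from the question. It is one possible way to write the loop, not necessarily what the asker ended up using.

```python
import pandas as pd

# Hypothetical sample data; the column name "description" is an assumption.
data = pd.DataFrame({
    "description": [
        "developer symfony javascript java php html application",
        "sales manager",
    ]
})

# Create the target columns up front so every row starts out empty.
data["occupation"] = None
data["score"] = float("nan")

# Keyword groups from the question; each group contributes at most one point.
keyword_groups = [
    ("developer", "developpeur"),
    ("symfony",),
    ("javascript",),
    ("java", "jee"),
    ("php",),
    ("html", "html5"),
    ("application", "applications"),
]

for idx, row in data.iterrows():
    text = str(row["description"]).lower()
    # Count how many groups have at least one keyword in this row's text.
    dev_score = sum(any(word in text for word in group) for group in keyword_groups)
    if dev_score >= 6:
        # .loc[idx, column] writes a single cell instead of the whole column.
        data.loc[idx, "occupation"] = "Software and application developer"
        data.loc[idx, "score"] = dev_score

print(data)
```

Only rows whose score reaches 6 are written; all other rows keep their empty values.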

Answer 2

Just maintain a list of the wanted words, goodwords; the code below then performs the logic you are looking for.

import random
import numpy as np
import pandas as pd

goodwords = ["developer","developpeur","symfony","javascript","java","jee","php","html","html5", "application","applications"]
prefix = ["a","the","junior"]
company = ["apple", "facebook", "alibaba", "grab"]

# build a dataframe where wanted text may occur in a number of columns
df = pd.DataFrame([
    {col: f"{random.choice(prefix)} {random.choice(goodwords) if random.randint(0,2)<=1 else 'manager'} at {random.choice(company)}"
     for col in "abcdefgh"}
    for r in range(10)])

# start with a truth matrix that only contains false
matches = np.zeros(df.shape)==1
# build up trues where a goodword is in the text
for w in goodwords:
    matches = matches | df.apply(lambda r: r.str.contains(w))

# spec shows only set score column if it's >=6
# score is the sum across the row of the truth matrix (True==1)
df = (df.assign(match=matches.sum(axis=1),
         score=lambda dfa: np.where(dfa["match"].ge(6), dfa["match"], np.nan),
         occupation=lambda dfa: np.where(dfa["match"].ge(6), "Software and application developer", "wannabe"))
    .drop(columns="match"))

Output

                            a                              b                            c                               d                           e                           f                            g                               h  score                          occupation
             the java at grab         junior manager at grab           the html5 at apple        the applications at grab      junior manager at grab  junior application at grab       junior manager at grab  junior applications at alibaba    NaN                             wannabe
        a manager at facebook     junior application at grab       junior manager at grab          junior symfony at grab    the applications at grab   junior symfony at alibaba    junior developer at apple            a javascript at grab    6.0  Software and application developer
 junior applications at apple                  a php at grab            a manager at grab     junior applications at grab  junior manager at facebook           a php at facebook          the jee at facebook      junior javascript at apple    6.0  Software and application developer
            the html5 at grab                 a jee at apple        junior html5 at apple               a manager at grab          a manager at apple         the manager at grab   the javascript at facebook                the php at apple    NaN                             wannabe
       a applications at grab       junior developer at grab            a manager at grab          the manager at alibaba               a php at grab  junior manager at facebook          the manager at grab           a javascript at apple    NaN                             wannabe
            a manager at grab        junior manager at apple            a manager at grab       junior manager at alibaba   the javascript at alibaba        junior java at apple       a applications at grab            the manager at apple    NaN                             wannabe
          the jee at facebook              the html at apple  junior applications at grab  junior developpeur at facebook        the manager at apple      the javascript at grab           junior jee at grab       a developpeur at facebook    7.0  Software and application developer
  junior developer at alibaba        the manager at facebook                a jee at grab               a manager at grab     the manager at facebook    the applications at grab         a manager at alibaba      junior application at grab    NaN                             wannabe
         the manager at apple  junior application at alibaba  the application at facebook          junior manager at grab     junior manager at apple     junior manager at apple         the manager at apple          the symfony at alibaba    NaN                             wannabe
        junior html5 at apple    the applications at alibaba            a manager at grab          junior manager at grab    junior html5 at facebook   junior manager at alibaba  junior applications at grab        junior developer at grab    NaN                             wannabe
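
Note that the example above counts how many cells in a row contain any wanted word. If the text lives in a single column and each keyword group should count once, as in the question, a vectorized variant could look like the sketch below. The sample data and the column name "description" are assumptions, and the patterns are one possible encoding of the question's groups.

```python
import pandas as pd

# Hypothetical sample data; the column name "description" is an assumption.
data = pd.DataFrame({
    "description": [
        "PHP developer, symfony, javascript, java, html5 applications",
        "junior manager at grab",
        "java developer",
    ]
})

# One regular expression per keyword group from the question.
# "html" also matches "html5" and "application" also matches "applications".
patterns = [
    "developer|developpeur",
    "symfony",
    "javascript",
    "java|jee",
    "php",
    "html",
    "application",
]

text = data["description"].fillna("")
# Each str.contains call gives a boolean Series; adding them up counts
# the matched groups per row without an explicit Python loop over rows.
score = sum(text.str.contains(p, case=False, regex=True).astype(int) for p in patterns)

is_dev = score >= 6
data["score"] = score.where(is_dev)  # NaN where the threshold is not met
data["occupation"] = None
data.loc[is_dev, "occupation"] = "Software and application developer"

print(data)
```

The boolean mask `is_dev` selects the qualifying rows, so the assignment touches only those rows rather than the whole column.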
