Assigning True/False if a token is present in a DataFrame
My current DataFrame is:
|   | articleID | keywords |
|:--|:----------|:---------|
| 0 | 58b61d1d | ['Second Avenue (Manhattan, NY)'] |
| 1 | 58b6393b | ['Crossword Puzzles'] |
| 2 | 58b6556e | ['Workplace Hazards and Violations', 'Trump, Donald J'] |
| 3 | 58b657fa | ['Trump, Donald J', 'Speeches and Statements'] |
I want a DataFrame like the one below, with an added column trumpMention that is True when the token 'Trump, Donald J' appears in keywords and False otherwise:
|   | articleID | keywords | trumpMention |
|:--|:----------|:---------|-------------:|
| 0 | 58b61d1d | ['Second Avenue (Manhattan, NY)'] | False |
| 1 | 58b6393b | ['Crossword Puzzles'] | False |
| 2 | 58b6556e | ['Workplace Hazards and Violations', 'Trump, Donald J'] | True |
| 3 | 58b657fa | ['Trump, Donald J', 'Speeches and Statements'] | True |
I have tried several approaches using DataFrame functions, but none of them gives the result I want. Some of the things I tried:
df['trumpMention'] = np.where(any(df['keywords']) == 'Trump, Donald J', True, False)
or
df['trumpMention'] = df['keywords'].apply(lambda x: any(token == 'Trump, Donald J') for token in x)
or
lst = ['Trump, Donald J']
df['trumpMention'] = df['keywords'].apply(lambda x: ([ True for token in x if any(token in lst)]))
Raw input:
import pandas as pd

df = pd.DataFrame({'articleID': ['58b61d1d', '58b6393b', '58b6556e', '58b657fa'],
                   'keywords': [['Second Avenue (Manhattan, NY)'],
                                ['Crossword Puzzles'],
                                ['Workplace Hazards and Violations', 'Trump, Donald J'],
                                ['Trump, Donald J', 'Speeches and Statements']],
                   'trumpMention': [False, False, True, True]})
Try:
df["trumpMention"] = df["keywords"].apply(lambda x: "Trump, Donald J" in x)
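How about applying a function that checks set membership? (Converting each list to a set is optional; the in operator also works directly on the list.)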
df['trumpMention'] = df['keywords'].apply(lambda x: 'Trump, Donald J' in set(x))
Output:
articleID keywords trumpMention
0 58b61d1d [Second Avenue (Manhattan, NY)] False
1 58b6393b [Crossword Puzzles] False
2 58b6556e [Workplace Hazards and Violations, Trump, Dona... True
3 58b657fa [Trump, Donald J, Speeches and Statements] True
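Regarding your attempts:
np.where(any(df['keywords']) == 'Trump, Donald J', True, False)
does not work because any(df['keywords']) evaluates to True here (every row holds a non-empty list), and True is not equal to 'Trump, Donald J', so the expression always returns array(False).
df['keywords'].apply(lambda x: any(token == 'Trump, Donald J') for token in x)
does not work and raises a TypeError, because the for clause sits outside the any(...) call, so any is never given a comprehension to iterate over.
df['keywords'].apply(lambda x: ([ True for token in x if any(token in lst)]))
does not work because token in lst is already a single boolean, so any(token in lst) is meaningless.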
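To make the points above concrete, here is a minimal sketch of how the second and third attempts could be repaired by moving the generator expression inside any(...). The names target, by_equality and by_lookup are illustrative only and do not come from the question; the DataFrame is rebuilt from the question's raw input.

```python
import pandas as pd

df = pd.DataFrame({
    "articleID": ["58b61d1d", "58b6393b", "58b6556e", "58b657fa"],
    "keywords": [
        ["Second Avenue (Manhattan, NY)"],
        ["Crossword Puzzles"],
        ["Workplace Hazards and Violations", "Trump, Donald J"],
        ["Trump, Donald J", "Speeches and Statements"],
    ],
})

target = "Trump, Donald J"  # illustrative name
lst = [target]

# Second attempt, repaired: the generator expression is the argument of any()
by_equality = df["keywords"].apply(lambda x: any(token == target for token in x))

# Third attempt, repaired: test each token against lst, then reduce with any()
by_lookup = df["keywords"].apply(lambda x: any(token in lst for token in x))

print(by_equality.tolist())
print(by_lookup.tolist())
```

Both variants should agree with the simpler "Trump, Donald J" in x check; they are mainly useful when lst holds several tokens to look for.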
Try my approach. I build a list before adding it to the DataFrame.
import pandas as pd

def mentioned_Trump(s, lst):
    # True if the string s is one of the elements of lst
    return s in lst

# Each row is [ID, list of keywords]
s = [[1, ['Second Avenue (Manhattan, NY)']], [2, ['Crossword Puzzles']],
     [3, ['Workplace Hazards and Violations', 'Trump, Donald J']],
     [4, ['Trump, Donald J', 'Speeches and Statements']]]
df = pd.DataFrame(s)
df.columns = ['ID', 'keywords']
# Build the list of booleans first, then add it as a new column
s = list(df['keywords'])
s1 = [mentioned_Trump('Trump, Donald J', x) for x in s]
df['trumpMention'] = s1
print(df)
Use a vectorized string method, which can be faster than using apply:
df.keywords.astype(str).str.contains("Trump, Donald J")
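One caveat with the string-based approach is that it matches the target as a substring of each list's text form, and str.contains treats its pattern as a regular expression unless regex=False is passed. The sketch below contrasts it with an exact-match alternative built on Series.explode; the third row is a hypothetical keyword invented for illustration, and the variable names are assumptions, not part of the answers above.

```python
import pandas as pd

keywords = pd.Series([
    ["Crossword Puzzles"],
    ["Trump, Donald J", "Speeches and Statements"],
    ["Trump, Donald J Jr"],  # hypothetical keyword, not from the question
])
target = "Trump, Donald J"

# Substring match on the string form of each list (literal, not regex)
substring_match = keywords.astype(str).str.contains(target, regex=False)

# Exact match: one row per keyword, compare, then collapse back per original row
exact_match = keywords.explode().eq(target).groupby(level=0).any()

print(substring_match.tolist())  # [False, True, True]
print(exact_match.tolist())      # [False, True, False]
```

Which behavior is wanted depends on the data; if keywords never contain one another as substrings, the two results should coincide.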