Dynamically count occurrences of multiple words within lists
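I'm trying to count the occurrences of multiple keywords in each phrase of a DataFrame. This seems similar to other questions, but not quite the same.
Here we have a df and a list of lists containing keywords/topics: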
df=pd.DataFrame({'phrases':['very expensive meal near city center','very good meal and waiters','nice restaurant near center and public transport']})
topics=[['expensive','city'],['good','waiters'],['center','transport']]
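For each phrase, we want to count the number of words that match each individual topic. So the first phrase should score 2 for the first topic, 0 for the second topic and 1 for the third topic, and so on.
I've tried this, but it doesn't work: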
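To pin down the expected scores before involving pandas, here is a minimal pure-Python sketch. It assumes words are separated by whitespace and that a word appearing twice in a phrase counts twice; the helper name `score_phrase` is hypothetical and not part of either post.

```python
# Same sample data as in the question.
phrases = [
    'very expensive meal near city center',
    'very good meal and waiters',
    'nice restaurant near center and public transport',
]
topics = [['expensive', 'city'], ['good', 'waiters'], ['center', 'transport']]


def score_phrase(phrase, topics):
    """Hypothetical helper: for each topic, count the whitespace-separated
    words of `phrase` that belong to that topic."""
    words = phrase.split()
    # True/False membership tests sum to an integer count.
    return [sum(word in topic for word in words) for topic in topics]


scores = [score_phrase(p, topics) for p in phrases]
print(scores)  # [[2, 0, 1], [0, 2, 0], [0, 0, 2]]
```

Each inner list is one phrase; each position in it is one topic.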
from collections import Counter
topnum=0
for t in topics:
    counts=[]
    topnum+=1
    results = Counter()
    for line in df['phrases']:
        for c in line.split(' '):
            results[c] = t.count(c)
        counts.append(sum(results.values()))
    df['topic_'+str(topnum)] = counts
I'm not sure what I'm doing wrong. Ideally I would end up with the number of matching words for each topic/phrase combination, but the counts seem to repeat:
phrases                                           topic_1  topic_2  topic_3
very expensive meal near city centre                    2        0        0
very good meal and waiters                              2        2        0
nice restaurant near center and public transport        2        2        2
Many thanks to anyone who can help.
Best wishes
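A likely cause of the repeated counts in the question's loop: `results = Counter()` is created once per topic, outside the loop over phrases, so keys set by earlier phrases are never cleared and `sum(results.values())` keeps adding them in. For example, after the first phrase `results` holds `expensive: 1` and `city: 1`, and those entries are still summed for every later phrase. A minimal sketch of the question's own code with only that line moved inside the phrase loop, assuming the same `df` and `topics`:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'phrases': ['very expensive meal near city center',
                               'very good meal and waiters',
                               'nice restaurant near center and public transport']})
topics = [['expensive', 'city'], ['good', 'waiters'], ['center', 'transport']]

topnum = 0
for t in topics:
    counts = []
    topnum += 1
    for line in df['phrases']:
        # Fresh Counter for every phrase, so words seen in earlier
        # phrases no longer contribute to this phrase's total.
        results = Counter()
        for c in line.split(' '):
            results[c] = t.count(c)
        counts.append(sum(results.values()))
    df['topic_' + str(topnum)] = counts

print(df)
```

Because `results[c]` is assigned rather than incremented, a keyword that appears twice in the same phrase is still counted once here. The answer's `find_count` counts each occurrence.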
Here is a solution that defines a helper function called find_count and applies it to each row of the DataFrame through a lambda.
import pandas as pd
df=pd.DataFrame({'phrases':['very expensive meal near city center','very good meal and waiters','nice restaurant near center and public transport']})
topics=[['expensive','city'],['good','waiters'],['center','transport']]
def find_count(row, topics_index):
    # Count the words of this row's phrase that belong to topics[topics_index]
    count = 0
    word_list = row['phrases'].split()
    for word in word_list:
        if word in topics[topics_index]:
            count+=1
    return count
# axis=1 makes apply pass each row (not each column) to the lambda
df['Topic 1'] = df.apply(lambda row:find_count(row,0), axis=1)
df['Topic 2'] = df.apply(lambda row:find_count(row,1), axis=1)
df['Topic 3'] = df.apply(lambda row:find_count(row,2), axis=1)
print(df)
# Output
                                            phrases  Topic 1  Topic 2  Topic 3
0              very expensive meal near city center        2        0        1
1                        very good meal and waiters        0        2        0
2  nice restaurant near center and public transport        0        0        2
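The answer above writes one `apply` call per topic. If the number of topics varies, the columns can be generated in a loop instead. The sketch below swaps in a different technique from the answer's: pandas' `Series.str.count` with a word-boundary regex built from each topic. It assumes whole-word, case-sensitive matching. Unlike `split()`, `\b` also matches a keyword next to punctuation (e.g. `center,`), so results can differ on messier text.

```python
import re

import pandas as pd

df = pd.DataFrame({'phrases': ['very expensive meal near city center',
                               'very good meal and waiters',
                               'nice restaurant near center and public transport']})
topics = [['expensive', 'city'], ['good', 'waiters'], ['center', 'transport']]

for i, topic in enumerate(topics, start=1):
    # Builds e.g. r'\b(?:expensive|city)\b'; re.escape guards against
    # keywords that contain regex metacharacters.
    pattern = r'\b(?:' + '|'.join(map(re.escape, topic)) + r')\b'
    # str.count counts every non-overlapping match, so a repeated
    # keyword counts each time, as in find_count.
    df[f'Topic {i}'] = df['phrases'].str.count(pattern)

print(df)
```

This produces the same `Topic 1` to `Topic 3` columns as the answer for the sample data, without hard-coding the topic indices.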