Searching over a list of individual sentences by a specific term in Python
I have a list of terms in Python that looks like this:
Fruit
apple
banana
grape
orange
and a DataFrame of lists of individual sentences that may contain one of those fruit names, similar to this:
Customer Review
1 ['the banana was delicious','he called the firetruck','I had only half an orange']
2 ['I liked the banana','there was a worm in my apple','Cantaloupes are better then melons']
3 ['It could use some more cheese','the grape and orange was sour']
I want to match the sentences in the Review column with the fruits they mention, and print a DataFrame as the final result, like this:
Fruit Review
apple ['there was a worm in my apple']
banana ['the banana was delicious','I liked the banana']
grape ['the grape and orange was sour']
orange ['the grape and orange was sour','I had only half an orange']
How can I do this?
You can build a dictionary keyed by fruit and look up each word of every review:
```python
# your fruits list
fruits = ["apple", "banana", "grape", "orange"]
reviews = [['the banana was delicious','he called the firetruck','I had only half an orange'], ['I liked the banana','there was a worm in my apple','Cantaloupes are better then melons'], ['It could use some more cheese','the grape and orange was sour']]

# Initialize the dictionary, make each fruit a key
fruitReviews = {fruit.lower(): [] for fruit in fruits}

# for each review, if a word in the review is a fruit, add it to that
# fruit's reviews list
for reviewer in reviews:
    for review in reviewer:
        for word in review.split():
            fruitReview = fruitReviews.get(word.lower(), None)
            if fruitReview is not None:
                fruitReview.append(review)

"""
result:
{
    "apple": [
        "there was a worm in my apple"
    ],
    "banana": [
        "the banana was delicious",
        "I liked the banana"
    ],
    "grape": [
        "the grape and orange was sour"
    ],
    "orange": [
        "I had only half an orange",
        "the grape and orange was sour"
    ]
}
"""
```
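While the exact answer depends on how you store your data, I think the approach is the same:
- Create and store an empty list for every fruit name to hold its reviews
- For each review, check each of the fruits to see if it appears. If a fruit appears in the review, add the review to that fruit's list
Here is an example: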
```python
# The list of fruits
fruits = ['apple', 'banana', 'grape', 'orange']

# The collection of reviews (based on the way it was presented, I'm assuming it was in a dictionary)
reviews = {
    '1': ['the banana was delicious', 'he called the firetruck', 'I had only half an orange'],
    '2': ['I liked the banana', 'there was a worm in my apple', 'Cantaloupes are better then melons'],
    '3': ['It could use some more cheese', 'the grape and orange was sour'],
}

fruitDictionary = {}

# 1. Create and store an empty list for every fruit name to store its reviews
for fruit in fruits:
    fruitDictionary[fruit] = []

for customerReviews in reviews.values():
    # 2. For each review,...
    for review in customerReviews:
        # ...check each of the fruits to see if they appear.
        for fruit in fruits:
            # If a fruit appears in the comment at all (case-insensitive substring test),...
            if fruit.lower() in review.lower():
                # ...add the review to that fruit's list
                fruitDictionary[fruit].append(review)
```
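This differs from the previous answer in that a review such as "I liked this grape. I thought the grape was very juicy" is added to the grape list only once, whereas the word-by-word lookup appends the review once for every word that matches a fruit.
If your data is stored as a list of lists, the process is very similar: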
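A small check of that difference, using a made-up review that names the same fruit twice. The sentence is hypothetical and deliberately has no punctuation, because `str.split()` keeps punctuation attached to words (`grape.` would not match the `grape` key in the word-lookup version):

```python
fruits = ["apple", "banana", "grape", "orange"]

# Hypothetical review that mentions the same fruit twice.
review = "I liked this grape and I thought the grape was very juicy"

# Word lookup (first answer): one append per matching word.
by_word = {fruit: [] for fruit in fruits}
for word in review.split():
    if word.lower() in by_word:
        by_word[word.lower()].append(review)

# Substring test (this answer): at most one append per fruit.
by_fruit = {fruit: [] for fruit in fruits}
for fruit in fruits:
    if fruit.lower() in review.lower():
        by_fruit[fruit].append(review)

print(len(by_word["grape"]), len(by_fruit["grape"]))  # 2 1
```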
```python
# The list of fruits
fruits = ['apple', 'banana', 'grape', 'orange']

# The collection of reviews
reviews = [
    ['the banana was delicious', 'he called the firetruck', 'I had only half an orange'],
    ['I liked the banana', 'there was a worm in my apple', 'Cantaloupes are better then melons'],
    ['It could use some more cheese', 'the grape and orange was sour'],
]

fruitDictionary = {}

# 1. Create and store an empty list for every fruit name to store its reviews
for fruit in fruits:
    fruitDictionary[fruit] = []

for customerReviews in reviews:
    # 2. For each review,...
    for review in customerReviews:
        # ...check each of the fruits to see if they appear.
        for fruit in fruits:
            # If a fruit appears in the comment at all (case-insensitive substring test),...
            if fruit.lower() in review.lower():
                # ...add the review to that fruit's list
                fruitDictionary[fruit].append(review)
```
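Because `in` is a plain substring test, `grape` would also match a review that only mentions `grapefruit`. If that matters, one option (a swapped-in technique, not part of the original answer) is a regular expression with word boundaries and `re.IGNORECASE`. The last inner list below is hypothetical data added only to show the edge cases:

```python
import re

fruits = ['apple', 'banana', 'grape', 'orange']
reviews = [
    ['the banana was delicious', 'he called the firetruck', 'I had only half an orange'],
    ['I liked the banana', 'there was a worm in my apple', 'Cantaloupes are better then melons'],
    ['It could use some more cheese', 'the grape and orange was sour'],
    # Hypothetical extra reviews: capitalised "Apple", and "grapefruit" (not a grape).
    ['Apple pie again', 'I prefer grapefruit'],
]

# \b...\b only matches whole words, so "grape" does not match inside "grapefruit";
# re.IGNORECASE lets "apple" match "Apple".
patterns = {fruit: re.compile(r'\b' + re.escape(fruit) + r'\b', re.IGNORECASE)
            for fruit in fruits}

fruitDictionary = {fruit: [] for fruit in fruits}
for customerReviews in reviews:
    for review in customerReviews:
        for fruit, pattern in patterns.items():
            if pattern.search(review):
                fruitDictionary[fruit].append(review)

print(fruitDictionary['grape'])
print(fruitDictionary['apple'])
```

Like the substring version, each review is added at most once per fruit, since `search` only reports whether a match exists.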
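You can use the `.explode` function to put one review sentence per row, and then use a set intersection to find the fruits in each sentence: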
```python
import pandas as pd

fruits = pd.DataFrame({'Fruit': 'apple banana grape orange'.split()})
reviews = pd.DataFrame({'Customer': [1, 2, 3],
                        'Review': [['the banana was delicious', 'he called the firetruck', 'I had only half an orange'],
                                   ['I liked the banana', 'there was a worm in my apple', 'Cantaloupes are better then melons'],
                                   ['It could use some more cheese', 'the grape and orange was sour'],
                                   ]})

# review per row
explode_reviews = reviews.explode('Review')

# create a set
fruits_set = set(fruits['Fruit'].tolist())

# find intersection
explode_reviews['Fruit'] = explode_reviews['Review'].apply(lambda x: ' '.join(set(x.split()).intersection(fruits_set)))
print(explode_reviews)
```
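The explode approach gives one fruit string per sentence, not the Fruit / Review table from the question. One way to get there, sketched here under the assumption of pandas 0.25 or later (where `explode` was added) and not part of the original answer, is to keep the matches as a list, explode a second time, and group by fruit:

```python
import pandas as pd

fruits_set = {'apple', 'banana', 'grape', 'orange'}
reviews = pd.DataFrame({'Customer': [1, 2, 3],
                        'Review': [['the banana was delicious', 'he called the firetruck', 'I had only half an orange'],
                                   ['I liked the banana', 'there was a worm in my apple', 'Cantaloupes are better then melons'],
                                   ['It could use some more cheese', 'the grape and orange was sour']]})

# One sentence per row; reset the index so it stays unique.
explode_reviews = reviews.explode('Review').reset_index(drop=True)

# Keep the matches as a sorted list (instead of a joined string) so they can be exploded too.
explode_reviews['Fruit'] = explode_reviews['Review'].apply(
    lambda x: sorted(set(x.split()) & fruits_set))

# One (sentence, fruit) pair per row; empty lists become NaN and are dropped.
pairs = explode_reviews.explode('Fruit').dropna(subset=['Fruit'])

# Collect the sentences for each fruit into a list.
result = pairs.groupby('Fruit')['Review'].agg(list).reset_index()
print(result)
```

`groupby` sorts the fruit names by default and keeps the original row order within each group, so each fruit's sentences appear in the order the customers wrote them.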
If you don't want to explode your data, you can do this instead:
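```python
# ... (reviews and fruits_set as defined above)
flatten = lambda l: [item for sublist in l for item in sublist]
# One list of matched fruit names per customer row
reviews['Fruit'] = reviews['Review'].apply(lambda x: flatten([set(i.split()).intersection(fruits_set) for i in x]))
```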
Credit for the `flatten` code.
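The `flatten` lambda can also be written with the standard library's `itertools.chain.from_iterable`. A sketch of the same per-customer column; the `sorted` call is an addition that only makes the output deterministic, since set iteration order is not guaranteed:

```python
from itertools import chain

import pandas as pd

fruits_set = {'apple', 'banana', 'grape', 'orange'}
reviews = pd.DataFrame({'Customer': [1, 2, 3],
                        'Review': [['the banana was delicious', 'he called the firetruck', 'I had only half an orange'],
                                   ['I liked the banana', 'there was a worm in my apple', 'Cantaloupes are better then melons'],
                                   ['It could use some more cheese', 'the grape and orange was sour']]})

# chain.from_iterable yields the items of each per-sentence set in turn,
# which is what the flatten lambda does with a nested comprehension.
reviews['Fruit'] = reviews['Review'].apply(
    lambda x: sorted(chain.from_iterable(set(i.split()) & fruits_set for i in x)))
print(reviews[['Customer', 'Fruit']])
```

As with `flatten`, a fruit mentioned in two different sentences of the same customer would appear twice in that customer's list.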