How to check if an input is in a list
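I'm trying to build a suggestion tool based on a movie dataset. More specifically, it should recommend movies by title, given a genre keyword.
However, I can't get past the loop/check part of the script. Here is what I've tried: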
import nltk
import pandas as pd
from nltk.tokenize import word_tokenize
import random

#CSV READ & GENRE-TITLE
data = pd.read_csv("data.csv")
df_title = data["title"]
df_genre = data["genre"]

#TOKENIZE
tokenized_genre = [word_tokenize(i) for i in df_genre]

choice = {}
while choice != "exit":
    choice = input("Please enter a word = ")
    for word in {choice}:
        if word in df_genre:
            """The random title of the random adventure movie will be implemented here"""
        else:
            print("The movie of the genre doesn't exist")
The output of tokenized_genre looks like this:
[['Biography', ',', 'Crime', ',', 'Drama'],
['Drama'], ['Drama', ',', 'History'],
['Adventure', ',', 'Drama', ',', 'Fantasy'],
['Biography', ',', 'Drama'],
['Biography', ',', 'Drama', ',', 'Romance']
Output of the loop:
Please enter a word = adventure
The movie of the genre doesn't exist
Please enter a word = Adventure
The movie of the genre doesn't exist
I suspect the mistake is in the tokenized list, but I can't work out how to fix it.
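Maybe I'm wrong, as I'm no Python expert, but df_genre appears to hold a list of lists rather than a flat list. You should flatten it into a single list and then search in that.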
import itertools

df_genre = [['Biography', ',', 'Crime', ',', 'Drama'], ['Drama'], ['Drama', ',', 'History'], ['Adventure', ',', 'Drama', ',', 'Fantasy'], ['Biography', ',', 'Drama'], ['Biography', ',', 'Drama', ',', 'Romance']]

#FLATTEN the list of lists into a single list of tokens
joined_list = list(itertools.chain.from_iterable(df_genre))

choice = {}
while choice != "exit":
    choice = input("Please enter a word = ")
    for word in {choice}:
        if word in joined_list:
            """The random title of the random adventure movie will be implemented here"""
            print("Works!")
        else:
            print("The movie of the genre doesn't exist")
I'm not sure whether this is what you're looking for. Hope it helps.
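To make the difference concrete, here is a minimal sketch that uses only the standard library. The sample rows are copied from the tokenized output in the question; the names nested, flat and the found_* variables are illustrative and not part of the original script. It shows that an `in` test against a list of lists compares the word with each inner list, while the same test against the flattened list compares it with each token.

```python
import itertools

# Sample rows copied from the tokenized output shown in the question.
nested = [
    ["Biography", ",", "Crime", ",", "Drama"],
    ["Drama"],
    ["Adventure", ",", "Drama", ",", "Fantasy"],
]

# `in` on the outer list compares "Adventure" with each inner list,
# and a string is never equal to a list, so this is False.
found_in_nested = "Adventure" in nested

# Flattening yields one list of tokens, so the same test succeeds.
flat = list(itertools.chain.from_iterable(nested))
found_in_flat = "Adventure" in flat

# The match is still case-sensitive: "adventure" is not "Adventure".
found_lowercase = "adventure" in flat

print(found_in_nested, found_in_flat, found_lowercase)
```

Flattening alone does not make the lookup case-insensitive, which would explain why the lowercase input in the question still fails unless both sides are normalized.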
You can use:
search = {e.lower() for l in tokenized_genre for e in l}

choice = input("Please enter a word = ")
while choice != "exit":
    if choice.lower() in search:
        # TODO: The random title of the random adventure movie will be implemented here
        print("Works!")
    else:
        print("The movie of the genre doesn't exist")
    choice = input("Please enter a word = ")
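search is a set containing every token in tokenized_genre, lowercased. The advantage is that a membership test on a set takes O(1) time on average. Because your choice variable is a single word, you can check directly whether the entered word is in the search set.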
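For the TODO itself, one possible approach is sketched below. It assumes the genre column holds comma-separated strings such as "Adventure, Drama, Fantasy", which is what the tokenized output suggests. The sample titles are placeholders, and build_genre_index and pick_title are hypothetical helper names, not part of the original script. It also swaps word_tokenize for a plain str.split(","), so that commas never end up as searchable tokens.

```python
import random
from collections import defaultdict

# Hypothetical sample rows standing in for data["title"] and data["genre"];
# the titles are placeholders, not values from the real CSV.
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
genres = [
    "Biography, Crime, Drama",
    "Drama",
    "Adventure, Drama, Fantasy",
    "Drama, History",
]


def build_genre_index(titles, genres):
    """Map each lowercased genre to the list of titles tagged with it."""
    index = defaultdict(list)
    for title, genre_field in zip(titles, genres):
        # Split on commas instead of word_tokenize, so "," is never a key.
        for genre in genre_field.split(","):
            index[genre.strip().lower()].append(title)
    return dict(index)


def pick_title(index, choice, rng=random):
    """Return a random title for the genre, or None if the genre is unknown."""
    candidates = index.get(choice.strip().lower())
    return rng.choice(candidates) if candidates else None


genre_index = build_genre_index(titles, genres)
print(pick_title(genre_index, "adventure"))  # only one Adventure title in the sample
print(pick_title(genre_index, "Western"))    # genre not present in the sample
```

Inside the loop, the "Works!" branch could then call pick_title(genre_index, choice) and print the result. With the real data, passing data["title"] and data["genre"] should behave the same way, assuming the genre column has no missing values.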