How do I convert the output of a nested for loop into a list in Python?
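I'm new to Python, so apologies for the basic question. I'm trying to match columns of keywords against a list of texts. If a keyword is found in a text, it should be appended to the spreadsheet, which currently ends with the 'Engagement' column.
I'm currently getting the following error on the second line of the for loop: TypeError: 'in <string>' requires string as left operand, not float
What is wrong with my code, and how should I correct it? Thanks.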
import pandas as pd

df_rawdata = pd.read_excel(r'test.xlsx', sheet_name='rawdata')
my_rawdatalist = df_rawdata['Text'].tolist()
df_all_words = pd.read_excel(r'test.xlsx', sheet_name='pet_dict')
keywords_list = set(df_all_words['Animals'].tolist() + df_all_words['Cities'].tolist())
matchlist = []
for rawdata in my_rawdatalist:
    # This is the line that raises the TypeError
    matches = [keyword for keyword in keywords_list if keyword in rawdata]
    matchlist.append("|".join(matches))
print(matchlist)
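I don't really understand why you would want an empty string in there, but maybe this helps:
I think a list comprehension could simplify this considerably. Note that it also lets you handle phrases that contain more than one keyword: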
my_rawdatalist = [
    "The cat is out",
    "The zoo is fun",
    "The dog is tired",
    "The dog chases the cat"
]
keywords_list = ["cat", "dog", "NaN"]
matchlist = []
for rawdata in my_rawdatalist:
    matches = [keyword for keyword in keywords_list if keyword in rawdata]
    matchlist.append("|".join(matches))
print(matchlist)
This gives you:
['cat', '', 'dog', 'cat|dog']
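Since the title asks about turning a nested loop into a list, the loop above could also be collapsed into a single nested comprehension. This is only a sketch using the same sample data; it should behave the same as the explicit loop.

```python
my_rawdatalist = [
    "The cat is out",
    "The zoo is fun",
    "The dog is tired",
    "The dog chases the cat",
]
keywords_list = ["cat", "dog", "NaN"]

# Outer comprehension: one entry per text.
# Inner generator: every keyword that occurs as a substring of that text.
matchlist = [
    "|".join(keyword for keyword in keywords_list if keyword in rawdata)
    for rawdata in my_rawdatalist
]
print(matchlist)
```

Whether this reads better than the explicit loop is a matter of taste; the loop is easier to step through while debugging.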
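If you have "a lot" of keywords, you could convert keywords_list to a set(), which removes duplicate keywords. Note that the loop still tests every keyword against every text, so the set does not speed up the substring check itself.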
keywords_list = set(["cat", "dog", "NaN"])
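A set tends to pay off most when you match whole words rather than substrings, because the lookup then becomes a set intersection. The sketch below is a different technique from the substring test used in this answer: it assumes keywords are single words separated by whitespace and that matching is case-sensitive. The sample texts and keywords are made up for illustration.

```python
keywords = {"cat", "dog", "TV"}
texts = [
    "The cat is out",
    "The zoo is fun",
    "The dog chases the cat on TV",
]

matchlist = []
for text in texts:
    # Split on whitespace and intersect with the keyword set.
    words = set(text.split())
    # sorted() gives a stable order, since sets are unordered.
    matches = sorted(keywords & words)
    matchlist.append("|".join(matches))

print(matchlist)
```

Unlike the substring test, this would not match "cat" inside "concatenate", and it would miss a keyword followed directly by punctuation such as "cat.".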
If you have multiple columns of keywords (if I understand what you are describing), then I would add each column to the set.
keywords_list = set(
    ["cat", "dog", "NaN"]  # keywords from column A
    + ["Person", "Woman", "Man", "Camera", "TV"]  # keywords from column B
)
The code should continue to work:
my_rawdatalist = [
    "The cat is out",
    "The zoo is fun",
    "The dog is tired",
    "The dog chases the cat on TV"
]
keywords_list = set(
    ["cat", "dog", "NaN"]  # keywords from column A
    + ["Person", "Woman", "Man", "Camera", "TV"]  # keywords from column B
)
matchlist = []
for rawdata in my_rawdatalist:
    matches = [keyword for keyword in keywords_list if keyword in rawdata]
    matchlist.append("|".join(matches))
print(matchlist)
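Giving you (the order of keywords within the last entry may differ between runs, because sets are unordered):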
['cat', '', 'dog', 'dog|cat|TV']
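As for the TypeError itself: a likely cause is that empty spreadsheet cells come back from pandas as float NaN rather than the string "NaN", so `keyword in rawdata` ends up with a float on the left. The sketch below simulates that with plain lists standing in for the `.tolist()` results. The helper name `clean_keywords` and the sample values are hypothetical, not taken from the question's spreadsheet.

```python
# Stand-ins for df_all_words['Animals'].tolist() and ['Cities'].tolist();
# a shorter column is padded with float NaN when read from a spreadsheet.
animals = ["cat", "dog", float("nan")]
cities = ["Paris", "Berlin", "Rome"]


def clean_keywords(*columns):
    """Collect non-empty string keywords, skipping NaN and other non-strings."""
    keywords = set()
    for column in columns:
        for value in column:
            if isinstance(value, str) and value.strip():
                keywords.add(value.strip())
    return keywords


keywords_list = clean_keywords(animals, cities)

# The 'Text' column can contain empty cells as well.
my_rawdatalist = ["The cat is in Paris", "The zoo is fun", float("nan")]

matchlist = []
for rawdata in my_rawdatalist:
    if not isinstance(rawdata, str):
        # No text to search, so record an empty match.
        matchlist.append("")
        continue
    # sorted() keeps the output stable, since sets are unordered.
    matches = sorted(keyword for keyword in keywords_list if keyword in rawdata)
    matchlist.append("|".join(matches))

print(matchlist)
```

With the real DataFrame, calling `.dropna()` on each column before `.tolist()` should have a similar effect on the keyword side.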