Finding keywords in a column and adding those keywords in a new column against the same row
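I am new to Python and this is my first post on Stack Overflow. I have a list of keywords and a data frame with multiple columns.
I want to search for these keywords in a particular column and write each keyword that occurs next to it, in the same row.
This is what I am doing: My code
This is the error I am getting: The loop with the error
This is what I want to get: Desired output
Please help me figure out what is wrong, or suggest a better way to solve this. Thanks!
In case it makes things easier, the code is written out below.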
import pandas as pd
keywords = ["hello","hi","greetings","wassup"]
data = ["hello, my name is Harry", "Hi I am John", "Yo! Wassup", "Greetings fellow traveller","Hey im Henry", "Hello there General Kenobi"]
df = pd.DataFrame(data,columns = ['strings'])
df['Keywords'] = ""
df2 = pd.DataFrame(data = None, columns = df.columns)
for word in keywords:
    temp = df[df['strings'].str.contains(word,na = False)]
    temp.reset_index(drop = True)
    temp['Keywords'] = word
    df2.append(temp)
Error:
C:\Users\harka\Anaconda3\lib\site-packages\ipykernel_launcher.py:5: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
"""
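For what it's worth, the loop in the question can probably be made to work with a few small changes. The sketch below is one possible fix, not necessarily what the asker intended: it takes an explicit `.copy()` of the filtered rows (which avoids the SettingWithCopyWarning), matches case-insensitively (`str.contains` is case-sensitive by default, so "Hi" would not match "hi"), and collects the pieces with `pd.concat`, since `reset_index` and `append` return new objects rather than modifying in place and `DataFrame.append` was removed in pandas 2.0. The list name `matches` is my own.

```python
import pandas as pd

keywords = ["hello", "hi", "greetings", "wassup"]
data = ["hello, my name is Harry", "Hi I am John", "Yo! Wassup",
        "Greetings fellow traveller", "Hey im Henry", "Hello there General Kenobi"]
df = pd.DataFrame(data, columns=["strings"])

matches = []  # hypothetical name: one filtered frame per keyword
for word in keywords:
    # Case-insensitive substring match; na=False treats missing values as "no match".
    mask = df["strings"].str.contains(word, case=False, na=False)
    # .copy() gives an independent frame, so the assignment below does not warn.
    temp = df[mask].copy()
    temp["Keywords"] = word
    matches.append(temp)

# Stack the per-keyword frames and renumber the rows from 0.
df2 = pd.concat(matches, ignore_index=True)
print(df2)
```

Note that a row containing several keywords appears once per keyword in `df2`, and rows with no keyword (such as "Hey im Henry") are dropped.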
I added 'Yo' to show that it can return multiple strings.
import pandas as pd
def keyword(row):
    strings = row['strings']
    keywords = ["hello","hi","greetings","wassup",'yo']
    # case-insensitive substring test: keep every keyword found in this row's text
    keyword = [key for key in keywords if key.upper() in strings.upper()]
    return keyword
data = ["hello, my name is Harry", "Hi I am John", "Yo! Wassup", "Greetings fellow traveller","Hey im Henry", "Hello there General Kenobi"]
df = pd.DataFrame(data,columns = ['strings'])
# axis=1 calls keyword() once per row
df['keyword'] = df.apply(keyword, axis=1)
If you would rather not get a list of strings back, then perhaps a comma-separated string?
import pandas as pd
def keyword(row):
    strings = row['strings']
    keywords = ["hello","hi","greetings","wassup",'yo']
    # case-insensitive substring test: keep every keyword found in this row's text
    keyword = [key for key in keywords if key.upper() in strings.upper()]
    # join the matches into one string; rows with no match get an empty string
    return ','.join(keyword)
data = ["hello, my name is Harry", "Hi I am John", "Yo! Wassup", "Greetings fellow traveller","Hey im Henry", "Hello there General Kenobi"]
df = pd.DataFrame(data,columns = ['strings'])
df['keyword'] = df.apply(keyword, axis=1)
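One caveat with the substring test: it also matches keywords inside longer words, so "hi" would be found in "this". If that matters for your data, a variation is to build a single regular expression with word boundaries and let `Series.str.findall` do the search. This is only a sketch of an alternative, not part of the answer above; the `pattern` and `found` names are mine.

```python
import re
import pandas as pd

keywords = ["hello", "hi", "greetings", "wassup", "yo"]
data = ["hello, my name is Harry", "Hi I am John", "Yo! Wassup",
        "Greetings fellow traveller", "Hey im Henry", "Hello there General Kenobi"]
df = pd.DataFrame(data, columns=["strings"])

# \b ... \b restricts matches to whole words; re.escape guards against
# keywords that contain regex metacharacters.
pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"

# findall returns, for each row, the list of matched words in order of appearance.
found = df["strings"].str.findall(pattern, flags=re.IGNORECASE)

# Lower-case the matches and join them into a comma-separated string.
df["keyword"] = found.apply(lambda words: ",".join(w.lower() for w in words))
print(df)
```

Unlike the `apply` version, the keywords here come out in the order they occur in the text ("yo,wassup" for the third row) rather than in the order of the keyword list.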