Extract substrings from text in a column using Pandas

Question

I'm new to Python, so... I have a DataFrame like this:

    id   city      name     text
    1    Boston    Rosie    I have some text here, as you can see.
    2    New York  Liza     I love my cat

I want to search the text in each row: if it contains "love", or both "love" and "cat", I want to return the city or the name for that row.

I tried the following code:

    if df[df['text'].str.contains("love") | df['text'].str.contains("cat")]:
        print(df['name'])

It throws the following error: "The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
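A minimal reproduction of this error, assuming df was built from the sample data above with pd.DataFrame (the construction itself is an assumption). The filtered result is itself a DataFrame, and `if` asks pandas to turn the whole DataFrame into a single True/False, which it refuses to do. Checking an explicit property such as .empty avoids the ambiguity.

```python
import pandas as pd

# Assumed reconstruction of the sample DataFrame from the question
df = pd.DataFrame({
    "id": [1, 2],
    "city": ["Boston", "New York"],
    "name": ["Rosie", "Liza"],
    "text": ["I have some text here, as you can see.", "I love my cat"],
})

mask = df["text"].str.contains("love") | df["text"].str.contains("cat")
subset = df[mask]  # a DataFrame holding only the matching rows

# `if subset:` calls bool() on a DataFrame, which always raises ValueError
try:
    if subset:
        pass
    error = None
except ValueError as exc:
    error = str(exc)

# Ask an explicit question instead: "is there at least one matching row?"
if not subset.empty:
    print(subset["name"])
```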

Answer 1

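Use boolean indexing with pandas.Series.str.contains. The pattern "cat|love" is a regular expression that matches rows containing either word: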

    df['name'][df['text'].str.contains("cat|love")]

Output:

    1    Liza
    Name: name, dtype: object
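The question also asks for the "love" AND "cat" case and for returning the city as well as the name. A sketch of both, again assuming df is rebuilt from the sample data: two masks combined with & give the AND condition, and .loc selects rows by mask and several columns by label in one step.

```python
import pandas as pd

# Assumed reconstruction of the sample DataFrame from the question
df = pd.DataFrame({
    "id": [1, 2],
    "city": ["Boston", "New York"],
    "name": ["Rosie", "Liza"],
    "text": ["I have some text here, as you can see.", "I love my cat"],
})

# "love" OR "cat": one regex alternation, as in the answer above
either = df["text"].str.contains("love|cat")

# "love" AND "cat": combine two boolean masks element-wise with &
both = df["text"].str.contains("love") & df["text"].str.contains("cat")

# .loc[row_mask, column_list] returns the matching rows, selected columns only
print(df.loc[either, ["city", "name"]])
print(df.loc[both, ["city", "name"]])
```

Note that str.contains does substring matching, so "cat" would also match a word like "concatenate"; a pattern such as r"\bcat\b" restricts the match to the whole word.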
