Finding a substring in a pandas frozenset

Question

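I'm trying to find a substring in a frozenset, but I'm running out of options.

My data structure is a pandas.DataFrame (it comes from association_rules in the mlxtend package, if you're familiar with it), and I want to print all rows whose antecedents (which are frozensets) include a specific string.

What I tried: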

    print(rules[rules["antecedents"].str.contains('line', regex=False)])

However, whenever I run this, I get an empty DataFrame.

When I run just the inner expression on my rules["antecedents"] Series, I get False for every entry. Why is that?

Answer 1

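Because the Series.str.* functions only work on string data. Your data is not strings, so pandas never looks at its string representation; for non-string values the result is typically NaN rather than a match. To demonstrate: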

    >>> import numpy as np
    >>> import pandas as pd
    >>> x = pd.DataFrame(np.random.randn(2, 5)).astype("object")
    >>> x
             0         1         2          3          4
    0 -1.17191  -1.92926 -0.831576 -0.0814279   0.099612
    1 -1.55183 -0.494855   1.14398   -1.72675 -0.0390948
    >>> x[0].str.contains("-1")
    0   NaN
    1   NaN
    Name: 0, dtype: float64

What you can do:

Use apply:

    >>> x[0].apply(lambda x: "-1" in str(x))
    0    True
    1    True
    Name: 0, dtype: bool

So your code should look like this:

    print(rules[rules["antecedents"].apply(lambda x: 'line' in str(x))])
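A variation on the same idea, as a minimal sketch: convert the column to strings first with astype(str), after which the vectorized .str methods apply. The small rules frame below is a hypothetical stand-in for the real mlxtend output, which has more columns.

```python
import pandas as pd

# Hypothetical stand-in for the association_rules output (assumption:
# only the "antecedents" column matters for this filter).
rules = pd.DataFrame({
    "antecedents": [frozenset({"online"}), frozenset({"milk"})],
    "support": [0.4, 0.2],
})

# astype(str) turns each frozenset into its string representation,
# e.g. "frozenset({'online'})", so .str.contains can search it.
mask = rules["antecedents"].astype(str).str.contains("line", regex=False)

print(rules[mask])
```

Note that both this and the apply version search the whole representation, so a pattern such as 'set' would match every row because of the "frozenset" prefix.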

If you mean to match an element of the set exactly, you may want to use 'line' in x instead.
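The difference between the two kinds of match can be sketched as follows. The sample frame and the item names are made up for illustration; the third variant (a substring test against each individual element) is an extra option not mentioned above, included because it avoids matching against the "frozenset(...)" wrapper text.

```python
import pandas as pd

# Hypothetical sample data standing in for mlxtend's association_rules output.
rules = pd.DataFrame({
    "antecedents": [
        frozenset({"online", "bread"}),
        frozenset({"line"}),
        frozenset({"milk"}),
    ],
})

# Exact element match: True only if 'line' itself is a member of the set.
exact = rules["antecedents"].apply(lambda items: "line" in items)

# Substring match per element: True if any member contains 'line'.
partial = rules["antecedents"].apply(
    lambda items: any("line" in item for item in items)
)

print(rules[exact])
print(rules[partial])
```

With this data, the exact match keeps only the second row, while the per-element substring match keeps the first two rows.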
