Find if multiple columns contain a string

Question

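I have a df with columns ID, Value, A, B, C. I want to check whether a string, say "Type AB", appears in any of columns A, B, C for each row. If it is present, I want to mark that row as "Present" in a new column "TypeAB_present". How can I achieve this in Python?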

Answer 1

Try np.where with any:

import numpy as np
df['TypeAB_present'] = np.where(df[['A', 'B', 'C']].eq('Type AB').any(axis=1), 'Present', '')
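
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is made up for illustration and is not the asker's data; the intermediate name `mask` is also just for readability.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real df has ID, Value, A, B, C columns.
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Value': [10, 20, 30],
    'A': ['Type A', 'Type AB', 'Type B'],
    'B': ['Type B', 'Type A', None],
    'C': ['Type C', None, 'Type AB'],
})

# True for rows where at least one of A, B, C equals 'Type AB' exactly.
mask = df[['A', 'B', 'C']].eq('Type AB').any(axis=1)

# 'Present' where the mask is True, empty string elsewhere.
df['TypeAB_present'] = np.where(mask, 'Present', '')
print(df)
```

Note that eq does an exact comparison, so a cell such as 'Type ABC' or ' Type AB' (with a leading space) would not count as a match.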

Answer 2

Assuming you are using pandas and are looking for the exact 'Type AB' label in any of the columns:

df['TypeAB_present'] = df[['A', 'B', 'C']].apply(lambda row: 'Present' if 'Type AB' in row.values else '', axis=1)
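
Row-wise apply can be slow on large frames. As a possible alternative (not part of the answer above), the sketch below compares it with a column-wise check built on isin, which should label the same rows. The sample data and the names `cols`, `by_apply`, and `by_isin` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'A': ['Type A', 'Type AB', 'Type B'],
    'B': ['Type B', 'Type A', None],
    'C': ['Type C', None, 'Type AB'],
})

cols = ['A', 'B', 'C']

# Row-wise version from the answer above: exact membership test per row.
by_apply = df[cols].apply(
    lambda row: 'Present' if 'Type AB' in row.values else '', axis=1
)

# Column-wise alternative: isin() builds a boolean frame,
# any(axis=1) reduces it to one flag per row.
mask = df[cols].isin(['Type AB']).any(axis=1)
by_isin = mask.map(lambda hit: 'Present' if hit else '')

# Check that both approaches label the same rows.
print((by_apply == by_isin).all())
```

A side benefit of isin is that it takes a list, so checking for several exact labels at once only requires adding them to that list.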

Answer 3

You can try this:

import pandas as pd 
df = pd.DataFrame(
        {
                'ID': ['AB01', 'AB02', 'AB02', 'AB01', 'AB01'],
                'Values': [57, 98, 87, 69, 98],
                'A': ['Type A', 'Type B', 'Type B', 'Type B', 'Type AB'],
                'B': [None, 'Type AB', None, 'Type A', None]
        }
)

df.loc[(df[['A', 'B']] == 'Type AB').any(axis=1), 'C'] = 'Present'
df

Output:

     ID  Values        A        B        C
0  AB01      57   Type A     None      NaN
1  AB02      98   Type B  Type AB  Present
2  AB02      87   Type B     None      NaN
3  AB01      69   Type B   Type A      NaN
4  AB01      98  Type AB     None  Present

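If your check is a bit more complex than an exact equality match, you can build a more powerful index mask. Here I check whether any string in column A or column B contains the substring 'AB':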

match_mask = df[['A', 'B']].apply(lambda x: x.str.contains('AB')).any(axis=1)
df.loc[match_mask, 'C'] = 'Present'
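
Here is a self-contained variant of the substring check, using the same sample frame. It is only a sketch, and it adds two optional arguments of str.contains that are not in the snippet above: regex=False, so the pattern is matched literally, and na=False, so missing values count as non-matches.

```python
import pandas as pd

# Same sample frame as in the answer above.
df = pd.DataFrame({
    'ID': ['AB01', 'AB02', 'AB02', 'AB01', 'AB01'],
    'Values': [57, 98, 87, 69, 98],
    'A': ['Type A', 'Type B', 'Type B', 'Type B', 'Type AB'],
    'B': [None, 'Type AB', None, 'Type A', None],
})

# regex=False matches 'AB' literally; na=False treats None/NaN as "no match".
match_mask = df[['A', 'B']].apply(
    lambda col: col.str.contains('AB', regex=False, na=False)
).any(axis=1)

# Create column C and fill it only on matching rows; other rows stay NaN.
df.loc[match_mask, 'C'] = 'Present'
print(df)
```

With this data the substring check and the exact match flag the same two rows, but a value such as 'Type ABX' would be caught only by the substring version.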
