How to conditionally (based on a value in one column) search for substrings in another column in a Python loop

Question

I need to search for substrings in a string column, conditioned on the value in a second column. I have 2 dataframes:

import pandas as pd

df1 = {'Descr': ["VALVE, PRESSURE", "pump ttf", "Valve, electrical", "Geeku, electrical","VALVE, OVERBOARD, BUTTERFLY"],
        'N_Product': ["VALVE", "PUMP", "VALVE", "GEEKU","VALVE"],
        }
df2 = {'N_Product': ["VALVE", "VALVE","VALVE", "PUMP", "GEEKU"],
        'M_Product': ["PRESSURE", "qwerty","", "", "ELECTRICAL"],
        }
df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)

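(Step 1) For the first row of df1, the N_Product column is VALVE.

(Step 2) We look for VALVE in the N_Product column of every row of df2 and find 3 matches, with the following (N_Product, M_Product) pairs: row 0 has VALVE, PRESSURE; row 1 has VALVE, qwerty; row 2 has VALVE, "".

(Step 3) You then need to check whether the M_Product value of each of these pairs from df2 is contained in df1['Descr']; if it is, write N_Product + ":" + M_Product + ";". For VALVE you only need to search for "PRESSURE", "ELECTRICAL" and ""; nothing else is needed. For N_Product 'GEEKU', search only for 'ELECTRICAL', and so on, depending on which pairs exist in df2.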
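To make step 2 concrete, here is a minimal sketch of the lookup for a single product, using the sample df2 from above. The variable name `candidates` is only illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({
    'N_Product': ["VALVE", "VALVE", "VALVE", "PUMP", "GEEKU"],
    'M_Product': ["PRESSURE", "qwerty", "", "", "ELECTRICAL"],
})

# Step 2 for one product: keep only the M_Product values paired with "VALVE".
candidates = df2.loc[df2['N_Product'] == "VALVE", 'M_Product'].tolist()
print(candidates)  # ['PRESSURE', 'qwerty', '']
```

Only these values should then be searched for in Descr for rows whose N_Product is VALVE.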

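import numpy as np

# Candidate substrings: every M_Product value, regardless of N_Product.
c = df2['M_Product'].astype(str).to_list()

def matcher(x):
    # Return the first candidate found in x (case-insensitive), else NaN.
    for i in c:
        if i.lower() in x.lower():
            return i
    return np.nan

df1['Res'] = df1['Descr'].apply(matcher)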

But I don't know how to loop over only the M_Product values that correspond to the row's N_Product.

Desired result:

df1 = {'Descr': ["VALVE, PRESSURE", "pump ttf", "Valve, electrical", "Geeku, electrical","VALVE, OVERBOARD, BUTTERFLY"],
        'N_Product': ["VALVE", "PUMP", "VALVE", "GEEKU","VALVE"],
        'Result': ["VALVE: PRESSURE;", "PUMP", "VALVE;", "GEEKU: ELECTRICAL;","VALVE;"],
        }

I would be grateful for any help. If you have any suggestions, please share them.

Answer 1

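(Updated)

Based on the updated question, my understanding of the problem is this:

Create a new Result column.
If the N_Product columns of df1 and df2 match for a given row, take the first value from df2's M_Product column that is found in the Descr column of that df1 row and append it to the df1 N_Product value (separated by a : character).
Otherwise, put the N_Product from df1 in the Result column.
Also append a ; character to Result.

Here is one way to do it: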

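def foo(x):
    # Compare case-insensitively by upper-casing both sides.
    descr = x['Descr'].upper()
    match = None
    for mStr in df2['M_Product'].str.upper():
        # Skip empty strings: '' is a substring of every string.
        if mStr and mStr in descr:
            match = mStr
            break
    if match is None:
        return x['N_Product'] + ';'
    else:
        return x['N_Product'] + ': ' + match + ';'

# True where df1 and df2 have the same N_Product in the same row position.
mask = df1['N_Product'] == df2['N_Product']
df1.loc[mask, 'Result'] = df1.apply(foo, axis = 1)
df1.loc[~mask, 'Result'] = df1['N_Product'] + ';'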

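Explanation:

Create a boolean Series, mask, that is True for the rows of df1 whose N_Product matches the value in the corresponding row of df2.
For rows of df1 where mask is True, use apply to call foo, which identifies the first value (if any) in df2's M_Product column that is found in the Descr column of the given row. If one is found, it is packaged as a string of the form N_Product: M_Product; and otherwise as just N_Product;.
For rows of df1 where mask is False (that is, ~mask), set the Result column to N_Product;.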
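As a small illustration of the first point, the comparison that builds mask is element-wise: row i of df1 is compared only with row i of df2. The sketch below reproduces just the N_Product columns from the sample data.

```python
import pandas as pd

df1 = pd.DataFrame({'N_Product': ["VALVE", "PUMP", "VALVE", "GEEKU", "VALVE"]})
df2 = pd.DataFrame({'N_Product': ["VALVE", "VALVE", "VALVE", "PUMP", "GEEKU"]})

# Element-wise comparison: both Series must share the same index labels.
mask = df1['N_Product'] == df2['N_Product']
print(mask.tolist())  # [True, False, True, False, False]
```

Only rows 0 and 2 are passed through foo here, so rows 1, 3 and 4 always receive the plain N_Product; form, even though their N_Product values do appear elsewhere in df2.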

Input:

df1:
                         Descr N_Product
0              VALVE, PRESSURE     VALVE
1                     pump ttf      PUMP
2            Valve, electrical     VALVE
3            Geeku, electrical     GEEKU
4  VALVE, OVERBOARD, BUTTERFLY     VALVE

df2:
  N_Product   M_Product
0     VALVE    PRESSURE
1     VALVE  ELECTRICAL
2     VALVE
3      PUMP
4     GEEKU         MBA

Output:

                         Descr N_Product              Result
0              VALVE, PRESSURE     VALVE    VALVE: PRESSURE;
1                     pump ttf      PUMP               PUMP;
2            Valve, electrical     VALVE  VALVE: ELECTRICAL;
3            Geeku, electrical     GEEKU              GEEKU;
4  VALVE, OVERBOARD, BUTTERFLY     VALVE              VALVE;

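Update #2:

Here is a solution based on a relaxed matching criterion for N_Product:

Create a new Result column.
For each row in df1, if its N_Product value is found anywhere in the N_Product column of df2, take the first value from df2's M_Product column that is found in the Descr column of that df1 row and append it to the N_Product value (separated by a : character).
Otherwise, put the N_Product from df1 in the Result column.
Also append a ; character to Result.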

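def foo(x):
    descr = x['Descr'].upper()
    match = None
    # Search only if this row's N_Product appears anywhere in df2.
    if x['N_Product'].upper() in list(df2['N_Product']):
        # Note: every M_Product value is checked, not only those paired
        # with this N_Product.
        for mStr in df2['M_Product'].str.upper():
            # Skip empty strings: '' is a substring of every string.
            if mStr and mStr in descr:
                match = mStr
                break
    if match is None:
        return x['N_Product'] + ';'
    else:
        return x['N_Product'] + ': ' + match + ';'

df1['Result'] = df1.apply(foo, axis = 1)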
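The version above checks every M_Product value once the row's N_Product is found in df2. If the candidates should instead be limited to the M_Product values paired with that same N_Product, as the question describes, one possible sketch is to build a per-product lookup first. This is a variation, not the code from the answer: the names `lookup` and `match_for_product` are hypothetical, the sample data is the question's, and empty M_Product values are assumed to mean "no extra term to search for".

```python
import pandas as pd

df1 = pd.DataFrame({
    'Descr': ["VALVE, PRESSURE", "pump ttf", "Valve, electrical",
              "Geeku, electrical", "VALVE, OVERBOARD, BUTTERFLY"],
    'N_Product': ["VALVE", "PUMP", "VALVE", "GEEKU", "VALVE"],
})
df2 = pd.DataFrame({
    'N_Product': ["VALVE", "VALVE", "VALVE", "PUMP", "GEEKU"],
    'M_Product': ["PRESSURE", "qwerty", "", "", "ELECTRICAL"],
})

# Map each N_Product to the non-empty M_Product values paired with it.
# Empty strings are dropped because '' is a substring of every string.
lookup = {}
for n_prod, m_prod in zip(df2['N_Product'], df2['M_Product']):
    if m_prod:
        lookup.setdefault(n_prod.upper(), []).append(m_prod.upper())

def match_for_product(row):  # hypothetical helper name
    descr = row['Descr'].upper()
    # Only the values registered for this row's N_Product are searched.
    for m_str in lookup.get(row['N_Product'].upper(), []):
        if m_str in descr:
            return row['N_Product'] + ': ' + m_str + ';'
    return row['N_Product'] + ';'

df1['Result'] = df1.apply(match_for_product, axis=1)
print(df1['Result'].tolist())
# ['VALVE: PRESSURE;', 'PUMP;', 'VALVE;', 'GEEKU: ELECTRICAL;', 'VALVE;']
```

With the question's df2, "Valve, electrical" yields VALVE; because ELECTRICAL is paired only with GEEKU. Like the code above, this sketch always appends the trailing ; character.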

