You need to map values from another data frame in 2 conditions

Question

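I need to do a conditional substring search within the strings of a second column. I have 2 dataframes: df1 and df2.

(Step 1) For the first row in df1, the N_Product column is VALVE.

(Step 2) Look for VALVE in the N_Product column of every row of df2. This finds 3 matching pairs: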

df2 ['N_Product'] (VALVE) - df2 ['M_Product'] (DONE),
df2 ['N_Product'] (VALVE) - df2 ['M_Product'] (PRESSURE),
df2 ['N_Product'] (VALVE) - df2 ['M_Product'] ('').

(Step 3) Then check whether df1['Descr'] contains one of these M_Product values:

df2 ['N_Product'] (VALVE) - df2 ['M_Product'] (DONE),
df2 ['N_Product'] (VALVE) - df2 ['M_Product'] (PRESSURE),
df2 ['N_Product'] (VALVE) - df2 ['M_Product'] ('')

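If df1['Descr'] contains one of them, write N_Product + ":" + M_Product + ";". If it does not, write only N_Product + ';'. For 'VALVE', only the df2['M_Product'] values "DONE", "PRESSURE" and "" should be searched for in df1['Descr'], and no others. For N_Product 'GEEKU', only "ELECTRICAL", "OVERBOARD" and "", and so on, depending on which M_Product values correspond to that N_Product. M_Product values that belong to a different N_Product do not need to be searched for in df1['Descr'].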


    import pandas as pd
    df1 = {'Descr': ["VALVE, DONE", "pump ttf", "pump electrical", "Valve, ww","Geeku MBA , electrical","valve PRESSURE, OVERBOARD","VALVE, Electrical DONE","Geeku electrical OVERBOARD","Geeku OVERBOARD , electrical"],
            'N_Product': ["VALVE", "PUMP", "PUMP", "VALVE", "GEEKU","VALVE","VALVE", "GEEKU", "GEEKU"],
            }
    df2 = {'N_Product': ["GEEKU","GEEKU","GEEKU", "PUMP", "PUMP","VALVE", "VALVE","VALVE"],
            'M_Product': ["ELECTRICAL", "OVERBOARD","", "TTF","", "DONE","PRESSURE",""],
            }
    df1 = pd.DataFrame(df1)
    df2 = pd.DataFrame(df2)

desired result

I use this code, but it searches all values of df2['M_Product'], instead of only those in rows where df1['N_Product'] == df2['N_Product']. I would be grateful if you could help me solve this.

def foo(x):
    descr = x['Descr'].upper()
    match = None
    if x['N_Product'].upper() in list(df2['N_Product']):
        for mStr in df2['M_Product'].str.upper():
            if mStr in descr:
                match = mStr
                break
    if match is None:
        return x['N_Product'] + ';'
    else:
        return x['N_Product'] + ': ' + match + ';'
df1['Result'] = df1.apply(foo, axis = 1)

I added a picture to visualize what needs to be done, using the df1['N_Product'] value "Valve" as an example. The same needs to be done for all values:

picture

Answer 1

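Based on the result you describe with the picture in your question, here is my understanding of what you are trying to do:

Each N_Product value has an associated list of M_Product values in df2.
Each N_Product value in df1 has a Descr value, which is a comma- or space-separated list of the form: N_Product followed by one or more candidate M_Product values for that row.
Objective: append a Result column to df1 containing, for each row, the N_Product value n and the first M_Product value m from that row's Descr such that (n, m) is found in df2.

Here is some code that I believe does what you want: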

import pandas as pd
df1 = {'Descr': ["VALVE, DONE", "pump ttf", "pump electrical", "Valve, ww","Geeku MBA , electrical","valve PRESSURE, OVERBOARD","VALVE, Electrical DONE","Geeku electrical OVERBOARD","Geeku OVERBOARD , electrical"],
            'N_Product': ["VALVE", "PUMP", "PUMP", "VALVE", "GEEKU","VALVE","VALVE", "GEEKU", "GEEKU"],
            }
df2 = {'N_Product': ["GEEKU","GEEKU","GEEKU", "PUMP", "PUMP","VALVE", "VALVE","VALVE"],
            'M_Product': ["ELECTRICAL", "OVERBOARD","", "TTF","", "DONE","PRESSURE",""],
            }
df1 = pd.DataFrame(df1).apply(lambda x: x.astype(str).str.upper())
df2 = pd.DataFrame(df2).apply(lambda x: x.astype(str).str.upper())

print('df1:'); print(df1)
print('df2:'); print(df2)

df1['M_Product'] = df1['Descr'].apply(lambda x: [val.strip(',') for val in x.split() if val.strip(',')]).str.slice(start=1)
df1['df1_row'] = df1.index

df3 = df1[['df1_row', 'N_Product', 'M_Product']].explode('M_Product')
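# Keep the first match per df1 row, indexed by df1_row so it aligns with df1 below.
# (groupby('df1_row').nth(0) no longer re-indexes by the group key in pandas >= 2.0.)
df5 = df3.merge(df2, on=['N_Product', 'M_Product']).drop_duplicates('df1_row').set_index('df1_row')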
df1['M_Product'] = df5['M_Product']

df1['Result'] = df1['N_Product'] + (~df1['M_Product'].isna()) * (': ' + df1['M_Product'].astype(str).str.strip()) + ';'
df1 = df1.drop(columns=['M_Product', 'df1_row'])
print('result:'); print(df1)

Output:

df1:
                          Descr N_Product
0                   VALVE, DONE     VALVE
1                      PUMP TTF      PUMP
2               PUMP ELECTRICAL      PUMP
3                     VALVE, WW     VALVE
4        GEEKU MBA , ELECTRICAL     GEEKU
5     VALVE PRESSURE, OVERBOARD     VALVE
6        VALVE, ELECTRICAL DONE     VALVE
7    GEEKU ELECTRICAL OVERBOARD     GEEKU
8  GEEKU OVERBOARD , ELECTRICAL     GEEKU
df2:
  N_Product   M_Product
0     GEEKU  ELECTRICAL
1     GEEKU   OVERBOARD
2     GEEKU
3      PUMP         TTF
4      PUMP
5     VALVE        DONE
6     VALVE    PRESSURE
7     VALVE
result:
                          Descr N_Product              Result
0                   VALVE, DONE     VALVE        VALVE: DONE;
1                      PUMP TTF      PUMP          PUMP: TTF;
2               PUMP ELECTRICAL      PUMP               PUMP;
3                     VALVE, WW     VALVE              VALVE;
4        GEEKU MBA , ELECTRICAL     GEEKU  GEEKU: ELECTRICAL;
5     VALVE PRESSURE, OVERBOARD     VALVE    VALVE: PRESSURE;
6        VALVE, ELECTRICAL DONE     VALVE        VALVE: DONE;
7    GEEKU ELECTRICAL OVERBOARD     GEEKU  GEEKU: ELECTRICAL;
8  GEEKU OVERBOARD , ELECTRICAL     GEEKU  GEEKU: ELECTRICAL;
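
To make the intermediate frame built by explode() in the code above easier to picture, here is a minimal sketch on two rows taken from df1. The names `sample` and `exploded` are illustrative only and do not appear in the answer's code.

```python
import pandas as pd

# Two sample rows from df1 in the question (already upper-cased)
sample = pd.DataFrame({
    'Descr': ["VALVE, ELECTRICAL DONE", "PUMP TTF"],
    'N_Product': ["VALVE", "PUMP"],
})

# Tokenize Descr on whitespace, strip commas, and drop the first token
# (the first token is just the product name repeated)
sample['M_Product'] = sample['Descr'].apply(
    lambda s: [tok.strip(',') for tok in s.split() if tok.strip(',')][1:]
)
sample['df1_row'] = sample.index

# One row per candidate token; df1_row remembers which row it came from
exploded = sample[['df1_row', 'N_Product', 'M_Product']].explode('M_Product')
print(exploded)
```

Each candidate token ends up on its own row next to its N_Product, which is what lets a plain merge() against df2 on (N_Product, M_Product) do the filtering.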

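Explanation:

Make everything in df1 and df2 upper case to simplify matching.
Split Descr into tokens and, except for the first token (which is just a copy of N_Product), put them in a list in a new column of df1 named M_Product.
Add the index of df1 as a column named df1_row.
Use explode() to create a dataframe df3 with one row for each item in the M_Product list of each df1 row.
Use merge() to select the rows of df3 that match a row of df2 on (N_Product, M_Product).
For each df1_row, keep only the first match, that is, the first row of each df1_row group in the merged frame.
Add the M_Product column from this new dataframe back to df1.
Fill a new Result column in df1 with (1) N_Product + ';' if the M_Product column is empty (isna()), or (2) N_Product + ': ' + M_Product + ';' if there is an M_Product match.
Drop the intermediate columns we no longer need (M_Product, df1_row).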
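
If a row-wise function closer to the one in the question is preferred, a minimal sketch could restrict the search to the M_Product values that df2 lists for the row's own N_Product. This is an alternative under stated assumptions, not the approach above: it uses substring matching (as the question's code does) and returns the first candidate in df2 order rather than in Descr order. The names `candidates` and `label` are hypothetical, and only a subset of the question's df1 rows is used.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Descr': ["VALVE, DONE", "pump electrical", "Valve, ww", "Geeku MBA , electrical"],
    'N_Product': ["VALVE", "PUMP", "VALVE", "GEEKU"],
})
df2 = pd.DataFrame({
    'N_Product': ["GEEKU", "GEEKU", "GEEKU", "PUMP", "PUMP", "VALVE", "VALVE", "VALVE"],
    'M_Product': ["ELECTRICAL", "OVERBOARD", "", "TTF", "", "DONE", "PRESSURE", ""],
})

# Map each N_Product to its own non-empty M_Product candidates, in df2 order
candidates = (
    df2[df2['M_Product'] != '']
    .groupby('N_Product')['M_Product']
    .apply(list)
    .to_dict()
)

def label(row):
    """Hypothetical helper: 'N: M;' for the first candidate found in Descr, else 'N;'."""
    descr = row['Descr'].upper()
    # Only candidates registered for this row's N_Product are considered
    for m in candidates.get(row['N_Product'].upper(), []):
        if m.upper() in descr:
            return row['N_Product'] + ': ' + m + ';'
    return row['N_Product'] + ';'

df1['Result'] = df1.apply(label, axis=1)
print(df1)
```

With this lookup, "pump electrical" yields PUMP; because ELECTRICAL is only a candidate for GEEKU, which is the behaviour the question asks for.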
