Mapping values from another data frame under 2 conditions

I need to search for substrings within a string, conditioned on the value in a second column. I have 2 data frames: df1 and df2.

(Step 1) For the first row in df1, the N_Product column is VALVE.

(Step 2) Look for VALVE in the N_Product column of every row of df2; this finds 3 matching pairs:

df2['N_Product'] (VALVE) - df2['M_Product'] (DONE),
df2['N_Product'] (VALVE) - df2['M_Product'] (PRESSURE),
df2['N_Product'] (VALVE) - df2['M_Product'] ('').

(Step 3) Then check whether df1['Descr'] contains any of the M_Product values from these pairs:

df2['N_Product'] (VALVE) - df2['M_Product'] (DONE),
df2['N_Product'] (VALVE) - df2['M_Product'] (PRESSURE),
df2['N_Product'] (VALVE) - df2['M_Product'] ('')

If it does, write N_Product + ":" + M_Product + ";". If it does not, write only N_Product + ";". For 'VALVE', only "DONE", "PRESSURE" and "" from df2['M_Product'] should be searched for in df1['Descr']; the other values are not needed. For N_Product 'GEEKU', only "ELECTRICAL", "OVERBOARD" and "", and so on, depending on which M_Product values correspond to the N_Product value. M_Product values that belong to other N_Product values should not be searched for in df1['Descr'].


    df1 = {'Descr': ["VALVE, DONE", "pump ttf", "pump electrical", "Valve, ww","Geeku MBA , electrical","valve PRESSURE, OVERBOARD","VALVE, Electrical DONE","Geeku electrical OVERBOARD","Geeku OVERBOARD , electrical"],
            'N_Product': ["VALVE", "PUMP", "PUMP", "VALVE", "GEEKU","VALVE","VALVE", "GEEKU", "GEEKU"],
            }
    df2 = {'N_Product': ["GEEKU","GEEKU","GEEKU", "PUMP", "PUMP","VALVE", "VALVE","VALVE"],
            'M_Product': ["ELECTRICAL", "OVERBOARD","", "TTF","", "DONE","PRESSURE",""],
            }
    df1 = pd.DataFrame(df1)
    df2 = pd.DataFrame(df2)

desired result

I use the code below, but it searches all values of df2['M_Product'] instead of only those in rows where df1['N_Product'] == df2['N_Product']. I would appreciate any help with this.

def foo(x):
    descr = x['Descr'].upper()
    match = None
    if x['N_Product'].upper() in list(df2['N_Product']):
        for mStr in df2['M_Product'].str.upper():
            if mStr in descr:
                match = mStr
                break
    if match is None:
        return x['N_Product'] + ';'
    else:
        return x['N_Product'] + ': ' + match + ';'
df1['Result'] = df1.apply(foo, axis = 1)

I added a picture to visualize what needs to be done, using the df1['N_Product'] value "VALVE" as an example. The same must be done for all values:

picture

Based on the result you describe with the picture in your question, here is my understanding of what you are trying to do:

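  • Each N_Product value has an associated list of M_Product values in df2.
  • Each N_Product value in df1 has a Descr value, which is a CSV-style list of the form: N_Product followed by one or more M_Product-compatible values for that row.
  • Objective: append a Result column to df1 that contains, for each row, the N_Product value n together with the first M_Product value m from the corresponding description such that (n, m) is found in df2.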
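To make that reading concrete, here is a minimal sketch of the per-product lookup as a plain dictionary plus a row-wise helper. The names `allowed` and `build_result` are hypothetical and do not come from the question. The sketch assumes that commas and whitespace both separate tokens, and that when several allowed values appear in a description, the one listed first in df2 wins (the same priority the question's loop uses). The empty M_Product placeholder is skipped, because in Python `'' in some_string` is always True and would match every description.

```python
import pandas as pd

df2 = pd.DataFrame({
    'N_Product': ["GEEKU", "GEEKU", "GEEKU", "PUMP", "PUMP", "VALVE", "VALVE", "VALVE"],
    'M_Product': ["ELECTRICAL", "OVERBOARD", "", "TTF", "", "DONE", "PRESSURE", ""],
})

# N_Product -> list of its non-empty M_Product values, kept in df2 order
allowed = (df2[df2['M_Product'] != '']
           .groupby('N_Product', sort=False)['M_Product']
           .agg(list)
           .to_dict())

def build_result(descr, n_product):
    """Hypothetical helper: format one Result value for a single df1 row."""
    # Treat commas as separators, then compare whole upper-cased tokens
    tokens = set(descr.upper().replace(',', ' ').split())
    # Only the M_Product values paired with this row's N_Product are tried
    for m in allowed.get(n_product.upper(), []):
        if m in tokens:
            return f"{n_product}: {m};"
    return f"{n_product};"

demo = pd.DataFrame({'Descr': ["VALVE, Electrical DONE", "pump electrical"],
                     'N_Product': ["VALVE", "PUMP"]})
demo['Result'] = [build_result(d, n) for d, n in zip(demo['Descr'], demo['N_Product'])]
print(demo)
```

Matching on whole tokens rather than substrings is a design choice of this sketch: it avoids accidental hits such as a short M_Product value occurring inside a longer word.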

Here is some code that I believe does what you are asking:

import pandas as pd
df1 = {'Descr': ["VALVE, DONE", "pump ttf", "pump electrical", "Valve, ww","Geeku MBA , electrical","valve PRESSURE, OVERBOARD","VALVE, Electrical DONE","Geeku electrical OVERBOARD","Geeku OVERBOARD , electrical"],
            'N_Product': ["VALVE", "PUMP", "PUMP", "VALVE", "GEEKU","VALVE","VALVE", "GEEKU", "GEEKU"],
            }
df2 = {'N_Product': ["GEEKU","GEEKU","GEEKU", "PUMP", "PUMP","VALVE", "VALVE","VALVE"],
            'M_Product': ["ELECTRICAL", "OVERBOARD","", "TTF","", "DONE","PRESSURE",""],
            }
df1 = pd.DataFrame(df1).apply(lambda x: x.astype(str).str.upper())
df2 = pd.DataFrame(df2).apply(lambda x: x.astype(str).str.upper())

print('df1:'); print(df1)
print('df2:'); print(df2)

df1['M_Product'] = df1['Descr'].apply(lambda x: [val.strip(',') for val in x.split() if val.strip(',')]).str.slice(start=1)
df1['df1_row'] = df1.index

df3 = df1[['df1_row', 'N_Product', 'M_Product']].explode('M_Product')
df5 = df3.merge(df2, on=['N_Product', 'M_Product']).groupby('df1_row').nth(0)
df1['M_Product'] = df5['M_Product']

df1['Result'] = df1['N_Product'] + (~df1['M_Product'].isna()) * (': ' + df1['M_Product'].astype(str).str.strip()) + ';'
df1 = df1.drop(columns=['M_Product', 'df1_row'])
print('result:'); print(df1)

Output:

df1:
                          Descr N_Product
0                   VALVE, DONE     VALVE
1                      PUMP TTF      PUMP
2               PUMP ELECTRICAL      PUMP
3                     VALVE, WW     VALVE
4        GEEKU MBA , ELECTRICAL     GEEKU
5     VALVE PRESSURE, OVERBOARD     VALVE
6        VALVE, ELECTRICAL DONE     VALVE
7    GEEKU ELECTRICAL OVERBOARD     GEEKU
8  GEEKU OVERBOARD , ELECTRICAL     GEEKU
df2:
  N_Product   M_Product
0     GEEKU  ELECTRICAL
1     GEEKU   OVERBOARD
2     GEEKU
3      PUMP         TTF
4      PUMP
5     VALVE        DONE
6     VALVE    PRESSURE
7     VALVE
result:
                          Descr N_Product              Result
0                   VALVE, DONE     VALVE        VALVE: DONE;
1                      PUMP TTF      PUMP          PUMP: TTF;
2               PUMP ELECTRICAL      PUMP               PUMP;
3                     VALVE, WW     VALVE              VALVE;
4        GEEKU MBA , ELECTRICAL     GEEKU  GEEKU: ELECTRICAL;
5     VALVE PRESSURE, OVERBOARD     VALVE    VALVE: PRESSURE;
6        VALVE, ELECTRICAL DONE     VALVE        VALVE: DONE;
7    GEEKU ELECTRICAL OVERBOARD     GEEKU  GEEKU: ELECTRICAL;
8  GEEKU OVERBOARD , ELECTRICAL     GEEKU  GEEKU: ELECTRICAL;

Explanation:

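  • Upper-case everything in df1 and df2 to simplify matching.
  • Split Descr into tokens and put all of them except the first (which is just a copy of N_Product) into a list in a new df1 column named M_Product.
  • Record the index of each original df1 row in a column named df1_row.
  • Use explode() to create a dataframe df3 with one row for each value in the M_Product column above.
  • Use merge() to select the rows of df3 whose (N_Product, M_Product) pair matches a row in df2.
  • Use groupby() on df1_row with nth(0) to take the 0th matching (N_Product, M_Product) pair for each original df1 row.
  • Add the M_Product column from this new dataframe back to df1.
  • Populate a new Result column in df1 with (1) N_Product + ';' if the M_Product column is null (isna()), or (2) N_Product + ': ' + M_Product + ';' if there is an M_Product match.
  • Drop the intermediate columns we no longer need (M_Product, df1_row).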
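
A caveat on versions, offered as a sketch rather than a correction: as far as I know, from pandas 2.0 onward groupby(...).nth(0) behaves as a filter and keeps the row labels of the merged frame instead of df1_row, and the row order returned by an inner merge() has not been identical across releases. The variant below follows the same explode/merge steps but makes the tie-break explicit. The `priority` column and the names `pairs`, `lookup`, `matches` and `m` are assumptions introduced here; the tie-break (earliest row in df2 wins) is chosen because it reproduces the output printed above.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Descr': ["VALVE, DONE", "pump ttf", "pump electrical", "Valve, ww",
              "Geeku MBA , electrical", "valve PRESSURE, OVERBOARD",
              "VALVE, Electrical DONE", "Geeku electrical OVERBOARD",
              "Geeku OVERBOARD , electrical"],
    'N_Product': ["VALVE", "PUMP", "PUMP", "VALVE", "GEEKU",
                  "VALVE", "VALVE", "GEEKU", "GEEKU"],
})
df2 = pd.DataFrame({
    'N_Product': ["GEEKU", "GEEKU", "GEEKU", "PUMP", "PUMP", "VALVE", "VALVE", "VALVE"],
    'M_Product': ["ELECTRICAL", "OVERBOARD", "", "TTF", "", "DONE", "PRESSURE", ""],
})

# Upper-case, treat commas as separators, drop the leading N_Product token
tokens = (df1['Descr'].str.upper()
          .str.replace(',', ' ', regex=False)
          .str.split()
          .str[1:])

# One row per (df1 row, candidate token)
pairs = pd.DataFrame({'df1_row': df1.index,
                      'N_Product': df1['N_Product'].str.upper(),
                      'M_Product': tokens}).explode('M_Product')

# Assumed tie-break: the position of the pair in df2
lookup = (df2.apply(lambda col: col.str.upper())
             .assign(priority=list(range(len(df2)))))

# Keep only pairs present in df2, then the best-priority match per df1 row
matches = (pairs.merge(lookup, on=['N_Product', 'M_Product'])
                .sort_values(['df1_row', 'priority'])
                .drop_duplicates('df1_row')
                .set_index('df1_row')['M_Product'])

# Align back to df1; rows without a match become NaN and get no suffix
m = matches.reindex(df1.index)
df1['Result'] = df1['N_Product'] + (': ' + m).fillna('') + ';'
print(df1)
```

Sorting on an explicit column and using drop_duplicates() means the result does not depend on how merge() orders its output or on how nth() labels its rows.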