How to match and merge two dataframes having completely different values except the numbers in a dataframe column?

Question

I have a dataframe ABC with these values:

      id         |     price                          |   type
0     easdca     | Rs.1,599.00 was trasn by you       | unknown
1     vbbngy     | txn of INR 191.00 using            | unknown
2     awerfa     | Rs.190.78 credits was used by you  | unknown
3     zxcmo5     | DLR.2000 credits was used by you   | unknown

and another dataframe XYZ with these values:

         price          |   type
0      190.78           | food
1      191.00           | movie
2      2,000            | football
3      1,599.00         | basketball

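How can I map XYZ onto ABC so that the type in ABC is updated with the type from XYZ, matching on the number that appears in the ABC price text and in the XYZ price column?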

The output I need:

       id         |     price                          |   type
0     easdca     | Rs.1,599.00 was trasn by you        | basketball
1     vbbngy     | txn of INR 191.00 using             | movie
2     awerfa     | Rs.190.78 credits was used by you   | food
3     zxcmo5     | DLR.2,000 credits was used by you   | football

I tried this:

d = dict(zip(XYZ['PRICE'],XYZ['TYPE']))

pat = (r'({})'.format('|'.join(d.keys())))

ABC['TYPE']=ABC['PRICE'].str.extract(pat,expand=False).map(d)

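But values such as 190.78 and 191.00 get mismatched. For example, 190.78 should match food, but on a large dataset values such as 190.77, which have a different type assigned, get matched to food, and 198.78 is also matched to the wrong type.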

Answer 1

You can do the following:

import pandas as pd

'''
First we make an artificial key column to be able to merge.
We extract the floating-point number from the string
and convert it to type float.
Note: the pattern requires a decimal part, and df2['price'] is assumed to be numeric already.
'''

df1['price_key'] = df1['price'].str.replace(',', '').str.extract(r'(\d+\.\d+)', expand=False).astype(float)

# After that we merge on price_key and price, and drop the columns we don't need
df_final = pd.merge(df1, df2, left_on='price_key', right_on='price', suffixes=['', '_2'])
df_final = df_final.drop(['type', 'price_key', 'price_2'], axis='columns')

Output:

    id      price                               type_2
0   easdca  Rs.1,599.00 was trasn by you        basketball
1   vbbngy  txn of INR 191.00 using             movie
2   awerfa  Rs.190.78 credits was used by you   food
3   zxcmo5  DLR.2000.78 credits was used by you football

I assume you made a typo in the XYZ table and the third price should be 2000.78 instead of 2000.
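If the 2000 in the question is not a typo, the same idea can be adapted. The following is only a sketch under stated assumptions: the sample frames are rebuilt from the question, the XYZ prices are assumed to be strings with thousands separators, and the decimal part of the pattern is made optional so that whole-number amounts are captured too. The names `AMOUNT`, `merged` and `result` are illustrative and not from the original answer.

```python
import pandas as pd

# Sample data copied from the question; XYZ prices are assumed to be strings.
df1 = pd.DataFrame({
    "id": ["easdca", "vbbngy", "awerfa", "zxcmo5"],
    "price": [
        "Rs.1,599.00 was trasn by you",
        "txn of INR 191.00 using",
        "Rs.190.78 credits was used by you",
        "DLR.2000 credits was used by you",
    ],
    "type": ["unknown"] * 4,
})
df2 = pd.DataFrame({
    "price": ["190.78", "191.00", "2,000", "1,599.00"],
    "type": ["food", "movie", "football", "basketball"],
})

# Optional decimal part, so "2000" is captured as well as "1599.00".
AMOUNT = r"(\d+(?:\.\d+)?)"

# Build a float key on both sides after removing thousands separators.
df1["price_key"] = (
    df1["price"].str.replace(",", "", regex=False)
    .str.extract(AMOUNT, expand=False)
    .astype(float)
)
df2["price_key"] = df2["price"].str.replace(",", "", regex=False).astype(float)

# how="left" keeps rows of df1 without a counterpart in df2 (their type becomes NaN).
merged = df1.drop(columns="type").merge(
    df2[["price_key", "type"]], on="price_key", how="left"
)
result = merged.drop(columns="price_key")
print(result)
```

Dropping the placeholder `type` column before merging avoids the suffixed `type_2` column, so the result keeps the column names asked for in the question. The pattern takes the first number in each string, which is an assumption about how the price text is laid out.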

Answer 2

df

        id                price                                type
0       easdca        Rs.1,599.00 was trasn by you          unknown
1       vbbngy        txn of INR 191.00 using               unknown
2       awerfa        Rs.190.78 credits was used by you     unknown
3       zxcmo5        DLR.2000 credits was used by you      unknown

df2

           price                   type
0        190.78                    food
1        191.00                   movie
2        2,000                 football
3        1,599.00            basketball

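Using re:

import re

# Remove thousands separators, then take the first number that follows a dot or whitespace
df['price_'] = df['price'].apply(lambda x: re.findall(r'(?<=[\.\s])[\d\.]+', x.replace(',', ''))[0])
df2.columns = ['price_', 'type']
df2['price_'] = df2['price_'].str.replace(',', '')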

Change the type to float:

df2['price_']  = df2['price_'].astype(float)
df['price_']  = df['price_'] .astype(float)

Using pd.merge:

df = df.merge(df2, on='price_')
# drop() returns a new dataframe, so assign the result back
df = df.drop('type_x', axis=1)

Output:

                id                                 price   price_       type_y
0      easdca        Rs.1,599.00 was trasn by you         1599.00   basketball
1      vbbngy        txn of INR 191.00 using               191.00        movie
2      awerfa        Rs.190.78 credits was used by you     190.78         food
3      zxcmo5        DLR.2000 credits was used by you        2000     football
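The dict-and-regex attempt from the question can probably be made to work as well. The sketch below is one possible repair, not a confirmed diagnosis of the mismatches reported there: it assumes lowercase column names as shown in the tables, removes thousands separators on both sides, escapes the keys so that `.` is literal, and adds lookarounds so a key cannot match inside a longer number. The names `alternatives` and `pat` are illustrative.

```python
import re
import pandas as pd

# Sample data copied from the question (lowercase column names assumed).
ABC = pd.DataFrame({
    "id": ["easdca", "vbbngy", "awerfa", "zxcmo5"],
    "price": [
        "Rs.1,599.00 was trasn by you",
        "txn of INR 191.00 using",
        "Rs.190.78 credits was used by you",
        "DLR.2000 credits was used by you",
    ],
    "type": ["unknown"] * 4,
})
XYZ = pd.DataFrame({
    "price": ["190.78", "191.00", "2,000", "1,599.00"],
    "type": ["food", "movie", "football", "basketball"],
})

# Normalise the lookup keys by dropping thousands separators.
d = dict(zip(XYZ["price"].str.replace(",", "", regex=False), XYZ["type"]))

# re.escape makes "." literal instead of "any character".
# Longest keys first, so a longer key is preferred over a shorter one.
alternatives = "|".join(re.escape(k) for k in sorted(d, key=len, reverse=True))

# (?<!\d)   : the match must not start in the middle of a number
# (?!\.?\d) : the match must not be followed by more digits or a decimal part
pat = rf"(?<!\d)({alternatives})(?!\.?\d)"

ABC["type"] = (
    ABC["price"].str.replace(",", "", regex=False)
    .str.extract(pat, expand=False)
    .map(d)
)
print(ABC)
```

This variant compares text rather than numbers, so a key written as `191.00` will not match a message that says `191` or `191.0`. When formats differ like that, converting both sides to float and merging, as in the answers above, is likely the safer choice.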

python

epoch

dataframe

python-3.x

pandas