How to fill missing data in a DataFrame from another DataFrame using a common key

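I have two DataFrames; a sample is shown below. How can I fill the rows where df['GrossRate'] == 0 with the corresponding value from dfB when the ProductID is the same?

Basically, GrossRate in df should end up as 150, 40, 238, 32.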

import numpy as np
import pandas as pd

dataA = {'date': ['20210101','20210102','20210103','20210104'],
        'quanitity': [22000,25000,27000,35000],
        'NetRate': ['nan','nan','nan','nan'],
        'GrossRate': [150,0,238,0],
        'ProductID': [9613,7974,1714,5302],
        }

df = pd.DataFrame(dataA, columns = ['date', 'quanitity', 'NetRate', 'GrossRate','ProductID' ])

print (df)

       date  quanitity NetRate  GrossRate  ProductID
0  20210101      22000     nan        150       9613
1  20210102      25000     nan          0       7974
2  20210103      27000     nan        238       1714
3  20210104      35000     nan          0       5302

dataB = {
        'ProductID': ['9613.T','7974.T','1714.T','5302.T'],
         'GrossRate': [10,40,28,32],
        }

dfB = pd.DataFrame(dataB, columns = ['ProductID', 'GrossRate' ])
dfB.ProductID = dfB.ProductID.str.replace('.T', '', regex=False)

print (dfB)

  ProductID  GrossRate
0      9613         10
1      7974         40
2      1714         28
3      5302         32
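
A note on the `str.replace('.T', '')` call above: whether `'.T'` is treated as a regular expression depends on the `regex` argument, and its default has differed between pandas versions. As a regex, `.` matches any character, so the pattern can remove more than the suffix. The sketch below uses a made-up ID (`'12T4.T'`, not from the question) to show the difference; passing `regex=False` explicitly avoids the ambiguity.

```python
import pandas as pd

# '12T4.T' is a hypothetical ID used only to expose the difference.
ids = pd.Series(["9613.T", "12T4.T"])

# Literal replacement: only the exact substring ".T" is removed.
literal = ids.str.replace(".T", "", regex=False)

# Regex replacement: "." matches any character, so "2T" is removed as well.
as_regex = ids.str.replace(".T", "", regex=True)

print(literal.tolist())
print(as_regex.tolist())
```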

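Try this list comprehension (it pairs rows by position, so it assumes both DataFrames have the same number of rows in the same order):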

df['GrossRate'] = [x if x != 0 else y for x, y in zip(df['GrossRate'], dfB['GrossRate'])]

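If both DataFrames have the same number of rows and the ProductID column is in the same order, matching by ProductID is not necessary and you can use numpy.where: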

df['GrossRate'] = np.where(df['GrossRate'] == 0, dfB['GrossRate'], df['GrossRate'])

print (df)
       date  quanitity NetRate  GrossRate  ProductID
0  20210101      22000     nan        150       9613
1  20210102      25000     nan         40       7974
2  20210103      27000     nan        238       1714
3  20210104      35000     nan         32       5302
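
To see why the positional approach depends on row order, here is a small sketch in which dfB holds the same data in a different order (the shuffled order is invented for illustration, and ProductID is assumed to be already converted to int). `np.where` pairs values by position, not by key, so the zeros are filled from the wrong rows, while a lookup by ProductID is unaffected.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ProductID": [9613, 7974, 1714, 5302],
    "GrossRate": [150, 0, 238, 0],
})

# Same data as dfB in the question, but in reverse order (illustrative).
dfB_shuffled = pd.DataFrame({
    "ProductID": [5302, 1714, 7974, 9613],
    "GrossRate": [32, 28, 40, 10],
})

# Positional fill: row i of df is paired with row i of dfB_shuffled.
positional = pd.Series(
    np.where(df["GrossRate"] == 0, dfB_shuffled["GrossRate"], df["GrossRate"])
)

# Key-based fill: look up each ProductID in dfB_shuffled first.
lookup = dfB_shuffled.set_index("ProductID")["GrossRate"]
by_key = pd.Series(
    np.where(df["GrossRate"] == 0, df["ProductID"].map(lookup), df["GrossRate"])
).astype(int)

print(positional.tolist())  # zeros filled from the wrong products
print(by_key.tolist())      # zeros filled from the matching products
```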

If you need to match by ProductID, use:

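# Strip the literal '.T' suffix and cast to int so the keys have the same dtype as df['ProductID']
dfB.ProductID = dfB.ProductID.str.replace('.T', '', regex=False).astype(int)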

df['GrossRate'] = (np.where(df['GrossRate'] == 0, 
                            df['ProductID'].map(dfB.set_index('ProductID')['GrossRate']),
                            df['GrossRate']))