How to fill missing data from a dataframe with another dataframe when having a common key
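I have two DataFrames; see the sample below.
When a row has the same ProductID in both, how can I fill the rows where df['GrossRate'] == 0 with the matching value from dfB?
Basically, the GrossRate column in df should end up as: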
150
40
238
32
import pandas as pd

dataA = {'date': ['20210101','20210102','20210103','20210104'],
         'quanitity': [22000,25000,27000,35000],
         'NetRate': ['nan','nan','nan','nan'],
         'GrossRate': [150,0,238,0],
         'ProductID': [9613,7974,1714,5302],
        }
df = pd.DataFrame(dataA, columns = ['date', 'quanitity', 'NetRate', 'GrossRate','ProductID' ])
print (df)
date quanitity NetRate GrossRate ProductID
0 20210101 22000 nan 150 9613
1 20210102 25000 nan 0 7974
2 20210103 27000 nan 238 1714
3 20210104 35000 nan 0 5302
dataB = {
'ProductID': ['9613.T','7974.T','1714.T','5302.T'],
'GrossRate': [10,40,28,32],
}
dfB = pd.DataFrame(dataB, columns = ['ProductID', 'GrossRate' ])
dfB.ProductID = dfB.ProductID.str.replace('.T', '', regex=False)  # literal match; as a regex '.' means "any character"
print (dfB)
ProductID GrossRate
0 9613 10
1 7974 40
2 1714 28
3 5302 32
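The `.T` suffix is stripped with `str.replace`. In pandas versions before 2.0 the pattern was treated as a regular expression by default, so `'.T'` meant "any character followed by T". A minimal sketch, assuming a hypothetical ID that contains a `T` in the middle, showing why passing `regex=False` is the safer choice:

```python
import pandas as pd

# Hypothetical IDs: the second one contains a "T" that is not part of the suffix
ids = pd.Series(['9613.T', '1T23.T'])

# Regex interpretation: "." matches any character, so "1T" is removed as well
as_regex = ids.str.replace('.T', '', regex=True)

# Literal interpretation: only the exact substring ".T" is removed
as_literal = ids.str.replace('.T', '', regex=False)

print(as_regex.tolist())    # ['9613', '23']
print(as_literal.tolist())  # ['9613', '1T23']
```

On pandas 1.4 or later, `Series.str.removesuffix('.T')` is another option that only strips the suffix when it appears at the end of the string.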
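Try this list comprehension (`zip` pairs rows by position, so it assumes both DataFrames list the products in the same order):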
df['GrossRate'] = [x if x != 0 else y for x, y in zip(df['GrossRate'], dfB['GrossRate'])]
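A self-contained version of the list comprehension, using a trimmed copy of the sample data from the question (only the two relevant columns). No ProductID matching happens here; each zero is replaced by the value in the same row position of `dfB`:

```python
import pandas as pd

df = pd.DataFrame({
    'GrossRate': [150, 0, 238, 0],
    'ProductID': [9613, 7974, 1714, 5302],
})
dfB = pd.DataFrame({
    'ProductID': [9613, 7974, 1714, 5302],
    'GrossRate': [10, 40, 28, 32],
})

# Keep the existing rate unless it is 0; otherwise take the value from the
# row at the same position in dfB
df['GrossRate'] = [x if x != 0 else y
                   for x, y in zip(df['GrossRate'], dfB['GrossRate'])]

print(df['GrossRate'].tolist())  # [150, 40, 238, 32]
```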
If both DataFrames have the same number of rows and the ProductID values are in the same order, there is no need to match by ProductID, so use numpy.where:
import numpy as np

df['GrossRate'] = np.where(df['GrossRate'] == 0, dfB['GrossRate'], df['GrossRate'])
print (df)
date quanitity NetRate GrossRate ProductID
0 20210101 22000 nan 150 9613
1 20210102 25000 nan 40 7974
2 20210103 27000 nan 238 1714
3 20210104 35000 nan 32 5302
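Because `np.where` pairs the two columns by position, it silently produces wrong values if `dfB` is sorted differently. A small sketch, assuming a hypothetical reordered copy of `dfB` (named `dfB_shuffled` here), that shows the failure and a simple alignment check to run before relying on the positional approach:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'GrossRate': [150, 0, 238, 0],
    'ProductID': [9613, 7974, 1714, 5302],
})
# Same products, but listed in a different order than in df
dfB_shuffled = pd.DataFrame({
    'ProductID': [5302, 1714, 7974, 9613],
    'GrossRate': [32, 28, 40, 10],
})

# Guard: the positional approach is only valid if the keys line up row by row
aligned = df['ProductID'].tolist() == dfB_shuffled['ProductID'].tolist()
print(aligned)  # False

# np.where pairs values by position, so the zeros receive the wrong rates here
positional = np.where(df['GrossRate'] == 0,
                      dfB_shuffled['GrossRate'],
                      df['GrossRate'])
print(positional.tolist())  # [150, 28, 238, 10]
```

When the check fails, the ProductID-based lookup is the appropriate tool.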
If you need to match by ProductID, use:
dfB.ProductID = dfB.ProductID.str.replace('.T', '', regex=False).astype(int)
df['GrossRate'] = (np.where(df['GrossRate'] == 0,
df['ProductID'].map(dfB.set_index('ProductID')['GrossRate']),
df['GrossRate']))
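A self-contained sketch of the ProductID-based lookup, assuming a hypothetical `dfB` whose rows are in a different order than `df` and still carry the `.T` suffix. `Series.map` looks each ProductID up in a ProductID → GrossRate Series, so row order no longer matters:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'GrossRate': [150, 0, 238, 0],
    'ProductID': [9613, 7974, 1714, 5302],
})
# Lookup table in a different order, with the ".T" suffix still attached
dfB = pd.DataFrame({
    'ProductID': ['5302.T', '1714.T', '7974.T', '9613.T'],
    'GrossRate': [32, 28, 40, 10],
})

# Strip the suffix literally and convert to int so the key type matches df
dfB['ProductID'] = dfB['ProductID'].str.replace('.T', '', regex=False).astype(int)

# ProductID -> GrossRate lookup Series
lookup = dfB.set_index('ProductID')['GrossRate']

# Replace zeros with the rate looked up by ProductID; keep all other rates
df['GrossRate'] = np.where(df['GrossRate'] == 0,
                           df['ProductID'].map(lookup),
                           df['GrossRate'])

print(df['GrossRate'].tolist())  # [150, 40, 238, 32]
```

This assumes each ProductID appears only once in `dfB`. If a ProductID in `df` has no match, `map` returns `NaN` for that row, which also turns the result into a float column.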