How to merge (inner) two columns of a dataframe with pandas/python?

I have a dataframe with two columns: A_ID and R_ID.

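I want to update R_ID so that it only contains values that also appear in A_ID. All other values should be removed (i.e. set to NaN). The remaining values should keep their position/index. I know this is essentially an inner join, but the solutions I came up with ran into several problems.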

Example:

import numpy as np
import pandas as pd
data = {'A_ID': ['1E2', '1E3', '1E4', '1E5'], 'R_ID': ['1E7',[np.nan],[np.nan],"1E4",]}
df = pd.DataFrame(data)
print(df)

I tried:

df_A_ID = df[["A_ID"]]
df_R_ID = df[["R_ID"]]
new_df = pd.merge(df_A_ID, df_R_ID, how='inner', left_on='A_ID', right_on ='R_ID', right_index=True)

new_df = pd.concat([df_A_ID, df_R_ID], join="inner")

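But with the first option I get a "You are trying to merge on object and int64 columns" error, even though both columns have dtype object, and with the second option I get an empty DataFrame.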

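My expected output is the same dataframe as before, but with R_ID containing only the values that also appear in column A_ID, while keeping their index/position: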

data = {'A_ID': ['1E2', '1E3', '1E4', '1E5'], 'R_ID': [[np.nan],[np.nan],[np.nan],"1E4",]}
df = pd.DataFrame(data)
print(df)
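Why the two attempts behave as they do can be sketched as follows. This is a minimal sketch, assuming scalar NaNs in R_ID (list cells such as [np.nan] are avoided here because they are not hashable values). With right_index=True added, pandas appears to match A_ID against the integer row index of df_R_ID rather than the R_ID column, at least in some versions, which would explain the object vs int64 message. Dropping right_index gives a working column-to-column inner join, but it only returns the matching pair with a fresh index, so the original positions are lost. pd.concat, on the other hand, stacks the frames vertically and join="inner" keeps only the column names they share, which is none here, so the result has rows but no columns. The names inner and stacked are illustrative.

```python
import numpy as np
import pandas as pd

# Scalar NaNs instead of the list-wrapped [np.nan] cells from the question
df = pd.DataFrame({'A_ID': ['1E2', '1E3', '1E4', '1E5'],
                   'R_ID': ['1E7', np.nan, np.nan, '1E4']})
df_A_ID = df[['A_ID']]
df_R_ID = df[['R_ID']]

# Column-to-column inner join (no right_index): only the matching pair
# survives, and it gets a new 0..n-1 index, so row positions are lost
inner = pd.merge(df_A_ID, df_R_ID, how='inner', left_on='A_ID', right_on='R_ID')
print(inner)

# concat stacks rows; join='inner' keeps only shared column names,
# and A_ID and R_ID share none, so no columns are left
stacked = pd.concat([df_A_ID, df_R_ID], join='inner')
print(stacked.shape)
```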

Set NaN with Series.where wherever there is no match, comparing the columns with Series.isin:

#solution working with scalar NaNs
import numpy as np
import pandas as pd
data = {'A_ID': ['1E2', '1E3', '1E4', '1E5'], 'R_ID': ['1E7',np.nan,np.nan,"1E4",]}
df = pd.DataFrame(data)
print(df)
  A_ID R_ID
0  1E2  1E7
1  1E3  NaN
2  1E4  NaN
3  1E5  1E4

df['R_ID'] = df['R_ID'].where(df["R_ID"].isin(df["A_ID"]))
print(df)
  A_ID R_ID
0  1E2  NaN
1  1E3  NaN
2  1E4  NaN
3  1E5  1E4

Or:

df.loc[~df["R_ID"].isin(df["A_ID"]), 'R_ID'] = np.nan
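Since the solution above expects scalar NaNs, while the question's data wraps them in lists ([np.nan]), one option is to unwrap those cells first and then apply the same Series.where filter. A minimal sketch, assuming every list cell holds exactly one value (as [np.nan] does in the question):

```python
import numpy as np
import pandas as pd

# Data as in the question: missing values are wrapped in one-element lists
data = {'A_ID': ['1E2', '1E3', '1E4', '1E5'],
        'R_ID': ['1E7', [np.nan], [np.nan], '1E4']}
df = pd.DataFrame(data)

# Assumption: each list cell contains a single value, so take its first element
df['R_ID'] = df['R_ID'].map(lambda v: v[0] if isinstance(v, list) else v)

# Keep R_ID only where the value also occurs in A_ID; everything else becomes NaN
df['R_ID'] = df['R_ID'].where(df['R_ID'].isin(df['A_ID']))
print(df)
```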

Using isin:

df['R_ID'] = df['R_ID'].loc[df['R_ID'].isin(df['A_ID'])]
>>> df
  A_ID R_ID
0  1E2  NaN
1  1E3  NaN
2  1E4  NaN
3  1E5  1E4
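This works because of index alignment: the filtered Series only contains the label of the matching row, and assigning it to a column reindexes it to the full index of df, filling the other rows with NaN. A small sketch that makes the alignment explicit, assuming scalar NaNs; kept and aligned are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A_ID': ['1E2', '1E3', '1E4', '1E5'],
                   'R_ID': ['1E7', np.nan, np.nan, '1E4']})

# The boolean filter keeps only the rows whose R_ID occurs in A_ID (label 3)
kept = df['R_ID'].loc[df['R_ID'].isin(df['A_ID'])]
print(kept)

# Assigning kept to a column aligns it on index labels, which amounts to
# reindexing it to the full index: labels missing from kept become NaN
aligned = kept.reindex(df.index)
df['R_ID'] = kept
print(df['R_ID'].equals(aligned))
```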

This should work:

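# Cast both columns to the pandas string dtype so the merge keys share one type
df_A_ID = df[["A_ID"]].astype(dtype=pd.StringDtype())
# reset_index keeps the original row labels in an 'index' column through the merge
df_R_ID = df[["R_ID"]].astype(dtype=pd.StringDtype()).reset_index()
# Inner join on the values, then restore the original row labels of R_ID
temp_df = pd.merge(df_A_ID, df_R_ID, how='inner', left_on='A_ID', right_on='R_ID').set_index('index')

# isin with a DataFrame matches on row and column labels; rows without a match become NaN
df.loc[~(df_R_ID.isin(temp_df[['R_ID']])['R_ID']).fillna(False),'R_ID'] = np.nan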

Output

  A_ID R_ID
0  1E2  NaN
1  1E3  NaN
2  1E4  NaN
3  1E5  1E4
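To stay with merge but avoid the manual label bookkeeping, a left merge with indicator=True keeps every row in its original order and marks which rows found a partner in A_ID. A minimal sketch, assuming scalar NaNs in R_ID; drop_duplicates is there so that repeated A_ID values cannot add rows to the left merge, and lookup, merged and matched are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A_ID': ['1E2', '1E3', '1E4', '1E5'],
                   'R_ID': ['1E7', np.nan, np.nan, '1E4']})

# Distinct A_ID values, renamed so the merge key has the same name on both sides
lookup = df[['A_ID']].drop_duplicates().rename(columns={'A_ID': 'R_ID'})

# A left merge keeps all rows of df in order; indicator=True adds a '_merge'
# column that is 'both' where R_ID matched a value from A_ID
merged = df[['R_ID']].merge(lookup, on='R_ID', how='left', indicator=True)
matched = (merged['_merge'] == 'both').to_numpy()

# Positional mask, so this also works when df has a non-default index
df['R_ID'] = df['R_ID'].where(matched)
print(df)
```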