Look up/search values from a data frame to create a new column

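I am trying to create a new column in a data frame by searching data in other columns and rows. What is the best/fastest way to calculate the values of such a column?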

I have tried using a lambda and an external function, but without success.

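  1. Could anyone explain the ways to get the final result, and which one is optimal in terms of computation time?

  2. Can we assign a function/lambda to calculate these values?

  3. Can we implement the data frame so that a column keeps a reference to the function that computes its values, instead of the computed values themselves, i.e. dynamic results based on the data in other columns/rows?
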
import pandas as pd

data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
}

df = pd.DataFrame(data)

Original DataFrame:
    ID  M_Name  Name
0   1     Lui  Andy
1   2     Lui   Rob
2   3     Lui  Tony
3   4  NoData  John
4   5    John   Lui

data_after = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
    'ID_by_M_Name': [5, 5, 5, 'NoData', 4],
}

df1 = pd.DataFrame(data_after)

Processed DataFrame:
    ID ID_by_M_Name  M_Name  Name
0   1          5     Lui  Andy
1   2          5     Lui   Rob
2   3          5     Lui  Tony
3   4     NoData  NoData  John
4   5          4    John   Lui

I have tried two ways to get the ID, but I am not sure how to use them with assign:

getID = lambda name: df.loc[df['Name'] == name]['ID'].iloc[0]

def mID(name):
    return df.loc[df['Name'] == name]['ID'].iloc[0]

For each row, we want to find the ID of the person named in M_Name.
e.g. for Name='Andy', M_Name is 'Lui', and Lui's ID is 5;
for Lui, M_Name is 'John', and John's ID is 4.
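On whether a function can compute these values: a function can be applied to each value of M_Name with Series.apply. This is a minimal sketch, assuming every Name is unique; name_to_id and lookup_id are hypothetical names, and a name with no match (such as 'NoData') is kept unchanged, as in the expected output.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
})

# Hypothetical lookup table: Name -> ID (assumes every Name is unique)
name_to_id = dict(zip(df['Name'], df['ID']))


def lookup_id(m_name):
    """Return the ID of the person called m_name, or m_name itself if there is no such person."""
    return name_to_id.get(m_name, m_name)


# Series.apply calls lookup_id once for each value of M_Name
df['ID_by_M_Name'] = df['M_Name'].apply(lookup_id)
print(df)
```

Unlike mID, this looks up one name at a time, so an unknown name never reaches .iloc[0], and the dict is built once instead of filtering the whole frame for every row.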

print(getID('Lui'))
print(mID('Lui'))

df['ID'] = df.assign(mID(df['M_Name']), axis=1 )

IndexError: single positional indexer is out-of-bounds
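The error comes from passing the whole M_Name column to mID: df['Name'] == name then compares the two columns row by row (Andy with Lui, Rob with Lui, and so on) instead of looking each name up. No row matches, so .iloc[0] on the empty result raises IndexError. Separately, DataFrame.assign accepts only keyword arguments (new_column=value), where the value may be a callable that receives the DataFrame. The sketch below reproduces the empty match and then uses assign with a callable; id_by_m_name is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
})


def mID(name):
    return df.loc[df['Name'] == name]['ID'].iloc[0]


# Passing a Series compares Name and M_Name position by position, not as a lookup
mask = df['Name'] == df['M_Name']
print(mask.any())  # no row has Name equal to its own M_Name

try:
    mID(df['M_Name'])
except IndexError as exc:
    print('IndexError:', exc)


def id_by_m_name(d):
    """Hypothetical helper: map each M_Name to that person's ID, keeping unmatched values."""
    lookup = dict(zip(d['Name'], d['ID']))
    return d['M_Name'].map(lambda name: lookup.get(name, name))


# assign calls id_by_m_name with the DataFrame and stores the returned Series as a new column
df = df.assign(ID_by_M_Name=id_by_m_name)
print(df)
```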

Use Series.replace, or Series.map with Series.fillna:

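# Series.replace: M_Name values found in the Name -> ID lookup are replaced, others (e.g. 'NoData') are kept
df['ID_by_M_Name'] = df['M_Name'].replace(df.set_index('Name')['ID'])
#assign alternative
#df = df.assign(ID_by_M_Name = df['M_Name'].replace(df.set_index('Name')['ID']))

# or Series.map: unmatched values become NaN, so fillna puts the original M_Name back
# (the matched IDs become floats, e.g. 5.0, because of the intermediate NaN)
df['ID_by_M_Name'] = df['M_Name'].map(df.set_index('Name')['ID']).fillna(df['M_Name'])
#assign alternative
#df=df.assign(ID_by_M_Name=df['M_Name'].map(df.set_index('Name')['ID']).fillna(df['M_Name']))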

print (df)

   ID  Name  M_Name ID_by_M_Name
0   1  Andy     Lui            5
1   2   Rob     Lui            5
2   3  Tony     Lui            5
3   4  John  NoData       NoData
4   5   Lui    John            4
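On which approach is fastest: the answer depends on data size and pandas version, so it is worth measuring on your own data. A rough sketch with timeit, assuming a hypothetical synthetic frame built by make_frame (unique names, every tenth M_Name set to 'NoData'); a per-element Series.apply with a dict lookup is included for comparison. No timings are quoted here because they vary between machines.

```python
import timeit

import pandas as pd


def make_frame(n):
    """Hypothetical synthetic data: n unique names, each pointing at another name or 'NoData'."""
    names = [f'name{i}' for i in range(n)]
    # every 10th row has no match, mimicking the 'NoData' entry in the question
    m_names = ['NoData' if i % 10 == 0 else names[(i * 7) % n] for i in range(n)]
    return pd.DataFrame({'ID': range(1, n + 1), 'Name': names, 'M_Name': m_names})


def with_replace(df):
    return df['M_Name'].replace(df.set_index('Name')['ID'])


def with_map_fillna(df):
    return df['M_Name'].map(df.set_index('Name')['ID']).fillna(df['M_Name'])


def with_apply(df):
    lookup = dict(zip(df['Name'], df['ID']))
    return df['M_Name'].apply(lambda name: lookup.get(name, name))


df = make_frame(500)
repeats = 5
for func in (with_replace, with_map_fillna, with_apply):
    # average wall-clock time per call over a few repeats
    seconds = timeit.timeit(lambda: func(df), number=repeats)
    print(f'{func.__name__:>16}: {seconds / repeats * 1000:.2f} ms per call')
```

All three return the same values (map + fillna as floats), so the choice can be made on measured speed and readability.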

If the position of the new column matters, use DataFrame.insert:

# start from the original df: insert raises ValueError if 'ID_by_M_Name' already exists
df.insert(1, 'ID_by_M_Name', df['M_Name'].replace(df.set_index('Name')['ID']))
print (df)

   ID ID_by_M_Name  Name  M_Name
0   1            5  Andy     Lui
1   2            5   Rob     Lui
2   3            5  Tony     Lui
3   4       NoData  John  NoData
4   5            4   Lui    John
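On keeping a formula instead of values: as far as I know, pandas has no computed (formula) columns; a column stores values, so it does not update when Name or ID change. A common workaround is to keep the rule in a function and recompute the column with assign whenever the data changes. A minimal sketch; id_by_m_name is a hypothetical helper name.

```python
import pandas as pd


def id_by_m_name(d):
    """Hypothetical rule: the ID of the person named in M_Name, or M_Name itself if there is no match."""
    lookup = dict(zip(d['Name'], d['ID']))
    return d['M_Name'].map(lambda name: lookup.get(name, name))


df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
})

# assign returns a new frame; its column is a snapshot of the values at this moment
first = df.assign(ID_by_M_Name=id_by_m_name)

# After the data changes, re-apply the same rule to get up-to-date values
df.loc[df['Name'] == 'Lui', 'ID'] = 50
second = df.assign(ID_by_M_Name=id_by_m_name)

print(first['ID_by_M_Name'].tolist())
print(second['ID_by_M_Name'].tolist())
```

The rule lives in one place and runs every time assign is called; the trade-off is that the column must be recomputed explicitly after each change.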