Look up/search values in a DataFrame to create a new column
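I am trying to create a new column in a DataFrame based on data looked up in other columns and rows. What is the best/fastest way to compute the values of such a column?
I have tried using a lambda and an external function, but without success.
Can anyone explain in detail how to get the final result, and which approach is optimal in terms of computation time?
Can we assign a function/lambda to compute these values?
Can a DataFrame be set up so that a column keeps a reference to the function that computes its values, rather than the computed values themselves, i.e. a dynamic result based on the data in other columns/rows?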
import pandas as pd

data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John']
}
df = pd.DataFrame(data)
Original DataFrame:
ID M_Name Name
0 1 Lui Andy
1 2 Lui Rob
2 3 Lui Tony
3 4 NoData John
4 5 John Lui
data_after = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
    'ID_by_M_Name': [5, 5, 5, 'NoData', 4]
}
df1 = pd.DataFrame(data_after)
Processed DataFrame:
ID ID_by_M_Name M_Name Name
0 1 5 Lui Andy
1 2 5 Lui Rob
2 3 5 Lui Tony
3 4 NoData NoData John
4 5 4 John Lui
I have tried two ways to get the ID, but I am not sure how to use them in assign:
getID = lambda name: df.loc[df['Name'] == name]['ID'].iloc[0]
def mID(name):
    return df.loc[df['Name'] == name]['ID'].iloc[0]
For each row we want to find the ID of the M_Name for that row's Name.
E.g. for Name='Andy', M_Name is 'Lui' and Lui's ID is 5.
For Lui, M_Name is John and John's ID is 4.
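The per-row rule above amounts to a dictionary lookup: build a Name-to-ID mapping once, then look up each M_Name. A minimal sketch, assuming names are unique; `name_to_id` is an illustrative name, and returning the M_Name itself when no row has that Name is an assumption made to reproduce the expected `NoData` entry:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
})

# Build the lookup once: Name -> ID (assumes each Name appears only once)
name_to_id = dict(zip(df['Name'], df['ID']))

# For each row, fetch the ID of its M_Name; keep M_Name unchanged when
# there is no matching Name (e.g. 'NoData')
df['ID_by_M_Name'] = [name_to_id.get(m, m) for m in df['M_Name']]
print(df)
```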
print(getID('Lui'))
print(mID('Lui'))
df['ID'] = df.assign(mID(df['M_Name']), axis=1 )
IndexError: single positional indexer is out-of-bounds
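A likely cause of the IndexError: `mID(df['M_Name'])` passes the whole column, so `df['Name'] == name` compares the two columns row by row instead of searching for a single name. No row has a Name equal to its own M_Name, the selection is empty, and `.iloc[0]` has nothing to return. (Separately, `assign` takes keyword arguments such as `ID_by_M_Name=...`, not a positional value plus `axis`.) A minimal sketch that keeps a scalar function but calls it once per value with `Series.apply`; `m_id` is a hypothetical variant of `mID`, and its fallback to the name itself is an assumption chosen to match the expected output:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
})

# Passing the whole column compares Name and M_Name row by row; no row
# matches, which is why the original call ended up with an empty selection.
print((df['Name'] == df['M_Name']).any())  # False

def m_id(name):
    """Return the ID of the row whose Name equals `name`, or `name` if none does."""
    matches = df.loc[df['Name'] == name, 'ID']
    return matches.iloc[0] if not matches.empty else name

# Call the scalar function once per M_Name value
df['ID_by_M_Name'] = df['M_Name'].apply(m_id)
# Equivalent assign call, using a keyword argument:
# df = df.assign(ID_by_M_Name=df['M_Name'].apply(m_id))
print(df)
```

Each call scans the whole frame again, so this is roughly O(n²) overall and likely the slowest option on large data; vectorised lookups with `replace` or `map` avoid the repeated scans.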
Use Series.replace, or Series.map with Series.fillna:
df['ID_by_M_Name'] = df['M_Name'].replace(df.set_index('Name')['ID'])
#assign alternative
#df = df.assign(ID_by_M_Name = df['M_Name'].replace(df.set_index('Name')['ID']))
df['ID_by_M_Name'] = df['M_Name'].map(df.set_index('Name')['ID']).fillna(df['M_Name'])
#assign alternative
#df=df.assign(ID_by_M_Name=df['M_Name'].map(df.set_index('Name')['ID']).fillna(df['M_Name']))
print (df)
ID Name M_Name ID_by_M_Name
0 1 Andy Lui 5
1 2 Rob Lui 5
2 3 Tony Lui 5
3 4 John NoData NoData
4 5 Lui John 4
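On the question of which approach is fastest: the answer depends on data size and pandas version, so it is worth measuring on your own data. A rough sketch using `timeit` on a hypothetical frame of 1,000 unique names (the size, the seed, the every-10th `NoData` pattern and the `with_*` helper names are all assumptions for illustration); it also checks that `replace` and a plain dict lookup give the same column:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark frame: unique names, every 10th M_Name set to 'NoData'
n = 1_000
rng = np.random.default_rng(0)
names = np.array([f'name{i}' for i in range(n)], dtype=object)
m_names = rng.choice(names, size=n)
m_names[::10] = 'NoData'
big = pd.DataFrame({'ID': np.arange(1, n + 1), 'Name': names, 'M_Name': m_names})

lookup = big.set_index('Name')['ID']            # Series: Name -> ID
name_to_id = dict(zip(big['Name'], big['ID']))  # plain dict: Name -> ID

def with_replace():
    return big['M_Name'].replace(lookup)

def with_map():
    # Unmatched names become NaN (and IDs become floats) until fillna fills them
    return big['M_Name'].map(lookup).fillna(big['M_Name'])

def with_dict():
    return pd.Series([name_to_id.get(m, m) for m in big['M_Name']], index=big.index)

# Timings vary by machine and pandas version; compare them relative to each other
for fn in (with_replace, with_map, with_dict):
    print(f'{fn.__name__}: {timeit.timeit(fn, number=3):.4f} s')
```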
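If the position of the new column matters, use DataFrame.insert (it raises a ValueError if the column already exists, so run it on the original df):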
df.insert(1, 'ID_by_M_Name', df['M_Name'].replace(df.set_index('Name')['ID']))
print (df)
ID ID_by_M_Name Name M_Name
0 1 5 Andy Lui
1 2 5 Rob Lui
2 3 5 Tony Lui
3 4 NoData John NoData
4 5 4 Lui John
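On the last part of the question (a column that keeps a reference to a function instead of its values): pandas stores the computed values, not the function, so a derived column does not update itself when the source data changes. A common workaround is to keep the lookup in a small function and re-apply it whenever the data changes; `DataFrame.assign` accepts a callable that receives the frame it is called on. A sketch, where `add_id_by_m_name` is a hypothetical helper name:

```python
import pandas as pd

def add_id_by_m_name(frame):
    """Hypothetical helper: return a copy of `frame` with ID_by_M_Name recomputed."""
    # The lambda receives the frame passed to assign, so the lookup
    # always uses its current Name/ID values.
    return frame.assign(
        ID_by_M_Name=lambda d: d['M_Name'].replace(d.set_index('Name')['ID'])
    )

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Andy', 'Rob', 'Tony', 'John', 'Lui'],
    'M_Name': ['Lui', 'Lui', 'Lui', 'NoData', 'John'],
})
df = add_id_by_m_name(df)
print(df['ID_by_M_Name'].tolist())

# Changing Lui's ID does not update the stored column by itself...
df.loc[df['Name'] == 'Lui', 'ID'] = 50
print(df['ID_by_M_Name'].tolist())  # still the old values
# ...so re-run the helper (assign overwrites the existing column)
df = add_id_by_m_name(df)
print(df['ID_by_M_Name'].tolist())
```

Wrapping the logic in one function keeps a single place to call after every data change, which is the closest plain-pandas equivalent to a computed column.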