Search for and add values from one DataFrame at a specific row/index to another DataFrame at a specific row/index
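Pandas DataFrame manipulation question here.
I want to create a new column in my original DataFrame (df) that holds the value found at a specific index of another DataFrame (dfKey).
I'm a bit stuck (I'm sure I'm missing something obvious, but I can't decode the current error message, KeyError: 'Name').
Data: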
import numpy as np
import pandas as pd

raw_data = {'Code': [250, 200, 875, 1200],
            'Metric1': [1.4, 350, 0.2, 500],
            'Metric999': [1.2, 375, 0.22, 505]}
df = pd.DataFrame(raw_data, columns=['Code', 'Metric1', 'Metric999'])
df.set_index('Code', inplace=True)  # Set Code as row index
print(df)

raw_dataKey = {'Code': [250, 1200, 205, 2899, 875, 5005],
               'Ticker': ['NVID', 'ATVI', 'CRM', 'GOOGL', 'TSLA', 'GE'],
               'Name': ['NVIDA Corp', 'Activision', 'SalesForce', 'Googlyness', 'Tesla Company', 'General Electric']}
dfKey = pd.DataFrame(raw_dataKey, columns=['Code', 'Ticker', 'Name'])
dfKey.set_index('Code', inplace=True)  # Set Code as row index
print(dfKey)
Expected output (df.head()):
      Ticker           Name  Code  Metric1  Metric999
Code
250     NVID     NVIDA Corp   250      1.4       1.20
200      NaN            NaN   200    350.0     375.00
875     TSLA  Tesla Company   875      0.2       0.22
1200    ATVI     Activision  1200    500.0     505.00
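I thought the best way to do this was a for loop, because everything else I tried (for example, df['Name']=np.where(df['Code']==dfKey['Code'], dfKey['Name'])) only compares/tests each row at the same index; it doesn't search.
My latest attempt: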
codes=df.index.tolist()
codes
for code in codes:
    #1. Find Name and Ticker in Key
    name = dfKey['Name'].loc[code]
    ticker = dfKey['Ticker'].loc[code]
    #2. Put Name and Ticker back in original
    df['Name'].loc[code] = name
    df['Ticker'].loc[code] = ticker
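For reference, the KeyError: 'Name' can be reproduced without the loop. The sketch below is only an illustration under assumptions: it rebuilds the two frames with Code as the index and tries two lookups that the loop performs. Reading df['Name'] fails because df has no such column yet, and dfKey['Name'].loc[200] would fail as well because code 200 is not in dfKey. The names missing_column_error and missing_code_error are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"Metric1": [1.4, 350, 0.2, 500], "Metric999": [1.2, 375, 0.22, 505]},
    index=pd.Index([250, 200, 875, 1200], name="Code"),
)
dfKey = pd.DataFrame(
    {
        "Ticker": ["NVID", "ATVI", "CRM", "GOOGL", "TSLA", "GE"],
        "Name": ["NVIDA Corp", "Activision", "SalesForce", "Googlyness",
                 "Tesla Company", "General Electric"],
    },
    index=pd.Index([250, 1200, 205, 2899, 875, 5005], name="Code"),
)

# df has no 'Name' column yet, so reading it raises KeyError.
missing_column_error = None
try:
    df["Name"]
except KeyError as exc:
    missing_column_error = exc

# Code 200 is not in dfKey's index, so a label lookup raises KeyError too.
missing_code_error = None
try:
    dfKey["Name"].loc[200]
except KeyError as exc:
    missing_code_error = exc

print(repr(missing_column_error))
print(repr(missing_code_error))
```

Both failures are independent of each other, so creating the column first would still leave the lookup for code 200 to handle.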
I think you need merge:
dfKey.merge(df, left_index=True, right_index=True, how='outer')
Output:
      Ticker              Name  Metric1  Metric999
Code
200      NaN               NaN    350.0     375.00
205      CRM        SalesForce      NaN        NaN
250     NVID        NVIDA Corp      1.4       1.20
875     TSLA     Tesla Company      0.2       0.22
1200    ATVI        Activision    500.0     505.00
2899   GOOGL        Googlyness      NaN        NaN
5005      GE  General Electric      NaN        NaN
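An outer merge keeps the codes from both frames. If only the rows of df should survive, as in the expected output, a left merge starting from df may be closer to what is wanted. This is a minimal sketch under that assumption, not the answer's own code; the name result and the column reordering are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame(
    {"Metric1": [1.4, 350, 0.2, 500], "Metric999": [1.2, 375, 0.22, 505]},
    index=pd.Index([250, 200, 875, 1200], name="Code"),
)
dfKey = pd.DataFrame(
    {
        "Ticker": ["NVID", "ATVI", "CRM", "GOOGL", "TSLA", "GE"],
        "Name": ["NVIDA Corp", "Activision", "SalesForce", "Googlyness",
                 "Tesla Company", "General Electric"],
    },
    index=pd.Index([250, 1200, 205, 2899, 875, 5005], name="Code"),
)

# how='left' keeps every row of df (in df's order) and fills NaN
# where a code has no match in dfKey.
result = df.merge(dfKey, how="left", left_index=True, right_index=True)

# Optional: put the looked-up columns first, as in the expected output.
result = result[["Ticker", "Name", "Metric1", "Metric999"]]
print(result)
```

Codes that exist only in dfKey (205, 2899, 5005) are dropped, and code 200 gets NaN for Ticker and Name.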
IIUC:
In [13]: df.join(dfKey)
Out[13]:
      Metric1  Metric999  Ticker           Name
Code
250       1.4       1.20    NVID     NVIDA Corp
200     350.0     375.00     NaN            NaN
875       0.2       0.22    TSLA  Tesla Company
1200    500.0     505.00    ATVI     Activision
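If the goal is to add the columns to df itself rather than build a new frame, the lookup the loop was attempting can also be written as a column assignment. The sketch below is one possible way to do it, assuming the codes in dfKey's index are unique; it uses reindex to pull each dfKey column in df's row order, leaving NaN for codes that dfKey does not contain.

```python
import pandas as pd

df = pd.DataFrame(
    {"Metric1": [1.4, 350, 0.2, 500], "Metric999": [1.2, 375, 0.22, 505]},
    index=pd.Index([250, 200, 875, 1200], name="Code"),
)
dfKey = pd.DataFrame(
    {
        "Ticker": ["NVID", "ATVI", "CRM", "GOOGL", "TSLA", "GE"],
        "Name": ["NVIDA Corp", "Activision", "SalesForce", "Googlyness",
                 "Tesla Company", "General Electric"],
    },
    index=pd.Index([250, 1200, 205, 2899, 875, 5005], name="Code"),
)

# Look up every code of df in dfKey at once; missing codes become NaN.
for column in ["Ticker", "Name"]:
    df[column] = dfKey[column].reindex(df.index)

print(df)
```

Assigning a whole column creates it when it does not exist, so this avoids the chained df['Name'].loc[code] = ... pattern from the loop.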