Extract value from list of dictionaries in dataframe using pandas
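I have a dataframe with 4 columns. I want to extract the resourceName values (i.e. the IDs) into a separate column. I have tried various methods and loops, but I could not separate them out.
Dataset: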
Username | Event name | Resources |
---|---|---|
XYZ-DEV_ENV_POST_function | StopInstances | [{"resourceType":"AWS::EC2::Instance","resourceName":"i-05fbb7a"}] |
XYZ-DEV_ENV_POST_function | StartInstances | [{"resourceType":"AWS::EC2::Instance","resourceName":"i-08bd2475"},{"resourceType":"AWS::EC2::Instance","resourceName":"i-0fd69dc1"},{"resourceType":"AWS::EC2::Instance","resourceName":"i-0174dd38aea"}] |
I want one more column, IDS, holding the IDs taken from the Resources column, like this:
Username | Event name | Resources | IDS |
---|---|---|---|
XYZ-DEV_ENV_POST_function | StopInstances | [{"resourceType":"AWS::EC2::Instance","resourceName":"i-05fbb7a"}] | i-05fbb7a |
XYZ-DEV_ENV_POST_function | StartInstances | [{"resourceType":"AWS::EC2::Instance","resourceName":"i-08bd2475"},{"resourceType":"AWS::EC2::Instance","resourceName":"i-0fd69dc1"},{"resourceType":"AWS::EC2::Instance","resourceName":"i-0174dd38aea"}] | i-08bd2475 , i-0fd69dc1 , i-0174 |
Here is the output of data.head(2).to_dict():
{'Date': {0: '28-02-2022', 1: '28-02-2022'},
 'Event name': {0: 'StopInstances', 1: 'StartInstances'},
 'Resources': {
     0: '[{"resourceType":"AWS::EC2::Instance","resourceName":"i-05fbb7a"}]',
     1: '[{"resourceType":"AWS::EC2::Instance","resourceName":"i-08bd2475"},{"resourceType":"AWS::EC2::Instance","resourceName":" i-0fd69dc1"},{"resourceType":"AWS::EC2::Instance","resourceName":"i-0174dd38aea"}]'},
 'User name': {0: 'XYZ-DEV_ENV_POST_function', 1: 'XYZ-DEV_ENV_POST_function'}}
Thanks and regards
# Parse each Resources string into a list of dicts, then comma-join the resourceName values
df['ID'] = df['Resources'].apply(lambda x: ','.join([i['resourceName'] for i in eval(x)]))
Date ... ID
0 28-02-2022 ... i-05fbb7a
1 28-02-2022 ... i-08bd2475,i-0fd69dc1,i-0174dd38aea
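Since the Resources strings in the question look like valid JSON, `json.loads` could stand in for `eval`, which avoids executing arbitrary text from the column. The following is a minimal sketch under that assumption: the sample frame is rebuilt from the `to_dict()` output in the question, and `extract_ids` is a hypothetical helper name, not something from the original answer.

```python
import json

import pandas as pd

# Sample frame rebuilt from the data.head(2).to_dict() output in the question.
df = pd.DataFrame({
    "Date": ["28-02-2022", "28-02-2022"],
    "Event name": ["StopInstances", "StartInstances"],
    "Resources": [
        '[{"resourceType":"AWS::EC2::Instance","resourceName":"i-05fbb7a"}]',
        '[{"resourceType":"AWS::EC2::Instance","resourceName":"i-08bd2475"},'
        '{"resourceType":"AWS::EC2::Instance","resourceName":" i-0fd69dc1"},'
        '{"resourceType":"AWS::EC2::Instance","resourceName":"i-0174dd38aea"}]',
    ],
    "User name": ["XYZ-DEV_ENV_POST_function"] * 2,
})


def extract_ids(resources: str) -> str:
    """Hypothetical helper: comma-join the resourceName values of one JSON string."""
    items = json.loads(resources)  # assumes every cell holds a JSON array of objects
    # strip() drops stray whitespace, such as the " i-0fd69dc1" entry above
    return ",".join(item["resourceName"].strip() for item in items)


df["ID"] = df["Resources"].apply(extract_ids)
print(df["ID"].tolist())
# ['i-05fbb7a', 'i-08bd2475,i-0fd69dc1,i-0174dd38aea']
```

If some cells are missing or are not valid JSON, `json.loads` will raise an error, so real data may need a guard around the call.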