Use the column of a dataframe that has a list of dictionaries to create other columns for the dataframe

Question

My dataframe has a column of object dtype whose values look like this:

for i in df3['placeholders'][:10]:
    print(i)

Output:
[{'type': 'experience', 'label': '0-1 Yrs'}, {'type': 'salary', 'label': '1,00,000 - 1,25,000 PA.'}, {'type': 'location', 'label': 'Chennai'}]
[{'type': 'date', 'label': '08 October - 13 October'}, {'type': 'salary', 'label': 'Not disclosed'}, {'type': 'location', 'label': 'Chennai'}]
[{'type': 'education', 'label': 'B.Com'}, {'type': 'salary', 'label': 'Not disclosed'}, {'type': 'location', 'label': 'Mumbai Suburbs, Navi Mumbai, Mumbai'}]
[{'type': 'experience', 'label': '0-2 Yrs'}, {'type': 'salary', 'label': '50,000 - 2,00,000 PA.'}, {'type': 'location', 'label': 'Chennai'}]
[{'type': 'experience', 'label': '0-1 Yrs'}, {'type': 'salary', 'label': '2,00,000 - 2,25,000 PA.'}, {'type': 'location', 'label': 'Bengaluru(JP Nagar)'}]
[{'type': 'experience', 'label': '0-3 Yrs'}, {'type': 'salary', 'label': '80,000 - 2,00,000 PA.'}, {'type': 'location', 'label': 'Hyderabad'}]
[{'type': 'experience', 'label': '0-5 Yrs'}, {'type': 'salary', 'label': 'Not disclosed'}, {'type': 'location', 'label': 'Hyderabad'}]
[{'type': 'experience', 'label': '0-1 Yrs'}, {'type': 'salary', 'label': '1,25,000 - 2,00,000 PA.'}, {'type': 'location', 'label': 'Mumbai'}]
[{'type': 'date', 'label': '08 October - 17 October'}, {'type': 'salary', 'label': 'Not disclosed'}, {'type': 'location', 'label': 'Pune(Bavdhan)'}]
[{'type': 'experience', 'label': '0-2 Yrs'}, {'type': 'salary', 'label': 'Not disclosed'}, {'type': 'location', 'label': 'Jaipur'}]
[{'type': 'experience', 'label': '0-0 Yrs'}, {'type': 'salary', 'label': '1,00,000 - 1,50,000 PA.'}, {'type': 'location', 'label': 'Delhi NCR(Sector-81 Noida)'}]

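I want to add more columns to my existing dataframe by extracting features from this column, such that:

the value of 'type' = the column name

the value of 'label' = the value under that column

Final expected output: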

df.head(3)

Output:

.....  experience  salary                   location                             date                     education
.....  0-1 Yrs     1,00,000 - 1,25,000 PA.  Chennai                              nan                      nan
.....  nan         Not disclosed            Chennai                              08 October - 13 October  nan
.....  nan         Not disclosed            Mumbai Suburbs, Navi Mumbai, Mumbai  nan                      B.Com
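
To make the mapping concrete, here is a minimal sketch of the 'type' to column name and 'label' to cell value rule applied to the first three rows. The names `sample`, `records` and `features` are made up for illustration and are not part of the original dataframe; a type that a row does not contain ends up as NaN.

```python
import pandas as pd

# Hypothetical sample mirroring the first three rows printed in the question.
sample = pd.DataFrame({
    "placeholders": [
        [{"type": "experience", "label": "0-1 Yrs"},
         {"type": "salary", "label": "1,00,000 - 1,25,000 PA."},
         {"type": "location", "label": "Chennai"}],
        [{"type": "date", "label": "08 October - 13 October"},
         {"type": "salary", "label": "Not disclosed"},
         {"type": "location", "label": "Chennai"}],
        [{"type": "education", "label": "B.Com"},
         {"type": "salary", "label": "Not disclosed"},
         {"type": "location", "label": "Mumbai Suburbs, Navi Mumbai, Mumbai"}],
    ]
})

# Each row's list of dicts collapses into one {type: label} dict.
records = [{d["type"]: d["label"] for d in row} for row in sample["placeholders"]]

# One column per distinct type; types absent from a row are filled with NaN.
features = pd.DataFrame(records, index=sample.index)

# Attach the new columns to the existing dataframe.
result = sample.join(features)
print(result.drop(columns="placeholders").head(3))
```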

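The first answer worked. [Edit 2]

Later, I tried the same code suggested in the first answer on a new dataset with the same problem. I got the following error: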

<ipython-input-23-ad8e644044af> in <listcomp>(.0)
----> 1 new_columns = set([d['Name'] for l in dfr.RatingDistribution.values for d in l ])
      2 # Make a dict of dicts
      3 col_val_dict = {}
      4 for col_name in new_columns:
      5     col_val_dict[col_name] = {}

TypeError: 'float' object is not iterable

My input column:

RatingDistribution
[{'Name': 'Work-Life Balance', 'count': 5}, {'Name': 'Skill Development', 'count': 5}, {'Name': 'Salary & Benefits', 'count': 5}, {'Name': 'Job Security', 'count': 5}, {'Name': 'Company Culture', 'count': 5}, {'Name': 'Career Growth', 'count': 5}, {'Name': 'Work Satisfaction', 'count': 5}]
[{'Name': 'Work-Life Balance', 'count': 4}, {'Name': 'Skill Development', 'count': 5}, {'Name': 'Salary & Benefits', 'count': 4}, {'Name': 'Job Security', 'count': 4}, {'Name': 'Company Culture', 'count': 3}, {'Name': 'Career Growth', 'count': 3}, {'Name': 'Work Satisfaction', 'count': 5}]
[{'Name': 'Work-Life Balance', 'count': 3}, {'Name': 'Skill Development', 'count': 4}, {'Name': 'Salary & Benefits', 'count': 5}, {'Name': 'Job Security', 'count': 4}, {'Name': 'Company Culture', 'count': 5}, {'Name': 'Career Growth', 'count': 4}, {'Name': 'Work Satisfaction', 'count': 4}]
[{'Name': 'Work-Life Balance', 'count': 5}, {'Name': 'Skill Development', 'count': 5}, {'Name': 'Salary & Benefits', 'count': 5}, {'Name': 'Job Security', 'count': 5}, {'Name': 'Company Culture', 'count': 5}, {'Name': 'Career Growth', 'count': 5}, {'Name': 'Work Satisfaction', 'count': 5}]
[{'Name': 'Work-Life Balance', 'count': 3}, {'Name': 'Skill Development', 'count': 5}, {'Name': 'Salary & Benefits', 'count': 3}, {'Name': 'Job Security', 'count': 3}, {'Name': 'Company Culture', 'count': 3}, {'Name': 'Career Growth', 'count': 3}, {'Name': 'Work Satisfaction', 'count': 4}]
[{'Name': 'Work-Life Balance', 'count': 3}, {'Name': 'Skill Development', 'count': 5}, {'Name': 'Salary & Benefits', 'count': 5}, {'Name': 'Job Security', 'count': 1}, {'Name': 'Company Culture', 'count': 3}, {'Name': 'Career Growth', 'count': 1}, {'Name': 'Work Satisfaction', 'count': 1}]

My code:

new_columns = set([d['Name'] for l in dfr.RatingDistribution.values for d in l ])
# Make a dict of dicts 
col_val_dict = {}
for col_name in new_columns:
    col_val_dict[col_name] = {}
    # For each column name look to see if a row has that as a Name
    # If so, get the count for that dict
    # otherwise fill it with NaN
    for i,l in enumerate(dfr.RatingDistribution.values):
        the_label = [d['count'] for d in l if d['Name'] == col_name]
        if the_label:
            col_val_dict[col_name][i] = the_label[0]
        else:
            col_val_dict[col_name][i] = np.nan
            
# Merge this new dfa with the old one
merged_dfa = pd.concat([dfr,pd.DataFrame(col_val_dict)],axis='columns')
dfr.shape

I get the error on the first line. I can't figure out why it raises an error about a float.

Please help.
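
One plausible explanation, assuming some rows of RatingDistribution are missing: pandas stores a missing cell as NaN, which is a Python float, so the inner `for d in l` fails as soon as it reaches such a row. The sketch below uses made-up data (`dfr_demo` is hypothetical, not the real `dfr`) to count the cell types and reproduce the same message.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dfr: the second cell is missing (NaN), i.e. a float.
dfr_demo = pd.DataFrame({
    "RatingDistribution": [
        [{"Name": "Job Security", "count": 5}],
        np.nan,
    ]
})

# Count the Python type of every cell; anything other than 'list'
# cannot be iterated by the comprehension on the first line.
type_counts = (
    dfr_demo["RatingDistribution"]
    .map(lambda v: type(v).__name__)
    .value_counts()
)
print(type_counts)

# Run the same kind of comprehension as the failing first line.
error_message = None
try:
    {d["Name"] for l in dfr_demo["RatingDistribution"].values for d in l}
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

If the type count on the real column shows anything besides `list`, those rows need to be skipped or filled before the comprehension runs.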

Answer 1

import numpy as np
import pandas as pd

# Get the unique types (these become the new column names)
new_columns = {d['type'] for l in df3.placeholders.values for d in l}
# Build a dict of dicts: {column name: {row position: label}}
col_val_dict = {}
for col_name in new_columns:
    col_val_dict[col_name] = {}
    # For each column name, check whether a row has a dict of that type.
    # If so, take that dict's label; otherwise fill the cell with NaN.
    for i, l in enumerate(df3.placeholders.values):
        the_label = [d['label'] for d in l if d['type'] == col_name]
        if the_label:
            col_val_dict[col_name][i] = the_label[0]
        else:
            col_val_dict[col_name][i] = np.nan

# Merge the new columns with the original dataframe
# (rows are keyed by position, so this assumes df3 has the default 0..n-1 index)
merged_df = pd.concat([df3, pd.DataFrame(col_val_dict)], axis='columns')
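
For a column that may contain missing cells, the same idea can be written more compactly. This is only a sketch under stated assumptions, not the approach from the answer above: `expand_dict_list` is a hypothetical helper name, `dfr_demo` is made-up data, and any cell that is not a list is treated as having no entries.

```python
import numpy as np
import pandas as pd

def expand_dict_list(df, column, key_field, value_field):
    """Return df plus one new column per distinct key found in `column`.

    Hypothetical helper: cells that are not lists (for example NaN)
    contribute an empty record, so every new column is NaN for that row.
    """
    records = [
        {d[key_field]: d[value_field] for d in cell} if isinstance(cell, list) else {}
        for cell in df[column]
    ]
    # Reusing df.index keeps the new columns aligned with the original rows.
    return df.join(pd.DataFrame(records, index=df.index))

# Made-up data: one complete row and one missing row.
dfr_demo = pd.DataFrame({
    "RatingDistribution": [
        [{"Name": "Job Security", "count": 5},
         {"Name": "Career Growth", "count": 3}],
        np.nan,
    ]
})

expanded = expand_dict_list(dfr_demo, "RatingDistribution", "Name", "count")
print(expanded[["Job Security", "Career Growth"]])
```

The same call with `"placeholders", "type", "label"` would cover the first dataset. Note that `join` raises an error if a new column name already exists in the dataframe, so such columns would need renaming first.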

Tags: python, feature-extraction, dataframe, python-3.x, pandas
