How do I pivot a pandas DataFrame and then add hierarchical columns?

Question

Can someone help me understand the steps to convert a record-style Python pandas DataFrame (Dataset A) into a DataFrame pivoted around nested columns (as shown in Dataset B)?

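For this question, the underlying schema follows these rules:

Each ProjectID appears exactly once
Each ProjectID is associated with one PM
Each ProjectID is associated with one Category
Multiple ProjectIDs can be associated with a single Category
Multiple ProjectIDs can be associated with a single PM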

Input: Dataset A

import pandas as pd

df_A = pd.DataFrame({'ProjectID':[1,2,3,4,5,6,7,8],
          'PM':['Bob','Jill','Jack','Jack','Jill','Amy','Jill','Jack'],
          'Category':['Category A','Category B','Category C','Category B','Category A','Category D','Category B','Category B'],
          'Comments':['Justification 1','Justification 2','Justification 3','Justification 4','Justification 5','Justification 6','Justification 7','Justification 8'],
          'Score':[10,7,10,5,15,10,0,2]})
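The schema rules listed above can be checked directly against Dataset A. The following is a minimal sketch; the names projects_per_category and projects_per_pm are illustrative and not part of the original question.

```python
import pandas as pd

df_A = pd.DataFrame({'ProjectID':[1,2,3,4,5,6,7,8],
          'PM':['Bob','Jill','Jack','Jack','Jill','Amy','Jill','Jack'],
          'Category':['Category A','Category B','Category C','Category B','Category A','Category D','Category B','Category B'],
          'Comments':['Justification 1','Justification 2','Justification 3','Justification 4','Justification 5','Justification 6','Justification 7','Justification 8'],
          'Score':[10,7,10,5,15,10,0,2]})

# Rule: each ProjectID appears exactly once. Because every row carries one PM
# and one Category, uniqueness also implies one PM and one Category per project.
assert df_A['ProjectID'].is_unique

# Rules: several ProjectIDs may share a Category or a PM.
projects_per_category = df_A.groupby('Category')['ProjectID'].nunique()
projects_per_pm = df_A.groupby('PM')['ProjectID'].nunique()

print(projects_per_category)
print(projects_per_pm)
```

Because ProjectID is unique, any reshape keyed on ProjectID has at most one value per cell, which is why no aggregation should be needed.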

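Desired output: Dataset B

Note the nested index added across the columns at the top. Also note that 'Comments' and 'Score' both appear at the same level under 'ProjectID'. Finally, note that the desired output does not aggregate any data; it groups/merges the category data into one row per category value.

What I have tried so far: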

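df_A.set_index(['Category','ProjectID'],append=True).unstack() - This only works if I first create a nested index of ['Category','ProjectID'] and append it to the original numeric index that a standard DataFrame is created with, but it then repeats every Category/ProjectID match as its own row (because of the original index).
df_A.groupby() - I could not use this because it seems to force some kind of aggregation in order to get all the values for a single category onto one row.
df_A.pivot('Category','ProjectID',values='Comments') - I can perform a pivot to avoid the unwanted aggregation, and the result starts to look similar to my expected output, but it only shows the 'Comments' field and I cannot set up nested columns this way. I get an error when I try to set values=['Comments','Score'] in the pivot statement.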
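For reference, here is a minimal sketch of the pivot attempt with both value columns. It assumes a recent pandas release, where pivot takes keyword arguments and accepts a list for values; the error described above may be specific to the older version used at the time. The name wide is illustrative.

```python
import pandas as pd

df_A = pd.DataFrame({'ProjectID':[1,2,3,4,5,6,7,8],
          'PM':['Bob','Jill','Jack','Jack','Jill','Amy','Jill','Jack'],
          'Category':['Category A','Category B','Category C','Category B','Category A','Category D','Category B','Category B'],
          'Comments':['Justification 1','Justification 2','Justification 3','Justification 4','Justification 5','Justification 6','Justification 7','Justification 8'],
          'Score':[10,7,10,5,15,10,0,2]})

# Pivot without aggregation: one row per Category, and columns form a
# two-level MultiIndex of (value column, ProjectID).
wide = df_A.pivot(index='Category', columns='ProjectID',
                  values=['Comments', 'Score'])

# Move ProjectID to the top level so that 'Comments' and 'Score' sit side by
# side under each ProjectID, then sort the columns to group them.
wide = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)

print(wide)
```

Cells for ProjectIDs that do not belong to a given Category are left as NaN, since nothing is aggregated.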

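I think the answer lies somewhere among pivot, unstack, set_index, and groupby, but I don't know how to complete the pivot and then add the appropriate nested column index.

I would appreciate any ideas you may have.
Updated the question based on Mr. T's comment. Thanks.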

Answer 1

I think this is what you are looking for:

pd.DataFrame(df_A.set_index(['PM', 'ProjectID', 'Category']).sort_index().stack()).T.stack(2)

Out[4]:
PM                        Amy                    Bob        ...              Jill
ProjectID                   6                      1        ...                 5                      7
                     Comments Score         Comments Score  ...          Comments Score         Comments Score
  Category                                                  ...
0 Category A              NaN   NaN  Justification 1    10  ...   Justification 5    15              NaN   NaN
  Category B              NaN   NaN              NaN   NaN  ...               NaN   NaN  Justification 7     0
  Category C              NaN   NaN              NaN   NaN  ...               NaN   NaN              NaN   NaN
  Category D  Justification 6    10              NaN   NaN  ...               NaN   NaN              NaN   NaN

[4 rows x 16 columns]

Edit: to select rows by category, you should drop the row index level 0 by adding .xs(0):

In [3]: df_A_transformed = pd.DataFrame(df_A.set_index(['PM', 'ProjectID', 'Category']).sort_index().stack()).T.stack(2).xs(0)

In [4]: df_A_transformed
Out[4]:
PM                      Amy                    Bob        ...              Jill
ProjectID                 6                      1        ...                 5                      7
                   Comments Score         Comments Score  ...          Comments Score         Comments Score
Category                                                  ...
Category A              NaN   NaN  Justification 1    10  ...   Justification 5    15              NaN   NaN
Category B              NaN   NaN              NaN   NaN  ...               NaN   NaN  Justification 7     0
Category C              NaN   NaN              NaN   NaN  ...               NaN   NaN              NaN   NaN
Category D  Justification 6    10              NaN   NaN  ...               NaN   NaN              NaN   NaN

[4 rows x 16 columns]

In [5]: df_A_transformed.loc['Category B']
Out[5]:
PM    ProjectID
Amy   6          Comments                NaN
                 Score                   NaN
Bob   1          Comments                NaN
                 Score                   NaN
Jack  3          Comments                NaN
                 Score                   NaN
      4          Comments    Justification 4
                 Score                     5
      8          Comments    Justification 8
                 Score                     2
Jill  2          Comments    Justification 2
                 Score                     7
      5          Comments                NaN
                 Score                   NaN
      7          Comments    Justification 7
                 Score                     0
Name: Category B, dtype: object
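As an alternative sketch, not part of the answer above, the same layout can be reached with set_index and unstack, which avoids the extra row index level 0 altogether. The name wide is illustrative, and the level positions passed to reorder_levels assume the column level order produced by this particular unstack call.

```python
import pandas as pd

df_A = pd.DataFrame({'ProjectID':[1,2,3,4,5,6,7,8],
          'PM':['Bob','Jill','Jack','Jack','Jill','Amy','Jill','Jack'],
          'Category':['Category A','Category B','Category C','Category B','Category A','Category D','Category B','Category B'],
          'Comments':['Justification 1','Justification 2','Justification 3','Justification 4','Justification 5','Justification 6','Justification 7','Justification 8'],
          'Score':[10,7,10,5,15,10,0,2]})

# Keep Category as the row index and move PM and ProjectID into the columns.
# The resulting column levels are (value column, PM, ProjectID).
wide = df_A.set_index(['Category', 'PM', 'ProjectID']).unstack(['PM', 'ProjectID'])

# Reorder the levels to (PM, ProjectID, value column) and sort the columns so
# that 'Comments' and 'Score' sit together under each ProjectID.
wide = wide.reorder_levels([1, 2, 0], axis=1).sort_index(axis=1)

print(wide.loc['Category B'])
```

Here wide.loc['Category B'] selects a single category row directly, without needing .xs(0) first.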
