Reshape dataframe in Pandas from long to wide format with new column names

Question

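I have a dataframe (example below) that I need to reshape. I'd like to have only one unique user per row, but right now each user has two rows in the dataframe, with different values depending on the 'testday' column (Basal and D7). What I want is to rename the value columns (e.g. '01. Tristeza Aparente') based on the testday group they belong to. So the new value columns would be something like 'Basal_01. Tristeza Aparente' and 'D7_01. Tristeza Aparente'.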

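The tutorials I've read on pivot and unstack don't quite work for me, because I'm not trying to aggregate the data. I just need separate columns when collapsing each user into one row. Thanks, and please let me know if I can make this question clearer.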

import numpy as np
import pandas as pd

df = pd.DataFrame({
    '01. Tristeza Aparente': {0: 4.0, 1: 4.0, 2: 4.0, 3: 2.0, 4: 1.0, 5: 0.0, 6: 3.0},
    '02. Tristeza Expressa': {0: 6.0, 1: 6.0, 2: 4.0, 3: 0.0, 4: 4.0, 5: 3.0, 6: 6.0},
    'group': {0: 'placebo', 1: 'placebo', 2: 'placebo', 3: 'placebo',
              4: 'placebo', 5: 'placebo', 6: 'placebo'},
    'subject': {0: 1.0, 1: np.nan, 2: 2.0, 3: np.nan, 4: 3.0, 5: np.nan, 6: 4.0},
    'subjectedit': {0: 1.0, 1: 1.0, 2: 2.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 4.0},
    'testday': {0: 'Basal', 1: 'D7', 2: 'Basal', 3: 'D7',
                4: 'Basal', 5: 'D7', 6: 'Basal'},
})
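A minimal sketch, assuming pandas and NumPy are installed, that loads the sample (trimmed to one value column) and checks the point made in the question: each (subjectedit, testday) pair occurs only once, so reshaping to wide format does not require any aggregation.

```python
import numpy as np
import pandas as pd

# Sample data from the question, as plain lists (value columns trimmed to one)
df = pd.DataFrame({
    '01. Tristeza Aparente': [4.0, 4.0, 4.0, 2.0, 1.0, 0.0, 3.0],
    'group': ['placebo'] * 7,
    'subject': [1.0, np.nan, 2.0, np.nan, 3.0, np.nan, 4.0],
    'subjectedit': [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0],
    'testday': ['Basal', 'D7', 'Basal', 'D7', 'Basal', 'D7', 'Basal'],
})

# duplicated() flags repeats of a (subjectedit, testday) pair; if there are none,
# every cell of the wide table has exactly one source value
is_unique = not df.duplicated(subset=['subjectedit', 'testday']).any()
print(is_unique)  # True
```

If this printed False, DataFrame.pivot would raise a ValueError about duplicate entries, and pivot_table with an aggregation function would be needed instead.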

Answer 1

Would df['new_column'] = df['testday'] + '_' + '01. Tristeza Aparente' solve your problem? You can also assign the result to an existing column instead.
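To see what this expression computes, here is a minimal sketch on a hypothetical two-row frame containing only a testday column. Each row receives the prefixed name as a string; the rows themselves are not collapsed.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: just the 'testday' column
df = pd.DataFrame({'testday': ['Basal', 'D7']})

# String concatenation is applied element-wise to the Series
df['new_column'] = df['testday'] + '_' + '01. Tristeza Aparente'
print(df['new_column'].tolist())
# ['Basal_01. Tristeza Aparente', 'D7_01. Tristeza Aparente']
```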

Answer 2

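You can pivot the dataframe and rename the columns with f-strings, but make sure you are on a recent version of pandas, because pivot had a bug in earlier versions (passing a list as index to pivot requires pandas 1.1.0 or later).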

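# Spread the 'testday' values into columns: one row per (group, subjectedit) pair
df = df.pivot(index=['group', 'subjectedit'], columns='testday')
# The columns are now (value, testday) tuples; flatten them to 'testday_value'
df.columns = [f'{col[1]}_{col[0]}' for col in df.columns]
df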
Out[1]: 
                     Basal_01. Tristeza Aparente  D7_01. Tristeza Aparente  \
group   subjectedit                                                          
placebo 1.0                                  4.0                       4.0   
        2.0                                  4.0                       2.0   
        3.0                                  1.0                       0.0   
        4.0                                  3.0                       NaN   

                     Basal_02. Tristeza Expressa  D7_02. Tristeza Expressa  \
group   subjectedit                                                          
placebo 1.0                                  6.0                       6.0   
        2.0                                  4.0                       0.0   
        3.0                                  4.0                       3.0   
        4.0                                  6.0                       NaN   

                     Basal_subject  D7_subject  
group   subjectedit                             
placebo 1.0                    1.0         NaN  
        2.0                    2.0         NaN  
        3.0                    3.0         NaN  
        4.0                    4.0         NaN
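For comparison, a sketch of the same reshape with set_index plus unstack, the other approach mentioned in the question. This is a swapped-in alternative, not part of the answer above. It also assumes the subject column can be dropped first, since it duplicates subjectedit and only yields the all-NaN D7_subject column seen in the output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    '01. Tristeza Aparente': [4.0, 4.0, 4.0, 2.0, 1.0, 0.0, 3.0],
    '02. Tristeza Expressa': [6.0, 6.0, 4.0, 0.0, 4.0, 3.0, 6.0],
    'group': ['placebo'] * 7,
    'subject': [1.0, np.nan, 2.0, np.nan, 3.0, np.nan, 4.0],
    'subjectedit': [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0],
    'testday': ['Basal', 'D7', 'Basal', 'D7', 'Basal', 'D7', 'Basal'],
})

wide = (
    df.drop(columns='subject')                         # assumption: redundant id column
      .set_index(['group', 'subjectedit', 'testday'])  # move the keys into the index
      .unstack('testday')                              # testday level becomes columns
)
# Columns are (value, testday) tuples; flatten to 'testday_value'
wide.columns = [f'{day}_{value}' for value, day in wide.columns]
# Turn group and subjectedit back into ordinary columns
wide = wide.reset_index()
print(wide)
```

Subject 4 has no D7 row, so its D7_ columns come out as NaN, just as with pivot.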

Tags: python, reshape, pandas