How can I reshape a wide dataset into a long dataset using multiple columns?
I have a dataframe that I need to reshape:
ID  Treatment  Dog_Weight  Cat_Weight  Horse_Weight  Pig_Weight
1   A          20          10          100           1000
2   A          30          20          200           550
3   A          40          30          300           750
4   A          50          40          400           800
5   B          60          50          500           650
6   B          70          60          600           450
7   B          80          70          700           500
8   B          90          80          800           600
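For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is a minimal sketch: the variable name `df` matches the snippets further down, and the dtypes (plain integers and strings) are an assumption, since the table does not show them.

```python
import pandas as pd

# Sample data copied from the table above; dtypes are assumed
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5, 6, 7, 8],
        "Treatment": ["A"] * 4 + ["B"] * 4,
        "Dog_Weight": [20, 30, 40, 50, 60, 70, 80, 90],
        "Cat_Weight": [10, 20, 30, 40, 50, 60, 70, 80],
        "Horse_Weight": [100, 200, 300, 400, 500, 600, 700, 800],
        "Pig_Weight": [1000, 550, 750, 800, 650, 450, 500, 600],
    }
)
```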
I am trying to get it to look like this:
    ID  Animal  Animal_Weight_A  Animal_Weight_B
0   1   Dog     20               60
1   2   Dog     30               70
2   3   Dog     40               80
3   4   Dog     50               90
4   1   Cat     10               50
5   2   Cat     20               60
6   3   Cat     30               70
7   4   Cat     40               80
8   1   Horse   100              500
9   2   Horse   200              600
10  3   Horse   300              700
11  4   Horse   400              800
12  1   Pig     1000             650
13  2   Pig     550              450
14  3   Pig     750              500
15  4   Pig     800              600
I have been able to do this with the following steps:
- Groupby to get the summary information for each animal:
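df_test1 = (
    # select the columns with a list; indexing a groupby with a bare tuple
    # of column names raises an error in recent pandas versions
    df.groupby(["ID", "Treatment"])[
        ["Dog_Weight", "Cat_Weight", "Horse_Weight", "Pig_Weight"]
    ]
    .mean()
    .reset_index()
)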
- Melt the data to put the animals into a column:
df_test2 = pd.melt(
    df_test1,
    id_vars=["ID", "Treatment"],
    value_vars=["Dog_Weight", "Cat_Weight", "Horse_Weight", "Pig_Weight"],
).rename(columns={"variable": "Animal", "value": "Animal_Weight"})
- Extract the animal name:
df_test2["Animal"] = df_test2["Animal"].str.split("_").str[0]
- Split by treatment:
test_A = df_test2.query("Treatment == 'A'")
test_B = df_test2.query("Treatment == 'B'")
- Merge on ID and Animal to put the dataset back together, dropping the unnecessary columns:
df_testfinal = pd.merge(
    test_A,
    test_B,
    on=["ID", "Animal"],
    suffixes=("_A", "_B"),
).drop(["Treatment_A", "Treatment_B"], axis=1)
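While this approach works, it seems like there should be a way to do this with reshape/pivot/melt. I am hoping someone can help me find a way to use one of those methods, or to reduce the number of steps?
Thanks!
Let's try melting first, then pivoting: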
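tmp = df.melt(['ID','Treatment'], var_name='Animal')
# keep only the part before the first underscore, e.g. 'Dog_Weight' -> 'Dog'
tmp['Animal'] = tmp['Animal'].str.extract('^([^_]+)', expand=False)
# renumber rows within each (Animal, Treatment) group so the A and B rows line up
tmp['ID'] = tmp.groupby(['Animal','Treatment']).cumcount()
out = (tmp.pivot_table(index=['Animal','ID'], columns=['Treatment'],
                       values='value')
          .add_prefix('Animal_Weight_').reset_index()
      )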
Output:
Treatment  Animal  ID  Animal_Weight_A  Animal_Weight_B
0          Cat     0   10               50
1          Cat     1   20               60
2          Cat     2   30               70
3          Cat     3   40               80
4          Dog     0   20               60
5          Dog     1   30               70
6          Dog     2   40               80
7          Dog     3   50               90
8          Horse   0   100              500
9          Horse   1   200              600
10         Horse   2   300              700
11         Horse   3   400              800
12         Pig     0   1000             650
13         Pig     1   550              450
14         Pig     2   750              500
15         Pig     3   800              600
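The output above numbers the IDs from 0 and sorts the animals alphabetically, while the desired table uses IDs 1 to 4 and the order Dog, Cat, Horse, Pig. A possible variation, offered as a sketch rather than as part of the original answer, is shown below. It assumes every (Animal, Treatment) group has the same number of rows, that each (Animal, ID, Treatment) combination occurs once (so plain `pivot` suffices), and a pandas version where `pivot` accepts a list for `index` (1.1 or later). The `value_name`, the categorical ordering, and the final column selection are additions for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5, 6, 7, 8],
        "Treatment": ["A"] * 4 + ["B"] * 4,
        "Dog_Weight": [20, 30, 40, 50, 60, 70, 80, 90],
        "Cat_Weight": [10, 20, 30, 40, 50, 60, 70, 80],
        "Horse_Weight": [100, 200, 300, 400, 500, 600, 700, 800],
        "Pig_Weight": [1000, 550, 750, 800, 650, 450, 500, 600],
    }
)

# Wide -> long: one row per (ID, Treatment, animal column)
tmp = df.melt(id_vars=["ID", "Treatment"], var_name="Animal", value_name="Animal_Weight")
tmp["Animal"] = tmp["Animal"].str.split("_").str[0]

# Position within each (Animal, Treatment) group, starting at 1
tmp["ID"] = tmp.groupby(["Animal", "Treatment"]).cumcount() + 1

# An ordered categorical keeps Dog, Cat, Horse, Pig instead of alphabetical order
tmp["Animal"] = pd.Categorical(tmp["Animal"], categories=tmp["Animal"].unique(), ordered=True)

# Long -> wide: one column per treatment
out = (
    tmp.pivot(index=["Animal", "ID"], columns="Treatment", values="Animal_Weight")
    .add_prefix("Animal_Weight_")
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover 'Treatment' axis label
)
out = out[["ID", "Animal", "Animal_Weight_A", "Animal_Weight_B"]]
```

Since no aggregation is involved, `pivot` raises an error on duplicate (Animal, ID) pairs instead of silently averaging them, which can be a useful sanity check; `pivot_table` remains the safer choice if duplicates are expected.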