Group by doesn't work after being used a third time
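I don't know what I'm doing wrong, but my DataFrame isn't grouped the way I expect.
This is the actual result of my script, but I want the fourth column to be grouped as well (see the images below).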
[Images: "Actual result" and "Wanted result"]
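For some reason it works fine for the first two columns but not for the last one, and I can't see what's wrong. Thanks, everyone, for your help.
Here is the source code: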
import pandas as pd
data = [('AAA1', 'BBB1', 'XXX1', 46.0, 'YYY1'), ('AAA1', 'BBB1', 'XXX2', 1.0, 'YYY3'),
('AAA1', 'BBB1', 'XXX2', 2.0, 'YYY1'), ('AAA1', 'BBB1', 'XXX2', 2.0, 'DDD'),
('AAA1', 'BBB1', 'XXX5', 3.0, 'YYY6'), ('AAA1', 'BBB1', 'XXX6', 3.0, 'YYY1'),
('AAA1', 'BBB1', 'XXX4', 10.0, 'YYY1'), ('AAA1', 'BBB1', 'XXX3', 24.0, 'YYY1'),
('AAA1', 'BBB1', 'XXX5', 4.0, 'YYY89'), ('AAA1', 'BBB1', 'XXX2', 8.0, 'YYY6'),
('AAA1', 'BBB1', 'XXX5', 26.0, 'YYY1'), ('AAA1', 'BBB2', 'XXX2', 1.0, 'DDD'),
('AAA1', 'BBB2', 'XXX5', 1.5, 'YYY3'), ('AAA1', 'BBB2', 'XXX5', 12.0, 'YYY6'),
('AAA10', 'BBB42', 'XXX2', 1.0, 'YYY3'), ('AAA10', 'BBB42', 'XXX2', 2.0, 'YYY89'),
('AAA10', 'BBB42', 'XXX5', 7.0, 'YYY3'), ('AAA12', 'BBB20', 'XXX5', 3.5, 'YYY3'),
('AAA12', 'BBB52', 'XXX4', 8.0, 'YYY3'), ('AAA13', 'BBB21', 'XXX4', 3.0, 'YYY3'),
('AAA13', 'BBB23', 'XXX4', 3.0, 'YYY3'), ('AAA13', 'BBB23', 'XXX5', 1.0, 'YYY6'),
('AAA13', 'BBB23', 'XXX5', 2.0, 'YYY3'), ('AAA13', 'BBB24', 'XXX2', 6.5, 'YYY11'),
('AAA13', 'BBB24', 'XXX2', 7.0, 'YYY10'), ('AAA13', 'BBB24', 'XXX5', 2.0, 'YYY3'),
('AAA13', 'BBB24', 'XXX5', 5.0, 'YYY3'), ('AAA13', 'BBB65', 'XXX5', 8.0, 'YYY19'),
('AAA14', 'BBB26', 'XXX2', 14.0, 'YYY3'), ('AAA14', 'BBB26', 'XXX6', 4.0, 'YYY3'),
('AAA14', 'BBB77', 'XXX2', 3.0, 'YYY19'), ('AAA14', 'BBB77', 'XXX2', 19.5, 'YYY3'),
('AAA15', 'BBB30', 'XXX4', 1.0, 'YYY3'), ('AAA15', 'BBB30', 'XXX3', 8.0, 'YYY3'),
('AAA15', 'BBB30', 'XXX5', 1.0, 'YYY3'), ('AAA16', 'BBB33', 'XXX6', 0.5, 'YYY3'),
('AAA17', 'BBB57', 'XXX4', 4.0, 'YYY3'), ('AAA18', 'BBB36', 'XXX4', 1.0, 'YYY3'),
('AAA2', 'BBB61', 'XXX2', 1.0, 'YYY3'), ('AAA2', 'BBB61', 'XXX4', 4.0, 'YYY3'),
('AAA32', 'BBB76', 'XXX4', 1.0, 'YYY3'), ('AAA32', 'BBB76', 'XXX3', 16.0, 'YYY3'),
('AAA6', 'BBB15', 'XXX3', 8.0, 'YYY6'), ('AAA7', 'BBB10', 'XXX6', 51.0, 'YYY3'),
('AAA7', 'BBB12', 'XXX5', 8.0, 'YYY3'), ('AAA29', 'BBB38', 'XXX4', 12.0, 'YYY3'),
('AAA18', 'BBB40', 'XXX1', 16.0, 'YYY3')]
df = pd.DataFrame(data,
                  columns=["first", "second", "third", "number", "fourth"])
# transform("sum") broadcasts each group's total back onto every row of that group
df["first_numbers"] = df.groupby("first")["number"].transform("sum")
df["second_numbers"] = df.groupby(["first", "second"])["number"].transform("sum")
df["third_numbers"] = df.groupby(["first", "second", "fourth"])["number"].transform("sum")
df.set_index(["first", "first_numbers", "second",
              "second_numbers", "fourth", "third_numbers", "third"], inplace=True)
# ExcelWriter.save() was removed in pandas 2.0; the context manager closes (and saves) the file
with pd.ExcelWriter("new.xlsx", engine="xlsxwriter") as xlsx_writer:
    df.to_excel(excel_writer=xlsx_writer,
                sheet_name="name",
                index_label=["First", "First Number", "Second", "Second Number", "Fourth", "Fourth Number", "Third", "Third Number"],
                startrow=0)
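Why do only the first levels look grouped? A merged cell can only span rows that sit next to each other. The minimal sketch below uses a hypothetical helper, is_contiguous (not part of pandas), on a five-row subset of the data above to check whether every group of a given key combination occupies consecutive rows:

```python
import pandas as pd

# First rows of the question's data, kept in their original order
data = [('AAA1', 'BBB1', 'XXX1', 46.0, 'YYY1'), ('AAA1', 'BBB1', 'XXX2', 1.0, 'YYY3'),
        ('AAA1', 'BBB1', 'XXX2', 2.0, 'YYY1'), ('AAA1', 'BBB1', 'XXX2', 2.0, 'DDD'),
        ('AAA1', 'BBB2', 'XXX2', 1.0, 'DDD')]
df = pd.DataFrame(data, columns=["first", "second", "third", "number", "fourth"])


def is_contiguous(frame, keys):
    """Hypothetical helper: True if every group of `keys` occupies consecutive rows."""
    # Row positions 0..n-1 in the frame's current order
    positions = pd.Series(range(len(frame)), index=frame.index)
    spans = positions.groupby([frame[k] for k in keys]).agg(["min", "max", "count"])
    # A group is contiguous when its first-to-last span has no gaps
    return bool((spans["max"] - spans["min"] + 1 == spans["count"]).all())


print(is_contiguous(df, ["first"]))                      # True
print(is_contiguous(df, ["first", "second"]))            # True
print(is_contiguous(df, ["first", "second", "fourth"]))  # False: the YYY1 rows are split by YYY3
```

The input happens to be ordered by first and second, but not by fourth, so only the first levels have runs that can be merged.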
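Your code works fine, but you need to sort the index for the output to look grouped: only identical labels on consecutive rows are displayed as "merged". Your data is already (almost entirely) ordered by first and second, which is why those levels look right, but within each (first, second) block the fourth values alternate (e.g. YYY1, YYY3, YYY1, DDD, ...), so there are no consecutive runs to merge: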
df = (df.set_index(["first", "first_numbers", "second",
"second_numbers", "fourth", "third_numbers", "third"])
.sort_index()
)
# then export
# ...
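The same fix applied to a six-row subset of the question's data (a sketch; the subset only keeps the output short) shows that after sort_index() the equal fourth-level labels end up adjacent:

```python
import pandas as pd

data = [('AAA1', 'BBB1', 'XXX1', 46.0, 'YYY1'), ('AAA1', 'BBB1', 'XXX2', 1.0, 'YYY3'),
        ('AAA1', 'BBB1', 'XXX2', 2.0, 'YYY1'), ('AAA1', 'BBB1', 'XXX2', 2.0, 'DDD'),
        ('AAA1', 'BBB1', 'XXX5', 3.0, 'YYY6'), ('AAA1', 'BBB1', 'XXX2', 8.0, 'YYY6')]
df = pd.DataFrame(data, columns=["first", "second", "third", "number", "fourth"])
df["first_numbers"] = df.groupby("first")["number"].transform("sum")
df["second_numbers"] = df.groupby(["first", "second"])["number"].transform("sum")
df["third_numbers"] = df.groupby(["first", "second", "fourth"])["number"].transform("sum")

levels = ["first", "first_numbers", "second", "second_numbers",
          "fourth", "third_numbers", "third"]
# Sorting the MultiIndex puts equal labels on consecutive rows at every level
out = df.set_index(levels).sort_index()

print(out.index.get_level_values("fourth").tolist())
# ['DDD', 'YYY1', 'YYY1', 'YYY3', 'YYY6', 'YYY6']
```

Sorting the columns before building the index, e.g. df.sort_values(levels).set_index(levels), should give the same ordering of groups; sort_index() on the MultiIndex is just the more direct way to say it.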
Output:
                                                                     number
first first_numbers second second_numbers fourth third_numbers third
AAA1  143.5         BBB1   129.0          DDD    2.0           XXX2     2.0
                                          YYY1   111.0         XXX1    46.0
                                                               XXX2     2.0
                                                               XXX3    24.0
                                                               XXX4    10.0
                                                               XXX5    26.0
                                                               XXX6     3.0
                                          YYY3   1.0           XXX2     1.0
                                          YYY6   11.0          XXX2     8.0
                                                               XXX5     3.0
...