Creating histograms in pandas
I'm trying to create a histogram from the following grouping,
dfm.groupby(['ID', 'Readings', 'Condition']).size():
578871001 20110603 True 1
20110701 True 1
20110803 True 1
20110901 True 1
20110930 True 1
..
324461897 20130214 False 1
20130318 False 1
20130416 False 1
20130516 False 1
20130617 False 1
532674350 20110616 False 1
20110718 False 1
20110818 False 1
20110916 False 1
20111017 False 1
20111115 False 1
20111219 False 1
However, I'm trying to format the output by Condition and count the number of IDs for each number of Readings. Like this,
True
# of Readings: # of ID
1 : 5
2 : 8
3 : 15
4 : 10
5 : 4
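I tried grouping by only ID and Readings and then transforming by Condition, but that didn't work well.
Edit:
This is what the dataframe looks like before the groupby: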
CustID Condtion Month Reading Consumption
0 108000601 True June 20110606 28320.0
1 108007000 True July 20110705 13760.0
2 108007000 True August 20110804 16240.0
3 108008000 True September 20110901 12560.0
4 108008000 True October 20111004 12400.0
5 108000601 False November 20111101 9440.0
6 108090000 False December 20111205 12160.0
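Is this what you are trying to achieve with groupby? I've included Counter to keep track of how many customers have each number of readings. For example, for Condtion = False, there are two CustIDs with one reading each, so the first row of the output is: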
Condtion
False 1 2 # One reading, two observations of one reading.
Then, for Condtion = True, there is one customer with one reading (108000601) and two customers with two readings each. The output for this group is:
Condtion
True 1 1 # One customer with one reading.
2 2 # Two customers with two readings each.
from collections import Counter
gb = df.groupby(['Condtion', 'CustID'], as_index=False).Reading.count()
>>> gb
Condtion CustID Reading
0 False 108000601 1
1 False 108090000 1
2 True 108000601 1
3 True 108007000 2
4 True 108008000 2
>>> gb.groupby('Condtion').Reading.apply(lambda group: Counter(group))
Condtion
False 1 2
True 1 1
2 2
dtype: float64
Or, chained together as a single statement:
gb = (df
.groupby(['Condtion', 'CustID'], as_index=False)['Reading']
.count()
.groupby('Condtion')['Reading']
.apply(lambda group: Counter(group))
)
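For reference, the same two-step count can also be sketched without Counter, using only pandas groupby calls. This is a minimal sketch, assuming the sample rows shown in the question; the names readings_per_customer, customers_per_count and n_readings are made up for illustration and do not appear in the original post.

```python
import pandas as pd

# Sample rows copied from the question (the column name "Condtion" is kept as spelled there).
df = pd.DataFrame({
    "CustID": [108000601, 108007000, 108007000, 108008000,
               108008000, 108000601, 108090000],
    "Condtion": [True, True, True, True, True, False, False],
    "Month": ["June", "July", "August", "September",
              "October", "November", "December"],
    "Reading": [20110606, 20110705, 20110804, 20110901,
                20111004, 20111101, 20111205],
    "Consumption": [28320.0, 13760.0, 16240.0, 12560.0,
                    12400.0, 9440.0, 12160.0],
})

# Step 1: number of readings for each (Condtion, CustID) pair.
readings_per_customer = df.groupby(["Condtion", "CustID"])["Reading"].count()

# Step 2: for each Condtion, number of customers having each reading count.
# "n_readings" is a hypothetical column name introduced for this sketch.
customers_per_count = (
    readings_per_customer
    .rename("n_readings")
    .reset_index()
    .groupby(["Condtion", "n_readings"])
    .size()
)

print(customers_per_count)
```

If matplotlib is installed, something like customers_per_count.unstack("Condtion").plot.bar() should draw the counts as side-by-side bars per condition, which is probably close to the histogram the question asks for.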