Counting unique values throws dimension error
I have a pandas DataFrame, new_res, with more than 6 million rows. My objective is to count all unique rows.
start_hex_id_res8 start_hex_id_res9 end_hex_id_res9 end_hex_id_res9 date is_weekday is_holiday starthour
0 882a100d23fffff 892a100d23bffff 892a100d237ffff 892a100d237ffff 2020-07-01 True False 0
1 882a100d23fffff 892a100d23bffff 892a100d237ffff 892a100d237ffff 2020-07-01 True False 0
2 882a1072c7fffff 892a1072c6bffff 892a1072187ffff 892a1072187ffff 2020-07-01 True False 0
3 882a1072c7fffff 892a1072c6bffff 892a1072187ffff 892a1072187ffff 2020-07-01 True False 0
4 882a100d09fffff 892a100d097ffff 892a100d09bffff 892a100d09bffff 2020-07-01 True False 0
start_hex_id_res8 object
start_hex_id_res9 object
end_hex_id_res9 object
end_hex_id_res9 object
date object
is_weekday bool
is_holiday bool
starthour int64
I tried:
agg = new_res.groupby(['start_hex_id_res8', 'start_hex_id_res9', 'end_hex_id_res9', 'end_hex_id_res9', 'date','is_weekday', 'is_holiday', 'starthour']).size().groupby(level=0).size()
but this raises the error:
ValueError: Grouper for 'end_hex_id_res9' not 1-dimensional
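The error is easier to read with a tiny reproduction. In the dtypes listing above, end_hex_id_res9 appears twice, so the frame has two columns with that label. Selecting a duplicated label returns a two-column DataFrame instead of a Series, and groupby requires every key to resolve to a single 1-D column. The sketch below uses a made-up three-row frame, with a duplicated label "b" standing in for end_hex_id_res9; it is an illustration, not the asker's data.

```python
import pandas as pd

# Made-up stand-in for new_res: label "b" appears twice,
# just as end_hex_id_res9 does in the question.
df = pd.DataFrame([[1, 2, 2], [1, 2, 2], [3, 4, 4]], columns=["a", "b", "b"])

# A duplicated label selects both columns, so the result is 2-D.
print(type(df["b"]).__name__)  # DataFrame
print(df["b"].shape)           # (3, 2)

# groupby needs each key to be a single 1-D column, so this fails
# with the same kind of ValueError as in the question.
try:
    df.groupby(["a", "b"]).size()
except ValueError as err:
    print("ValueError:", err)
```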
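How should I interpret this, and what is the correct way in pandas to create a new DataFrame that is a compressed version of new_res? The output should be a DataFrame with the same column names, containing each unique row once, plus a count column at the end holding the number of times that row occurs.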
Let's try:
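g = df.astype(str)  # make every column a str (df here is the question's new_res)
g.columns = range(g.shape[1])  # positional labels, so the duplicated 'end_hex_id_res9' no longer makes a grouper 2-D
g.groupby(list(g.columns)).ngroup().nunique()  # group by all columns; each distinct row gets one group number, so the count of distinct numbers is the number of unique rows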
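To get the compressed frame with a trailing count column, a minimal sketch is to make the column labels unique first and then group by every column. It assumes the second end_hex_id_res9 is a genuinely separate column (the question does not say whether the duplicate is intentional). The helper names dedupe_labels and compress_rows, the ".1" suffix scheme, and the sample frame are all hypothetical, chosen for illustration.

```python
import pandas as pd


def dedupe_labels(columns):
    """Keep the first occurrence of each label; suffix repeats with .1, .2, ..."""
    seen = {}
    out = []
    for col in columns:
        n = seen.get(col, 0)
        out.append(col if n == 0 else f"{col}.{n}")
        seen[col] = n + 1
    return out


def compress_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """One row per distinct row of `frame`, with a trailing 'count' column."""
    out = frame.copy()
    out.columns = dedupe_labels(out.columns)  # groupby needs unique, 1-D keys
    keys = list(out.columns)
    # size() counts the rows in each distinct combination of all columns;
    # dropna=False keeps rows containing NaN instead of dropping them.
    return out.groupby(keys, dropna=False).size().reset_index(name="count")


# Hypothetical sample with a duplicated "end" label, mimicking new_res.
sample = pd.DataFrame(
    [["x", "p", "p", True, 0],
     ["x", "p", "p", True, 0],
     ["y", "q", "q", False, 1]],
    columns=["start", "end", "end", "is_weekday", "starthour"],
)
compressed = compress_rows(sample)
print(len(compressed))  # number of unique rows: 2
print(compressed)
```

On the real data, compress_rows(new_res) would return the requested frame, and its length is the number of unique rows. Note that once the labels are unique, the question's own groupby(...).size() already holds these per-row counts; the extra .groupby(level=0).size() instead counts how many distinct rows share each start_hex_id_res8 value. The copy() is there for clarity and doubles memory on 6 million rows, so renaming new_res.columns in place may be preferable.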