pandas 中的排名和子排名行
Ranking and subranking rows in pandas
我有以下数据集,我想按地区以及商店类型(在每个地区内)进行排名。
python 中的这两列是否有巧妙的编码方式?
数据:
print (df)
Region ID Location store Type ID Brand share
0 1 Warehouse 1.97
1 1 Warehouse 0.24
2 1 Super Centre 0.21
3 1 Warehouse 0.13
4 1 Mini Warehouse 0.10
5 1 Super Centre 0.07
6 1 Mini Warehouse 0.04
7 1 Super Centre 0.02
8 1 Mini Warehouse 0.02
9 10 Warehouse 0.64
10 10 Mini Warehouse 0.18
11 10 Warehouse 0.13
12 10 Warehouse 0.09
13 10 Super Centre 0.07
14 10 Mini Warehouse 0.03
15 10 Mini Warehouse 0.02
16 10 Super Centre 0.02
df['RegionRank'] = df.groupby('Region ID')['Brand share'].cumcount() + 1
cols = ['Location store Type ID', 'Region ID']
df['StoreTypeRank'] = df.groupby(cols)['Brand share'].cumcount() + 1
print (df)
Region ID Location store Type ID Brand share RegionRank StoreTypeRank
0 1 Warehouse 1.97 1 1
1 1 Warehouse 0.24 2 2
2 1 Super Centre 0.21 3 1
3 1 Warehouse 0.13 4 3
4 1 Mini Warehouse 0.10 5 1
5 1 Super Centre 0.07 6 2
6 1 Mini Warehouse 0.04 7 2
7 1 Super Centre 0.02 8 3
8 1 Mini Warehouse 0.02 9 3
9 10 Warehouse 0.64 1 1
10 10 Mini Warehouse 0.18 2 1
11 10 Warehouse 0.13 3 2
12 10 Warehouse 0.09 4 3
13 10 Super Centre 0.07 5 1
14 10 Mini Warehouse 0.03 6 2
15 10 Mini Warehouse 0.02 7 3
16 10 Super Centre 0.02 8 2
或 GroupBy.rank
,但它 return 相同类别的相同值:
df['RegionRank'] = (df.groupby('Region ID')['Brand share']
.rank(method='dense', ascending=False)
.astype(int))
cols = ['Location store Type ID', 'Region ID']
df['StoreTypeRank'] = (df.groupby(cols)['Brand share']
.rank(method='dense', ascending=False)
.astype(int))
print (df)
Region ID Location store Type ID Brand share RegionRank StoreTypeRank
0 1 Warehouse 1.97 1 1
1 1 Warehouse 0.24 2 2
2 1 Super Centre 0.21 3 1
3 1 Warehouse 0.13 4 3
4 1 Mini Warehouse 0.10 5 1
5 1 Super Centre 0.07 6 2
6 1 Mini Warehouse 0.04 7 2
7 1 Super Centre 0.02 8 3
8 1 Mini Warehouse 0.02 8 3
9 10 Warehouse 0.64 1 1
10 10 Mini Warehouse 0.18 2 1
11 10 Warehouse 0.13 3 2
12 10 Warehouse 0.09 4 3
13 10 Super Centre 0.07 5 1
14 10 Mini Warehouse 0.03 6 2
15 10 Mini Warehouse 0.02 7 3 <-same value .02
16 10 Super Centre 0.02 7 2 <-same value .02
我有以下数据集,我想按地区以及商店类型(在每个地区内)进行排名。
python 中的这两列是否有巧妙的编码方式?
数据:
print (df)
Region ID Location store Type ID Brand share
0 1 Warehouse 1.97
1 1 Warehouse 0.24
2 1 Super Centre 0.21
3 1 Warehouse 0.13
4 1 Mini Warehouse 0.10
5 1 Super Centre 0.07
6 1 Mini Warehouse 0.04
7 1 Super Centre 0.02
8 1 Mini Warehouse 0.02
9 10 Warehouse 0.64
10 10 Mini Warehouse 0.18
11 10 Warehouse 0.13
12 10 Warehouse 0.09
13 10 Super Centre 0.07
14 10 Mini Warehouse 0.03
15 10 Mini Warehouse 0.02
16 10 Super Centre 0.02
df['RegionRank'] = df.groupby('Region ID')['Brand share'].cumcount() + 1
cols = ['Location store Type ID', 'Region ID']
df['StoreTypeRank'] = df.groupby(cols)['Brand share'].cumcount() + 1
print (df)
Region ID Location store Type ID Brand share RegionRank StoreTypeRank
0 1 Warehouse 1.97 1 1
1 1 Warehouse 0.24 2 2
2 1 Super Centre 0.21 3 1
3 1 Warehouse 0.13 4 3
4 1 Mini Warehouse 0.10 5 1
5 1 Super Centre 0.07 6 2
6 1 Mini Warehouse 0.04 7 2
7 1 Super Centre 0.02 8 3
8 1 Mini Warehouse 0.02 9 3
9 10 Warehouse 0.64 1 1
10 10 Mini Warehouse 0.18 2 1
11 10 Warehouse 0.13 3 2
12 10 Warehouse 0.09 4 3
13 10 Super Centre 0.07 5 1
14 10 Mini Warehouse 0.03 6 2
15 10 Mini Warehouse 0.02 7 3
16 10 Super Centre 0.02 8 2
或 GroupBy.rank
,但它 return 相同类别的相同值:
df['RegionRank'] = (df.groupby('Region ID')['Brand share']
.rank(method='dense', ascending=False)
.astype(int))
cols = ['Location store Type ID', 'Region ID']
df['StoreTypeRank'] = (df.groupby(cols)['Brand share']
.rank(method='dense', ascending=False)
.astype(int))
print (df)
Region ID Location store Type ID Brand share RegionRank StoreTypeRank
0 1 Warehouse 1.97 1 1
1 1 Warehouse 0.24 2 2
2 1 Super Centre 0.21 3 1
3 1 Warehouse 0.13 4 3
4 1 Mini Warehouse 0.10 5 1
5 1 Super Centre 0.07 6 2
6 1 Mini Warehouse 0.04 7 2
7 1 Super Centre 0.02 8 3
8 1 Mini Warehouse 0.02 8 3
9 10 Warehouse 0.64 1 1
10 10 Mini Warehouse 0.18 2 1
11 10 Warehouse 0.13 3 2
12 10 Warehouse 0.09 4 3
13 10 Super Centre 0.07 5 1
14 10 Mini Warehouse 0.03 6 2
15 10 Mini Warehouse 0.02 7 3 <-same value .02
16 10 Super Centre 0.02 7 2 <-same value .02