Pandas: 如何使用 groupby 选项获取列中每个值的计数
Pandas: How to get the count of each value in a column with groupby option
这是我得到的数据框:
data = {'Year' : [2021, 2021, 2021, 2022, 2022, 2022],
'Class':['A', 'A', 'B', 'A', 'C', 'C'],
'Animal':['dog|cat|bird', 'cat|dog', 'tiger|dog', 'cat|bird', 'dog|cat|rabbit', 'rabbit|dog|tiger',]}
df = pd.DataFrame(data)
所以 df 看起来像:
Year
Class
Animal
2021
A
dog|cat|bird
2021
A
cat|dog
2021
B
tiger|dog
2022
A
cat|bird
2022
C
dog|cat|rabbit
2022
C
rabbit|dog|tiger
我想做的是计算每年每种动物的数量class。例如,我想获取以下数据框:
Year
Class
Animal
Count
2021
A
dog
2
2021
A
cat
2
2021
A
bird
1
2021
B
tiger
1
2021
B
dog
1
2022
A
cat
1
2022
A
bird
1
2022
C
dog
2
2022
C
cat
1
2022
C
rabbit
2
2022
C
tiger
1
有人对实现这一点有什么建议吗?我将不胜感激。
您可以使用 one-liner:
(df.assign(Animal=df['Animal'].str.split('|')) # Create a list using split
.explode('Animal') # Expand that list it rows using explode
.value_counts(sort=False) # Use pd.DataFrame.value_counts
.rename('Count') # Rename series
.reset_index()) # Reset to a dataframe
输出:
Year Class Animal Count
0 2021 A bird 1
1 2021 A cat 2
2 2021 A dog 2
3 2021 B dog 1
4 2021 B tiger 1
5 2022 A bird 1
6 2022 A cat 1
7 2022 C cat 1
8 2022 C dog 2
9 2022 C rabbit 2
10 2022 C tiger 1
让我们试试 str.get_dummies
然后 groupby
out = (df.Animal.str.get_dummies('|')
.groupby([df['Year'],df['Class']]).sum()
.mask(lambda x : x==0)
.rename_axis(['animal'],axis=1).stack().reset_index(name='Count')
Out[666]:
Year Class animal Count
0 2021 A bird 1.0
1 2021 A cat 2.0
2 2021 A dog 2.0
3 2021 B dog 1.0
4 2021 B tiger 1.0
5 2022 A bird 1.0
6 2022 A cat 1.0
7 2022 C cat 1.0
8 2022 C dog 2.0
9 2022 C rabbit 2.0
10 2022 C tiger 1.0
这是我得到的数据框:
data = {'Year' : [2021, 2021, 2021, 2022, 2022, 2022],
'Class':['A', 'A', 'B', 'A', 'C', 'C'],
'Animal':['dog|cat|bird', 'cat|dog', 'tiger|dog', 'cat|bird', 'dog|cat|rabbit', 'rabbit|dog|tiger',]}
df = pd.DataFrame(data)
所以 df 看起来像:
Year | Class | Animal |
---|---|---|
2021 | A | dog|cat|bird |
2021 | A | cat|dog |
2021 | B | tiger|dog |
2022 | A | cat|bird |
2022 | C | dog|cat|rabbit |
2022 | C | rabbit|dog|tiger |
我想做的是计算每年每种动物的数量class。例如,我想获取以下数据框:
Year | Class | Animal | Count |
---|---|---|---|
2021 | A | dog | 2 |
2021 | A | cat | 2 |
2021 | A | bird | 1 |
2021 | B | tiger | 1 |
2021 | B | dog | 1 |
2022 | A | cat | 1 |
2022 | A | bird | 1 |
2022 | C | dog | 2 |
2022 | C | cat | 1 |
2022 | C | rabbit | 2 |
2022 | C | tiger | 1 |
有人对实现这一点有什么建议吗?我将不胜感激。
您可以使用 one-liner:
(df.assign(Animal=df['Animal'].str.split('|')) # Create a list using split
.explode('Animal') # Expand that list it rows using explode
.value_counts(sort=False) # Use pd.DataFrame.value_counts
.rename('Count') # Rename series
.reset_index()) # Reset to a dataframe
输出:
Year Class Animal Count
0 2021 A bird 1
1 2021 A cat 2
2 2021 A dog 2
3 2021 B dog 1
4 2021 B tiger 1
5 2022 A bird 1
6 2022 A cat 1
7 2022 C cat 1
8 2022 C dog 2
9 2022 C rabbit 2
10 2022 C tiger 1
让我们试试 str.get_dummies
然后 groupby
out = (df.Animal.str.get_dummies('|')
.groupby([df['Year'],df['Class']]).sum()
.mask(lambda x : x==0)
.rename_axis(['animal'],axis=1).stack().reset_index(name='Count')
Out[666]:
Year Class animal Count
0 2021 A bird 1.0
1 2021 A cat 2.0
2 2021 A dog 2.0
3 2021 B dog 1.0
4 2021 B tiger 1.0
5 2022 A bird 1.0
6 2022 A cat 1.0
7 2022 C cat 1.0
8 2022 C dog 2.0
9 2022 C rabbit 2.0
10 2022 C tiger 1.0