Python pandas: how to find the number of entries in a sub-index of a DataFrame
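I have a data frame with a two-level index: the level 1 index is STNAME and the level 2 index is CTYNAME.
What is the best way to find the number of entries contained in each level 1 index value?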
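The only solution I know of is to reset the index before doing the groupby. I made a simple reproducible example below; you will need to adapt it to your use case.
It should work, but there may be a better solution. I will take a look.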
# Creating test data
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)),
                  columns=list('ABCD'))
df = df.set_index(['A', 'B'])

# Move the second index level back into a column, group by the first level,
# and count the second-level entries in each group.
# nunique() can be used instead of count() to get the number of unique values.
df.reset_index(level=1).groupby(level=0)['B'].count()
# A
# 2    1
# 3    1
# 4    1
# 5    3
# 7    2
# 8    2
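As a possible variation on the example above, groupby can take the index level directly, so the reset_index step may not be needed when only the row count per first-level value is wanted. This is a minimal sketch on the same test data, not the original author's method; the variable names counts and unique_counts are introduced here for illustration. It also shows how count and nunique can differ when a second-level value repeats.

```python
import numpy as np
import pandas as pd

# Same test data as in the example above
np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)),
                  columns=list('ABCD')).set_index(['A', 'B'])

# Number of rows under each first-level value, without resetting the index
counts = df.groupby(level=0).size()

# Number of distinct second-level values under each first-level value
unique_counts = df.reset_index(level=1).groupby(level=0)['B'].nunique()

print(counts)
print(unique_counts)
```

On this data the two results differ only for A = 5, where the second-level value 0 appears twice.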
Edit
Here is what I think is a better solution, using the handy value_counts method on the index.
df.reset_index(level=1).index.value_counts()
# 5    3
# 8    2
# 7    2
# 4    1
# 3    1
# 2    1
# (the order among tied counts may differ between pandas versions)
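The same counts can probably be obtained without reset_index by reading the first level straight off the MultiIndex with get_level_values. The sketch below reuses the test data from this answer; the name level_counts is an assumption made for the example, and sort_index is added only to make the printed order predictable.

```python
import numpy as np
import pandas as pd

# Same test data as in the example above
np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)),
                  columns=list('ABCD')).set_index(['A', 'B'])

# Pull out the first index level and count how often each value occurs
level_counts = df.index.get_level_values(0).value_counts()

# Sort by index value so the display order does not depend on tie handling
print(level_counts.sort_index())
```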
# Assumes STNAME is currently a column; if it is an index level, call reset_index() first.
census_df = census_df.set_index(['STNAME'])
# The index now holds one STNAME label per row, so each state appears once per county.
census_df.index.value_counts().index[0]
# .index is the STNAME label of every row, with repeats
# .value_counts() returns a Series with the number of occurrences of each label, sorted from highest to lowest
# .index[0] is the STNAME with the most occurrences, i.e. the largest number of counties
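For the original two-level layout (STNAME, CTYNAME), a rough sketch of the same idea is shown below. The rows are a made-up toy stand-in for census_df, used only so the snippet runs on its own; the VALUE column and the names per_state and top_state are hypothetical. It uses idxmax instead of .index[0], which does not rely on the sort order of value_counts.

```python
import pandas as pd

# Toy stand-in for census_df: invented rows, for illustration only
census_df = pd.DataFrame({
    'STNAME': ['State A', 'State A', 'State B', 'State B', 'State B', 'State C'],
    'CTYNAME': ['County 1', 'County 2', 'County 3', 'County 4', 'County 5', 'County 6'],
    'VALUE': [10, 20, 30, 40, 50, 60],  # hypothetical data column
}).set_index(['STNAME', 'CTYNAME'])

# Count rows per STNAME directly from the first index level
per_state = census_df.index.get_level_values('STNAME').value_counts()

# Label of the state with the most entries
top_state = per_state.idxmax()

print(per_state.sort_index())
print(top_state)
```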