Group and compare multiple DataFrame columns with conditions in Python

Question

I am trying to print the most populous state in each region.

Code example:

# all unique regions
region_unique = data['Region'].unique()

# highest population
max_pop = data['population'].max()

How can I chain the lines of code above and bring in the 'States' column to get my result?

Dataset:

Answer 1

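Considering you didn't mention any libraries...

You can first create a helper dict that maps each region to a list of states. Each state is a tuple, (state, pop), holding the name and the population count: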

# region -> list of (state, pop) tuples
regions = {}
for state, pop, region in zip(data['States'], data['population'], data['Region']):
    regions.setdefault(region, []).append((state, pop))

Then, for each region, you can pull out the most populous state:

for region, states in regions.items():
    # each item is a (state, pop) tuple, so compare on index 1
    print(region, max(states, key=lambda state: state[1]))
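
To see the two steps above working end to end, here is a minimal sketch. The sample figures are made up for illustration, and a plain dict of lists stands in for `data`; with a pandas DataFrame the same `zip` over the three columns should behave the same way.

```python
# Hypothetical sample data; a dict of lists stands in for the DataFrame columns.
data = {
    'States': ['A', 'B', 'C', 'D'],
    'population': [120, 80, 300, 50],
    'Region': ['North', 'North', 'South', 'South'],
}

# Map each region to a list of (state, population) tuples.
regions = {}
for state, pop, region in zip(data['States'], data['population'], data['Region']):
    regions.setdefault(region, []).append((state, pop))

# Pick the tuple with the largest population in each region.
most_populous = {
    region: max(states, key=lambda state: state[1])
    for region, states in regions.items()
}

for region, (state, pop) in most_populous.items():
    print(region, state, pop)
```

Note that `max` returns only the first tuple it meets when two states in a region tie on population.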

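For the states with a population below 100 in each region, you can do:

for region, states in regions.items():
    # keep only the (state, pop) tuples whose population is below 100
    print(region, list(filter(lambda state: state[1] < 100, states)))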
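
If the cutoff needs to vary, the filtering step could be wrapped in a small helper. This is only a sketch: `states_below` is a hypothetical name, the sample `regions` dict is invented, and a list comprehension is used here in place of `filter`.

```python
def states_below(regions, threshold):
    """Return, per region, the (state, pop) tuples with pop below threshold."""
    return {
        region: [(state, pop) for state, pop in states if pop < threshold]
        for region, states in regions.items()
    }

# Hypothetical sample in the same shape as the helper dict built earlier.
regions = {
    'North': [('A', 120), ('B', 80)],
    'South': [('C', 300), ('D', 50)],
}

result = states_below(regions, 100)
for region, states in result.items():
    print(region, states)
```

Regions with no state under the threshold still appear in the result, mapped to an empty list.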
