Looping through multiple columns in a table in Python

Question

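I'm trying to loop through a table containing COVID-19 data. My table has 4 columns: month, day, location, and cases. The values of each column are stored in their own list, so all the lists have the same length (i.e. there is a month list, a day list, a location list, and a cases list). There are 12 months, with up to 31 days per month. Cases are recorded for many locations around the world. I want to find out which day of the year had the highest total number of cases worldwide. I'm not sure how to structure my loop properly. An oversimplified example of the table, as represented by the lists, is shown below.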

In this small example, the result would be month 1, day 3, with 709 cases (257 + 452).

Month	Day	Location	Cases
1	1	CAN	124
1	1	USA	563
1	2	CAN	242
1	2	USA	156
1	3	CAN	257
1	3	USA	452
.	.	...	...
12	31	...	...

Answer 1

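You can first find the maximum value in the cases list, then use the index of that maximum to look up the corresponding values in the other three lists. For example: caseList = [1,2,3,52,1,0]

The maximum is 52 and its index is 3. In your case you would then read monthList[3], dayList[3], and locationList[3]. That gives you the month, day, and country of the largest single case count.

Let me know if this helps in your case.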
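A minimal sketch of the index-lookup idea above. Only caseList comes from the answer; monthList, dayList, and locationList are hypothetical parallel lists made up for illustration. Note that this finds the single largest row, which is not necessarily the day with the largest worldwide total.

```python
# caseList is from the answer; the other three lists are hypothetical
caseList = [1, 2, 3, 52, 1, 0]
monthList = [1, 1, 1, 1, 1, 1]
dayList = [1, 1, 2, 2, 3, 3]
locationList = ['CAN', 'USA', 'CAN', 'USA', 'CAN', 'USA']

# Position of the largest single entry (first occurrence on ties)
i = caseList.index(max(caseList))

# The same position indexes the matching row in every parallel list
print(monthList[i], dayList[i], locationList[i], caseList[i])
```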

Answer 2

I'm assuming you have put all your data into a single DataFrame, df.

import pandas

df = pandas.DataFrame()
df['Month'] = name_of_your_month_list
df['Day'] = name_of_your_daylist
df['Location'] = name_of_your_location_list
df['Cases'] = name_of_your_cases_list

df.Cases.max() gives you the largest number of cases. I'm assuming the dataset covers only one year. So df[df.Cases==df.Cases.max()].index gives you the index you're looking for.

For the day, just filter:

df[df.Cases==df.Cases.max()].Day

For the month:

df[df.Cases==df.Cases.max()].Month

For the number of cases:

df[df.Cases==df.Cases.max()].Cases

For the country:

df[df.Cases==df.Cases.max()].Location
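If you only need one matching row, a shorter sketch uses Series.idxmax with .loc. The DataFrame below is built from the question's sample rows. One difference from the boolean filter: idxmax returns only the first row on ties, while the filter returns every tied row. With this data the largest single row is USA on 1/1 with 563, not the 709 daily total.

```python
import pandas as pd

# Sample rows from the question's table
df = pd.DataFrame({
    'Month': [1, 1, 1, 1, 1, 1],
    'Day': [1, 1, 2, 2, 3, 3],
    'Location': ['CAN', 'USA', 'CAN', 'USA', 'CAN', 'USA'],
    'Cases': [124, 563, 242, 156, 257, 452],
})

# idxmax gives the index label of the first row holding the maximum
row = df.loc[df['Cases'].idxmax()]
print(row['Month'], row['Day'], row['Location'], row['Cases'])
```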

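Having read the comments, it's not clear whether you're looking for the largest count at a single location or the largest count for a whole day. If it's per day, you first have to group the rows with groupby and aggregate them, e.g. df.groupby(['Month', 'Day']).Cases.sum() for each day's worldwide total; groupby('Day').max() would only give the largest single entry per day number.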
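A minimal sketch of the per-day variant, assuming the same column names and the question's sample rows: sum the cases for each (Month, Day) pair, then use idxmax on the resulting Series, whose index is a MultiIndex, so the returned label is a (month, day) tuple.

```python
import pandas as pd

df = pd.DataFrame({
    'Month': [1, 1, 1, 1, 1, 1],
    'Day': [1, 1, 2, 2, 3, 3],
    'Location': ['CAN', 'USA', 'CAN', 'USA', 'CAN', 'USA'],
    'Cases': [124, 563, 242, 156, 257, 452],
})

# Worldwide total per calendar day (all locations summed)
daily = df.groupby(['Month', 'Day'])['Cases'].sum()

# Label of the day with the largest total, as a (month, day) tuple
month, day = daily.idxmax()
total = daily.max()
print(f'{month}/{day}: {total}')
```

Grouping on both Month and Day keeps, for example, 1/3 and 2/3 separate, which grouping on Day alone would not.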

Answer 3

You can use this approach to get the result you want.

# Parallel lists: one entry per row of the table
daylist, monthlist, location, Cases = [1, 2, 3, 4], [1, 1, 1, 1], ['CAN', 'USA', 'CAN', 'USA'], [124, 563, 242, 999]
# Position of the largest single case count
maxCases = Cases.index(max(Cases))
print("Max Case:", Cases[maxCases])
print("Location:", location[maxCases])
print("Month:", monthlist[maxCases])
print("Day:", daylist[maxCases])

Answer 4

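Group the DataFrame by month and day. Then iterate over the groups and find the one whose case total across all locations is largest, like this: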

import pandas as pd
df = pd.DataFrame({'Month':[1,1,1,1,1,1], 'Day':[1,1,2,2,3,3],
                   'Location':['CAN', 'USA', 'CAN', 'USA','CAN', 'USA'],
                   'Cases':[124,563,242,156,257,452]})

grouped = df.groupby(['Month', 'Day'])
max_sum = 0
max_day = None
for idx, group in grouped:
    if group['Cases'].sum() > max_sum:
        max_sum = group['Cases'].sum()
        max_day = group

month = max_day['Month'].iloc[0]
day = max_day['Day'].iloc[0]
print(f'Maximum cases of {max_sum} occurred on {month}/{day}.')

# prints: Maximum cases of 709 occurred on 1/3.

If you don't want to use pandas, you can do this:

months = [1,1,1,1,1,1]
days = [1,1,2,2,3,3]
locations = ['CAN', 'USA', 'CAN', 'USA','CAN', 'USA']
cases = [124,563,242,156,257,452]
dic = {}
target_day = None  # (month, day) of the run currently being summed
count = 0

# Assumes rows are sorted by date, so each day's rows are contiguous
for i in range(len(days)):
    if (months[i], days[i]) != target_day:
        target_day = (months[i], days[i])
        count = cases[i]
    else:
        count += cases[i]
    dic[f'{months[i]}/{days[i]}'] = count

max_cases = max(dic.values())
worst_day = max(dic, key=dic.get)

print(f'Maximum cases of {max_cases} occurred on {worst_day}.')

#Prints: Maximum cases of 709 occurred on 1/3.
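The loop above relies on rows for the same day being adjacent. A sketch that drops that assumption accumulates totals in a dictionary keyed by a (month, day) tuple; collections.defaultdict and the tuple keys are a swap-in, not part of the answer. The locations list isn't needed for the daily total, so it's omitted here.

```python
from collections import defaultdict

months = [1, 1, 1, 1, 1, 1]
days = [1, 1, 2, 2, 3, 3]
cases = [124, 563, 242, 156, 257, 452]

# Sum cases per (month, day) across all locations; row order doesn't matter
totals = defaultdict(int)
for month, day, n in zip(months, days, cases):
    totals[(month, day)] += n

# Key with the largest total (the first one inserted wins on ties)
worst_month, worst_day = max(totals, key=totals.get)
max_cases = totals[(worst_month, worst_day)]
print(f'Maximum cases of {max_cases} occurred on {worst_month}/{worst_day}.')
```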
