Calculate hourly turnout with pandas and Matplotlib
I am trying to find how many times a specific value, "PAY", occurs in a column within each hourly range.
I built the DataFrame with pandas:
import pandas as pd
df = pd.read_json('test.json')
print(df.head(3))
print(df.dtypes)
  TransactionCode                           Date
1             PAY  2021-12-09T10:23:29.242+01:00
2             PAY  2021-12-09T10:23:02.978+01:00
3             PAY  2021-12-09T10:22:48.659+01:00
TransactionCode    object
Date               object
After splitting the Date column into separate "Date" and "Time" columns:
df['Time'] = pd.to_datetime(df['Date']).dt.time
df['Date'] = pd.to_datetime(df['Date']).dt.date
print(df.head())
print(df.dtypes)
  TransactionCode        Date             Time
1             PAY  2021-12-09  10:23:29.242000
2             PAY  2021-12-09  10:23:02.978000
3             PAY  2021-12-09  10:22:48.659000
4             PAY  2021-12-09  11:32:48.659000
5             PAY  2021-12-09  11:45:12.659000
TransactionCode    object
Date               object
Time               object
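I would like to go through the "Time" column to find out how many times the "PAY" value occurs in each hour.
I need this to build an hourly turnout chart with Matplotlib.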
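You can convert the original Date column (the full timestamp strings, before the split) to datetime, make it the index with set_index, resample to hours, and finally plot: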
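df2 = (
    df.assign(Date=pd.to_datetime(df['Date']))        # parse the timestamp strings
      .set_index('Date')                              # resample needs a DatetimeIndex
      .loc[lambda d: d['TransactionCode'].eq('PAY')]  # keep only the PAY rows
      .resample('1H').count()                         # rows per hour; pandas 2.2+ prefers '1h'
)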
Output:
                           TransactionCode
Date
2021-12-09 10:00:00+01:00                3
2021-12-09 11:00:00+01:00                2
Plot:
df2.plot.bar()
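The resample approach above can be tried without test.json. The following is a minimal sketch, assuming a few hypothetical sample rows shaped like the question's data, with one ERR row added so the filter has something to exclude. The helper name hourly_counts is illustrative and not part of the answer. It passes pd.Timedelta(hours=1) as the resample rule, which should behave like the '1H' string while avoiding the alias spelling that changed between pandas versions.

```python
import pandas as pd

# Hypothetical sample rows mirroring the question's data (not read from test.json).
df = pd.DataFrame({
    "TransactionCode": ["PAY", "PAY", "PAY", "ERR", "PAY"],
    "Date": [
        "2021-12-09T10:23:29.242+01:00",
        "2021-12-09T10:23:02.978+01:00",
        "2021-12-09T10:22:48.659+01:00",
        "2021-12-09T11:32:48.659+01:00",
        "2021-12-09T11:45:12.659+01:00",
    ],
})


def hourly_counts(frame, code="PAY"):
    """Count rows with the given TransactionCode in each one-hour bin."""
    return (
        frame.assign(Date=pd.to_datetime(frame["Date"]))  # strings -> tz-aware datetimes
        .loc[lambda d: d["TransactionCode"].eq(code)]     # keep only the requested code
        .set_index("Date")
        .sort_index()                                      # chronological order
        .resample(pd.Timedelta(hours=1))["TransactionCode"]
        .count()
    )


counts = hourly_counts(df)
print(counts)
```

Because the result is a Series indexed by the start of each hour, calling counts.plot.bar() on it should produce the same kind of bar chart as df2.plot.bar().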
Consider the following DataFrame:
  TransactionCode        Date             Time
1             PAY  2021-12-09  10:23:29.242000
2             PAY  2021-12-09  10:23:02.978000
3             PAY  2021-12-09  10:22:48.659000
4             ERR  2021-12-09  11:32:48.659000
5             PAY  2021-12-09  11:45:12.659000
You can do it like this:
# Take the hour from each datetime.time value, then count the PAY rows per hour
df['Hour'] = [t.hour for t in df['Time']]
pay = df.loc[df['TransactionCode'] == 'PAY', ['TransactionCode', 'Hour']]
pay.groupby('Hour').count().plot(kind='bar')
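The same hour-of-day count can also be written without a Python loop. This is a rough sketch, assuming the timestamps are still available as full strings in a Date column (the sample rows below are hypothetical). It uses the .dt.hour accessor and value_counts, then reindexes over all 24 hours so that hours with no PAY rows show up as zero in the chart.

```python
import pandas as pd

# Hypothetical sample rows mirroring the question's data (not read from test.json).
df = pd.DataFrame({
    "TransactionCode": ["PAY", "PAY", "PAY", "ERR", "PAY"],
    "Date": [
        "2021-12-09T10:23:29.242+01:00",
        "2021-12-09T10:23:02.978+01:00",
        "2021-12-09T10:22:48.659+01:00",
        "2021-12-09T11:32:48.659+01:00",
        "2021-12-09T11:45:12.659+01:00",
    ],
})

timestamps = pd.to_datetime(df["Date"])   # tz-aware datetimes
is_pay = df["TransactionCode"].eq("PAY")  # boolean mask for the PAY rows

per_hour = (
    timestamps[is_pay].dt.hour            # local hour of day, 0 to 23
    .value_counts()                       # number of PAY rows per hour
    .reindex(range(24), fill_value=0)     # include empty hours as 0, in order
)
print(per_hour[per_hour > 0])
```

Grouping by hour of day pools every date into the same 24 bins, whereas resample keeps each calendar hour separate. For data spanning several days, which one fits depends on whether the chart should show a daily profile or a timeline.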