Pandas: Finding average values of a DataFrame by hour and month
Suppose I have a df:
timestamp value1 value2
01-01-2010 00:00:00 10 5
30-01-2019 00:00:00 5 1
01-02-2015 12:00:00 1 0
25-02-2007 05:00:00 10 10
01-02-2015 05:00:00 10 1
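I want to plot a time series of the averages of columns 'value1' and 'value2', grouped by the hour and month of the timestamps in the dataset. The desired df (and the data behind the plot) might look like this: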
hour-month value1 value2
00-01 7.5 3
05-02 10 5.5
12-02 1 0
I'm new to Python; please advise.
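First convert the column to datetimes with to_datetime, then build hour-month strings (format '%H-%m', i.e. hour then month number) with Series.dt.strftime, group by those strings and aggregate with mean, and finally plot with DataFrame.plot: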
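import pandas as pd
# convert the column to datetimes (the strings are day-first)
df['timestamp'] = pd.to_datetime(df['timestamp'], dayfirst=True)
# group by the hour-month string and average only the value columns
df1 = df.groupby(df['timestamp'].dt.strftime('%H-%m'))[['value1', 'value2']].mean()
print(df1)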
value1 value2
timestamp
00-01 7.5 3.0
05-02 10.0 5.5
12-02 1.0 0.0
df1.plot()
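For anyone who wants to run the steps above end to end, here is a minimal, self-contained sketch. The sample frame is rebuilt from the question's data; the explicit column selection `[['value1', 'value2']]` and the `hour-month` index name are my own additions, so treat them as assumptions rather than part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "timestamp": ["01-01-2010 00:00:00", "30-01-2019 00:00:00",
                  "01-02-2015 12:00:00", "25-02-2007 05:00:00",
                  "01-02-2015 05:00:00"],
    "value1": [10, 5, 1, 10, 10],
    "value2": [5, 1, 0, 10, 1],
})

# Parse the day-first strings into datetimes
df["timestamp"] = pd.to_datetime(df["timestamp"], dayfirst=True)

# Build an "hour-month" key and average the value columns per key
key = df["timestamp"].dt.strftime("%H-%m").rename("hour-month")
df1 = df.groupby(key)[["value1", "value2"]].mean()
print(df1)
```

Calling `df1.plot()` on the result should then draw one line per value column (this needs matplotlib installed).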
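Edit: to plot on a real datetime axis (for example with plotly), map every timestamp to a fixed year and day so that only the month and hour differ, then group by the result: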
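df['timestamp'] = pd.to_datetime(df['timestamp'], dayfirst=True)
# set every timestamp to year 2020, day 1, keeping month and hour, and average only the value columns
df1 = df.groupby(df['timestamp'].map(lambda x: x.replace(year=2020, day=1)))[['value1', 'value2']].mean()
print(df1)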
value1 value2
timestamp
2020-01-01 00:00:00 7.5 3.0
2020-02-01 05:00:00 10.0 5.5
2020-02-01 12:00:00 1.0 0.0
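To see what the `replace(year=2020, day=1)` call in the grouping key does, here is a small sketch on a single timestamp taken from the question's data. The year 2020 is just a placeholder; fixing the day to 1 keeps the result a valid date for every month.

```python
import pandas as pd

# One of the timestamps from the question
ts = pd.Timestamp("2007-02-25 05:00:00")

# Overwrite year and day; month and time of day are kept
normalized = ts.replace(year=2020, day=1)
print(normalized)  # 2020-02-01 05:00:00
```

Rows that share a month and hour therefore collapse onto the same datetime, which is what lets `groupby` average them together while keeping a datetime index.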
# reshape from wide to long: one row per (timestamp, column) pair
df2 = df1.rename_axis('col', axis=1).stack().reset_index(name='vals')
print(df2)
timestamp col vals
0 2020-01-01 00:00:00 value1 7.5
1 2020-01-01 00:00:00 value2 3.0
2 2020-02-01 05:00:00 value1 10.0
3 2020-02-01 05:00:00 value2 5.5
4 2020-02-01 12:00:00 value1 1.0
5 2020-02-01 12:00:00 value2 0.0
import plotly.express as px
#https://plotly.com/python/line-charts/
fig = px.line(df2, x="timestamp", y="vals", color='col')
#https://plotly.com/python/time-series/
fig.update_xaxes(
    dtick="timestamp",
    tickformat="%H\n%m")
fig.show()
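The `rename_axis(...).stack().reset_index(...)` chain above reshapes the averaged frame from wide to long format. An alternative sketch, assuming the same column names, uses `DataFrame.melt`. The input frame here is rebuilt by hand from the printed output above, and the resulting rows are ordered by column rather than by timestamp, which should not matter when the lines are split with `color='col'`.

```python
import pandas as pd

# Averaged frame, values copied from the printed output above
df1 = pd.DataFrame(
    {"value1": [7.5, 10.0, 1.0], "value2": [3.0, 5.5, 0.0]},
    index=pd.DatetimeIndex(
        ["2020-01-01 00:00:00", "2020-02-01 05:00:00", "2020-02-01 12:00:00"],
        name="timestamp",
    ),
)

# Wide -> long: one row per (timestamp, column) pair
df2 = df1.reset_index().melt(id_vars="timestamp", var_name="col", value_name="vals")
print(df2)
```

The resulting `df2` has the same three columns (`timestamp`, `col`, `vals`) as the stacked version, so it can be passed to `px.line` in the same way.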