Plot multiple variables on same plot, and panel plot by station ID
I have 2 time-series dataframes that come from two 2-D arrays. Their structure is as follows.
Generating the example dataframes:
import pandas as pd
import numpy as np
date_range = pd.period_range('1981-01-01','1981-01-04',freq='D')
x = np.arange(8).reshape((4,2))
y = np.arange(8).reshape((4,2))
x = pd.DataFrame(x, index = date_range, columns = ['station1','station2'])
y = pd.DataFrame(y, index = date_range, columns = ['station1','station2'])
print(x)
            station1  station2
1981-01-01         0         1
1981-01-02         2         3
1981-01-03         4         5
1981-01-04         6         7
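Goal
I want to produce a multi-panel figure in which the values of 'x' and 'y' are drawn as lines on the same plot, with x and y distinguished by color, and with one 'row' of plots per station. With the example code above, each individual plot would show a different station column.
What I tried
I tried the seaborn route. First I concatenated the two dataframes; since each df represents one variable, I passed them as keys so the variables are named after concatenation. Then I used melt to reshape them for multi-plotting: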
df = pd.concat([x , y], keys = ['Var1', 'Var2'])
meltdf = df.melt(var_name = 'Station', value_name = 'Value', ignore_index = False)
print(meltdf)
                  Station  Value
Var1 1981-01-01  station1      0
     1981-01-02  station1      2
     1981-01-03  station1      4
     1981-01-04  station1      6
Var2 1981-01-01  station1      0
     1981-01-02  station1      2
     1981-01-03  station1      4
     1981-01-04  station1      6
Var1 1981-01-01  station2      1
     1981-01-02  station2      3
     1981-01-03  station2      5
     1981-01-04  station2      7
Var2 1981-01-01  station2      1
     1981-01-02  station2      3
     1981-01-03  station2      5
     1981-01-04  station2      7
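I want to plot the values of Var1 and Var2 as lines on the same plot for station1, do the same for station2, and so on. I want to keep the dates as the index, because these should be time-series plots with 'date' along the x-axis. I tried this non-working code (for example):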
import seaborn as sns
sns.relplot(data=df, x = 'Var1', y = 'Var2', kind = 'line', hue = 'keys', row = 'Station')
Should I 'double melt' the dfs so that the variable type becomes its own column? The concat + keys step seems wrong.
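One way to get the variable name into its own column without a second melt is sketched below; this is an illustration, not necessarily how the answer below does it. In the attempt above, `pd.concat(..., keys=[...])` stores 'Var1'/'Var2' in the outer index level, so `relplot` finds no column named `'Var1'` or `'keys'`. Naming the index levels with `rename_axis` and calling `reset_index` turns both levels into ordinary columns, after which a single `melt` is enough. The column names `variable` and `date` are my own (hypothetical) choices, and the seaborn call is only shown as a comment.

```python
import numpy as np
import pandas as pd

# Rebuild the question's example data (daily PeriodIndex, two station columns)
date_range = pd.period_range("1981-01-01", "1981-01-04", freq="D")
x = pd.DataFrame(np.arange(8).reshape((4, 2)), index=date_range,
                 columns=["station1", "station2"])
y = pd.DataFrame(np.arange(8).reshape((4, 2)), index=date_range,
                 columns=["station1", "station2"])

# concat with keys puts 'Var1'/'Var2' in the outer index level, not in a column
df = pd.concat([x, y], keys=["Var1", "Var2"])

# Name both index levels, then move them into regular columns
wide = df.rename_axis(["variable", "date"]).reset_index()

# Turn Period values into Timestamps so plotting libraries see real datetimes
wide["date"] = wide["date"].dt.to_timestamp()

# A single melt: 'variable' and 'date' stay as identifier columns
long = wide.melt(id_vars=["variable", "date"], var_name="Station", value_name="Value")
print(long.head())

# The long frame could then be plotted, e.g.:
# sns.relplot(data=long, x="date", y="Value", hue="variable", row="Station", kind="line")
```

With every grouping variable stored as a column, `hue` and `row` can refer to them by name, which is the long format that seaborn's figure-level functions expect.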
You are on the right track with pd.concat and pd.melt, followed by seaborn relplot. I would approach it like this:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
#data generation
import numpy as np
np.random.seed(123)
date_range = pd.period_range('1981-01-01','1981-01-04',freq='D')
x = np.random.randint(1, 10, (4,2))
y = np.random.randint(1, 10, (4,2))
x = pd.DataFrame(x, index = date_range, columns = ['station1','station2'])
y = pd.DataFrame(y, index = date_range + pd.to_timedelta(1, unit="D"), columns = ['station1','station2'])
#keep information where each data point comes from
x["key"], y["key"] = "x", "y"
#combining dataframes and reshaping
df = pd.concat([x, y]).melt(["key"], var_name="station", value_name="station_value", ignore_index = False)
#plotting - the datetime conversion might not be necessary
#depending on the datetime format of your original dataframes
#best approach is conversion to datetime index when creating the dataframes
fg = sns.relplot(data=df, x = pd.to_datetime(df.index.to_timestamp()), y = "station_value", kind = "line", hue = "key", row = "station")
#shouldn't be necessary but this example had too many ticks for the interval
from matplotlib.dates import DateFormatter, DayLocator
fg.axes[0,0].xaxis.set_major_locator(DayLocator(interval=1))
fg.axes[0,0].xaxis.set_major_formatter(DateFormatter("%y-%m-%d"))
plt.show()
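If you would rather not depend on seaborn, the same layout can be built with plain matplotlib. Note that this swaps in a different technique from the answer above: it skips the concat/melt step entirely and draws one subplot per station column by looping over the wide frames. It reuses the example data from the answer and assumes the PeriodIndex is converted with `to_timestamp()` first.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same example data as in the answer above
np.random.seed(123)
date_range = pd.period_range("1981-01-01", "1981-01-04", freq="D")
x = pd.DataFrame(np.random.randint(1, 10, (4, 2)), index=date_range,
                 columns=["station1", "station2"])
y = pd.DataFrame(np.random.randint(1, 10, (4, 2)),
                 index=date_range + pd.to_timedelta(1, unit="D"),
                 columns=["station1", "station2"])

# Convert the PeriodIndex to a DatetimeIndex so matplotlib gets real dates
x.index = x.index.to_timestamp()
y.index = y.index.to_timestamp()

stations = list(x.columns)
# One row of axes per station; sharex keeps the date axes aligned
fig, axes = plt.subplots(nrows=len(stations), ncols=1, sharex=True, squeeze=False)
for ax, station in zip(axes[:, 0], stations):
    ax.plot(x.index, x[station], label="x")  # first variable
    ax.plot(y.index, y[station], label="y")  # second variable
    ax.set_title(station)
    ax.set_ylabel("station_value")
axes[0, 0].legend()
fig.autofmt_xdate()  # rotate date labels so they do not overlap
# plt.show()  # uncomment in an interactive session
```

The loop is simple for a handful of stations; the seaborn version scales better once you also want faceting by several variables.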
Sample output:
If your pandas version cannot handle duplicate index entries, we can rewrite it as:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
#data generation
import numpy as np
np.random.seed(123)
date_range = pd.period_range('1981-01-01','1981-01-04',freq='D')
x = np.random.randint(1, 10, (4,2))
y = np.random.randint(1, 10, (4,2))
x = pd.DataFrame(x, index = date_range, columns = ['station1','station2'])
y = pd.DataFrame(y, index = date_range + pd.to_timedelta(1, unit="D"), columns = ['station1','station2'])
#keep information where each data point comes from
x["key"], y["key"] = "x", "y"
#moving index into a column
x = x.reset_index()
y = y.reset_index()
#and changing it to datetime values that seaborn can understand
#only necessary because your example contains pd.Period data
x["index"] = pd.to_datetime(x["index"].astype(str))
y["index"] = pd.to_datetime(y["index"].astype(str))
#combining dataframes and reshaping
df = pd.concat([x, y]).melt(["index", "key"], var_name="station", value_name="station_value")
#plotting
fg = sns.relplot(data=df, x = "index", y = "station_value", kind = "line", hue = "key", row = "station")
#shouldn't be necessary but this example had too many ticks for the interval
from matplotlib.dates import DateFormatter, DayLocator
fg.axes[0,0].xaxis.set_major_locator(DayLocator(interval=1))
fg.axes[0,0].xaxis.set_major_formatter(DateFormatter("%y-%m-%d"))
plt.show()
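A possible variation on the second version, sketched under the assumption that the date column keeps its Period dtype after `reset_index` (it does in recent pandas): the duplicate index entries come from dates that exist in both x and y, and the Period-to-datetime step can use `Series.dt.to_timestamp()` instead of round-tripping through strings. Naming the index `date` via `rename_axis` is my own choice, not part of the answer.

```python
import numpy as np
import pandas as pd

date_range = pd.period_range("1981-01-01", "1981-01-04", freq="D")
x = pd.DataFrame(np.arange(8).reshape((4, 2)), index=date_range,
                 columns=["station1", "station2"])
y = pd.DataFrame(np.arange(8).reshape((4, 2)),
                 index=date_range + pd.to_timedelta(1, unit="D"),
                 columns=["station1", "station2"])
x["key"], y["key"] = "x", "y"

combined = pd.concat([x, y])
# Dates present in both x and y now appear twice in the index
print(combined.index.is_unique)  # False

# Name the index so reset_index yields a 'date' column, then convert Periods directly
flat = combined.rename_axis("date").reset_index()
flat["date"] = flat["date"].dt.to_timestamp()

# melt without ignore_index=False returns a fresh, unique RangeIndex
df = flat.melt(["date", "key"], var_name="station", value_name="station_value")
print(df.index.is_unique)  # True
```

The resulting `df` has the same columns as in the answer, with `date` in place of `index`, so it could be plotted with `x="date"` in the same `relplot` call.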