Timeseries plot with min/max shading using Seaborn

Question

I'm trying to create a 3-line timeseries plot of Week x Overload from the following data, where each Cluster is a different line.

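I have multiple observations for each (Cluster, Week) pair (5 each at the moment, eventually 1000). I would like each point on a line to be the mean Overload value for that (Cluster, Week) pair, and the band to span its min/max values.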

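Currently I'm plotting it with the following code, but I don't get any lines, because I don't know what to specify as the unit with the current dataframe: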

    ax14 = sns.tsplot(data = long_total_cluster_capacity_overload_df, value = "Overload", time = "Week", condition = "Cluster")

GIST Data

I feel like I still need to reshape my dataframe, but I don't know how. I'm looking for an end result that looks like this:
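As a starting point for that reshape, here is a minimal pandas sketch. It assumes long-format data with columns Cluster, Week and Overload, as in the code above; the gist's actual values are not reproduced here, so the sample numbers are hypothetical. Grouping by (Cluster, Week), aggregating, and unstacking Cluster gives one Week x Cluster table per statistic: the mean table supplies the lines, and the min and max tables supply the band edges.

```python
import pandas as pd

# Hypothetical long-format sample: 2 clusters x 2 weeks x 3 observations each
df = pd.DataFrame({
    "Cluster": [1] * 6 + [2] * 6,
    "Week": [1, 1, 1, 2, 2, 2] * 2,
    "Overload": [3, 5, 4, 6, 8, 7, 1, 2, 3, 4, 4, 4],
})

# One row per (Cluster, Week) pair with the min, mean and max Overload
stats = df.groupby(["Cluster", "Week"])["Overload"].agg(["min", "mean", "max"])

# Move Cluster into the columns: index = Week, columns = (statistic, Cluster)
wide = stats.unstack("Cluster")

print(wide["mean"])  # one column (future line) per cluster
print(wide["min"])   # lower band edge per cluster
print(wide["max"])   # upper band edge per cluster
```

Each of wide["mean"], wide["min"] and wide["max"] is a plain DataFrame indexed by Week with one column per Cluster, which is the shape pandas' DataFrame.plot and matplotlib's fill_between work with most easily.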

Answer 1

I really thought I could do this with seaborn.tsplot, but it doesn't look quite right. Here is the result I get with seaborn:

    import pandas as pd
    import seaborn as sns
    cluster_overload = pd.read_csv("TSplot.csv", sep=r"\s+")
    cluster_overload['Unit'] = cluster_overload.groupby(['Cluster','Week']).cumcount()
    # ci=100 spans the full bootstrap distribution of the mean, not the raw min/max
    ax = sns.tsplot(time='Week', value="Overload", condition="Cluster", ci=100, unit="Unit", data=cluster_overload)

Output:

I'm really confused about why the unit parameter is necessary, since my understanding is that all the data is aggregated by (time, condition). The Seaborn Documentation defines unit as:

Field in the data DataFrame identifying the sampling unit (e.g. subject, neuron, etc.). The error representation will collapse over units at each time/condition observation. This has no role when data is an array.

I'm not sure what 'collapsed over' means, especially since, as I understand it, that definition wouldn't make unit a required variable.
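One way to read that definition (an interpretation, not something the docs spell out for the long-form case): the tsplot docstring describes array input as having dimensions (unit, time, condition), so each condition's data is treated as a unit x time grid, and "collapsing over units" means reducing down each time column. With a long-form DataFrame, unit is what tells tsplot which row of that grid an observation belongs to, and a grid needs every (unit, time) pair to be unique. That would explain why numbering the repeated observations with cumcount() is enough. The pandas-only sketch below, on hypothetical data for a single cluster, shows that reshape; Answer 2's FacetGrid helper performs the same pivot before calling tsplot.

```python
import pandas as pd

# Hypothetical long-format data for one cluster: 3 observations per week
df = pd.DataFrame({
    "Week": [1, 1, 1, 2, 2, 2],
    "Overload": [3, 5, 4, 6, 8, 7],
})

# If every row shared the same unit, (Unit, Week) would repeat and the pivot fails
try:
    df.assign(Unit=0).pivot(index="Unit", columns="Week", values="Overload")
    duplicate_error = False
except ValueError:
    duplicate_error = True

# Number the repeated observations inside each Week: 0, 1, 2, ...
df["Unit"] = df.groupby("Week").cumcount()

# Unit x time grid: rows are sampling units, columns are time points
arr = df.pivot(index="Unit", columns="Week", values="Overload")

# "Collapsing over units" = reducing down each column (axis=0)
center = arr.mean(axis=0)
lower, upper = arr.min(axis=0), arr.max(axis=0)
print(arr)
print(center.tolist(), lower.tolist(), upper.tolist())
```

Note that the unit numbers carry no meaning here: observation 0 in week 1 and observation 0 in week 2 are not the same subject, which is fine for a min/max or mean band but would matter for err_style="unit_traces".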

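Anyway, here is the output if you want exactly what you described; it isn't as pretty. I'm not sure how to shade those regions manually, but please share if you figure it out.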

    import matplotlib.pyplot as plt
    import pandas as pd

    cluster_overload = pd.read_csv("TSplot.csv", sep=r"\s+")
    # min/mean/max of Overload per (Cluster, Week); rows: Week, columns: (statistic, Cluster)
    stats = (cluster_overload.groupby(['Cluster', 'Week'])['Overload']
             .agg(['min', 'mean', 'max']).unstack('Cluster'))

    colors = ['b', 'g', 'r']
    ax = stats['mean'].plot(color=colors, alpha=0.8, linewidth=3)
    stats['max'].plot(ax=ax, color=colors, legend=False, alpha=0.3)
    stats['min'].plot(ax=ax, color=colors, legend=False, alpha=0.3)
    plt.show()

Output:
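One way to shade those regions manually, sketched here with plain matplotlib rather than seaborn: plot each cluster's mean and call Axes.fill_between between its min and max in the same color. The sample DataFrame is hypothetical (same column names as the question); with the real file, the stats table computed from TSplot.csv can be used in its place.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format sample using the question's column names
df = pd.DataFrame({
    "Cluster": [1] * 6 + [2] * 6,
    "Week": [1, 1, 1, 2, 2, 2] * 2,
    "Overload": [3, 5, 4, 6, 8, 7, 1, 2, 3, 4, 4, 4],
})

# Rows: Week; columns: (statistic, Cluster)
stats = (df.groupby(["Cluster", "Week"])["Overload"]
         .agg(["min", "mean", "max"])
         .unstack("Cluster"))

fig, ax = plt.subplots()
for cluster, color in zip(stats["mean"].columns, ["b", "g", "r"]):
    # Mean line for this cluster
    ax.plot(stats.index, stats["mean"][cluster], color=color,
            linewidth=3, label=f"Cluster {cluster}")
    # Semi-transparent band from the min to the max observation
    ax.fill_between(stats.index, stats["min"][cluster], stats["max"][cluster],
                    color=color, alpha=0.3)
ax.set_xlabel("Week")
ax.set_ylabel("Overload")
ax.legend()
```

With an interactive backend, drop the matplotlib.use line and finish with plt.show(). Because the band edges come straight from the raw min and max, they always touch the most extreme observations, unlike a bootstrap interval.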

Answer 2

Based on this incredible answer, I was able to write a monkey patch that does exactly what you're looking for.

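    import numpy as np
    import pandas as pd
    import seaborn as sns
    import seaborn.timeseries  # only in older seaborn releases that still ship tsplot

    def _plot_range_band(*args, central_data=None, ci=None, data=None, **kwargs):
        # data is the (unit x time) array for one condition: use its
        # per-time min and max as the band instead of a bootstrap CI
        upper = data.max(axis=0)
        lower = data.min(axis=0)
        ci = np.asarray((lower, upper))
        kwargs.update({"central_data": central_data, "ci": ci, "data": data})
        seaborn.timeseries._plot_ci_band(*args, **kwargs)

    # Replace the built-in "range_band" error style with the min/max version
    seaborn.timeseries._plot_range_band = _plot_range_band

    cluster_overload = pd.read_csv("TSplot.csv", sep=r"\s+")
    # Give each repeated observation of a (Cluster, Week) pair its own unit number
    cluster_overload['Unit'] = cluster_overload.groupby(['Cluster','Week']).cumcount()

    # n_boot=0 requests no bootstrap resamples, since the min/max band does not use them
    ax = sns.tsplot(time='Week', value="Overload", condition="Cluster", unit="Unit", data=cluster_overload,
                    err_style="range_band", n_boot=0)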

Output plot:

Note that the shaded areas line up with the true maxima and minima in the line plot!
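What the patch changes, as far as I can tell from the code: tsplot hands each error-style function the condition's data as a unit x time array, and the patched _plot_range_band replaces the confidence interval with that array's per-column minimum and maximum, stacked as a (2, n_times) array, before delegating to _plot_ci_band for the shading. The NumPy sketch below reproduces just that computation on a hypothetical array, which shows why the band edges coincide with the most extreme observations at each week.

```python
import numpy as np

# Hypothetical unit x time array for one cluster: 3 units (rows), 4 weeks (columns)
data = np.array([
    [3.0, 6.0, 2.0, 9.0],
    [5.0, 8.0, 4.0, 7.0],
    [4.0, 7.0, 3.0, 8.0],
])

central = data.mean(axis=0)      # the line: mean over units at each week
lower = data.min(axis=0)         # band bottom: smallest observation per week
upper = data.max(axis=0)         # band top: largest observation per week
ci = np.asarray((lower, upper))  # same (2, n_weeks) layout the patch builds

print(central, ci, sep="\n")
```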

If you figure out why the unit variable is required, please let me know.

If you don't want them all on the same chart, then:

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import seaborn.timeseries  # only in older seaborn releases that still ship tsplot


    def _plot_range_band(*args, central_data=None, ci=None, data=None, **kwargs):
        # Use the per-time min and max across units as the band
        upper = data.max(axis=0)
        lower = data.min(axis=0)
        ci = np.asarray((lower, upper))
        kwargs.update({"central_data": central_data, "ci": ci, "data": data})
        seaborn.timeseries._plot_ci_band(*args, **kwargs)

    seaborn.timeseries._plot_range_band = _plot_range_band

    cluster_overload = pd.read_csv("TSplot.csv", sep=r"\s+")
    cluster_overload['subindex'] = cluster_overload.groupby(['Cluster','Week']).cumcount()

    def customPlot(*args, **kwargs):
        # data holds the rows of one Cluster; reshape them to a (unit x time) array
        df = kwargs.pop('data')
        pivoted = df.pivot(index='subindex', columns='Week', values='Overload')
        # Pass the Week values as time so the x axis shows weeks rather than 0..n-1
        sns.tsplot(pivoted.values, time=pivoted.columns.values, err_style="range_band",
                   n_boot=0, color=kwargs['color'])

    g = sns.FacetGrid(cluster_overload, row="Cluster", sharey=False, hue='Cluster', aspect=3)
    g = g.map_dataframe(customPlot, 'Week', 'Overload', 'subindex')

This produces the following (you can obviously play with the aspect ratio if you think the proportions are off):

Answer 3

In the end I used the old plot design (subplots), which seems more readable to me.

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    df = pd.read_csv('TSplot.csv', sep='\t', index_col=0)
    # Compute the min, mean and max (could also be other statistics)
    grouped = df.groupby(["Cluster", "Week"]).agg({'Overload': ['min', 'mean', 'max']}).unstack("Cluster")

    # Plot with subplots since it is more readable
    means = grouped.loc[:, ('Overload', 'mean')]
    axes = means.plot(subplots=True)

    # Get the color palette in use (the current matplotlib color cycle)
    palette = sns.color_palette()

    # Shade each cluster's min..max range in its own subplot, in the matching color
    for i, (ax, cluster) in enumerate(zip(axes, means.columns)):
        ax.fill_between(grouped.index,
                        grouped.loc[:, ('Overload', 'min', cluster)],
                        grouped.loc[:, ('Overload', 'max', cluster)],
                        alpha=.2, color=palette[i])
    plt.show()
