How to show error bands for pure matrices [Samples, X_Range] with Seaborn?

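I have some data stored as [samples, x_values_toconsider] that I pass to seaborn. I have tried many combinations to get it to plot error bands, but it does not: it plots each of my samples as its own curve over the x values. So I end up with as many curves as samples, which is not what I want. I also tried arranging the data as a data frame, but that did not help either.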

Why is seaborn plotting each individual sample as its own curve? I want it to aggregate them with the usual confidence interval.

Self-contained reproducible code:

#%%
"""
https://seaborn.pydata.org/tutorial/relational.html#relational-tutorial

https://seaborn.pydata.org/examples/errorband_lineplots.html
https://www.youtube.com/watch?v=G3F0EZcW9Ew
https://github.com/knathanieltucker/seaborn-weird-parts/commit/3e571fd8e211ea04b6c9577fd548e7e532507acf
https://github.com/knathanieltucker/seaborn-weird-parts/blob/3e571fd8e211ea04b6c9577fd548e7e532507acf/tsplot.ipynb
"""
from collections import OrderedDict

import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
from pandas import DataFrame
import pandas as pd

print(sns)

np.random.seed(22)
sns.set(color_codes=True)

# the number of x values to consider in a given range, e.g. 10 x values sampled in the [0, 1] interval
num_x: int = 10
# the number of repetitions for each x value, e.g. multiple measurements at each x from x=0.0 up to x=1.0
rep_per_x: int = 5
total_size_data_set: int = num_x * rep_per_x
print(f'{total_size_data_set=}')
# - create fake data set
# only consider 10 features from 0 to 1
x = np.linspace(start=0.0, stop=1.0, num=num_x)
# to introduce fake variation add uniform noise to each feature and pretend each one is a new observation for that feature
noise_uniform: np.ndarray = np.random.rand(rep_per_x, num_x)
# same as above, but the noise is shared across all x values within one repetition (that is what the 1 means)
noise_normal: np.ndarray = np.random.randn(rep_per_x, 1)
# signal function
sin_signal: np.ndarray = np.sin(x)
# [rep_per_x, num_x]
data: np.ndarray = sin_signal + noise_uniform + noise_normal

# data_od: OrderedDict = OrderedDict()
# for idx_x in range(num_x):
#     # [rep_per_x, 1]
#     samples_for_x: np.ndarray = data[:, idx_x]
#     data_od[str(x[idx_x])] = samples_for_x
#
# data_df = pd.DataFrame(data_od)
# data = data_df

print(data)
ax = sns.lineplot(data=data)
# ax = sns.lineplot(data=data, err_style='band')
# ax = sns.lineplot(data=data, err_style='bars')
# ax = sns.lineplot(data=data, ci='sd', err_style='band')
# ax = sns.lineplot(data=data, ci='sd', err_style='bars')
 
# ax = sns.relplot(data=data)

plt.show()

#%%
"""
https://seaborn.pydata.org/examples/errorband_lineplots.html
"""

# import numpy as np
# import seaborn as sns
# from matplotlib import pyplot as plt
# from pandas import DataFrame
#
# fmri: DataFrame = sns.load_dataset("fmri")
# print(fmri)
# sns.lineplot(x="timepoint", y="signal",  hue="region", style="event", data=fmri)
# plt.show()

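This produces the wrong plot.

But I want something like this (additional lines with their own error bands would be even better, since I have many matrices!):

Like this:


Note that I do not want to precompute the standard deviations and draw the bands myself, so most of the existing questions and answers do not work for me.

What confuses me is that it works when I pass it the fmri data, but not when I pass it a matrix of observations for each x value.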



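Seaborn's lineplot with error bands needs a list of x values with corresponding y values. To get an error band, the same x value must appear multiple times with different y values.

In your setup, only the y values are provided. Seaborn interprets this as 10 lists of y values corresponding to default x values (0,1,2,3,4,5,6,7,8,9), as the tick labels show.

To get the desired plot, the y values should be flattened into one long 1D array, and the corresponding x values should be repeated (using np.tile):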

ax = sns.lineplot(x=np.tile(x, rep_per_x), y=data.ravel())
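To see why np.tile and ravel line up, here is a minimal sketch on a tiny made-up matrix (the values and the `means` dictionary are illustrative only, not taken from the question's data). Each row of the matrix is one repetition, so flattening row by row pairs every y with the x of its column:

```python
import numpy as np

# Hypothetical toy data: 3 x values, 2 repetitions per x value.
x = np.array([0.0, 0.5, 1.0])        # [num_x]
data = np.array([[1, 2, 3],
                 [4, 5, 6]])         # [rep_per_x, num_x]
rep_per_x = data.shape[0]

# Repeat the whole x vector once per row, and flatten the matrix row by row.
x_long = np.tile(x, rep_per_x)       # x repeated: 0.0, 0.5, 1.0, 0.0, 0.5, 1.0
y_long = data.ravel()                # rows concatenated: 1, 2, 3, 4, 5, 6

# Every x value now occurs rep_per_x times, each time with a different y.
# Averaging the y values per x gives the per-column mean of the matrix,
# which is what lineplot draws as the central line with its default estimator.
means = {xv: y_long[x_long == xv].mean() for xv in x}
```

With repeated x values like these, lineplot has several observations per x to aggregate, which is what it needs to draw a band instead of one curve per sample.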

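Note that the tutorial examples that give no explicit x use data in the form of a pandas data frame or a single data frame column. In that case, the data frame's index is used for the x values.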

For example:

flights = sns.load_dataset("flights")
flights_wide = flights.pivot(index="year", columns="month", values="passengers")

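This creates a data frame with "year" as the index and one column per "month" (with "passengers" as the values). sns.lineplot(data=flights_wide) then draws a line plot with a separate curve for each column and "year" as the x values. Compare this with sns.lineplot(data=flights, x="year", y="passengers"), which uses the "long form" of the data frame and ignores the individual months, aggregating over them.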
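The difference between the two layouts can be sketched without downloading the flights data. The small frame below is made up for illustration (its numbers are not from the real dataset); it only shows how pivot produces the wide form and how melt gets back to the long form that lineplot can aggregate:

```python
import pandas as pd

# Hypothetical toy data in long form: one row per (year, month) observation.
long_df = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [10, 12, 14, 18],
})

# Wide form: "year" becomes the index and each month becomes its own column.
# Passing this to lineplot(data=...) would give one curve per column.
wide_df = long_df.pivot(index="year", columns="month", values="passengers")

# Back to long form: every year appears once per month, so an x="year",
# y="passengers" plot has several y values per x to aggregate.
round_trip = wide_df.reset_index().melt(
    id_vars="year", var_name="month", value_name="passengers"
)
```

In the long form each "year" shows up twice here, which is the repeated-x structure that error bands require.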

To handle two or more data sets, you can call sns.lineplot twice. Here is an example:

from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

np.random.seed(22)
sns.set(color_codes=True)

num_x: int = 30
rep_per_x1: int = 5
rep_per_x2: int = 20

x = np.linspace(start=0.0, stop=3.0, num=num_x)

noise_uniform1: np.ndarray = np.random.rand(rep_per_x1, num_x) / 10
noise_normal1: np.ndarray = np.random.randn(rep_per_x1, 1) / 10
sin_signal: np.ndarray = np.sin(x)
data1: np.ndarray = sin_signal + noise_uniform1 + noise_normal1

noise_uniform2: np.ndarray = np.random.rand(rep_per_x2, num_x) / 10
noise_normal2: np.ndarray = np.random.randn(rep_per_x2, 1) / 10
cos_signal: np.ndarray = np.cos(x)
data2: np.ndarray = cos_signal + noise_uniform2 + noise_normal2

ax = sns.lineplot(x=np.tile(x, rep_per_x1),
                  y=data1.ravel(),
                  label=1)
sns.lineplot(x=np.tile(x, rep_per_x2),
             y=data2.ravel(),
             label=2, ax=ax)
plt.show()

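Alternatively, you can concatenate all the values. Using a data frame may make it easier to see what is going on, and the column names are then used automatically to label the axes and the legend.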

ax = sns.lineplot(x=np.tile(x, rep_per_x1 + rep_per_x2),
                  y=np.concatenate([data1.ravel(), data2.ravel()]),
                  hue=np.concatenate([np.repeat(1, num_x * rep_per_x1), np.repeat(2, num_x * rep_per_x2)]),
                  palette=['crimson', 'dodgerblue'])
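A data frame version of the concatenation could look roughly like the sketch below. The helper name `matrices_to_long_frame` and the column names "x", "y" and "curve" are my own choices for illustration, not anything seaborn requires; the data is random noise around sin and cos, similar in spirit to the example above:

```python
import numpy as np
import pandas as pd


def matrices_to_long_frame(x, matrices):
    """Stack several [samples, num_x] matrices into one long-form data frame.

    `matrices` maps a curve label to its matrix (hypothetical helper).
    """
    frames = []
    for label, y in matrices.items():
        y = np.asarray(y)
        if y.ndim != 2 or y.shape[1] != len(x):
            raise ValueError(f"matrix {label!r} must have shape [samples, {len(x)}]")
        frames.append(pd.DataFrame({
            "x": np.tile(x, y.shape[0]),  # x repeated once per sample row
            "y": y.ravel(),               # rows flattened to match the tiled x
            "curve": label,               # same label on every row of this matrix
        }))
    return pd.concat(frames, ignore_index=True)


# Made-up data: 5 noisy sin samples and 20 noisy cos samples over 4 x values.
x = np.linspace(0.0, 3.0, num=4)
rng = np.random.default_rng(22)
long_df = matrices_to_long_frame(x, {
    "sin": np.sin(x) + rng.normal(scale=0.1, size=(5, 4)),
    "cos": np.cos(x) + rng.normal(scale=0.1, size=(20, 4)),
})
```

The resulting frame can then be passed as sns.lineplot(data=long_df, x="x", y="y", hue="curve"), which should draw one aggregated line with its own band per label and pick up the column names for the axis labels and legend title.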

To give future users a complete example with reusable code:

from typing import Optional

import numpy as np
from matplotlib import pyplot as plt


def plot_seaborn_curve_with_x_values_y_values(x: np.ndarray, y: np.ndarray,
                                              xlabel: str, ylabel: str,
                                              title: str,
                                              curve_label: Optional[str] = None,
                                              err_style: str = 'band',
                                              marker: Optional[str] = 'x',
                                              dashes: bool = False,
                                              show: bool = False
                                              ):
    """
    Given a list of x values in a range with num_x_values number of x values in that range and the corresponding samples
    for each specific x value (so [samples_per_x] for each value of x giving in total a matrix of size
    [samples_per_x, num_x_values]), plot aggregates of them with error bands.
    Note that the main assumption is that each x value has a number of y values corresponding to it (likely due to noise
    for example).


    Note:
        - if you want string labels at the x axis points, do
        sns.lineplot(x=np.tile([f'Layer{i}' for i in range(1, num_x+1)], rep_per_x),...) assuming the x values are the
        layers.
        - you can call this function multiple times to add different curves to the same plot.
        - it is recommended to call show only once: when you have a single curve, or after the final curve is added.
        - if you want both bands and bars, it might work to call this function twice, once with err_style='band' and
        once with err_style='bars'.

    ref:
        - 

    :param x: [num_x_values]
    :param y: [samples_per_x, num_x_values]
    :param xlabel:
    :param ylabel:
    :param title:
    :param curve_label:
    :param err_style:
    :param marker:
    :param dashes:
    :param show:
    :return:
    """
    import seaborn as sns
    samples_per_x: int = y.shape[0]
    num_x_values: int = x.shape[0]
    assert num_x_values == y.shape[1], 'We are plotting aggregates of multiple y values for each x value, ' \
                                       'thus the number of x values must match the second dimension of y.'

    # - since seaborn expects each x value paired with its y value, flatten the y's and make sure the corresponding
    # x value is aligned with its y value [num_x_values * samples_per_x]
    x: np.ndarray = np.tile(x, samples_per_x)  # np.tile repeats the whole x vector once per sample row
    assert x.shape == (num_x_values * samples_per_x,)
    y: np.ndarray = np.ravel(y)  # flatten the y's row by row so each y lines up with its x
    assert y.shape == (num_x_values * samples_per_x,)
    assert x.shape == y.shape

    # - plot
    ax = sns.lineplot(x=x, y=y, err_style=err_style, label=curve_label, marker=marker, dashes=dashes)
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    if show:
        plt.show()

For example:

def plot_seaborn_curve_with_x_values_y_values_test():
    # the number of x values to consider in a given range, e.g. 10 x values sampled in the [0, 1] interval
    num_x: int = 10
    # the number of repetitions for each x value, e.g. multiple measurements at each x from x=0.0 up to x=1.0
    rep_per_x: int = 5
    total_size_data_set: int = num_x * rep_per_x
    print(f'{total_size_data_set=}')
    # - create fake data set
    # only consider 10 features from 0 to 1
    x = np.linspace(start=0.0, stop=1.0, num=num_x)

    # to introduce fake variation add uniform noise to each feature and pretend each one is a new observation for that feature
    noise_uniform: np.ndarray = np.random.rand(rep_per_x, num_x)
    # same as above but have the noise be the same for each x (thats what the 1 means)
    noise_normal: np.ndarray = np.random.randn(rep_per_x, 1)
    # signal function
    sin_signal: np.ndarray = np.sin(x)
    cos_signal: np.ndarray = np.cos(x)
    # [rep_per_x, num_x]
    y1: np.ndarray = sin_signal + noise_uniform + noise_normal
    y2: np.ndarray = cos_signal + noise_uniform + noise_normal

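    # label each curve so the two lines can be told apart in the legend
    plot_seaborn_curve_with_x_values_y_values(x=x, y=y1, xlabel='x', ylabel='y', title='Sin vs Cos', curve_label='sin')
    plot_seaborn_curve_with_x_values_y_values(x=x, y=y2, xlabel='x', ylabel='y', title='Sin vs Cos', curve_label='cos')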
    plt.show()

Output: