Divide one-to-two x-y data into top and bottom sets

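I have a dataset in which two y values are associated with each x value. How can I divide the data into "upper" and "lower" values?

Below is an example of such a dataset, together with a plot of the desired "top" and "bottom" grouping (red is the top, purple is the bottom). So far, my best idea is an iterative approach that searches for a line dividing the top data from the bottom data. That solution is complicated and does not work well, so I have not included it.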

import matplotlib.pyplot as plt
import numpy as np

# construct data using piecewise functions
x1 = np.linspace(0, 0.7, 70)
x2 = np.linspace(0.7, 1, 30)
x3 = np.linspace(0.01, 0.999, 100)
y1 = 4.164 * x1 ** 3
y2 = 1 / x2
y3 = x3 ** 4 - 0.1

# concatenate data
x = np.concatenate([x1, x2, x3])
y = np.concatenate([y1, y2, y3])

# I want to be able to divide the data into top and bottom sets,
#  as shown in the chart. Black is the unlabeled data;
#  red and purple show the top and bottom sets
plt.scatter(x, y, marker='^', s=10, c='k')
plt.scatter(x1, y1, marker='x', s=0.8, c='r')
plt.scatter(x2, y2, marker='x', s=0.8, c='r')
plt.scatter(x3, y3, marker='x', s=0.8, c='purple')
plt.show()

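You can create a dividing line by reordering the data. Sort everything by x, then apply a Gaussian filter to the sorted y values. Because the top and bottom points interleave after sorting, the filtered curve runs between them, and each of the two sets lies strictly above or strictly below it: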

import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter1d  # scipy.ndimage.filters is deprecated
import numpy as np

# construct data using piecewise functions
x1 = np.linspace(0, 0.7, 70)
x2 = np.linspace(0.7, 1, 30)
x3 = np.linspace(0.01, 0.999, 100)
y1 = 4.164 * x1 ** 3
y2 = 1 / x2
y3 = x3 ** 4 - 0.1

# concatenate data
x = np.concatenate([x1, x2, x3])
y = np.concatenate([y1, y2, y3])

# Goal: divide the data into top and bottom sets, as shown in the chart.
#  Black is the unlabeled data; red and purple show the top and bottom sets

# Sort the points by x so that top and bottom points interleave
idx = np.argsort(x)
newy = y[idx]
newx = x[idx]

# Smooth the interleaved y values (sigma = 5 samples) to get the dividing curve
gf = gaussian_filter1d(newy, 5)

plt.scatter(x, y, marker='^', s=10, c='k')
plt.scatter(x1, y1, marker='x', s=0.8, c='r')
plt.scatter(x2, y2, marker='x', s=0.8, c='r')
plt.scatter(x3, y3, marker='x', s=0.8, c='purple')
plt.scatter(newx, gf, c='orange')  # dividing curve
plt.show()
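
To actually label the points, each sorted y value can be compared against the smoothed curve. The following is a minimal sketch of that step, not part of the code above: the helper name `split_by_smoothed_curve` and its `sigma` default are assumptions chosen for illustration, and the approach relies on the top and bottom points being roughly evenly interleaved along x.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d


def split_by_smoothed_curve(x, y, sigma=5):
    """Hypothetical helper: True where a point lies above the smoothed curve.

    The returned mask is in the same order as the input arrays.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")          # sort by x
    smooth = gaussian_filter1d(y[order], sigma)   # dividing curve
    is_top = np.empty(len(x), dtype=bool)
    is_top[order] = y[order] > smooth             # map back to input order
    return is_top


# Same test data as in the question
x1 = np.linspace(0, 0.7, 70)
x2 = np.linspace(0.7, 1, 30)
x3 = np.linspace(0.01, 0.999, 100)
x = np.concatenate([x1, x2, x3])
y = np.concatenate([4.164 * x1 ** 3, 1 / x2, x3 ** 4 - 0.1])

is_top = split_by_smoothed_curve(x, y)
top_x, top_y = x[is_top], y[is_top]
bottom_x, bottom_y = x[~is_top], y[~is_top]
```

If the two sets are sampled at very different densities, the smoothed curve drifts toward the denser set, so `sigma` would probably need tuning for other data.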

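I would try the following:

  • If necessary, sort the points by increasing X;

  • Maintain the indices of the most recent point in the upper subset and in the lower subset;

  • Move from left to right and, for each new point, assign it to the nearest subset and update the corresponding index.

Initializing the process seems a little tricky. Start with the first two points (they have a good chance of belonging to the same subset). Keep advancing until two points are clearly separated, so that you can be sure they belong to different subsets. Then backtrack to the left.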
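
The sweep described above could be sketched roughly as follows. This is one possible reading, not a definitive implementation, and it makes two assumptions that the description leaves open: "nearest" is measured as the difference in y to the last point assigned to each subset, and the initialization is simplified by seeding at the adjacent pair (in x order) with the largest y gap and then sweeping outward in both directions, which stands in for the "advance, then backtrack" step. The function name `split_by_tracking` is hypothetical.

```python
import numpy as np


def split_by_tracking(x, y):
    """Greedy sketch: True for points assigned to the upper subset."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")   # sort by increasing x
    ys = y[order]
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")

    # Assumed initialization: the adjacent pair with the largest y gap
    # is taken to contain one upper point and one lower point.
    seed = int(np.argmax(np.abs(np.diff(ys))))
    hi, lo = (seed, seed + 1) if ys[seed] > ys[seed + 1] else (seed + 1, seed)

    upper_sorted = np.zeros(n, dtype=bool)
    upper_sorted[hi] = True

    def sweep(indices):
        # Track the last y value seen in each subset, starting from the seed
        last_up, last_lo = ys[hi], ys[lo]
        for i in indices:
            if abs(ys[i] - last_up) <= abs(ys[i] - last_lo):
                upper_sorted[i] = True
                last_up = ys[i]
            else:
                last_lo = ys[i]

    sweep(range(seed + 2, n))        # move right from the seed
    sweep(range(seed - 1, -1, -1))   # then go back to the left

    is_upper = np.empty(n, dtype=bool)
    is_upper[order] = upper_sorted   # map back to input order
    return is_upper


# Same test data as in the question
x1 = np.linspace(0, 0.7, 70)
x2 = np.linspace(0.7, 1, 30)
x3 = np.linspace(0.01, 0.999, 100)
x = np.concatenate([x1, x2, x3])
y = np.concatenate([4.164 * x1 ** 3, 1 / x2, x3 ** 4 - 0.1])

is_upper = split_by_tracking(x, y)
```

A greedy sweep like this only holds up while the step between consecutive points of one subset stays smaller than the gap to the other subset; if the two curves touch or cross, the labels can swap from that point on.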