Divide one-to-two x-y data into top and bottom sets
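I have a data set in which two y values are associated with each x value. How can I divide the data into "upper" and "lower" values?
Below, I show an example of such a data set, along with an image of the desired "top" and "bottom" grouping (red is the top, purple is the bottom). My best idea so far is to find a line that divides the top and bottom data using an iterative approach. That solution is complicated and does not work very well, so I did not include it.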
import matplotlib.pyplot as plt
import numpy as np
# construct data using piecewise functions
x1 = np.linspace(0, 0.7, 70)
x2 = np.linspace(0.7, 1, 30)
x3 = np.linspace(0.01, 0.999, 100)
y1 = 4.164 * x1 ** 3
y2 = 1 / x2
y3 = x3 ** 4 - 0.1
# concatenate data
x = np.concatenate([x1, x2, x3])
y = np.concatenate([y1, y2, y3])
# I want to be able to divide the data into top and bottom sets,
# as shown in the chart. The black markers are the unlabeled data,
# and the red and purple markers show the top and bottom sets.
plt.scatter(x, y, marker='^', s=10, c='k')
plt.scatter(x1, y1, marker='x', s=0.8, c='r')
plt.scatter(x2, y2, marker='x', s=0.8, c='r')
plt.scatter(x3, y3, marker='x', s=0.8, c='purple')
plt.show()
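You can create a dividing line by reordering the data. Sort everything by x, then apply a Gaussian filter. The two sets of data lie strictly above or below the result of the Gaussian filter: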
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter1d
import numpy as np
# construct data using piecewise functions
x1 = np.linspace(0, 0.7, 70)
x2 = np.linspace(0.7, 1, 30)
x3 = np.linspace(0.01, 0.999, 100)
y1 = 4.164 * x1 ** 3
y2 = 1 / x2
y3 = x3 ** 4 - 0.1
# concatenate data
x = np.concatenate([x1, x2, x3])
y = np.concatenate([y1, y2, y3])
# sort the points by x, then smooth the sorted y values;
# the smoothed curve (sigma = 5 samples) runs between the two sets
idx = np.argsort(x)
newy = y[idx]
newx = x[idx]
gf = gaussian_filter1d(newy, 5)
plt.scatter(x, y, marker='^', s=10, c='k')
plt.scatter(x1, y1, marker='x', s=0.8, c='r')
plt.scatter(x2, y2, marker='x', s=0.8, c='r')
plt.scatter(x3, y3, marker='x', s=0.8, c='purple')
plt.scatter(newx, gf, c='orange')
plt.show()
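The plot above only draws the dividing curve. To actually label the points, one option is to compare each sorted y value with the smoothed value at the same position. The following is a minimal sketch under that assumption; the names `is_top`, `x_top`, `y_top`, `x_bottom`, and `y_bottom` are introduced here for illustration and are not part of the original answer.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Same synthetic data as in the question.
x1 = np.linspace(0, 0.7, 70)
x2 = np.linspace(0.7, 1, 30)
x3 = np.linspace(0.01, 0.999, 100)
x = np.concatenate([x1, x2, x3])
y = np.concatenate([4.164 * x1 ** 3, 1 / x2, x3 ** 4 - 0.1])

# Sort by x and smooth the interleaved y values (sigma = 5 samples).
idx = np.argsort(x)
gf = gaussian_filter1d(y[idx], 5)

# A point is "top" if it lies above the smoothed curve at its own position.
# Assigning through idx maps the result back to the original, unsorted order.
is_top = np.empty(len(x), dtype=bool)
is_top[idx] = y[idx] > gf

x_top, y_top = x[is_top], y[is_top]
x_bottom, y_bottom = x[~is_top], y[~is_top]
```

This relies on the two branches being sampled at roughly the same density, so that the smoothed curve sits between them. For other data the filter width may need tuning.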
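I would try the following:
Sort the points by increasing x, if necessary;
Maintain the indices of the two subsets, upper and lower;
Move from left to right and, for each new point, assign it to the nearest subset and update the corresponding index.
Initializing the process looks a little tricky. Start with the first two points (they are quite likely to belong to the same subset). Keep going until two points are clearly separated, so that you can be sure they belong to different subsets. Then backtrack to the left.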
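The steps above can be sketched roughly as follows. This is one possible reading rather than a definitive implementation: the function name `split_by_tracking` is hypothetical, and the seed is chosen as the pair of x-neighbours with the largest y gap, which is an assumption standing in for "keep going until two points are clearly separated". From that seed the sweep runs to the right and then back to the left, assigning each point to the branch whose most recent y value is closest.

```python
import numpy as np

def split_by_tracking(x, y):
    """Return a boolean mask (original order) that is True for 'top' points."""
    order = np.argsort(x, kind="stable")
    ys = y[order]
    n = len(ys)
    top_sorted = np.zeros(n, dtype=bool)

    # Seed: the x-neighbours with the largest y gap are assumed to lie
    # on different branches (an assumption, not guaranteed in general).
    seed = int(np.argmax(np.abs(np.diff(ys))))
    hi, lo = (seed, seed + 1) if ys[seed] > ys[seed + 1] else (seed + 1, seed)
    top_sorted[hi] = True

    def sweep(indices):
        # Track the most recent y value seen on each branch.
        last_top, last_bottom = ys[hi], ys[lo]
        for i in indices:
            if abs(ys[i] - last_top) <= abs(ys[i] - last_bottom):
                top_sorted[i] = True
                last_top = ys[i]
            else:
                last_bottom = ys[i]

    sweep(range(seed + 2, n))       # move right from the seed
    sweep(range(seed - 1, -1, -1))  # then backtrack to the left

    # Map the labels back to the original point order.
    is_top = np.empty(n, dtype=bool)
    is_top[order] = top_sorted
    return is_top

# Same synthetic data as in the question.
x1 = np.linspace(0, 0.7, 70)
x2 = np.linspace(0.7, 1, 30)
x3 = np.linspace(0.01, 0.999, 100)
x = np.concatenate([x1, x2, x3])
y = np.concatenate([4.164 * x1 ** 3, 1 / x2, x3 ** 4 - 0.1])

is_top = split_by_tracking(x, y)
```

Nearest-last-value tracking needs the step between successive points on a branch to stay smaller than the gap between the branches. Where the branches come close together or cross, it can swap labels.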