Scikit-learn dataset maker not accepting command-line arguments

Question

I'm working through a Scikit-learn tutorial, and one part of it creates a dataset. This one:

import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets

# Generate the example data points, red and blue.
X, t = sklearn.datasets.make_circles(n_samples=100, shuffle=False, factor=0.3, noise=0.1)
# One-hot encode the labels: column 0 marks red (t == 0), column 1 marks blue (t == 1).
T = np.zeros((100, 2))
T[t == 1, 1] = 1
T[t == 0, 0] = 1

# Separate the data points by color.
x_red = X[t == 0]
x_blue = X[t == 1]
print('shape of X: {}'.format(X.shape))
print('shape of T: {}'.format(T.shape))

# Plot both classes.
plt.plot(x_red[:, 0], x_red[:, 1], 'ro', label='class red')
plt.plot(x_blue[:, 0], x_blue[:, 1], 'bo', label='class blue')
plt.grid()
plt.legend(loc=1)
plt.xlabel('$x_1$', fontsize=15)
plt.ylabel('$x_2$', fontsize=15)
plt.axis([-1.5, 1.5, -1.5, 1.5])
plt.title('red vs. blue classes in the input space')
plt.show()

This works and produces a perfectly normal chart.

However, when I change it to take its inputs from the command line:

    try:
        in1 = (int(float(sys.argv[1])))
        in2 = (int(float(sys.argv[2])))
        in3 = (int(float(sys.argv[3])))
        in4 = (int(float(sys.argv[4])))
    except IndexError:
        print(
        "The program is run as: program.py a b c d \n"
        "a = Random seed\n"
        "b = Number of samples\n"
        "c = Factor\n"
        "d = Noise\n"
        "Example: python JISIDF-[01].py 1 100 0.3 0.1")
        raise SystemExit

    np.random.seed(seed=in1)
    #Generate the example datapoints, red and blue.
    X, t = sklearn.datasets.make_circles(n_samples=in2, shuffle=False, factor=in3, noise=in4)

even with the same inputs:

don@don-DELL:~/Code/Tutorials/Peterrolelant$ python3 PeterNet-17.py 1 100 0.3 0.1

I get an error:

    Traceback (most recent call last):
      File "PeterNet-17.py", line 27, in <module>
        X, t = sklearn.datasets.make_circles(n_samples=in2, shuffle=False, factor=in3, noise=in4)
      File "/usr/local/lib/python3.4/dist-packages/sklearn/datasets/samples_generator.py", line 625, in make_circles
        X += generator.normal(scale=noise, size=X.shape)
      File "mtrand.pyx", line 1902, in mtrand.RandomState.normal (numpy/random/mtrand/mtrand.c:17755)
    ValueError: scale <= 0

Why is this a problem, and how can I fix it?

Answer 1

From http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.seed.html and http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_circles.html, note that seed and n_samples expect an int, while noise and factor expect a double (a Python float).

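The main problem with your current type conversions is that in4 = int(float(sys.argv[4])) evaluates to 0, because int(float('0.1')) is 0, but noise (which is passed on as scale) is expected to be greater than 0. That is why the traceback shows ValueError: scale <= 0. For the same reason, int(float(sys.argv[3])) also evaluates to 0, whereas you presumably want 0.3. The fix in both cases is to drop the int conversion.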
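The truncation can be checked in isolation with a minimal sketch. The names factor_arg and noise_arg are hypothetical stand-ins for sys.argv[3] and sys.argv[4], not part of the original script:

```python
# Hypothetical stand-ins for sys.argv[3] and sys.argv[4].
factor_arg = "0.3"
noise_arg = "0.1"

# int() truncates toward zero, so the fractional part is discarded.
truncated_factor = int(float(factor_arg))
truncated_noise = int(float(noise_arg))

# float() alone keeps the value that was typed on the command line.
factor = float(factor_arg)
noise = float(noise_arg)

print(truncated_factor, truncated_noise)
print(factor, noise)
```

The first print shows both values collapsed to 0, which is what make_circles received as noise in the failing run.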

A minor point: you can write in1 = int(sys.argv[1]) and in2 = int(sys.argv[2]) directly, without first converting the sys.argv[1] and sys.argv[2] strings to floats.

So, in summary, you should do this instead:

in1 = int(sys.argv[1])
in2 = int(sys.argv[2])
in3 = float(sys.argv[3])
in4 = float(sys.argv[4])
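If you would rather have the conversions checked for you, the same idea could be sketched with the standard-library argparse module. This is only one possible arrangement, not what the tutorial does: the helper names positive_float and parse_args are made up for illustration, and the requirement that noise be strictly positive is an assumption taken from the scale <= 0 error in your traceback.

```python
import argparse


def positive_float(text):
    """Convert text to float; reject values <= 0 (assumed constraint on noise)."""
    value = float(text)
    if value <= 0:
        raise argparse.ArgumentTypeError(
            "must be greater than 0, got {}".format(text))
    return value


def parse_args(argv=None):
    """Parse seed, n_samples, factor and noise with the types discussed above."""
    parser = argparse.ArgumentParser(
        description="Generate a make_circles dataset.")
    parser.add_argument("seed", type=int, help="random seed")
    parser.add_argument("n_samples", type=int, help="number of samples")
    parser.add_argument("factor", type=float, help="factor")
    parser.add_argument("noise", type=positive_float, help="noise")
    # With argv=None, argparse reads sys.argv[1:].
    return parser.parse_args(argv)


# Same inputs as in the question, passed explicitly for demonstration.
args = parse_args(["1", "100", "0.3", "0.1"])
print(args.seed, args.n_samples, args.factor, args.noise)
```

In the real script you would call parse_args() with no arguments and pass args.n_samples, args.factor and args.noise to make_circles. A missing or malformed argument then produces a usage message instead of an IndexError or a ValueError deep inside NumPy.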

python

numpy

dataset

command-line-arguments

scikit-learn