knn 的 y 轴不匹配样本
Not matching sample in y axis for knn
我正在尝试使用比基于 iris 数据集的教程更灵活的 knn 输入脚本,但我在将匹配的第二维添加到 #6 中的 numpy 数组时遇到了一些麻烦(我认为)当我来到#11 时。配件。
File "G:\PROGRAMMERING\Anaconda\lib\site-packages\sklearn\utils\validation.py", line 212, in check_consistent_length
" samples: %r" % [int(l) for l in lengths]) ValueError: Found input variables with inconsistent numbers of samples: [150, 1]
x 是 (150,5),y 是 (150,1)。两者的样本数都是 150,但它们的字段数不同,这是问题所在吗?如果是,我该如何解决?
#1. Loading the Pandas libraries as pd
import pandas as pd
import numpy as np
#2. Read data from the file 'custom.csv' placed in your code directory
data = pd.read_csv("custom.csv")
#3. Preview the first 5 lines of the loaded data
print(data.head())
print(type(data))
#4.Test the shape of the data
print(data.shape)
df = pd.DataFrame(data)
print(df)
#5. Convert non-numericals to numericals
print(df.dtypes)
# Any object should be converted to numerical
df['species'] = pd.Categorical(df['species'])
df['species'] = df.species.cat.codes
print("outcome:")
print(df.dtypes)
#6.Convert df to numpy.ndarray
np = df.to_numpy()
print(type(np)) #this should state <class 'numpy.ndarray'>
print(data.shape)
print(np)
x = np.data
y = [df['species']]
print(y)
#K-nearest neighbor (find closest) - searach for the K nearest observations in the dataset
#The model calculates the distance to all, and selects the K nearest ones.
#8. Import the class you plan to use
from sklearn.neighbors import (KNeighborsClassifier)
#9. Pick a value for K
k = 2
#10. Instantiate the "estimator" (make an instance of the model)
knn = KNeighborsClassifier(n_neighbors=k)
print(knn)
#11. fit the model with data/model training
knn.fit(x, y)
#12. Predict the response for a new observation
print(knn.predict([[3, 5, 4, 2]]))```
这就是我如何使用 scikit-learn KNeighborsClassifier
来拟合 knn
模型:
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
df = datasets.load_iris()
X = pd.DataFrame(df.data)
y = df.target
knn = KNeighborsClassifier(n_neighbors = 2)
knn.fit(X,y)
print(knn.predict([[6, 3, 5, 2]]))
#prints output class [2]
print(knn.predict([[3, 5, 4, 2]]))
#prints output class [1]
从DataFrame
不需要转换为numpy array
,可以直接在DataFrame
上拟合模型,同时将DataFrame
转换为numpy array
您已将其命名为 np
,它也用于在顶部导入 numpy
import numpy as np
输入预测输入为4列,剩下第五列'species'没有预测。此外,如果 'species' 是目标,则不能同时将其作为输入提供给 knn。 pop 从 dataFrame df.
中删除了这个特定的列
#npdf = df.to_numpy()
df = df.apply(lambda x:pd.Series(x))
y = np.asarray(df['species'])
#removes the target from the sample
df.pop('species')
x = df.to_numpy()
我正在尝试使用比基于 iris 数据集的教程更灵活的 knn 输入脚本,但我在将匹配的第二维添加到 #6 中的 numpy 数组时遇到了一些麻烦(我认为)当我来到#11 时。配件。
File "G:\PROGRAMMERING\Anaconda\lib\site-packages\sklearn\utils\validation.py", line 212, in check_consistent_length " samples: %r" % [int(l) for l in lengths]) ValueError: Found input variables with inconsistent numbers of samples: [150, 1]
x 是 (150,5),y 是 (150,1)。两者的样本数都是 150,但它们的字段数不同,这是问题所在吗?如果是,我该如何解决?
#1. Loading the Pandas libraries as pd
import pandas as pd
import numpy as np
#2. Read data from the file 'custom.csv' placed in your code directory
data = pd.read_csv("custom.csv")
#3. Preview the first 5 lines of the loaded data
print(data.head())
print(type(data))
#4.Test the shape of the data
print(data.shape)
df = pd.DataFrame(data)
print(df)
#5. Convert non-numericals to numericals
print(df.dtypes)
# Any object should be converted to numerical
df['species'] = pd.Categorical(df['species'])
df['species'] = df.species.cat.codes
print("outcome:")
print(df.dtypes)
#6.Convert df to numpy.ndarray
np = df.to_numpy()
print(type(np)) #this should state <class 'numpy.ndarray'>
print(data.shape)
print(np)
x = np.data
y = [df['species']]
print(y)
#K-nearest neighbor (find closest) - searach for the K nearest observations in the dataset
#The model calculates the distance to all, and selects the K nearest ones.
#8. Import the class you plan to use
from sklearn.neighbors import (KNeighborsClassifier)
#9. Pick a value for K
k = 2
#10. Instantiate the "estimator" (make an instance of the model)
knn = KNeighborsClassifier(n_neighbors=k)
print(knn)
#11. fit the model with data/model training
knn.fit(x, y)
#12. Predict the response for a new observation
print(knn.predict([[3, 5, 4, 2]]))```
这就是我如何使用 scikit-learn KNeighborsClassifier
来拟合 knn
模型:
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
df = datasets.load_iris()
X = pd.DataFrame(df.data)
y = df.target
knn = KNeighborsClassifier(n_neighbors = 2)
knn.fit(X,y)
print(knn.predict([[6, 3, 5, 2]]))
#prints output class [2]
print(knn.predict([[3, 5, 4, 2]]))
#prints output class [1]
从DataFrame
不需要转换为numpy array
,可以直接在DataFrame
上拟合模型,同时将DataFrame
转换为numpy array
您已将其命名为 np
,它也用于在顶部导入 numpy
import numpy as np
输入预测输入为4列,剩下第五列'species'没有预测。此外,如果 'species' 是目标,则不能同时将其作为输入提供给 knn。 pop 从 dataFrame df.
中删除了这个特定的列#npdf = df.to_numpy()
df = df.apply(lambda x:pd.Series(x))
y = np.asarray(df['species'])
#removes the target from the sample
df.pop('species')
x = df.to_numpy()