KMeans scatter plot on MacBook
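I'm new to data science and I'm trying to draw a scatter plot for a dataset with 4,000 rows. I'm running Jupyter Notebook on a MacBook. The scatter plot takes more than five minutes to appear in the notebook. My laptop is a recent purchase: a 2.3 GHz Intel Core i5 with 8 GB of RAM.
I have two questions: why does it take so long, and why is the plot so crowded (for example, the x tick labels are tiny, bunched together, and unreadable)? The dataset is here: https://raw.githubusercontent.com/datascienceinc/learn-data-science/master/Introduction-to-K-means-Clustering/Data/data_1024.csv
Any insight would be greatly appreciated.
Here is my code: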
import numpy as np
import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline
from sklearn.cluster import KMeans
df= pd.read_csv('/users/kyaw/Downloads/data_1024.csv')
df = df.join(df['Driver_ID'].str.split(expand=True))
df = df.drop(["Driver_ID"], axis=1)
df.columns=['Driver_ID','Distance_Feature','Speeding_Feature']
f1 = df['Distance_Feature'].values
f2 = df['Speeding_Feature'].values
X=np.array(list(zip(f1,f2)))
fig=plt.gcf()
fig.set_size_inches(10,8)
kmeans = KMeans(n_clusters=3).fit(X)
plt.scatter(X[:,0], X[:,1], c=kmeans.labels_, cmap='rainbow')
plt.scatter(kmeans.cluster_centers_[:,0] ,kmeans.cluster_centers_[:,1], color='black')
plt.show()
I tried running your code, but it didn't work. I made the following corrections:
import numpy as np
import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
# %matplotlib inline  --> removed: this is a Jupyter magic, keep it only when running in a notebook
from sklearn.cluster import KMeans
df = pd.read_csv('./data_1024.csv', sep='\t')  # the file is tab-separated, so pass the separator explicitly
# The join/split/drop/rename lines from the question are no longer needed
f1 = df['Distance_Feature'].values
f2 = df['Speeding_Feature'].values
X=np.array(list(zip(f1,f2)))
fig=plt.gcf()
fig.set_size_inches(10,8)
kmeans = KMeans(n_clusters=3).fit(X)
plt.scatter(X[:,0], X[:,1], c=kmeans.labels_, cmap='rainbow')
plt.scatter(kmeans.cluster_centers_[:,0] ,kmeans.cluster_centers_[:,1], color='black')
plt.show()
I got this image.
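To see why the separator matters, here is a minimal sketch that reads a small in-memory sample both ways. The three rows are hypothetical values laid out like data_1024.csv, not real rows from the dataset. With the default comma separator, each line ends up as a single string column. If those strings are later split and passed to plt.scatter, Matplotlib treats them as categories and draws one tick per distinct value, which is a likely cause of both the slow rendering and the crowded ticks.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the same tab-separated layout as data_1024.csv.
SAMPLE = (
    "Driver_ID\tDistance_Feature\tSpeeding_Feature\n"
    "1\t71.24\t28.0\n"
    "2\t52.53\t25.0\n"
    "3\t64.54\t27.0\n"
)

# Default separator is a comma, so each whole line lands in one string column.
df_comma = pd.read_csv(io.StringIO(SAMPLE))

# Passing sep="\t" splits the fields and lets pandas parse them as numbers.
df_tab = pd.read_csv(io.StringIO(SAMPLE), sep="\t")

print(df_comma.shape)   # one column holding unsplit text
print(df_tab.shape)     # three separate columns
print(df_tab.dtypes)    # numeric dtypes, ready for KMeans and plt.scatter
```

Checking df.dtypes right after loading is a quick way to confirm that the feature columns are numeric before fitting KMeans or plotting.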