Splitting numpy arrays based on categorical variable
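I am trying to split ages and weights based on the categorical variable "obese" and then plot the two groups in different colors. I think I may have gotten the list comprehension wrong. When I plot, I see only one color and all of the data points.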
import numpy as np
import matplotlib.pyplot as plt
ages = np.array([20, 22, 23, 25, 27])
weights = np.array([140, 144, 150, 156, 160])
obese = np.array([0, 0, 0, 1, 1])
ages_normal = [ages for i in range(0, len(obese)) if obese[i] == 0]
weights_normal = [weights for i in range(0, len(obese)) if obese[i] == 0]
ages_obese = [ages for i in range(0, len(obese)) if obese[i] == 1]
weights_obese = [weights for i in range(0, len(obese)) if obese[i] == 1]
plt.scatter(ages_normal, weights_normal, color = "b")
plt.scatter(ages_obese, weights_obese, color = "r")
plt.show()
I would probably do it like this:
import numpy as np
import matplotlib.pyplot as plt
ages = np.array([20, 22, 23, 25, 27])
weights = np.array([140, 144, 150, 156, 160])
obese = np.array([0, 0, 0, 1, 1])
data = list(zip(ages, weights, obese))  # list, because zip is a one-shot iterator in Python 3 and is read twice below
data_normal = np.array([(a,w) for (a,w,o) in data if o == 0])
data_obese = np.array([(a,w) for (a,w,o) in data if o == 1])
plt.scatter(data_normal[:,0], data_normal[:,1], color = "b")
plt.scatter(data_obese[:,0], data_obese[:,1], color = "r")
plt.show()
But this is probably more efficient:
data = np.vstack([ages, weights, obese]).T  # columns: age, weight, obese
ind_n = np.where(data[:,2] == 0)[0]  # row indices of the normal group
ind_o = np.where(data[:,2] == 1)[0]  # row indices of the obese group
plt.scatter(data[ind_n,0], data[ind_n,1], color = "b")
plt.scatter(data[ind_o,0], data[ind_o,1], color = "r")
plt.show()
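A boolean mask is another way to make the same selection without building a combined array. This is only a minimal sketch, assuming the same three arrays as in the question; the name is_obese is introduced here for illustration:

```python
import numpy as np

ages = np.array([20, 22, 23, 25, 27])
weights = np.array([140, 144, 150, 156, 160])
obese = np.array([0, 0, 0, 1, 1])

# Boolean mask: True for rows in the obese group (illustrative name)
is_obese = obese == 1

# Indexing with the mask keeps matching rows; ~ inverts the mask
ages_normal, weights_normal = ages[~is_obese], weights[~is_obese]
ages_obese, weights_obese = ages[is_obese], weights[is_obese]

print(ages_normal, weights_normal)
print(ages_obese, weights_obese)
```

Each pair can then be passed to plt.scatter with its own color, as in the snippets above.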
But you are right, the list comprehension is slightly off. Perhaps you wanted something like this:
ages_normal = [ages[i] for i in range(0, len(obese)) if obese[i] == 0]
weights_normal = [weights[i] for i in range(0, len(obese)) if obese[i] == 0]
ages_obese = [ages[i] for i in range(0, len(obese)) if obese[i] == 1]
weights_obese = [weights[i] for i in range(0, len(obese)) if obese[i] == 1]
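The difference is the index added to ages and weights: ages[i] picks the single element for row i, whereas ages on its own puts the whole array into the list for every matching row, so both scatter calls received all five points.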
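To see the effect of the missing index, it may help to compare the shapes of the two lists. A small sketch using the question's arrays; the names broken and fixed are made up for this comparison:

```python
import numpy as np

ages = np.array([20, 22, 23, 25, 27])
obese = np.array([0, 0, 0, 1, 1])

# Without the index: the whole `ages` array is appended once per matching row
broken = [ages for i in range(len(obese)) if obese[i] == 0]
# With the index: only the matching element is appended
fixed = [ages[i] for i in range(len(obese)) if obese[i] == 0]

print(np.shape(broken))  # (3, 5): three copies of all five ages
print(np.shape(fixed))   # (3,): just the three matching ages
```

Because the unindexed lists contain every age and weight, the second scatter call draws all five points in red on top of the blue ones, which matches the single color seen in the plot.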
All three approaches produce the plot you are looking for.