Calculate unique value probability over numpy array columns
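I want to create a version of scikit-learn's predict_proba from a list of predictions.
I currently have a list that looks like this:
[[0,1,0,0,0,1,1,0,0,0],[0,1,0,1,0,1,1,1,0,0],[0,0,0,0,0,1,1,0,0,0]]
I want to find the probability that the first value of each list is 0 or 1, and then the same for each subsequent position.
That is, the output should look like this:
[[0.33,0.66],[0,1],[0.66,0.3]........etc
I wrote the code below. It works, but it looks clunky, and I'm sure there is a better way to achieve this.
Any suggestions?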
import numpy as np

# create np array from list
ar = np.array([[0,1,0,0,0,1,1,0,0,0],[0,1,0,1,0,1,1,1,0,0],[0,0,0,0,0,1,1,0,0,0]])
# calculate unique values and sort in order
uni = np.unique(ar)
uni.sort()
# create new pred list
new_pred = []
# transpose and iterate over the columns of ar
for row in ar.transpose():
    # create dict with the unique values as keys
    val_dic = {k: 0 for k in uni}
    # create list for row probabilities
    row_pred = []
    # iterate over the row and increment the count if the value is a key
    for val in row:
        if val in val_dic.keys():
            val_dic[val] = val_dic.get(val, 0) + 1
    # calc row total
    total = sum(val_dic.values())
    # append probabilities to the row list
    for val in val_dic.values():
        row_pred.append(val/total)
    # append to the final output list
    new_pred.append(row_pred)
print(new_pred)
Output:
[[1.0, 0.0], [0.3333333333333333, 0.6666666666666666], [1.0, 0.0], [0.6666666666666666, 0.3333333333333333], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.6666666666666666, 0.3333333333333333], [1.0, 0.0], [1.0, 0.0]]
If your ar consists only of 0 and 1, as in your question, you can simplify your code like this:
import numpy as np
ar = np.array([[0,1,0,0,0,1,1,0,0,0],[0,1,0,1,0,1,1,1,0,0],[0,0,0,0,0,1,1,0,0,0]])
prob_1 = ar.T.sum(axis=1) / len(ar) # <-- max sum of row is len(ar) == 3
prob_0 = 1.0 - prob_1
print(np.column_stack((prob_0, prob_1)))
Prints:
[[1.         0.        ]
 [0.33333333 0.66666667]
 [1.         0.        ]
 [0.66666667 0.33333333]
 [1.         0.        ]
 [0.         1.        ]
 [0.         1.        ]
 [0.66666667 0.33333333]
 [1.         0.        ]
 [1.         0.        ]]
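If the predictions can contain labels other than 0 and 1, the shortcut above no longer applies directly. One possible generalization is sketched below: it compares every entry against each unique label and averages the matches down the rows. The function name column_class_probabilities is made up for this illustration and is not part of numpy or scikit-learn.

```python
import numpy as np

def column_class_probabilities(preds):
    """Hypothetical helper: per-column label frequencies.

    Returns (classes, probs), where probs[j, k] is the fraction of rows
    whose value in column j equals classes[k].
    """
    preds = np.asarray(preds)
    classes = np.unique(preds)  # sorted unique labels
    # Broadcast comparison: shape (n_rows, n_cols, n_classes)
    matches = preds[:, :, None] == classes[None, None, :]
    # The mean of the boolean matches over the rows is the relative frequency
    probs = matches.mean(axis=0)
    return classes, probs

ar = np.array([[0,1,0,0,0,1,1,0,0,0],[0,1,0,1,0,1,1,1,0,0],[0,0,0,0,0,1,1,0,0,0]])
classes, probs = column_class_probabilities(ar)
print(classes)
print(probs)
```

For the 0/1 array from the question this should give the same table as above, and the second column of probs equals ar.mean(axis=0). The intermediate boolean array has n_rows * n_cols * n_classes entries, so with many distinct labels a per-column np.unique(..., return_counts=True) loop may use less memory.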