Tensorflow: Extracting Classification Predictions
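I have a tensorflow NN model for classification of one-hot-encoded group labels (the groups are exclusive), which ends with the following (layerActivs[-1] holds the activations of the final layer):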
probs = sess.run(tf.nn.softmax(layerActivs[-1]),...)
classes = sess.run(tf.round(probs))
preds = sess.run(tf.argmax(classes))
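tf.round is included to force any low probabilities to 0. If all probabilities for an observation are below 50%, no class is predicted. For example, with 4 classes we could have probs[0,:] = [0.2,0,0,0.4], so classes[0,:] = [0,0,0,0] and preds[0] = 0 follows.
Obviously this is ambiguous, as it is the same result as probs[1,:] = [.9,0,.1,0] -> classes[1,:] = [1,0,0,0] -> preds[1] = 0. This is a problem when using the tensorflow builtin metrics class, as the functions can't distinguish between no prediction and a prediction of class 0. This code demonstrates it: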
import numpy as np
import tensorflow as tf
import pandas as pd
''' prepare '''
classes = 6
n = 100
# simulate data
np.random.seed(42)
simY = np.random.randint(0,classes,n) # pretend actual data
simYhat = np.random.randint(0,classes,n) # pretend pred data
truth = np.sum(simY == simYhat)/n
tabulate = pd.Series(simY).value_counts()
# create placeholders
lab = tf.placeholder(shape=simY.shape, dtype=tf.int32)
prd = tf.placeholder(shape=simY.shape, dtype=tf.int32)
AM_lab = tf.placeholder(shape=simY.shape,dtype=tf.int32)
AM_prd = tf.placeholder(shape=simY.shape,dtype=tf.int32)
# create one-hot encoding objects
simYOH = tf.one_hot(lab,classes)
# create accuracy objects
acc = tf.metrics.accuracy(lab,prd) # real accuracy with tf.metrics
accOHAM = tf.metrics.accuracy(AM_lab,AM_prd) # OHE argmaxed to labels - expected to be correct
# now setup to pretend we ran a model & generated OHE predictions all unclassed
z = np.zeros(shape=(n,classes),dtype=float)
testPred = tf.constant(z)
''' run it all '''
# setup
sess = tf.Session()
sess.run([tf.global_variables_initializer(),tf.local_variables_initializer()])
# real accuracy with tf.metrics
ACC = sess.run(acc,feed_dict = {lab:simY,prd:simYhat})
# OHE argmaxed to labels - expected to be correct, but is it?
l,p = sess.run([simYOH,testPred],feed_dict={lab:simY})
p = np.argmax(p,axis=-1)
ACCOHAM = sess.run(accOHAM,feed_dict={AM_lab:simY,AM_prd:p})
sess.close()
''' print stuff '''
print('Accuracy')
print('-known truth: %0.4f'%truth)
print('-on unprocessed data: %0.4f'%ACC[1])
print('-on faked unclassed labels data (s.b. 0%%): %0.4f'%ACCOHAM[1])
print('----------\nTrue Class Freqs:\n%r'%(tabulate.sort_index()/n))
The output is:
Accuracy
-known truth: 0.1500
-on unprocessed data: 0.1500
-on faked unclassed labels data (s.b. 0%): 0.1100
----------
True Class Freqs:
0 0.11
1 0.19
2 0.11
3 0.25
4 0.17
5 0.17
dtype: float64
Note freq for class 0 is same as faked accuracy...
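The same ambiguity can be reproduced without a TensorFlow session. The following is a minimal NumPy sketch, assuming np.round and np.argmax behave like their tf counterparts for this purpose; the variable names no_prediction, sim_labels and fake_preds are illustrative and not part of the original code.

```python
import numpy as np

# the two example rows from the question (4 classes)
probs = np.array([
    [0.2, 0.0, 0.0, 0.4],   # no class reaches 50%
    [0.9, 0.0, 0.1, 0.0],   # class 0 is confidently predicted
])

classes = np.round(probs)               # low probabilities become 0
preds = np.argmax(classes, axis=1)      # an all-zero row yields index 0
no_prediction = ~classes.any(axis=1)    # rows where nothing survived rounding

print(preds)           # both rows map to class 0
print(no_prediction)   # only the first row is actually unclassed

# With all-zero predictions, argmax labels every row as class 0, so the
# measured accuracy is just the share of true labels that are class 0.
sim_labels = np.random.RandomState(42).randint(0, 6, 100)
fake_preds = np.argmax(np.zeros((100, 6)), axis=1)
fake_accuracy = np.mean(fake_preds == sim_labels)
class0_freq = np.mean(sim_labels == 0)
print(fake_accuracy, class0_freq)
```

Keeping a boolean mask such as no_prediction next to preds is one way to tell the two cases apart before the labels reach a metric.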
I tried setting the values of preds to np.nan for observations with no prediction, but tf.metrics.accuracy throws ValueError: cannot convert float NaN to integer; I also tried np.inf but got OverflowError: cannot convert float infinity to integer.
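Both error messages are what Python raises when a NaN or infinite float is converted to an integer. A minimal sketch, assuming the conversion happens somewhere on the way into an integer-typed container (here a NumPy integer array stands in for the int32 placeholder; where exactly it happened in the original run is not known):

```python
import numpy as np

# integer class labels, as np.argmax would return them
preds = np.array([0, 3, 1])

errors = {}
for name, value in [("nan", np.nan), ("inf", np.inf)]:
    try:
        preds[0] = value   # integer arrays cannot hold NaN or infinity
    except (ValueError, OverflowError) as exc:
        errors[name] = type(exc).__name__

print(errors)
print(preds)   # the array is unchanged because both assignments failed
```

Integer labels therefore have no built-in "missing" value, which is why the unpredicted observations need some explicit handling.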
How can I convert the rounded probabilities to class predictions while appropriately handling unpredicted observations?
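This has gone unanswered for a long time, so I'm posting my own solution here as an answer. I convert the class-membership probabilities to class predictions with a new function that has 3 main steps:
- set any NaN probabilities to 0
- set any probabilities at or below 1/num_classes to 0
- use np.argmax() to extract the predicted classes, then set any unclassed observations to a uniformly selected class
The resulting vector of integer class labels can be passed to the tf.metrics functions. My function is below: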
def predFromProb(classProbs):
    '''
    Take in as input an (m x p) matrix of m observations' class probabilities in
    p classes and return an m-length vector of integer class labels (0...p-1).
    Probabilities at or below 1/p are set to 0, as are NaNs; any unclassed
    observations are randomly assigned to a class.
    '''
    numClasses = classProbs.shape[1]
    # zero out class probs that are at or below chance, or NaN
    probs = classProbs.copy()
    probs[np.isnan(probs)] = 0
    probs = probs*(probs > 1/numClasses)
    # find any un-classed observations
    unpred = ~np.any(probs,axis=1)
    # get the predicted classes
    preds = np.argmax(probs,axis=1)
    # randomly classify un-classed observations
    rnds = np.random.randint(0,numClasses,np.sum(unpred))
    preds[unpred] = rnds
    return preds
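A short usage sketch follows. It repeats a condensed copy of the function so that it runs on its own; the probs matrix reuses the two rows from the question plus two rows that end up unclassed, and the labels vector and the seed are made up for illustration.

```python
import numpy as np

def predFromProb(classProbs):
    # condensed copy of the function above, repeated so this example is self-contained
    numClasses = classProbs.shape[1]
    probs = classProbs.copy()
    probs[np.isnan(probs)] = 0
    probs = probs * (probs > 1 / numClasses)
    unpred = ~np.any(probs, axis=1)
    preds = np.argmax(probs, axis=1)
    preds[unpred] = np.random.randint(0, numClasses, np.sum(unpred))
    return preds

np.random.seed(0)  # only to make the random assignments repeatable

probs = np.array([
    [0.20, 0.00, 0.00, 0.40],          # 0.40 is above 1/4, so class 3 is kept
    [0.90, 0.00, 0.10, 0.00],          # class 0
    [0.25, 0.25, 0.25, 0.25],          # everything at chance: unclassed
    [np.nan, np.nan, np.nan, np.nan],  # all NaN: unclassed
])
labels = np.array([3, 0, 2, 1])        # hypothetical true labels

preds = predFromProb(probs)
accuracy = np.mean(preds == labels)
print(preds)
print(accuracy)
```

Note that the first row, which tf.round left unclassed, is now assigned class 3 because the threshold is 1/num_classes rather than 50%. The last two rows get a random label, so on average they score at chance (1/num_classes) instead of inflating the accuracy by the frequency of class 0.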