Sensitivity vs. Positive Predictive Value: which is best?
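I am trying to build a model on a class-imbalanced dataset (binary: 25% 1s and 75% 0s), using classification algorithms and ensemble techniques. I am a bit confused about the following points, since I am more interested in predicting the 1s.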
1. Should I give preference to sensitivity or positive predictive value?
Some ensemble techniques give at most 45% sensitivity with a low positive predictive value,
while others give 62% positive predictive value with low sensitivity.
2. My dataset has around 450K observations and 250 features.
After a power test, I took 10K observations by simple random sampling. When ranking
variable importance with ensemble techniques, the important features
differ from those I got when I tried with 150K observations.
Based on my intuition and domain knowledge, the features that came up as important in
the 150K-observation sample seem more relevant. What is the best practice?
3. Last, can I use the variable importance generated by RF in other ensemble
techniques to predict the accuracy?
Could you help me with this? I am a bit confused.
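Whether to prefer sensitivity or positive predictive value depends on the end goal of your analysis. The difference between the two is well explained here: https://onlinecourses.science.psu.edu/stat507/node/71/
In short, they are two measures that look at the results from two different angles. Sensitivity is the probability that the test detects the "condition" among those who actually have it. Positive predictive value is the proportion of those who test positive who actually have the "condition".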
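To make the two angles concrete, here is a minimal sketch that computes both measures from a confusion matrix. The labels and predictions are made-up illustration data (not from the question), and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Share of actual positives that were predicted positive: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


def positive_predictive_value(tp, fp):
    """Share of predicted positives that are actually positive: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else float("nan")


# Made-up example: 4 positives and 12 negatives (25% / 75%, as in the question).
y_true = [1, 1, 1, 1] + [0] * 12
y_pred = [1, 1, 0, 0, 1] + [0] * 11

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)
ppv = positive_predictive_value(tp, fp)
print(f"sensitivity={sens:.2f}  PPV={ppv:.2f}")
```

If missing a 1 is costly, sensitivity is the number to watch; if acting on a false alarm is costly, PPV matters more.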
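Accuracy depends on your classification results: it is defined as (true positives + true negatives) / (total). It is not determined by the variable importance generated by RF.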
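As a small illustration of that definition on a 25% / 75% split like the one in the question, the sketch below scores a hypothetical classifier that always predicts 0. The data are synthetic and only meant to show why accuracy alone can mislead when the 1s are what you care about.

```python
def accuracy(y_true, y_pred):
    """(true positives + true negatives) / total."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Synthetic labels with the same class ratio as the question: 25 ones, 75 zeros.
y_true = [1] * 25 + [0] * 75

# Hypothetical degenerate model that never predicts the positive class.
always_zero = [0] * len(y_true)

acc = accuracy(y_true, always_zero)

# Sensitivity of the same predictions: TP / (number of actual positives).
true_positives = sum(1 for t, p in zip(y_true, always_zero) if t == 1 and p == 1)
sens = true_positives / sum(y_true)

print(f"accuracy={acc:.2f}  sensitivity={sens:.2f}")
```

The model finds none of the 1s, yet its accuracy equals the share of 0s in the data.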
In addition, the imbalance in the dataset can be compensated for; see https://stats.stackexchange.com/questions/264798/random-forest-unbalanced-dataset-for-training-test
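One common way to compensate, shown here only as a sketch and not necessarily the approach recommended in the linked thread, is to weight each class inversely to its frequency, using weight = n_samples / (n_classes * class_count). The function name and the data below are assumptions for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Synthetic labels with the question's class ratio: 25% ones, 75% zeros.
labels = [1] * 25 + [0] * 75

weights = balanced_class_weights(labels)
print(weights)  # the minority class 1 gets the larger weight
```

Many tree-ensemble implementations accept per-class or per-sample weights of this kind; resampling the training set (oversampling the 1s or undersampling the 0s) is an alternative.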