What is the role of percentiles in sklearn's partial dependence plots?

Question

percentiles: tuple of float, default=(0.05, 0.95) The lower and upper percentile used to create the extreme values for the PDP axes. Must be in [0, 1].

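However, I can't fully understand what this sentence means. Does it mean that the partial dependence plot is computed using only the data between the 5th and 95th percentiles, so that the contribution of data points outside that range is ignored?

How should I interpret it, and what are the potential problems with widening the range (for example to 0.01, 0.99)?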

Answer 1

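The percentiles are only used to create the grid of feature values for the plot; the estimator is unchanged. You can see more in the code used to compute the values. Basically, it computes predictions for varying values of the target feature while keeping the other features unchanged.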
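To make the distinction between "grid" and "data used" concrete, here is a minimal NumPy sketch of the idea. It is a simplification under stated assumptions, not scikit-learn's actual implementation: the helper names `percentile_grid` and `manual_partial_dependence`, the toy model `toy_predict`, and the random data are all made up for illustration.

```python
import numpy as np

def percentile_grid(column, percentiles=(0.05, 0.95), grid_resolution=100):
    """Evenly spaced grid between the lower and upper percentile of one feature."""
    low, high = np.quantile(column, percentiles)
    return np.linspace(low, high, num=grid_resolution)

def manual_partial_dependence(predict, X, feature, grid):
    """For each grid value, force `feature` to that value in EVERY row of X
    and average the predictions. No rows are dropped."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value
        averages.append(predict(X_mod).mean())
    return np.array(averages)

# Hypothetical data and model, for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 2))

def toy_predict(X):
    return X[:, 0] ** 2 + X[:, 1]

narrow = percentile_grid(X_demo[:, 0], (0.05, 0.95), grid_resolution=20)
wide = percentile_grid(X_demo[:, 0], (0.01, 0.99), grid_resolution=20)

pd_narrow = manual_partial_dependence(toy_predict, X_demo, 0, narrow)
pd_wide = manual_partial_dependence(toy_predict, X_demo, 0, wide)

print("narrow grid range:", narrow[0], narrow[-1])
print("wide grid range:  ", wide[0], wide[-1])
```

In this sketch, moving to (0.01, 0.99) only stretches the grid toward the tails; the averaging still uses every row of the data. The usual caveat with a wider range is that few training points lie in the tails, so the curve there tends to rest on feature values the model has rarely seen.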

Below I overlay the plots for two different percentile settings; you can see that one is basically an extension of the other:

import matplotlib.pyplot as plt
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = make_friedman1(random_state=321)
clf = GradientBoostingRegressor(n_estimators=10).fit(X, y)

# First plot: grid spans the 5th to 95th percentile of each feature.
disp1 = PartialDependenceDisplay.from_estimator(
    estimator=clf, X=X, features=[3, 2], percentiles=(0.05, 0.95)
)
# Second plot: narrower grid (30th to 70th percentile), drawn in black on the same axes.
PartialDependenceDisplay.from_estimator(
    estimator=clf, X=X, features=[3, 2], percentiles=(0.3, 0.7),
    ax=disp1.axes_, pd_line_kw={"color": "k"},
)
plt.show()

python

scikit-learn