NumPy's 'where' function behaving ambiguously

Question

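I am trying to build a pandas DataFrame that reports the percentage of NULL values for each feature in my training dataset, along with the correlation coefficient between each numeric feature and the dependent variable. Here is my code: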

import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Count nulls per column and compute their share of all rows
null_cols = pd.DataFrame(train.isnull().sum().sort_values(ascending=False))
null_cols.columns = ['NullCount']
null_cols.index.name = 'Features'
null_cols['Share'] = np.round(100 * null_cols['NullCount'] / len(train), decimals=2)

# Compute the correlation of each numeric feature with the dependent variable
for row in null_cols.index:
    print(row, np.where(is_numeric_dtype(train[row]), str(train['Dependent Var'].corr(train[row])), ''))
    # print(row, np.where(is_numeric_dtype(train[row]), str(train[row].isnull().sum()), ''))

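When I run this, I get TypeError: unsupported operand type(s) for /: 'str' and 'int'. I think the error comes from the corr function, but why is corr being run on non-numeric columns inside 'where'? Shouldn't those columns fall through to the else branch?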

The commented-out line, i.e.

print(row, np.where(is_numeric_dtype(train[row]), str(train[row].isnull().sum()), ''))

runs without any error, and 'where' behaves as expected.

Answer 1

Let's walk through how Python runs this code:

np.where(is_numeric_dtype(train[row]), str(train['Dependent Var'].corr(train[row])), '')

where is a function. Python evaluates all of a function's arguments before passing them to the function. So it evaluates:

is_numeric_dtype(train[row])
str(train['Dependent Var'].corr(train[row]))
''

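before it calls where. The corr call therefore runs for every column, numeric or not, and raises the TypeError on string columns before where ever sees the condition. The commented-out line works because isnull().sum() is valid for any dtype.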
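The eager evaluation described above can be reproduced without any DataFrame. The following is a minimal sketch; `risky` and `calls` are hypothetical names made up for this illustration, and the error message is copied from the question only to mimic what corr raises on a string column.

```python
import numpy as np

calls = []

def risky(label):
    """Record the call, then fail the way corr fails on a string column."""
    calls.append(label)
    raise TypeError("unsupported operand type(s) for /: 'str' and 'int'")

error = None
try:
    # The condition is False, yet risky() still runs, because Python
    # evaluates every argument before np.where itself is called.
    np.where(False, risky("where-branch"), "")
except TypeError as exc:
    error = str(exc)

# A conditional expression evaluates only the branch it selects,
# so risky() is never called here.
safe = risky("never-called") if False else ""

print(calls)
print(repr(safe))
```

Only the argument passed to np.where triggers the call. The conditional expression (and likewise an if/else statement) skips the branch that would fail.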

If corr can only be run on certain types of values, then np.where is not the right tool. I think you need:

for row in null_cols.index:
    if is_numeric_dtype(train[row]):
        # corr is only reached for numeric columns
        print(row, train['Dependent Var'].corr(train[row]))
    else:
        # keep the feature name, leave the correlation blank
        print(row, '')
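If the goal is a correlation column in the summary DataFrame rather than printed output, one possible approach is to restrict the data to numeric columns first and let pandas align the result by index. This is a sketch under assumptions, not the asker's actual setup: the toy `train` frame and the column name 'Corr' are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the asker's `train` frame.
train = pd.DataFrame({
    "Dependent Var": [1.0, 2.0, 3.0, 4.0],
    "size": [2.0, 4.0, np.nan, 8.0],
    "color": ["red", None, "blue", "red"],
})

# Same null summary as in the question.
null_cols = pd.DataFrame(train.isnull().sum().sort_values(ascending=False))
null_cols.columns = ["NullCount"]
null_cols.index.name = "Features"
null_cols["Share"] = np.round(100 * null_cols["NullCount"] / len(train), decimals=2)

# Correlate only the numeric columns with the dependent variable.
numeric = train.select_dtypes(include="number")
# Assignment aligns on the index, so non-numeric features get NaN.
null_cols["Corr"] = numeric.corrwith(train["Dependent Var"])

print(null_cols)
```

Because corr is never called on the string column, no TypeError is raised, and non-numeric features simply show NaN in the 'Corr' column.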

python

numpy

correlation

pandas