Setting dataframe by using both iloc and a boolean mask (mask at multiple different index (row) values in the dataframe)

Question

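I want to change values in a pandas DataFrame to NaN based on where the NaN values sit in another pandas DataFrame, and I want to do this at multiple positions in the array. It works when the mask is applied at the start of the array, where the index (row) values are the same. How can I apply it offset by 20 rows into the array, and then by 40 rows?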

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': list(range(0,100)), 'B': list(range(0,100))})
df2 = pd.DataFrame({'A': [1, None, 1, 1], 'B': [None, 1, None, 1]})

# Boolean mask: True wherever df2 holds NaN
df2_null = df2.isnull()
# Works at the start of df because the index labels 0..3 line up
df[df2_null] = np.nan
df.iloc[0:4]

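How can I get this to work in the lines below? The first line raises an error, and the second produces all NaN no matter where I run it. I have not been able to figure out how to do this.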

df.iloc[20:24][df2_null] = np.Nan
df.loc[df[df2_null].iloc[20:24].index] = np.NaN

Answer 1

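I think you need DataFrame.iloc and DataFrame.mask, which by default sets values to NaN wherever the boolean mask is True (the mask only needs the same number of rows and columns as the part of df selected).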

Also, the df2_null mask is converted to a NumPy array to avoid alignment by index.

df.iloc[20:24] = df.iloc[20:24].mask(df2_null.values)
print (df.iloc[15:30])
       A     B
15  15.0  15.0
16  16.0  16.0
17  17.0  17.0
18  18.0  18.0
19  19.0  19.0
20  20.0   NaN
21   NaN  21.0
22  22.0   NaN
23  23.0  23.0
24  24.0  24.0
25  25.0  25.0
26  26.0  26.0
27  27.0  27.0
28  28.0  28.0
29  29.0  29.0
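
The question also asks about applying the same mask at more than one offset (20 rows, then 40 rows). The following is a minimal sketch of one way to do that by repeating the iloc plus mask step in a loop. It assumes the mask has the same columns, in the same order, as df, and that every offset leaves room for the whole mask. The helper name apply_mask_at and the float dtype are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Float columns from the start, so writing NaN never needs a dtype change.
df = pd.DataFrame({'A': np.arange(100, dtype=float),
                   'B': np.arange(100, dtype=float)})
df2 = pd.DataFrame({'A': [1, None, 1, 1], 'B': [None, 1, None, 1]})
df2_null = df2.isnull()


def apply_mask_at(frame, mask, offsets):
    """Hypothetical helper: set NaN in `frame` wherever `mask` is True,
    with the mask positioned at each row offset in `offsets`."""
    n = len(mask)
    for start in offsets:
        block = frame.iloc[start:start + n]
        # .values drops the mask's index, so it is applied by position
        # instead of being aligned on the labels 0..n-1.
        frame.iloc[start:start + n] = block.mask(mask.values)
    return frame


apply_mask_at(df, df2_null, offsets=[20, 40])
print(df.iloc[[20, 21, 22, 40, 41, 42]])
```

Each pass touches only the rows start to start + len(mask) - 1, so rows outside the listed offsets keep their values.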

A NumPy solution with numpy.where, following the same principle as the pandas solution:

df = pd.DataFrame({'A': list(range(0,30)), 'B': list(range(0,30))})

arr = df.values.astype(float)
arr[20:24] = np.where(df2_null.values, np.nan, arr[20:24])
print (arr)
[[ 0.  0.]
 [ 1.  1.]
 [ 2.  2.]
 [ 3.  3.]
 [ 4.  4.]
 [ 5.  5.]
 [ 6.  6.]
 [ 7.  7.]
 [ 8.  8.]
 [ 9.  9.]
 [10. 10.]
 [11. 11.]
 [12. 12.]
 [13. 13.]
 [14. 14.]
 [15. 15.]
 [16. 16.]
 [17. 17.]
 [18. 18.]
 [19. 19.]
 [20. nan]
 [nan 21.]
 [22. nan]
 [23. 23.]
 [24. 24.]
 [25. 25.]
 [26. 26.]
 [27. 27.]
 [28. 28.]
 [29. 29.]]
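
A variation on the same principle, offered as a sketch rather than as part of the original answer: build one boolean array with the full shape of df, copy the small mask into it at each offset, and call DataFrame.mask once. The offsets 20 and 40 come from the question; the names full_mask and result, the 60-row frame, and the float dtype are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.arange(60, dtype=float),
                   'B': np.arange(60, dtype=float)})
df2 = pd.DataFrame({'A': [1, None, 1, 1], 'B': [None, 1, None, 1]})
df2_null = df2.isnull()

# Start with "mask nothing", then stamp the small mask in at each offset.
full_mask = np.zeros(df.shape, dtype=bool)
for start in (20, 40):
    full_mask[start:start + len(df2_null)] = df2_null.values

# mask() returns a new frame; df itself is left unchanged.
result = df.mask(full_mask)
print(result.iloc[[20, 21, 22, 40, 41, 42]])
```

Because full_mask is a plain NumPy array with the same shape as df, no index alignment takes place and the mask is applied purely by position.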

python

dataframe

pandas

pandas-groupby

array-broadcasting