Lambda Apply: Referencing other rows and columns

Question

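I am trying to change the values of a given column in a dataset based on the values surrounding a given cell. Consider the following data: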

import pandas as pd

Data = {'Col1': [5593, 5114, 6803, 2175, 2175], 'Col2': [2879, 1176, 7114, 8677, 0]}
df = pd.DataFrame(data=Data)
df.head()

    Col1    Col2
0   5593    2879
1   5114    1176
2   6803    7114
3   2175    8677
4   2175    0

I created a new column to store the new values:

df['Col3'] = df['Col2']

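I want to create an apply/lambda function that does the following: if Col3 is zero and the previous value of Col1 equals the current value of Col1, i.e. (x.shift(-2, -1) == x.shift(-2, 0)), then Col3 should take the previous value of Col2, i.e. x.shift(-1, -1). Otherwise Col3 should remain unchanged.

I tried something like the following (pseudocode):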

df['Col3'] = df['Col3'].apply(lambda x: x.shift(-1 , -1) if (x == 0 and x.shift(-2 , -1) == x.shift(-2, 0)) else x)

For this particular subset of my data, the result should look like this:

    Col1    Col2    Col3
0   5593    2879    2879
1   5114    1176    1176
2   6803    7114    7114
3   2175    8677    8677
4   2175    0       8677

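I'm not sure whether shift is the right method to use (the series contains NaN), but I hope the idea is clear.

My real dataset is very large, so I would like the operation to perform well over many rows.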

Answer 1

IIUC, you can use np.where with the shifted columns:

import numpy as np

df['Col3'] = np.where(df['Col1'].shift().eq(df['Col1']), df['Col2'].shift(), df['Col2'])
print(df)

Output

   Col1  Col2    Col3
0  5593  2879  2879.0
1  5114  1176  1176.0
2  6803  7114  7114.0
3  2175  8677  8677.0
4  2175     0  8677.0
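
The .0 suffix in Col3 appears because shift() puts a NaN in the first row, which turns the shifted column into floats. If that matters, here is a minimal sketch that keeps integers, assuming Col2 itself has no missing values (the question mentions NaN in the real data, in which case the column is float anyway). The fill_value argument and the name same_as_prev are choices made for this illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Col1": [5593, 5114, 6803, 2175, 2175],
    "Col2": [2879, 1176, 7114, 8677, 0],
})

# True where Col1 equals the value in the row above (False for the first row)
same_as_prev = df["Col1"].shift().eq(df["Col1"])

# fill_value=0 keeps the shifted column integer; the filler in row 0 is never
# selected because same_as_prev is False there
df["Col3"] = np.where(same_as_prev, df["Col2"].shift(fill_value=0), df["Col2"])
print(df)
```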

Here is a step-by-step explanation with comments:

# create a mask that is True where Col1 equals the value in the previous row
mask = df['Col1'].shift().eq(df['Col1'])

# choose between the shifted Col2 (the previous value) and Col2 using the mask
df['Col3'] = np.where(mask, df['Col2'].shift(), df['Col2'])

print(df)
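
Note that the mask above only checks whether Col1 repeats; it does not test for zero, so a non-zero Col2 would also be overwritten whenever Col1 repeats. If the zero condition from the question is needed, a possible sketch is to combine both conditions and use Series.mask. The helper name fill_zero_from_previous is hypothetical, and the sketch assumes that a zero in Col2 is the only marker for a value to replace:

```python
import pandas as pd


def fill_zero_from_previous(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: return Col2 with zeros replaced by the previous
    row's Col2 wherever Col1 repeats the previous row's value."""
    same_as_prev = frame["Col1"].eq(frame["Col1"].shift())
    is_zero = frame["Col2"].eq(0)
    # Series.mask replaces the values where the condition is True
    return frame["Col2"].mask(is_zero & same_as_prev,
                              frame["Col2"].shift(fill_value=0))


df = pd.DataFrame({
    "Col1": [5593, 5114, 6803, 2175, 2175],
    "Col2": [2879, 1176, 7114, 8677, 0],
})
df["Col3"] = fill_zero_from_previous(df)
print(df)
```

Both conditions are evaluated on whole columns, so this stays vectorized and should scale to many rows in the same way as the np.where version.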

python

lambda

apply

pandas