Iterate over rows and select all between max and min
Suppose I have a Pandas DataFrame like the following:
ID update_time cap date diff
A 05/05/21 1:45 136 05/05/21 136
A 05/05/21 1:50 0 05/05/21 -136
A 05/05/21 2:10 1 05/05/21 1
A 05/05/21 2:15 0 05/05/21 -1
A 05/05/21 3:35 1 05/05/21 1
A 05/05/21 3:40 0 05/05/21 -1
A 05/05/21 14:40 158 06/05/21 158
A 05/05/21 14:45 0 06/05/21 -158
A 05/05/21 15:10 1 06/05/21 1
A 07/05/21 9:49 0 07/05/21 -1
B 05/05/21 1:10 500 05/05/21 500
B 05/05/21 1:15 63 05/05/21 -437
B 05/05/21 1:20 0 05/05/21 -63
B 05/05/21 1:35 8 05/05/21 8
B 05/05/21 1:40 0 05/05/21 -8
B 05/05/21 1:45 3 05/05/21 3
B 05/05/21 1:50 0 05/05/21 -3
B 05/05/21 14:35 255 06/05/21 255
B 05/05/21 14:40 0 06/05/21 -255
I'd like to drop any cap values that occur after the first drop to 0 within each ID and date. Any pointers on how to achieve this? I've attached the expected output below.
ID update_time cap date diff
A 05/05/21 1:45 136 05/05/21 136
A 05/05/21 1:50 0 05/05/21 -136
A 05/05/21 14:40 158 06/05/21 158
A 05/05/21 14:45 0 06/05/21 -158
B 05/05/21 1:10 500 05/05/21 500
B 05/05/21 1:15 63 05/05/21 -437
B 05/05/21 1:20 0 05/05/21 -63
B 05/05/21 14:35 255 06/05/21 255
B 05/05/21 14:40 0 06/05/21 -255
Any pointers would be much appreciated!
You need to first perform a groupby on ID and date, since you want "all rows before a drop to 0 occurs in caps" for each unique ID-date combination. We then apply a custom function that selects all rows up to and including the first occurrence of a zero. The function accounts for the edge case that, for an ID-date combination that occurs only once, no "drop" to 0 can happen.
Note that I have only recreated the relevant portion of your DataFrame.
import numpy as np
import pandas as pd

## recreate the relevant portion of your DataFrame
df = pd.DataFrame({
    'ID': ['A']*10 + ['B']*9,
    'cap': [136,0,1,0,1,0,158,0,1,0,500,63,0,8,0,3,0,255,0],
    'date': ['05/05/21']*6 + ['06/05/21']*3 + ['07/05/21'] + ['05/05/21']*7 + ['06/05/21']*2
})

## get the cap values up to and including the first occurrence of a zero
def get_caps_before_zero(df_column):
    ## for an ID-date group of length 1, no "drop" to zero can occur, so return an empty Series
    if len(df_column) == 1:
        return df_column.iloc[0:0]
    else:
        idx_first_zero = np.where(df_column == 0)[0].min() + 1
        return df_column.iloc[:idx_first_zero]

df_subset = (df.groupby(['ID', 'date'])
               .apply(lambda x: get_caps_before_zero(x['cap']))
               .reset_index()
               .drop(columns='level_2')  ## drop the extra index level created by apply
            )
Output:
>>> df_subset
ID date cap
0 A 05/05/21 136
1 A 05/05/21 0
2 A 06/05/21 158
3 A 06/05/21 0
4 B 05/05/21 500
5 B 05/05/21 63
6 B 05/05/21 0
7 B 06/05/21 255
8 B 06/05/21 0
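If you also want to keep the other columns from your original DataFrame (update_time and diff), a vectorized variant of the same idea should work: build a boolean mask per ID-date group instead of slicing each group. This is only a sketch, assuming the full DataFrame with all columns is named df_full (a hypothetical name, not from the code above); single-row groups are dropped explicitly to mirror the edge case handled above.

## sketch: keep every row up to and including the first zero per ID-date group,
## assuming the full DataFrame (with update_time and diff) is named df_full
grp = df_full.groupby(['ID', 'date'])['cap']

## count zeros seen so far, shifted by one row, so the mask stays True
## through the first row where cap == 0 and turns False afterwards
keep_until_zero = grp.transform(lambda s: s.eq(0).cumsum().shift(fill_value=0)).eq(0)

## single-row groups have no "drop" to 0, so exclude them
multi_row = grp.transform('size') > 1

df_expected = df_full[keep_until_zero & multi_row]

df_expected should then match the expected output you posted, including dropping the lone A / 07/05/21 row.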