Remove all previous rows from primary dataframe based on condition from another dataframe
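I have two dataframes, df1 (the primary dataframe) and df2. I want to remove all earlier rows from df1 based on a condition from df2. My dataframes look like this: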
df2
tradingsymbol Time
0 BANKNIFTY2220339500CE 12:54:40
1 BANKNIFTY2220340000CE 12:53:33
2 BANKNIFTY2220340500CE 12:51:50
df1.head(20)
tradingsymbol Time last_price
0 BANKNIFTY2220339500CE 09:20:10 84.40
1 BANKNIFTY2220339500CE 09:20:10 85.95
2 BANKNIFTY2220339500CE 12:55:60 84.70 <-Valid Row
3 BANKNIFTY2220339500CE 13:22:10 86.35 <-Valid Row
4 BANKNIFTY2220339500CE 14:55:40 87.10 <-Valid Row
5 BANKNIFTY2220340000CE 09:20:13 88.95
6 BANKNIFTY2220340000CE 09:20:13 88.80
7 BANKNIFTY2220340000CE 09:20:14 88.30
8 BANKNIFTY2220340000CE 14:23:11 87.30 <-Valid Row
9 BANKNIFTY2220340500CE 09:20:15 90.15
10 BANKNIFTY2220340500CE 09:20:16 90.10
11 BANKNIFTY2220340500CE 09:20:17 91.05
12 BANKNIFTY2220340500CE 09:20:18 90.95
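For each trading symbol, I want to remove every row in df1 whose time is earlier than that symbol's time in df2's Time column. The result should look like this: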
tradingsymbol Time last_price
2 BANKNIFTY2220339500CE 12:55:60 84.70
3 BANKNIFTY2220339500CE 13:22:10 86.35
4 BANKNIFTY2220339500CE 14:55:40 87.10
8 BANKNIFTY2220340000CE 14:23:11 87.30
If the column values are not already datetime-like, you can convert them:
import pandas as pd
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M:%S").dt.time
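As a quick check of that conversion, here is a minimal sketch on a Series built from df2's sample times. The explicit `format="%H:%M:%S"` is an assumption about how the times are stored; it tells pandas the exact layout instead of letting it infer one, and `.dt.time` then keeps only the time of day.

```python
import datetime

import pandas as pd

# Sample times taken from df2 in the question
times = pd.Series(["12:54:40", "12:53:33", "12:51:50"])

# Parse the "HH:MM:SS" strings, then keep only the time-of-day part
parsed = pd.to_datetime(times, format="%H:%M:%S").dt.time
print(parsed.tolist())
```

The resulting `datetime.time` values compare chronologically, which is what the filtering step relies on.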
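Alternatively, you can do the conversion directly while reading the file:
from datetime import datetime
df = pd.read_csv(
    filename,
    # parse each "HH:MM:SS" cell into a datetime.time as it is read
    converters={"Time": lambda s: datetime.strptime(s, "%H:%M:%S").time()},
)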
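To try a read-time conversion without a file on disk, the sketch below feeds df2's rows through `io.StringIO` and converts each cell with a `converters` callable. The comma-separated layout with a header row is an assumption about the real file, and `csv_text` is a hypothetical stand-in for `filename`.

```python
import io
from datetime import datetime, time

import pandas as pd

# Hypothetical in-memory CSV standing in for the real file
csv_text = """tradingsymbol,Time
BANKNIFTY2220339500CE,12:54:40
BANKNIFTY2220340000CE,12:53:33
BANKNIFTY2220340500CE,12:51:50
"""

df2 = pd.read_csv(
    io.StringIO(csv_text),
    # Each raw cell string is turned into a datetime.time while reading
    converters={"Time": lambda s: datetime.strptime(s, "%H:%M:%S").time()},
)
print(df2["Time"].tolist())
```

A converter runs once per cell, so it is simple but not the fastest option for very large files; parsing the whole column after reading is an alternative.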
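Once this is done for both dataframes, one way to filter is to iterate over the rows of df2 and, for each row, drop the rows of df1 that have the same symbol and an earlier time:
for _, row in df2.iterrows():
    # drop this symbol's rows that occur before its cutoff time in df2
    df1.drop(
        df1[(df1.tradingsymbol == row["tradingsymbol"]) & (df1.Time < row["Time"])].index,
        inplace=True,
    )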
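The loop calls `drop` once per symbol in df2. A vectorized alternative, sketched here as an assumption rather than as the method above, looks up each row's cutoff with `Series.map` and filters with one boolean mask. It assumes df2 has exactly one row per symbol. It also keeps Time as zero-padded "HH:MM:SS" strings, which sort in the same order as clock times, so the sample value 12:55:60 needs no parsing. Symbols missing from df2 get no cutoff and are kept, matching the loop's behaviour.

```python
import pandas as pd

# Sample data from the question, with Time kept as zero-padded strings
df2 = pd.DataFrame({
    "tradingsymbol": ["BANKNIFTY2220339500CE", "BANKNIFTY2220340000CE", "BANKNIFTY2220340500CE"],
    "Time": ["12:54:40", "12:53:33", "12:51:50"],
})
df1 = pd.DataFrame({
    "tradingsymbol": ["BANKNIFTY2220339500CE"] * 5
    + ["BANKNIFTY2220340000CE"] * 4
    + ["BANKNIFTY2220340500CE"] * 4,
    "Time": ["09:20:10", "09:20:10", "12:55:60", "13:22:10", "14:55:40",
             "09:20:13", "09:20:13", "09:20:14", "14:23:11",
             "09:20:15", "09:20:16", "09:20:17", "09:20:18"],
    "last_price": [84.40, 85.95, 84.70, 86.35, 87.10,
                   88.95, 88.80, 88.30, 87.30,
                   90.15, 90.10, 91.05, 90.95],
})

# Look up each row's cutoff time from df2 by symbol (NaN if the symbol is absent)
cutoff = df1["tradingsymbol"].map(df2.set_index("tradingsymbol")["Time"])

# Keep rows with no cutoff, or rows at or after their symbol's cutoff
result = df1[cutoff.isna() | (df1["Time"] >= cutoff)]
print(result)
```

Because it filters with a mask instead of dropping in place, df1 is left unchanged and the original index labels (2, 3, 4, 8) carry over to the result.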