Remove all previous rows from primary dataframe based on condition from another dataframe

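I have two dataframes, df1 (the primary dataframe) and df2. I want to remove all earlier rows from df1 based on a condition from df2. My dataframes look like this: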

df2

           tradingsymbol      Time
0  BANKNIFTY2220339500CE  12:54:40
1  BANKNIFTY2220340000CE  12:53:33
2  BANKNIFTY2220340500CE  12:51:50

df1.head(20)

            tradingsymbol      Time  last_price
0   BANKNIFTY2220339500CE  09:20:10       84.40
1   BANKNIFTY2220339500CE  09:20:10       85.95
2   BANKNIFTY2220339500CE  12:55:60       84.70 <-Valid Row
3   BANKNIFTY2220339500CE  13:22:10       86.35 <-Valid Row
4   BANKNIFTY2220339500CE  14:55:40       87.10 <-Valid Row

5   BANKNIFTY2220340000CE  09:20:13       88.95
6   BANKNIFTY2220340000CE  09:20:13       88.80
7   BANKNIFTY2220340000CE  09:20:14       88.30
8   BANKNIFTY2220340000CE  14:23:11       87.30 <-Valid Row

9   BANKNIFTY2220340500CE  09:20:15       90.15
10  BANKNIFTY2220340500CE  09:20:16       90.10
11  BANKNIFTY2220340500CE  09:20:17       91.05
12  BANKNIFTY2220340500CE  09:20:18       90.95

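For each tradingsymbol, I want to remove every row in df1 whose Time is earlier than that symbol's time in df2's Time column. The result I want is: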

            tradingsymbol      Time  last_price
2   BANKNIFTY2220339500CE  12:55:60       84.70
3   BANKNIFTY2220339500CE  13:22:10       86.35
4   BANKNIFTY2220339500CE  14:55:40       87.10
8   BANKNIFTY2220340000CE  14:23:11       87.30

If the column elements are not already in a time format, you can convert them:

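# Parse "HH:MM:SS" strings with an explicit format, then keep only the time-of-day part
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M:%S").dt.time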
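A quick check of that conversion, as a minimal sketch: the sample values are df2's times from the question, held in a hypothetical single-column DataFrame named df. With an explicit format, each string becomes a datetime.time object, and those compare chronologically.

```python
import pandas as pd

# Hypothetical sample: df2's times from the question, stored as plain strings
df = pd.DataFrame({"Time": ["12:54:40", "12:53:33", "12:51:50"]})

# The explicit format skips format inference; .dt.time drops the dummy date part
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M:%S").dt.time

print(df["Time"].tolist())
print(df["Time"].iloc[0] > df["Time"].iloc[1])  # 12:54:40 is later than 12:53:33
```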

Alternatively, you can do the conversion directly while reading the file:

# date_parser is deprecated since pandas 2.0; converters is applied to each raw value
df = pd.read_csv(
    filename,
    converters={"Time": lambda s: pd.to_datetime(s, format="%H:%M:%S").time()}
)
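A self-contained sketch of the read-time variant, assuming the data arrives as CSV text. Here an in-memory io.StringIO object containing df2's rows from the question stands in for filename. The converters argument of pd.read_csv calls the given function once for each raw string in the Time column.

```python
import io

import pandas as pd

# Hypothetical CSV content mirroring df2 from the question
csv_text = """tradingsymbol,Time
BANKNIFTY2220339500CE,12:54:40
BANKNIFTY2220340000CE,12:53:33
BANKNIFTY2220340500CE,12:51:50
"""

# Each raw "HH:MM:SS" string is parsed to a Timestamp, then reduced to datetime.time
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={"Time": lambda s: pd.to_datetime(s, format="%H:%M:%S").time()},
)

print(df)
```

For large files, reading Time as text and converting the whole column afterwards with pd.to_datetime(..., format="%H:%M:%S").dt.time is usually faster than a per-value converter.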

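After doing this for both dataframes, one way to filter df1 is to iterate over the rows of df2 and, for each row, drop the rows of df1 that have the same tradingsymbol and an earlier Time: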

for index, row in df2.iterrows():
    df1.drop(
        df1[(df1.tradingsymbol == row["tradingsymbol"]) & (df1.Time < row["Time"])].index,
        inplace=True
    )
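The loop above calls drop once per row of df2. A vectorized alternative, which is a different technique swapped in here rather than the answer's own method, looks up each row's cutoff time with Series.map and filters with a single boolean mask. This sketch makes several assumptions. Time is still the raw "HH:MM:SS" text and is parsed with pd.to_timedelta (time since midnight) instead of being converted to datetime.time. Each tradingsymbol appears at most once in df2. The question's 12:55:60 is not a valid time of day, so it is replaced with 12:55:59.

```python
import pandas as pd

# Sample data reconstructed from the question.
# Assumption: "12:55:60" is not a valid time of day, so "12:55:59" is used instead.
df1 = pd.DataFrame({
    "tradingsymbol": ["BANKNIFTY2220339500CE"] * 5
    + ["BANKNIFTY2220340000CE"] * 4
    + ["BANKNIFTY2220340500CE"] * 4,
    "Time": ["09:20:10", "09:20:10", "12:55:59", "13:22:10", "14:55:40",
             "09:20:13", "09:20:13", "09:20:14", "14:23:11",
             "09:20:15", "09:20:16", "09:20:17", "09:20:18"],
    "last_price": [84.40, 85.95, 84.70, 86.35, 87.10,
                   88.95, 88.80, 88.30, 87.30,
                   90.15, 90.10, 91.05, 90.95],
})
df2 = pd.DataFrame({
    "tradingsymbol": ["BANKNIFTY2220339500CE", "BANKNIFTY2220340000CE",
                      "BANKNIFTY2220340500CE"],
    "Time": ["12:54:40", "12:53:33", "12:51:50"],
})

# Cutoff per symbol as a Timedelta; assumes tradingsymbol is unique in df2
cutoff_by_symbol = pd.to_timedelta(df2.set_index("tradingsymbol")["Time"])

# Align a cutoff to every df1 row; symbols missing from df2 get NaT
cutoff = df1["tradingsymbol"].map(cutoff_by_symbol)
row_time = pd.to_timedelta(df1["Time"])

# Keep rows at or after their symbol's cutoff, plus rows with no cutoff at all
keep = cutoff.isna() | (row_time >= cutoff)
result = df1[keep]

print(result)
```

Symbols with no entry in df2 get a NaT cutoff and are kept, matching the loop, which never touches them. The original index labels survive, so the result holds rows 2, 3, 4 and 8, as in the expected output.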