How to summarize the min and max of two separate columns in one column

I have bond market data like this:

Id   row      Date       BuyPrice    SellPrice
1    1      2017-10-30    94520       0
1    2      2017-10-30    94538       0
1    3      2017-10-30    94609       0
1    4      2017-10-30    94615       0
1    5      2017-10-30    94617       0
1    1      2017-09-20    99100       99059
1    1      2017-09-20    98100       99090
2    1      2010-11-01    99890       100000
2    2      2010-11-01    99899       100000
2    3      2010-11-01    99901       99899
2    4      2010-11-01    99920       99850
2    5      2010-11-01    99933       99848

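For each Id and date, I want to select the lowest sell price and the highest buy price and compute their difference. If either the lowest sell price or the highest buy price is zero, I want to treat that date as an exception and drop it.

I also want to give each Id an index by date: the first day gets 1, the second day gets 2, and so on.

The final data should look like this: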

Id    Date          highest buy price    lowest sell price    NBBO (highest buy price - lowest sell price)    Index

1     2017-10-30    94520                0                    NaN                                             1
1     2017-09-20    99100                99059                41                                              2
2     2017-11-01    99890                99848                42                                              1

You can use groupby and aggregate max and min first, then numpy.where to set NaN where either price is 0. Finally, use cumcount for the index:

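import numpy as np

# Highest buy price and lowest sell price per Id and Date (sort=False keeps the original order)
df = df.groupby(['Id','Date'], sort=False).agg({'BuyPrice':'max','SellPrice':'min'})
# NaN if either aggregated price is 0, otherwise the difference
df['NBBO'] = np.where(df[['BuyPrice', 'SellPrice']].eq(0).any(axis=1),
                      np.nan,
                      df['BuyPrice'] - df['SellPrice'])
# Number the dates within each Id, starting at 1
df['index'] = df.groupby(level=0).cumcount() + 1

# Rename the columns to match the desired output
d = {'BuyPrice':'highest buy price','SellPrice':'lowest sell price'}
df = df.reset_index().rename(columns=d)
print(df)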

   Id        Date  highest buy price  lowest sell price  NBBO  index
0   1  2017-10-30              94617                  0   NaN      1
1   1  2017-09-20              99100              99059  41.0      2
2   2  2010-11-01              99933              99848  85.0      1
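
To try the steps above end to end, here is a minimal self-contained sketch. It rebuilds the sample rows from the question by hand (the DataFrame construction and the names `out` and `has_zero` are mine, not part of the original post) and assumes a recent pandas where `any` needs `axis=1` as a keyword.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    'Id': [1] * 7 + [2] * 5,
    'row': [1, 2, 3, 4, 5, 1, 1, 1, 2, 3, 4, 5],
    'Date': ['2017-10-30'] * 5 + ['2017-09-20'] * 2 + ['2010-11-01'] * 5,
    'BuyPrice': [94520, 94538, 94609, 94615, 94617, 99100, 98100,
                 99890, 99899, 99901, 99920, 99933],
    'SellPrice': [0, 0, 0, 0, 0, 99059, 99090,
                  100000, 100000, 99899, 99850, 99848],
})

# Highest buy and lowest sell per (Id, Date), in order of first appearance
out = df.groupby(['Id', 'Date'], sort=False).agg({'BuyPrice': 'max', 'SellPrice': 'min'})

# Flag groups where either aggregated price is 0
has_zero = out[['BuyPrice', 'SellPrice']].eq(0).any(axis=1)
out['NBBO'] = np.where(has_zero, np.nan, out['BuyPrice'] - out['SellPrice'])

# Running counter of dates within each Id, starting at 1
out['index'] = out.groupby(level=0).cumcount() + 1
out = out.reset_index()
print(out)
```

Note that cumcount numbers the dates in the order the groups appear (because of sort=False), not in calendar order. If chronological numbering is wanted, sort by Date before grouping.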

Details:

# compare with 0 (eq is the same as ==); df here is the grouped frame before reset_index and rename
print (df[['BuyPrice', 'SellPrice']].eq(0))
               BuyPrice  SellPrice
Id Date                           
1  2017-10-30     False       True
   2017-09-20     False      False
2  2010-11-01     False      False

# True for rows with at least one True, via any(axis=1)
print (df[['BuyPrice', 'SellPrice']].eq(0).any(axis=1))
Id  Date      
1   2017-10-30     True
    2017-09-20    False
2   2010-11-01    False
dtype: bool
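
The question also mentions dropping dates where a price is zero, while its expected output keeps them with NaN. If dropping is what is wanted, the same boolean mask can filter the rows before numbering. This is only a sketch of that reading: the frame `best` below is typed in by hand from the aggregated values shown above, and the names `best`, `has_zero` and `kept` are hypothetical.

```python
import pandas as pd

# Aggregated prices per (Id, Date), entered by hand from the output above
best = pd.DataFrame(
    {'BuyPrice': [94617, 99100, 99933], 'SellPrice': [0, 99059, 99848]},
    index=pd.MultiIndex.from_tuples(
        [(1, '2017-10-30'), (1, '2017-09-20'), (2, '2010-11-01')],
        names=['Id', 'Date']),
)

# Same mask as in the details: True where either price is 0
has_zero = best.eq(0).any(axis=1)

# Keep only the dates without a zero price, then compute the difference
kept = best[~has_zero].copy()
kept['NBBO'] = kept['BuyPrice'] - kept['SellPrice']

# Number the remaining dates within each Id, starting at 1
kept['index'] = kept.groupby(level='Id').cumcount() + 1
print(kept.reset_index())
```

With this variant the index is assigned after the filter, so the remaining date for Id 1 gets 1 rather than 2.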