How to summarize min and max of two separate columns in one column
I have bond market data like this:
Id row Date BuyPrice SellPrice
1 1 2017-10-30 94520 0
1 2 2017-10-30 94538 0
1 3 2017-10-30 94609 0
1 4 2017-10-30 94615 0
1 5 2017-10-30 94617 0
1 1 2017-09-20 99100 99059
1 1 2017-09-20 98100 99090
2 1 2010-11-01 99890 100000
2 2 2010-11-01 99899 100000
2 3 2010-11-01 99901 99899
2 4 2010-11-01 99920 99850
2 5 2010-11-01 99933 99848
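I want to select the lowest sell price and the highest buy price for each Id and compute their difference, but if either the lowest sell price or the highest buy price is zero, I want to treat that as an exception and drop that date.
I also want to give each Id an index by date, meaning the first day gets 1, the second day gets 2, and so on.
The final data should look like this: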
Id Date highest buy price lowest sell price NBBO (highest buy price - lowest sell price) Index
1 2017-10-30 94520 0 NaN 1
1 2017-09-20 99100 99059 41 2
2 2017-11-01 99890 99848 42 1
You can use groupby and aggregate with min and max first, then numpy.where to set NaNs by condition. Finally, use cumcount:
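import numpy as np

# highest buy price and lowest sell price per Id and Date, in order of appearance
df = df.groupby(['Id','Date'], sort=False).agg({'BuyPrice':'max','SellPrice':'min'})
# NaN where either aggregated price is 0, otherwise the difference
df['NBBO'] = np.where(df[['BuyPrice', 'SellPrice']].eq(0).any(axis=1),
                      np.nan,
                      df['BuyPrice'] - df['SellPrice'])
# running counter of dates within each Id
df['index'] = df.groupby(level=0).cumcount() + 1
d = {'BuyPrice':'highest buy price','SellPrice':'lowest sell price'}
df = df.reset_index().rename(columns=d)
print(df)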
Id Date highest buy price lowest sell price NBBO index
0 1 2017-10-30 94617 0 NaN 1
1 1 2017-09-20 99100 99059 41.0 2
2 2 2010-11-01 99933 99848 85.0 1
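For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same steps. The sample rows are copied from the question; the names `raw`, `summarize` and `result` are only illustrative and not part of the original answer. It passes `axis=1` by keyword, since recent pandas versions no longer accept the axis as a positional argument to `any`.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
raw = pd.DataFrame({
    "Id": [1] * 7 + [2] * 5,
    "row": [1, 2, 3, 4, 5, 1, 1, 1, 2, 3, 4, 5],
    "Date": ["2017-10-30"] * 5 + ["2017-09-20"] * 2 + ["2010-11-01"] * 5,
    "BuyPrice": [94520, 94538, 94609, 94615, 94617, 99100, 98100,
                 99890, 99899, 99901, 99920, 99933],
    "SellPrice": [0, 0, 0, 0, 0, 99059, 99090,
                  100000, 100000, 99899, 99850, 99848],
})


def summarize(frame):
    """Hypothetical helper wrapping the steps from the answer."""
    # Highest buy and lowest sell price per (Id, Date), in order of appearance.
    out = frame.groupby(["Id", "Date"], sort=False).agg(
        {"BuyPrice": "max", "SellPrice": "min"}
    )
    # True for groups where either aggregated price is zero.
    has_zero = out[["BuyPrice", "SellPrice"]].eq(0).any(axis=1)
    # NaN for those groups, otherwise highest buy minus lowest sell.
    out["NBBO"] = np.where(has_zero, np.nan, out["BuyPrice"] - out["SellPrice"])
    # Running counter of dates within each Id, starting at 1.
    out["index"] = out.groupby(level=0).cumcount() + 1
    return out.reset_index()


result = summarize(raw)
print(result)
```

Note that with `sort=False` the counter follows the order in which dates appear in the input, not chronological order.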
Details:
# compare with 0; eq is the same as ==
print (df[['BuyPrice', 'SellPrice']].eq(0))
BuyPrice SellPrice
Id Date
1 2017-10-30 False True
2017-09-20 False False
2 2010-11-01 False False
# get at least one True per row with any(axis=1)
print (df[['BuyPrice', 'SellPrice']].eq(0).any(axis=1))
Id Date
1 2017-10-30 True
2017-09-20 False
2 2010-11-01 False
dtype: bool
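The question also mentions dropping a date entirely when one of the prices is zero, rather than keeping it with NaN. A possible variant is sketched below, under two assumptions that the question does not settle: the zero-price dates are removed before numbering, and the numbering follows chronological order. The reduced sample and the names `raw`, `agg` and `kept` are illustrative only.

```python
import pandas as pd

# Reduced sample in the same shape as the question's data (illustrative only).
raw = pd.DataFrame({
    "Id": [1, 1, 1, 1, 2, 2],
    "Date": pd.to_datetime([
        "2017-10-30", "2017-10-30", "2017-09-20",
        "2017-09-20", "2010-11-01", "2010-11-01",
    ]),
    "BuyPrice": [94520, 94617, 99100, 98100, 99890, 99933],
    "SellPrice": [0, 0, 99059, 99090, 100000, 99848],
})

# Highest buy and lowest sell price per (Id, Date).
agg = raw.groupby(["Id", "Date"], as_index=False).agg(
    BuyPrice=("BuyPrice", "max"),
    SellPrice=("SellPrice", "min"),
)

# Keep only the dates where neither aggregated price is zero.
kept = agg[agg[["BuyPrice", "SellPrice"]].ne(0).all(axis=1)].copy()
kept["NBBO"] = kept["BuyPrice"] - kept["SellPrice"]

# Number the remaining dates chronologically within each Id, starting at 1.
kept = kept.sort_values(["Id", "Date"])
kept["index"] = kept.groupby("Id").cumcount() + 1
kept = kept.reset_index(drop=True)
print(kept)
```

If the numbering should instead count the dropped dates too, compute `index` on `agg` before filtering.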