Pandas find value in interval
In pandas, suppose I have transaction data in a data frame (transdf) like this:
OrderId, ShippmentSegmentsDays
1 , 1
2 , 3
3 , 4
4 , 10
I also have another df (segmentdf) that specifies the intervals:
ShippmentSegmentDaysStart , ShippmentSegmentDaysEnd , ShippmentSegment
-9999999 , 0 , 'On-Time'
0 , 1 , '1 day late'
1 , 2 , '2 days late'
2 , 3 , '3 days late'
3 , 9999999 , '>3 days late'
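I need to add one more column, based on "ShippmentSegmentsDays" and "ShippmentSegment". So basically, for each row in "transdf", I need to take the "ShippmentSegmentsDays" value and find the interval in "segmentdf" that it falls into.
So "transdf" should end up looking like this: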
OrderId, ShippmentSegmentsDays, ShippmentSegment
1 , 1 , '1 day late'
2 , 0 , 'On-Time'
3 , 4 , '>3 days late'
4 , 10 , '>3 days late'
Can anyone tell me how to handle this case?
Thanks!
Stefan
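If you know that the rules set in segmentdf are static and won't change, you can use pandas' apply (here, Series.apply) to run a function on the "ShippmentSegmentsDays" value of each row in the transdf data frame. Perhaps the following snippet will help. I haven't tested it, so be careful, but I think it should get you started in the right direction.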
# Utility function encoding the rules set in the 'segmentdf' data frame.
# It has to be defined before it is passed to apply().
def calc_ship_segment(num):
    if num <= 0:
        return 'On-Time'
    elif num <= 1:
        return '1 day late'
    elif num <= 2:
        return '2 days late'
    elif num <= 3:
        return '3 days late'
    else:
        return '>3 days late'

# Take just the data from the 'ShippmentSegmentsDays' column as a Series.
seg_days = transdf['ShippmentSegmentsDays']
# Create a new column, 'ShippmentSegment', in the 'transdf' data frame by
# calling the utility function on each value of the Series above.
# (Series.apply takes no axis argument; it works element by element.)
transdf['ShippmentSegment'] = seg_days.apply(calc_ship_segment)
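If the rules really are static, a vectorised alternative to the apply approach might be pandas.cut. The following is only a sketch: the bin edges and labels are copied by hand from segmentdf, the day values are the ones shown in the asker's expected output, and the intervals are assumed to be closed on the right (so 0 counts as 'On-Time').

```python
import pandas as pd

# Sample data, using the day values from the expected output in the question.
transdf = pd.DataFrame({
    "OrderId": [1, 2, 3, 4],
    "ShippmentSegmentsDays": [1, 0, 4, 10],
})

# Assumption: edges and labels are copied manually from segmentdf.
edges = [-9999999, 0, 1, 2, 3, 9999999]
labels = ["On-Time", "1 day late", "2 days late", "3 days late", ">3 days late"]

# right=True (the default) makes every bin (start, end].
transdf["ShippmentSegment"] = pd.cut(
    transdf["ShippmentSegmentsDays"], bins=edges, labels=labels
)

print(transdf)
```

Note that pd.cut returns a categorical column; call .astype(str) on it if plain strings are needed.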
Old post, but I ran into the same problem. Pandas provides an Interval feature that worked for me.
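The answer above does not show code, so here is one way the interval approach might look. It is a sketch rather than the poster's actual solution: it builds a pandas.IntervalIndex from segmentdf and looks each value up with get_indexer. The right-closed intervals (closed="right") are an assumption inferred from the expected output, where 0 maps to 'On-Time' and 1 maps to '1 day late'.

```python
import pandas as pd

# Sample data, using the day values from the expected output in the question.
transdf = pd.DataFrame({
    "OrderId": [1, 2, 3, 4],
    "ShippmentSegmentsDays": [1, 0, 4, 10],
})
segmentdf = pd.DataFrame({
    "ShippmentSegmentDaysStart": [-9999999, 0, 1, 2, 3],
    "ShippmentSegmentDaysEnd": [0, 1, 2, 3, 9999999],
    "ShippmentSegment": ["On-Time", "1 day late", "2 days late",
                         "3 days late", ">3 days late"],
})

# Build one interval (start, end] per row of segmentdf.
intervals = pd.IntervalIndex.from_arrays(
    segmentdf["ShippmentSegmentDaysStart"],
    segmentdf["ShippmentSegmentDaysEnd"],
    closed="right",  # assumption: the end of each interval is inclusive
)

# For each value, get the position of the interval containing it (-1 if none).
pos = intervals.get_indexer(transdf["ShippmentSegmentsDays"].to_numpy())
if (pos == -1).any():
    raise ValueError("some values fall outside every interval")

# Pick the matching label for every row of transdf.
transdf["ShippmentSegment"] = segmentdf["ShippmentSegment"].to_numpy()[pos]

print(transdf)
```

Unlike the hard-coded function, this version reads the rules from segmentdf, so it keeps working if the intervals change, as long as they do not overlap.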