Python: How to get the position of a pandas.Series element where a condition holds?
I have two numpy.ndarrays, x and y:
>>> import numpy as np
>>> import pandas as pd
>>> x = np.ndarray(shape=(10,), buffer=np.array([0.9902, 0.9394, 0.839, 0.8574, 0.9174, 0.8742, 0.8955, 0.9196, 0.9388, 0.9602]), dtype=float)
[0.9902 0.9394 0.839 0.8574 0.9174 0.8742 0.8955 0.9196 0.9388 0.9602]
>>> y = np.ndarray(shape=(10,), buffer=np.array([0.956, 0.884, 0.875, 0.880, 0.865, 0.870, 0.861, 0.817, 0.771, 0.727]), dtype=float)
[0.956, 0.884, 0.875, 0.880, 0.865, 0.870, 0.861, 0.817, 0.771, 0.727]
and a Series, edge_or_not:
>>> d = {'2020-03-17 04:39:00+03:00': 0,
'2020-03-17 04:40:00+03:00': 1,
'2020-03-17 04:41:00+03:00': 0,
'2020-03-17 04:42:00+03:00': -1,
'2020-03-17 04:43:00+03:00': 0,
'2020-03-17 04:44:00+03:00': 0,
'2020-03-17 04:45:00+03:00': 1,
'2020-03-17 04:46:00+03:00': -1,
'2020-03-17 04:47:00+03:00': -1,
'2020-03-17 04:48:00+03:00': -1}
>>> edge_or_not = pd.Series(data=d)
2020-03-17 04:39:00+03:00 0
2020-03-17 04:40:00+03:00 1
2020-03-17 04:41:00+03:00 0
2020-03-17 04:42:00+03:00 -1
2020-03-17 04:43:00+03:00 0
2020-03-17 04:44:00+03:00 0
2020-03-17 04:45:00+03:00 1
2020-03-17 04:46:00+03:00 -1
2020-03-17 04:47:00+03:00 -1
2020-03-17 04:48:00+03:00 -1
dtype: int64
I get up_edge_x, up_edge_y, down_edge_x and down_edge_y as follows:
>>> up_edge_x = x[edge_or_not > 0]
array([0.9394, 0.8955])
>>> up_edge_y = y[edge_or_not > 0]
array([0.884, 0.861])
>>> down_edge_x = x[edge_or_not < 0]
array([0.8574, 0.9196, 0.9388, 0.9602])
>>> down_edge_y = y[edge_or_not < 0]
array([0.88 , 0.817, 0.771, 0.727])
and all_edges_x, all_edges_y:
>>> all_edges_x = x[edge_or_not != 0]
array([0.9394, 0.8574, 0.8955, 0.9196, 0.9388, 0.9602])
>>> all_edges_y = y[edge_or_not != 0]
array([0.884, 0.88 , 0.861, 0.817, 0.771, 0.727])
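Then I create the DataFrames (the "(pos)" column only annotates each row's position; it is not part of the frame):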
>>> up_edge = pd.DataFrame({'y':up_edge_y}, index=up_edge_x)
y (pos)
0.9394 0.884 0
0.8955 0.861 1
>>> down_edge = pd.DataFrame({'y':down_edge_y}, index=down_edge_x)
y (pos)
0.8574 0.880 0
0.9196 0.817 1
0.9388 0.771 2
0.9602 0.727 3
I just need to create the all_edges DataFrame, which will contain 3 columns: 'y', 'edge' and 'pos':
>>> all_edges = pd.DataFrame({'y':all_edges_y, 'edge':edge_or_not[edge_or_not != 0].to_numpy(),
'pos':???},
index=all_edges_x)
So the all_edges DataFrame must look like this:
y edge pos
0.9394 0.884 1 0
0.8574 0.880 -1 0
0.8955 0.861 1 1
0.9196 0.817 -1 1
0.9388 0.771 -1 2
0.9602 0.727 -1 3
How do I compute the third column, pos, so that I can link from the up_edge and down_edge DataFrames to all_edges, as in the silly example below:
>>> down_x1 = 0.9602
>>> loc = down_edge.index.get_loc(down_x1)
>>> edges = all_edges.loc[all_edges['pos']==loc]['edge']
>>> print(edges)
0.9602 -1
Name: edge, dtype: int64
I also have a second question: how do I get an array of positions from another DataFrame? Something like this:
>>> locations = down_edge.index.get_loc(#mb all indexes)
[0, 1, 2, 3]
Use:
up_edge_x = x[edge_or_not > 0]
up_edge_y = y[edge_or_not > 0]
down_edge_x = x[edge_or_not < 0]
down_edge_y = y[edge_or_not < 0]
all_edges_x = x[edge_or_not != 0]
all_edges_y = y[edge_or_not != 0]
First create Series holding a range of positions, indexed by up_edge_x and down_edge_x:
up_edge = pd.Series(range(len(up_edge_x)), index=up_edge_x, name='pos')
down_edge = pd.Series(range(len(down_edge_x)), index=down_edge_x, name='pos')
print (up_edge)
0.9394 0
0.8955 1
Name: pos, dtype: int64
print (down_edge)
0.8574 0
0.9196 1
0.9388 2
0.9602 3
Name: pos, dtype: int64
Then concatenate them:
pos = pd.concat([up_edge, down_edge])
print (pos)
0.9394 0
0.8955 1
0.8574 0
0.9196 1
0.9388 2
0.9602 3
Name: pos, dtype: int64
Finally, map them to the new column:
all_edges = pd.DataFrame({'y':all_edges_y,
'edge':edge_or_not[edge_or_not != 0].to_numpy(),
'pos': pd.Index(all_edges_x).map(pos)},
index=all_edges_x)
print (all_edges)
y edge pos
0.9394 0.884 1 0
0.8574 0.880 -1 0
0.8955 0.861 1 1
0.9196 0.817 -1 1
0.9388 0.771 -1 2
0.9602 0.727 -1 3
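The second question (an array of positions for many index labels at once) is not covered by the steps above. One possible sketch, assuming the index labels are unique, uses `Index.get_indexer`, which returns the integer position of each requested label and -1 for labels that are absent. The `down_edge` frame below is rebuilt from the question's values so the snippet runs on its own; the name `some` is only illustrative.

```python
import pandas as pd

# Down-edge data from the question: index = x values, column = y values.
down_edge = pd.DataFrame(
    {"y": [0.880, 0.817, 0.771, 0.727]},
    index=[0.8574, 0.9196, 0.9388, 0.9602],
)

# Positions of all index labels (assumes the labels are unique).
locations = down_edge.index.get_indexer(down_edge.index)
print(locations.tolist())  # [0, 1, 2, 3]

# Positions of an arbitrary subset of labels, in the order requested.
some = down_edge.index.get_indexer([0.9602, 0.8574])
print(some.tolist())  # [3, 0]
```

If only the full list of positions is needed, `range(len(down_edge))` gives the same result without any lookup.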
I think I don't need to hold on to up_edge and down_edge at all. With the all_edges DataFrame pre-created like this:
>>> all_edges = pd.DataFrame({'y':all_edges_y, 'edge':edge_or_not[edge_or_not != 0].to_numpy()}, index=all_edges_x)
I can proceed as follows:
>>> all_edges['pos'] = all_edges.groupby(all_edges['edge']).cumcount()
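For reference, the `groupby(...).cumcount()` approach can be sketched end to end as below. This is a minimal, self-contained version that uses a plain NumPy array `edge` in place of the `edge_or_not` Series (an assumption made only to keep the snippet short); the data are the values from the question.

```python
import numpy as np
import pandas as pd

x = np.array([0.9902, 0.9394, 0.839, 0.8574, 0.9174,
              0.8742, 0.8955, 0.9196, 0.9388, 0.9602])
y = np.array([0.956, 0.884, 0.875, 0.880, 0.865,
              0.870, 0.861, 0.817, 0.771, 0.727])
# Values of edge_or_not, without the timestamp index.
edge = np.array([0, 1, 0, -1, 0, 0, 1, -1, -1, -1])

# Keep only the rows that are edges.
mask = edge != 0
all_edges = pd.DataFrame({"y": y[mask], "edge": edge[mask]}, index=x[mask])

# cumcount numbers the rows within each 'edge' group (0, 1, 2, ...)
# in their original order, which is the position inside up_edge / down_edge.
all_edges["pos"] = all_edges.groupby("edge").cumcount()
print(all_edges["pos"].tolist())  # [0, 0, 1, 1, 2, 3]

# Linking back: a position is only unique together with the edge sign,
# so filter on both columns when looking up a down edge.
loc = 1  # position of x = 0.9196 among the down edges
row = all_edges[(all_edges["edge"] == -1) & (all_edges["pos"] == loc)]
print(row.index.tolist())  # [0.9196]
```

Because `pos` restarts at 0 for each edge sign, filtering on `pos` alone can return both an up edge and a down edge.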