Convert list of tuples into list of slices to use with np.r_

Question

Working in pandas, I have created a list of tuples representing row ranges around a given set of index points:

mask = df.loc[df['Illustration']=='Example'].index
idxlist = [(i-1,i+10) for i in mask]
idxlist
[(2, 13), (48, 59), (120, 131),...]

I want to call np.r_ using the values in this list of tuples as range slice indices. np.r_ takes input of this form:

df.iloc[np.r_[2:13, 48:59, 120:131,...]]

I can pass my list of tuples through the slice function:

slicelist = [slice(*(idxlist[j])) for j in range(len(idxlist))]

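But slice and np.r_ are not compatible (as far as I know).

So I am looking for either a way to convert the list of tuples into a list of slice ranges, or a way to generate the list of slice ranges with a list comprehension, similar to what I did for idxlist above. I know I could find some very inelegant way to do this, but I am looking for the most pythonic way, preferably without loops. Thanks.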

Answer 1

A pure numpy solution:

import numpy as np
indices = np.concatenate([np.arange(i,j) for i, j in idxlist])
df.iloc[indices]
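
For a quick end-to-end check of this approach, here is a minimal sketch on a made-up DataFrame. The frame size, the "value" column, and the positions of the 'Example' rows are assumptions for illustration, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 200 rows, with 'Example' at positions 3, 49 and 121.
labels = np.full(200, "Other", dtype=object)
labels[[3, 49, 121]] = "Example"
df = pd.DataFrame({"Illustration": labels, "value": np.arange(200)})

mask = df.loc[df["Illustration"] == "Example"].index
idxlist = [(i - 1, i + 10) for i in mask]

# One arange per (start, stop) tuple, joined into a single index array.
indices = np.concatenate([np.arange(i, j) for i, j in idxlist])
subset = df.iloc[indices]
```

Note that iloc is positional while mask holds index labels, so this sketch relies on the default integer index, where labels and positions coincide.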

Answer 2

No need to get too fancy: since the indices follow a pattern, just list all of them.

from itertools import chain

#mask = [3, 49, 121, ...]
m = [*chain.from_iterable([range(i-1, i+10) for i in mask])]

# or simply
m = [x for i in mask for x in range(i-1, i+10)]

# Then
df.iloc[m]
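
If avoiding an explicit Python loop matters, the same fixed pattern can probably be expressed with numpy broadcasting instead. This is a sketch, not part of the original answer; the mask values are the example ones from the comment above, and offsets is a name introduced here.

```python
import numpy as np

mask = [3, 49, 121]  # example positions, as in the comment above

# Offsets -1..9 relative to each position, i.e. range(i-1, i+10).
offsets = np.arange(-1, 10)

# Shape (3, 1) + shape (11,) broadcasts to (3, 11); ravel flattens row by row.
m = (np.asarray(mask)[:, None] + offsets).ravel()
```

This only works because every range has the same length; ranges of differing lengths still need one arange or range per tuple.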

Answer 3

Just wanted to add a different answer. It still needs a loop, but you can stack the numpy arrays obtained from the unpacked ranges.

np.hstack([np.fromiter(range(*i), dtype=int, count=len(range(*i))) for i in idxlist])

Answer 4

In [208]: alist = [(2, 13), (48, 59), (120, 131)]

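r_ uses indexing notation to turn a list of slices into indices (it is actually an instance of a class with a __getitem__ method). The interpreter converts n:m to slice(n,m), and r_ then converts that to arange(n,m).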

In [209]: np.r_[2:13, 48:59, 120:131]
Out[209]: 
array([  2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  48,  49,
        50,  51,  52,  53,  54,  55,  56,  57,  58, 120, 121, 122, 123,
       124, 125, 126, 127, 128, 129, 130])
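
To see what the interpreter actually hands to __getitem__, here is a minimal sketch with a hypothetical class, ShowKey, that simply returns the key it receives. It is not how np.r_ is implemented, only an illustration of the slice-to-arange step described above.

```python
import numpy as np

class ShowKey:
    """Hypothetical helper: returns whatever key the indexing syntax passes in."""
    def __getitem__(self, key):
        return key

show = ShowKey()
key = show[2:13, 48:59]
# key is a tuple of slice objects: (slice(2, 13, None), slice(48, 59, None))

# Expanding each slice with arange and joining reproduces what r_ returns here.
expanded = np.concatenate([np.arange(s.start, s.stop) for s in key])
```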

s_ can take the same input, but produces slice objects:

In [211]: np.s_[2:13, 48:59, 120:131]
Out[211]: (slice(2, 13, None), slice(48, 59, None), slice(120, 131, None))

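Which is the same as the following (apart from being a list rather than a tuple, and it involves the same iteration):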

In [212]: [slice(i,j) for i,j in alist]
Out[212]: [slice(2, 13, None), slice(48, 59, None), slice(120, 131, None)]

Replacing slice with arange:

In [213]: [np.arange(i,j) for i,j in alist]
Out[213]: 
[array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]),
 array([48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58]),
 array([120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130])]

Joining them produces the same result as r_:

In [214]: np.hstack(_)
Out[214]: 
array([  2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  48,  49,
        50,  51,  52,  53,  54,  55,  56,  57,  58, 120, 121, 122, 123,
       124, 125, 126, 127, 128, 129, 130])

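r_ is pretty, but computationally it does the same thing. There is nothing inelegant or un-Pythonic about a list comprehension like this.

Since every range has the same length (11 values), we can also use linspace: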

In [220]: np.linspace((2,48,120),(13,59,131),11,endpoint=False, dtype=int)
Out[220]: 
array([[  2,  48, 120],
       [  3,  49, 121],
       [  4,  50, 122],
       [  5,  51, 123],
       [  6,  52, 124],
       [  7,  53, 125],
       [  8,  54, 126],
       [  9,  55, 127],
       [ 10,  56, 128],
       [ 11,  57, 129],
       [ 12,  58, 130]])
In [221]: np.hstack(_.T)
Out[221]: 
array([  2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  48,  49,
        50,  51,  52,  53,  54,  55,  56,  57,  58, 120, 121, 122, 123,
       124, 125, 126, 127, 128, 129, 130])

You can still use r_ with alist (though using arange is more direct):

In [225]: np.r_.__getitem__(tuple([slice(i,j) for i,j in alist]))
Out[225]: 
array([  2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  48,  49,
        50,  51,  52,  53,  54,  55,  56,  57,  58, 120, 121, 122, 123,
       124, 125, 126, 127, 128, 129, 130])

np.r_ is just a concatenate masquerading as indexing (with some added bells and whistles):

np.r_[tuple([np.arange(i,j) for i,j in alist])]
np.hstack([np.arange(i,j) for i,j in alist])
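
As a sanity check on the equivalences above, the following sketch wraps the arange version in a hypothetical helper, ranges_to_indices (a name introduced here, not a numpy function), and compares it with r_ fed a tuple of slices through ordinary bracket syntax.

```python
import numpy as np

alist = [(2, 13), (48, 59), (120, 131)]

def ranges_to_indices(pairs):
    """Hypothetical helper: expand (start, stop) pairs into one index array."""
    return np.concatenate([np.arange(i, j) for i, j in pairs])

via_arange = ranges_to_indices(alist)
# Passing a tuple of slices inside the brackets is the same as calling __getitem__.
via_r = np.r_[tuple(slice(i, j) for i, j in alist)]

same = np.array_equal(via_arange, via_r)
```

One caveat with the helper as written: np.concatenate raises an error on an empty list, so an empty pairs argument would need its own guard.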

Answer 5

I like:

import numpy as np

alist = [(2, 13), (48, 59), (120, 131)]
# One arange per (start, stop) tuple
results = [np.arange(x[0], x[1]) for x in alist]
print(results)

Answer 6

You can also use the tuples to slice the data directly:

import numpy as np

idxlist = [(2, 13), (48, 59), (120, 131)]
data = [x for x in np.arange(131)]

# Slice the data with each (start, stop) tuple in turn
for tupleSlice in idxlist:
    print(data[tupleSlice[0]:tupleSlice[1]])
