Creating a "slice notation" style list from a set of numbers in Python

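Suppose I have a set of about 100,000 distinct numbers. Some are sequential, some are not.

To demonstrate the problem, a small subset of these numbers might be: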

(a) {1,2,3,4,5,6,7,8,9,11,13,15,45,46,47,3467}

An efficient way to write this subset is as follows:

(b) 1:9:1,11:15:2,45:47:1,3467

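This is effectively an extended version of Python's and MATLAB's slice notation, written here as start:stop:step with an inclusive stop.

My question is: how do I efficiently obtain a list in the latter notation in Python from a list of the former type?

That is, given (a), how do I efficiently get (b) in Python?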

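Disclaimer: I misread the question and thought you wanted to go from the slice notation to the set version. This does not actually answer your question, but I figured it was worth leaving posted. numpy.r_ also seems to do the same (or at least a very similar) thing.

First, note that if you are using Python 3.5+ (PEP 448), you have the option of using * unpacking inside a set literal: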

>>> {*range(1,9), *range(11,15,2), *range(45,47), 3467}
{1, 2, 3, 4, 5, 6, 7, 8, 11, 3467, 13, 45, 46}
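Because range() excludes its stop value, the one-liner above yields 1 through 8, 11 and 13, and 45 and 46, which is not quite set (a). A minimal sketch of the adjustment, assuming the goal is to reproduce (a) exactly (the names expected and rebuilt are only illustrative):

```python
# Set (a) from the question, written out literally.
expected = {1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 15, 45, 46, 47, 3467}

# range() stops *before* its second argument, so each inclusive
# bound from (b) has to be pushed past the last wanted value.
rebuilt = {*range(1, 10), *range(11, 16, 2), *range(45, 48), 3467}

assert rebuilt == expected
```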

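Otherwise, notation such as 11:15:2 is only valid when subscripting an object (it reaches __getitem__ or __setitem__ as a slice object), so you just need to set up an object that generates your set: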

def slice_to_range(slice_obj):
    assert isinstance(slice_obj, slice)
    assert slice_obj.stop is not None, "cannot have stop of None"
    start = slice_obj.start or 0
    stop = slice_obj.stop
    step = slice_obj.step or 1
    return range(start,stop,step)

class Slice_Set_Creator:
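    def __getitem__(self,item):
        # obj[a:b] passes one slice (or number); obj[a:b, c] passes a tuple
        if not isinstance(item, tuple):
            item = (item,)
        my_set = set()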
        for part in item:
            if isinstance(part,slice):
                my_set.update(slice_to_range(part))
            else:
                my_set.add(part)
        return my_set

slice_set_creator = Slice_Set_Creator()

desired_set = slice_set_creator[1:9:1,11:15:2,45:47:1,3467]

>>> desired_set
{1, 2, 3, 4, 5, 6, 7, 8, 11, 3467, 13, 45, 46}
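If the notation arrives as text, as in (b), rather than as Python subscript syntax, a small parser can do the same expansion. The sketch below is an illustration and not part of the original answer: parse_inclusive_slices is a hypothetical helper, and it assumes parts of the form start:stop or start:stop:step with an inclusive stop and a positive step, which is how (b) is written.

```python
def parse_inclusive_slices(text):
    """Expand a string such as '1:9:1,11:15:2,3467' into a set of ints.

    Assumption: the stop of every slice part is inclusive and the
    step (default 1) is positive.
    """
    numbers = set()
    for part in text.split(","):
        fields = [int(field) for field in part.split(":")]
        if len(fields) == 1:
            # A bare number stands for itself.
            numbers.add(fields[0])
        elif len(fields) in (2, 3):
            start, stop = fields[0], fields[1]
            step = fields[2] if len(fields) == 3 else 1
            # +1 turns the inclusive stop into range()'s exclusive stop.
            numbers.update(range(start, stop + 1, step))
        else:
            raise ValueError("cannot parse {!r}".format(part))
    return numbers


parsed = parse_inclusive_slices("1:9:1,11:15:2,45:47:1,3467")
```

With the string from (b), parsed should equal set (a), including 9, 15 and 47.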

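I think I've got it, but the following code has not been thoroughly tested and may contain bugs.

Basically, get_partial_slices tries to build partial_slice objects; when the next number in the (sorted) set does not .fit() into the current slice, that slice is .end()ed and the next one is started.

If a slice contains only 1 item (or 2 items with step != 1), it is represented as separate numbers rather than as a slice (hence the need for yield from current.end(), since ending a slice may produce two numbers instead of one slice).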

class partial_slice:
    """heavily relied on by get_partial_slices
This attempts to create a slice from repeatedly adding numbers
once a number that doesn't fit the slice is found use .end()
to generate either the slice or the individual numbers"""
    def __init__(self, n):
        self.start = n
        self.stop = None
        self.step = None
    def fit(self,n):
        "returns True if n fits as the next element of the slice (or False if it does not)"
        if self.step is None:
            return True #always take the second element into consideration
        elif self.stop == n:
            return True #n fits perfectly with current stop value
        else:
            return False

    def add(self, n):
        """adds a number to the end of the slice, 
    will raise a ValueError if the number doesn't fit"""
        if not self.fit(n):
            raise ValueError("{} does not fit into the slice".format(n))
        if self.step is None:
            self.step = n - self.start
        self.stop = n+self.step

    def to_slice(self):
        "return slice(self.start, self.stop, self.step)"
        return slice(self.start, self.stop, self.step)
    def end(self):
        "generates at most 3 items, may split up small slices"
        if self.step is None:
            yield self.start
            return
        length = (self.stop - self.start)//self.step
        if length>2:
            #always keep slices that contain more than 2 items
            yield self.to_slice()
            return 
        elif self.step==1 and length==2:
            yield self.to_slice()
            return
        else:
            yield self.start
            yield self.stop - self.step


def get_partial_slices(set_):
    data = iter(sorted(set_))
    current = partial_slice(next(data))
    for n in data:
        if current.fit(n):
            current.add(n)
        else:
            yield from current.end()
            current = partial_slice(n)
    yield from current.end()


test_case = {1,2,3,4,5,6,7,8,9,11,13,15,45,46,47,3467}
result = tuple(get_partial_slices(test_case))

#slice_set_creator is from my other answer,
#this will verify that the result was the same as the test case.
assert test_case == slice_set_creator[result] 

def slice_formatter(obj):
    if isinstance(obj,slice):
        # actual slice objects, like all indexing in python, do not include the stop value.
        # The stop is adjusted only when printing, not when the slice is created, so the
        # slice objects can still be used in code if you want (like with slice_set_creator)
        inclusive_stop = obj.stop - obj.step
        return "{0.start}:{stop}:{0.step}".format(obj, stop=inclusive_stop)
    else:
        return repr(obj)

print(", ".join(map(slice_formatter,result)))
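For comparison, the same greedy idea can be sketched as one self-contained function that returns the string form directly. This is a sketch rather than a tested replacement for the code above: to_slice_notation is a hypothetical name, the stop is written inclusively as in (b), and a two-item run with step != 1 is handled slightly differently (only the first number is emitted, so the second one may start a new run).

```python
def to_slice_notation(numbers):
    """Greedily group sorted numbers into inclusive start:stop:step runs."""
    values = sorted(set(numbers))
    parts = []
    i = 0
    while i < len(values):
        j = i          # index of the last value in the current run
        step = None
        if i + 1 < len(values):
            # The gap to the next value fixes the step of this run.
            step = values[i + 1] - values[i]
            j = i + 1
            while j + 1 < len(values) and values[j + 1] - values[j] == step:
                j += 1
        run_length = j - i + 1
        if run_length >= 3 or (run_length == 2 and step == 1):
            parts.append("{}:{}:{}".format(values[i], values[j], step))
            i = j + 1
        else:
            # Too short to be worth a slice: emit one number and let
            # the next value try to start a run of its own.
            parts.append(str(values[i]))
            i += 1
    return ",".join(parts)


notation = to_slice_notation(
    {1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 15, 45, 46, 47, 3467}
)
```

For set (a) this produces the string given in (b). After the initial sort the scan is linear, so sorting dominates the cost for a set of around 100,000 numbers.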

The simplest way is to use numpy's r_[] syntax. So for your example, it is just:

>>> from numpy import r_
>>>
>>> a = r_[1:10, 11:17:2, 45:48, 3467]

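Remember that Python slices do not include the last number, and that a step of 1 (x:y:1) is implied. This approach will not be as fast in production code as a more elaborate solution, but it is well suited to interactive use.

You can see that this gives you a numpy array containing the numbers you want: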

>>> print(a)
[   1    2    3    4    5    6    7    8    9   11   13   15   45   46   47
 3467]