Create a permutation with the same autocorrelation

Question

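My question is similar to another question, with the difference that I need an array of zeros and ones as output. I have an original time series of zeros and ones with high autocorrelation (i.e., the ones are clustered). For some significance testing I need to create random arrays with the same number of zeros and ones, i.e. permutations of the original array. However, the autocorrelation should also stay the same as, or similar to, that of the original array, so a simple np.random.permutation does not help me.

Since I am doing multiple realizations, I need a solution that is as fast as possible. Any help is much appreciated.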

Answer 1

Based on the question you referred to, you would like to permute x such that

np.corrcoef(x[0: len(x) - 1], x[1: ])[0][1]

is unchanged.

Say the sequence x is composed of

z₁o₁z₂o₂z₃o₃...z_ko_k,

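where each z_i is a run of 0s and each o_i is a run of 1s. (There are four cases, depending on whether the sequence starts with 0s or 1s and whether it ends with 0s or 1s, but they are all the same in principle.)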
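As a small illustration of this decomposition (a sketch using the standard library's itertools.groupby, not the approach used in the rest of this answer), the runs of a short sequence can be listed as (value, length) pairs:

```python
from itertools import groupby

x = [0, 0, 0, 1, 1, 0, 1]

# groupby yields one group per maximal block of equal consecutive values,
# so each (value, length) pair below is one run z_i or o_i.
runs = [(value, len(list(group))) for value, group in groupby(x)]
print(runs)  # [(0, 3), (1, 2), (0, 1), (1, 1)]
```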

Suppose p and q are each permutations of {1, ..., k}, and consider the sequence

z_p[1]o_q[1]z_p[2]o_q[2]z_p[3]o_q[3] ... z_p[k] o_q[k],

that is, the runs of 0s have been permuted among themselves, and likewise the runs of 1s.

For example, suppose the original sequence is

0, 0, 0, 1, 1, 0, 1.

Then

0, 0, 0, 1, 0, 1, 1,

is such a permutation, as are

0, 1, 1, 0, 0, 0, 1,

and

0, 1, 0, 0, 0, 1, 1.

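Performing this permutation does not change the correlation:

within each run, the differences between consecutive elements are the same
the boundaries between runs are the same as before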
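The claim can be checked on the small example above. The following is a minimal sketch; the helper names lag1_corr and pair_counts are introduced here for illustration only. The lag-1 correlation depends only on how often each adjacent pair (0,0), (0,1), (1,0), (1,1) occurs, and those counts are identical for all four sequences.

```python
from collections import Counter

import numpy as np

def lag1_corr(x):
    """Lag-1 correlation, the same expression used earlier in this answer."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]

def pair_counts(x):
    """Count how often each adjacent pair of values occurs."""
    return Counter(zip(x[:-1], x[1:]))

original = [0, 0, 0, 1, 1, 0, 1]
permuted = [
    [0, 0, 0, 1, 0, 1, 1],
    [0, 1, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1, 1],
]

for p in permuted:
    # Same adjacent-pair counts, hence the same lag-1 correlation.
    print(pair_counts(p) == pair_counts(original),
          np.isclose(lag1_corr(p), lag1_corr(original)))
```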

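This therefore gives a way to generate permutations that do not affect the correlation. (See also, at the end, another far simpler and more efficient method that works in many common cases.)

We start with the function preprocess, which takes the sequence and returns a tuple starts_with_zero, zeros, ones, indicating, respectively,

whether x starts with 0
the runs of 0s
the runs of 1s

In code, this is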

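import numpy as np
import itertools

def preprocess(x):
    def find_runs(x, val):
        # Mark positions equal to val, padded with 0 on both sides so that
        # runs touching either end of x are delimited as well.
        matches = np.concatenate(([0], np.equal(x, val).view(np.int8), [0]))
        absdiff = np.abs(np.diff(matches))
        # Each run contributes one start index and one end index; pair them.
        ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
        # Run length = end - start.
        return ranges[:, 1] - ranges[:, 0]

    starts_with_zero = x[0] == 0

    run_lengths_0 = find_runs(x, 0)
    run_lengths_1 = find_runs(x, 1)
    # One array of 0s (or 1s) per run, in order of appearance.
    zeros = [np.zeros(n) for n in run_lengths_0]
    ones = [np.ones(n) for n in run_lengths_1]

    return starts_with_zero, zeros, ones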

(This function borrows from an answer to another question.)

To use this function, you would do, e.g.,

x = (np.random.uniform(size=10000) > 0.2).astype(int)

starts_with_zero, zeros, ones = preprocess(x)

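Now we write a function that permutes the runs of 0s among themselves and the runs of 1s among themselves, and concatenates the results: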

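def get_next_permutation(starts_with_zero, zeros, ones):
    # Shuffle the order of the runs in place; the runs themselves are untouched.
    np.random.shuffle(zeros)
    np.random.shuffle(ones)

    # Interleave the runs, starting with the same value as the original.
    # zip_longest is the Python 3 name (izip_longest in Python 2); the empty
    # fill value covers the case of one extra run at the end.
    if starts_with_zero:
        all_ = itertools.zip_longest(zeros, ones, fillvalue=np.array([]))
    else:
        all_ = itertools.zip_longest(ones, zeros, fillvalue=np.array([]))
    all_ = [e for p in all_ for e in p]

    x_tag = np.concatenate(all_)

    return x_tag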

To generate another permutation (with the same correlation), you would use

x_tag = get_next_permutation(starts_with_zero, zeros, ones)

To generate many permutations, you could do:

starts_with_zero, zeros, ones = preprocess(x)

for i in range(<number of permutations needed>):
    x_tag = get_next_permutation(starts_with_zero, zeros, ones)
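Since speed matters when many realizations are needed, here is a hedged variant of the same idea that stores only the run lengths and rebuilds the array with np.repeat. It is a sketch, not part of the original answer: the names run_lengths and permute_runs are made up for illustration, it assumes x is a non-empty array of 0s and 1s, and whether it is actually faster for your data should be measured.

```python
import numpy as np

def run_lengths(x):
    """Return the first value of x and the lengths of its consecutive runs."""
    x = np.asarray(x)
    # Indices at which the value differs from the previous element.
    change = np.flatnonzero(x[1:] != x[:-1]) + 1
    bounds = np.concatenate(([0], change, [len(x)]))
    return x[0], np.diff(bounds)

def permute_runs(first, lengths, rng):
    """Shuffle the run lengths of each value separately and rebuild the array."""
    shuffled = lengths.copy()
    # Even positions hold runs of `first`, odd positions runs of the other value.
    shuffled[0::2] = rng.permutation(lengths[0::2])
    shuffled[1::2] = rng.permutation(lengths[1::2])
    # Alternating values: first, 1 - first, first, ...
    values = (first + np.arange(len(lengths))) % 2
    return np.repeat(values, shuffled)

rng = np.random.default_rng(0)
x = (rng.uniform(size=10000) > 0.2).astype(int)

first, lengths = run_lengths(x)      # done once
x_tag = permute_runs(first, lengths, rng)  # done once per permutation
```

As with get_next_permutation, the preprocessing is done once and only the shuffle and rebuild are repeated for each new permutation. Unlike the version above, the result keeps an integer dtype.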

Example

Suppose we run

x = (np.random.uniform(size=10000) > 0.2).astype(int)
print(np.corrcoef(x[0: len(x) - 1], x[1: ])[0][1])

starts_with_zero, zeros, ones = preprocess(x)

for i in range(10):
    x_tag = get_next_permutation(starts_with_zero, zeros, ones)

    print(x_tag[: 50])
    print(np.corrcoef(x_tag[0: len(x_tag) - 1], x_tag[1: ])[0][1])

Then we get:

0.00674330566615
[ 1.  1.  1.  1.  1.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  0.
  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  0.  0.  1.  0.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
0.00674330566615
[ 1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  0.  1.  1.  0.  1.  1.  1.  1.  1.  1.  0.  0.  1.  0.
  1.  1.  1.  1.  0.  0.  0.  1.  1.  1.  1.  1.  1.  1.]
0.00674330566615
[ 1.  1.  1.  1.  1.  0.  0.  1.  1.  1.  0.  0.  0.  0.  1.  0.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  0.  1.  1.
  1.  1.  1.  1.  1.  1.  0.  1.  0.  0.  1.  1.  1.  0.]
0.00674330566615
[ 1.  1.  1.  1.  0.  1.  0.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  0.
  1.  1.  1.  1.  1.  0.  0.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  0.  0.  1.]
0.00674330566615
[ 1.  1.  1.  1.  0.  0.  0.  0.  1.  1.  0.  1.  1.  0.  0.  1.  0.  1.
  1.  1.  0.  1.  0.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  0.  0.  1.
  0.  1.  1.  1.  1.  1.  1.  0.  1.  0.  1.  1.  1.  1.]
0.00674330566615
[ 1.  1.  0.  1.  1.  1.  0.  0.  1.  1.  0.  1.  1.  0.  0.  1.  1.  0.
  1.  1.  1.  0.  1.  1.  1.  1.  0.  0.  0.  1.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  0.  1.  1.  0.  1.  0.  0.  1.  1.]
0.00674330566615
[ 1.  1.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  0.  1.  1.  1.  1.  1.
  1.  1.  0.  1.  0.  1.  1.  0.  1.  0.  1.  1.  1.  1.]
0.00674330566615
[ 1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  0.  1.  1.  0.  1.  0.  1.  1.
  1.  1.  1.  0.  1.  0.  1.  1.  0.  1.  1.  1.  0.  1.  1.  1.  1.  0.
  0.  1.  1.  1.  0.  1.  1.  0.  1.  1.  0.  1.  1.  1.]
0.00674330566615
[ 1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  0.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  0.  1.  1.  0.  1.  1.  1.]
0.00674330566615
[ 1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  0.  1.  0.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  0.  1.  0.  1.  0.  1.  1.  1.  1.  1.  1.  0.]

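Note that there is a much simpler solution if

your sequence is of length n,
some number m has m << n, and
m! is much larger than the number of permutations you need.

In this case, simply divide your sequence into m (approximately) equal parts, and permute them randomly. As noted before, only the m - 1 boundaries between parts change in a way that can affect the correlation. Since m << n, this is negligible.

To put some numbers on this, say you have a sequence with 10000 elements. It is well known that 20! = 2432902008176640000, which is probably far more permutations than you need. By dividing your sequence into 20 parts and permuting them, you affect at most 19 of the 9999 adjacent pairs (about 19 / 10000), which may well be small enough. For these sizes, this is the method I would use.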
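A minimal sketch of this simpler method follows. The function name block_permutation and the rng argument are assumptions made for illustration; np.array_split is used because it accepts a length that is not an exact multiple of m.

```python
import numpy as np

def block_permutation(x, m, rng=None):
    """Split x into m roughly equal blocks and return them in random order."""
    rng = np.random.default_rng() if rng is None else rng
    blocks = np.array_split(np.asarray(x), m)
    order = rng.permutation(m)
    return np.concatenate([blocks[i] for i in order])

rng = np.random.default_rng(0)
x = (rng.uniform(size=10000) > 0.2).astype(int)
x_tag = block_permutation(x, 20, rng)

# Only the 19 block boundaries can differ, so the lag-1 correlation
# should be close to, though not necessarily equal to, the original.
print(np.corrcoef(x[:-1], x[1:])[0, 1])
print(np.corrcoef(x_tag[:-1], x_tag[1:])[0, 1])
```

Unlike the run-shuffling approach, this keeps the correlation only approximately, and it also leaves most of the fine structure inside each block intact, which may or may not be acceptable for a given significance test.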

