Python code optimization for uneven subset generation
I would like some help optimizing my code.
I have a list of 26 elements:
indata = [0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42, 30, 16, 14, 85, 44, 89, 26, 0, 67, 67, 23, 0, 0]
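Just for further reading: when I say "subset", I mean a sub-"set" of the data, not the set data type. What I am looking for are sublists.
I am preparing a function that will perform further calculations on subsets of this list. The problem is that when the list does not divide evenly, the same element sometimes ends up in two or more different subsets. The subsets I am looking for are:
- Subset 1 => the original data
- Subsets 2 & 3 => the first and second halves of the data
- Subsets 4 - 7 => the first, second, third and fourth quarters of the data
- Subsets 8 - 15 => the successive eighths of the data
I came up with a rather sloppy and verbose solution inside the function body, which goes like this: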
for i in iterate:
    if i == 0:
        subset = indata
    elif i == 1:
        subset = indata[0:int(len(indata)/2)]
    elif i == 2:
        subset = indata[int(len(indata)/2):]
    elif i == 3:
        subset = indata[0:int(len(indata)/4)]
    elif i == 4:
        subset = indata[int(len(indata)/4):int(round((len(indata)/4)*2,0))]
    elif i == 5:
        subset = indata[int(round((len(indata)/4)*2,0)):int(round((len(indata)/4)*3,0))]
    elif i == 6:
        subset = indata[int(round((len(indata)/4)*3,0)):]
    elif i == 7:
        subset = indata[0:int(len(indata)/8)]
    elif i == 8:
        subset = indata[int(len(indata)/8):int(round((len(indata)/8)*2,0))]
    elif i == 9:
        subset = indata[int(len(indata)/8)*2:int(round((len(indata)/8)*3,0))]
    elif i == 10:
        subset = indata[int((len(indata)/8)*3+0.25):int(round((len(indata)/8)*4,0))]
    elif i == 11:
        subset = indata[int((len(indata)/8)*4+0.25):int(round((len(indata)/8)*5,0))]
    elif i == 12:
        subset = indata[int((len(indata)/8)*5+0.25):int(round((len(indata)/8)*6,0))]
    elif i == 13:
        subset = indata[int((len(indata)/8)*6+0.5):int(round((len(indata)/8)*7,0))]
    elif i == 14:
        subset = indata[int((len(indata)/8)*7+0.5):]
    else:
        subset = indata[int((len(indata)/8)*7+0.5):]
    # further instructions on the subset go here, then the loop goes back and repeats
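It does what it should (the added 0.25 and 0.5 are there to avoid including the same element in two or more subsets when, say, the subset length is 3.25).
But surely there is a better way to do this.
I don't mind uneven sets: dividing by 4 could, say, give two 7-element lists and two 6-element lists, as long as the elements are distinct.
Thanks for your help.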
def divide_data(data, chunks):
    idx = 0
    sizes = [len(data) // chunks + int(x < len(data)%chunks) for x in range(chunks)]
    for size in sizes:
        yield data[idx:idx+size]
        idx += size

data = list(range(26)) # or whatever, e.g. [0, 0, 50, ...]
for num_subsets in (1, 2, 4, 8):
    print(f'num subsets: {num_subsets}')
    for subset in divide_data(data, num_subsets):
        print(subset)
num subsets: 1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
num subsets: 2
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
[13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
num subsets: 4
[0, 1, 2, 3, 4, 5, 6]
[7, 8, 9, 10, 11, 12, 13]
[14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25]
num subsets: 8
[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10]
[11, 12, 13]
[14, 15, 16]
[17, 18, 19]
[20, 21, 22]
[23, 24, 25]
Thanks to this answer for the inspiration.
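For what it's worth, here is a minimal sketch of how the generator above could be applied to the indata list from the question to collect all 15 sublists in one flat list. The divmod-based rewrite of divide_data and the name subsets are my own choices for illustration, not part of the original answer; the sizing rule is the same (the first len(data) % chunks pieces get one extra item).

```python
def divide_data(data, chunks):
    """Yield `chunks` consecutive slices of `data`.

    The first len(data) % chunks slices get one extra item,
    so slice sizes differ by at most one.
    """
    base, extra = divmod(len(data), chunks)
    idx = 0
    for x in range(chunks):
        size = base + (1 if x < extra else 0)
        yield data[idx:idx + size]
        idx += size


indata = [0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42, 30, 16, 14, 85,
          44, 89, 26, 0, 67, 67, 23, 0, 0]

# Flat list: whole list, then halves, quarters and eighths (1 + 2 + 4 + 8 = 15).
subsets = [piece for chunks in (1, 2, 4, 8) for piece in divide_data(indata, chunks)]

print(len(subsets))               # 15
print([len(s) for s in subsets])  # [26, 13, 13, 7, 7, 6, 6, 4, 4, 3, 3, 3, 3, 3, 3]
```

Each level is a true partition of the list, so no element is dropped and no position is shared between sibling sublists.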
You can use a list comprehension to get these subsets:
indata = [0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42, 30, 16, 14, 85,
          44, 89, 26, 0, 67, 67, 23, 0, 0]
subsets = [indata[p*size:(p+1)*size]
           for parts in (1,2,4,8)
           for size in [len(indata)//parts]
           for p in range(parts)]
Output:
for i,subset in enumerate(subsets,1): print(i,subset)
1 [0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42, 30, 16, 14, 85, 44,
89, 26, 0, 67, 67, 23, 0, 0]
2 [0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42]
3 [30, 16, 14, 85, 44, 89, 26, 0, 67, 67, 23, 0, 0]
4 [0, 0, 50, 0, 32, 35]
5 [151, 163, 9, 1, 3, 3]
6 [42, 30, 16, 14, 85, 44]
7 [89, 26, 0, 67, 67, 23]
8 [0, 0, 50]
9 [0, 32, 35]
10 [151, 163, 9]
11 [1, 3, 3]
12 [42, 30, 16]
13 [14, 85, 44]
14 [89, 26, 0]
15 [67, 67, 23]
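Note that when the length of the list is not a multiple of the number of partitions (e.g. 26/4 and 26/8), this drops items. There are several ways to handle this (more subsets, larger chunks, varying subset sizes to distribute the items evenly or randomly, adding to the first subset, adding to the last, ...), but you have to specify what you want.
For example, this variant spreads the extra items over the first few groups (at most 1 extra item per group):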
subsets = [indata[p*size+min(p,spread):(p+1)*size+min(p+1,spread)]
           for parts in (1,2,4,8)
           for size,spread in [divmod(len(indata),parts)]
           for p in range(parts)]
for i,subset in enumerate(subsets,1): print(i,subset,len(subset))
1 [0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42, 30, 16, 14,
85, 44, 89, 26, 0, 67, 67, 23, 0, 0] 26
2 [0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42] 13
3 [30, 16, 14, 85, 44, 89, 26, 0, 67, 67, 23, 0, 0] 13
4 [0, 0, 50, 0, 32, 35, 151] 7
5 [163, 9, 1, 3, 3, 42, 30] 7
6 [16, 14, 85, 44, 89, 26] 6
7 [0, 67, 67, 23, 0, 0] 6
8 [0, 0, 50, 0] 4
9 [32, 35, 151, 163] 4
10 [9, 1, 3] 3
11 [3, 42, 30] 3
12 [16, 14, 85] 3
13 [44, 89, 26] 3
14 [0, 67, 67] 3
15 [23, 0, 0] 3
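To make the difference between the two comprehensions concrete, here is a small sketch that counts how many items each variant keeps at every level. The helper names floor_split and spread_split are hypothetical, introduced only for this comparison; the slicing expressions are the ones shown above.

```python
indata = [0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42, 30, 16, 14, 85,
          44, 89, 26, 0, 67, 67, 23, 0, 0]


def floor_split(data, parts):
    # First variant: every piece has len(data) // parts items,
    # so the trailing len(data) % parts items are dropped.
    size = len(data) // parts
    return [data[p * size:(p + 1) * size] for p in range(parts)]


def spread_split(data, parts):
    # Second variant: the first `spread` pieces each take one leftover item.
    size, spread = divmod(len(data), parts)
    return [data[p * size + min(p, spread):(p + 1) * size + min(p + 1, spread)]
            for p in range(parts)]


for parts in (1, 2, 4, 8):
    kept_floor = sum(len(s) for s in floor_split(indata, parts))
    kept_spread = sum(len(s) for s in spread_split(indata, parts))
    print(parts, kept_floor, kept_spread)
# prints:
# 1 26 26
# 2 26 26
# 4 24 26
# 8 24 26
```

With 26 items, the floor-based version silently loses the last two items at the quarter and eighth levels, while the spread version always covers the whole list.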
You can use np.array_split + a list comprehension:
import numpy as np

sublists = [arr.tolist() for num in [1,2,4,8] for arr in np.array_split(np.array(indata), num)]
Output:
[[0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42, 30, 16, 14, 85, 44, 89, 26, 0, 67, 67, 23, 0, 0],
[0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42],
[30, 16, 14, 85, 44, 89, 26, 0, 67, 67, 23, 0, 0],
[0, 0, 50, 0, 32, 35, 151],
[163, 9, 1, 3, 3, 42, 30],
[16, 14, 85, 44, 89, 26],
[0, 67, 67, 23, 0, 0],
[0, 0, 50, 0],
[32, 35, 151, 163],
[9, 1, 3],
[3, 42, 30],
[16, 14, 85],
[44, 89, 26],
[0, 67, 67],
[23, 0, 0]]
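As a self-contained sketch of the same idea (it assumes NumPy is installed; the lengths variable is added here only for checking), np.array_split returns len % n pieces of size len // n + 1 followed by the rest of size len // n, and .tolist() turns each piece back into a plain Python list of ints:

```python
import numpy as np

indata = [0, 0, 50, 0, 32, 35, 151, 163, 9, 1, 3, 3, 42, 30, 16, 14, 85,
          44, 89, 26, 0, 67, 67, 23, 0, 0]

# Split into 1, 2, 4 and 8 nearly equal pieces and flatten into one list of sublists.
sublists = [arr.tolist()
            for num in (1, 2, 4, 8)
            for arr in np.array_split(np.array(indata), num)]

lengths = [len(s) for s in sublists]
print(lengths)  # [26, 13, 13, 7, 7, 6, 6, 4, 4, 3, 3, 3, 3, 3, 3]

# After .tolist() the elements are ordinary Python ints, not NumPy scalars.
print(all(isinstance(x, int) for s in sublists for x in s))  # True
```

If the later calculations are NumPy-based anyway, the .tolist() call can be dropped and the arrays used directly.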