How to generate 2 columns of incremental values and all their unique combinations with pandas?
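I need to create a 2-column dataframe.
The first column contains the values from 7000 to 15000 in increments of 500 (7000, 7500, 8000 ... 14500, 15000).
The second column contains all integers from 6 to 24.
I need a simple way to generate these values and all their unique combinations: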
6,7000
6,7500
6,8000
....
24,14500
24,15000
You can use numpy.arange to generate the sequences of numbers, numpy.repeat and numpy.tile to build the cross-product, and numpy.c_ or numpy.column_stack to stack the two results into columns:
import numpy as np
import pandas as pd
x = np.arange(6, 25)
y = np.arange(7000, 15001, 500)
pd.DataFrame(np.c_[x.repeat(len(y)),np.tile(y, len(x))])
# pd.DataFrame(np.column_stack([x.repeat(len(y)),np.tile(y, len(x))]))
      0      1
0     6   7000
1     6   7500
2     6   8000
3     6   8500
4     6   9000
..   ..    ...
318  24  13000
319  24  13500
320  24  14000
321  24  14500
322  24  15000

[323 rows x 2 columns]
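To make the repeat/tile pairing easier to follow, here is a minimal sketch of the same idea with named columns. The column names `a` and `b` are placeholders I made up, not something from the question; rename them to whatever fits your data.

```python
import numpy as np
import pandas as pd

x = np.arange(6, 25)             # 6, 7, ..., 24 (19 values)
y = np.arange(7000, 15001, 500)  # 7000, 7500, ..., 15000 (17 values)

# np.repeat repeats each element of x len(y) times in a row:  6, 6, ..., 7, 7, ...
# np.tile repeats the whole y sequence len(x) times:          7000, ..., 15000, 7000, ...
# Lined up side by side, the two arrays enumerate every (x, y) pair exactly once.
df = pd.DataFrame({
    "a": np.repeat(x, len(y)),  # "a" is a placeholder column name
    "b": np.tile(y, len(x)),    # "b" is a placeholder column name
})

print(df.shape)
```

Passing a dict instead of a stacked array also lets each column keep its own dtype, which matters if the two sequences are not both integers.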
Another idea is to use itertools.product:
from itertools import product
pd.DataFrame(list(product(x,y)))
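If you would rather stay inside pandas, `pd.MultiIndex.from_product` builds the same cross-product. This is a different technique from the ones in this answer, shown only as a sketch; the column names `a` and `b` are placeholders, and I have not timed it against the options below.

```python
import numpy as np
import pandas as pd

x = np.arange(6, 25)
y = np.arange(7000, 15001, 500)

# from_product enumerates every (x, y) pair, with the first level varying slowest;
# to_frame(index=False) turns the index levels into ordinary columns.
# "a" and "b" are placeholder names, not taken from the question.
df = pd.MultiIndex.from_product([x, y], names=["a", "b"]).to_frame(index=False)

print(df.head())
```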
Timeit results:
# Henry's answer in comments
In [44]: %timeit pd.DataFrame([(x,y) for x in range(6,25) for y in range(7000,15001,500)])
657 µs ± 169 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# My solution
In [45]: %%timeit
...: x = np.arange(6, 25)
...: y = np.arange(7000, 15001, 500)
...:
...: pd.DataFrame(np.c_[x.repeat(len(y)),np.tile(y, len(x))])
...:
...:
155 µs ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# Using `np.column_stack`
In [49]: %%timeit
...: x = np.arange(6, 25)
...: y = np.arange(7000, 15001, 500)
...:
...: pd.DataFrame(np.column_stack([x.repeat(len(y)),np.tile(y, len(x))]))
...:
121 µs ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# `itertools.product` solution
In [62]: %timeit pd.DataFrame(list(product(x,y)))
489 µs ± 7.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
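The numbers above come from one machine and an IPython session, so expect different absolute values on yours. If you want to rerun the comparison as a plain script, something like the following should work with the standard `timeit` module. The function names and the repeat counts are my own choices for this sketch.

```python
import timeit
from itertools import product

import numpy as np
import pandas as pd


def with_comprehension():
    # Pure-Python list comprehension (the approach from the comments)
    return pd.DataFrame([(a, b) for a in range(6, 25) for b in range(7000, 15001, 500)])


def with_repeat_tile():
    # NumPy repeat/tile stacked with column_stack
    x = np.arange(6, 25)
    y = np.arange(7000, 15001, 500)
    return pd.DataFrame(np.column_stack([x.repeat(len(y)), np.tile(y, len(x))]))


def with_product():
    # itertools.product over the two NumPy arrays
    x = np.arange(6, 25)
    y = np.arange(7000, 15001, 500)
    return pd.DataFrame(list(product(x, y)))


timings = {}
for func in (with_comprehension, with_repeat_tile, with_product):
    # Best of 5 repeats, 100 calls per repeat, converted to seconds per call
    timings[func.__name__] = min(timeit.repeat(func, number=100, repeat=5)) / 100

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```

Taking the minimum over the repeats is a common way to reduce noise from other processes; it is not the same statistic as the mean ± std. dev. that `%timeit` reports.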