Reducing memory allocation of a generator in Julia
I am trying to reduce the memory allocation of an inner loop in my code. Below is the part that is not working as expected.
using Random
using StatsBase
using BenchmarkTools
using Distributions
a_dist = Distributions.DiscreteUniform(1, 99)
v_dist = Distributions.DiscreteUniform(1, 2)
population_size = 10000
population = [rand(a_dist, population_size) rand(v_dist, population_size)]
find_all_it3(f::Function, A) = (p[2] for p in eachrow(A) if f(p[1]))
@btime begin
    c_pool = find_all_it3(x -> (x < 5), population)
    c_pool_dict = countmap(c_pool, alg=:dict)
end
@btime begin
    c_pool_indexes = findall(x -> (x < 5), view(population, :, 1))
    c_pool_dict = countmap(population[c_pool_indexes, 2], alg=:dict)
end
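I expected the generator (find_all_it3) to need very little memory allocation. However, according to the @btime output, there appears to be one allocation per loop iteration: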
98.040 μs (10006 allocations: 625.64 KiB)
18.894 μs (18 allocations: 11.95 KiB)
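In my scenario, the speed and allocations of findall end up being a problem, so I am looking for a better alternative based on generators/iterators that allocates less. Is there a way to do this? Are there other options worth considering?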
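I don't have an explanation, but here are the results of some tests I ran:
- The best time comes from view(population, :, 1) .< 5 (test4)
- Using broadcast! reduces the allocations a bit (test5)
- The best way to reduce allocations is to write your own loop (test6)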
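To make the first two bullets concrete, here is a minimal sketch on a toy matrix (the values are made up for illustration and are not the benchmark data). It shows that the broadcast comparison builds a fresh mask on every call, while broadcast! fills a buffer that is allocated once; both masks select the same rows.

```julia
population = [1 2; 7 1; 3 2; 4 1; 50 2]   # toy data, not the benchmark matrix

# Broadcasting the comparison allocates a new BitVector mask (as in test4).
mask = view(population, :, 1) .< 5

# broadcast! writes into a preallocated Bool buffer instead (as in test5).
buf = zeros(Bool, size(population, 1))
broadcast!(<, buf, view(population, :, 1), 5)

# Either mask selects the matching rows of column 2 as a view.
selected = view(population, mask, 2)
```

In an inner loop the buffer would be created once outside the loop and passed in, which is what test5 does with its c_pool_indexes argument.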
using BenchmarkTools
using StatsBase

population_size = 10000
population = [rand(1:99, population_size) rand(1:2, population_size)]

# Generator from the question: filters rows lazily via eachrow.
find_all_it(f::Function, A) = (p[2] for p in eachrow(A) if f(p[1]))

function test1(population)
    c_pool = find_all_it(x -> x < 5, population)
    c_pool_dict = countmap(c_pool, alg=:dict)
end

# findall on a view of column 1, then count through a view of column 2.
function test3(population)
    c_pool_indexes = findall(x -> x < 5, view(population, :, 1))
    c_pool_dict = countmap(view(population, c_pool_indexes, 2), alg=:dict)
end

# Boolean mask built by a broadcast comparison.
function test4(population)
    c_pool_indexes = view(population, :, 1) .< 5
    c_pool_dict = countmap(view(population, c_pool_indexes, 2), alg=:dict)
end

# Same mask, but written into a preallocated buffer.
function test5(c_pool_indexes, population)
    broadcast!(<, c_pool_indexes, view(population, :, 1), 5)
    c_pool_dict = countmap(view(population, c_pool_indexes, 2), alg=:dict)
end

# Hand-written loop that counts directly into a Dict.
function test6(population)
    d = Dict{Int,Int}()
    for i in eachindex(view(population, :, 1))
        if population[i, 1] < 5
            d[population[i, 2]] = 1 + get(d, population[i, 2], 0)
        end
    end
    return d
end
julia> @btime test1(population);
68.200 μs (10004 allocations: 625.59 KiB)
julia> @btime test3(population);
14.800 μs (14 allocations: 9.00 KiB)
julia> @btime test4(population);
7.250 μs (8 allocations: 9.33 KiB)
julia> temp = zeros(Bool, population_size);
julia> @btime test5(temp, population);
16.599 μs (5 allocations: 3.78 KiB)
julia> @btime test6(population);
11.299 μs (4 allocations: 608 bytes)
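One further variant that the tests above do not cover, offered only as a sketch: keep the generator, but index the matrix directly instead of iterating over eachrow, so that no row view is created per iteration. The names find_all_idx and count_values are hypothetical, and count_values is a Base-only stand-in for countmap. Whether this actually removes the per-row allocation depends on the Julia version, so it should be measured with @btime before relying on it.

```julia
# Hypothetical helper: lazily yields A[i, 2] for every row i where f(A[i, 1])
# holds, indexing the matrix directly rather than building a row view.
find_all_idx(f, A) = (A[i, 2] for i in axes(A, 1) if f(A[i, 1]))

# Hypothetical Base-only stand-in for countmap: counts each value of an iterable.
function count_values(itr)
    d = Dict{Int,Int}()
    for v in itr
        d[v] = get(d, v, 0) + 1
    end
    return d
end

population = [1 2; 7 1; 3 2; 4 1; 50 2]   # toy data for illustration
counts = count_values(find_all_idx(x -> x < 5, population))
```

On the toy matrix, rows 1, 3 and 4 pass the filter, so the value 2 is counted twice and the value 1 once.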