Julia 中的素数迭代器

Prime Iterator in Julia

在 Julia 中是否有一个(高效的)迭代器来生成素数?内置函数 primes[N] 一次生成最多 N 的所有素数,而不是按要求生成,并且当 N 非常大或未知时可能无法使用。

您可以使用概率素性测试

过滤通过(大)整数(Base.Count{BigInt} 迭代器)的计数器
iterprimes = filter(isprime,countfrom(big(2),1))

然后举例

julia> collect(take(iterprimes, 5))
5-element Array{Any,1}:
  2
  3
  5
  7
 11

这总体上不如筛子有效,但不会在内存中保留巨大的结构。我记得 isprime 在标准的重复选择中至少没有高达 2^64 的误报。

编辑:

第二种可能性是将 primes(N*(i-1)+1,N*i)Base.flatten 的块生成(参见 Generator)到一个列表中:

Base.flatten(primes(1000000*(i-1)+1,1000000*i) for i in countfrom(1))

在这台机器上,这个迭代器在计算前 10^9 个素数方面实际上胜过普通的 primes

编辑 2:

使用 gmpznextprime 的迭代器。

type 
   PrimeIter
end
function nextprime(y::BigInt)
    x = BigInt()
    ccall((:__gmpz_nextprime,:libgmp), Void, (Ptr{BigInt},Ptr{BigInt}), &x, &y)
    x
end
Base.start(::PrimeIter) = big(2)
Base.next(::PrimeIter, state) = state, nextprime(state)
Base.done(::PrimeIter, _) = false
Base.iteratorsize(::PrimeIter) = Base.IsInfinite()


> first(drop(PrimeIter(), 10^5))
1299721

您可以查看 Lazy.jl,它可以按需提供主要迭代。它适用于未知的大数字。假设您要使用小于上限的所有质数,并使用 space 来存储它们。

引用自述文件:-

# isprime defined in terms of the prime numbers:
isprime(n) =
  @>> primes begin
    takewhile(x -> x<=sqrt(n))
    map(x -> n % x == 0)
    any; !
  end

# the prime numbers defined in terms of isprime:
primes = filter(isprime, range(2));

take(20, primes)
#> (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71)

为了解释代码,首先 isprime 函数是使用所有素数的列表 primes (当时尚未定义)定义的,方法是将所有素数减去比 n 的平方根,检查它们是否能整除 n,并从逻辑上取反。

然后 prime 被定义为 isprimefilter 对从 2 开始的所有整数。

要获取 n 以下的所有质数,您可以 运行 @>> primes takewhile(p -> p <= n) 而不是 take.

一种节省存储空间但会给您一些非质数的替代方法是使用轮子,请参阅 Wheel Factorization。所需要做的就是存储找到的最后一个数字并转到轮盘上的下一个数字。

例如,分别对待2和3。然后从 5 开始交替添加 2 和 4:5、7、11、13、15 ... 生成的数字流消除了 2 和 3 的所有倍数。还有更复杂的轮子,它们也将消除 5 或更高素数的倍数。

此方法将花费一些时间除以非素数,但会节省所需的存储空间。随着素数变得越来越稀有,所有轮子的效率都会降低。您将了解系统的时间和存储限制。

您没有说您认为迭代器的合理范围是多少,也没有说您希望处理它多长时间。此类算法通常有两种形式,例如:A) 短但慢(范围为一百万秒),以及 B) 更复杂但快得多(比 A 快大约一百倍)。以下是每个示例。

A) 基于内置 "Primes" 包 (pkg add "Primes") 的迭代器版本,特别是 nextprime 函数:

using Primes: nextprime

mutable struct PrimesGen
    lastprime :: UInt64
    PrimesGen() = new()
end
Base.eltype(::Type{PrimesGen}) = Int64
Base.IteratorSize(::PrimesGen) = Base.IsInfinite()
function Base.iterate(PG::PrimesGen, st::UInt64 = UInt64(1)) # :: Union{Nothing,Tuple{UInt64,UInt64}}
    next = nextprime(st + 1)
    next, next
end

EDIT_ADD: 由于他的多个嵌套迭代器,上述代码比@mschauer 的第一个解决方案(更新到当前 Julia 版本 1.0)稍微快一些,因为如下:

using Primes: isprime
PrimesGen() = Iterators.filter(isprime, Iterators.countfrom(UInt64(2)))

但是它很短,可以用同样的方式使用... END_EDIT_ADD

您可以使用它执行以下操作:

using Printf
@time let sm = 0
          for p in PrimesGen() p >= 2_000_000 && break; sm += p end
          Printf.@printf("%d\n",sm)
      end

生成以下内容:

142913828922
  0.651754 seconds (327.05 k allocations: 4.990 MiB)

这足以用于这些较小的范围,例如解决上面的欧拉问题 10(运行 在 1.92 GHz 的英特尔 x5-Z8350 上)。

上面的迭代器实际上有一个 "infinite" 范围的 UInt64 数字,但在超过 30 万年内不会达到,所以我们真的不需要担心它.. .

B) 对于涉及十亿或更多范围的 "industrial strength" 问题,需要迭代器(或直接函数调用)实现页面分段的埃拉托色尼筛法,速度大约快一百倍,实现如下:

const Prime = UInt64
const BasePrime = UInt32
const BasePrimesArray = Array{BasePrime,1}
const SieveBuffer = Array{UInt8,1}

# contains a lazy list of a secondary base primes arrays feed
# NOT thread safe; needs a Mutex gate to make it so...
abstract type BPAS end # stands in for BasePrimesArrays, not defined yet
mutable struct BasePrimesArrays <: BPAS
    thunk :: Union{Nothing,Function} # problem with efficiency - untyped function!!!!!!!!!
    value :: Union{Nothing,Tuple{BasePrimesArray, BPAS}}
    BasePrimesArrays(thunk::Function) = new(thunk)
end
Base.eltype(::Type{BasePrimesArrays}) = BasePrime
Base.IteratorSize(::Type{BasePrimesArrays}) = Base.SizeUnknown() # "infinite"...
function Base.iterate(BPAs::BasePrimesArrays, state::BasePrimesArrays = BPAs)
    if state.thunk !== nothing
        newvalue :: Union{Nothing,Tuple{BasePrimesArray, BasePrimesArrays}} =
            state.thunk() :: Union{Nothing,Tuple{BasePrimesArray
                                                 , BasePrimesArrays}}
        state.value = newvalue
        state.thunk = nothing
        return newvalue
    end
    state.value
end

# count the number of zero bits (primes) in a byte array,
# also works for part arrays/slices, best used as an `@view`...
function countComposites(cmpsts::AbstractArray{UInt8,1})
    foldl((a, b) -> a + count_zeros(b), cmpsts; init = 0)
end

# converts an entire sieved array of bytes into an array of UInt32 primes,
# to be used as a source of base primes...
function composites2BasePrimesArray(low::Prime, cmpsts::SieveBuffer)
    limiti = length(cmpsts) * 8
    len :: Int = countComposites(cmpsts)
    rslt :: BasePrimesArray = BasePrimesArray(undef, len)
    i :: Int = 0
    j :: Int = 1
    @inbounds(
    while i < limiti
        if cmpsts[i >>> 3 + 1] & (1 << (i & 7)) == 0
            rslt[j] = low + i + i
            j += 1
        end
        i += 1
    end)
    rslt
end

# sieving work done, based on low starting value for the given buffer and
# the given lazy list of base prime arrays...
function sieveComposites(low::Prime, buffer::Array{UInt8,1},
                                     bpas::BasePrimesArrays)
    lowi :: Int = (low - 3) ÷ 2
    len :: Int = length(buffer)
    limiti :: Int = len * 8 - 1
    nexti :: Int = lowi + limiti
    for bpa::BasePrimesArray in bpas
        for bp::BasePrime in bpa
            bpint :: Int = bp
            bpi :: Int = (bpint - 3) >>> 1
            starti :: Int = 2 * bpi * (bpi + 3) + 3
            starti >= nexti && return
            if starti >= lowi starti -= lowi
            else
                r :: Int = (lowi - starti) % bpint
                starti = r == 0 ? 0 : bpint - r
            end
            lmti :: Int = limiti - 40 * bpint
            @inbounds(
            if bpint <= (len >>> 2) starti <= lmti
                for i in 1:8
                    if starti > limiti break end
                    mask = convert(UInt8,1) << (starti & 7)
                    c = starti >>> 3 + 1
                    while c <= len
                        buffer[c] |= mask
                        c += bpint
                    end
                    starti += bpint
                end
            else
                c = starti
                while c <= limiti
                    buffer[c >>> 3 + 1] |= convert(UInt8,1) << (c & 7)
                    c += bpint
                end
            end)
        end
    end
    return
end

# starts the secondary base primes feed with minimum size in bits set to 4K...
# thus, for the first buffer primes up to 8293,
# the seeded primes easily cover it as 97 squared is 9409.
function makeBasePrimesArrays() :: BasePrimesArrays
    cmpsts :: SieveBuffer = Array{UInt8,1}(undef, 512)
    function nextelem(low::Prime, bpas::BasePrimesArrays) ::
                                    Tuple{BasePrimesArray, BasePrimesArrays}
        # calculate size so that the bit span is at least as big as the
        # maximum culling prime required, rounded up to minsizebits blocks...
        reqdsize :: Int = 2 + isqrt(1 + low)
        size :: Int = (reqdsize ÷ 4096 + 1) * 4096 ÷ 8 # size in bytes
        if size > length(cmpsts) cmpsts = Array{UInt8,1}(undef, size) end
        fill!(cmpsts, 0)
        sieveComposites(low, cmpsts, bpas)
        arr :: BasePrimesArray = composites2BasePrimesArray(low, cmpsts)
        next :: Prime = low + length(cmpsts) * 8 * 2
        arr, BasePrimesArrays(() -> nextelem(next, bpas))
    end
    # pre-seeding breaks recursive race,
    # as only known base primes used for first page...
    preseedarr :: BasePrimesArray = # pre-seed to 100, can sieve to 10,000...
        [ 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
        , 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97
        ]
    nextfunc :: Function = () ->
        (nextelem(convert(Prime,101), makeBasePrimesArrays()))
    firstfunc :: Function = () -> (preseedarr, BasePrimesArrays(nextfunc))
    BasePrimesArrays(firstfunc)
end

# an iterator over successive sieved buffer composite arrays,
# returning a tuple of the value represented by the lowest possible prime
# in the sieved composites array and the array itself;
# array has a 16 Kilobytes minimum size (CPU L1 cache), but
# will grow so that the bit span is larger than the
# maximum culling base prime required, possibly making it larger than
# the L1 cache for large ranges, but still reasonably efficient using
# the L2 cache: very efficient up to about 16e9 range;
# reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 week...
struct PrimesPages
    baseprimes :: BasePrimesArrays
    PrimesPages() = new(makeBasePrimesArrays())
end
Base.eltype(::Type{PrimesPages}) = SieveBuffer
Base.IteratorSize(::Type{PrimesPages}) = Base.IsInfinite()
function Base.iterate(PP::PrimesPages,
                      state :: Tuple{Prime,SieveBuffer} =
                            ( convert(Prime,3), Array{UInt8,1}(undef,16384) ))
    (low, cmpsts) = state
    # calculate size so that the bit span is at least as big as the
    # maximum culling prime required, rounded up to minsizebits blocks...
    reqdsize :: Int = 2 + isqrt(1 + low)
    size :: Int = (reqdsize ÷ 131072 + 1) * 131072 ÷ 8 # size in bytes
    if size > length(cmpsts) cmpsts = Array{UInt8,1}(undef, size) end
    fill!(cmpsts, 0)
    sieveComposites(low, cmpsts, PP.baseprimes)
    newlow :: Prime = low + length(cmpsts) * 8 * 2
    ( low, cmpsts ), ( newlow, cmpsts )
end

function countPrimesTo(range::Prime) :: Int64
    range < 3 && ((range < 2 && return 0) || return 1)
    count :: Int64 = 1
    for ( low, cmpsts ) in PrimesPages() # almost never exits!!!
        if low + length(cmpsts) * 8 * 2 > range
            lasti :: Int = (range - low) ÷ 2
            count += countComposites(@view cmpsts[1:lasti >>> 3])
            count += count_zeros(cmpsts[lasti >>> 3 + 1] |
                                 (0xFE << (lasti & 7)))
            return count
        end
        count += countComposites(cmpsts)
    end
    count
end

可以这样调用:

using Printf
@time let sm = 0
          for p in PrimesPaged() p >= 2_000_000 && break; sm += p end
          Printf.@printf("%d\n",sm)
      end

生成以下内容:

142913828922
  0.016245 seconds (60 allocations: 23.891 KiB)

但还不够"warm it up";可以用一百倍大的参数调用它以生成以下内容:

1075207199997334
  1.381198 seconds (2.35 k allocations: 103.875 KiB)

并且可以使用以下代码将所有素数数到十亿:

println(@time let count = 0
                  for p in PrimesPaged()
                      p > 1_000_000_000 && break
                      count += 1
                  end; count end)

生成以下内容:

  6.802044 seconds (11.51 k allocations: 396.734 KiB)
50847534

但是,与首先筛选素数相比,迭代筛选后的素数所花费的时间要长得多。这可以通过调用提供的优化计数函数来消除大部分枚举时间来显示,如下所示:

println(@time countPrimesTo(Prime(1_000_000_000)))

生成以下内容:

  1.959057 seconds (65 allocations: 39.266 KiB)
50847534

以类似的方式,可以编写一个 sumPrimesTo 函数来在多一点时间(比如两次)内将筛选出的素数求和为十亿...

这里的所有测试都是 运行 在同一个 x5-Z8350 CPU 上以 1.92 Gigahertz 进行的。

这表明,对于真正的大问题,不应使用迭代器,而应使用直接在剔除页面段上运行的自定义函数,如 countPrimesTo 函数在此处所做的那样。完成后,值得进行进一步的优化,例如最大轮分解(筛分速度进一步提高四倍)和多线程(以获得有效数量的增益 CPU使用的治疗(不包括超线程治疗)最终结果不会比 Kim Walich 的 "primesieve".

慢多少

同样,这也有相同的UInt64 "infinite"限制,但它仍然需要数百年才能到达那里,所以我们仍然不必担心。