Haskell vs C 速度：Eratosthenes 筛法

Question

Haskell 一个是使用优化的 Data.IntSet 实现的，复杂度为 O(lg n)。然而，尽管 Haskell 代码已经针对偶数情况进行了优化，但 n = 2000000 的速度差异为 15 倍（之前为 30 倍）。我想知道 whether/why 我在 Haskell 中的实现不完善。

原版Haskell

primesUpTo :: Int -> [Int]
primesUpTo n = 2 : put S.empty [3,5..n]
    where put :: S.IntSet -> [Int] -> [Int]
          put _ [] = []
          put comps (x:xs) =
            if S.member x comps
            then put comps xs
            else x : put (S.union comps multiples) xs
                where multiples = S.fromList [x*2, x*3 .. n]

更新

fromDistinctAscList 速度提高 4 倍。 2-3-5-7 轮加速 50%。

primesUpTo :: Int -> [Int]
primesUpTo n = 2 : 3 : 5 : 7 : put S.empty (takeWhile (<=n) (spin wheel 11))
    where put :: S.IntSet -> [Int] -> [Int]
          put _ [] = []
          put comps (x:xs) =
            if S.member x comps
            then put comps xs
            else x : put (S.union comps multiples) xs
                where multiples = S.fromDistinctAscList [x*x, x*(x+2) .. n]
          spin (x:xs) n = n : spin xs (n + x)
          wheel = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel

基准测试

所有时间均由 *nix time 命令测量，真实 space

Haskell original : 2e6: N/A;    2e7: >30s
Haskell optimized: 2e6: 0.396s; 2e7: 6.273s
C++ Set (ordered): 2e6: 4.694s; 2e7: >30s
C++ Bool Array   : 2e6: 0.039s; 2e7: 0.421s

Haskell optimized 比 C++ Bool 慢 10~15x，比 C++ Set 快 10x。

源代码

C 编译器选项：g++ 5.3.1，g++ -std=c++11 Haskell 选项：ghc 7.8.4，ghc

C代码（布尔数组）http://pastebin.com/W0s7cSWi

 prime[0] = prime[1] = false;
 for (int i=2; i<=limit; i++) { //edited
     if (!prime[i]) continue;
     for (int j=2*i; j<=n; j+=i)
        prime[j] = false;
 }

C码（套）http://pastebin.com/sNpghrU4

 nonprime.insert(1);
 for (int i=2; i<=limit; i++) { //edited
     if (nonprime.count(i) > 0) continue;
     for (int j=2*i; j<=n; j+=i)
        nonprime.insert(j);
 }

Haskell代码http://pastebin.com/HuMqwvRW 代码如上所写。

Answer 1

I would like to know whether/why my implementation in Haskell is imperfect.

而不是 fromList you better use fromDistinctAscList 执行线性。您还可以添加 仅以 x*x 而非 x*2 开头的奇数倍数 ，因为所有较小的奇数倍数均已添加。就风格而言，右折叠可能比递归更适合。

这样做，当 n 等于 2,000,000 时，我的性能提高了 3 倍以上：

import Data.IntSet (member, union, empty, fromDistinctAscList)

sieve :: Int -> [Int]
sieve n = 2: foldr go (const []) [3,5..n] empty
    where
    go i run obs
        | member i obs = run obs
        | otherwise    = i: run (union obs inc)
        where inc = fromDistinctAscList [i*i, i*(i + 2)..n]

然而，数组具有 O(1) 访问和 缓存友好 内存分配。使用可变数组，我发现您的 Haskell 代码的性能提高了 15 倍以上（n 再次等于 2,000,000）：

{-# LANGUAGE FlexibleContexts #-}
import Data.Array.ST (STUArray)
import Control.Monad (forM_, foldM)
import Control.Monad.ST (ST, runST)
import Data.Array.Base (newArray, unsafeWrite, unsafeRead)

sieve :: Int -> [Int]
sieve n = reverse $ runST $ do
    arr <- newArray (0, n) False :: ST s (STUArray s Int Bool)
    foldM (go arr) [2] [3,5..n]
    where
    go arr acc i = do
        b <- unsafeRead arr i
        if b then return acc else do
            forM_ [i*i, i*(i + 2).. n] $ \k -> unsafeWrite arr k True
            return $ i: acc

Haskell vs C 速度：Eratosthenes 筛法

Haskell vs C speed: Sieve of Eratosthenes

c

math

performance

implementation

haskell

原版Haskell

更新

基准测试

源代码