提高 Haskell 代码性能（BangPatterns、LazyByteString）

Question

我用过 BangPatterns、Lazy ByteString。不知道还能做什么来提高这段代码的性能。有什么想法和建议吗？它显然不是最快的版本，因为它超过了时间限制。

-- Find the sum of all the multiples of 3 or 5 below N
-- Input Format 
-- First line contains T that denotes the number of test cases. This is followed by T lines, each containing an integer, N.

{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -O2 -optc-O2 #-}
import qualified Data.ByteString.Lazy as L
import Control.Monad (mapM_)

readInt :: L.ByteString -> Int
readInt !s = L.foldl' (\x c -> 10 * x + fromIntegral c - 48) 0 s

main :: IO ()
main = do 
-- don't need the number of inputs, since it is read lazily.
-- split input by lines
  (_:ls) <- L.split 10 `fmap` L.getContents
-- length ls <= 10^5
  mapM_ (print . f . readInt) ls

-- n <= 10^9
f :: Int -> Int
f n = go 0 0
  where
    go !i !a | i == n            = a
    go !i !a | i `mod` 3 == 0
               || i `mod` 5 == 0 = go (i+1) (a+i)
    go !i !a                     = go (i+1) a

Answer 1

你在行print中的使用

mapM_ (print . f . readInt) ls

可能会引入一些开销，因为 print 依赖于 Int 的 Show 实例，这意味着将转换为低效的 Strings。

添加以下导入

import qualified Data.ByteString.Builder as BB
import qualified Data.Foldable as F
import Data.List.Split (chunksOf) -- from the "split" package
import System.IO -- for stdout

并尝试用类似

的方式更改该行

let resultList = map (f . readInt) ls
F.mapM_ (BB.hPutBuilder stdout . F.foldMap BB.intDec) (chunksOf 1000 resultList)

takes chunks of size 1000 from the list of Ints and uses the efficient Builder type and the specialized hPutBuilder 函数将它们写入标准输出。

（我添加了分块，否则我担心构建 Builder 会迫使整个输入列表进入内存。我们不希望这样，因为列表正在被延迟读取。）

不过我不确定这是否是主要瓶颈。

Answer 2

danidiaz 已经输入和输出问题。

产生 3 或 5 的倍数的一种快速方法是使用主筛常用的那种“轮”。

multiples3or5 = go 0 $ cycle [3,2,1,3,1,2,3]
  where
    go n (x : xs) = n : go (n+x) xs
    go n [] = error "impossible"

事实上，由于循环列表永远不会结束，所以使用不同的类型会更干净。并且由于您使用的是 Int，它也可以专门针对性能进行解压缩。请注意，GHC 版本 7.8 或更高版本不需要此上下文中的 UNPACK pragma。

data IntStream = {-# UNPACK #-} !Int :> IntStream
infixr 5 :>

wheel :: IntStream
wheel = 3 :> 2 :> 1 :> 3 :> 1 :> 2 :> 3 :> wheel

multiples3or5 = go 0 wheel
  where
    go !n (x :> xs) = n : go (n+x) xs

作为 fgv , this is in the nature of an anamorphism。你可以通过写

看到这个

multiples3or5 = unfoldr go (0, wheel) where
  go (!n, (x :> xs)) = Just (n, (n+x, xs))

但请注意，unfoldr 在尚未正式发布的 base 4.8 之前还没有变得足够高效以用于任何用途。

打印结果的时候，系统要把很多东西除以10，不知道那些例程有没有特别优化，但是我知道GHC的native code generator做的 not 目前优化已知除数的除法，除非该除数是 2 的幂。因此您可能会发现可以通过使用 -fllvm 并小心使用 LLVM 的兼容版本来提高性能。

编辑

请参阅以获得更好的方法。

Answer 3

如果您真的很关心效率，请重新考虑算法。您的主要瓶颈是您手动对 1 和 N 之间的一堆数字求和，无论您做什么，这在大 N 上都会表现不佳做。

相反，从数学角度思考。 N以内的所有3或5的倍数之和等于几乎所有3的倍数之和直至N（称之为S_3），加上5的所有倍数之和，直到N（称之为 S_5).我说 "almost" 是因为有些数字被重复计算；称它们的总和为 T。现在你想要的总和正好是S_3 + S_5 – T，每一项都有一个漂亮的封闭公式（它是什么？）。计算这三个数字比你现在做的要快得多。

Answer 4

这是没有那些"think about"导师答案的公式

sumMultiplesOf::Integral n=>n->n->n
sumMultiplesOf k n = d * (1 + d) `div` 2 * k where d = (n - 1) `div` k

sumMultiplesOf3or5::Integral n=>n->n
sumMultiplesOf3or5 n = sumMultiplesOf 3 n + sumMultiplesOf 5 n - sumMultiplesOf 15 n

提高 Haskell 代码性能（BangPatterns、LazyByteString）

Improving Haskell code performance (BangPatterns, LazyByteString)

haskell

编辑