Haskell 涉及 `read` 的程序比等效的 Python 慢得多

Question

作为编程挑战的一部分，我需要从标准输入读取一系列 space 分隔的整数（在一行），然后打印这些整数的总和到标准输出。所讨论的序列最多可以包含 10,000,000 个整数。

对此我有两个解决方案：一个用 Haskell (foo.hs) 编写，另一个等效的用 Python 2 (foo.py) 编写。不幸的是，（已编译的）Haskell 程序始终比 Python 程序慢，我无法解释这两个程序之间的性能差异；请参阅下面的 基准测试 部分。如果有的话，我希望 Haskell 占上风...

我做错了什么？我该如何解释这种差异？有没有一种简单的方法可以加速我的 Haskell 代码？

（有关信息，我使用的是 2010 年中期的 Macbook Pro，配备 8Gb RAM、GHC 7.8.4 和 Python 2.7.9。）

`foo.hs`

main = print . sum =<< getIntList

getIntList :: IO [Int]
getIntList = fmap (map read . words) getLine

（用ghc -O2 foo.hs编译）

`foo.py`

ns = map(int, raw_input().split())
print sum(ns)

基准

下面，test.txt 由一行 1000 万个 space 分隔的整数组成。

# Haskell
$ time ./foo < test.txt 
1679257

real    0m36.704s
user    0m35.932s
sys     0m0.632s

# Python
$ time python foo.py < test.txt
1679257 

real    0m7.916s
user    0m7.756s
sys     0m0.151s

Answer 1

读取速度慢

快速阅读，from this answer，将把你缩短到 5.5 秒。

import Numeric
fastRead :: String -> Int
fastRead s = case readDec s of [(n, "")] -> n

字符串是链表

在Haskell中，String类型是一个链表。使用打包表示（bytestring 如果你真的只想要 ascii 但 Text 也非常快并且支持 unicode）。如this answer所示，性能应该是并驾齐驱的。

Answer 2

read 很慢。对于批量解析，使用 bytestring 或 text 原语，或 attoparsec。

我做了一些基准测试。您的原始版本运行在 23,9 秒内出现在我的电脑上。在 0.35 秒内低于运行的版本：

import qualified Data.ByteString.Char8 as B
import Control.Applicative
import Data.Maybe
import Data.List
import Data.Char

main = print . sum =<< getIntList

getIntList :: IO [Int]
getIntList =
    map (fst . fromJust . B.readInt) . B.words <$> B.readFile "test.txt"

通过将解析器专门用于您的 test.txt 文件，我可以将运行时间缩短到 0.26 秒:

getIntList :: IO [Int]          
getIntList =
    unfoldr (B.readInt . B.dropWhile (==' ')) <$> B.readFile "test.txt"

Answer 3

我敢猜测你的问题的很大一部分实际上是 words。当你 map read . words 时，你实际上在做的是：

扫描输入以查找 space，同时构建非 space 的列表。 space 有很多不同的种类，检查任何不是 space 常见类型的字符还涉及对 C 函数的外部调用（慢）。我打算找个时间解决这个问题，但我还没有抽出时间去做，即使那样你仍然会无缘无故地构建和丢弃列表，并在你真正需要时检查 spaces只想检查数字。
通读累积的字符列表，尝试从中算出一个数字。产生号码。累积的列表现在变成垃圾了。
返回步骤 1。

这是一种相当荒谬的处理方式。我相信你甚至可以使用像 reads 这样可怕的东西做得更好，但是使用像 ReadP 这样的东西会更有意义。您还可以尝试更高级的东西，例如基于流的解析；不知道能不能帮上大忙

Haskell 涉及 `read` 的程序比等效的 Python 慢得多

Haskell program involving `read` is much slower than an equivalent Python one

python

io

performance

haskell

`foo.hs`

`foo.py`

基准