Running out of memory while counting characters in a large file
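I want to count how often each character occurs in a large file. Although I know that counting has to be implemented strictly in Haskell (which I tried to achieve with foldl'), I am still running out of memory. For comparison: the file is about 2GB, while the computer has 100GB of memory. There are not many different characters in the file, maybe 20. What am I doing wrong?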
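import Data.List (foldl')
import System.Environment (getArgs)

-- Increment the count for character d in an association list,
-- appending a new entry if d has not been seen yet.
ins :: [(Char,Int)] -> Char -> [(Char,Int)]
ins [] c = [(c,1)]
ins ((c,i):cs) d
  | c == d    = (c,i+1):cs
  | otherwise = (c,i) : ins cs d

main = do
  [file] <- getArgs
  txt <- readFile file
  print $ foldl' ins [] txt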
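Your ins function is creating a large number of thunks, which is what leaks memory. foldl' only evaluates the accumulator to weak head normal form (WHNF), which is not enough here: it forces only the outermost list constructor, so the i+1 counts and the recursive ins cs d tails stay unevaluated and pile up. What you need is deepseq from Control.DeepSeq to evaluate the accumulator to normal form.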
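As a minimal sketch of the deepseq approach, the fold below wraps each step in force (also exported by Control.DeepSeq), so the WHNF evaluation done by foldl' ends up evaluating the whole association list. The helper name countChars and the sample input are assumptions made for illustration; ins is the function from the question.

```haskell
import Control.DeepSeq (force)
import Data.List (foldl')

-- Association-list update, unchanged from the question.
ins :: [(Char, Int)] -> Char -> [(Char, Int)]
ins [] c = [(c, 1)]
ins ((c, i) : cs) d
  | c == d    = (c, i + 1) : cs
  | otherwise = (c, i) : ins cs d

-- Hypothetical helper (not in the original post): foldl' forces each new
-- accumulator to WHNF, and force makes that evaluation go all the way to
-- normal form, so no chain of (+1) thunks can build up inside the list.
countChars :: String -> [(Char, Int)]
countChars = foldl' (\acc c -> force (ins acc c)) []

main :: IO ()
main = print (countChars "abracadabra")
-- prints [('a',5),('b',2),('r',2),('c',1),('d',1)]
```

Forcing the accumulator at every step re-traverses the list each time, which is acceptable with around 20 distinct characters but would get expensive for a large alphabet.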
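Alternatively, use Data.Map.Strict for counting instead of an association list. Also, if your input is on the order of 2GB, you are better off using a lazy ByteString instead of a plain String.
The code below should run in constant memory regardless of the input size: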
import System.Environment (getArgs)
import Data.Map.Strict (empty, alter)
import qualified Data.ByteString.Lazy.Char8 as B

main :: IO ()
main = getArgs >>= B.readFile . head >>= print . B.foldl' go empty
  where
  -- Bump the count for one character in the strict map.
  go = flip $ alter inc
  -- Start a new count at 1, or increment an existing one.
  inc :: Maybe Int -> Maybe Int
  inc Nothing  = Just 1
  inc (Just i) = Just $ i + 1
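The same counting step can also be sketched with insertWith from Data.Map.Strict, which some readers may find easier to follow than flip $ alter inc. The helper name countBytes and the in-memory sample input are assumptions for illustration; a real run would pass the result of B.readFile instead.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.Map.Strict as M

-- Hypothetical helper (not in the original answer): a strict left fold over
-- the lazy ByteString. insertWith from Data.Map.Strict evaluates the
-- combined count to WHNF, so the counts never accumulate as thunks.
countBytes :: B.ByteString -> M.Map Char Int
countBytes = B.foldl' (\m c -> M.insertWith (+) c 1 m) M.empty

main :: IO ()
main = print (M.toList (countBytes (B.pack "abracadabra")))
-- prints [('a',5),('b',2),('c',1),('d',1),('r',2)]
```

Note that the Char8 interface treats every byte as one Char, so with a multi-byte encoding such as UTF-8 this counts bytes rather than Unicode characters.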