Reading first line of each file getting aborted at binary files

I am trying to read the first line of each file in the current directory:

import System.IO(IOMode(ReadMode), withFile, hGetLine)
import System.Directory (getDirectoryContents, doesFileExist, getFileSize)
import System.FilePath ((</>))
import Control.Monad(filterM)

readFirstLine :: FilePath -> IO String
readFirstLine fp = withFile fp ReadMode System.IO.hGetLine

getAbsoluteDirContents :: String -> IO [FilePath]
getAbsoluteDirContents dir = do
    contents <- getDirectoryContents dir
    return $ map (dir </>) contents

main :: IO ()
main = do
    -- get a list of all files & dirs
    contents <- getAbsoluteDirContents "."
    -- filter out dirs
    files <- filterM doesFileExist contents
    -- read first line of each file
    d <- mapM readFirstLine files
    print d

It compiles and runs, but it aborts with the following error when it reaches a binary file:

mysrcfile: ./aBinaryFile: hGetLine: invalid argument (invalid byte sequence)

I would like to detect and skip such files and continue with the next file.

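A binary file is a file that contains byte sequences that cannot be decoded into a valid string under the handle's text encoding. Without inspecting its contents, however, a binary file cannot be told apart from a text file.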

It is probably better to use an "It's Easier to Ask Forgiveness than Permission (EAFP)" approach: we try to read the first line, and if that fails, we ignore the result.

import Control.Exception(catch, IOException)
import System.IO(IOMode(ReadMode), withFile, hGetLine)

readFirstLine :: FilePath -> IO (Maybe String)
readFirstLine fp = withFile fp ReadMode $
    \h -> (catch (fmap Just (hGetLine h))
        ((const :: a -> IOException -> a) (return Nothing)))

For a FilePath this returns an IO (Maybe String). If we run that IO (Maybe String), it returns Just x, with x the first line, if the file could be read, and Nothing if an IOException was encountered.
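To see both outcomes without depending on whatever happens to be in the current directory, here is a minimal sketch. It makes a few assumptions that are not part of the answer above: the handle encoding is pinned to UTF-8 so the result does not depend on the locale, the function is renamed readFirstLineUtf8 to keep it distinct, and the two file names are hypothetical scratch files in the temporary directory.

```haskell
import Control.Exception (IOException, catch)
import System.Directory (getTemporaryDirectory, removeFile)
import System.FilePath ((</>))
import System.IO

-- Same idea as readFirstLine above, but with the handle encoding pinned to
-- UTF-8 (an assumption made here so the demo does not depend on the locale).
readFirstLineUtf8 :: FilePath -> IO (Maybe String)
readFirstLineUtf8 fp = withFile fp ReadMode $ \h -> do
    hSetEncoding h utf8
    fmap Just (hGetLine h) `catch` handler
  where
    -- Any IOException (decoding failure, end of file, ...) becomes Nothing.
    handler :: IOException -> IO (Maybe String)
    handler _ = return Nothing

main :: IO ()
main = do
    tmp <- getTemporaryDirectory
    let textFile   = tmp </> "first-line-demo.txt"  -- hypothetical names
        binaryFile = tmp </> "first-line-demo.bin"
    writeFile textFile "hello\nworld\n"
    -- In binary mode each Char below 256 is written as one raw byte, so this
    -- writes the bytes FF FE 0A, which are not valid UTF-8.
    withBinaryFile binaryFile WriteMode $ \h -> hPutStr h "\xff\xfe\n"
    readFirstLineUtf8 textFile   >>= print  -- prints: Just "hello"
    readFirstLineUtf8 binaryFile >>= print  -- prints: Nothing
    mapM_ removeFile [textFile, binaryFile]
```

Note that an empty file also yields Nothing, because hGetLine raises an end-of-file error, which is an IOException as well.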

We can then use catMaybes :: [Maybe a] -> [a] to keep only the xs wrapped in a Just:

import Data.Maybe(catMaybes)

main :: IO ()
main = do
    -- get a list of all files & dirs
    contents <- getAbsoluteDirContents "."
    -- filter out dirs
    files <- filterM doesFileExist contents
    -- read first line of each file
    d <- mapM readFirstLine files
    print (catMaybes d)

Alternatively, you can use mapMaybeM :: Monad m => (a -> m (Maybe b)) -> [a] -> m [b] from the extra package (on Hackage), which does this work for you automatically.
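As a rough illustration of what mapMaybeM does, the sketch below defines a local stand-in with the same type using only base. The name mapMaybeM' and the toy parseInt action are hypothetical and exist only for this example. With extra installed, the real function should be importable from Control.Monad.Extra, so that d <- mapMaybeM readFirstLine files replaces the mapM and catMaybes pair.

```haskell
import Data.Maybe (catMaybes)
import Text.Read (readMaybe)

-- Hypothetical local stand-in with the same type as mapMaybeM from extra,
-- defined here only so the sketch needs nothing beyond base.
mapMaybeM' :: Monad m => (a -> m (Maybe b)) -> [a] -> m [b]
mapMaybeM' f xs = catMaybes <$> mapM f xs

-- Toy action standing in for readFirstLine: succeeds only on integers.
parseInt :: String -> IO (Maybe Int)
parseInt s = return (readMaybe s)

main :: IO ()
main = do
    ns <- mapMaybeM' parseInt ["1", "x", "3"]
    print ns  -- prints: [1,3]
```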