Reading the first line of each file aborts at binary files
I am trying to read the first line of each file in the current directory:

import System.IO(IOMode(ReadMode), withFile, hGetLine)
import System.Directory (getDirectoryContents, doesFileExist, getFileSize)
import System.FilePath ((</>))
import Control.Monad(filterM)

readFirstLine :: FilePath -> IO String
readFirstLine fp = withFile fp ReadMode System.IO.hGetLine

getAbsoluteDirContents :: String -> IO [FilePath]
getAbsoluteDirContents dir = do
    contents <- getDirectoryContents dir
    return $ map (dir </>) contents

main :: IO ()
main = do
    -- get a list of all files & dirs
    contents <- getAbsoluteDirContents "."
    -- filter out dirs
    files <- filterM doesFileExist contents
    -- read first line of each file
    d <- mapM readFirstLine files
    print d

It compiles and runs, but aborts with the following error when it reaches a binary file:
mysrcfile: ./aBinaryFile: hGetLine: invalid argument (invalid byte sequence)
I would like to detect and skip such files and continue with the next file.
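A binary file is a file that contains byte sequences that cannot be decoded to a valid string. Without inspecting its contents, however, a binary file is indistinguishable from a text file.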
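To illustrate what inspecting the contents up front would involve, here is a rough sketch of one common heuristic: read the first few thousand bytes raw and treat the file as binary if they contain a NUL byte. The names `looksBinary` and `sampleSize`, the 8000-byte sample, and the demo file names are assumptions made for this example, not part of the original code.

```haskell
import qualified Data.ByteString as BS
import System.Directory (getTemporaryDirectory)
import System.FilePath ((</>))
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Number of leading bytes to inspect (an arbitrary choice for this sketch).
sampleSize :: Int
sampleSize = 8000

-- Heuristic only: a file "looks binary" if a NUL byte occurs in its
-- first 'sampleSize' bytes. The bytes are read without any decoding.
looksBinary :: FilePath -> IO Bool
looksBinary fp = withBinaryFile fp ReadMode $ \h -> do
  chunk <- BS.hGet h sampleSize
  return (BS.elem 0 chunk)

main :: IO ()
main = do
  tmp <- getTemporaryDirectory
  let txt = tmp </> "looks-binary-demo.txt"  -- hypothetical demo files
      bin = tmp </> "looks-binary-demo.bin"
  BS.writeFile txt (BS.pack [0x68, 0x69, 0x0a])        -- the bytes of "hi\n"
  BS.writeFile bin (BS.pack [0x89, 0x50, 0x00, 0x0a])  -- contains a NUL byte
  looksBinary txt >>= print   -- False
  looksBinary bin >>= print   -- True
```

Such a check is not a guarantee: a file without NUL bytes can still contain bytes that are invalid in the handle's encoding, so hGetLine may fail on a file that passed the check.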
It is probably better to take the "It's Easier to Ask Forgiveness than Permission (EAFP)" approach: we try to read the first line, and if that fails, we ignore the output.

import Control.Exception(catch, IOException)
import System.IO(IOMode(ReadMode), withFile, hGetLine)

readFirstLine :: FilePath -> IO (Maybe String)
readFirstLine fp = withFile fp ReadMode $
    \h -> (catch (fmap Just (hGetLine h))
        ((const :: a -> IOException -> a) (return Nothing)))

Given a FilePath, this returns an IO (Maybe String). When we run that IO (Maybe String), it yields Just x, with x the first line, if the file could be read, and Nothing if an IOException was encountered.
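As a small check of that behaviour, the sketch below runs a variant of readFirstLine on a text file, a file with an invalid byte sequence, and an empty file. The name `readFirstLineUtf8`, the scratch directory, and the file names are made up for this demo. Unlike the code above, the variant pins the handle encoding to UTF-8 so that the result does not depend on the locale.

```haskell
import Control.Exception (IOException, catch)
import qualified Data.ByteString as BS
import System.Directory (createDirectoryIfMissing, getTemporaryDirectory)
import System.FilePath ((</>))
import System.IO (IOMode (ReadMode), hGetLine, hSetEncoding, utf8, withFile)

-- Same shape as readFirstLine above, except that the encoding is fixed to
-- UTF-8 (an assumption of this demo, to keep it independent of the locale).
readFirstLineUtf8 :: FilePath -> IO (Maybe String)
readFirstLineUtf8 fp = withFile fp ReadMode $ \h -> do
  hSetEncoding h utf8
  fmap Just (hGetLine h) `catch` handler
  where
    -- Any IOException (decoding failure, end of file, ...) becomes Nothing.
    handler :: IOException -> IO (Maybe String)
    handler _ = return Nothing

main :: IO ()
main = do
  tmp <- getTemporaryDirectory
  let dir = tmp </> "first-line-demo"  -- hypothetical scratch directory
  createDirectoryIfMissing True dir
  writeFile (dir </> "text.txt") "hello\nworld\n"
  BS.writeFile (dir </> "binary.bin") (BS.pack [0xff, 0xfe, 0x00, 0x0a])
  writeFile (dir </> "empty.txt") ""
  -- prints: Just "hello", then Nothing, then Nothing
  mapM_ (\name -> readFirstLineUtf8 (dir </> name) >>= print)
        ["text.txt", "binary.bin", "empty.txt"]
```

Note that the empty file also gives Nothing: hGetLine raises an end-of-file error there, and that error is an IOException too, so it is caught by the same handler.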
We can then use catMaybes :: [Maybe a] -> [a] to keep only the x values wrapped in Just:

import Data.Maybe(catMaybes)

main :: IO ()
main = do
    -- get a list of all files & dirs
    contents <- getAbsoluteDirContents "."
    -- filter out dirs
    files <- filterM doesFileExist contents
    -- read first line of each file
    d <- mapM readFirstLine files
    print (catMaybes d)

Alternatively, you can use mapMaybeM :: Monad m => (a -> m (Maybe b)) -> [a] -> m [b] from the extra package [Hackage], which does this for you automatically.
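To show what that combinator does without requiring the extra package, here is a minimal sketch with a local stand-in of the same type. The names `mapMaybeM'` and `halveEven` are hypothetical, and the real implementation in extra may differ.

```haskell
import Data.Maybe (catMaybes)

-- Local stand-in with the same type as mapMaybeM from the extra package:
-- run the action on every element and keep only the Just results.
mapMaybeM' :: Monad m => (a -> m (Maybe b)) -> [a] -> m [b]
mapMaybeM' f xs = fmap catMaybes (mapM f xs)

-- Toy action for the demo: succeeds only on even numbers.
halveEven :: Int -> IO (Maybe Int)
halveEven n = return (if even n then Just (n `div` 2) else Nothing)

main :: IO ()
main = mapMaybeM' halveEven [1, 2, 3, 4] >>= print  -- prints [1,2]
```

With extra installed, importing mapMaybeM from Control.Monad.Extra should let the last two lines of main above collapse into d <- mapMaybeM readFirstLine files followed by print d.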