Parsec3 Text parser for quoted string, where everything is allowed in between quotes

Question

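I actually asked this question before (here), but it turned out that the solution provided there did not handle all of the test cases. Also, I need a 'Text' parser rather than a 'String' one, so I need parsec3.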

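OK, the parser should allow every kind of character between the quotes, even quotes. The end of the quoted text is marked by a ' character followed by |, a space, or end of input.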

So,

'aa''''|

should return the string

aa'''

Here is what I have:

import Text.Parsec
import Text.Parsec.Text


quotedLabel :: Parser Text
quotedLabel = do -- reads the first quote.
    spaces
    string "'"
    lab <-  liftM pack $ endBy1 anyChar endOfQuote
    return  lab

endOfQuote = do
    string "'"
    try(eof) <|> try( oneOf "| ")

Now, the problem here is of course that eof has a different type than oneOf "| ", so compilation fails.

How do I fix this? Is there a better way to achieve what I am trying to do?

Answer 1

To change the result of any functor computation, you can use:

fmap (const x) functor_comp

For example:

getLine :: IO String
fmap (const ()) getLine :: IO ()

eof :: Parser ()
oneOf "| "  :: Parser Char

fmap (const ()) (oneOf "| ") :: Parser ()

Another option is to use the operators from Control.Applicative:

getLine *> return 3  :: IO Integer

This executes getLine, discards the result, and returns 3.

In your case, you could use:

try(eof) <|> try( oneOf "| " *> return ())
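As a minimal sketch of how this could look in a Text-based parser, the following uses void from Data.Functor, which is equivalent to fmap (const ()). The helper closesQuote is a hypothetical name added only for experimenting, and the try wrappers are dropped on the assumption that neither eof nor oneOf consumes input when it fails.

```haskell
import Data.Either (isRight)
import Data.Functor (void)
import qualified Data.Text as T
import Text.Parsec
import Text.Parsec.Text (Parser)

-- A closing quote: ' followed by end of input, '|' or a space.
-- void discards the Char returned by oneOf, so both branches are Parser ().
endOfQuote :: Parser ()
endOfQuote = char '\'' *> (eof <|> void (oneOf "| "))

-- Hypothetical helper (not part of the question or the answers):
-- does the input start with a closing quote?
closesQuote :: String -> Bool
closesQuote = isRight . parse endOfQuote "" . T.pack
```

Loaded in GHCi, closesQuote "'|" and closesQuote "'" should both be True, while closesQuote "'x" should be False. Note that this version consumes the | or space that follows the quote, which may or may not be what the surrounding grammar wants.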

Answer 2

Whitespace

First, a comment on handling whitespace...

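The usual practice is to write your parsers so that they consume the whitespace following a token or syntactic unit. It is common to define a combinator like this: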

lexeme p = p <* spaces

to easily turn a parser p into one that discards the whitespace following whatever p parses. For example, if you have

number = many1 digit

just use lexeme number wherever you want to eat the whitespace following the number.

For more on this approach to handling whitespace, and other advice on parsing languages, see this Megaparsec tutorial.
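To illustrate the lexeme idea, here is a small sketch that parses two numbers separated by arbitrary whitespace. The names twoNumbers and parseTwo are made up for this example and do not come from the answer.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run p, then skip any whitespace that follows it.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

number :: Parser String
number = many1 digit

-- Hypothetical example: two numbers, each followed by optional whitespace.
twoNumbers :: Parser (String, String)
twoNumbers = (,) <$> lexeme number <*> lexeme number

-- Hypothetical wrapper that requires the whole input to be consumed.
parseTwo :: String -> Either ParseError (String, String)
parseTwo = parse (twoNumbers <* eof) ""
```

With these definitions, parseTwo "12   34  " should succeed with ("12","34"), because each lexeme call swallows the spaces after its number before the next parser runs.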

Label expressions

Based on your previous SO question, it appears you want to parse expressions of the form:

label1 | label2 | ... | labeln

where each label may be either a simple label or a quoted label.

The idiomatic way to parse this pattern is to use sepBy, like this:

labels :: Parser [String]
labels = sepBy1 (try quotedLabel <|> simpleLabel) (char '|')

We define simpleLabel and quotedLabel in terms of which characters may occur in them. For simpleLabel, a valid character is anything other than | or a space:

simpleLabel :: Parser String
simpleLabel = many (noneOf "| ")

A quotedLabel is a single quote, followed by a run of valid quoted-label characters, followed by a closing single quote:

sq = '\''

quotedLabel :: Parser String
quotedLabel = do
  char sq
  chs <- many validChar
  char sq
  return chs

A validChar is either a non-single-quote character or a single quote that is not followed by eof or a vertical bar:

validChar = noneOf [sq] <|> try validQuote

validQuote = do
  char sq
  notFollowedBy eof
  notFollowedBy (char '|')
  return sq

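The first notFollowedBy will fail if the single quote appears just before the end of input. The second notFollowedBy will fail if the next character is a vertical bar. The sequence of the two therefore succeeds only if the single quote is followed by a character other than a vertical bar. In that case the single quote should be interpreted as part of the string rather than as the terminating single quote.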

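Unfortunately this does not quite work, because the current implementation of notFollowedBy always succeeds when given a parser that does not consume any input, such as eof. (See this issue for details.)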

To work around this, we can use this alternative implementation:

notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
      do {a <- try p; return (unexpected (show a));}
  <|> return (return ())

Here is the complete solution with some tests. By adding a few calls to lexeme you can make this parser eat up any whitespace you decide is not significant.

import Text.Parsec hiding (labels)
import Text.Parsec.String
import Control.Monad

notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
      do {a <- try p; return (unexpected (show a));}
  <|> return (return ())

sq = '\''

validChar = do
  noneOf "'" <|> try validQuote

validQuote = do
  char sq
  notFollowedBy' eof
  notFollowedBy (char '|')
  return sq

quotedLabel :: Parser String
quotedLabel = do
  char sq
  str <- many validChar
  char sq
  return str

plainLabel :: Parser String
plainLabel = many (noneOf "| ")

labels :: Parser [String]
labels = sepBy1 (try quotedLabel <|> try plainLabel) (char '|')

test input expected = do
  case parse (labels <* eof) "" input of
    Left e -> putStrLn $ "error: " ++ show e
    Right v -> if v == expected
                 then putStrLn $ "OK - got: " ++ show v
                 else putStrLn $ "NOT OK - got: " ++ show v ++ "  expected: " ++ show expected

test1 = test "a|b|c"      ["a","b","c"]
test2 = test "a|'b b'|c"  ["a", "b b", "c"]
test3 = test "'abc''|def" ["abc'", "def" ]
test4 = test "'abc'"      ["abc"]
test5 = test "x|'abc'"    ["x","abc"]
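Since the question asks for a Text parser, here is a hedged sketch of how the same approach might be adapted: it swaps Text.Parsec.String for Text.Parsec.Text and packs the results with Data.Text.pack. The wrapper parseLabels is a hypothetical convenience function, and the sketch follows this answer's rule that a closing quote is one followed by end of input or |, so the "quote followed by a space" case from the question is not handled.

```haskell
import Control.Monad (join)
import Data.Text (Text)
import qualified Data.Text as T
import Text.Parsec hiding (labels)
import Text.Parsec.Text (Parser)

-- Same workaround as above, repeated so that the sketch is self-contained.
notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
      do { a <- try p; return (unexpected (show a)) }
  <|> return (return ())

sq :: Char
sq = '\''

-- A quote is label content unless end of input or '|' follows it.
validQuote :: Parser Char
validQuote = do
  _ <- char sq
  notFollowedBy' eof
  notFollowedBy (char '|')
  return sq

validChar :: Parser Char
validChar = noneOf [sq] <|> try validQuote

quotedLabel :: Parser Text
quotedLabel = T.pack <$> (char sq *> many validChar <* char sq)

plainLabel :: Parser Text
plainLabel = T.pack <$> many (noneOf "| ")

labels :: Parser [Text]
labels = sepBy1 (try quotedLabel <|> plainLabel) (char '|')

-- Hypothetical wrapper: Nothing means the parse failed.
parseLabels :: Text -> Maybe [Text]
parseLabels = either (const Nothing) Just . parse (labels <* eof) ""
```

Mirroring the tests above, parseLabels (T.pack "'abc''|def") should give the labels abc' and def, and parseLabels (T.pack "a|'b b'|c") should give a, b b and c.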

string

parsing

haskell

parsec