Haskell parsec: `many` combinator inside an `optional` combinator

Question

I would like to use Haskell's parsec library to implement this grammar rule:

((a | b | c)* (a | b))?

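This rule accepts an optional (that is, possibly empty) string. If the accepted string is non-empty, it consists of zero or more characters matched by the a, b, or c parsers, but the last character of the string accepted by the outermost ? (optional) parser must be consumed by parser a or b, not by c. Here is an example: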

module Main where

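import Text.Parsec
-- Text.Parsec.String (not Text.Parsec.Text) so the String test inputs type-check
import Text.Parsec.String (Parser)

a, b, c :: Parser Char
a = char 'a'
b = char 'b'
c = char 'c'

-- ((a | b | c)* (a | b))?
myParser :: Parser String
myParser = undefined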

shouldParse1,shouldParse2,shouldParse3,
      shouldParse4,shouldFail :: Either ParseError String
-- these should succeed
shouldParse1 = runParser myParser () "" "" -- because ? optional
shouldParse2 = runParser myParser () ""  "b"
shouldParse3 = runParser myParser () "" "ccccccb"
shouldParse4 = runParser myParser () "" "aabccab"

-- this should fail because it ends with a 'c'
shouldFail = runParser myParser () "" "aabccac"

main :: IO ()
main = do
  print shouldParse1
  print shouldParse2
  print shouldParse3
  print shouldParse4
  print shouldFail

A first attempt might look like this:

myParser = option "" $ do
  str <- many (a <|> b <|> c)
  ch  <- a <|> b
  return (str ++ [ch])

But many consumes all of the 'a', 'b', and 'c' characters in every test case, so a <|> b has no characters left to consume.
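To make the failure concrete, here is a minimal sketch that wraps the first attempt so it can be tried on the test inputs. The names firstAttempt and accepts are illustrative helpers, not part of the question, and Parser from Text.Parsec.String is assumed so that plain String input type-checks.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

a, b, c :: Parser Char
a = char 'a'
b = char 'b'
c = char 'c'

-- The first attempt from the question, with a type signature added.
firstAttempt :: Parser String
firstAttempt = option "" $ do
  str <- many (a <|> b <|> c)  -- greedily takes every 'a', 'b' and 'c'
  ch  <- a <|> b               -- so nothing is left for this step
  return (str ++ [ch])

-- Hypothetical helper: True when the parser accepts the input.
accepts :: Parser String -> String -> Bool
accepts p = either (const False) (const True) . parse p ""
```

Loaded in GHCi, accepts firstAttempt "" should be True, while accepts firstAttempt "b" should be False: once many has consumed input, the failure of a <|> b is a failure after consumption, so option does not fall back to "".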

Question:

Using parsec combinators, what is the correct implementation of myParser for the rule ((a | b | c)* (a | b))? ?

Answer 1

We can also state this slightly differently: in your parser, c may only succeed if it is followed by another token, which can be done with a single lookAhead:

myParser = many (a <|> b <|> (c <* (lookAhead anyToken <?> "non C token"))) <* eof
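As a quick check, the answer's parser can be placed in a small self-contained module and run against the inputs from the question. This is only a sketch: parseMaybe is a hypothetical helper introduced here, and Parser from Text.Parsec.String is assumed so that String inputs type-check.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

a, b, c :: Parser Char
a = char 'a'
b = char 'b'
c = char 'c'

-- The answer's parser: 'c' is accepted only when another token follows it,
-- and the whole input must be consumed (eof).
myParser :: Parser String
myParser = many (a <|> b <|> (c <* (lookAhead anyToken <?> "non C token"))) <* eof

-- Hypothetical helper: Just the parsed string on success, Nothing on a parse error.
parseMaybe :: String -> Maybe String
parseMaybe = either (const Nothing) Just . parse myParser ""
```

With these definitions, parseMaybe should return Just for "", "b", "ccccccb" and "aabccab", and Nothing for "aabccac", because the final 'c' is consumed and then lookAhead anyToken fails at the end of input. Note that the trailing eof makes the parser reject any input with leftover characters, which is slightly stricter than the bare grammar rule.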

haskell

parsec