Distinguish empty regexp matches from no matches in Haskell

I'm trying to use regex-pcre, but regex-base provides so many RegexContext instances that I can't tell which one fits the task at hand.

I want to match a string against the regular expression (foo)-(bar)|(quux)-(quux)(q*u*u*x*) with a function of the following type:

myMatch :: String -> Maybe (String, String, Maybe String)

Example output:

This is not homework. I'm just puzzled that https://github.com/erantapaa/haskell-regexp-examples/blob/master/RegexExamples.hs contains no code path for the case where there is no match or no capture group.

One way to do this is with getAllTextSubmatches:

import Text.Regex.PCRE

myMatch :: String -> Maybe (String, String, Maybe String)
myMatch str = case getAllTextSubmatches $ str =~ "(foo)-(bar)|(quux)-(quux)(q*u*u*x*)" :: [String] of
  []                      -> Nothing
  [_, g1, g2, "", "", ""] -> Just (g1, g2, Nothing)
  [_, "", "", g3, g4, g5] -> Just (g3, g4, Just g5)

With [String] as its return type, getAllTextSubmatches returns an empty list when there is no match; otherwise it returns a list of all capture groups of the first match, where index 0 holds the whole match.

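Alternatively, if a group that did match may be empty, so that you cannot pattern match on empty strings, you can use [(String, (MatchOffset, MatchLength))] as the return type of getAllTextSubmatches and pattern match on a MatchOffset of -1 to identify the groups that did not take part in the match: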

myMatch :: String -> Maybe (String, String, Maybe String)
myMatch str = case getAllTextSubmatches $ str =~ "(foo)-(bar)|(quux)-(quux)(q*u*u*x*)" :: [(String, (MatchOffset, MatchLength))] of
  []                                                              -> Nothing
  [_, (g1, _), (g2, _), (_, (-1, _)), (_, (-1, _)), (_, (-1, _))] -> Just (g1, g2, Nothing)
  [_, (_, (-1, _)), (_, (-1, _)), (g3, _), (g4, _), (g5, _)]      -> Just (g3, g4, Just g5)
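
The offset convention above can also be folded into a small conversion step, so that unmatched groups become Nothing before any pattern matching happens. The following is only a sketch: groupToMaybe and fromGroups are hypothetical helper names, and the two type aliases are local stand-ins for the regex-base ones (assumed to be plain Int) so that the snippet compiles with base alone.

```haskell
-- Local stand-ins for the regex-base aliases (an assumption made so that
-- this sketch depends on nothing but base).
type MatchOffset = Int
type MatchLength = Int

-- A group that did not take part in the match is reported with offset -1.
groupToMaybe :: (String, (MatchOffset, MatchLength)) -> Maybe String
groupToMaybe (_, (-1, _)) = Nothing
groupToMaybe (s, _)       = Just s

-- Hypothetical helper: interpret the six entries (whole match plus five
-- groups). Any other shape, including the empty list, yields Nothing.
fromGroups :: [(String, (MatchOffset, MatchLength))]
           -> Maybe (String, String, Maybe String)
fromGroups groups = case map groupToMaybe groups of
  [_, Just g1, Just g2, Nothing, Nothing, Nothing] -> Just (g1, g2, Nothing)
  [_, Nothing, Nothing, Just g3, Just g4, Just g5] -> Just (g3, g4, Just g5)
  _                                                -> Nothing
```

With the real library you would presumably drop the two alias lines and feed fromGroups the result of getAllTextSubmatches. The catch-all case keeps the function total, and an empty but matched group comes out as Just "" rather than Nothing.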

Now, if that looks too verbose:

{-# LANGUAGE PatternSynonyms #-}

pattern NoMatch = ("", (-1, 0))

myMatch :: String -> Maybe (String, String, Maybe String)
myMatch str = case getAllTextSubmatches $ str =~ "(foo)-(bar)|(quux)-(quux)(q*u*u*x*)" :: [(String, (MatchOffset, MatchLength))] of
  []                                               -> Nothing
  [_, (g1, _), (g2, _), NoMatch, NoMatch, NoMatch] -> Just (g1, g2, Nothing)
  [_, NoMatch, NoMatch, (g3, _), (g4, _), (g5, _)] -> Just (g3, g4, Just g5)

To tell when there is no match at all, use =~~ so that the result is wrapped in the Maybe monad. If nothing matches, it calls fail, which in Maybe yields Nothing:

myMatch :: String -> Maybe (String, String, Maybe String)
myMatch str = do
    let regex = "(foo)-(bar)|(quux)-(quux)(q*u*u*x*)"
    groups <- getAllTextSubmatches <$> str =~~ regex :: Maybe [String]
    case groups of
        [_, g1, g2, "", "", ""] -> Just (g1, g2, Nothing)
        [_, "", "", g3, g4, g5] -> Just (g3, g4, Just g5)
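
The same fail mechanism also applies to refutable patterns inside a Maybe do block: if the bound value does not fit the pattern, the block evaluates to Nothing instead of raising a pattern-match error. Below is a minimal sketch using only base; firstTwoGroups is a hypothetical helper, and its argument stands in for whatever a =~~ call would produce.

```haskell
-- Hypothetical helper: take the first two capture groups from an
-- already-computed match result (index 0 is the whole match).
firstTwoGroups :: Maybe [String] -> Maybe (String, String)
firstTwoGroups result = do
    -- Refutable pattern: a list that is too short triggers fail,
    -- which is Nothing in the Maybe monad.
    (_whole : g1 : g2 : _) <- result
    pure (g1, g2)
```

So a missing match (Nothing) and a result with an unexpected shape both end up as Nothing, without an explicit case expression.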

Using regex-applicative:

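{-# LANGUAGE OverloadedStrings #-}

import Text.Regex.Applicative

myMatch :: String -> Maybe (String, String, Maybe String)
myMatch = match re

re :: RE Char (String, String, Maybe String)
re = foobar <|> quuces where
    foobar = (,,) <$> "foo" <* "-" <*> "bar" <*> pure Nothing
    quuces = (,,)
        <$> "quux" <* "-"
        <*> "quux"
        <*> (fmap (Just . mconcat) . sequenceA)
            [many $ sym 'q', many $ sym 'u', many $ sym 'u', many $ sym 'x']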

Or, using ApplicativeDo:

re = foobar <|> quuces where
    foobar = do
        foo <- "foo"
        _ <- "-"
        bar <- "bar"
        pure (foo, bar, Nothing)
    quuces = do
        quux1 <- "quux"
        _ <- "-"
        quux2 <- "quux"
        quux3 <- fmap snd . withMatched $
            traverse (many . sym) ("quux" :: [Char])
            -- [many $ sym 'q', many $ sym 'u', many $ sym 'u', many $ sym 'x']
        pure (quux1, quux2, Just quux3)