How to use regular expressions in SML/NJ

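I want to take an input string and check whether it matches a regular expression; if it does not, I want to move on to the next regular expression, and so on until I have tried them all. For example, suppose I have three regular expressions: regex_1, regex_2, and regex_3.

Now suppose the input string is: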

- val str_input="7569"

I want to check str_input against regex_1 first; if that does not match, try regex_2; and if that does not match either, finally try regex_3. How can I do this in SML/NJ? Thanks.

You can do this with the regular expression library that ships with SML/NJ. Its documentation is here: http://www.smlnj.org/doc/smlnj-lib/Manual/regexp-lib-part.html

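As a small example to get you started, here is what you need to do. First, tell SML/NJ that you want to use the regexp library. You can do that with a .cm file (CM stands for Compilation Manager, which plays the role of a Makefile for SML/NJ):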

sources.cm

group is
  $/basis.cm      (* Load standard functions and modules. *)
  $/regexp-lib.cm (* Load the regexp library.             *)
  main.sml        (* Load our own source file.            *)

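Now we can use the regexp library. Unfortunately it is not entirely straightforward, because it relies on functors and readers, but what you basically need is the RE.match function. It takes a list of pairs in which the first element is a regular expression and the second is a function to call when that regular expression matches. Given this list, RE.match tries the regular expressions against the input and, when one of them matches, calls the function associated with it. The result of that function is the result of the whole RE.match call.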

main.sml

structure Main =
  struct
    (**
     * RE is a module created by calling the module-level function (functor)
     * RegExpFn (Fn comes from functor), with two module arguments.
     *
     * The first argument, called P, is the syntax used to write regular
     * expressions in. In this particular case, it's the Awk syntax, which
     * is the only syntax provided by SML/NJ right now.
     *
     * The second argument, called E, is the RegExp engine used behind the
     * scenes to compile and execute the syntax. In this particular case
     * I've opted for ThompsonEngine, which implements Ken Thompson's
     * matching algorithm. Other options are BackTrackEngine and DfaEngine.
     *)
    structure RE = RegExpFn(
      structure P = AwkSyntax
      structure E = ThompsonEngine
      (* structure E = BackTrackEngine *)
      (* structure E = DfaEngine *)
    )

    fun main () =
      let
        (**
         * A list of (regexp, match function) pairs. The function called by
         * RE.match is the one associated with the regexp that matched.
         *
         * The match parameter is described here:
         *   http://www.smlnj.org/doc/smlnj-lib/Manual/match-tree.html
         *)
        val regexes = [
          ("[a-zA-Z]*",   fn match => ("1st", match)),
          ("[0-9]*",      fn match => ("2nd", match)),
          ("1tom|2jerry", fn match => ("3rd", match))
        ]
        val input = "7569"
      in
        (**
         * StringCvt.scanString turns `input` into a character stream and
         * runs the scanner returned by `RE.match regexes` on it, starting
         * at the beginning of the string.
         *
         * The end result depends on the match functions defined above.
         *)
        StringCvt.scanString (RE.match regexes) input
      end
  end
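
The `match` value handed to each function is a match tree whose root records where the match starts (`pos`) and how long it is (`len`). As a minimal sketch of getting the matched text back out, the snippet below runs a regexp over a substring stream instead of going through StringCvt.scanString, so that `pos` is itself a substring that can be sliced. The helper name `firstMatchText` is hypothetical, BackTrackEngine is simply one of the engines mentioned in the comment above, and the `CM.make` line is only there so the snippet can be pasted into the REPL without a .cm file.

```sml
(* Assumption: run in the SML/NJ REPL; not needed when building via a .cm file. *)
CM.make "$/regexp-lib.cm";

structure RE = RegExpFn(
  structure P = AwkSyntax
  structure E = BackTrackEngine
)

(* Hypothetical helper: return the text of the first match of `re` in `s`.
 * Reading from a substring stream makes `pos` a substring that starts at
 * the match, so the first `len` characters of `pos` are the matched text. *)
fun firstMatchText re s =
  case RE.find (RE.compileString re) Substring.getc (Substring.full s) of
      SOME (MatchTree.Match ({pos, len}, _), _) =>
        SOME (Substring.string (Substring.slice (pos, 0, SOME len)))
    | NONE => NONE

val digits = firstMatchText "[0-9]+" "abc7569xyz"
```

RE.find scans forward for the first match, whereas RE.prefix and RE.match only try the current position of the stream, so RE.find is the one to reach for when the match may start in the middle of the input.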

You can now load sources.cm and call Main.main from the command line like this:

$ sml sources.cm
Standard ML of New Jersey v110.79 [built: Sun Jan  3 23:12:46 2016]
[scanning sources.cm]
[library $/regexp-lib.cm is stable]
[parsing (sources.cm):main.sml]
[library $SMLNJ-BASIS/basis.cm is stable]
[library $SMLNJ-BASIS/(basis.cm):basis-common.cm is stable]
- Main.main ();
[autoloading]
[autoloading done]
val it = SOME ("2nd",Match ({len=4,pos=0},[]))
  : (string * StringCvt.cs Main.RE.match) option
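
In the run above the second pair was selected, even though `[a-zA-Z]*` can also match the empty string at the start of "7569". If what is wanted is strictly "try regex_1 against the whole string, then regex_2, then regex_3", one possible sketch is to compile each regexp on its own and accept it only when RE.prefix consumes the entire input. The names `matchesWhole`, `firstMatching`, and `candidates` are hypothetical, and the sketch assumes that the engine reports the longest prefix match for patterns like these; the `CM.make` line is only needed when pasting into the REPL.

```sml
(* Assumption: run in the SML/NJ REPL; not needed when building via a .cm file. *)
CM.make "$/regexp-lib.cm";

structure RE = RegExpFn(
  structure P = AwkSyntax
  structure E = BackTrackEngine
)

(* Hypothetical helper: true when the prefix match of `re` covers all of `s`. *)
fun matchesWhole re s =
  case StringCvt.scanString (RE.prefix (RE.compileString re)) s of
      SOME (MatchTree.Match ({len, pos = _}, _)) => len = String.size s
    | NONE => false

(* Hypothetical helper: name of the first regexp, in list order, that
 * matches the whole input, or NONE when none of them does. *)
fun firstMatching regexes s =
  Option.map (fn (name, _) => name)
             (List.find (fn (_, re) => matchesWhole re s) regexes)

val candidates = [
  ("regex_1", "[a-zA-Z]*"),
  ("regex_2", "[0-9]*"),
  ("regex_3", "1tom|2jerry")
]

val chosen = firstMatching candidates "7569"
```

Because List.find stops at the first hit, later regexps are never compiled or run once an earlier one has matched the whole string.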
