How to use regex in SML/NJ
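I want to take an input string and check whether it matches a regular expression; if it doesn't, I want to move on to the next regular expression, and so on until I have run out of regular expressions. For example, suppose I have the following 3 regular expressions: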
- regex_1 = [a-zA-Z]*
- regex_2 = [0-9]*
- regex_3 = (1tom|2jerry)
Now suppose the input string is:
- val str_input="7569"
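I want to check str_input against regex_1 first; if that doesn't match, try regex_2; and if that doesn't match either, finally try regex_3.
The question is how to do this in SML/NJ. Thanks.
You can do what you want with the regular expression library that ships with SML/NJ. Its documentation can be found here: http://www.smlnj.org/doc/smlnj-lib/Manual/regexp-lib-part.html
As a small example to get you started, here is what you need to do. First, you need to tell SML/NJ that you want to use the regexp library. You can do that with a .cm file (cm stands for Compilation Manager, which is roughly SML/NJ's counterpart to a Makefile):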
sources.cm
group is
  $/basis.cm       (* Load standard functions and modules. *)
  $/regexp-lib.cm  (* Load the regexp library. *)
  main.sml         (* Load our own source file. *)
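Now we can use the regexp library. Unfortunately, it isn't entirely straightforward, because it uses functors and readers, but basically what you need is the RE.match function. It takes a list of pairs, where the first element of each pair is a regular expression and the second is a function to call when that regular expression matches. Given this list of pairs, RE.match reads the input string until it finds a match, at which point it calls the function associated with the regular expression that matched at that point. The result of that function is the result of the whole RE.match call.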
main.sml
structure Main =
struct
  (**
   * RE is a module created by calling the module-level function (functor)
   * RegExpFn (Fn comes from functor), with two module arguments.
   *
   * The first argument, called P, is the syntax used to write regular
   * expressions in. In this particular case, it's the Awk syntax, which
   * is the only syntax provided by SML/NJ right now.
   *
   * The second argument, called E, is the RegExp engine used behind the
   * scenes to compile and execute the syntax. In this particular case
   * I've opted for ThompsonEngine, which implements Ken Thompson's
   * matching algorithm. Other options are BackTrackEngine and DfaEngine.
   *)
  structure RE = RegExpFn(
    structure P = AwkSyntax
    structure E = ThompsonEngine
    (* structure E = BackTrackEngine *)
    (* structure E = DfaEngine *)
  )

  fun main () =
    let
      (**
       * A list of (regexp, match function) pairs. The function called by
       * RE.match is the one associated with the regexp that matched.
       *
       * The match parameter is described here:
       * http://www.smlnj.org/doc/smlnj-lib/Manual/match-tree.html
       *)
      val regexes = [
        ("[a-zA-Z]*", fn match => ("1st", match)),
        ("[0-9]*", fn match => ("2nd", match)),
        ("1tom|2jerry", fn match => ("3rd", match))
      ]
      val input = "7569"
    in
      (**
       * StringCvt.scanString turns the `input` string into a character
       * stream and runs the scanner returned by `RE.match regexes` on it,
       * starting from the first character.
       *
       * It's sort of a streaming matching process. The end result, however,
       * depends on your implementation above, in the match functions.
       *)
      StringCvt.scanString (RE.match regexes) input
    end
end
You can now use it from the command line like this:
$ sml sources.cm
Standard ML of New Jersey v110.79 [built: Sun Jan 3 23:12:46 2016]
[scanning sources.cm]
[library $/regexp-lib.cm is stable]
[parsing (sources.cm):main.sml]
[library $SMLNJ-BASIS/basis.cm is stable]
[library $SMLNJ-BASIS/(basis.cm):basis-common.cm is stable]
- Main.main ();
[autoloading]
[autoloading done]
val it = SOME ("2nd",Match ({len=4,pos=0},[]))
: (string * StringCvt.cs Main.RE.match) option
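One thing to watch for with the patterns in the question: `[a-zA-Z]*` and `[0-9]*` can both match the empty string, so "does regex_1 match?" is only a meaningful test if the match has to cover the whole input. The following is a minimal sketch of that try-each-regex-in-order check, not a definitive implementation. It assumes the same regexp-lib.cm setup as above, that RE.prefix and RE.compileString behave as described in the library's REGEXP signature, and that the engine reports the longest match at the start of the string. FirstMatch, matchesWhole and firstMatching are hypothetical names, not part of the library.

```sml
structure FirstMatch =
struct
  structure RE = RegExpFn(
    structure P = AwkSyntax
    structure E = ThompsonEngine
  )

  (* Hypothetical helper: true iff `re` matches the whole of `s`.
   * RE.prefix only matches at the start of the input, so comparing the
   * match length with the string length checks for a full match.
   * Assumes the engine reports the longest prefix match. *)
  fun matchesWhole (re : string) (s : string) : bool =
    case StringCvt.scanString (RE.prefix (RE.compileString re)) s of
        SOME (MatchTree.Match ({len, pos = _}, _)) => len = String.size s
      | NONE => false

  (* Hypothetical helper: the first regexp in `res`, in list order, that
   * matches all of `s`, or NONE once every regexp has been tried. *)
  fun firstMatching (res : string list) (s : string) : string option =
    List.find (fn re => matchesWhole re s) res
end
```

To use it, add the file to sources.cm next to main.sml and evaluate `FirstMatch.firstMatching ["[a-zA-Z]*", "[0-9]*", "1tom|2jerry"] "7569";` at the prompt. Unlike RE.match, which picks the action for you, this version tries the patterns strictly in the order given and stops at the first one that covers the entire string.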